14 Qualitative and Limited Dependent Variable Models
Binary, Ordered, Multinomial, Count, Censored, and Truncated Outcomes
This chapter is a hub for the full range of models for qualitative and limited dependent variables. It introduces the common thread — why OLS fails for non-continuous outcomes — and provides a roadmap to the sub-pages that develop each model family in detail.
You should be comfortable with OLS regression, conditional expectation, and the idea of maximum likelihood estimation before reading this chapter.
Everything we’ve done so far assumes the dependent variable is continuous and unbounded. But many economic outcomes aren’t like that:
- People choose whether to work or not (binary)
- Students pick a college major (unordered categorical)
- Credit agencies assign bond ratings (ordered categorical)
- Researchers count the number of patents a firm files (count)
- Some people report zero hours worked because they don’t participate in the labor force (censored)
For all of these, OLS is the wrong tool. This chapter introduces the right ones.
14.1 Why OLS Fails for Non-Continuous Outcomes
Consider modeling whether someone drives to work (\(y = 1\)) or takes the bus (\(y = 0\)). If we just run OLS, we get the linear probability model (LPM):
\[ y_i = \beta_1 + \beta_2 x_i + e_i \]
The fitted values \(\hat{y}\) are interpreted as probabilities, but the model has structural problems:
Predictions outside \([0, 1]\): OLS can predict \(\hat{y} = -0.3\) or \(\hat{y} = 1.4\). Neither is a valid probability.
Constant marginal effects: OLS assumes a one-unit change in \(x\) always changes the probability by \(\beta_2\). But probabilities are bounded — the effect near \(P = 0.5\) should be larger than near \(P = 0\) or \(P = 1\).
Heteroskedasticity: When \(y\) is binary, \(\text{Var}(y \mid x) = P(1-P)\) depends on \(x\), violating homoskedasticity by construction.
These problems are not unique to binary outcomes. Similar structural mismatches arise across the board:
- Count data: OLS can predict \(\hat{y} = -1.7\) doctor visits. Negative counts are impossible, and the additive marginal effect ignores the multiplicative structure of count processes.
- Ordered categories: Coding “poor/fair/good” as 1/2/3 and running OLS assumes equal spacing between categories and can predict fractional or out-of-range values.
- Censored data: When many observations pile up at zero (e.g., hours worked), OLS on the full sample flattens the slope, while OLS on positives only introduces selection bias.
The solution in each case is to specify a model that respects the structure of the dependent variable. The sub-pages below develop each one in detail.
To make the first problem concrete, suppose the fitted LPM relates the probability of driving to years of education: \(\hat{p} = -0.2 + 0.08 \times \text{EDUC}\). For EDUC = 2: \(\hat{p} = -0.2 + 0.08 \times 2 = -0.04\). A negative probability. And for EDUC = 20: \(\hat{p} = -0.2 + 0.08 \times 20 = 1.4\). A probability above 1. Both are impossible. The LPM’s linear structure simply cannot respect the \([0, 1]\) bounds. Probit and logit solve this by passing the linear index through a CDF that maps any real number to a valid probability.
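The arithmetic in the example above can be checked directly. A minimal sketch, using the illustrative intercept \(-0.2\) and slope \(0.08\) from the example (not estimates from real data):

```python
# Fitted 'probability' from the illustrative LPM: p_hat = -0.2 + 0.08 * EDUC.
# The linear form cannot stay inside [0, 1].
def lpm_prediction(educ, intercept=-0.2, slope=0.08):
    return intercept + slope * educ

for educ in (2, 20):
    p = lpm_prediction(educ)
    print(f"EDUC = {educ:2d}: p_hat = {p:+.2f}  valid probability? {0 <= p <= 1}")
```

Both fitted values fall outside \([0, 1]\), which is exactly the structural failure the probit and logit link functions repair.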
14.2 The Unifying Idea: Latent Variables
Many of these models share a common framework. Behind the observed (discrete or censored) outcome is an unobserved latent variable \(y^*\) that is continuous:
\[ y_i^* = x_i'\beta + e_i \]
We observe a transformation of \(y^*\):
- Binary choice: We observe \(y = 1\) when \(y^* > 0\) (the person works when the net benefit is positive). The distribution of \(e\) determines the model: normal errors give probit, logistic errors give logit.
- Ordered choice: We observe which interval \(y^*\) falls into, with cutpoints \(\mu_1 < \mu_2 < \ldots\) estimated from the data.
- Tobit: We observe \(y^*\) directly when it is positive, but only observe \(y = 0\) when \(y^* \leq 0\).
- Heckman selection: Two latent variables — one governing participation, one governing the outcome — with correlated errors.
Count data models (Poisson, negative binomial) use a different foundation — a log-link function and distributional assumptions on the count process — but share the same principle: match the statistical model to the data-generating process.
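The latent-variable mechanism for binary choice can be sketched in a few lines. The coefficients below are arbitrary illustrative values: with standard normal errors, \(P(y = 1 \mid x) = \Phi(x'\beta)\) (probit), and with logistic errors, \(P(y = 1 \mid x) = \Lambda(x'\beta)\) (logit). Simulating \(y^*\) directly confirms the probit probability:

```python
import math
import random

random.seed(42)

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

beta0, beta1 = -1.0, 0.5        # arbitrary illustrative coefficients
x = 3.0                         # evaluate at one covariate value
index = beta0 + beta1 * x       # the linear index x'beta

p_probit = norm_cdf(index)               # e ~ N(0, 1)
p_logit = 1 / (1 + math.exp(-index))     # e ~ logistic

# Simulate the latent variable y* = index + e and observe y = 1[y* > 0]
draws = 100_000
share = sum((index + random.gauss(0, 1)) > 0 for _ in range(draws)) / draws
print(f"probit: {p_probit:.4f}, logit: {p_logit:.4f}, simulated share: {share:.4f}")
```

The simulated share of \(y = 1\) outcomes matches \(\Phi(x'\beta)\), while the logit probability differs slightly because the logistic CDF has heavier tails.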
14.3 Estimation: Why Not OLS?
All of the models in this chapter are nonlinear in the parameters. We cannot estimate them with OLS because the relationship between the dependent variable and the regressors passes through a nonlinear function (a CDF, an exponential, or a censoring rule). Instead, we use maximum likelihood estimation (MLE): find the parameter values that make the observed data most likely given the model.
MLE has desirable large-sample properties — consistency, asymptotic normality, and efficiency — but comes with a cost: it requires specifying the distribution of the errors. If the distributional assumption is wrong (e.g., assuming normal errors in probit when the true distribution has heavier tails), the estimates may be inconsistent. This is a stronger requirement than OLS, where consistency holds without distributional assumptions. Each sub-page discusses the specific distributional assumptions for its model and what happens when they are violated.
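To make MLE concrete, here is a bare-bones sketch of estimating a logit by Newton-Raphson on a tiny made-up dataset (six observations, purely illustrative). The algorithm climbs the log-likelihood until the score (gradient) is zero:

```python
import math

# Tiny made-up dataset: binary outcome y and one regressor x
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 1, 0, 1, 1]

def loglik_grad_hess(b0, b1):
    """Logit log-likelihood, score, and Hessian terms for (b0, b1)."""
    ll = g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = 1 / (1 + math.exp(-(b0 + b1 * x)))
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
        g0 += y - p                 # score w.r.t. intercept
        g1 += (y - p) * x           # score w.r.t. slope
        w = p * (1 - p)             # observation weight
        h00 += w; h01 += w * x; h11 += w * x * x
    return ll, g0, g1, h00, h01, h11

# Newton-Raphson: step by (Hessian)^-1 * score; the logit likelihood is concave
b0 = b1 = 0.0
for _ in range(30):
    ll, g0, g1, h00, h01, h11 = loglik_grad_hess(b0, b1)
    det = h00 * h11 - h01 * h01
    b0 += (h11 * g0 - h01 * g1) / det
    b1 += (h00 * g1 - h01 * g0) / det

ll, g0, g1, *_ = loglik_grad_hess(b0, b1)
print(f"MLE: b0 = {b0:.3f}, b1 = {b1:.3f}, log-likelihood = {ll:.3f}")
```

At the solution the score is zero and the log-likelihood exceeds its value at the null parameters, which is what "the parameter values that make the observed data most likely" means operationally.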
14.4 Common Themes Across Models
Several ideas recur throughout the sub-pages:
Marginal effects are not coefficients. In every nonlinear model, the raw coefficient \(\beta_k\) does not directly tell you the effect of a one-unit change in \(x_k\) on the outcome. The marginal effect depends on where the observation sits in the distribution. Applied work reports either the average marginal effect (AME) or the marginal effect at the mean (MEM). Each sub-page derives the specific marginal effect formula for its model and explains how to interpret it.
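For a probit with a continuous regressor, the marginal effect at a point is \(\phi(x'\beta)\,\beta_k\); the AME averages that quantity over the sample, while the MEM evaluates it at the sample mean. A minimal sketch with made-up coefficients and data shows the two are not equal in general:

```python
import math

def norm_pdf(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

# Made-up probit fit: index = b0 + b1 * x (illustrative values, not estimates)
b0, b1 = -1.5, 0.4
xs = [1, 2, 3, 4, 5, 6, 7, 8]          # hypothetical sample covariate values

def me(x):
    """Marginal effect of x at a point: dP/dx = phi(x'beta) * b1."""
    return norm_pdf(b0 + b1 * x) * b1

ame = sum(me(x) for x in xs) / len(xs)   # average marginal effect
mem = me(sum(xs) / len(xs))              # marginal effect at the mean
print(f"AME = {ame:.4f}, MEM = {mem:.4f}")
```

Here the MEM overstates the typical effect because the sample mean sits near the steep middle of the CDF while many observations sit in its flatter tails.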
Model selection follows the dependent variable. The table in the Model Selection Summary below maps each type of dependent variable to the appropriate model. The first step in any analysis is to characterize the outcome — binary, ordered, unordered categorical, count, or censored — and let that structure dictate the model.
Testing and diagnostics. Each model family has its own diagnostic tools:
- Binary choice: Wald and likelihood ratio tests for coefficient significance; McFadden’s pseudo-\(R^2\) and percent correctly predicted for fit
- Multinomial logit: Hausman-McFadden test for IIA
- Count data: Overdispersion test (\(H_0: \alpha = 0\)); Vuong test for zero-inflation
- Tobit: Comparison of OLS-all, OLS-positives, and Tobit slopes as an informal specification check
- Heckman: Significance of the inverse Mills ratio coefficient as a test for selection bias
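Two of these diagnostics, the likelihood ratio test and McFadden's pseudo-\(R^2\), are simple functions of the fitted and null log-likelihoods. A sketch with made-up log-likelihood values (not from real estimates):

```python
# Hypothetical log-likelihoods for illustration only
llf_full = -401.3     # fitted model
llf_null = -452.6     # intercept-only model
q = 3                 # number of restrictions (slopes set to zero)

lr_stat = 2 * (llf_full - llf_null)     # LR statistic ~ chi-square(q) under H0
pseudo_r2 = 1 - llf_full / llf_null     # McFadden's pseudo-R^2

crit_5pct_3df = 7.815                   # chi-square 5% critical value, 3 df
print(f"LR = {lr_stat:.1f} (reject at 5%: {lr_stat > crit_5pct_3df})")
print(f"McFadden pseudo-R2 = {pseudo_r2:.3f}")
```

Note that McFadden's pseudo-\(R^2\) is a likelihood comparison, not a share of explained variance, so its values run well below OLS \(R^2\) levels on the same data.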
14.5 Chapter Roadmap
14.5.1 Binary Choice Models
When the outcome is yes/no (admitted/rejected, employed/unemployed, default/no default), logit and probit pass the linear index through an S-shaped CDF to constrain predictions to \([0, 1]\). This is the foundational model for the chapter — most of the other models generalize the same latent variable framework introduced here. The page builds the intuition visually, starting from the LPM’s three failures and moving through the latent variable framework. Topics include:
- Maximum likelihood estimation and why OLS cannot be used
- Marginal effects: average marginal effect (AME) vs. marginal effect at the mean (MEM)
- Odds ratios and the log-odds interpretation of logit coefficients
- The Wald test, likelihood ratio test, and McFadden’s pseudo-\(R^2\)
- When the LPM is an acceptable approximation (Angrist and Pischke’s defense)
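One of the topics above, the odds-ratio reading of logit coefficients, can be verified numerically: a one-unit increase in \(x\) multiplies the odds \(p/(1-p)\) by \(e^{\beta_1}\), no matter where \(x\) starts. A sketch with made-up coefficients:

```python
import math

b0, b1 = -0.5, 0.7        # illustrative logit coefficients

def p(x):
    """Logit probability at x."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

def odds(x):
    return p(x) / (1 - p(x))

# The odds ratio for a one-unit change equals exp(b1) regardless of x
for x in (0.0, 2.0, 5.0):
    print(f"x = {x}: odds(x+1)/odds(x) = {odds(x + 1) / odds(x):.4f}")
print(f"exp(b1) = {math.exp(b1):.4f}")
```

This constancy on the odds scale is exactly why logit output is often reported as odds ratios, even though the effect on the probability scale still varies with \(x\).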
14.5.2 Ordered Choice Models
Outcomes like survey ratings (1–5), credit grades (AAA to D), or health status (poor/fair/good/excellent) have a natural ranking but unknown distances between categories. Coding them as integers and running OLS imposes equal spacing and can predict out-of-range values. Ordered probit and logit model a latent continuous variable partitioned by estimated cutpoints, respecting the ordinal structure without assuming cardinality. Topics include:
- Why coding categories as integers and running OLS is wrong
- The cutpoint mechanism: how covariates shift the distribution across all categories simultaneously
- Marginal effects that sum to zero across categories, with middle categories that can move in either direction
- Discrete differences for binary regressors
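The cutpoint mechanism can be sketched directly. With made-up cutpoints and slope (illustrative, not estimates), the category probabilities are the interval probabilities of \(y^*\), so they sum to one, and the marginal effects across categories sum to zero with the extreme categories moving in opposite directions:

```python
import math

def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Made-up ordered-probit values: three cutpoints => four categories
cut = [-0.8, 0.5, 1.6]
beta = 0.3

def probs(x):
    """P(y = j | x) for j = 1..4 as interval probabilities of y* = beta*x + e."""
    idx = beta * x
    cdf = [norm_cdf(m - idx) for m in cut]
    return [cdf[0], cdf[1] - cdf[0], cdf[2] - cdf[1], 1 - cdf[2]]

x = 1.0
p = probs(x)
print([round(v, 4) for v in p], "sum =", round(sum(p), 6))

# Numerical marginal effects: must sum to zero across categories
h = 1e-6
dp = [(a - b) / h for a, b in zip(probs(x + h), probs(x))]
print("marginal effects:", [round(v, 4) for v in dp])
```

With \(\beta > 0\), raising \(x\) shifts the latent distribution rightward: the lowest category loses probability, the highest gains it, and the middle categories can move either way.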
14.5.3 Multinomial Logit
When individuals choose among three or more unordered alternatives (bus, train, car), separate binary logits fail because probabilities won’t sum to one. Multinomial logit uses a softmax probability across all alternatives, with \(J - 1\) sets of coefficients relative to a base category. Topics include:
- The random utility foundation and softmax probabilities
- Individual-specific variables (MNL) vs. alternative-specific variables (conditional logit)
- Marginal effects that depend on all probabilities and sum to zero across alternatives
- The Independence of Irrelevant Alternatives (IIA) assumption and the red bus/blue bus problem
- Alternatives when IIA fails: nested logit, mixed logit, multinomial probit
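The softmax mechanism and the IIA property can both be seen in a few lines. The alternative-specific indices below are made up; the base alternative's index is normalized to zero:

```python
import math

# Made-up indices V_j = x'beta_j for three commute alternatives
V = {"bus": 0.0, "train": 0.4, "car": 1.1}

def softmax(vals):
    """Multinomial logit choice probabilities across the offered alternatives."""
    ex = {k: math.exp(v) for k, v in vals.items()}
    total = sum(ex.values())
    return {k: e / total for k, e in ex.items()}

p = softmax(V)
print({k: round(v, 4) for k, v in p.items()})

# IIA: the odds between two alternatives ignore what else is on the menu
p_no_car = softmax({k: v for k, v in V.items() if k != "car"})
print(round(p["train"] / p["bus"], 4), round(p_no_car["train"] / p_no_car["bus"], 4))
```

Dropping "car" from the choice set leaves the train/bus odds unchanged at \(e^{0.4}\). That invariance is exactly what the red bus/blue bus example shows to be implausible when alternatives are close substitutes.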
14.5.4 Count Data Models
Dependent variables that count events (doctor visits, patents filed, arrests made) are non-negative integers, often right-skewed with many zeros. OLS can predict negative counts and imposes additive effects where multiplicative effects are more natural. Poisson regression models the conditional mean through a log link (\(\mu = e^{X\beta}\)), guaranteeing non-negative predictions with a semi-elasticity interpretation: a one-unit increase in \(x_k\) multiplies the expected count by \(e^{\beta_k}\). Topics include:
- The log link and semi-elasticity interpretation of Poisson coefficients
- Equidispersion (\(E[Y] = \text{Var}(Y)\)) and why real data almost always violate it
- Consequences of ignoring overdispersion: consistent point estimates but unreliable standard errors
- The negative binomial model and its overdispersion parameter \(\alpha\)
- Zero-inflated models for excess zeros
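The log link and its multiplicative interpretation can be sketched with made-up Poisson coefficients: predictions are positive for any \(x\), and a one-unit increase in \(x\) multiplies the expected count by \(e^{\beta_1}\):

```python
import math

# Made-up Poisson values: log(mu) = b0 + b1 * x (illustrative, not estimates)
b0, b1 = 0.2, 0.15

def mu(x):
    """Conditional mean count through the log link."""
    return math.exp(b0 + b1 * x)

for x in (0.0, 3.0):
    print(f"x = {x}: mu = {mu(x):.3f}, mu(x+1)/mu(x) = {mu(x + 1) / mu(x):.4f}")
print(f"exp(b1) = {math.exp(b1):.4f}")   # the common multiplicative factor
```

Even at implausibly negative covariate values the predicted count stays above zero, in contrast to the OLS prediction of \(-1.7\) doctor visits mentioned earlier.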
14.5.5 The Tobit Model
When a continuous outcome piles up at a boundary (hours worked at zero, charitable donations at zero, expenditure on luxury goods), the data are censored: we observe everyone in the sample, but the latent variable is clipped at the boundary. The zeros are not missing data — they represent a corner solution. OLS on all observations underestimates the slope because the zeros flatten the regression line; OLS on positives only suffers from selection bias because the subsample of positive observations is non-random. Topics include:
- The censored data likelihood: normal density for positives, CDF for zeros
- Three distinct marginal effects (latent variable, extensive margin, unconditional mean)
- The McDonald-Moffitt decomposition into intensive and extensive margin components
- Censoring vs. truncation: when the zeros are missing entirely, Tobit is inappropriate
- The single-index restriction and when it fails
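The two OLS failures described above are easy to reproduce by simulation. A minimal sketch (made-up parameters, fixed seed): generate a latent outcome, censor it at zero, and compare the OLS slopes to the true latent slope:

```python
import random

random.seed(0)

# Simulate a censored outcome: y* = -2 + 2x + e, observe y = max(y*, 0)
n = 5000
true_b1 = 2.0
data = []
for _ in range(n):
    x = random.uniform(0, 3)
    y_star = -2.0 + true_b1 * x + random.gauss(0, 1)
    data.append((x, max(y_star, 0.0)))

def ols_slope(pairs):
    """Simple-regression slope: covariance over variance."""
    m = len(pairs)
    mx = sum(x for x, _ in pairs) / m
    my = sum(y for _, y in pairs) / m
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx

slope_all = ols_slope(data)                               # zeros flatten the line
slope_pos = ols_slope([d for d in data if d[1] > 0])      # non-random subsample
print(f"true = {true_b1}, OLS all = {slope_all:.3f}, OLS positives = {slope_pos:.3f}")
```

Both OLS slopes come out well below the true latent slope of 2: the full sample is flattened by the pile-up at zero, and the positive subsample is selected on the error term.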
14.5.6 Heckman Selection
The Tobit model assumes the same parameters govern both the participation decision and the outcome. When this single-index restriction fails — the decision to work may depend on social norms and childcare availability, while hours worked depend on wages and commute time — the Heckman selection model separates the two equations. This connects the limited dependent variable framework back to the endogeneity and selection bias themes from earlier chapters. Topics include:
- Sample selection bias: why OLS on the selected sample is inconsistent
- The inverse Mills ratio as a sufficient statistic for selection bias
- The two-step procedure: probit selection equation, then OLS with the Mills ratio correction
- The exclusion restriction: why identification requires a variable that affects participation but not the outcome
- Full information maximum likelihood as an alternative to two-step
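The correction term at the heart of the two-step procedure, the inverse Mills ratio \(\lambda(z) = \phi(z)/\Phi(z)\), is a simple function of the probit index from the selection equation. A sketch evaluating it at a few hypothetical index values:

```python
import math

def norm_pdf(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def inverse_mills(z):
    """lambda(z) = phi(z) / Phi(z), the Heckman correction regressor."""
    return norm_pdf(z) / norm_cdf(z)

# The correction is largest for people who barely made it into the sample
for z in (-2.0, 0.0, 2.0):
    print(f"probit index = {z:+.1f}: inverse Mills ratio = {inverse_mills(z):.4f}")
```

The ratio is large when participation is unlikely and shrinks toward zero when participation is near-certain, which is why omitting it biases OLS most in heavily selected samples.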
14.6 Model Selection Summary
| Dependent Variable | Model | Sub-page |
|---|---|---|
| Binary (0/1) | LPM, Probit, Logit | Binary Choice |
| Ordered categories | Ordered Probit, Ordered Logit | Ordered Choice |
| Unordered categories (3+) | Multinomial Logit, Conditional Logit | Multinomial Logit |
| Count (0, 1, 2, …) | Poisson, Negative Binomial | Count Data |
| Censored continuous | Tobit | Tobit |
| Truncated continuous | Truncated Regression | — |
| Selected sample | Heckman Selection | Heckman |
14.7 Slide Deck
Follow the sub-page links in the roadmap above for the full treatment of each model family with visual intuition and worked examples. For regularization and prediction methods, see Regularization. To review the panel data methods that precede this chapter, see Panel Data.