5  Instrumental Variables

Finding Exogenous Variation When OLS Fails

Endogeneity
Causal Inference
Cross-Section
Author

Jake Anderson

Published

March 3, 2026

Modified

March 4, 2026

Abstract

When an explanatory variable is correlated with the error term, OLS is biased and inconsistent. Instrumental variables provide a way to isolate exogenous variation and recover causal estimates. This chapter develops the IV estimator through two concrete examples, introduces 2SLS, and covers the Hausman test and overidentification tests for instrument validity.

Note: Prerequisites

You should be comfortable with OLS estimation and omitted variable bias before reading this chapter. See Omitted Variable Bias and Measurement Error for background on why \(\text{Cov}(x_i, \varepsilon_i) \neq 0\) causes OLS to fail.

5.1 Motivation

Economists are always hunting for ways to estimate causal relationships. We envy colleagues in lab sciences who have the “gold standard” of random assignment. Instead, we often rely on “natural experiments” where something in the world affects two groups differently.

The problem: individuals are not randomly assigned, so our regressions conflate the effect we care about with other effects. This is the same issue we encountered with omitted variable bias.

Endogeneity means \(\text{Cov}(x_i, \varepsilon_i) \neq 0\). The regressor is correlated with the error term, so OLS is biased and inconsistent.

5.2 Example: Returns to Education

\[ \log(\text{wage}_i) = \beta_0 + \beta_1 \text{educ}_i + \varepsilon_i \tag{5.1}\]

This regression asks: “How much does education affect wages?” But people are not randomly assigned to education levels. If “ability” (or motivation, family resources, etc.) affects both education and wages but is unobserved:

  • \(\varepsilon_i\) contains ability
  • More able people get more education
  • \(\text{Cov}(\text{educ}_i, \varepsilon_i) > 0\)
  • \(\implies\) OLS overestimates \(\beta_1\)

This is the classic ability bias argument. The sign of the bias is positive because ability raises both education and wages.
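A one-line derivation may make the sign argument concrete. This is the standard large-sample result for a simple regression, written in the notation of Equation 5.1:

\[ \text{plim}\, \hat{\beta}_1^{OLS} = \beta_1 + \frac{\text{Cov}(\text{educ}_i, \varepsilon_i)}{\text{Var}(\text{educ}_i)} \]

The variance in the denominator is positive, so the bias takes the sign of the covariance. A positive covariance between education and the ability component of \(\varepsilon_i\) therefore pushes the OLS estimate above \(\beta_1\).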

We need a variable that shifts education but has no direct effect on wages. This is an instrument.

5.3 Two Concrete Instruments

Consider two students, Maya and Luis. They both apply to the same competitive charter school. The school uses a random lottery to decide who gets an offer. Maya’s number is drawn; Luis’s is not.

Why this works as an instrument:

  • The students start out equally prepared
  • The only difference is the random offer
  • The offer changes opportunity but doesn’t choose based on talent or effort

Imagine two students, Alex and Jordan. They grow up in similar families, earn similar grades, have the same motivation. The only difference: Alex lives one mile from a community college; Jordan lives twenty miles away.

Why this works as an instrument:

  • The students are similar in ability and background
  • The only meaningful difference is geography
  • Distance affects how easy college is to attend but doesn’t change underlying potential

5.4 Conditions for a Valid Instrument

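An instrument \(z\) must satisfy four conditions (relevance, exclusion, exogeneity, and monotonicity) to be valid for the endogenous regressor \(x\). They are listed after Figure 5.1 and then checked against both examples from Section 5.3.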

Rule of thumb: If you can tell a convincing story for why each condition holds, your instrument is probably defensible. If any condition requires hand-waving, be skeptical.

flowchart LR
    Z["Z<br/>(Instrument)"] -->|"Relevance"| X["X<br/>(Endogenous)"]
    X -->|"Causal effect β₁"| Y["Y<br/>(Outcome)"]
    U["U<br/>(Unobserved)"] -.->|"Endogeneity"| X
    U -.->|"Endogeneity"| Y
    Z -.-x|"Exclusion<br/>(must NOT exist)"| Y

    style Z fill:#1E5A96,color:#fff
    style X fill:#D4A84B,color:#fff
    style Y fill:#2E8B57,color:#fff
    style U fill:#888,color:#fff
Figure 5.1: The IV identification strategy. The instrument Z affects the outcome Y only through the treatment X. The dashed red line (exclusion restriction) must not exist.

The four conditions:

  1. Relevance: \(z\) actually shifts \(x\). If \(z\) changes, the probability of treatment must change.
  2. Exclusion: \(z\) affects \(y\) only through \(x\). No direct path from \(z\) to \(y\).
  3. Exogeneity: \(z\) is uncorrelated with unobserved factors in \(\varepsilon\). No selection into \(z\).
  4. Monotonicity: \(z\) pushes everyone’s treatment in the same direction. No “defiers.”

Applied to the charter school lottery:

  1. Relevance: Receiving an offer changes whether a student attends the charter school. ✓
  2. Exclusion: The offer itself doesn’t improve skills; only attending can change outcomes. ✓
  3. Exogeneity: The lottery result isn’t influenced by family background or ability. ✓
  4. Monotonicity: Getting an offer can only make someone more likely to attend. ✓

Applied to distance from a community college:

  1. Relevance: Being closer makes college easier to attend. ✓
  2. Exclusion: Living closer doesn’t directly boost earnings or skills. ✓ (debatable)
  3. Exogeneity: Where parents lived wasn’t chosen based on the child’s future ability. ✓ (debatable)
  4. Monotonicity: Shorter distance never makes someone less likely to attend. ✓

Charter lottery with free academic counseling: A city uses a random lottery (exogenous), but every lottery winner — even those who never enroll — automatically receives free academic counseling. The counseling directly improves test scores, creating a path from the instrument to the outcome that does not go through enrollment.

\(\implies\) Exogenous ✓, but exclusion ✗

Equestrian program: A private school offers academic enrichment exclusively to students in its equestrian program. Riding horses doesn’t directly improve test scores (exclusion ✓), but equestrian participation is extremely expensive and correlated with family wealth, which affects scores through many channels.

\(\implies\) Exclusion ✓, but exogeneity ✗

5.5 The IV Estimator

Now that we understand the logic, we can write the IV estimator. With one endogenous regressor and one instrument:

\[ \hat{\beta}_1^{IV} = \frac{\sum_i (z_i - \bar{z})(y_i - \bar{y})}{\sum_i (z_i - \bar{z})(x_i - \bar{x})} = \frac{\widehat{\text{Cov}}(z, y)}{\widehat{\text{Cov}}(z, x)} \tag{5.2}\]

Compare Equation 5.2 to the OLS formula \(\widehat{\text{Cov}}(x, y) / \widehat{\text{Var}}(x)\). Because \(\widehat{\text{Var}}(x) = \widehat{\text{Cov}}(x, x)\), the IV estimator replaces one copy of \(x\) with \(z\) in both the numerator and the denominator.
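As a rough illustration of Equation 5.2, the sketch below computes both ratios on simulated data. The data-generating process (an unobserved confounder `u`, a true slope of 0.5, and the helper `cov`) is an assumption made for this example and does not come from the chapter.

```python
import random

def cov(a, b):
    """Sample covariance with the usual n - 1 denominator."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)

# Hypothetical data: u is unobserved, z is a valid instrument, true slope is 0.5.
rng = random.Random(0)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + ui for zi, ui in zip(z, u)]                          # x depends on z and on u
y = [0.5 * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]  # u also enters y

beta_ols = cov(x, y) / cov(x, x)  # Cov(x, y) / Var(x)
beta_iv = cov(z, y) / cov(z, x)   # Equation 5.2

print(f"OLS: {beta_ols:.3f}   IV: {beta_iv:.3f}   truth: 0.500")
```

Under these assumptions the OLS ratio should land well above 0.5, while the IV ratio should sit close to it.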

5.6 Two-Stage Least Squares (2SLS)

With outcome \(y\), endogenous regressor \(x\), and instrument \(z\):

Stage 1: Regress \(x\) on \(z\) to get the “clean” part of \(x\):

\[ x = \gamma_1 + \theta_1 z + v \tag{5.3}\]

\[ \hat{x} = \hat{\gamma}_1 + \hat{\theta}_1 z \tag{5.4}\]

Stage 2: Replace \(x\) with \(\hat{x}\) and run OLS:

\[ y = \beta_1 + \beta_2 \hat{x} + e^* \tag{5.5}\]
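The two stages can be sketched by hand for the single-instrument case. The simulated data and the helper `ols` are assumptions for illustration only. The last lines check that the second-stage slope matches the ratio form of Equation 5.2, computed here as the slope of \(y\) on \(z\) divided by the slope of \(x\) on \(z\).

```python
import random

def mean(a):
    return sum(a) / len(a)

def ols(x, y):
    """Intercept and slope from a simple regression of y on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope

# Hypothetical data: u is unobserved and enters both x and y.
rng = random.Random(1)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + ui for zi, ui in zip(z, u)]
y = [1.0 + 0.5 * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

# Stage 1 (Equations 5.3 and 5.4): regress x on z and keep the fitted values.
gamma_hat, theta_hat = ols(z, x)
x_hat = [gamma_hat + theta_hat * zi for zi in z]

# Stage 2 (Equation 5.5): regress y on the fitted values.
b1_2sls, b2_2sls = ols(x_hat, y)

# Ratio form: Cov(z, y) / Cov(z, x), written as a ratio of two slopes on z.
b2_ratio = ols(z, y)[1] / ols(z, x)[1]

print(f"2SLS slope: {b2_2sls:.4f}   ratio form: {b2_ratio:.4f}")
```

With one instrument and one endogenous regressor the two numbers agree up to floating-point error, which is why the chapter can treat IV and 2SLS as the same estimator here.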

Important: The IV variance is larger than OLS

The estimated variance for \(\hat{\beta}_2^{IV}\) uses the original \(x\) (not \(\hat{x}\)) in the residual sum of squares, but uses \(\hat{x}\) in the denominator:

\[ \widehat{\text{var}}(\hat{\beta}_2) = \frac{\hat{\sigma}_{IV}^2}{\sum(\hat{x}_i - \bar{x})^2} \tag{5.6}\]

Since \(\hat{x}\) has less variation than \(x\), the denominator is smaller \(\implies\) IV standard errors are larger. This is the price of consistency.
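A small numerical sketch of Equation 5.6, under assumed simulated data (the data-generating process and the helper `ols` are hypothetical). It builds \(\hat{\sigma}_{IV}^2\) from residuals that use the original \(x\), divides by the variation in \(\hat{x}\), and compares the result with the conventional OLS standard error.

```python
import math
import random

def mean(a):
    return sum(a) / len(a)

def ols(x, y):
    """Intercept and slope from a simple regression of y on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope

# Hypothetical data: u is unobserved and enters both x and y.
rng = random.Random(2)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + ui for zi, ui in zip(z, u)]
y = [1.0 + 0.5 * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

# 2SLS point estimates.
g_hat, t_hat = ols(z, x)
x_hat = [g_hat + t_hat * zi for zi in z]
b1, b2 = ols(x_hat, y)

# sigma^2_IV: residuals are built from the ORIGINAL x, not from x_hat.
sigma2_iv = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)

# The denominator of Equation 5.6 uses the variation in x_hat.
x_bar = mean(x)
ss_x_hat = sum((xh - x_bar) ** 2 for xh in x_hat)
se_iv = math.sqrt(sigma2_iv / ss_x_hat)

# OLS counterpart for comparison.
a1, a2 = ols(x, y)
sigma2_ols = sum((yi - a1 - a2 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
ss_x = sum((xi - x_bar) ** 2 for xi in x)
se_ols = math.sqrt(sigma2_ols / ss_x)

print(f"SS(x_hat) = {ss_x_hat:.1f}, SS(x) = {ss_x:.1f}")
print(f"se_iv = {se_iv:.4f}, se_ols = {se_ols:.4f}")
```

Two things push `se_iv` above `se_ols` in this setup: the fitted values vary less than \(x\), and the IV residuals cannot have a smaller sum of squares than the OLS residuals.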

Tip: What if we have more than one instrument?

If we have instruments \(z_1\) and \(z_2\) for \(x\), modify the first stage: \[ \hat{x} = \hat{\gamma}_1 + \hat{\theta}_1 z_1 + \hat{\theta}_2 z_2 \] This works for any number of instruments and any number of endogenous regressors, provided there are at least as many instruments as endogenous regressors (see HGL 10.3.8).

5.7 Specification Tests

Two questions to answer:

  1. Is \(x\) actually correlated with the error term? (Do we even need IV?)
  2. Are our instruments valid? (Is \(z\) uncorrelated with the error?)

5.7.1 The Hausman Test for Endogeneity

\[ H_0: \text{Cov}(x_i, e_i) = 0 \quad \iff \quad \text{OLS is consistent} \tag{5.7}\]

\[ H_1: \text{Cov}(x_i, e_i) \neq 0 \quad \iff \quad \text{OLS is biased; use IV} \tag{5.8}\]

Hausman test logic: Split \(x\) into a “clean” part (from the instrument) and a “dirty” part (the first-stage residual). If the dirty part predicts \(y\), then \(x\) is endogenous.

Steps:

  1. Estimate the first stage and get residuals: \(\hat{v} = x - \hat{\gamma}_1 - \hat{\theta}_1 z_1 - \hat{\theta}_2 z_2\)
  2. Add \(\hat{v}\) to the original regression: \(y = \beta_1 + \beta_2 x + \delta \hat{v} + e\)
  3. Test \(H_0: \delta = 0\) with the usual \(t\)-test

The test works by splitting \(x\) into two pieces via the first-stage regression: a “good” part explained by the instrument (variation we trust) and a “bad” part left in the residual (potential bias). If \(x\) is exogenous, the bad part is harmless noise. If \(x\) is endogenous, the bad part still contains the unobserved factors that cause bias.

The test simply checks: does this bad part help explain \(y\)? If yes \(\implies\) \(x\) is endogenous. If no \(\implies\) there is no evidence of endogeneity: OLS and IV agree up to sampling error, so use OLS (more efficient).
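The three steps can be sketched with NumPy on simulated data. Everything about the data (two instruments `z1` and `z2`, a confounder `u`, a true slope of 0.5) is an assumption for illustration, and the helper `ols` is hypothetical. It returns coefficients with conventional homoskedastic standard errors.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients and conventional (homoskedastic) standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))

# Hypothetical data: x is endogenous because u enters both x and y.
rng = np.random.default_rng(0)
n = 2000
z1, z2, u = rng.normal(size=(3, n))
x = z1 + z2 + u
y = 1.0 + 0.5 * x + u + rng.normal(size=n)
ones = np.ones(n)

# Step 1: first stage, keep the residuals v_hat.
Z = np.column_stack([ones, z1, z2])
first_stage, _ = ols(Z, x)
v_hat = x - Z @ first_stage

# Step 2: add v_hat to the original regression.
X_aug = np.column_stack([ones, x, v_hat])
beta, se = ols(X_aug, y)

# Step 3: t-test on delta, the coefficient on v_hat.
t_delta = beta[2] / se[2]
print(f"delta = {beta[2]:.3f}, t = {t_delta:.2f}, slope on x = {beta[1]:.3f}")
```

Because `x` is endogenous by construction, the t-statistic on \(\hat{v}\) should be far from zero. The coefficient on `x` in the augmented regression is numerically the 2SLS estimate.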

5.7.2 Testing Instrument Validity (Overidentification)

If we have more instruments (\(L\)) than endogenous regressors (\(B\)), so that \(L > B\), we can test the surplus instruments for validity:

  1. Compute IV estimates using all instruments
  2. Get IV residuals: \(\hat{e}_{IV} = y - \hat{\beta}_1 - \hat{\beta}_2 x\)
  3. Regress \(\hat{e}_{IV}\) on all instruments and exogenous variables
  4. Test statistic: \(NR^2 \sim \chi^2_{(L-B)}\) under \(H_0\)

Warning: This test has a limitation

The overidentification test tells you there is a bad apple but not which apple is the bad one. If it rejects, you know at least one instrument is invalid, but you must use economic reasoning to determine which.

The Sargan test assumes homoskedasticity; the Hansen J-test is heteroskedasticity-robust.
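A minimal sketch of the \(NR^2\) steps from this section, under assumed simulated data. The function name `sargan_nr2` and both data-generating processes are hypothetical. In the second outcome, `z2` is given a direct effect on \(y\), so it violates the exclusion restriction by construction.

```python
import numpy as np

def sargan_nr2(y, x, instruments):
    """N * R^2 from regressing 2SLS residuals on a constant and the instruments."""
    n = len(y)
    Z = np.column_stack([np.ones(n)] + list(instruments))
    # Step 1: 2SLS using all instruments.
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    X_hat = np.column_stack([np.ones(n), x_hat])
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    # Step 2: IV residuals use the original x, not x_hat.
    e_iv = y - beta[0] - beta[1] * x
    # Step 3: regress the residuals on the instruments and compute R^2.
    fitted = Z @ np.linalg.lstsq(Z, e_iv, rcond=None)[0]
    r2 = 1.0 - np.sum((e_iv - fitted) ** 2) / np.sum((e_iv - e_iv.mean()) ** 2)
    # Step 4: the test statistic.
    return n * r2

rng = np.random.default_rng(1)
n = 2000
z1, z2, u, e = rng.normal(size=(4, n))
x = z1 + z2 + u

y_valid = 0.5 * x + u + e               # both instruments excluded from y
y_invalid = 0.5 * x + 0.5 * z2 + u + e  # z2 affects y directly (exclusion fails)

stat_valid = sargan_nr2(y_valid, x, [z1, z2])
stat_invalid = sargan_nr2(y_invalid, x, [z1, z2])

print(f"valid instruments:   NR^2 = {stat_valid:.2f}")
print(f"one bad instrument:  NR^2 = {stat_invalid:.2f}  (5% critical value, 1 df: 3.84)")
```

With \(L - B = 1\), each statistic is compared with the \(\chi^2_{(1)}\) critical value. The statistic for the contaminated outcome should be far above it, yet nothing in the output says which of the two instruments is the bad one.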

5.8 Simulation: IV vs OLS with Endogeneity

Figure 5.2: OLS is biased upward when the regressor is endogenous. IV recovers the true slope. 500 simulation runs (N=200 each). The red dashed line marks the true \(\beta_1 = 0.5\).

Notice the IV distribution is wider (less efficient) but centered on the true value. OLS is tighter but shifted right — precise but wrong.
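The sketch below is one way to set up a comparable Monte Carlo experiment. It is not the code behind Figure 5.2: only the number of runs (500), the sample size (200), and the true slope (0.5) are taken from the caption, while the data-generating process and the helper names are assumptions.

```python
import random
import statistics

def cov(a, b):
    """Sample covariance with the usual n - 1 denominator."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def one_run(rng, n=200, beta1=0.5):
    """Draw one sample and return the (OLS, IV) slope estimates."""
    z = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]   # unobserved confounder
    x = [zi + ui for zi, ui in zip(z, u)]     # x is endogenous through u
    y = [beta1 * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    return cov(x, y) / cov(x, x), cov(z, y) / cov(z, x)

rng = random.Random(42)
results = [one_run(rng) for _ in range(500)]
ols_draws = [r[0] for r in results]
iv_draws = [r[1] for r in results]

for name, draws in (("OLS", ols_draws), ("IV", iv_draws)):
    print(f"{name}: mean = {statistics.fmean(draws):.3f}, "
          f"sd = {statistics.stdev(draws):.3f}")
```

In this assumed setup the OLS draws should cluster tightly around a value well above 0.5, while the IV draws should be more dispersed but centered near 0.5.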

5.9 Worked Example: HGL 10.1

Using state-level data, a researcher examines median rent (\(RENT\)) as a function of median house values (\(MDHOUSE\), in $1000), controlling for urban population share (\(PCTURBAN\)). Instruments: median family income (\(FAMINC\)) and a regional dummy (\(REG4\)).

Table 10.3: Estimates for Exercise 10.1. Standard errors in parentheses.

| | (1) OLS | (2) First Stage | (3) Restricted | (4) Hausman | (5) IV/2SLS | (6) Overid |
|---|---|---|---|---|---|---|
| C | 125.9 (14.19) | −19.78 (10.23) | 7.225 (8.936) | 121.1 (12.87) | 121.1 (15.51) | −53.50 (22.66) |
| PCTURBAN | 0.525 (0.249) | 0.205 (0.113) | 0.616 (0.131) | 0.116 (0.254) | 0.116 (0.306) | −0.257 (0.251) |
| MDHOUSE | 1.521 (0.228) | | | 2.184 (0.282) | 2.184 (0.340) | |
| FAMINC | | 2.584 (0.628) | | | | 3.851 (1.393) |
| REG4 | | 15.89 (3.157) | | | | −16.87 (6.998) |
| \(\hat{v}\) | | | | −1.414 (0.411) | | |
| \(N\) | 50 | 50 | 50 | 50 | 50 | 50 |
| \(R^2\) | 0.669 | 0.679 | 0.317 | 0.737 | 0.609 | 0.198 |
| SSE | 20259.6 | 3907.4 | 8322.2 | 16117.6 | 23925.6 | 19195.8 |

Staiger-Stock rule: First-stage \(F > 10\) indicates instruments are not weak. Our \(F \approx 26\) passes easily.

\(MDHOUSE\) is likely endogenous because it is simultaneously determined with \(RENT\). Unobservable factors — local amenities, school quality, zoning regulations — drive both house prices and rents.

\(\implies \text{Cov}(MDHOUSE_i, e_i) \neq 0\), so OLS (column 1) is biased and inconsistent.

Column (2) is the unrestricted first stage; column (3) excludes both instruments. The F-statistic:

\[ F = \frac{(SSE_R - SSE_U) / J}{SSE_U / (N - K)} = \frac{(8322.2 - 3907.4) / 2}{3907.4 / 46} = \frac{2207.4}{84.94} \approx 26.0 \tag{5.9}\]

\(F \approx 26 \gg 10 \implies\) instruments are not weak.
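The arithmetic in Equation 5.9 can be reproduced directly from the SSE row of the table. The variable names are illustrative; \(K = 4\) counts the constant, \(PCTURBAN\), \(FAMINC\), and \(REG4\) in the unrestricted first stage.

```python
# Inputs taken from the table: SSE for columns (3) and (2), and N.
sse_restricted = 8322.2    # first stage without the instruments
sse_unrestricted = 3907.4  # first stage with FAMINC and REG4
n_obs = 50
k_unrestricted = 4         # C, PCTURBAN, FAMINC, REG4
j_restrictions = 2         # two instruments are excluded under the restriction

numerator = (sse_restricted - sse_unrestricted) / j_restrictions
denominator = sse_unrestricted / (n_obs - k_unrestricted)
f_stat = numerator / denominator

print(f"F = {numerator:.1f} / {denominator:.2f} = {f_stat:.1f}")
```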

Column (4) adds the first-stage residuals \(\hat{v}\) to the structural equation — this is the Hausman test from Section 5.7.1.

\[ t = \frac{-1.414}{0.411} \approx -3.44 \]

Since \(|t| = 3.44 > 1.96\), we reject \(H_0\) at 5%. \(MDHOUSE\) is endogenous \(\implies\) use IV.

| | OLS (col 1) | IV/2SLS (col 5) |
|---|---|---|
| \(\hat{\beta}_{MDHOUSE}\) | 1.521 | 2.184 |
| SE | 0.228 | 0.340 |

The IV estimate is larger — OLS was underestimating the effect. The SE is also larger, consistent with Equation 5.6: IV trades efficiency for consistency.

Note

The point estimates in columns (4) and (5) are identical. This is not a mistake — including \(\hat{v}\) alongside the original regressors yields the same \(\hat{\beta}\) as 2SLS. The standard errors differ because they use different variance formulas.

With \(L = 2\) instruments and \(B = 1\) endogenous variable, we have \(L - B = 1\) testable restriction.

\[ NR^2 = 50 \times 0.198 = 9.9 \tag{5.10}\]

The 5% critical value for \(\chi^2_{(1)}\) is 3.84. Since \(9.9 > 3.84\), we reject \(H_0\).
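The same comparison can be checked numerically. With one degree of freedom a \(\chi^2\) variable is the square of a standard normal, so the p-value can be computed from the complementary error function in the standard library. The variable names are illustrative.

```python
import math

n_obs = 50
r2_aux = 0.198   # R^2 from column (6) of the table
df = 2 - 1       # L - B: two instruments, one endogenous regressor

nr2 = n_obs * r2_aux

# For 1 df only: P(chi2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
p_value = math.erfc(math.sqrt(nr2 / 2))
critical_5pct = 3.84

print(f"NR^2 = {nr2:.1f}, df = {df}, p-value = {p_value:.4f}")
print("reject H0" if nr2 > critical_5pct else "fail to reject H0")
```

The shortcut through `math.erfc` applies only when \(L - B = 1\); with more surplus instruments a general \(\chi^2\) distribution function would be needed.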

Warning

The rejection indicates that at least one instrument is likely invalid. \(FAMINC\) is the most likely suspect: family income may affect rent directly, beyond its effect through house values, because wealthier families may sort into higher-rent areas for reasons unrelated to house prices.

5.10 Decision Guide

flowchart TD
    A["Is X endogenous?<br/>(theory + Hausman test)"] -->|No| B["Use OLS"]
    A -->|Yes| C["Do you have instruments?"]
    C -->|No| D["Cannot estimate causal effect.<br/>Report OLS with caveats."]
    C -->|Yes| E["First-stage F > 10?"]
    E -->|No| F["Weak instruments.<br/>Consider LIML or other methods."]
    E -->|Yes| G["Exactly identified<br/>(L = B)?"]
    G -->|Yes| H["Use IV/2SLS.<br/>Cannot test validity."]
    G -->|No| I["Overidentified<br/>(L > B)"]
    I --> J["Run Sargan/Hansen J test"]
    J -->|Fail to reject| K["Use IV/2SLS.<br/>Instruments appear valid."]
    J -->|Reject| L["At least one instrument invalid.<br/>Investigate or find new instruments."]

    style A fill:#1E5A96,color:#fff
    style B fill:#2E8B57,color:#fff
    style K fill:#2E8B57,color:#fff
    style D fill:#C41E3A,color:#fff
    style F fill:#D4A84B,color:#fff
    style L fill:#C41E3A,color:#fff
Figure 5.3: Decision flowchart for IV estimation.

5.11 Summary

Summary of IV methods and when to use them.
| Concept | What it does | When to use |
|---|---|---|
| IV/2SLS | Isolates exogenous variation in \(x\) via instrument \(z\) | When \(\text{Cov}(x, \varepsilon) \neq 0\) |
| Hausman test | Tests whether \(x\) is endogenous | Before choosing between OLS and IV |
| Weak instrument test | Checks first-stage \(F > 10\) | Always, before trusting IV results |
| Overidentification test | Tests surplus instrument validity | When \(L > B\) (more instruments than endogenous vars) |

Tip: What’s next?

The Hausman test reappears in Panel Data as the FE vs RE decision test. The same logic applies: compare a consistent estimator (FE) to an efficient one (RE) and check if they agree.

For practice, try the Midterm 2 questions that cover IV.