flowchart LR
Z["Z<br/>(Instrument)"] -->|"Relevance"| X["X<br/>(Endogenous)"]
X -->|"Causal effect β₁"| Y["Y<br/>(Outcome)"]
U["U<br/>(Unobserved)"] -.->|"Endogeneity"| X
U -.->|"Endogeneity"| Y
Z -.-x|"Exclusion<br/>(must NOT exist)"| Y
style Z fill:#1E5A96,color:#fff
style X fill:#D4A84B,color:#fff
style Y fill:#2E8B57,color:#fff
style U fill:#888,color:#fff
5 Instrumental Variables
Finding Exogenous Variation When OLS Fails
When an explanatory variable is correlated with the error term, OLS is biased and inconsistent. Instrumental variables provide a way to isolate exogenous variation and recover causal estimates. This chapter develops the IV estimator through two concrete examples, introduces 2SLS, and covers the Hausman test and overidentification tests for instrument validity.
You should be comfortable with OLS estimation and omitted variable bias before reading this chapter. See Omitted Variable Bias and Measurement Error for background on why \(\text{Cov}(x_i, \varepsilon_i) \neq 0\) causes OLS to fail.
5.1 Motivation
Economists are always hunting for ways to estimate causal relationships. We envy colleagues in lab sciences who have the “gold standard” of random assignment. Instead, we often rely on “natural experiments” where something in the world affects two groups differently.
The problem: individuals are not randomly assigned, so our regressions conflate the effect we care about with other effects. This is the same issue we encountered with omitted variable bias.
Endogeneity means \(\text{Cov}(x_i, \varepsilon_i) \neq 0\). The regressor is correlated with the error term, so OLS is biased and inconsistent.
5.2 Example: Returns to Education
\[ \log(\text{wage}_i) = \beta_0 + \beta_1 \text{educ}_i + \varepsilon_i \tag{5.1}\]
This regression asks: “How much does education affect wages?” But people are not randomly assigned to education levels. If “ability” (or motivation, family resources, etc.) affects both education and wages but is unobserved:
- \(\varepsilon_i\) contains ability
- More able people get more education
- \(\text{Cov}(\text{educ}_i, \varepsilon_i) > 0\)
- \(\implies\) OLS overestimates \(\beta_1\)
This is the classic ability bias argument. The sign of the bias is positive because ability raises both education and wages.
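The direction of the bias can be made explicit with the standard probability limit of the OLS slope in a simple regression. This is a brief sketch in the notation of Equation 5.1, assuming a single regressor:
\[ \operatorname{plim} \hat{\beta}_1^{OLS} = \beta_1 + \frac{\text{Cov}(\text{educ}_i, \varepsilon_i)}{\text{Var}(\text{educ}_i)} \]
Because the variance is positive, the sign of the bias equals the sign of \(\text{Cov}(\text{educ}_i, \varepsilon_i)\), which is positive under the ability story.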
We need a variable that shifts education but has no direct effect on wages. This is an instrument.
5.3 Two Concrete Instruments
Consider two students, Maya and Luis. They both apply to the same competitive charter school. The school uses a random lottery to decide who gets an offer. Maya’s number is drawn; Luis’s is not.
Why this works as an instrument:
- The students start out equally prepared
- The only difference is the random offer
- The offer changes opportunity but doesn’t choose based on talent or effort
Imagine two students, Alex and Jordan. They grow up in similar families, earn similar grades, have the same motivation. The only difference: Alex lives one mile from a community college; Jordan lives twenty miles away.
Why this works as an instrument:
- The students are similar in ability and background
- The only meaningful difference is geography
- Distance affects how easy college is to attend but doesn’t change underlying potential
5.4 Conditions for a Valid Instrument
An instrument \(z\) must satisfy four conditions to be valid for the endogenous regressor \(x\):
- Relevance: \(z\) actually shifts \(x\). If \(z\) changes, the probability of treatment must change.
- Exclusion: \(z\) affects \(y\) only through \(x\). No direct path from \(z\) to \(y\).
- Exogeneity: \(z\) is uncorrelated with unobserved factors in \(\varepsilon\). No selection into \(z\).
- Monotonicity: \(z\) pushes everyone’s treatment in the same direction. No “defiers.”
Rule of thumb: If you can tell a convincing story for why each condition holds, your instrument is probably defensible. If any condition requires hand-waving, be skeptical.
Charter school lottery (Maya and Luis):
- Relevance: Receiving an offer changes whether a student attends the charter school. ✓
- Exclusion: The offer itself doesn’t improve skills; only attending can change outcomes. ✓
- Exogeneity: The lottery result isn’t influenced by family background or ability. ✓
- Monotonicity: Getting an offer can only make someone more likely to attend. ✓
Distance to college (Alex and Jordan):
- Relevance: Being closer makes college easier to attend. ✓
- Exclusion: Living closer doesn’t directly boost earnings or skills. ✓ (debatable)
- Exogeneity: Where parents lived wasn’t chosen based on the child’s future ability. ✓ (debatable)
- Monotonicity: Shorter distance never makes someone less likely to attend. ✓
Charter lottery with free academic counseling: A city uses a random lottery (exogenous), but every lottery winner — even those who never enroll — automatically receives free academic counseling. The counseling directly improves test scores, creating a path from the instrument to the outcome that does not go through enrollment.
\(\implies\) Exogenous ✓, but exclusion ✗
Equestrian program: A private school offers academic enrichment exclusively to students in its equestrian program. Riding horses doesn’t directly improve test scores (exclusion ✓), but equestrian participation is extremely expensive and correlated with family wealth, which affects scores through many channels.
\(\implies\) Exclusion ✓, but exogeneity ✗
5.5 The IV Estimator
Now that we understand the logic, we can write the IV estimator. With one endogenous regressor and one instrument:
\[ \hat{\beta}_1^{IV} = \frac{\sum_i (z_i - \bar{z})(y_i - \bar{y})}{\sum_i (z_i - \bar{z})(x_i - \bar{x})} = \frac{\widehat{\text{Cov}}(z, y)}{\widehat{\text{Cov}}(z, x)} \tag{5.2}\]
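To see why this ratio recovers \(\beta_1\), here is a short derivation sketch for the population model \(y_i = \beta_0 + \beta_1 x_i + \varepsilon_i\). Take the covariance of both sides with \(z_i\):
\[ \text{Cov}(z_i, y_i) = \beta_1 \text{Cov}(z_i, x_i) + \text{Cov}(z_i, \varepsilon_i) \]
If the instrument is uncorrelated with the error, \(\text{Cov}(z_i, \varepsilon_i) = 0\), and relevant, \(\text{Cov}(z_i, x_i) \neq 0\), dividing through gives
\[ \beta_1 = \frac{\text{Cov}(z_i, y_i)}{\text{Cov}(z_i, x_i)} \]
Equation 5.2 replaces these population covariances with their sample counterparts.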
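Compare Equation 5.2 to the OLS formula, \(\widehat{\text{Cov}}(x, y) / \widehat{\text{Var}}(x) = \widehat{\text{Cov}}(x, y) / \widehat{\text{Cov}}(x, x)\). IV replaces one \(x\) with \(z\) in both the numerator and the denominator, so only the variation in \(x\) that moves with \(z\) is used.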
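The covariance ratio is easy to compute directly. The following is a minimal sketch in Python with NumPy. The data-generating process (true slope 2, confounder `u`, first-stage coefficient 0.8) and the function names `iv_slope` and `ols_slope` are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

def ols_slope(x, y):
    """OLS slope: sample Cov(x, y) / sample Var(x)."""
    dx = x - x.mean()
    return (dx @ (y - y.mean())) / (dx @ dx)

def iv_slope(z, x, y):
    """Simple IV slope: sample Cov(z, y) / sample Cov(z, x)."""
    dz = z - z.mean()
    return (dz @ (y - y.mean())) / (dz @ (x - x.mean()))

# Hypothetical data: u is unobserved and drives both x and y.
rng = np.random.default_rng(0)
n = 100_000
u = rng.normal(size=n)                      # unobserved confounder
z = rng.normal(size=n)                      # instrument, independent of u
x = 0.8 * z + u + rng.normal(size=n)        # endogenous regressor
y = 1.0 + 2.0 * x + 3.0 * u + rng.normal(size=n)  # true slope is 2

b_ols = ols_slope(x, y)
b_iv = iv_slope(z, x, y)
print(f"OLS slope: {b_ols:.3f}   IV slope: {b_iv:.3f}")
```

Under this assumed setup the OLS slope absorbs the effect of `u`, while the IV slope uses only the part of `x` that moves with `z`.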
5.6 Two-Stage Least Squares (2SLS)
With outcome \(y\), endogenous regressor \(x\), and instrument \(z\):
Stage 1: Regress \(x\) on \(z\) to get the “clean” part of \(x\):
\[ x = \gamma_1 + \theta_1 z + v \tag{5.3}\]
\[ \hat{x} = \hat{\gamma}_1 + \hat{\theta}_1 z \tag{5.4}\]
Stage 2: Replace \(x\) with \(\hat{x}\) and run OLS:
\[ y = \beta_1 + \beta_2 \hat{x} + e^* \tag{5.5}\]
The estimated variance for \(\hat{\beta}_2^{IV}\) uses residuals computed with the original \(x\) (not \(\hat{x}\)), \(\hat{\sigma}_{IV}^2 = \sum_i (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i)^2 / (N - 2)\), but uses \(\hat{x}\) in the denominator:
\[ \widehat{\text{var}}(\hat{\beta}_2) = \frac{\hat{\sigma}_{IV}^2}{\sum(\hat{x}_i - \bar{x})^2} \tag{5.6}\]
Since \(\hat{x}\) has less variation than \(x\), the denominator is smaller \(\implies\) IV standard errors are larger. This is the price of consistency.
If we have instruments \(z_1\) and \(z_2\) for \(x\), modify the first stage: \[ \hat{x} = \hat{\gamma}_1 + \hat{\theta}_1 z_1 + \hat{\theta}_2 z_2 \] The same approach extends to several instruments and several endogenous regressors, provided there are at least as many instruments as endogenous regressors (see HGL 10.3.8).
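The two stages can be sketched by hand with NumPy least squares. This is a minimal illustration for one endogenous regressor and no additional controls. The function name `two_stage_least_squares` and the simulated data are assumptions made for the example. The standard errors follow Equation 5.6, with residuals computed from the original \(x\).

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y = X b + error."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_stage_least_squares(y, x, Z):
    """2SLS with one endogenous regressor x and instruments Z (n x L).

    Returns (coefficients, standard errors) for [intercept, slope on x].
    """
    n = len(y)
    const = np.ones(n)
    W = np.column_stack([const, Z])           # first-stage regressors
    x_hat = W @ ols(W, x)                     # stage 1: fitted values of x
    X_hat = np.column_stack([const, x_hat])
    beta = ols(X_hat, y)                      # stage 2: regress y on x_hat
    # Residuals use the ORIGINAL x, not x_hat (see Equation 5.6).
    resid = y - np.column_stack([const, x]) @ beta
    sigma2 = resid @ resid / (n - 2)
    cov = sigma2 * np.linalg.inv(X_hat.T @ X_hat)
    return beta, np.sqrt(np.diag(cov))

# Hypothetical data with two instruments and true slope 2.
rng = np.random.default_rng(1)
n = 20_000
u = rng.normal(size=n)
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)
x = 0.7 * z1 + 0.5 * z2 + u + rng.normal(size=n)
y = 1.0 + 2.0 * x + 3.0 * u + rng.normal(size=n)

beta, se = two_stage_least_squares(y, x, np.column_stack([z1, z2]))
print("2SLS slope:", beta[1], "SE:", se[1])
```

Running the second stage through a standard OLS routine would report standard errors based on \(\hat{x}\) residuals, which is why the residuals are recomputed here. With a single instrument, the slope from this function coincides with the covariance ratio in Equation 5.2.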
5.7 Specification Tests
Two questions to answer:
- Is \(x\) actually correlated with the error term? (Do we even need IV?)
- Are our instruments valid? (Is \(z\) uncorrelated with the error?)
5.7.1 The Hausman Test for Endogeneity
\[ H_0: \text{Cov}(x_i, e_i) = 0 \quad \iff \quad \text{OLS is consistent} \tag{5.7}\]
\[ H_1: \text{Cov}(x_i, e_i) \neq 0 \quad \iff \quad \text{OLS is biased; use IV} \tag{5.8}\]
Hausman test logic: Split \(x\) into a “clean” part (from the instrument) and a “dirty” part (the first-stage residual). If the dirty part predicts \(y\), then \(x\) is endogenous.
Steps:
- Estimate the first stage and get residuals: \(\hat{v} = x - \hat{\gamma}_1 - \hat{\theta}_1 z_1 - \hat{\theta}_2 z_2\)
- Add \(\hat{v}\) to the original regression: \(y = \beta_1 + \beta_2 x + \delta \hat{v} + e\)
- Test \(H_0: \delta = 0\) with the usual \(t\)-test
The test works by splitting \(x\) into two pieces via the first-stage regression: a “good” part explained by the instrument (variation we trust) and a “bad” part left in the residual (potential bias). If \(x\) is exogenous, the bad part is harmless noise. If \(x\) is endogenous, the bad part still contains the unobserved factors that cause bias.
The test simply checks: does this bad part help explain \(y\)? If yes \(\implies\) \(x\) is endogenous. If no \(\implies\) OLS and IV agree, so use OLS (more efficient).
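The three steps can be sketched as a regression-based test in NumPy. The function name `hausman_t` and both simulated data sets are illustrative assumptions. The sketch covers one endogenous regressor with no extra controls and uses the usual homoskedastic OLS standard errors.

```python
import numpy as np

def hausman_t(y, x, Z):
    """t-statistic on the first-stage residual in the augmented regression."""
    n = len(y)
    const = np.ones(n)
    W = np.column_stack([const, Z])
    # Step 1: first-stage residuals v_hat.
    v_hat = x - W @ np.linalg.lstsq(W, x, rcond=None)[0]
    # Step 2: add v_hat to the original regression.
    X = np.column_stack([const, x, v_hat])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    # Step 3: t-statistic for H0: delta = 0.
    return coef[2] / se[2]

# Hypothetical data: same x, one endogenous outcome and one exogenous outcome.
rng = np.random.default_rng(2)
n = 5_000
u = rng.normal(size=n)
z = rng.normal(size=n)
x = z + u + rng.normal(size=n)
e = rng.normal(size=n)
y_endog = 1.0 + 2.0 * x + 3.0 * u + e   # u enters the error: x is endogenous
y_exog = 1.0 + 2.0 * x + e              # u absent from the error: x is exogenous

t_endog = hausman_t(y_endog, x, z)
t_exog = hausman_t(y_exog, x, z)
print(f"t (endogenous case): {t_endog:.2f}   t (exogenous case): {t_exog:.2f}")
```

In the endogenous case the residual carries the confounder and the statistic is far from zero. In the exogenous case it behaves like an ordinary draw from the null distribution.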
5.7.2 Testing Instrument Validity (Overidentification)
If we have more instruments than endogenous regressors (\(L > B\)), we can test the surplus instruments for validity:
- Compute IV estimates using all instruments
- Get IV residuals: \(\hat{e}_{IV} = y - \hat{\beta}_1 - \hat{\beta}_2 x\)
- Regress \(\hat{e}_{IV}\) on all instruments and exogenous variables
- Test statistic: \(NR^2 \sim \chi^2_{(L-B)}\) under \(H_0\)
The overidentification test tells you there is a bad apple but not which apple is the bad one. If it rejects, that is evidence that at least one instrument is invalid, but you must use economic reasoning to determine which.
The Sargan test assumes homoskedasticity; the Hansen J-test is heteroskedasticity-robust.
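The \(NR^2\) procedure can be sketched in NumPy as follows. This is the homoskedastic (Sargan) version for one endogenous regressor and no extra controls. The function names and the simulated data, including the direct effect of `z2` in the invalid case, are assumptions made for illustration.

```python
import numpy as np

def lstsq_fit(X, y):
    """Fitted values from a least-squares regression of y on X."""
    return X @ np.linalg.lstsq(X, y, rcond=None)[0]

def sargan_nr2(y, x, Z):
    """N * R^2 from regressing 2SLS residuals on the instruments."""
    n = len(y)
    const = np.ones(n)
    W = np.column_stack([const, Z])
    x_hat = lstsq_fit(W, x)                          # first stage
    X_hat = np.column_stack([const, x_hat])
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]  # second stage
    e_iv = y - beta[0] - beta[1] * x                 # residuals use original x
    e_fit = lstsq_fit(W, e_iv)                       # regress residuals on instruments
    r2 = 1.0 - np.sum((e_iv - e_fit) ** 2) / np.sum((e_iv - e_iv.mean()) ** 2)
    return n * r2

# Hypothetical data: L = 2 instruments, B = 1 endogenous regressor.
rng = np.random.default_rng(3)
n = 5_000
u = rng.normal(size=n)
z1, z2 = rng.normal(size=n), rng.normal(size=n)
x = z1 + z2 + u + rng.normal(size=n)
e = rng.normal(size=n)
y_valid = 1.0 + 2.0 * x + 3.0 * u + e
y_invalid = y_valid + 1.0 * z2      # z2 affects y directly: exclusion fails

Z = np.column_stack([z1, z2])
stat_valid = sargan_nr2(y_valid, x, Z)
stat_invalid = sargan_nr2(y_invalid, x, Z)
print(f"NR^2 valid: {stat_valid:.2f}   NR^2 invalid: {stat_invalid:.2f}")
```

Each statistic is compared with the \(\chi^2_{(1)}\) critical value of 3.84 at the 5% level. The statistic flags that something is wrong in the invalid case but does not identify `z2` as the culprit.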
5.8 Simulation: IV vs OLS with Endogeneity
Notice the IV distribution is wider (less efficient) but centered on the true value. OLS is tighter but shifted right — precise but wrong.
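A minimal Monte Carlo sketch of this comparison in NumPy is given below. The data-generating process (true slope 2, sample size 500, first-stage coefficient 0.5, 2,000 replications) is a hypothetical choice for illustration and is not taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(42)
true_beta, n, reps = 2.0, 500, 2_000
ols_draws = np.empty(reps)
iv_draws = np.empty(reps)

for r in range(reps):
    u = rng.normal(size=n)                      # unobserved confounder
    z = rng.normal(size=n)                      # instrument
    x = 0.5 * z + u + rng.normal(size=n)        # endogenous regressor
    y = 1.0 + true_beta * x + u + rng.normal(size=n)
    c = np.cov(np.vstack([z, x, y]))            # rows/cols ordered z, x, y
    ols_draws[r] = c[1, 2] / c[1, 1]            # Cov(x, y) / Var(x)
    iv_draws[r] = c[0, 2] / c[0, 1]             # Cov(z, y) / Cov(z, x)

print(f"OLS: mean {ols_draws.mean():.3f}, sd {ols_draws.std():.3f}")
print(f"IV:  mean {iv_draws.mean():.3f}, sd {iv_draws.std():.3f}")
```

Plotting histograms of `ols_draws` and `iv_draws` against the true value shows the two distributions side by side.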
5.9 Worked Example: HGL 10.1
Using state-level data, a researcher examines median rent (\(RENT\)) as a function of median house values (\(MDHOUSE\), in $1000), controlling for urban population share (\(PCTURBAN\)). Instruments: median family income (\(FAMINC\)) and a regional dummy (\(REG4\)).
| | (1) OLS | (2) First Stage | (3) Restricted | (4) Hausman | (5) IV/2SLS | (6) Overid |
|---|---|---|---|---|---|---|
| C | 125.9 (14.19) | −19.78 (10.23) | 7.225 (8.936) | 121.1 (12.87) | 121.1 (15.51) | −53.50 (22.66) |
| PCTURBAN | 0.525 (0.249) | 0.205 (0.113) | 0.616 (0.131) | 0.116 (0.254) | 0.116 (0.306) | −0.257 (0.251) |
| MDHOUSE | 1.521 (0.228) | | | 2.184 (0.282) | 2.184 (0.340) | |
| FAMINC | | 2.584 (0.628) | | | | 3.851 (1.393) |
| REG4 | | 15.89 (3.157) | | | | −16.87 (6.998) |
| \(\hat{v}\) | | | | −1.414 (0.411) | | |
| \(N\) | 50 | 50 | 50 | 50 | 50 | 50 |
| \(R^2\) | 0.669 | 0.679 | 0.317 | 0.737 | 0.609 | 0.198 |
| SSE | 20259.6 | 3907.4 | 8322.2 | 16117.6 | 23925.6 | 19195.8 |
Staiger-Stock rule: First-stage \(F > 10\) indicates instruments are not weak. Our \(F \approx 26\) passes easily.
\(MDHOUSE\) is likely endogenous because it is simultaneously determined with \(RENT\). Unobservable factors — local amenities, school quality, zoning regulations — drive both house prices and rents.
\(\implies \text{Cov}(MDHOUSE_i, e_i) \neq 0\), so OLS (column 1) is biased and inconsistent.
Column (2) is the unrestricted first stage; column (3) excludes both instruments. The F-statistic:
\[ F = \frac{(SSE_R - SSE_U) / J}{SSE_U / (N - K)} = \frac{(8322.2 - 3907.4) / 2}{3907.4 / 46} = \frac{2207.4}{84.94} \approx 26.0 \tag{5.9}\]
\(F \approx 26 \gg 10 \implies\) instruments are not weak.
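The same restricted-versus-unrestricted comparison can be run on raw data. The following NumPy sketch assumes a hypothetical data set with one control and two instruments. The function names `sse` and `first_stage_f` are illustrative and do not refer to any library routine.

```python
import numpy as np

def sse(X, y):
    """Sum of squared residuals from regressing y on X."""
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return resid @ resid

def first_stage_f(x, instruments, controls):
    """F-statistic for the joint significance of the instruments in the first stage."""
    n = len(x)
    X_r = np.column_stack([np.ones(n), controls])   # restricted: no instruments
    X_u = np.column_stack([X_r, instruments])       # unrestricted: adds instruments
    j = X_u.shape[1] - X_r.shape[1]                 # number of instruments tested
    k = X_u.shape[1]                                # unrestricted parameter count
    sse_r, sse_u = sse(X_r, x), sse(X_u, x)
    return ((sse_r - sse_u) / j) / (sse_u / (n - k))

# Hypothetical data: one control w, two instruments z1 and z2.
rng = np.random.default_rng(4)
n = 200
w = rng.normal(size=n)
Z = rng.normal(size=(n, 2))
x_strong = 1.0 + 0.5 * w + Z @ np.array([1.0, 1.0]) + rng.normal(size=n)
x_weak = 1.0 + 0.5 * w + rng.normal(size=n)         # instruments play no role

f_strong = first_stage_f(x_strong, Z, w)
f_weak = first_stage_f(x_weak, Z, w)
print(f"F (strong instruments): {f_strong:.1f}   F (irrelevant instruments): {f_weak:.1f}")
```

The statistic is then compared with the rule-of-thumb threshold of 10.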
Column (4) adds the first-stage residuals \(\hat{v}\) to the structural equation — this is the Hausman test from Section 5.7.1.
\[ t = \frac{-1.414}{0.411} \approx -3.44 \]
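Since \(|t| = 3.44\) exceeds the 5% critical value (1.96 asymptotically, about 2.01 for a \(t\) distribution with 46 degrees of freedom), we reject \(H_0\) at 5%. \(MDHOUSE\) is endogenous \(\implies\) use IV.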
| | OLS (col 1) | IV/2SLS (col 5) |
|---|---|---|
| \(\hat{\beta}_{MDHOUSE}\) | 1.521 | 2.184 |
| SE | 0.228 | 0.340 |
The IV estimate is larger — OLS was underestimating the effect. The SE is also larger, consistent with Equation 5.6: IV trades efficiency for consistency.
The point estimates in columns (4) and (5) are identical. This is not a mistake — including \(\hat{v}\) alongside the original regressors yields the same \(\hat{\beta}\) as 2SLS. The standard errors differ because they use different variance formulas.
With \(L = 2\) instruments and \(B = 1\) endogenous variable, we have \(L - B = 1\) testable restriction.
\[ NR^2 = 50 \times 0.198 = 9.9 \tag{5.10}\]
The 5% critical value for \(\chi^2_{(1)}\) is 3.84. Since \(9.9 > 3.84\), we reject \(H_0\).
At least one instrument may be invalid. Does \(FAMINC\) directly affect rent (beyond its effect through house values)? This is the most likely suspect — wealthier families may sort into higher-rent areas for reasons unrelated to house prices.
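As a quick arithmetic check, the three test statistics in this worked example can be recomputed from the table entries. The snippet below is plain Python. All numbers are copied from the table and the text, and the variable names are illustrative.

```python
# Values copied from the regression table for HGL 10.1.
n = 50
sse_restricted = 8322.2      # column (3): first stage without instruments
sse_unrestricted = 3907.4    # column (2): first stage with FAMINC and REG4
j = 2                        # number of instruments tested
k = 4                        # parameters in the unrestricted first stage

# First-stage F (Equation 5.9).
f_stat = ((sse_restricted - sse_unrestricted) / j) / (sse_unrestricted / (n - k))

# Hausman t-statistic on the first-stage residual, column (4).
t_hausman = -1.414 / 0.411

# Overidentification statistic (Equation 5.10), using R^2 from column (6).
nr2 = n * 0.198

print(f"F = {f_stat:.2f}, Hausman t = {t_hausman:.2f}, NR^2 = {nr2:.1f}")
```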
5.10 Decision Guide
flowchart TD
A["Is X endogenous?<br/>(theory + Hausman test)"] -->|No| B["Use OLS"]
A -->|Yes| C["Do you have instruments?"]
C -->|No| D["Cannot estimate causal effect.<br/>Report OLS with caveats."]
C -->|Yes| E["First-stage F > 10?"]
E -->|No| F["Weak instruments.<br/>Consider LIML or other methods."]
E -->|Yes| G["Exactly identified<br/>(L = B)?"]
G -->|Yes| H["Use IV/2SLS.<br/>Cannot test validity."]
G -->|No| I["Overidentified<br/>(L > B)"]
I --> J["Run Sargan/Hansen J test"]
J -->|Fail to reject| K["Use IV/2SLS.<br/>Instruments appear valid."]
J -->|Reject| L["At least one instrument invalid.<br/>Investigate or find new instruments."]
style A fill:#1E5A96,color:#fff
style B fill:#2E8B57,color:#fff
style K fill:#2E8B57,color:#fff
style D fill:#C41E3A,color:#fff
style F fill:#D4A84B,color:#fff
style L fill:#C41E3A,color:#fff
5.11 Summary
| Concept | What it does | When to use |
|---|---|---|
| IV/2SLS | Isolates exogenous variation in \(x\) via instrument \(z\) | When \(\text{Cov}(x, \varepsilon) \neq 0\) |
| Hausman test | Tests whether \(x\) is endogenous | Before choosing between OLS and IV |
| Weak instrument test | Checks first-stage \(F > 10\) | Always, before trusting IV results |
| Overidentification test | Tests surplus instrument validity | When \(L > B\) (more instruments than endogenous vars) |
The Hausman test reappears in Panel Data as the FE vs RE decision test. The same logic applies: compare a consistent estimator (FE) to an efficient one (RE) and check if they agree.
For practice, try the Midterm 2 questions that cover IV.
