Midterm 2 Questions
Practice Problems for Midterm 2
Question 1
Which of the following is not a source of endogeneity that causes the OLS estimator to be biased and inconsistent?
- (a) Measurement error in an explanatory variable
- (b) Simultaneity between the dependent variable and an explanatory variable
- (c) Heteroskedasticity of the error term
- (d) Omitted variables that are correlated with an included explanatory variable
Correct Answer: (c)
The three sources of endogeneity (\(\text{Cov}(x, e) \neq 0\)) are:
- Measurement error: Using a proxy \(x = x^* + u\) introduces correlation between \(x\) and \(e\)
- Simultaneity: Feedback between variables (e.g., supply and demand) creates \(\text{Cov}(P, e) \neq 0\)
- Omitted variables: An omitted factor correlated with an included \(x\) enters the error term
Heteroskedasticity (\(\text{Var}(e_i) \neq \sigma^2\)) does not cause \(\text{Cov}(x, e) \neq 0\). It invalidates the usual OLS standard errors, but the OLS coefficient estimates remain unbiased and consistent.
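The contrast can be checked with a small simulation. The sketch below is illustrative only: the data-generating processes, the true slope of 2, and the sample size are assumptions chosen for the example, not values from the textbook. It compares the OLS slope when the error is heteroskedastic but unrelated to \(x\) against the slope when an omitted variable drives both \(x\) and \(y\).

```python
import random


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


random.seed(0)
N = 50_000
BETA = 2.0  # assumed true slope

# Case 1: heteroskedastic error (sd grows with x) that is independent of x.
x = [random.uniform(1, 5) for _ in range(N)]
y_het = [1 + BETA * xi + random.gauss(0, xi) for xi in x]

# Case 2: omitted variable a that drives both x and y, so Cov(x, e) > 0.
a = [random.gauss(0, 1) for _ in range(N)]
x_end = [ai + random.gauss(0, 1) for ai in a]
y_end = [1 + BETA * xi + ai + random.gauss(0, 1) for xi, ai in zip(x_end, a)]

slope_het = ols_slope(x, y_het)
slope_end = ols_slope(x_end, y_end)
print(f"heteroskedastic error: slope = {slope_het:.3f}")
print(f"omitted variable:      slope = {slope_end:.3f}")
```

Under these assumptions the first slope should land close to 2, while the second should settle near \(2 + \text{Cov}(x, e)/\text{Var}(x) = 2.5\) no matter how large \(N\) is.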
Reference: Textbook §10.2 (all subsections); Prof. Notes Ch. 10, “KEY TAKEAWAY #2: Three Sources of Endogeneity.”
Question 2
Consider the following supply and demand model for truffles:
\[\text{Demand: } Q_i = \alpha_1 + \alpha_2 P_i + \alpha_3 PS_i + \alpha_4 DI_i + e_i^d\] \[\text{Supply: } Q_i = \beta_1 + \beta_2 P_i + \beta_3 PF_i + e_i^s\]
where \(P\) = price, \(Q\) = quantity, \(PS\) = price of a substitute, \(DI\) = disposable income, and \(PF\) = price of a factor of production. Which variables are endogenous?
- (a) \(P\) and \(Q\)
- (b) \(P\), \(Q\), and \(PS\)
- (c) \(P\), \(Q\), \(PS\), \(DI\), and \(PF\)
- (d) Only \(Q\)
Correct Answer: (a)
In a simultaneous equations model:
- Endogenous variables are determined within the system — their values are jointly determined by the interaction of the equations. Here, \(P\) and \(Q\) are determined by the intersection of supply and demand.
- Exogenous variables are determined outside the system. Here, \(PS\), \(DI\), and \(PF\) are exogenous — they affect the equilibrium but are not determined by it.
Endogenous variables appear as dependent variables in the system and are correlated with the error terms \(e^d\) and \(e^s\), which is why OLS fails.
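A short simulation can make the correlation between the endogenous variables and the errors concrete. This is a sketch under stated assumptions: the structural parameter values, the distributions of \(PS\), \(DI\), and \(PF\), and the sample size are all hypothetical and are not estimates from the truffle data. Equilibrium price is obtained by setting demand equal to supply.

```python
import random


def corr(u, v):
    """Sample correlation coefficient."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


random.seed(1)
N = 20_000

# Hypothetical structural parameters (illustration only).
a1, a2, a3, a4 = 10.0, -1.0, 0.5, 0.8  # demand
b1, b2, b3 = 2.0, 1.0, -0.6            # supply

ps = [random.gauss(5, 1) for _ in range(N)]   # exogenous
di = [random.gauss(5, 1) for _ in range(N)]   # exogenous
pf = [random.gauss(5, 1) for _ in range(N)]   # exogenous
ed = [random.gauss(0, 1) for _ in range(N)]   # demand error
es = [random.gauss(0, 1) for _ in range(N)]   # supply error

# Equilibrium: demand = supply, solved for P; Q then follows from supply.
p = [(b1 - a1 + b3 * f - a3 * s - a4 * d + v - u) / (a2 - b2)
     for f, s, d, u, v in zip(pf, ps, di, ed, es)]
q = [b1 + b2 * pi + b3 * f + v for pi, f, v in zip(p, pf, es)]

corr_p_ed = corr(p, ed)
corr_p_es = corr(p, es)
corr_ps_ed = corr(ps, ed)
print(f"corr(P, e_d)  = {corr_p_ed:.3f}")
print(f"corr(P, e_s)  = {corr_p_es:.3f}")
print(f"corr(PS, e_d) = {corr_ps_ed:.3f}")
```

With these assumed values, \(P\) is clearly correlated with both structural errors, while the exogenous \(PS\) is not.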
Reference: Textbook §11.1; Prof. Notes Ch. 11, “§2 A Supply and Demand Model” and “KEY TAKEAWAY #1: Simultaneity Creates Endogeneity.”
Question 3
Consider the wage equation: \(\ln(WAGE) = \beta_1 + \beta_2 EDUC + \beta_3 EXPER + \beta_4 EXPER^2 + e\)
A researcher proposes using mother’s years of education (MOTHEREDUC) as an instrumental variable for EDUC. For MOTHEREDUC to be a valid instrument, which conditions must hold?
- (a) \(\text{Cov}(MOTHEREDUC, EDUC) \neq 0\) and \(\text{Cov}(MOTHEREDUC, e) \neq 0\)
- (b) \(\text{Cov}(MOTHEREDUC, EDUC) = 0\) and \(\text{Cov}(MOTHEREDUC, e) = 0\)
- (c) \(\text{Cov}(MOTHEREDUC, EDUC) \neq 0\) and \(\text{Cov}(MOTHEREDUC, e) = 0\)
- (d) \(\text{Cov}(MOTHEREDUC, EDUC) = 0\) and \(\text{Cov}(MOTHEREDUC, e) \neq 0\)
Correct Answer: (c)
A valid instrumental variable \(z\) must satisfy two conditions:
- Relevance: \(\text{Cov}(z, x) \neq 0\) — the instrument must be correlated with the endogenous variable. MOTHEREDUC is correlated with EDUC (\(r = 0.39\) in the mroz data).
- Exogeneity: \(\text{Cov}(z, e) = 0\) — the instrument must be uncorrelated with the error term. We assume MOTHEREDUC does not directly affect a daughter’s wage and is uncorrelated with omitted ability.
Relevance is testable (check the first-stage regression). Exogeneity rests on economic reasoning; it cannot be tested when the model is just identified (exactly one instrument per endogenous variable).
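Why both conditions are needed can be seen in a simplified version of the problem. As an illustration, take the simple regression \(y = \beta_1 + \beta_2 x + e\) (the wage equation without the experience terms, an assumption made only to keep the algebra short) and take the covariance of both sides with the instrument \(z\):

\[\text{Cov}(z, y) = \beta_2 \, \text{Cov}(z, x) + \text{Cov}(z, e)\]

If exogeneity holds, \(\text{Cov}(z, e) = 0\), and the slope can be recovered as

\[\beta_2 = \frac{\text{Cov}(z, y)}{\text{Cov}(z, x)}\]

which is only defined when relevance holds, \(\text{Cov}(z, x) \neq 0\). If exogeneity fails, the ratio converges to \(\beta_2 + \text{Cov}(z, e)/\text{Cov}(z, x)\) instead of \(\beta_2\).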
Reference: Textbook §10.3.3 (“Characteristics of a Good Instrumental Variable”); Prof. Notes Ch. 10, “KEY TAKEAWAY #3: Instrumental Variables Requirements.”
Question 4
Consider the truffle market model:
\[\text{Demand: } Q_i = \alpha_1 + \alpha_2 P_i + \alpha_3 PS_i + \alpha_4 DI_i + e_i^d\] \[\text{Supply: } Q_i = \beta_1 + \beta_2 P_i + \beta_3 PF_i + e_i^s\]
In a system of \(M = 2\) simultaneous equations, the necessary condition for identification requires that at least \(M - 1 = 1\) variable be excluded from each equation. Which statement is correct?
- (a) Both equations are identified
- (b) Only the demand equation is identified
- (c) Only the supply equation is identified
- (d) Neither equation is identified
Correct Answer: (a)
Check the order condition for each equation:
- Demand equation: \(PF\) is excluded (present in supply, absent from demand). That is one excluded variable, and \(1 \geq M - 1 = 1\). \(\checkmark\) Identified.
- Supply equation: \(PS\) and \(DI\) are excluded (present in demand, absent from supply). That is two excluded variables, and \(2 \geq M - 1 = 1\). \(\checkmark\) Identified.
An equation is identified when enough variables are excluded from it to shift the other equation(s), tracing out the curve we want to estimate.
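The counting in the order condition can be written out mechanically. The sketch below is a minimal illustration: the dictionary layout and the helper name `excluded` are choices made for this example. It checks only the necessary (order) condition stated in the question, not a sufficient condition for identification.

```python
# Right-hand-side variables of each equation (intercepts omitted).
equations = {
    "demand": {"P", "PS", "DI"},
    "supply": {"P", "PF"},
}
M = len(equations)  # number of equations in the system


def excluded(name):
    """Variables that appear elsewhere in the system but not in this equation."""
    others = set().union(*(v for k, v in equations.items() if k != name))
    return sorted(others - equations[name])


order_condition = {}
for name in equations:
    omitted = excluded(name)
    order_condition[name] = len(omitted) >= M - 1
    print(f"{name}: excludes {omitted}, order condition met: {order_condition[name]}")
```

For the truffle model this reports \(PF\) as excluded from demand and \(DI\), \(PS\) as excluded from supply, so both equations pass the check.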
Reference: Textbook §11.4 (“The Identification Problem”); Prof. Notes Ch. 11, “§4.2 A Necessary Condition for Identification” and “KEY TAKEAWAY #3: Identification Requires Exclusion.”
Question 5
Below is the diagnostic output from an IV regression of \(\ln(WAGE)\) on \(EDUC\), \(EXPER\), and \(EXPER^2\), using MOTHEREDUC and FATHEREDUC as instruments for EDUC:
Diagnostic tests:
                 df1 df2 statistic p-value
Weak instruments   2 423    55.400  <2e-16 ***
Wu-Hausman         1 423     2.793  0.0954 .
Sargan             1  NA     0.378  0.5386
At the 5% significance level, which conclusion is correct?
- (a) The instruments are weak, so IV estimation is unreliable
- (b) There is strong evidence of endogeneity, and the surplus instruments are invalid
- (c) The instruments are strong, there is marginal evidence of endogeneity, and the surplus instruments appear valid
- (d) The instruments are strong, but the Sargan test rejects instrument validity
Correct Answer: (c)
Interpret each test:
- Weak instruments (\(F = 55.4 > 10\)): Reject the null hypothesis that the instruments are weak. The instruments are strong. \(\checkmark\)
- Wu-Hausman (\(p = 0.0954\)): \(H_0\): \(\text{Cov}(EDUC, e) = 0\) (OLS is consistent). At 5%, we fail to reject — but it’s close. This is “marginal” evidence of endogeneity.
- Sargan (\(p = 0.5386\)): \(H_0\): surplus instruments are valid (\(\text{Cov}(z, e) = 0\)). Fail to reject — instruments appear valid.
The Hausman test \(p\)-value of 0.0954 is borderline. Many economists would still use IV/2SLS as a precaution.
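The three decision rules can be applied to the reported numbers directly. This is a sketch of the reasoning only: the dictionary keys and the 10% cutoff used to label evidence as "marginal" are assumptions made for the example, while the statistics are copied from the diagnostic output in the question.

```python
ALPHA = 0.05    # significance level from the question
F_RULE = 10.0   # rule-of-thumb threshold for instrument strength

# Values copied from the diagnostic output in the question.
diagnostics = {
    "weak_instruments_F": 55.400,
    "wu_hausman_p": 0.0954,
    "sargan_p": 0.5386,
}

instruments_strong = diagnostics["weak_instruments_F"] > F_RULE
endogeneity_rejected = diagnostics["wu_hausman_p"] < ALPHA   # reject H0: EDUC exogenous
marginal_endogeneity = (not endogeneity_rejected) and diagnostics["wu_hausman_p"] < 0.10
overid_rejected = diagnostics["sargan_p"] < ALPHA            # reject H0: surplus instruments valid

print("instruments strong:", instruments_strong)
print("exogeneity of EDUC rejected at 5%:", endogeneity_rejected)
print("marginal evidence of endogeneity (assumed 10% cutoff):", marginal_endogeneity)
print("surplus instruments rejected:", overid_rejected)
```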
Reference: Textbook §10.4 (instrument strength, \(F > 10\) rule), §10.5 (Hausman & Sargan tests); Prof. Notes Ch. 10, “§4 Specification Tests” and the R Example with diagnostics=TRUE.
Question 6
A researcher estimates the wage equation \(\ln(WAGE) = \beta_1 + \beta_2 EDUC + \beta_3 EXPER + \beta_4 EXPER^2 + e\) using the mroz data (\(N = 428\)). The OLS estimate is \(\hat{\beta}_2^{OLS} = 0.1075\) and the IV/2SLS estimate (using MOTHEREDUC as an instrument) is \(\hat{\beta}_2^{IV} = 0.0493\).
If “ability” is an omitted variable that is positively correlated with both EDUC and WAGE, which statement is correct?
- (a) OLS overestimates the return to education; this bias disappears as \(N \to \infty\)
- (b) OLS overestimates the return to education; this bias persists even as \(N \to \infty\)
- (c) OLS underestimates the return to education because ability is positively correlated with EDUC
- (d) The IV estimate is biased because MOTHEREDUC is correlated with ability
Correct Answer: (b)
With omitted ability:
- Ability \(\uparrow\) \(\Rightarrow\) EDUC \(\uparrow\) and WAGE \(\uparrow\), so \(\text{Cov}(EDUC, e) > 0\)
- OLS attributes the wage effect of ability to education \(\Rightarrow\) positive bias
- \(b_2 \xrightarrow{p} \beta_2 + \frac{\text{Cov}(x, e)}{\text{Var}(x)} \neq \beta_2\) — this bias does not vanish as \(N \to \infty\)
- The IV estimator is consistent: \(\hat{\beta}_2^{IV} \xrightarrow{p} \beta_2\) (assuming valid instruments)
Endogeneity makes OLS both biased and inconsistent. This is worse than heteroskedasticity, which only affects standard errors.
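The persistence of the bias can be illustrated by letting the sample size grow. The following sketch uses a simulated data set, not the mroz data: the true return of 0.05, the way ability enters both equations, and the instrument \(z\) are all assumptions chosen so that the probability limit of OLS is \(0.05 + 0.3 \times 1/3 = 0.15\).

```python
import random


def cov(u, v):
    """Sample covariance."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)


BETA = 0.05  # assumed true return to education


def estimates(n, rng):
    """Return (OLS slope, IV slope) for one simulated sample of size n."""
    ability = [rng.gauss(0, 1) for _ in range(n)]   # omitted variable
    z = [rng.gauss(0, 1) for _ in range(n)]         # instrument, independent of ability
    x = [12 + zi + ai + rng.gauss(0, 1) for zi, ai in zip(z, ability)]
    y = [1 + BETA * xi + 0.3 * ai + rng.gauss(0, 0.5) for xi, ai in zip(x, ability)]
    b_ols = cov(x, y) / cov(x, x)
    b_iv = cov(z, y) / cov(z, x)
    return b_ols, b_iv


rng = random.Random(42)
results = {n: estimates(n, rng) for n in (1_000, 10_000, 100_000)}
for n, (b_ols, b_iv) in results.items():
    print(f"N = {n:>7}: OLS = {b_ols:.4f}, IV = {b_iv:.4f}")
```

In this setup the OLS slope should stay near 0.15 at every sample size, while the IV slope should tighten around 0.05 as \(N\) grows.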
Reference: Textbook §10.1.2–10.1.3 (bias & inconsistency), §10.2.4 (omitted variables), Example 10.1; Prof. Notes Ch. 10, “KEY TAKEAWAY #1: Bias vs. Inconsistency.”
Question 7
A researcher estimates the wage equation by 2SLS. The structural equation is:
\[\ln(WAGE) = \beta_1 + \beta_2 EDUC + \beta_3 EXPER + \beta_4 EXPER^2 + e\]
EDUC is endogenous. The instruments are MOTHEREDUC and FATHEREDUC. The estimated first-stage equation is:
EDUC = 9.10 + 0.05*EXPER - 0.001*EXPER^2
+ 0.16*MOTHEREDUC + 0.19*FATHEREDUC
What is the correct second-stage regression?
- (a) \(\ln(WAGE) = \beta_1 + \beta_2 \widehat{EDUC} + \beta_3 \widehat{EXPER} + \beta_4 \widehat{EXPER^2} + e^*\)
- (b) \(\ln(WAGE) = \beta_1 + \beta_2 \widehat{EDUC} + \beta_3 EXPER + \beta_4 EXPER^2 + e^*\)
- (c) \(\ln(WAGE) = \beta_1 + \beta_2 \widehat{EDUC} + e^*\)
- (d) \(\ln(WAGE) = \beta_1 + \beta_2 EDUC + \beta_3 EXPER + \beta_4 EXPER^2 + \beta_5 \hat{v} + e\)
Correct Answer: (b)
In the second stage of 2SLS:
- Only the endogenous variable (\(EDUC\)) is replaced by its fitted value \(\widehat{EDUC}\) from the first stage
- Exogenous variables (\(EXPER\), \(EXPER^2\)) remain unchanged — they are not endogenous
- Option (a) is wrong: replacing exogenous variables is unnecessary and incorrect
- Option (c) is wrong: omitting the exogenous variables changes the model
- Option (d) is the Hausman test regression, not the 2SLS second stage
Warning: If you run this as two separate OLS regressions, the standard errors from the second stage are incorrect. Always use proper IV software (e.g., ivreg()).
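To see the mechanics of the two stages, the sketch below runs them by hand on simulated data. Everything about the data is an assumption made for illustration (the coefficients, the distributions, the true return of 0.08), and the small `ols` helper is written for this example rather than taken from any library. It reproduces only the 2SLS point estimates; as the warning says, standard errors from a hand-run second stage are not correct.

```python
import random


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


rng = random.Random(7)
N = 10_000
BETA_EDUC = 0.08  # assumed true return to education

rows = []
for _ in range(N):
    exper = rng.uniform(0, 30)
    mother = rng.gauss(10, 3)
    father = rng.gauss(10, 3)
    ability = rng.gauss(0, 1)  # unobserved; makes EDUC endogenous
    educ = (9 + 0.05 * exper - 0.001 * exper**2
            + 0.2 * mother + 0.2 * father + ability + rng.gauss(0, 1))
    lwage = (0.5 + BETA_EDUC * educ + 0.04 * exper - 0.001 * exper**2
             + 0.5 * ability + rng.gauss(0, 0.5))
    rows.append((exper, mother, father, educ, lwage))

lwage = [r[4] for r in rows]

# Stage 1: EDUC on all exogenous variables and both instruments.
Z = [[1, e, e**2, m, f] for e, m, f, _, _ in rows]
gamma = ols(Z, [r[3] for r in rows])
educ_hat = [sum(g * z for g, z in zip(gamma, zrow)) for zrow in Z]

# Stage 2: replace only EDUC with its fitted value; EXPER and EXPER^2 stay as they are.
X2 = [[1, eh, r[0], r[0]**2] for eh, r in zip(educ_hat, rows)]
b_2sls = ols(X2, lwage)

# Plain OLS on the structural equation, for comparison.
X_ols = [[1, r[3], r[0], r[0]**2] for r in rows]
b_ols = ols(X_ols, lwage)

print(f"OLS  coefficient on EDUC: {b_ols[1]:.4f}")
print(f"2SLS coefficient on EDUC: {b_2sls[1]:.4f}")
```

Under these assumptions the OLS coefficient on EDUC should sit well above 0.08 because of the omitted ability term, while the second-stage coefficient should be close to it.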
Reference: Textbook §10.3.5 (2SLS in the multiple regression model), Example 10.5; Prof. Notes Ch. 10, “§3 Estimators Based on the Method of Moments” and R Example with ivreg().
Question 8
Consider the following supply and demand model:
\[\text{Demand: } Q_i = \alpha_1 P_i + \alpha_2 X_i + e_{di}\] \[\text{Supply: } Q_i = \beta_1 P_i + e_{si}\]
where \(X\) is income (exogenous). To find the reduced-form equation for equilibrium price \(P\), set demand equal to supply and solve for \(P\). What is the reduced-form equation for \(P\)?
- (a) \(P_i = \dfrac{\alpha_2}{\alpha_1 - \beta_1} X_i + \dfrac{e_{di} - e_{si}}{\alpha_1 - \beta_1}\)
- (b) \(P_i = \dfrac{\alpha_2}{\beta_1 - \alpha_1} X_i + \dfrac{e_{di} - e_{si}}{\beta_1 - \alpha_1}\)
- (c) \(P_i = \dfrac{\beta_1}{\beta_1 - \alpha_1} X_i + \dfrac{e_{si}}{\beta_1 - \alpha_1}\)
- (d) \(P_i = \dfrac{\alpha_2}{\beta_1 + \alpha_1} X_i + \dfrac{e_{di} + e_{si}}{\beta_1 + \alpha_1}\)
Correct Answer: (b)
Set demand \(=\) supply and solve:
\[\alpha_1 P_i + \alpha_2 X_i + e_{di} = \beta_1 P_i + e_{si}\] \[(\alpha_1 - \beta_1) P_i = -\alpha_2 X_i + (e_{si} - e_{di})\] \[P_i = \frac{-\alpha_2}{\alpha_1 - \beta_1} X_i + \frac{e_{si} - e_{di}}{\alpha_1 - \beta_1} = \frac{\alpha_2}{\beta_1 - \alpha_1} X_i + \frac{e_{di} - e_{si}}{\beta_1 - \alpha_1}\]
This is the reduced-form equation \(P_i = \pi_1 X_i + v_{1i}\), where:
- \(\pi_1 = \dfrac{\alpha_2}{\beta_1 - \alpha_1}\) is the reduced-form parameter
- \(v_{1i} = \dfrac{e_{di} - e_{si}}{\beta_1 - \alpha_1}\) is the reduced-form error
The reduced form expresses endogenous variables as functions of exogenous variables only. OLS is valid for reduced-form equations.
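The algebra can be verified numerically. In the sketch below the parameter values (\(\alpha_1 = -1\), \(\alpha_2 = 0.5\), \(\beta_1 = 1.5\)) and the distributions of \(X\) and the errors are hypothetical. Price is generated from the reduced form, the code confirms that demand and supply then give the same quantity, and a no-intercept regression of \(P\) on \(X\) (matching the model, which has no intercept) estimates \(\pi_1\).

```python
import random

random.seed(3)
N = 20_000

# Hypothetical structural parameters (illustration only).
a1, a2 = -1.0, 0.5   # demand: price slope and income effect
b1 = 1.5             # supply: price slope

x = [random.gauss(10, 2) for _ in range(N)]    # exogenous income
ed = [random.gauss(0, 1) for _ in range(N)]    # demand error
es = [random.gauss(0, 1) for _ in range(N)]    # supply error

# Reduced form for P: pi_1 * X + (e_d - e_s) / (beta_1 - alpha_1).
pi1 = a2 / (b1 - a1)
p = [pi1 * xi + (d - s) / (b1 - a1) for xi, d, s in zip(x, ed, es)]

# At the reduced-form price, demand and supply must give the same quantity.
q_demand = [a1 * pi + a2 * xi + d for pi, xi, d in zip(p, x, ed)]
q_supply = [b1 * pi + s for pi, s in zip(p, es)]
max_gap = max(abs(qd - qs) for qd, qs in zip(q_demand, q_supply))

# OLS of P on X without an intercept.
pi1_hat = sum(xi * pi for xi, pi in zip(x, p)) / sum(xi * xi for xi in x)

print(f"largest |Q_demand - Q_supply| = {max_gap:.2e}")
print(f"true pi_1 = {pi1:.3f}, OLS estimate = {pi1_hat:.3f}")
```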
Reference: Textbook §11.2 (“The Reduced-Form Equations,” Eqs. 11.4–11.5); Prof. Notes Ch. 11, “§3 The Reduced-Form Equations.”
Question 9
In the simple wage regression \(\ln(WAGE) = \beta_1 + \beta_2 EDUC + e\), using the mroz data, the sample correlation between the instrument MOTHEREDUC and the endogenous variable EDUC is \(r_{zx} = 0.39\).
If both OLS and IV are consistent (i.e., EDUC is actually exogenous), approximately how many times larger is the IV standard error compared to the OLS standard error?
- (a) About 1.5 times larger
- (b) About 2.6 times larger
- (c) About 6.6 times larger
- (d) About 0.39 times as large (smaller)
Correct Answer: (b)
When both estimators are consistent, the ratio of standard errors is:
\[\frac{se(\hat{\beta}_2^{IV})}{se(b_2^{OLS})} \approx \frac{1}{|r_{zx}|} = \frac{1}{0.39} \approx 2.56\]
This means the IV confidence interval is roughly 2.6 times wider than the OLS interval.
- When both estimators are consistent, IV is less efficient than OLS (larger variance) whenever \(|r_{zx}| < 1\)
- If \(r_{zx} = 0.1\) (weak instrument): \(se\) ratio \(\approx 10\) — intervals are 10 times wider!
- The trade-off: IV gives consistency at the cost of efficiency
- Use IV only when you have strong instruments and genuine endogeneity concerns
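The ratio \(1/|r_{zx}|\) follows from the large-sample variance of the IV estimator in the simple regression model, written here with \(\sigma^2\) for the error variance and assuming homoskedastic errors:

\[\text{var}(\hat{\beta}_2^{IV}) = \frac{\sigma^2}{r_{zx}^2 \sum (x_i - \bar{x})^2} = \frac{\text{var}(b_2^{OLS})}{r_{zx}^2}\]

Taking square roots gives

\[\frac{se(\hat{\beta}_2^{IV})}{se(b_2^{OLS})} \approx \frac{1}{|r_{zx}|}\]

so with \(r_{zx} = 0.39\) the IV variance is about \(1/0.39^2 \approx 6.6\) times the OLS variance, and the standard error is about \(2.56\) times larger. Option (c) confuses the variance ratio with the standard-error ratio.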
Reference: Textbook §10.3.3 (IV variance formula), §10.4 (weak instruments); Prof. Notes Ch. 10, “§3.1 Properties of the New Estimators.”
Question 10
A researcher estimates the wage equation by IV/2SLS using \(L = 2\) instruments (MOTHEREDUC and FATHEREDUC) for \(B = 1\) endogenous variable (EDUC), with \(N = 428\). To test the validity of the surplus instrument, the IV residuals are regressed on all exogenous variables and both instruments. The \(R^2\) from this auxiliary regression is 0.0009.
What is the Sargan test statistic, what are the degrees of freedom, and what is the conclusion at the 5% level? (Use \(\chi^2_{1, 0.05} = 3.84\).)
- (a) Test stat \(= 0.385\), \(df = 1\); fail to reject \(H_0\): surplus instruments are valid
- (b) Test stat \(= 0.385\), \(df = 2\); fail to reject \(H_0\): surplus instruments are valid
- (c) Test stat \(= 3.85\), \(df = 1\); reject \(H_0\): at least one instrument is invalid
- (d) Test stat \(= 0.0009\), \(df = 1\); fail to reject \(H_0\): surplus instruments are valid
Correct Answer: (a)
The Sargan test for overidentifying restrictions:
- Test statistic: \(N \times R^2 = 428 \times 0.0009 = 0.385\)
- Degrees of freedom: \(L - B = 2 - 1 = 1\)
- Under \(H_0\): all surplus instruments are valid (\(\text{Cov}(z, e) = 0\))
- Compare: \(0.385 < 3.84 = \chi^2_{1, 0.05}\)
- Conclusion: fail to reject \(H_0\) — surplus instruments appear valid
- Only surplus instruments (\(L - B\)) can be tested — the minimum required instruments cannot
- Rejection means at least one instrument is correlated with the error term
- The test statistic is not the \(R^2\) itself — it is \(N \times R^2\)
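The arithmetic above can be reproduced in a few lines. This is a minimal sketch using only the numbers given in the question. The p-value line relies on the fact that a \(\chi^2_1\) variable is a squared standard normal, so the shortcut applies to one degree of freedom only.

```python
import math

N = 428               # sample size from the question
R2 = 0.0009           # R^2 from the auxiliary regression of the IV residuals
L, B = 2, 1           # number of instruments and of endogenous regressors
CRITICAL_5PCT = 3.84  # chi-square(1) critical value given in the question

stat = N * R2         # Sargan statistic
df = L - B            # number of surplus instruments

# Valid for df = 1 only: P(chi2_1 > s) = erfc(sqrt(s / 2)).
p_value = math.erfc(math.sqrt(stat / 2))

reject = stat > CRITICAL_5PCT
print(f"Sargan statistic = {stat:.3f}, df = {df}, p-value = {p_value:.3f}")
print("reject H0" if reject else "fail to reject H0")
```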
Reference: Textbook §10.5 (testing surplus moment conditions), §10B.1.3; Prof. Notes Ch. 10, “§4.4 A test of the validity of the surplus moment conditions.”