Midterm 2 Questions

Practice Problems for Midterm 2

Author: Jake Anderson
Published: March 3, 2026
Modified: March 4, 2026


Question 1

Which of the following is not a source of endogeneity that causes the OLS estimator to be biased and inconsistent?

  (a) Measurement error in an explanatory variable
  (b) Simultaneity between the dependent variable and an explanatory variable
  (c) Heteroskedasticity of the error term
  (d) Omitted variables that are correlated with an included explanatory variable

Correct Answer: (c)

The three sources of endogeneity (\(\text{Cov}(x, e) \neq 0\)) are:

  • Measurement error: Using a proxy \(x = x^* + u\) introduces correlation between \(x\) and \(e\)
  • Simultaneity: Feedback between variables (e.g., supply and demand) creates \(\text{Cov}(P, e) \neq 0\)
  • Omitted variables: An omitted factor correlated with an included \(x\) enters the error term

Heteroskedasticity (\(\text{Var}(e_i) \neq \sigma^2\)) does not cause \(\text{Cov}(x, e) \neq 0\). It makes the usual OLS standard errors incorrect, but the OLS coefficient estimates remain unbiased and consistent.
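The contrast between heteroskedasticity and a true source of endogeneity can be checked with a small simulation. This is a minimal sketch on made-up data: the true slope of 1, the sample size, the seed, and the function names are all assumptions for illustration. With equal variances for the true regressor and the measurement noise, the measurement-error slope should settle near half the true value, while the heteroskedastic slope should stay near the true value.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    return cov(x, y) / cov(x, x)


def simulate_slopes(n=20000, beta2=1.0, seed=1):
    rng = random.Random(seed)
    x_true = [rng.gauss(0, 1) for _ in range(n)]
    # Case 1: heteroskedastic error, Var(e|x) = x^2 but E(e|x) = 0.
    y_het = [beta2 * xi + xi * rng.gauss(0, 1) for xi in x_true]
    # Case 2: homoskedastic error, but x is observed with noise u.
    y_me = [beta2 * xi + rng.gauss(0, 1) for xi in x_true]
    x_obs = [xi + rng.gauss(0, 1) for xi in x_true]
    return ols_slope(x_true, y_het), ols_slope(x_obs, y_me)


slope_het, slope_me = simulate_slopes()
print(f"slope under heteroskedasticity: {slope_het:.3f}")
print(f"slope under measurement error:  {slope_me:.3f}")
```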

Reference: Textbook §10.2 (all subsections); Prof. Notes Ch. 10, “KEY TAKEAWAY #2: Three Sources of Endogeneity.”


Question 2

Consider the following supply and demand model for truffles:

\[\text{Demand: } Q_i = \alpha_1 + \alpha_2 P_i + \alpha_3 PS_i + \alpha_4 DI_i + e_i^d\] \[\text{Supply: } Q_i = \beta_1 + \beta_2 P_i + \beta_3 PF_i + e_i^s\]

where \(P\) = price, \(Q\) = quantity, \(PS\) = price of a substitute, \(DI\) = disposable income, and \(PF\) = price of a factor of production. Which variables are endogenous?

  (a) \(P\) and \(Q\)
  (b) \(P\), \(Q\), and \(PS\)
  (c) \(P\), \(Q\), \(PS\), \(DI\), and \(PF\)
  (d) Only \(Q\)

Correct Answer: (a)

In a simultaneous equations model:

  • Endogenous variables are determined within the system — their values are jointly determined by the interaction of the equations. Here, \(P\) and \(Q\) are determined by the intersection of supply and demand.
  • Exogenous variables are determined outside the system. Here, \(PS\), \(DI\), and \(PF\) are exogenous — they affect the equilibrium but are not determined by it.

Because \(P\) and \(Q\) are jointly determined, \(P\) is correlated with both error terms \(e^d\) and \(e^s\). That correlation is why OLS applied to either structural equation is biased and inconsistent.

Reference: Textbook §11.1; Prof. Notes Ch. 11, “§2 A Supply and Demand Model” and “KEY TAKEAWAY #1: Simultaneity Creates Endogeneity.”


Question 3

Consider the wage equation: \(\ln(WAGE) = \beta_1 + \beta_2 EDUC + \beta_3 EXPER + \beta_4 EXPER^2 + e\)

A researcher proposes using mother’s years of education (MOTHEREDUC) as an instrumental variable for EDUC. For MOTHEREDUC to be a valid instrument, which conditions must hold?

  (a) \(\text{Cov}(MOTHEREDUC, EDUC) \neq 0\) and \(\text{Cov}(MOTHEREDUC, e) \neq 0\)
  (b) \(\text{Cov}(MOTHEREDUC, EDUC) = 0\) and \(\text{Cov}(MOTHEREDUC, e) = 0\)
  (c) \(\text{Cov}(MOTHEREDUC, EDUC) \neq 0\) and \(\text{Cov}(MOTHEREDUC, e) = 0\)
  (d) \(\text{Cov}(MOTHEREDUC, EDUC) = 0\) and \(\text{Cov}(MOTHEREDUC, e) \neq 0\)

Correct Answer: (c)

A valid instrumental variable \(z\) must satisfy two conditions:

  • Relevance: \(\text{Cov}(z, x) \neq 0\) — the instrument must be correlated with the endogenous variable. MOTHEREDUC is correlated with EDUC (\(r = 0.39\) in the mroz data).
  • Exogeneity: \(\text{Cov}(z, e) = 0\) — the instrument must be uncorrelated with the error term. We assume MOTHEREDUC does not directly affect a daughter’s wage and is uncorrelated with omitted ability.

Relevance is testable (check the first-stage regression). Exogeneity rests on economic reasoning and cannot be tested when the model is just identified, as it is here with one instrument for one endogenous variable.

Reference: Textbook §10.3.3 (“Characteristics of a Good Instrumental Variable”); Prof. Notes Ch. 10, “KEY TAKEAWAY #3: Instrumental Variables Requirements.”


Question 4

Consider the truffle market model:

\[\text{Demand: } Q_i = \alpha_1 + \alpha_2 P_i + \alpha_3 PS_i + \alpha_4 DI_i + e_i^d\] \[\text{Supply: } Q_i = \beta_1 + \beta_2 P_i + \beta_3 PF_i + e_i^s\]

In a system of \(M = 2\) simultaneous equations, the necessary condition for identification requires that at least \(M - 1 = 1\) variable be excluded from each equation. Which statement is correct?

  (a) Both equations are identified
  (b) Only the demand equation is identified
  (c) Only the supply equation is identified
  (d) Neither equation is identified

Correct Answer: (a)

Check the order condition for each equation:

  • Demand equation: \(PF\) is excluded (present in supply, absent from demand). One variable excluded \(\geq M - 1 = 1\). \(\checkmark\) Identified.
  • Supply equation: \(PS\) and \(DI\) are excluded (present in demand, absent from supply). Two variables excluded \(\geq M - 1 = 1\). \(\checkmark\) Identified.

An equation is identified when enough variables are excluded from it but appear in the other equation(s). Changes in those variables shift the other curve and trace out the curve we want to estimate.
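The counting rule above can be written as a short check. This is a sketch only: the function name and the dictionary layout are assumptions, and it tests just the necessary (order) condition stated in the question, not sufficiency.

```python
def check_order_condition(system):
    """Apply the order condition to each equation.

    `system` maps an equation name to the set of variables that appear
    in it (intercepts are ignored). An equation passes when at least
    M - 1 of the system's variables are absent from it.
    """
    m = len(system)
    all_vars = set().union(*system.values())
    result = {}
    for name, included in system.items():
        excluded = sorted(all_vars - included)
        result[name] = (excluded, len(excluded) >= m - 1)
    return result


truffles = {
    "demand": {"Q", "P", "PS", "DI"},
    "supply": {"Q", "P", "PF"},
}

order_check = check_order_condition(truffles)
for name, (excluded, passes) in order_check.items():
    print(name, "excludes", excluded, "->", "passes" if passes else "fails")
```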

Reference: Textbook §11.4 (“The Identification Problem”); Prof. Notes Ch. 11, “§4.2 A Necessary Condition for Identification” and “KEY TAKEAWAY #3: Identification Requires Exclusion.”


Question 5

Below is the diagnostic output from an IV regression of \(\ln(WAGE)\) on \(EDUC\), \(EXPER\), and \(EXPER^2\), using MOTHEREDUC and FATHEREDUC as instruments for EDUC:

Diagnostic tests:
                 df1 df2 statistic p-value
Weak instruments   2 423    55.400  <2e-16 ***
Wu-Hausman         1 423     2.793  0.0954 .
Sargan             1  NA     0.378  0.5386

At the 5% significance level, which conclusion is correct?

  (a) The instruments are weak, so IV estimation is unreliable
  (b) There is strong evidence of endogeneity, and the surplus instruments are invalid
  (c) The instruments are strong, there is marginal evidence of endogeneity, and the surplus instruments appear valid
  (d) The instruments are strong, but the Sargan test rejects instrument validity

Correct Answer: (c)

Interpret each test:

  • Weak instruments (\(F = 55.4 > 10\)): Reject the null hypothesis that the instruments are weak. The instruments are strong. \(\checkmark\)
  • Wu-Hausman (\(p = 0.0954\)): \(H_0\): \(\text{Cov}(EDUC, e) = 0\) (OLS is consistent). At 5% we fail to reject, although we would reject at 10%. This is “marginal” evidence of endogeneity.
  • Sargan (\(p = 0.5386\)): \(H_0\): surplus instruments are valid (\(\text{Cov}(z, e) = 0\)). Fail to reject, so the instruments appear valid.

The Hausman test \(p\)-value of 0.0954 is borderline. Many economists would still use IV/2SLS as a precaution.
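The three decisions can be laid out as a small helper that applies the \(F > 10\) rule and compares each \(p\)-value with the chosen significance level. The function name and the dictionary keys are assumptions for illustration; the inputs are the values from the output above.

```python
def interpret_diagnostics(weak_f, hausman_p, sargan_p, alpha=0.05, f_rule=10.0):
    """Turn the three diagnostic results into yes/no decisions."""
    return {
        # Rule of thumb: first-stage F above 10 means instruments are not weak.
        "instruments_strong": weak_f > f_rule,
        # Wu-Hausman H0: the regressor is exogenous (OLS is consistent).
        "reject_exogeneity_of_regressor": hausman_p < alpha,
        # Sargan H0: the surplus instruments are valid.
        "reject_instrument_validity": sargan_p < alpha,
    }


at_5_percent = interpret_diagnostics(55.400, 0.0954, 0.5386)
at_10_percent = interpret_diagnostics(55.400, 0.0954, 0.5386, alpha=0.10)
print(at_5_percent)
print(at_10_percent)
```

Running the helper at both levels shows why the Hausman result is called marginal: the decision flips between 5% and 10%.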

Reference: Textbook §10.4 (instrument strength, \(F > 10\) rule), §10.5 (Hausman & Sargan tests); Prof. Notes Ch. 10, “§4 Specification Tests” and the R Example with diagnostics=TRUE.


Question 6

A researcher estimates the wage equation \(\ln(WAGE) = \beta_1 + \beta_2 EDUC + \beta_3 EXPER + \beta_4 EXPER^2 + e\) using the mroz data (\(N = 428\)). The OLS estimate is \(\hat{\beta}_2^{OLS} = 0.1075\) and the IV/2SLS estimate (using MOTHEREDUC as an instrument) is \(\hat{\beta}_2^{IV} = 0.0493\).

If “ability” is an omitted variable that is positively correlated with both EDUC and WAGE, which statement is correct?

  (a) OLS overestimates the return to education; this bias disappears as \(N \to \infty\)
  (b) OLS overestimates the return to education; this bias persists even as \(N \to \infty\)
  (c) OLS underestimates the return to education because ability is positively correlated with EDUC
  (d) The IV estimate is biased because MOTHEREDUC is correlated with ability

Correct Answer: (b)

With omitted ability:

  • Ability \(\uparrow\) \(\Rightarrow\) EDUC \(\uparrow\) and WAGE \(\uparrow\), so \(\text{Cov}(EDUC, e) > 0\)
  • OLS attributes the wage effect of ability to education \(\Rightarrow\) positive bias
  • \(b_2 \xrightarrow{p} \beta_2 + \frac{\text{Cov}(x, e)}{\text{Var}(x)} \neq \beta_2\) — this bias does not vanish as \(N \to \infty\)
  • The IV estimator is consistent: \(\hat{\beta}_2^{IV} \xrightarrow{p} \beta_2\) (assuming valid instruments)

Endogeneity makes OLS both biased and inconsistent. This is worse than heteroskedasticity, which only affects standard errors.
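The claim that the OLS bias persists as \(N\) grows can be illustrated with simulated data. This is a sketch with made-up numbers, not the mroz sample: the data-generating process, the true slope of 0.10, the seed, and the function names are all assumptions. In this design \(\text{Cov}(x, e)/\text{Var}(x) = 1/3\), so the OLS slope should hover near 0.43 at every sample size while the IV slope approaches 0.10.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def simulate(n, beta2=0.10, seed=0):
    """Return (OLS slope, IV slope) when ability is omitted."""
    rng = random.Random(seed)
    x, y, z = [], [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)             # omitted variable
        zi = rng.gauss(0, 1)                  # instrument, independent of ability
        xi = zi + ability + rng.gauss(0, 1)   # regressor depends on ability
        ei = ability + rng.gauss(0, 1)        # error term absorbs ability
        x.append(xi)
        y.append(beta2 * xi + ei)
        z.append(zi)
    b_ols = cov(x, y) / cov(x, x)
    b_iv = cov(z, y) / cov(z, x)
    return b_ols, b_iv


for n in (500, 5000, 50000):
    b_ols, b_iv = simulate(n)
    print(f"N={n:>6}  OLS={b_ols:.3f}  IV={b_iv:.3f}")
```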

Reference: Textbook §10.1.2–10.1.3 (bias & inconsistency), §10.2.4 (omitted variables), Example 10.1; Prof. Notes Ch. 10, “KEY TAKEAWAY #1: Bias vs. Inconsistency.”


Question 7

A researcher estimates the wage equation by 2SLS. The structural equation is:

\[\ln(WAGE) = \beta_1 + \beta_2 EDUC + \beta_3 EXPER + \beta_4 EXPER^2 + e\]

EDUC is endogenous. The instruments are MOTHEREDUC and FATHEREDUC. The estimated first-stage equation is:

EDUC = 9.10 + 0.05*EXPER - 0.001*EXPER^2
       + 0.16*MOTHEREDUC + 0.19*FATHEREDUC

What is the correct second-stage regression?

  (a) \(\ln(WAGE) = \beta_1 + \beta_2 \widehat{EDUC} + \beta_3 \widehat{EXPER} + \beta_4 \widehat{EXPER^2} + e^*\)
  (b) \(\ln(WAGE) = \beta_1 + \beta_2 \widehat{EDUC} + \beta_3 EXPER + \beta_4 EXPER^2 + e^*\)
  (c) \(\ln(WAGE) = \beta_1 + \beta_2 \widehat{EDUC} + e^*\)
  (d) \(\ln(WAGE) = \beta_1 + \beta_2 EDUC + \beta_3 EXPER + \beta_4 EXPER^2 + \beta_5 \hat{v} + e\)

Correct Answer: (b)

In the second stage of 2SLS:

  • Only the endogenous variable (\(EDUC\)) is replaced by its fitted value \(\widehat{EDUC}\) from the first stage
  • Exogenous variables (\(EXPER\), \(EXPER^2\)) remain unchanged — they are not endogenous
  • Option (a) is wrong: replacing exogenous variables is unnecessary and incorrect
  • Option (c) is wrong: omitting the exogenous variables changes the model
  • Option (d) is the Hausman test regression, not the 2SLS second stage

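Warning: If you run this as two separate OLS regressions, the standard errors from the second stage are incorrect, because they are computed from residuals based on \(\widehat{EDUC}\) rather than on the actual \(EDUC\). Always use proper IV software (e.g., ivreg()).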
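To make the substitution concrete, the sketch below plugs one hypothetical person into the estimated first stage from the question and builds that person's row of second-stage regressors. The function names and the example values (10 years of experience, 12 years of education for each parent) are assumptions for illustration. Only EDUC is replaced by its fitted value; EXPER and its square enter unchanged.

```python
def educ_hat(exper, mothereduc, fathereduc):
    """Fitted EDUC from the estimated first-stage equation in the question."""
    return (9.10 + 0.05 * exper - 0.001 * exper ** 2
            + 0.16 * mothereduc + 0.19 * fathereduc)


def second_stage_row(exper, mothereduc, fathereduc):
    """Regressors for the second stage: intercept, EDUC-hat, EXPER, EXPER^2."""
    return (1.0, educ_hat(exper, mothereduc, fathereduc), exper, exper ** 2)


# Hypothetical individual, not an observation from the mroz data.
row = second_stage_row(exper=10, mothereduc=12, fathereduc=12)
print(row)
```

This only shows which variables enter the second stage. It does not produce valid standard errors, which is the point of the warning above.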

Reference: Textbook §10.3.5 (2SLS in the multiple regression model), Example 10.5; Prof. Notes Ch. 10, “§3 Estimators Based on the Method of Moments” and R Example with ivreg().


Question 8

Consider the following supply and demand model:

\[\text{Demand: } Q_i = \alpha_1 P_i + \alpha_2 X_i + e_{di}\] \[\text{Supply: } Q_i = \beta_1 P_i + e_{si}\]

where \(X\) is income (exogenous). To find the reduced-form equation for equilibrium price \(P\), set demand equal to supply and solve for \(P\). What is the reduced-form equation for \(P\)?

  (a) \(P_i = \dfrac{\alpha_2}{\alpha_1 - \beta_1} X_i + \dfrac{e_{di} - e_{si}}{\alpha_1 - \beta_1}\)

  (b) \(P_i = \dfrac{\alpha_2}{\beta_1 - \alpha_1} X_i + \dfrac{e_{di} - e_{si}}{\beta_1 - \alpha_1}\)

  (c) \(P_i = \dfrac{\beta_1}{\beta_1 - \alpha_1} X_i + \dfrac{e_{si}}{\beta_1 - \alpha_1}\)

  (d) \(P_i = \dfrac{\alpha_2}{\beta_1 + \alpha_1} X_i + \dfrac{e_{di} + e_{si}}{\beta_1 + \alpha_1}\)

Correct Answer: (b)

Set demand \(=\) supply and solve:

\[\alpha_1 P_i + \alpha_2 X_i + e_{di} = \beta_1 P_i + e_{si}\] \[(\alpha_1 - \beta_1) P_i = -\alpha_2 X_i + (e_{si} - e_{di})\] \[P_i = \frac{-\alpha_2}{\alpha_1 - \beta_1} X_i + \frac{e_{si} - e_{di}}{\alpha_1 - \beta_1} = \frac{\alpha_2}{\beta_1 - \alpha_1} X_i + \frac{e_{di} - e_{si}}{\beta_1 - \alpha_1}\]

This is the reduced-form equation \(P_i = \pi_1 X_i + v_{1i}\), where:

  • \(\pi_1 = \dfrac{\alpha_2}{\beta_1 - \alpha_1}\) is the reduced-form parameter
  • \(v_{1i} = \dfrac{e_{di} - e_{si}}{\beta_1 - \alpha_1}\) is the reduced-form error

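The reduced form expresses each endogenous variable as a function of exogenous variables and errors only. Because the right-hand-side variable \(X\) is uncorrelated with \(v_{1i}\), OLS consistently estimates the reduced-form parameters.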
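A quick numerical check of the algebra: if the reduced-form price is correct, plugging it back into the demand and supply equations must give the same quantity. The parameter values and shocks below are hypothetical, chosen only so that demand slopes down and supply slopes up.

```python
def reduced_form_price(x, e_d, e_s, alpha1, alpha2, beta1):
    """Equilibrium price from the reduced-form equation for P."""
    return alpha2 / (beta1 - alpha1) * x + (e_d - e_s) / (beta1 - alpha1)


# Hypothetical structural parameters and shocks (assumptions for illustration).
alpha1, alpha2, beta1 = -1.5, 0.8, 2.0
x, e_d, e_s = 10.0, 0.3, -0.2

p = reduced_form_price(x, e_d, e_s, alpha1, alpha2, beta1)
q_demand = alpha1 * p + alpha2 * x + e_d   # quantity on the demand curve at p
q_supply = beta1 * p + e_s                 # quantity on the supply curve at p

print(f"P = {p:.4f}, Q from demand = {q_demand:.4f}, Q from supply = {q_supply:.4f}")
```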

Reference: Textbook §11.2 (“The Reduced-Form Equations,” Eqs. 11.4–11.5); Prof. Notes Ch. 11, “§3 The Reduced-Form Equations.”


Question 9

In the simple wage regression \(\ln(WAGE) = \beta_1 + \beta_2 EDUC + e\), using the mroz data, the sample correlation between the instrument MOTHEREDUC and the endogenous variable EDUC is \(r_{zx} = 0.39\).

If both OLS and IV are consistent (i.e., EDUC is actually exogenous), approximately how many times larger is the IV standard error compared to the OLS standard error?

  (a) About 1.5 times larger
  (b) About 2.6 times larger
  (c) About 6.6 times larger
  (d) About 0.39 times as large (smaller)

Correct Answer: (b)

When both estimators are consistent, the ratio of standard errors is:

\[\frac{se(\hat{\beta}_2^{IV})}{se(b_2^{OLS})} \approx \frac{1}{|r_{zx}|} = \frac{1}{0.39} \approx 2.56\]

This means the IV confidence interval is roughly 2.6 times wider than the OLS interval.

  • When both estimators are consistent, the IV estimator is less efficient than OLS (larger variance) because \(|r_{zx}| < 1\)
  • If \(r_{zx} = 0.1\) (weak instrument): \(se\) ratio \(\approx 10\) — intervals are 10 times wider!
  • The trade-off: IV gives consistency at the cost of efficiency
  • Use IV only when you have strong instruments and genuine endogeneity concerns
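The approximation \(1/|r_{zx}|\) is easy to tabulate. The helper below is a sketch (its name is an assumption) that evaluates the ratio for the two correlations mentioned above.

```python
def iv_to_ols_se_ratio(r_zx):
    """Approximate se(IV) / se(OLS) when both estimators are consistent."""
    if r_zx == 0:
        raise ValueError("instrument is uncorrelated with x; the ratio is undefined")
    return 1.0 / abs(r_zx)


ratio_mroz = iv_to_ols_se_ratio(0.39)   # correlation reported in the question
ratio_weak = iv_to_ols_se_ratio(0.10)   # the weak-instrument case

print(f"r = 0.39: ratio is about {ratio_mroz:.2f}")
print(f"r = 0.10: ratio is about {ratio_weak:.2f}")
```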

Reference: Textbook §10.3.3 (IV variance formula), §10.4 (weak instruments); Prof. Notes Ch. 10, “§3.1 Properties of the New Estimators.”


Question 10

A researcher estimates the wage equation by IV/2SLS using \(L = 2\) instruments (MOTHEREDUC and FATHEREDUC) for \(B = 1\) endogenous variable (EDUC), with \(N = 428\). To test the validity of the surplus instrument, the IV residuals are regressed on all exogenous variables and both instruments. The \(R^2\) from this auxiliary regression is 0.0009.

What is the Sargan test statistic, what are the degrees of freedom, and what is the conclusion at the 5% level? (Use \(\chi^2_{1, 0.05} = 3.84\).)

  (a) Test stat \(= 0.385\), \(df = 1\); fail to reject \(H_0\) — surplus instruments are valid
  (b) Test stat \(= 0.385\), \(df = 2\); fail to reject \(H_0\) — surplus instruments are valid
  (c) Test stat \(= 3.85\), \(df = 1\); reject \(H_0\) — at least one instrument is invalid
  (d) Test stat \(= 0.0009\), \(df = 1\); fail to reject \(H_0\) — surplus instruments are valid

Correct Answer: (a)

The Sargan test for overidentifying restrictions:

  1. Test statistic: \(N \times R^2 = 428 \times 0.0009 = 0.385\)
  2. Degrees of freedom: \(L - B = 2 - 1 = 1\)
  3. Under \(H_0\): all surplus instruments are valid (\(\text{Cov}(z, e) = 0\))
  4. Compare: \(0.385 < 3.84 = \chi^2_{1, 0.05}\)
  5. Conclusion: fail to reject \(H_0\); the surplus instruments appear valid

Points to remember:

  • Only surplus instruments (\(L - B\)) can be tested; the minimum required instruments cannot
  • Rejection means at least one instrument is correlated with the error term
  • The test statistic is not the \(R^2\) itself — it is \(N \times R^2\)
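The steps of the calculation can be reproduced in a few lines. This is a minimal sketch: the function name and argument names are assumptions, and the critical value is passed in by the caller because it depends on the degrees of freedom.

```python
def sargan_test(n, r_squared, n_instruments, n_endogenous, critical_value):
    """Return (statistic, degrees of freedom, reject H0?) for the N * R^2 test."""
    df = n_instruments - n_endogenous
    if df < 1:
        raise ValueError("no surplus instruments: the test cannot be run")
    statistic = n * r_squared
    return statistic, df, statistic > critical_value


# Values from the question; 3.84 is the 5% critical value for chi-square(1).
statistic, df, reject = sargan_test(
    n=428, r_squared=0.0009, n_instruments=2, n_endogenous=1, critical_value=3.84
)
print(f"statistic = {statistic:.4f}, df = {df}, reject H0: {reject}")
```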

Reference: Textbook §10.5 (testing surplus moment conditions), §10B.1.3; Prof. Notes Ch. 10, “§4.4 A test of the validity of the surplus moment conditions.”