Midterm 2 Answers

Worked Solutions with Explanations

Exams

Midterm 2

Author

Jake Anderson

Published

March 3, 2026

Modified

March 4, 2026

A Note on Versions

There are two versions of the exam, purple and blue. This page follows the purple version’s numbering; the corresponding blue version question numbers and answers are given in parentheses. CAE exams are blue exams (though they were printed on white paper).

Question 1

(Blue Version Question 4)

In a regression model with random regressors, which assumption ensures that OLS is unbiased and consistent?

(a) $E(x)=0$
(b) $\operatorname{Cov}(x, e)=0$
(c) $\operatorname{Var}(x)=\sigma^{2}$
(d) $x$ and $y$ are independent

Answer

(b) $\operatorname{Cov}(x, e)=0$

This is the professor’s answer. $\operatorname{Cov}(x,e)=0$ ensures that OLS is consistent ($\hat{\beta} \xrightarrow{p} \beta$). The professor considers this sufficient for both unbiasedness and consistency. (Blue Version (c))

Correction: Zero Covariance Is Sufficient for Consistency but Not Unbiasedness

The professor’s answer is wrong on unbiasedness. This is not an edge case — it is a standard textbook distinction (Wooldridge Ch. 5, Greene Ch. 4):

Unbiasedness requires $E[u \mid X] = 0$ (mean independence, or fixed regressors)
Consistency requires only $\operatorname{Cov}(X, u) = 0$ (plus regularity conditions)

$\operatorname{Cov}(X, u) = 0$ is strictly weaker than $E[u \mid X] = 0$. The following counterexample shows the gap is real, not hypothetical.

Counterexample: Let

\[ \begin{aligned} X_i &\stackrel{iid}{\sim} \mathcal{N}(0,1) \\ y_i &= \beta X_i + u_i \\ u_i &= X_i^2 - 1 \end{aligned} \]

Note that $E[X]=0$ and $E[u] = E[X^2] - 1 = 0$, so $\operatorname{Cov}(u, X) = E[uX] - E[u]E[X] = E[X^3] - E[X] = 0$, because the odd moments of a standard normal are zero.

Bias

The OLS estimator is

\[\hat{\beta} = \frac{\sum_{i=1}^n X_i y_i}{\sum_{i=1}^n X_i^2} = \frac{\sum_{i=1}^n X_i (\beta X_i + X_i^2 - 1)}{\sum_{i=1}^n X_i^2} = \beta + \frac{\sum_{i=1}^n X_i^3}{\sum_{i=1}^n X_i^2} - \frac{\sum_{i=1}^n X_i}{\sum_{i=1}^n X_i^2}\]

Taking expectations conditional on $X$:

\[E[\hat{\beta} \mid X] = \beta + \frac{\sum_{i=1}^n X_i^3}{\sum_{i=1}^n X_i^2} - \frac{\sum_{i=1}^n X_i}{\sum_{i=1}^n X_i^2}\]

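This is not equal to $\beta$ for generic $X$. Note that this is bias conditional on the realized regressors: because $u$ is a deterministic function of $X$, the conditional expectation is simply $\hat{\beta}$ itself. For example, if $X = (0.50, -0.14, 0.65)$ (a random draw from $\mathcal{N}(0,1)$), then $\sum X_i^3 \approx 0.397$, $\sum X_i = 1.01$, $\sum X_i^2 \approx 0.692$, and

\[E[\hat{\beta} \mid X] = \beta + \frac{0.397}{0.692} - \frac{1.01}{0.692} \approx \beta - 0.89 \neq \beta\]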
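The arithmetic in this example can be checked directly. The following is a minimal R sketch that plugs the three quoted values of $X$ into the no-intercept OLS formula; the variable names are illustrative and not part of the exam.

```r
# Conditional bias of OLS in the counterexample (no-intercept regression).
X <- c(0.50, -0.14, 0.65)   # the draw quoted in the text
u <- X^2 - 1                # the error is a deterministic function of X

# beta_hat - beta = sum(X * u) / sum(X^2), whatever the value of beta
cond_bias <- sum(X * u) / sum(X^2)
print(round(cond_bias, 2))
```

The result is roughly $-0.9$, so for this particular draw the estimator sits well below $\beta$.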

Consistency

By the law of large numbers,

\[\hat{\beta} = \beta + \frac{\sum_{i=1}^n X_i^3}{\sum_{i=1}^n X_i^2} - \frac{\sum_{i=1}^n X_i}{\sum_{i=1}^n X_i^2} \xrightarrow{p} \beta + \frac{E[X^3]}{E[X^2]} - \frac{E[X]}{E[X^2]} = \beta + \frac{0}{1} - \frac{0}{1} = \beta\]
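The limit above can also be illustrated by simulation. This is a sketch under assumed values: the true slope `beta`, the sample size, and the seed are arbitrary choices made here, not values from the exam.

```r
# Monte Carlo illustration of consistency in the counterexample.
set.seed(123)               # arbitrary seed, for reproducibility only
beta <- 2                   # hypothetical true slope
n    <- 100000              # large sample
X    <- rnorm(n)
u    <- X^2 - 1             # uncorrelated with X, but E[u | X] is not zero
y    <- beta * X + u

beta_hat <- sum(X * y) / sum(X^2)   # OLS without an intercept
print(beta_hat)
```

With a sample this large, `beta_hat` should land very close to the assumed value of 2, in line with the law-of-large-numbers argument.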

Conclusion

$\operatorname{Cov}(u, X) = 0$ is sufficient for consistency but not for unbiasedness. The correct answer to the question as stated (unbiased and consistent) is (d) $x$ and $y$ are independent, which implies $E[u \mid X] = 0$ and thus guarantees both properties. None of the listed options cleanly states $E[u \mid X] = 0$, but (d) is the only one that implies it. $\square$

Question 2

(Blue Version Question 22)

Below is output from an IV regression estimating the effect of education (EDUC) on log wages, using distance to college (DISTANCE) as an instrument:

Call:
ivreg(formula = log(wage) ~ educ + exper | exper + distance)

Coefficients:
              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)    3.5820      0.4250    8.427   <2e-16 ***
educ           0.1320      0.0520    2.538    0.0118 *
exper          0.0450      0.0085    5.294   2.1e-07 ***

Diagnostic tests:
                 df1  df2  statistic  p-value
Weak instruments   1  497      8.234  0.00425 **
Wu-Hausman         1  496      0.892  0.34520
Sargan             0   NA         NA       NA

What is the primary concern with this IV estimation?

(a) The instrument likely fails the weak instruments test ($F<10$)
(b) The Wu-Hausman test suggests EDUC is not endogenous
(c) The Sargan test indicates invalid instruments
(d) The coefficient on EXPER is too large

Answer

(a) The instrument likely fails the weak instruments test ($F<10$)

The F-statistic for weak instruments is $8.234 < 10$ (rule-of-thumb threshold), so DISTANCE may be a weak instrument.

Wu-Hausman: $p = 0.345$ (not significant), but this test has low power with weak instruments, so the result is unreliable.
Sargan: NA because the model is exactly identified (1 instrument, 1 endogenous variable).
EXPER coefficient: $0.045$ is a reasonable return to experience.

Key Point: Always check the weak instruments F-statistic first — if $F < 10$, all other diagnostics are unreliable. (Blue Version (b))

Question 3

(Blue Version Question 23)

Why is the Sargan test “NA” in the output from Question 2?

(a) The sample size is too small
(b) The first-stage is too weak
(c) There is only one instrument and one endogenous variable (exactly identified)
(d) The instruments are perfectly correlated

Answer

(c) There is only one instrument and one endogenous variable (exactly identified)

The Sargan test checks the validity of overidentifying restrictions. It requires more instruments than endogenous variables.

Endogenous variables: 1 (EDUC)
Excluded instruments: 1 (DISTANCE)
$1 = 1 \implies$ exactly identified $\implies$ Sargan is undefined

Key Point: The Sargan test needs at least one “extra” instrument beyond what is needed for identification. Degrees of freedom $= L - K$, where $L$ = number of excluded instruments and $K$ = number of endogenous regressors; here $L - K = 1 - 1 = 0$. (Blue Version (d))

Question 4

(Blue Version Question 5)

Suppose we have a system of two structural equations, where $y_1$ and $y_2$ are endogenous and $x_1$, $x_2$ are exogenous:

\[y_1 = \alpha_1 y_2 + \beta_1 x_1 + e_1, \qquad y_2 = \alpha_2 y_1 + \beta_2 x_2 + e_2\]

Which equation represents a correct reduced-form equation for $\widehat{y}_1$?

(a) $\widehat{y}_1 = \widehat{\pi}_1 x_1 + \widehat{\pi}_2 x_2 + v_1$
(b) $\widehat{y}_1 = \widehat{\pi}_1 \widehat{x}_1 + \widehat{\pi}_2 \widehat{x}_2 + v_1$
(c) $\widehat{y}_1 = \widehat{\theta}_1 \widehat{y}_2 + \widehat{\pi}_1 x_1 + \widehat{\pi}_1 x_2 + v_1$
(d) $\widehat{y}_1 = \widehat{\theta}_1 \widehat{y}_2 + \widehat{\pi}_1 \widehat{x}_1 + \widehat{\pi}_2 \widehat{x}_2 + v_1$

Answer

(a) $\widehat{y}_1 = \widehat{\pi}_1 x_1 + \widehat{\pi}_2 x_2 + v_1$

A reduced-form equation expresses an endogenous variable as a function of only exogenous variables. Substituting Eq. 2 into Eq. 1 and solving:

\[y_1 = \underbrace{\frac{\beta_1}{1 - \alpha_1\alpha_2}}_{\pi_1} x_1 + \underbrace{\frac{\alpha_1 \beta_2}{1 - \alpha_1\alpha_2}}_{\pi_2} x_2 + v_1\]
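For reference, the substitution step can be written out as follows. The expression for the composite error $v_1$ is not given in the answer; it is derived here from the two structural equations, assuming $\alpha_1\alpha_2 \neq 1$.

\[ \begin{aligned} y_1 &= \alpha_1(\alpha_2 y_1 + \beta_2 x_2 + e_2) + \beta_1 x_1 + e_1 \\ (1 - \alpha_1\alpha_2)\, y_1 &= \beta_1 x_1 + \alpha_1\beta_2 x_2 + e_1 + \alpha_1 e_2 \\ y_1 &= \frac{\beta_1}{1 - \alpha_1\alpha_2}\, x_1 + \frac{\alpha_1\beta_2}{1 - \alpha_1\alpha_2}\, x_2 + \underbrace{\frac{e_1 + \alpha_1 e_2}{1 - \alpha_1\alpha_2}}_{v_1} \end{aligned} \]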

(b) Wrong: $x$ variables are exogenous, so they should not have hats
(c), (d) Wrong: both include $\widehat{y}_2$ on the RHS, so they are not reduced forms

Key Point: Reduced form = only exogenous variables on the RHS. (Blue Version (b))

Question 5

(Blue Version Question 20)

Below is R code computing the correlation between potential instruments and an endogenous variable (PRICE):

> cor(housing_data$property_tax, housing_data$price)
[1] 0.7823
> cor(housing_data$mortgage_rate, housing_data$price)
[1] 0.0234

Which variable would likely be a better instrument for PRICE based on the relevance condition?

(a) MORTGAGE_RATE because the correlation is close to zero
(b) PROPERTY_TAX because it has a strong correlation with PRICE
(c) Both are equally good
(d) Cannot determine without testing exogeneity

Answer

(b) PROPERTY_TAX because it has a strong correlation with PRICE

The relevance condition requires $\operatorname{Cov}(z, x) \neq 0$. A stronger correlation means a stronger instrument.

PROPERTY_TAX: $r = 0.7823$ (very strong)
MORTGAGE_RATE: $r = 0.0234$ (essentially zero)

Common Error: (a) confuses relevance with exogeneity. A correlation close to zero means the instrument is irrelevant, not exogenous.

Key Point: The question asks about relevance only. A valid instrument needs both relevance and exogeneity, but here we evaluate relevance alone. (Blue Version (b))

Question 6

(Blue Version Question 21)

The instrumental variables (IV) estimator for the simple regression model is:

\[\widehat{\beta}_{2,IV} = \frac{\sum_i (z_i - \bar{z})(y_i - \bar{y})}{\sum_i (z_i - \bar{z})(x_i - \bar{x})}\]

What would cause this estimator to be undefined or unreliable?

(a) $z$ is not correlated with $x$ (weak instrument)
(b) $z$ is correlated with $e$
(c) The sample size is too large
(d) Both (a) and (b)

Answer

(d) Both (a) and (b)

Both (a) and (b) cause problems, but they break the estimator in different ways:

(a) Relevance failure: If $\operatorname{Cov}(z,x) \approx 0$, the denominator $\to 0$, making $\widehat{\beta}_{2,IV}$ undefined (or explosive in finite samples). This is a weak/irrelevant instrument problem.
(b) Exogeneity failure: If $\operatorname{Cov}(z,e) \neq 0$, the estimator remains numerically well-defined but converges to the wrong value: $\hat{\beta}_{2,IV} \xrightarrow{p} \beta_2 + \frac{\operatorname{Cov}(z,e)}{\operatorname{Cov}(z,x)}$. This makes it inconsistent.
(c) A large sample size actually improves the estimator (consistency is asymptotic).

Since the question asks about “undefined or unreliable,” both (a) and (b) qualify — (a) makes it undefined, (b) makes it unreliable — so (d) is correct.

Key Point: A valid instrument must satisfy both relevance ($\operatorname{Cov}(z,x) \neq 0$) and exogeneity ($\operatorname{Cov}(z,e) = 0$). Violating relevance destroys the denominator; violating exogeneity contaminates the numerator. (Blue Version (d))
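The IV formula from this question can be computed by hand on simulated data. The sketch below is illustrative only: the data-generating process, coefficients, and seed are assumptions made here, chosen so that $x$ is endogenous while $z$ is relevant and exogenous.

```r
# IV estimator for simple regression, computed directly from the formula.
set.seed(42)                # arbitrary seed
n <- 10000
z <- rnorm(n)               # instrument
v <- rnorm(n)
x <- 0.8 * z + v            # relevance: Cov(z, x) = 0.8
e <- 0.6 * v + rnorm(n)     # endogeneity: e is correlated with x, not with z
y <- 1 + 2 * x + e          # hypothetical true slope is 2

beta_iv  <- sum((z - mean(z)) * (y - mean(y))) /
            sum((z - mean(z)) * (x - mean(x)))
beta_ols <- cov(x, y) / var(x)
print(c(iv = beta_iv, ols = beta_ols))
```

Under these assumptions the IV estimate should be close to 2, while OLS is pushed upward by $\operatorname{Cov}(x, e) > 0$.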

Question 7

(Blue Version Question 24)

Below is output from an IV regression with TWO instruments (TAX1 and TAX2) for one endogenous variable (PRICE):

Diagnostic tests:
                 df1  df2  statistic  p-value
Weak instruments   2  345     156.89  < 2e-16 ***
Wu-Hausman         1  344       5.23  0.0227 *
Sargan             1   NA       8.45  0.0037 **

What should you conclude from the Sargan test result?

(a) The instruments pass the validity test
(b) At least one instrument appears to be invalid (fails exogeneity)
(c) Both instruments are weak
(d) The model is not overidentified

Answer

(b) At least one instrument appears to be invalid (fails exogeneity)

Sargan test: $H_0$: All instruments are valid. $H_1$: At least one is invalid.

Sargan statistic $= 8.45$, $p = 0.0037 < 0.05$ $\implies$ Reject $H_0$
At least one instrument fails the exogeneity requirement

Why others are wrong:

(c) Instruments are strong: $F = 156.89 \gg 10$
(d) Model is overidentified: $2 - 1 = 1$ overidentifying restriction

Key Point: Sargan $p < 0.05$ means reject validity. Investigate which instrument is problematic. (Blue Version (b))
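To make the mechanics concrete, one common form of the Sargan statistic is $n R^2$ from regressing the 2SLS residuals on all instruments. The base-R sketch below uses simulated data with two valid instruments; the data-generating process and variable names are assumptions for illustration, not the exam's data.

```r
# Sargan overidentification test computed by hand (n * R^2 form).
set.seed(1)                         # arbitrary seed
n  <- 500
z1 <- rnorm(n); z2 <- rnorm(n)      # two excluded instruments (valid by construction)
v  <- rnorm(n)
e  <- 0.5 * v + rnorm(n)            # error correlated with x through v
x  <- z1 + z2 + v                   # endogenous regressor
y  <- 1 + 2 * x + e

# 2SLS: the first stage gives fitted x, the second stage uses it
x_hat <- fitted(lm(x ~ z1 + z2))
b     <- coef(lm(y ~ x_hat))
res   <- y - b[1] - b[2] * x        # residuals use the actual x, not x_hat

# Auxiliary regression of the residuals on all instruments
aux    <- lm(res ~ z1 + z2)
sargan <- n * summary(aux)$r.squared
df     <- 2 - 1                     # excluded instruments minus endogenous regressors
p_val  <- pchisq(sargan, df = df, lower.tail = FALSE)
print(c(statistic = sargan, df = df, p.value = p_val))
```

A small p-value would lead to rejecting $H_0$ that all instruments are valid, as in the output for this question.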

Question 8

(Blue Version Question 25)

Below are diagnostics from an IV regression with three instruments for one endogenous variable:

Diagnostic tests:
                 df1   df2   statistic   p-value
Weak instruments   3   246     245.67    <2e-16 ***
Wu-Hausman         1   245      12.45   0.00048 ***
Sargan             2    NA       1.23   0.54120

Based on these results, what should you conclude?

(a) Use OLS instead of IV (Wu-Hausman not significant)
(b) Do not use IV; instruments do not pass the Sargan test
(c) Use IV; instruments are strong, endogeneity is present, and overidentifying restrictions are valid
(d) Need more instruments

Answer

(c) Use IV; instruments are strong, endogeneity is present, and overidentifying restrictions are valid

Interpret each diagnostic in order:

Weak instruments: $F = 245.67 \gg 10$ $\implies$ instruments are strong
Wu-Hausman: $p = 0.00048 < 0.05$ $\implies$ reject exogeneity of the endogenous regressor $\implies$ IV is needed (rules out OLS)
Sargan: $p = 0.541 > 0.05$ $\implies$ fail to reject $H_0$ $\implies$ instruments pass validity

Key Point: This is the ideal scenario for IV: strong instruments, confirmed endogeneity, and valid overidentifying restrictions. Proceed with confidence. (Blue Version (a))

Question 9

(Blue Version Question 7)

Consider a house price model: PRICE $= \beta_1 + \beta_2\,\text{SQFT} + \beta_3\,\text{BATHS} + e$. Suppose SQFT is measured with error. If we have an instrument for SQFT, the IV estimator will:

(a) Have smaller standard errors than OLS
(b) Be identical to OLS if the measurement error is small
(c) Always be more efficient than OLS
(d) Have larger standard errors than OLS but be consistent

Answer

(d) Have larger standard errors than OLS but be consistent

When SQFT is measured with error, OLS suffers from attenuation bias (coefficient biased toward zero). OLS is inconsistent.

IV removes this bias using a valid instrument but at a cost:

IV uses only the variation in $x$ explained by $z$, discarding some information
Result: larger standard errors (less efficient) but consistent estimates

Why others are wrong:

(a)/(c) IV is always less efficient (larger SEs) than OLS
(b) OLS is inconsistent regardless of how small the measurement error is

Key Point: The IV trade-off is always consistency for efficiency. IV standard errors > OLS standard errors, but IV is consistent when OLS is not. (Blue Version (a))
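Attenuation bias and its correction by IV can be illustrated with a short simulation. Everything in the sketch below is assumed for illustration: a true slope of 2, measurement error with the same variance as the true regressor, and a second noisy measurement used as the instrument.

```r
# Attenuation bias under classical measurement error, and the IV correction.
set.seed(7)                       # arbitrary seed
n      <- 5000
x_true <- rnorm(n)                # true regressor (unobserved)
x_obs  <- x_true + rnorm(n)       # observed with measurement error
z      <- x_true + rnorm(n)       # second noisy measurement, used as instrument
y      <- 1 + 2 * x_true + rnorm(n)

b_ols <- cov(x_obs, y) / var(x_obs)   # shrinks toward zero: plim = 2 * 1 / (1 + 1) = 1
b_iv  <- cov(z, y) / cov(z, x_obs)    # plim = 2
print(c(ols = b_ols, iv = b_iv))
```

The sketch only shows the consistency side of the trade-off; the efficiency cost shows up as a wider sampling spread of `b_iv` across repeated samples.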

Question 10

(Blue Version Question 6)

Below is output from a first-stage regression for a housing price model where LOT_SIZE is endogenous and LOCAL_TAX is the instrument:

Call: lm(lot_size ~ bedrooms + local_tax)
Coefficients:
              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   2450.33     325.67    7.524   1.2e-12 ***
bedrooms       245.89      45.23    5.437   8.9e-08 ***
local_tax      -15.67      18.92   -0.828     0.409
---
Residual standard error: 1250 on 247 DF
F-statistic: 12.45 on 2 and 247 DF

Based on this output, what can you conclude about LOCAL_TAX as an instrument?

(a) It is likely a weak instrument because it is not significant ($p=0.409$)
(b) It cannot be evaluated without the second-stage results
(c) It is valid because the coefficient is negative
(d) It is a strong instrument because the F-statistic is above 10

Answer

(a) It is likely a weak instrument because it is not significant ($p=0.409$)

To evaluate instrument strength, look at the excluded instrument’s significance, not the overall F-statistic.

LOCAL_TAX: $t = -0.828$, $p = 0.409$ — not significant
Partial F on LOCAL_TAX: $(-0.828)^2 \approx 0.686 \ll 10$

Note: the equivalence $F = t^2$ holds here because there is a single excluded instrument. With multiple excluded instruments, you would need a joint F-test on all excluded instruments together.

Common Mistake (d): The overall F-statistic of $12.45$ tests whether all regressors jointly predict LOT_SIZE. BEDROOMS is an included exogenous variable, not an instrument. The relevant test is LOCAL_TAX alone.

Key Point: Instrument strength = significance of the excluded instrument in the first stage, not the overall F-stat. (Blue Version (b))
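The $F = t^2$ equivalence for a single excluded instrument can be verified numerically. The data below are simulated with variable names borrowed from the question; the coefficients and seed are assumptions, and `local_tax` is irrelevant by construction.

```r
# Partial F on a single excluded instrument equals its squared t statistic.
set.seed(3)                         # arbitrary seed
n         <- 250
bedrooms  <- rnorm(n)
local_tax <- rnorm(n)
lot_size  <- 2450 + 245 * bedrooms + rnorm(n, sd = 1250)  # local_tax plays no role

full       <- lm(lot_size ~ bedrooms + local_tax)   # first stage
restricted <- lm(lot_size ~ bedrooms)               # drops the excluded instrument

partial_F <- anova(restricted, full)$F[2]
t_stat    <- summary(full)$coefficients["local_tax", "t value"]
print(c(partial_F = partial_F, t_squared = t_stat^2))
```

The two printed numbers agree, which is why squaring the reported $t$ value is a valid shortcut when there is exactly one excluded instrument.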

Question 11

(Blue Version Question 10)

When you have more instruments than endogenous regressors (overidentification), you can:

(a) Choose the best instrument and discard the others
(b) Use all instruments and test their validity with a Sargan/Hansen test
(c) Only use 2SLS if you have exactly as many instruments as endogenous variables
(d) Average the results from using each instrument separately

Answer

(b) Use all instruments and test their validity with a Sargan/Hansen test

When overidentified (more instruments than endogenous variables):

Use all instruments simultaneously in 2SLS — more efficient than any single instrument
Test validity with the Sargan/Hansen J-test of overidentifying restrictions

Why others are wrong:

(a) Discarding instruments wastes information
(c) 2SLS is designed for overidentification
(d) Averaging separate IV estimates is not standard or efficient

Key Point: Overidentification is desirable — it allows both efficiency gains and validity testing. (Blue Version (c))

Question 12

(Blue Version Question 13)

In R, to estimate a 2SLS model using the ivreg function from the AER package:

model <- ivreg(wage ~ educ + exper | exper + sibling_educ)

What is the endogenous variable and what is the instrument?

(a) Endogenous: wage, Instrument: sibling_educ
(b) Endogenous: exper, Instrument: sibling_educ
(c) Endogenous: educ, Instrument: sibling_educ
(d) Endogenous: sibling_educ, Instrument: educ

Answer

(c) Endogenous: educ, Instrument: sibling_educ

The ivreg formula syntax: y ~ x1 + x2 | z1 + z2

Left of |: structural equation regressors
Right of |: all exogenous variables (instruments + included exogenous)

exper appears on both sides $\implies$ exogenous (instruments for itself). educ appears on left but not right $\implies$ endogenous variable. sibling_educ appears on right but not left $\implies$ excluded instrument.

Key Point: Variables on the left of | but absent from the right are endogenous. Variables on the right but absent from the left are excluded instruments. (Blue Version (d))
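The left/right rule can be applied mechanically. The sketch below uses only base R to split the formula at `|` and compare the two variable lists; it does not call `ivreg` or need the AER package, and the object names are illustrative.

```r
# Classify variables in an ivreg-style formula using base R only.
f   <- wage ~ educ + exper | exper + sibling_educ
rhs <- f[[3]]                         # the call `|`(educ + exper, exper + sibling_educ)

regressors  <- all.vars(rhs[[2]])     # left of |
instruments <- all.vars(rhs[[3]])     # right of |

endogenous <- setdiff(regressors, instruments)   # on the left only
excluded   <- setdiff(instruments, regressors)   # on the right only
included   <- intersect(regressors, instruments) # on both sides

print(list(endogenous = endogenous, excluded = excluded, included = included))
```

For this formula the three sets are `educ`, `sibling_educ`, and `exper`, matching the answer.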

Question 13

(Blue Version Question 14)

A researcher estimates a wage equation and finds that the 2SLS estimate of the returns to schooling is $0.06$ (6%), while the OLS estimate is $0.11$ (11%). If ability is an omitted variable that positively affects both schooling and wages, this result is:

(a) Surprising, because IV should give a larger estimate
(b) Expected, because OLS has upward bias from omitted ability
(c) Evidence that the instrument is invalid
(d) Evidence that schooling is not endogenous

Answer

(b) Expected, because OLS has upward bias from omitted ability

Classic omitted variable bias. Ability is positively correlated with both schooling and wages:

\[\text{bias} = \frac{\operatorname{Cov}(\text{schooling}, \text{ability})}{\operatorname{Var}(\text{schooling})} \cdot \gamma_{\text{ability}} > 0\]

\[\hat{\beta}_{\text{OLS}} = \beta_{\text{true}} + \text{positive bias} \implies \hat{\beta}_{\text{OLS}} > \beta_{\text{true}}\]

IV removes this bias: $\hat{\beta}_{2SLS} = 0.06 < \hat{\beta}_{OLS} = 0.11$ is exactly what we expect.

Key Point: When OVB is positive, OLS overestimates the effect. IV consistently estimates the causal effect (assuming a valid instrument), so an IV estimate below the OLS estimate is what we expect. (Blue Version (c))

Question 14

(Blue Version Question 15)

In a simultaneous equations system, endogenous variables are:

(a) Determined outside the system
(b) Determined jointly within the system
(c) Always equal to the error terms
(d) Independent of each other

Answer

(b) Determined jointly within the system

By definition, endogenous variables are those whose values are determined jointly within the system. Example: in supply-demand, $P$ and $Q$ are both determined by the intersection.

(a) Describes exogenous variables
(c) Error terms are random disturbances, not endogenous variables
(d) Endogenous variables are not independent: joint determination is what makes them endogenous

Key Point: Endogenous = jointly determined within the model. Exogenous = determined outside. (Blue Version (c))

Question 15

(Blue Version Question 16)

Consider a supply and demand model for corn:

\[\text{Demand: } Q = \alpha_1 + \alpha_2 P + \alpha_3\,\text{INCOME} + e_d\] \[\text{Supply: } Q = \beta_1 + \beta_2 P + \beta_3\,\text{RAINFALL} + e_s\]

In this system, which variables are endogenous?

(a) INCOME and RAINFALL
(b) $P$ and $Q$
(c) $e_d$ and $e_s$
(d) INCOME, RAINFALL, $P$, and $Q$

Answer

(b) $P$ and $Q$

$P$ and $Q$ are determined jointly by the intersection of supply and demand — they are endogenous.

INCOME and RAINFALL are exogenous (determined outside the model; they shift the curves)
$e_d$ and $e_s$ are random disturbances, not variables

Key Point: In supply-demand models, price and quantity are always the endogenous variables. Demand/supply shifters are exogenous.

Note

The professor’s purple answer key lists (d), which includes INCOME and RAINFALL as endogenous. This is incorrect — INCOME and RAINFALL are exogenous by the structure of the model. They serve as the excluded instruments that identify each equation.

(Blue Version (a))

Question 16

(Blue Version Question 17)

Why does OLS fail when estimating a single equation from a simultaneous system?

(a) The sample size is too small
(b) The endogenous right-hand side variables are correlated with the error term
(c) The exogenous variables are correlated with each other
(d) The errors are heteroskedastic

Answer

(b) The endogenous right-hand side variables are correlated with the error term

In a simultaneous system, the endogenous RHS variable (e.g., $P$ in the demand equation) is determined jointly with $Q$. Because $P$ depends on $e_d$ through the equilibrium:

\[\operatorname{Cov}(P, e_d) \neq 0\]

This violates the key OLS assumption and causes simultaneity bias: OLS is both biased and inconsistent.

Why others are wrong:

(a) OLS fails in simultaneous systems regardless of sample size; this is a structural problem, not a finite-sample one
(c) Multicollinearity among exogenous variables inflates standard errors but does not cause bias or inconsistency
(d) Heteroskedasticity affects efficiency and standard errors but does not cause OLS to be biased or inconsistent

Key Point: Simultaneity $\implies$ endogenous RHS variable is correlated with the error $\implies$ need IV/2SLS. (Blue Version (a))

Question 17

(Blue Version Question 2)

The reduced-form equation for price in a supply-demand system expresses:

(a) $P$ as a function of $Q$ only
(b) $P$ as a function of all exogenous variables only
(c) $P$ as a function of $e_d$ and $e_s$ only
(d) $P$ as a function of both endogenous and exogenous variables

Answer

(b) $P$ as a function of all exogenous variables only

A reduced-form equation is obtained by solving the simultaneous system so that each endogenous variable is expressed as a function of only exogenous variables (plus a composite error):

\[P = \pi_0 + \pi_1\,\text{INCOME} + \pi_2\,\text{RAINFALL} + v\]
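As a worked illustration, the coefficients $\pi_0, \pi_1, \pi_2$ can be derived for the corn system from Question 15. This assumes that system's notation and $\beta_2 \neq \alpha_2$; setting demand equal to supply and solving for $P$ gives

\[ \begin{aligned} \alpha_1 + \alpha_2 P + \alpha_3\,\text{INCOME} + e_d &= \beta_1 + \beta_2 P + \beta_3\,\text{RAINFALL} + e_s \\ P &= \underbrace{\frac{\alpha_1 - \beta_1}{\beta_2 - \alpha_2}}_{\pi_0} + \underbrace{\frac{\alpha_3}{\beta_2 - \alpha_2}}_{\pi_1}\,\text{INCOME} + \underbrace{\frac{-\beta_3}{\beta_2 - \alpha_2}}_{\pi_2}\,\text{RAINFALL} + \underbrace{\frac{e_d - e_s}{\beta_2 - \alpha_2}}_{v} \end{aligned} \]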

(a) Wrong: $Q$ is endogenous
(c) Wrong: reduced form depends on exogenous variables, not just errors
(d) Wrong: the whole point is to eliminate endogenous variables from the RHS

Key Point: Reduced form = endogenous variable as a function of exogenous variables only. (Blue Version (c))

Question 18

(Blue Version Question 1)

If you regress quantity on price using market equilibrium data, you are likely estimating:

(a) The demand curve
(b) The supply curve
(c) Neither curve, just the equilibrium relationship
(d) Both curves simultaneously

Answer

(c) Neither curve, just the equilibrium relationship

Market equilibrium data consists of $(P, Q)$ pairs at the intersection of supply and demand. Both curves shift over time, so:

OLS traces out shifting equilibrium points
These points do not lie along any single curve
The estimate is neither the demand nor supply elasticity — just a meaningless hybrid

Key Point: Without instruments to isolate shifts in one curve, OLS cannot identify either structural relationship. This is why we need 2SLS for simultaneous systems. (Blue Version (d))
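A small simulation makes the point concrete. The structural equations below are hypothetical (demand slope $-1$, supply slope $+1$, independent standard-normal shifters and errors); with them, OLS of $Q$ on $P$ recovers neither slope, while using the other equation's shifter as an instrument recovers each one.

```r
# OLS on equilibrium data versus IV with demand/supply shifters.
set.seed(2026)                         # arbitrary seed
n      <- 20000
income <- rnorm(n); rain <- rnorm(n)   # exogenous shifters
e_d    <- rnorm(n); e_s  <- rnorm(n)

# Hypothetical structural equations:
#   demand: Q = 10 - 1 * P + income + e_d
#   supply: Q =  2 + 1 * P + rain   + e_s
# Equating the two and solving gives the equilibrium price and quantity:
P <- 4 + 0.5 * (income - rain + e_d - e_s)
Q <- 2 + P + rain + e_s

ols_slope    <- cov(P, Q) / var(P)              # mixes both curves
demand_slope <- cov(rain, Q) / cov(rain, P)     # supply shifter identifies demand
supply_slope <- cov(income, Q) / cov(income, P) # demand shifter identifies supply
print(c(ols = ols_slope, demand = demand_slope, supply = supply_slope))
```

The simple covariance ratios work here only because the simulated shifters are independent of each other; with real data, 2SLS would also include the equation's own exogenous variable as a control.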

Question 19

(Blue Version Question 3)

Suppose we estimate: price $= \beta_0 + \beta_1\,\text{sqft} + \beta_2\,\text{bdrms} + e$, where price is in $1000s. We run 5-fold CV:

fit = lm(price ~ sqft + bdrms, x=TRUE, y=TRUE, data=hprice1)
cv.lm(fit, k = 5)
Mean absolute error     : 48.25429
Sample standard deviation : 7.6221
Mean squared error       : 4343.904
Sample standard deviation : 1182.484
Root mean squared error  : 65.40264
Sample standard deviation : 9.110364

Which is the correct interpretation of the RMSE?

(a) On average, the estimates of home prices are off by $65.40k
(b) On average, the estimates of home prices are off by $65.0\%$
(c) On average, the estimates of home prices are off by $65.40
(d) On average, the estimates of home prices are off by $654.0

Answer

(a) On average, the estimates of home prices are off by $65.40k

RMSE is measured in the same units as the dependent variable. Since price is in $1000s:

\[\text{RMSE} = 65.40 \implies \text{typical prediction error} \approx 65.40 \times \$1{,}000 = \$65{,}400 = \$65.40\text{k}\]

(b) Wrong: RMSE is not a percentage; it is in the units of $y$
(c) Wrong: ignores that price is in $1000s
(d) Wrong: no reason to multiply by 10

Technical note: RMSE ($\sqrt{\frac{1}{n}\sum e_i^2}$) is not the same as MAE ($\frac{1}{n}\sum |e_i|$). RMSE penalizes large errors more heavily due to the squaring, so it is always $\geq$ MAE. Here, MAE $= 48.25$k while RMSE $= 65.40$k. Strictly speaking, RMSE measures the “typical” prediction error in a root-mean-square sense, not the simple arithmetic average of absolute errors.
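The difference between the two measures is easy to see on a toy vector. The prediction errors below are made up for illustration (in $1000s) and are not from the `hprice1` data.

```r
# MAE versus RMSE on a hypothetical vector of prediction errors (in $1000s).
errors <- c(-20, 35, -60, 10, 110)

mae  <- mean(abs(errors))        # simple average of absolute errors
rmse <- sqrt(mean(errors^2))     # squaring gives extra weight to the 110 miss

print(c(MAE = mae, RMSE = rmse))
```

Here MAE is 47 and RMSE is about 59.03; the single large error pulls RMSE above MAE, the same pattern as in the cross-validation output.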

Key Point: RMSE inherits the units of the dependent variable. Always check what units $y$ is measured in. (Blue Version (b))

Question 20

(Blue Version Question 12)

Consider the agricultural market model:

\[\text{Demand: } Q = \alpha_1 + \alpha_2 P + \alpha_3\,\text{PERCAPINCOME} + e_d\] \[\text{Supply: } Q = \beta_1 + \beta_2 P + \beta_3\,\text{WEATHER} + e_s\]

Which equation (if any) is identified?

(a) Only the demand equation
(b) Only the supply equation
(c) Both equations
(d) Neither equation

Answer

(c) Both equations

Check the order condition: excluded exogenous variables $\geq$ endogenous RHS variables.

Demand: Endogenous RHS: $P$ (1). Excluded from demand: WEATHER (1). $1 \geq 1$ $\implies$ identified. WEATHER serves as instrument for $P$.

Supply: Endogenous RHS: $P$ (1). Excluded from supply: PERCAPINCOME (1). $1 \geq 1$ $\implies$ identified. PERCAPINCOME serves as instrument for $P$.

Key Point: Each equation needs at least as many excluded exogenous variables as endogenous RHS variables (order condition for identification). (Blue Version (d))
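The counting rule can be written as a one-line check. The helper `order_condition` below is a hypothetical function defined here for illustration; it only counts variables and does not verify the rank condition.

```r
# Order condition: excluded exogenous variables >= endogenous RHS variables.
order_condition <- function(excluded_exog, endog_rhs) {
  length(excluded_exog) >= length(endog_rhs)
}

# Demand excludes WEATHER; supply excludes PERCAPINCOME; P is endogenous in both.
demand_ok <- order_condition(excluded_exog = "WEATHER",      endog_rhs = "P")
supply_ok <- order_condition(excluded_exog = "PERCAPINCOME", endog_rhs = "P")

# A supply equation with no excluded shifter would fail the condition.
no_shifter_ok <- order_condition(excluded_exog = character(0), endog_rhs = "P")

print(c(demand = demand_ok, supply = supply_ok, no_shifter = no_shifter_ok))
```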

Question 21

(Blue Version Question 11)

Below is output from estimating the housing supply equation with 2SLS:

2SLS estimates for 'supply'
Model Formula: quantity ~ price + labor_cost + trend
Instruments: ~income + labor_cost + trend

              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)    45.234     15.678    2.885    0.0067 **
price           0.542      0.156    3.474    0.0015 **
labor_cost     -0.287      0.092   -3.120    0.0038 **
trend           1.234      0.345    3.577    0.0011 **

Which coefficient has the wrong expected sign based on economic theory?

(a) price (should be negative)
(b) labor_cost (should be positive)
(c) trend (should be negative)
(d) None of the above: all coefficients have expected signs

Answer

(d) None of the above: all coefficients have expected signs

This is a supply equation with quantity on the LHS:

price ($+0.542$): higher price $\implies$ more supply. Positive is correct.
labor_cost ($-0.287$): higher input costs $\implies$ less supply (supply shifts left). Negative is correct.
trend ($+1.234$): positive time trend reflects development/technology over time. Positive is reasonable.

Note: Option (b) claims labor_cost “should be positive.” This would be true in a supply-price function ($P$ on LHS), but with $Q$ on the LHS, higher costs reduce quantity supplied.

Key Point: Always check whether quantity or price is on the LHS before evaluating expected signs. (Blue Version (d))

Question 22

(Blue Version Question 18)

Consider a wage-employment model:

\[\text{Labor Demand: } L = \beta_1 + \beta_2 W + \beta_3\,\text{OUTPUT} + e_d\] \[\text{Labor Supply: } L = \alpha_1 + \alpha_2 W + \alpha_3\,\text{UNEMP} + e_s\]

To estimate the labor demand equation using 2SLS, which variable would you use as an instrument for $W$?

(a) OUTPUT (it’s in the demand equation)
(b) UNEMP (it shifts supply but not demand)
(c) $e_d$ (the error term)
(d) No instrument is needed

Answer

(b) UNEMP (it shifts supply but not demand)

A valid instrument for $W$ in the demand equation must be:

Relevant: correlated with $W$
Exogenous: not in the demand equation, uncorrelated with $e_d$

(a) OUTPUT is already in the demand equation, so it cannot be an excluded instrument
(b) UNEMP: in supply but not in demand. It shifts supply (affecting $W$) without directly entering demand
(c) $e_d$ is unobservable, so it cannot be used as an instrument
(d) $W$ is endogenous, so an instrument is needed

Key Point: Use variables from the other equation as instruments (exclusion restriction). (Blue Version (c))

Question 23

(Blue Version Question 19)

The R code below uses the systemfit package to estimate a simultaneous equations system:

library(systemfit)
demand_eq <- quantity ~ price + income
supply_eq <- quantity ~ price + cost
system_eqs <- list(demand_eq, supply_eq)
instruments <- ~ income + cost
result <- systemfit(system_eqs, method="2SLS",
    inst=instruments, data=market_data)

What are the endogenous variables in this system?

(a) quantity and price
(b) income and cost
(c) quantity, price, income, and cost
(d) Only quantity

Answer

(a) quantity and price

From the systemfit code:

Demand: quantity ~ price + income
Supply: quantity ~ price + cost
Instruments: ~ income + cost

quantity and price appear as dependent/RHS variables jointly determined by the system $\implies$ endogenous.

income and cost appear in the instrument list $\implies$ exogenous.

Key Point: Variables in the instrument list are exogenous. Variables that appear as both dependent and RHS variables across equations are endogenous. (Blue Version (b))

Question 24

(Blue Version Question 9)

Consider a supply-demand system for rental apartments. Below is the reduced-form regression for RENT:

Call: lm(rent ~ income + construction_cost)
Coefficients:
                    Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)          450.23     125.45     3.588   0.00048 ***
income                 2.15       0.35     6.143   3.2e-09 ***
construction_cost      1.85       0.28     6.607   6.1e-10 ***
---
F-statistic: 89.34 on 2 and 297 DF, p-value: < 2.2e-16

The positive coefficient on INCOME in the reduced form for RENT indicates:

(a) Higher income shifts supply right
(b) The demand equation is not identified
(c) Higher income shifts demand right, increasing equilibrium rent
(d) INCOME is endogenous

Answer

(c) Higher income shifts demand right, increasing equilibrium rent

The reduced-form coefficient on INCOME is $+2.15$ ($p < 0.001$):

Higher INCOME shifts the demand curve right (people can afford more)
With supply unchanged, increased demand drives up equilibrium RENT
This is a reduced-form (equilibrium) effect

Why others are wrong:

(a) Wrong: INCOME is a demand shifter, not supply
(b) Wrong: demand is identified (CONSTRUCTION_COST is excluded from demand)
(d) Wrong: INCOME is exogenous (used as instrument)

Key Point: Reduced-form coefficients capture total equilibrium effects of exogenous variables. (Blue Version (d))

Question 25

(Blue Version Question 8)

A researcher estimates both demand and supply using 2SLS and finds that the price elasticity of demand is $-0.8$ and the price elasticity of supply is $+1.2$. If a policy increases production costs (shifting supply left), what happens to equilibrium price and quantity?

(a) Price increases, quantity decreases
(b) Price decreases, quantity increases
(c) Both price and quantity increase
(d) Both price and quantity decrease

Answer

(a) Price increases, quantity decreases

When production costs increase, the supply curve shifts left:

Price increases: reduced supply creates excess demand, bidding price up
Quantity decreases: at the higher price, consumers demand less (moving along the demand curve)

The specific elasticities ($-0.8$ for demand, $+1.2$ for supply) determine the magnitudes of changes, but not the directions — those depend only on the signs of the slopes.

Key Point: Leftward supply shift + downward-sloping demand $\implies$ price up, quantity down. This is standard comparative statics. (Blue Version (a))
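To see how the elasticities set the magnitudes, here is a back-of-the-envelope calculation. It assumes constant-elasticity (log-linear) curves and a hypothetical cost shock that cuts quantity supplied by 10% at any given price; neither assumption comes from the question.

```r
# Comparative statics with constant elasticities (changes in logs).
eps_d <- -0.8      # price elasticity of demand (from the question)
eps_s <-  1.2      # price elasticity of supply (from the question)
shift <- -0.10     # hypothetical leftward supply shift: -10% at any given price

# Demand: dq = eps_d * dp.  Supply: dq = eps_s * dp + shift.
dp <- shift / (eps_d - eps_s)   # equate the two and solve for dp
dq <- eps_d * dp                # move along the demand curve

print(c(price_change = dp, quantity_change = dq))
```

Under these assumptions price rises by about 5% and quantity falls by about 4%; different elasticities would change the sizes but not the signs.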

Answer Key Table

Purple Q#	Purple Answer	Blue Q#	Blue Answer
1	B*	4	C*
2	A	22	B
3	C	23	D
4	A	5	B
5	B	20	B
6	D	21	D
7	B	24	B
8	C	25	A
9	D	7	A
10	A	6	B
11	B	10	C
12	C	13	D
13	B	14	C
14	B	15	C
15	B**	16	A
16	B**	17	A
17	B	2	C
18	C	1	D
19	A	3	B
20	C	12	D
21	D	11	D
22	B	18	C
23	A	19	B
24	C	9	D
25	A	8	A

* Question 1 is disputed. The professor’s answer is (b) on the purple version and (c) on the blue version. See the correction note above for why $\operatorname{Cov}(x,e)=0$ guarantees consistency but not unbiasedness.

** Questions 15 and 16: Professor’s purple answer key listed (d) for both. The correct answers are (b) for both — see explanations above.