Final Exam Questions

Practice Problems for the Final Exam

Exams

Final

Author: Jake Anderson

Published: March 10, 2026

Modified: March 10, 2026

Ch 8–9: Heteroskedasticity & Time Series

Question 1

Consider the infinite lag representation $y_t = \alpha + \sum_{s=0}^{\infty} \beta_s x_{t-s} + e_t$ for the ARDL model:

\[y_t = \delta + \theta_1 y_{t-1} + \theta_3 y_{t-3} + \delta_1 x_{t-1} + v_t\]

Find an expression for $\alpha$.

(a) $0$
(b) $\delta_1$
(c) $\dfrac{\delta}{1 - \theta_1}$
(d) $\dfrac{\delta}{1 - \theta_1 - \theta_3}$
(e) None of the above

Answer

Correct Answer: (d)

Write the ARDL model using the lag operator $L$:

\[(1 - \theta_1 L - \theta_3 L^3) y_t = \delta + \delta_1 L x_t + v_t\]

The infinite lag representation is obtained by inverting:

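\[y_t = (1 - \theta_1 L - \theta_3 L^3)^{-1} (\delta + \delta_1 L x_t + v_t)\]

Match the constant terms of the two representations. The lag operator leaves a constant unchanged, so applying the lag polynomial to $\alpha$ is the same as evaluating it at $L = 1$ (the steady state):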

\[\delta = (1 - \theta_1 - \theta_3)\alpha \implies \alpha = \frac{\delta}{1 - \theta_1 - \theta_3}\]
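The result can be checked numerically. The sketch below iterates the ARDL recursion with $x$ and $v$ held at zero and compares the value it settles at with the formula. The parameter values are hypothetical, chosen only so that the recursion converges.

```python
# Hypothetical parameter values (not from the exam), chosen so the recursion is stable.
delta, theta_1, theta_3 = 2.0, 0.5, 0.2


def steady_state_intercept(delta, theta_1, theta_3, periods=500):
    """Iterate y_t = delta + theta_1*y_{t-1} + theta_3*y_{t-3} with x and v set to zero."""
    y = [0.0, 0.0, 0.0]  # arbitrary starting values
    for _ in range(periods):
        y.append(delta + theta_1 * y[-1] + theta_3 * y[-3])
    return y[-1]


alpha_formula = delta / (1 - theta_1 - theta_3)
alpha_simulated = steady_state_intercept(delta, theta_1, theta_3)
print(alpha_formula, alpha_simulated)
```

With these values the recursion settles at the same number the formula gives, which is what equating constants at $L = 1$ predicts.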

Reference: Textbook §9.1 (ARDL models); Prof. Notes Ch. 9, “Infinite Distributed Lag Representation.” Adapted from Spring 2024 Q5.

Question 2

Given a time series plot, ACF, and PACF where:

The time series oscillates around a constant mean with no apparent trend
The ACF decays to zero quickly (within 2–3 lags)
The PACF shows significant spikes at lags 1, 2, and 3, then cuts off

Which statements are correct?

(i) The ACF shows persistence, suggesting non-stationarity
(ii) The PACF spikes at lags 1, 2, and 3 indicate an AR(3) component
(iii) The time series plot suggests stationarity (no trend or seasonality)
(iv) The series appears stationary as the ACF decays to zero quickly

(a) (i) and (ii) only
(b) (i) and (iii) only
(c) (i) and (iv) only
(d) (ii) and (iv) only
(e) (ii), (iii), and (iv) only

Answer

Correct Answer: (e)

(i) is wrong: Quick ACF decay means no persistence — the series is stationary
(ii) is correct: PACF cuts off after lag 3 — indicates an AR(3) process
(iii) is correct: No trend or seasonality in the time series plot — stationary
(iv) is correct: Fast ACF decay is a hallmark of stationarity

Reading ACF/PACF plots:

ACF decays slowly $\implies$ non-stationary or near-unit-root
PACF cuts off at lag $p$ $\implies$ AR($p$) model
ACF cuts off at lag $q$ $\implies$ MA($q$) model

Reference: Textbook §9.3 (ACF/PACF interpretation); Prof. Notes Ch. 9, “§3 Identifying Time Series Models.” Adapted from Spring 2024 Q6.

Question 3

Given the following sample autocorrelations with $T = 680$ observations:

Lag	$r_k$
1	0.32
2	$-0.91$
3	0.08
4	$-0.01$

Which lags are statistically significant at the 5% level? (Use $z_{0.975} = 1.96$.)

(a) Lag 1 only
(b) Lags 1 and 2 only
(c) Lags 1 and 3 only
(d) Lags 1, 2, and 3 only
(e) Lags 1, 2, 3, and 4

Answer

Correct Answer: (d)

The test statistic for each lag is $r_k \times \sqrt{T}$, compared against $\pm 1.96$:

Lag	$r_k$	$r_k \times \sqrt{680} \approx r_k \times 26.08$	Significant?
1	0.32	8.35	Yes
2	$-0.91$	$-23.73$	Yes
3	0.08	2.09	Yes ($> 1.96$)
4	$-0.01$	$-0.26$	No

Lags 1, 2, and 3 are significant; lag 4 is not. Note that lag 3 barely exceeds the critical value.

Reference: Textbook §9.3 (testing autocorrelation significance); Prof. Notes Ch. 9, “§3 Testing Individual Autocorrelations.” From Spring 2024 Q7.

Ch 10–11: IV/Endogeneity & Simultaneous Equations

Question 4

Consider the system of simultaneous equations for the demand and supply of a panini:

\[Q_d = \alpha_0 + \alpha_1 P + \alpha_2 Y + \alpha_3 Z + \alpha_4 F + u_d\] \[Q_s = \beta_0 + \beta_1 P + \beta_2 W + \beta_3 S + u_s\]

where $P$ = price, $Y$ = spending limit, $W$ = cost of production, $Z$ = price of a poke bowl, $S$ = salmon fishing season, $F$ = final exam time.

Given the reduced form equations:

\[P = \gamma_0 + \gamma_1 Y + \gamma_2 W + \gamma_3 Z + \gamma_4 S + \gamma_5 F + v_1\]

What is the functional form of $\gamma_5$?

(a) $\dfrac{-\alpha_4}{\alpha_1 - \beta_1}$
(b) $\dfrac{-\alpha_3}{\alpha_1 - \beta_1}$
(c) $\dfrac{\beta_2}{\alpha_1 - \beta_1}$
(d) $\dfrac{\beta_0}{\alpha_1 - \beta_1}$
(e) None of the given answers are correct

Answer

Correct Answer: (a)

Set $Q_d = Q_s$ and solve for $P$:

\[\alpha_0 + \alpha_1 P + \alpha_2 Y + \alpha_3 Z + \alpha_4 F + u_d = \beta_0 + \beta_1 P + \beta_2 W + \beta_3 S + u_s\]

\[(\alpha_1 - \beta_1) P = (\beta_0 - \alpha_0) + \beta_2 W + \beta_3 S - \alpha_2 Y - \alpha_3 Z - \alpha_4 F + (u_s - u_d)\]

The coefficient on $F$ in the reduced form for $P$ is:

\[\gamma_5 = \frac{-\alpha_4}{\alpha_1 - \beta_1}\]

Each reduced-form coefficient is a ratio of structural parameters.
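A numerical check of $\gamma_5$: the sketch below picks hypothetical structural parameters, finds the market-clearing price directly from the demand and supply functions (without using the algebra above), and measures how that price moves when $F$ rises by one unit. All numbers are illustrative assumptions.

```python
# Hypothetical structural parameters (not from the exam).
a0, a1, a2, a3, a4 = 10.0, -2.0, 0.5, 0.3, 1.5   # demand: const, P, Y, Z, F
b0, b1, b2, b3 = 1.0, 1.0, -0.4, 0.2             # supply: const, P, W, S


def demand(P, Y, Z, F):
    return a0 + a1 * P + a2 * Y + a3 * Z + a4 * F


def supply(P, W, S):
    return b0 + b1 * P + b2 * W + b3 * S


def equilibrium_price(Y, W, Z, S, F):
    """Price at which Q_d = Q_s (error terms set to zero)."""
    def excess(P):
        return demand(P, Y, Z, F) - supply(P, W, S)
    # Excess demand is linear in P, so two evaluations locate its root exactly.
    return -excess(0.0) / (excess(1.0) - excess(0.0))


base = dict(Y=20.0, W=5.0, Z=8.0, S=1.0)
gamma_5_numeric = equilibrium_price(F=1.0, **base) - equilibrium_price(F=0.0, **base)
gamma_5_formula = -a4 / (a1 - b1)
print(gamma_5_numeric, gamma_5_formula)
```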

Reference: Textbook §11.2 (“The Reduced-Form Equations”); Prof. Notes Ch. 11, “§3 The Reduced-Form Equations.” Adapted from Spring 2024 Q11.

Question 5

Suppose the regression of Price on no independent variables (intercept-only model) has $SSE = 1800$. The first-stage regression (Price on all exogenous variables and instruments) has $SSE = 1750$. The sample size is $N = 400$, and there are $J = 5$ instruments and $K = 6$ total parameters in the first-stage regression.

What can be said about the instruments?

(a) The F-statistic is 2.25 and the instruments are strong
(b) The F-statistic is 2.25 and the instruments are weak
(c) The F-statistic is 12.52 and the instruments are strong
(d) The F-statistic is 12.52 and the instruments are weak
(e) We need more information to solve the question

Answer

Correct Answer: (b)

Using the F-statistic formula:

\[F = \frac{(SSE_r - SSE_u)/J}{SSE_u/(N - K)} = \frac{(1800 - 1750)/5}{1750/(400 - 6)} = \frac{50/5}{1750/394} = \frac{10}{4.44} = 2.25\]

Since $F = 2.25 < 10$, the instruments are jointly weak.

The $F > 10$ rule of thumb (Staiger & Stock):

$F > 10$: instruments are strong enough for reliable IV estimation
$F < 10$: weak instruments — IV estimates are biased toward OLS, confidence intervals are unreliable
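The arithmetic can be verified with a short function. This is a sketch of the F-statistic formula used above; the function and argument names are illustrative.

```python
def first_stage_f(sse_restricted, sse_unrestricted, n_obs, n_params, n_instruments):
    """F = [(SSE_r - SSE_u) / J] / [SSE_u / (N - K)]."""
    numerator = (sse_restricted - sse_unrestricted) / n_instruments
    denominator = sse_unrestricted / (n_obs - n_params)
    return numerator / denominator


f_stat = first_stage_f(1800, 1750, n_obs=400, n_params=6, n_instruments=5)
instruments_strong = f_stat > 10  # rule-of-thumb threshold
print(round(f_stat, 2), instruments_strong)
```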

Reference: Textbook §10.4 (“Weak Instruments,” Staiger & Stock rule); Prof. Notes Ch. 10, “§4.1 The Strength of Instruments.” From Spring 2024 Q12.

Question 6

A researcher estimates a model by IV/2SLS using $L = 3$ instruments for $B = 1$ endogenous variable. The diagnostic output is:

Diagnostic tests:
                 df1 df2 statistic p-value
Weak instruments   3 420    42.100  <2e-16 ***
Wu-Hausman         1 420     5.831  0.0162 *
Sargan             2  NA     1.204  0.5478

At the 5% significance level, which set of conclusions is correct?

(a) Instruments are weak; endogeneity is present; surplus instruments are valid
(b) Instruments are strong; no evidence of endogeneity; surplus instruments are invalid
(c) Instruments are strong; endogeneity is present; surplus instruments appear valid
(d) Instruments are strong; endogeneity is present; surplus instruments are invalid

Answer

Correct Answer: (c)

Interpret each test:

Weak instruments ($F = 42.1 \gg 10$): Strong instruments. $\checkmark$
Wu-Hausman ($p = 0.0162 < 0.05$): Reject $H_0$: OLS is consistent $\implies$ evidence of endogeneity. Use IV. $\checkmark$
Sargan ($p = 0.5478 > 0.05$, $df = L - B = 3 - 1 = 2$): Fail to reject $H_0$: surplus instruments are valid. $\checkmark$

The 3-test decision tree for IV:

Are instruments strong? ($F > 10$)
Is endogeneity present? (Hausman $p < 0.05$ $\implies$ use IV)
Are surplus instruments valid? (Sargan $p > 0.05$ $\implies$ valid)
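The three checks can be written as a small helper. This is a sketch under the conventions used in this answer (the $F > 10$ rule of thumb and a 5% level); the function name and dictionary keys are made up for illustration.

```python
def interpret_iv_diagnostics(first_stage_f, hausman_p, sargan_p,
                             alpha=0.05, f_threshold=10.0):
    """Apply the three IV checks in order and report each conclusion."""
    return {
        "instruments_strong": first_stage_f > f_threshold,
        "endogeneity_detected": hausman_p < alpha,       # reject H0: OLS is consistent
        "surplus_instruments_valid": sargan_p > alpha,   # fail to reject H0: valid
    }


# Values taken from the diagnostic output in the question.
conclusions = interpret_iv_diagnostics(42.100, 0.0162, 0.5478)
print(conclusions)
```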

Reference: Textbook §10.4–10.5 (instrument strength, Hausman, Sargan); Prof. Notes Ch. 10, “§4 Specification Tests.” Original question combining all three IV diagnostics.

Question 7

Consider a three-equation simultaneous system ($M = 3$):

\[Y_1 = \alpha_0 + \alpha_1 Y_2 + \alpha_2 Y_3 + \alpha_3 X_1 + u_1\] \[Y_2 = \beta_0 + \beta_1 Y_1 + \beta_2 X_1 + \beta_3 X_2 + u_2\] \[Y_3 = \gamma_0 + \gamma_1 Y_1 + \gamma_2 X_2 + \gamma_3 X_3 + u_3\]

where $Y_1, Y_2, Y_3$ are endogenous and $X_1, X_2, X_3$ are exogenous. Using the order condition ($\geq M - 1 = 2$ excluded exogenous variables per equation), which equations are identified?

(a) Only equation 1
(b) Only equations 2 and 3
(c) All three equations
(d) None of the equations

Answer

Correct Answer: (a)

Check the order condition for each equation (need $\geq M - 1 = 2$ excluded exogenous variables):

Eq. 1: Includes $X_1$. Excludes $X_2, X_3$ $\implies$ 2 excluded $\geq 2$. $\checkmark$ Just-identified.
Eq. 2: Includes $X_1, X_2$. Excludes $X_3$ only $\implies$ 1 excluded $< 2$. $\times$ Under-identified.
Eq. 3: Includes $X_2, X_3$. Excludes $X_1$ only $\implies$ 1 excluded $< 2$. $\times$ Under-identified.

Only equation 1 satisfies the order condition. Equations 2 and 3 are under-identified because they each exclude only 1 exogenous variable, but with $M = 3$ equations we need at least 2 exclusions.

Reference: Textbook §11.4 (“Order Condition”); Prof. Notes Ch. 11, “§4.2 A Necessary Condition for Identification.” Adapted from Spring 2021 Q25–26.

Ch 15: Panel Data

Question 8

A researcher estimates a panel data model and runs an F-test for individual effects. The output is:

       F test for individual effects

data:  lsales ~ lcapital + llabor
F = 14.386, df1 = 999, df2 = 1998,
p-value < 2.2e-16
alternative hypothesis: significant effects

At the 5% significance level, what is the conclusion?

(a) Since the p-value is small, we reject $H_0$ of no fixed effects $\implies$ individual effects exist
(b) Since the p-value is small, we fail to reject $H_0$ of no fixed effects
(c) Since the p-value is small, we reject $H_0$ of zero variance of individual-specific errors
(d) Since the p-value is small, we fail to reject $H_0$ of zero variance of individual-specific errors

Answer

Correct Answer: (a)

The F-test (pFtest) compares pooled OLS vs. fixed effects:

$H_0$: All individual effects are zero (pooled OLS is adequate)
$H_1$: At least some individual effects are non-zero (need FE or RE)

Since $p < 2.2 \times 10^{-16} < 0.05$, we reject $H_0$ $\implies$ individual effects exist, so pooled OLS is not appropriate.

This is Step 1 of the panel model selection workflow:

F-test: Pooled OLS vs. FE $\implies$ Do individual effects exist?
LM test: Pooled OLS vs. RE $\implies$ Is there random variation?
Hausman test: FE vs. RE $\implies$ Is there endogeneity?

Reference: Textbook §15.4 (“Testing for Fixed Effects”); Prof. Notes Ch. 15, “§4 Testing for Individual Effects.” Adapted from Fall 2022 Q15.

Question 9

A researcher runs two additional tests on the same panel data:

  Lagrange Multiplier Test - (Honda)
  data:  lsales ~ lcapital + llabor
  normal = 44.064, p-value < 2.2e-16
  alternative hypothesis: significant effects

  Hausman Test
  data:  lsales ~ lcapital + llabor
  chisq = 98.817, df = 2, p-value < 2.2e-16
  alternative hypothesis: one model is inconsistent

Given these results (and the F-test from Q8), which model should we use?

(a) Pooled OLS
(b) Fixed Effects
(c) Random Effects
(d) Hausman-Taylor Estimator

Answer

Correct Answer: (b)

Walk through the decision tree:

F-test ($p < 2.2 \times 10^{-16}$): Reject $H_0$ $\implies$ individual effects exist. Rule out pooled OLS.
LM test ($p < 2.2 \times 10^{-16}$): Reject $H_0$: $\sigma_u^2 = 0$ $\implies$ random effects are present.
Hausman test ($p < 2.2 \times 10^{-16}$): Reject $H_0$: $\text{Cov}(u_i, x_{it}) = 0$ $\implies$ individual effects are correlated with regressors $\implies$ RE is inconsistent.

The F-test and the LM test both confirm individual effects, and the Hausman test indicates that those effects are correlated with the regressors (endogeneity), so fixed effects is the correct model.

Rejecting the Hausman null points to FE (or to Hausman-Taylor if you need coefficients on time-invariant covariates).
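The workflow can be sketched as a function. This is only an illustration of the decision tree as it is used in this answer; the function name is hypothetical, and because the output reports each p-value only as "< 2.2e-16", that bound is passed in as a stand-in.

```python
def choose_panel_model(f_test_p, lm_test_p, hausman_p, alpha=0.05):
    """Pick a panel estimator from the F, LM, and Hausman test p-values."""
    individual_effects = f_test_p < alpha or lm_test_p < alpha
    if not individual_effects:
        return "Pooled OLS"
    if hausman_p < alpha:
        # Individual effects are correlated with the regressors, so RE is inconsistent.
        return "Fixed Effects"
    return "Random Effects"


# Upper bounds on the reported p-values from Q8 and Q9.
model = choose_panel_model(f_test_p=2.2e-16, lm_test_p=2.2e-16, hausman_p=2.2e-16)
print(model)
```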

Reference: Textbook §15.4–15.5 (F, LM, Hausman tests); Prof. Notes Ch. 15, “§4–5 Model Selection Tests.” Adapted from Fall 2022 Q15–17.

Question 10

In a panel data model, the variance of the individual heterogeneity is $\sigma_u^2 = 0.8$ and the variance of the idiosyncratic error is $\sigma_e^2 = 0.05$.

What is the intra-class correlation $\rho$, i.e., the correlation between the composite error of two observations from the same individual in different time periods?

\[\rho = \text{Corr}(w_{it}, w_{is}) \quad \text{where } w_{it} = u_i + e_{it}\]

(a) 0.059
(b) 0.941
(c) 0.484
(d) None of the above

Answer

Correct Answer: (b)

The intra-class correlation formula:

\[\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2} = \frac{0.8}{0.8 + 0.05} = \frac{0.8}{0.85} = 0.941\]

Interpretation: 94.1% of the total error variance is due to individual heterogeneity, and only 5.9% is idiosyncratic. Observations within the same individual are highly correlated.

High $\rho$ $\implies$ strong individual effects $\implies$ pooled OLS is badly inefficient
$\rho$ close to 1 $\implies$ most variation is between individuals, not within
$\rho = 0$ $\implies$ no individual effects, pooled OLS is fine
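A quick computational check of the formula, using the variances given in the question (the function name is illustrative):

```python
def intraclass_correlation(var_individual, var_idiosyncratic):
    """rho = sigma_u^2 / (sigma_u^2 + sigma_e^2)."""
    return var_individual / (var_individual + var_idiosyncratic)


rho = intraclass_correlation(0.8, 0.05)
idiosyncratic_share = 1 - rho
print(round(rho, 3), round(idiosyncratic_share, 3))
```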

Reference: Textbook §15.2 (“The Error Components Model,” intra-class correlation); Prof. Notes Ch. 15, “§2 Error Components.” From Spring 2024 Q19 / Fall 2022 Q19.

Question 11

Under what conditions would it be appropriate to use a Hausman-Taylor estimator?

(a) When the independent variables are strictly exogenous and there are no individual effects
(b) When we need to estimate the effects of both time-changing and time-invariant variables in a panel data setting, especially when some of the time-changing variables are endogenous
(c) When the data is purely cross-sectional with no time component
(d) When all variables are time-varying and there is no correlation with individual effects
(e) When using pooled OLS to estimate a panel data model with no fixed effects

Answer

Correct Answer: (b)

The Hausman-Taylor estimator addresses a specific dilemma:

Fixed effects removes all time-invariant variables (including ones we care about, like gender or race)
Random effects keeps time-invariant variables but is inconsistent if $\text{Cov}(u_i, x_{it}) \neq 0$

Hausman-Taylor combines both approaches: it uses the time-varying exogenous variables as instruments for the endogenous time-invariant variables.

Use Hausman-Taylor when:

The Hausman test rejects RE (endogeneity present)
You want to estimate coefficients on time-invariant variables
Some time-varying regressors are exogenous (to serve as instruments)

Reference: Textbook §15.6 (“Hausman-Taylor Estimator”); Prof. Notes Ch. 15, “§6 The Hausman-Taylor Estimator.” From Spring 2024 Q23.

Question 12

Which of the following statements about fixed effects estimation is false?

(a) The within estimator and the LSDV (Least Squares Dummy Variable) estimator produce identical slope coefficients
(b) A disadvantage of LSDV is that estimating many dummy variable coefficients uses up degrees of freedom
(c) Fixed effects models can estimate the effect of time-invariant variables like gender or race
(d) The within transformation subtracts each group’s mean from each observation, eliminating the individual effect $u_i$

Answer

Correct Answer: (c)

(a) is true: Within estimator and LSDV are algebraically equivalent for slope coefficients. LSDV additionally produces estimates of the individual intercepts.
(b) is true: With $N$ individuals, LSDV adds $N - 1$ dummy variables $\implies$ large loss of degrees of freedom.
(c) is FALSE: The within transformation subtracts group means, which eliminates all time-invariant variables along with $u_i$. This is the fundamental limitation of FE.
(d) is true: $y_{it} - \bar{y}_i = \beta(x_{it} - \bar{x}_i) + (e_{it} - \bar{e}_i)$ — the individual effect $u_i$ drops out.

If you need coefficients on time-invariant variables, use RE (if exogenous) or Hausman-Taylor (if endogenous).
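To see why FE cannot estimate time-invariant effects, the sketch below applies the within transformation to a toy panel. The data and the variable names (`x`, `female`) are hypothetical and serve only to illustrate the demeaning step.

```python
from statistics import mean

# Hypothetical toy panel: two individuals, three periods each.
panel = {
    "A": {"x": [1.0, 2.0, 3.0], "female": [1, 1, 1]},
    "B": {"x": [4.0, 6.0, 8.0], "female": [0, 0, 0]},
}


def within_transform(values):
    """Subtract the individual's own time average from each observation."""
    group_mean = mean(values)
    return [v - group_mean for v in values]


demeaned = {
    individual: {name: within_transform(series) for name, series in columns.items()}
    for individual, columns in panel.items()
}
print(demeaned)
```

After demeaning, `female` is zero for every observation, so it has no variation left and its coefficient cannot be estimated, while `x` keeps its within-individual variation.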

Reference: Textbook §15.3–15.4 (within estimator, LSDV); Prof. Notes Ch. 15, “§3 The Fixed Effects Estimator.” Original question synthesizing panel FE concepts.

Ch 16: Qualitative & Limited Dependent Variables

Question 13

We model college attendance ($\text{psechoice\_b} = 1$ if the student attends) using parcoll (a parent graduated from college) and faminc (family income in \$1000s). From $N = 749$ observations:

LPM: $\hat{P} = 0.546 + 0.256 \cdot parcoll + 0.00134 \cdot faminc$

Logit: $\hat{z} = -0.070 + 1.515 \cdot parcoll + 0.0124 \cdot faminc$

What is the predicted probability for a student whose family earns \$100,000 ($faminc = 100$) and who has no parent who graduated from college ($parcoll = 0$)?

(a) LPM: 67.0%; Logit: 76.3%
(b) LPM: 68.0%; Logit: 76.3%
(c) LPM: 80.2%; Logit: 82.4%
(d) LPM: 67.0%; Logit: 12.4%

Answer

Correct Answer: (b)

LPM (probability is the linear prediction directly):

\[\hat{P} = 0.546 + 0.256(0) + 0.00134(100) = 0.546 + 0.134 = 0.680 = 68.0\%\]

Logit (apply the logistic CDF $\Lambda(z) = \frac{1}{1 + e^{-z}}$):

\[z = -0.070 + 1.515(0) + 0.0124(100) = -0.070 + 1.24 = 1.17\] \[\hat{P} = \frac{1}{1 + e^{-1.17}} = \frac{1}{1 + 0.310} = 0.763 = 76.3\%\]

In the LPM, the coefficient is the marginal effect. In the logit, you must transform through $\Lambda(\cdot)$ to get a probability.
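Both predictions can be reproduced in Python. This is a minimal sketch using the coefficient estimates from the question; the function names are illustrative.

```python
import math


def logistic_cdf(z):
    """Lambda(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def lpm_probability(parcoll, faminc):
    # Linear probability model: the fitted value is the probability itself.
    return 0.546 + 0.256 * parcoll + 0.00134 * faminc


def logit_probability(parcoll, faminc):
    # Logit: the linear index must be passed through the logistic CDF.
    return logistic_cdf(-0.070 + 1.515 * parcoll + 0.0124 * faminc)


p_lpm = lpm_probability(parcoll=0, faminc=100)
p_logit = logit_probability(parcoll=0, faminc=100)
print(round(p_lpm, 3), round(p_logit, 3))
```

One reason to prefer the logit: `lpm_probability(1, 200)` exceeds 1 with these estimates, whereas the logistic CDF always returns a value strictly between 0 and 1.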

Reference: Textbook §16.1–16.3 (LPM, logit); Prof. Notes Ch. 16, “§1–3 Binary Choice Models.” Adapted from Fall 2022 Q2 and Q4.

Question 14

We model whether someone buys an item ($Y = 1$) based on advertising exposure (in minutes). The logit model estimates are:

\[\log\left(\frac{P}{1-P}\right) = -0.8 + 0.15 \times Advertising\]

What is the marginal effect of advertising on the probability of buying at $Advertising = 30$ minutes?

(a) 0.0035
(b) 0.0033
(c) 0.0048
(d) 0.0042
(e) None of the given answers are correct

Hint: You can compute this two ways: (1) $\Lambda'(z) \cdot \beta$ or (2) $P(31) - P(30)$.

Answer

Correct Answer: (a) or (b) — both accepted

Method 1: Analytical marginal effect $= \Lambda'(z) \cdot \beta$

$z = -0.8 + 0.15(30) = 3.7$

\[ME = \frac{e^{-3.7}}{(1 + e^{-3.7})^2} \times 0.15 = \frac{0.02472}{(1.02472)^2} \times 0.15 \approx 0.00353\]

Method 2: Discrete difference $P(31) - P(30)$

$P(30) = \frac{1}{1 + e^{-3.7}} = 0.97587$; $\quad P(31) = \frac{1}{1 + e^{-3.85}} = 0.97916$

\[P(31) - P(30) = 0.97916 - 0.97587 \approx 0.0033\]

Both methods are valid. The analytical derivative gives $\approx 0.0035$; the discrete change gives $\approx 0.0033$. The marginal effect is small because $P$ is already close to 1 (the logistic curve is flat at extreme values).
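Both calculations can be checked in Python. The sketch below uses the identity $\Lambda'(z) = \Lambda(z)\,(1 - \Lambda(z))$, which is algebraically the same as the expression in Method 1; the function names are illustrative.

```python
import math


def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))


def purchase_probability(advertising):
    return logistic_cdf(-0.8 + 0.15 * advertising)


p30 = purchase_probability(30)

# Method 1: analytical derivative, Lambda'(z) * beta with Lambda' = P * (1 - P).
analytical_me = p30 * (1 - p30) * 0.15

# Method 2: discrete change from one extra minute of advertising.
discrete_change = purchase_probability(31) - p30

print(round(analytical_me, 4), round(discrete_change, 4))
```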

Reference: Textbook §16.3.1 (“Marginal Effects in the Logit Model”); Prof. Notes Ch. 16, “§3 Marginal Effects.” From Spring 2024 Q29.

Question 15

In the logistic regression model, the log odds of buying an item are given by:

\[\log\left(\frac{P}{1-P}\right) = 0.8 + 0.15 \times Advertising\]

What does a log odds value of 2 mean in terms of the probability of buying the item?

(a) The probability of buying the item is 0.88
(b) The probability of buying the item is 0.90
(c) The probability of buying the item is 0.95
(d) The probability of buying the item is 0.97
(e) We cannot find the solution from the information given

Answer

Correct Answer: (a)

If $\log\left(\frac{P}{1-P}\right) = 2$, then:

\[\frac{P}{1-P} = e^2 \approx 7.389\]

Solve for $P$:

\[P = e^2 \cdot (1 - P) \implies P(1 + e^2) = e^2 \implies P = \frac{e^2}{1 + e^2} = \frac{7.389}{8.389} \approx 0.881\]

This is the same as applying $\Lambda(2) = \frac{1}{1 + e^{-2}} = \frac{e^2}{1 + e^2} = 0.88$.

Reference: Textbook §16.2 (“The Logistic Distribution,” logit link function); Prof. Notes Ch. 16, “§2 The Logistic Model.” From Spring 2024 Q30.

Question 16

For each scenario, which model is most appropriate?

Scenario A: UCLA surveys student satisfaction on a scale of 1–5 (Very Dissatisfied to Very Satisfied) and wants to model it based on facility usage and year of study.

Scenario B: A researcher studies the duration of unemployment (weeks). Some individuals find jobs during the study; others are still unemployed when it ends.

(a) Ordered Logit; LPM
(b) Multinomial Logit; Ordered Logit
(c) Tobit; LPM
(d) Ordered Logit; Tobit
(e) None of the given combinations are sufficient

Answer

Correct Answer: (d)

Scenario A — Ordered Logit:

DV has a natural ranking (1 < 2 < 3 < 4 < 5)
Not multinomial: the categories have a meaningful order
Not LPM: the outcome is not binary

Scenario B — Tobit:

The DV (weeks unemployed) is continuous but censored — for those still unemployed, we observe a lower bound, not the true duration
Not OLS: censoring causes OLS estimates to be biased
Not logit: the outcome is not binary or categorical

Match the model to the data structure: ordered categories $\implies$ ordered logit; censored continuous data $\implies$ tobit.

Reference: Textbook §16.4 (Ordered Logit), §16.6 (Tobit); Prof. Notes Ch. 16, “§4–6 Extensions of Binary Choice.” Adapted from Spring 2024 Q33–34.

Question 17

Suppose we are given the following confusion matrix from a Logit model with $N = 10{,}000$:

	Predicted: Buy (1)	Predicted: Not Buy (0)
Actual: Buy (1)	4000	1000
Actual: Not Buy (0)	500	4500

If the accuracy of the Probit model is 0.80 and the accuracy of the LPM is 0.75, what is the accuracy of the Logit model and which model should be chosen?

(a) 0.50; Logit
(b) 0.75; Probit
(c) 0.80; Probit or Logit
(d) 0.85; Logit
(e) 0.90; Logit

Answer

Correct Answer: (d)

Accuracy = (correct predictions) / (total predictions):

\[\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} = \frac{4000 + 4500}{4000 + 1000 + 500 + 4500} = \frac{8500}{10000} = 0.85\]

Comparing all three models:

LPM: 0.75
Probit: 0.80
Logit: 0.85 $\leftarrow$ highest accuracy

Choose the Logit model based on accuracy.

Confusion matrix components:

$TP = 4000$ (correctly predicted buy), $TN = 4500$ (correctly predicted not buy)
$FP = 500$ (predicted buy but didn’t), $FN = 1000$ (predicted not buy but did)
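The accuracy calculation and the model comparison can be sketched as follows. The dictionary layout is an illustrative choice; the counts and the two competing accuracies come from the question.

```python
# Logit confusion matrix from the question.
confusion = {"TP": 4000, "FN": 1000, "FP": 500, "TN": 4500}


def accuracy(matrix):
    """Share of all predictions that are correct: (TP + TN) / total."""
    return (matrix["TP"] + matrix["TN"]) / sum(matrix.values())


accuracies = {"LPM": 0.75, "Probit": 0.80, "Logit": accuracy(confusion)}
best_model = max(accuracies, key=accuracies.get)
print(accuracies["Logit"], best_model)
```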

Reference: Prof. Notes Ch. 17, “Model Evaluation: Confusion Matrix and Accuracy.” Original question combining confusion matrix with model comparison.

Ch 17: Regularization & Machine Learning

Question 18

Which of the following best defines bias in the context of machine learning models?

(a) The error introduced in a model due to excessive sensitivity to small fluctuations in the training data
(b) The variability of model predictions for a given data point or value, indicating the spread of model predictions
(c) The error that occurs when a model is too simple to capture the underlying patterns in the data
(d) The ability of a model to perform well on unseen data by balancing complexity and simplicity
(e) The tendency of a model to memorize the training data rather than generalize from it

Answer

Correct Answer: (c)

The three components of expected prediction error:

\[E[(y - \hat{f}(x))^2] = \text{Bias}^2 + \text{Variance} + \text{Irreducible Error}\]

Bias (c): Error from simplifying assumptions. A model that is too simple misses real patterns $\implies$ underfitting.
Variance (a)/(b): Error from sensitivity to training data. A model that is too complex fits noise $\implies$ overfitting.
Irreducible error: Random noise in the data that no model can eliminate.
(d) describes the goal of balancing bias and variance (generalization).
(e) describes overfitting (high variance, low bias).

The bias-variance tradeoff: increasing model complexity decreases bias but increases variance. The optimal model minimizes total error.

Reference: Textbook §17.1 (“Bias-Variance Tradeoff”); Prof. Notes Ch. 17, “§1 The Bias-Variance Decomposition.” From Spring 2024 Q36.

Question 19

Which of the following statements correctly describe LASSO regression?

(i) It reduces variance at the expense of higher bias
(ii) It reduces bias at the expense of higher variance
(iii) It uses a penalty term of the form $\lambda \sum |\beta_j|$
(iv) It uses a penalty term of the form $\lambda \sum \beta_j^2$

(a) (i) and (iii) only
(b) (i) and (iv) only
(c) (ii) and (iv) only
(d) (ii) and (iii) only
(e) None of the given combinations correctly describe LASSO regression

Answer

Correct Answer: (a)

LASSO (Least Absolute Shrinkage and Selection Operator):

(i) Correct: Shrinking coefficients reduces variance but introduces bias (tradeoff)
(iii) Correct: LASSO penalty is $\lambda \sum |\beta_j|$ (L1 norm / absolute value)

LASSO vs. Ridge comparison:

	LASSO	Ridge
Penalty	$\lambda \sum |\beta_j|$ (L1)	$\lambda \sum \beta_j^2$ (L2)
Feature selection?	Yes (sets coefficients to exactly 0)	No (shrinks toward 0)
Best when	Many irrelevant predictors	Many small effects

Both LASSO and Ridge reduce variance at the cost of bias. The difference is that LASSO performs automatic feature selection.

Reference: Textbook §17.3 (“LASSO and Ridge Regression”); Prof. Notes Ch. 17, “§3 Regularization Methods.” Adapted from Spring 2024 Q38–39.

Question 20

For LASSO Regression, if the tuning parameter $\lambda = 0$, what does it mean?

(a) The loss function is the same as the ordinary least squares loss function
(b) The LASSO regression turns into a Ridge regression model
(c) It shrinks the coefficients of less important features to exactly 0
(d) The regularization term becomes infinitely large, eliminating all features
(e) None of the given answers are true of LASSO if the tuning parameter $(\lambda) = 0$

Answer

Correct Answer: (a)

The LASSO loss function is:

\[\min_\beta \sum_{i=1}^n (y_i - x_i'\beta)^2 + \lambda \sum_{j=1}^p |\beta_j|\]

When $\lambda = 0$:

The penalty term $\lambda \sum |\beta_j| = 0 \cdot \sum |\beta_j| = 0$
The loss function becomes $\sum (y_i - x_i'\beta)^2$ $\implies$ ordinary least squares
No shrinkage occurs $\implies$ all coefficients are unrestricted

What happens as $\lambda$ changes:

$\lambda = 0$: OLS (no regularization)
Small $\lambda$: slight shrinkage, a few coefficients may hit zero
Large $\lambda$: heavy shrinkage, most/all coefficients forced to zero
$\lambda \to \infty$: all coefficients $= 0$ (intercept-only model)
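The effect of $\lambda$ is easiest to see with a single predictor and no intercept, where the LASSO loss above has a closed-form minimizer (soft-thresholding). The sketch below assumes that simplified setting; the data are hypothetical.

```python
def lasso_single_predictor(x, y, lam):
    """Minimize sum((y_i - x_i*b)^2) + lam*|b| for one predictor, no intercept.

    Setting the derivative to zero gives b = sign(Sxy) * max(|Sxy| - lam/2, 0) / Sxx.
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    shrunk = max(abs(sxy) - lam / 2.0, 0.0)
    sign = 1.0 if sxy >= 0 else -1.0
    return sign * shrunk / sxx


# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

ols_slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
for lam in (0.0, 20.0, 200.0):
    print(lam, lasso_single_predictor(x, y, lam))
```

At $\lambda = 0$ the estimate equals the OLS slope, a moderate $\lambda$ shrinks it toward zero, and a large enough $\lambda$ sets it to exactly zero.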

Reference: Textbook §17.3 (“The Tuning Parameter $\lambda$”); Prof. Notes Ch. 17, “§3 Regularization Methods.” From Spring 2024 Q40.
