Every multiple regression coefficient is a partial effect. This chapter explains what “holding constant” means computationally via the Frisch-Waugh-Lovell theorem, walks through assumptions MR1 through MR6, and distinguishes between perfect multicollinearity (a showstopper) and near multicollinearity (a nuisance).
15.1 Ceteris Paribus: Partial Effects
In the model \(E(y \mid x_2, \ldots, x_K) = \beta_1 + \beta_2 x_2 + \cdots + \beta_K x_K\), each slope coefficient is a partial derivative:
The change in \(E(y)\) when \(x_k\) increases by one unit, holding all other explanatory variables fixed.
Compare this to simple regression, where the slope captures the total association. In multiple regression, the qualifier “holding \(x_3, x_4, \ldots\) constant” is essential; without it, the interpretation is incomplete.
Total vs. partial effect: In SLR, \(\beta_2\) captures everything that moves with \(x\). In MR, \(\beta_k\) captures only the part of \(x_k\)’s association with \(y\) that is independent of the other regressors.
If you omit a variable from the model, you cannot claim to hold it constant. The omitted variable’s influence leaks into the included coefficients via omitted variable bias.
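The leakage is easy to demonstrate by simulation. The sketch below (illustrative variable names and parameter values, not from the text) generates data where \(x_3\) both affects \(y\) and is correlated with \(x_2\), then compares the full regression against one that omits \(x_3\):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# x3 is correlated with x2, and both affect y (true partial effects: 2 and 3)
x2 = rng.normal(size=n)
x3 = 0.5 * x2 + rng.normal(size=n)   # Cov(x2, x3) = 0.5 * Var(x2)
y = 1.0 + 2.0 * x2 + 3.0 * x3 + rng.normal(size=n)

# Full regression: recovers the partial effect of x2 (about 2.0)
X_full = np.column_stack([np.ones(n), x2, x3])
b_full = np.linalg.lstsq(X_full, y, rcond=None)[0]

# Omitting x3: the x2 slope absorbs x3's influence (about 2 + 3 * 0.5 = 3.5)
X_short = np.column_stack([np.ones(n), x2])
b_short = np.linalg.lstsq(X_short, y, rcond=None)[0]

print(b_full[1])   # close to 2.0
print(b_short[1])  # close to 3.5, not 2.0
```

The short regression's slope is the total association, not the partial effect: it equals the true \(\beta_2\) plus the omitted variable's effect times its regression relationship with \(x_2\).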
15.2 The Frisch-Waugh-Lovell Theorem
How does OLS actually “hold other variables constant”? The Frisch-Waugh-Lovell (FWL) theorem provides a precise answer. To isolate \(b_3\) (the advertising coefficient) in a model with both price and advertising, FWL says: (1) regress SALES on PRICE and save the residuals \(\widetilde{\text{SALES}}\); (2) regress ADVERT on PRICE and save the residuals \(\widetilde{\text{ADVERT}}\); (3) regress \(\widetilde{\text{SALES}}\) on \(\widetilde{\text{ADVERT}}\). The slope from step 3 is exactly \(b_3\) from the full regression.
FWL in one sentence: OLS “holds PRICE constant” by stripping out what PRICE explains from both sides, then measuring the leftover association.
The residuals from steps 1 and 2 represent the parts of SALES and ADVERT that cannot be predicted by PRICE. Step 3 then asks: once we strip out everything that PRICE explains about both variables, does the leftover variation in advertising still predict the leftover variation in sales?
Warning: FWL gives correct coefficients but wrong standard errors
The partialled-out regression does not account for parameters estimated in the earlier steps, so its standard errors are too small. Always use the full model for inference.
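The three-step recipe can be verified numerically. A minimal NumPy sketch, using simulated sales data (hypothetical numbers, not the textbook's dataset):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Simulated data: sales depend on both price and advertising
price = rng.uniform(4, 7, size=n)
advert = 0.3 * price + rng.uniform(0, 3, size=n)
sales = 100 - 8 * price + 2 * advert + rng.normal(0, 4, size=n)

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)

# Full regression: sales on intercept, price, advert
b_full = ols(np.column_stack([ones, price, advert]), sales)

# FWL steps 1-2: residualize sales and advert on price (with intercept)
Z = np.column_stack([ones, price])
sales_resid = sales - Z @ ols(Z, sales)
advert_resid = advert - Z @ ols(Z, advert)

# Step 3: slope of residualized sales on residualized advert
b3_fwl = (sales_resid @ advert_resid) / (advert_resid @ advert_resid)

print(b_full[2], b3_fwl)  # identical up to floating-point error
```

The two numbers agree to machine precision: partialling out PRICE first and running the full regression are algebraically the same computation.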
Interactive: Ceteris Paribus Visualizer
This visualizer shows wage data colored by experience level. Use the slider to fix experience at different values and see how the education-wage relationship looks when experience is held constant. Observations near the chosen experience level are highlighted; others fade out.
Figure 15.1: Ceteris paribus visualizer. Fixing experience at different levels reveals the partial effect of education on wages. Points near the chosen experience level are highlighted.
As you move the experience slider, the highlighted subset changes and so does the fitted line. The slope of education should remain roughly stable if the model is correctly specified; this is the partial effect.
15.3 Assumptions MR1 through MR6
The assumptions generalize directly from simple regression. Five carry over unchanged; only one is genuinely new.
MR1 fails: Wrong functional form (e.g., linear when the true relationship is quadratic) or omitted relevant variables. Residual plots show systematic patterns.
MR2 fails: Endogeneity. Coefficients are biased; no fix within OLS.
MR3 fails: Heteroskedasticity. Coefficients are unbiased, but standard errors and \(t\)-tests are wrong.
MR4 fails: Autocorrelation. Same consequences as MR3.
MR5 fails: Perfect multicollinearity. Software throws an error or drops a variable.
MR6 fails: Non-normal errors. In large samples (\(N > 30\)), the CLT covers you. In small samples, exact \(t\)- and \(F\)-distributions are not valid.
Under MR1 through MR5, the Gauss-Markov theorem guarantees OLS is Best Linear Unbiased (BLUE). Adding MR6 gives exact \(t\)- and \(F\)-distributions in finite samples. Without MR6, the Central Limit Theorem still provides approximate normality in large samples.
flowchart TD
A["MR1: Correct specification"] --> B["MR2: Exogeneity"]
B --> C["MR1-MR2: OLS is unbiased"]
C --> D["MR3: Homoskedasticity<br/>MR4: No autocorrelation<br/>MR5: No perfect collinearity"]
D --> E["MR1-MR5: OLS is BLUE<br/>(Gauss-Markov)"]
E --> F["MR6: Normal errors"]
F --> G["MR1-MR6: Exact t and F<br/>distributions"]
style C fill:#2E8B57,color:#fff
style E fill:#1E5A96,color:#fff
style G fill:#D4A84B,color:#fff
Figure 15.2: Hierarchy of MR assumptions. Each level builds on the previous one, adding stronger guarantees.
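A Monte Carlo sketch makes the unbiasedness claim concrete. Under MR1 through MR5 (correct specification, exogenous regressors, homoskedastic and independent errors, no perfect collinearity), the OLS estimates should average out to the true coefficients across repeated samples. The parameter values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 50, 2000
beta = np.array([1.0, 2.0, -1.5])  # true coefficients (intercept, b2, b3)

# Fixed regressors across replications; only the errors are redrawn
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x2, x3])

estimates = np.empty((reps, 3))
for r in range(reps):
    e = rng.normal(size=n)  # MR1-MR5 hold: mean-zero, homoskedastic, independent
    y = X @ beta + e
    estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0]

print(estimates.mean(axis=0))  # averages close to (1.0, 2.0, -1.5)
```

Any single sample's estimates miss the truth, but the average over replications does not; that is what unbiasedness means.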
15.4 Perfect vs. Near Multicollinearity
MR5 requires that no regressor is an exact linear function of the others. Classic violations include including both \(\text{age}\) and \(\text{birth\_year}\) (since one is a deterministic function of the other), including all \(g\) category dummies plus an intercept (the dummy variable trap), or including budget shares that sum to 1. When MR5 fails, OLS cannot be computed: the normal equations have no unique solution.
Warning: Perfect vs. near collinearity: different problems entirely
Perfect collinearity violates MR5 and is a showstopper; OLS cannot run. Near collinearity is a nuisance; OLS runs and remains BLUE, but standard errors inflate. Do not confuse the two.
Near multicollinearity is different. Regressors are highly correlated but not perfectly so. MR5 is not violated, OLS is still BLUE, and the coefficients are still unbiased. The problem is purely one of precision. In the three-variable model \(y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + e\), the variance of \(b_2\) is

\[
\operatorname{var}(b_2) = \frac{\sigma^2}{(1 - r_{23}^2)\sum_{i=1}^{N}(x_{i2} - \bar{x}_2)^2},
\]

where \(r_{23}\) is the sample correlation between \(x_2\) and \(x_3\).
As \(|r_{23}| \to 1\), the factor \((1 - r_{23}^2)\) shrinks toward zero and the variance explodes. At \(r_{23} = 0.9\), the variance is 5.3 times what it would be with uncorrelated regressors. At \(r_{23} = 0.99\), it is 50 times larger.
Collinearity is a data problem, not a model problem. The model is fine; the data do not contain enough independent variation to pin down individual coefficients precisely.
\(\implies\) Near collinearity does not bias OLS, but it inflates standard errors and makes individual coefficients hard to pin down. We cover diagnosis (via the Variance Inflation Factor) and remedies in Model Specification.
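The inflation factors quoted above follow directly from the variance formula: relative to uncorrelated regressors, the variance is multiplied by \(1/(1 - r_{23}^2)\). A quick check:

```python
# Variance of b2 relative to the uncorrelated-regressor case: 1 / (1 - r23^2)
for r23 in (0.0, 0.5, 0.9, 0.99):
    inflation = 1 / (1 - r23**2)
    print(f"r23 = {r23:4.2f}  ->  variance multiplied by {inflation:.1f}")
```

At \(r_{23} = 0.9\) the multiplier is about 5.3; at \(r_{23} = 0.99\) it is about 50, matching the figures in the text. This multiplier is the Variance Inflation Factor discussed in Model Specification.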
15.5 Practice
A researcher estimates \(\text{wage}_i = \beta_1 + \beta_2 \text{educ}_i + \beta_3 \text{exper}_i + \beta_4 \text{female}_i + e_i\) using cross-sectional data. Which MR assumption is most likely violated, and why?
Tip: Solution
MR2 (strict exogeneity) is the hardest to defend. The error term \(e_i\) contains unobserved factors like ability and motivation. If ability is correlated with education (more able people get more schooling), then \(\text{Cov}(\text{educ}_i, e_i) \neq 0\), and the education coefficient absorbs part of the ability effect. This is omitted variable bias. MR3 might also be suspect (wage variability could differ by education level), but MR2 is the most consequential violation because it causes bias, not just inefficiency.