14  The Multiple Regression Model

What Happens When One Regressor Isn’t Enough

Multiple Regression
OVB
Goodness of Fit
Author

Jake Anderson

Published

March 21, 2026

Modified

March 26, 2026

Abstract

Simple regression cannot separate the effect of education from the confounding influence of experience. Multiple regression solves this by including several explanatory variables, yielding partial effects that hold other variables constant. This chapter introduces the model, derives the omitted variable bias formula, and explains when to use adjusted \(R^2\).

14.1 Motivation: The Missing Variable Problem

A simple regression of \(\log(\text{wage})\) on education gives a slope of 0.076: each year of education is associated with about a 7.6% wage increase. But more educated workers tend to have less labor market experience (they spent those years in school), and experience independently raises wages. The simple regression estimate mixes the true return to education with the confounding influence of experience.

Adding experience changes the picture: the coefficient on education rises from 0.076 to 0.092. The simple estimate was biased downward because education and experience are negatively correlated.

Imagine two workers with the same experience but different education levels. The wage gap between them is purely due to education. That is what we want \(\beta_2\) to measure: the effect of education holding experience constant. Simple regression cannot hold anything constant because it has only one regressor.

\(\implies\) We need a model with multiple regressors.

14.2 The Multiple Regression Model

Definition 14.1 (Multiple Regression Model) The general model with \(K\) parameters (\(K - 1\) regressors plus an intercept):

\[ y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \cdots + \beta_K x_{iK} + e_i \tag{14.1}\]

OLS still minimizes \(\sum \hat{e}_i^2\), but now the “normal equations” form a system of \(K\) equations in \(K\) unknowns. Software handles the algebra; what changes for us is the interpretation and the degrees of freedom (\(N - K\) instead of \(N - 2\)).

Each coefficient \(\beta_k\) is a partial effect: the change in \(E(y)\) when \(x_k\) increases by one unit, holding all other \(x\)’s constant. In simple regression, \(\beta_2\) captures the total association between \(x\) and \(y\). In multiple regression, it isolates the partial contribution of \(x_k\) after accounting for the other regressors. When interpreting a coefficient, always include the qualifier “holding \(x_3, x_4, \ldots\) constant.”

Partial effect: the change in \(E(y)\) per unit change in \(x_k\), holding all other regressors fixed. This is the defining feature of multiple regression.
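To make "holding constant" concrete, here is a minimal sketch in plain JavaScript (not the chapter's Observable code) on made-up noiseless data: the total association from a simple regression differs from the partial effect, which can be recovered by the residual-on-residual (Frisch–Waugh–Lovell) route also used later in this chapter's simulator.

```javascript
// Made-up noiseless data: y = 1 + 2*x2 + 3*x3, with x2 and x3 correlated.
const x2 = [1, 2, 3, 4, 5, 6];
const bump = [0, 1, -1, 2, 0, -2];               // keeps x3 from being collinear with x2
const x3 = x2.map((v, i) => 0.5 * v + bump[i]);
const y  = x2.map((v, i) => 1 + 2 * v + 3 * x3[i]);

// Bivariate OLS: regress ys on xs.
function ols(xs, ys) {
  const n = xs.length;
  const xbar = xs.reduce((a, b) => a + b) / n;
  const ybar = ys.reduce((a, b) => a + b) / n;
  let sxx = 0, sxy = 0;
  for (let i = 0; i < n; i++) {
    sxx += (xs[i] - xbar) ** 2;
    sxy += (xs[i] - xbar) * (ys[i] - ybar);
  }
  return { slope: sxy / sxx, intercept: ybar - (sxy / sxx) * xbar };
}

// Total association: regress y on x2 alone (picks up x3's influence too).
const total = ols(x2, y).slope;

// Partial effect: residualize y and x2 on x3, then regress residual on residual.
const mY = ols(x3, y), mX = ols(x3, x2);
const yRes  = y.map((v, i) => v - (mY.intercept + mY.slope * x3[i]));
const x2Res = x2.map((v, i) => v - (mX.intercept + mX.slope * x3[i]));
const partial = ols(x2Res, yRes).slope;

console.log(total.toFixed(3), partial.toFixed(3)); // partial recovers the true 2; total does not
```

With noiseless data and non-collinear regressors, the partial slope is exactly the true coefficient 2, while the total slope absorbs part of x3's effect.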

14.3 Omitted Variable Bias

Why does the education coefficient change from 0.076 (simple regression) to 0.092 (multiple regression) when we add experience? The simple regression omits a relevant variable that is correlated with the included regressor. The Omitted Variable Bias (OVB) formula makes this precise.

If the true model is \(y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + e\) but we estimate the short regression \(y = \gamma_1 + \gamma_2 x_2 + u\), then:

Theorem 14.1 (Omitted Variable Bias Formula) \[ \hat{\gamma}_2 = \beta_2 + \beta_3 \times \delta_1 \tag{14.2}\]

where \(\delta_1\) is the slope from regressing \(x_3\) on \(x_2\). The bias is \(\beta_3 \times \delta_1\).

The bias vanishes only when \(\beta_3 = 0\) (the omitted variable does not affect \(y\)) or \(\delta_1 = 0\) (the omitted variable is uncorrelated with the included regressor).

For the wage equation: \(\beta_3 > 0\) (experience raises wages) and \(\delta_1 < 0\) (education and experience are negatively correlated). The product is negative, so the simple regression estimate understates the true return to education: \(0.076 < 0.092\).
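Theorem 14.1 is not just an asymptotic statement: for OLS estimates it holds as an exact in-sample identity. A quick check in plain JavaScript on made-up wage-style numbers (all coefficients and data are illustrative):

```javascript
// Made-up data: x2 = education, x3 = experience (negatively related), plus noise.
const x2 = [8, 10, 12, 12, 14, 16, 16, 18];
const x3 = [20, 15, 14, 10, 8, 6, 9, 2];
const e  = [0.3, -0.2, 0.1, 0.4, -0.5, 0.2, -0.1, 0.0];
const y  = x2.map((v, i) => 5 + 0.9 * v + 0.3 * x3[i] + e[i]);

// Bivariate OLS: regress ys on xs.
function ols(xs, ys) {
  const n = xs.length;
  const xbar = xs.reduce((a, b) => a + b) / n;
  const ybar = ys.reduce((a, b) => a + b) / n;
  let sxx = 0, sxy = 0;
  for (let i = 0; i < n; i++) {
    sxx += (xs[i] - xbar) ** 2;
    sxy += (xs[i] - xbar) * (ys[i] - ybar);
  }
  return { slope: sxy / sxx, intercept: ybar - (sxy / sxx) * xbar };
}

// Long-regression coefficient on `target`, via residual-on-residual regression.
function longCoef(target, other, ys) {
  const mY = ols(other, ys), mX = ols(other, target);
  const yr = ys.map((v, i) => v - (mY.intercept + mY.slope * other[i]));
  const xr = target.map((v, i) => v - (mX.intercept + mX.slope * other[i]));
  return ols(xr, yr).slope;
}

const b2 = longCoef(x2, x3, y);     // beta2-hat: education, holding experience fixed
const b3 = longCoef(x3, x2, y);     // beta3-hat: experience, holding education fixed
const gamma2 = ols(x2, y).slope;    // gamma2-hat: short regression, experience omitted
const delta1 = ols(x2, x3).slope;   // delta1-hat: auxiliary regression of x3 on x2

console.log(gamma2 - (b2 + b3 * delta1)); // ~0: the identity holds exactly in-sample
```

Because \(\hat{\delta}_1 < 0\) here while \(\hat{\beta}_3 > 0\), the short-regression slope sits below \(\hat{\beta}_2\), mirroring the education example.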

flowchart LR
    X["x<br/>(Education)"] -->|"β₂ (partial effect)"| Y["y<br/>(Wage)"]
    Z["z (omitted)<br/>(Experience)"] -->|"β₃"| Y
    Z -->|"δ₁ = Cov(x,z)/Var(x)"| X

    style X fill:#1E5A96,color:#fff
    style Y fill:#2E8B57,color:#fff
    style Z fill:#C41E3A,color:#fff
Figure 14.1: How omitted variable bias works. The omitted variable z affects y (through β₃) and is correlated with x (through δ₁). Both links must be nonzero for bias to exist.

Warning: OVB does not vanish with more data

The bias formula holds in the population, not just in small samples. As \(N \to \infty\), the estimator converges to the wrong value. The model is wrong, and no amount of data corrects a wrong model.

Interactive: OVB Simulator

Toggle “Include experience” to see how omitting a correlated variable biases the education coefficient. Adjust the correlation between education and the omitted variable to see how bias depends on \(\delta_1\).

viewof includeExper = Inputs.toggle({label: "Include experience", value: false})
viewof corrEdOmit = Inputs.range([-0.9, 0.9], {value: -0.5, step: 0.05, label: "Corr(educ, exper)"})

ovb_sim = {
  const N = 500;
  const rng = d3.randomLcg(77);
  const rnorm = d3.randomNormal.source(rng)(0, 1);

  const beta1 = 5, beta2_true = 0.9, beta3 = 0.3;
  const rho = corrEdOmit;

  // Generate correlated educ and exper
  const educ = Array.from({length: N}, () => 14 + rnorm() * 3); // mean 14, sd 3, matching the (e - 14)/3 standardization below
  const exper = educ.map(e => rho * (e - 14)/3 * 5 + Math.sqrt(1 - rho*rho) * rnorm() * 5 + 15);
  const wage = educ.map((e, i) => beta1 + beta2_true * e + beta3 * exper[i] + rnorm() * 3);

  // OLS helper
  function ols2(xs, ys) {
    const xbar = d3.mean(xs), ybar = d3.mean(ys);
    const Sxx = d3.sum(xs.map(x => (x-xbar)**2));
    const Sxy = d3.sum(xs.map((x,j) => (x-xbar)*(ys[j]-ybar)));
    return {slope: Sxy/Sxx, intercept: ybar - (Sxy/Sxx)*xbar};
  }

  // Simple regression (omit exper)
  const simple = ols2(educ, wage);

  // Multiple regression (include exper) using partial regression
  // Regress wage on exper, get residuals
  const mExper = ols2(exper, wage);
  const wageResid = wage.map((y,i) => y - (mExper.intercept + mExper.slope * exper[i]));
  // Regress educ on exper, get residuals
  const mEdExp = ols2(exper, educ);
  const educResid = educ.map((e,i) => e - (mEdExp.intercept + mEdExp.slope * exper[i]));
  // Regress residuals
  const fwl = ols2(educResid, wageResid);

  const beta2_hat = includeExper ? fwl.slope : simple.slope;
  const bias = simple.slope - beta2_true;

  // Auxiliary regression for delta_1
  const auxReg = ols2(educ, exper);
  const delta1 = auxReg.slope;
  const predictedBias = beta3 * delta1;

  return {
    beta2_hat,
    beta2_true,
    beta2_simple: simple.slope,
    beta2_mr: fwl.slope,
    bias,
    predictedBias,
    delta1,
    includeExper
  };
}

html`<div style="display:flex; gap:2em; flex-wrap:wrap; align-items:start">
<div style="flex:1; min-width:250px">
  <h4>Coefficient on Education</h4>
  ${Plot.plot({
    width: 350, height: 200,
    x: {label: "β̂₂", domain: [0, 1.8]},
    y: {label: "", domain: [0, 2], ticks: []},
    marks: [
      Plot.ruleX([ovb_sim.beta2_true], {stroke: "#2E8B57", strokeWidth: 2, strokeDasharray: "6,4"}),
      Plot.ruleX([ovb_sim.beta2_hat], {stroke: "#1E5A96", strokeWidth: 3}),
      Plot.text([{x: ovb_sim.beta2_true, y: 1.7}], {x: "x", y: "y", text: d => `True: ${ovb_sim.beta2_true}`, fill: "#2E8B57", fontSize: 12}),
      Plot.text([{x: ovb_sim.beta2_hat, y: 1.3}], {x: "x", y: "y", text: d => `Est: ${ovb_sim.beta2_hat.toFixed(3)}`, fill: "#1E5A96", fontSize: 12})
    ]
  })}
</div>
<div style="flex:1; min-width:250px; padding:1em; background:#f8f8f8; border-radius:8px">
  <strong>True β₂:</strong> ${ovb_sim.beta2_true.toFixed(3)}<br/>
  <strong>Simple regression β̂₂:</strong> ${ovb_sim.beta2_simple.toFixed(3)}<br/>
  <strong>Multiple regression β̂₂:</strong> ${ovb_sim.beta2_mr.toFixed(3)}<br/>
  <strong>Bias (simple):</strong> ${ovb_sim.bias.toFixed(3)}<br/>
  <strong>δ₁ (aux. regression slope):</strong> ${ovb_sim.delta1.toFixed(3)}<br/>
  <strong>β₃ × δ₁ (predicted bias):</strong> ${ovb_sim.predictedBias.toFixed(3)}<br/>
  <strong>Model:</strong> ${ovb_sim.includeExper ? "Multiple regression (experience included)" : "Simple regression (experience omitted)"}
</div>
</div>`
Figure 14.2: Omitted variable bias simulator. When experience is omitted and correlated with education, the education coefficient absorbs part of the experience effect.

Try setting the correlation to zero. With \(\delta_1 = 0\), the omitted variable is uncorrelated with education, so OVB disappears even though the omitted variable affects wages.

14.4 \(R^2\) vs. Adjusted \(R^2\)

The coefficient of determination has the same formula as in simple regression: \(R^2 = 1 - SSE/SST\). But adding any variable to a regression can never decrease \(R^2\), even if the variable is completely irrelevant. OLS has more flexibility with an extra regressor, so \(SSE\) can only stay the same or shrink.

Definition 14.2 (Adjusted \(R^2\)) The adjusted \(R^2\) penalizes for additional regressors:

\[ \bar{R}^2 = 1 - \frac{SSE/(N-K)}{SST/(N-1)} = 1 - \frac{N-1}{N-K}(1 - R^2) \tag{14.3}\]

Since \((N-1)/(N-K) > 1\) when \(K > 1\), we always have \(\bar{R}^2 \leq R^2\). Adding a useful variable causes \(SSE\) to drop enough to offset the penalty, so \(\bar{R}^2\) increases. Adding a useless variable barely changes \(SSE\), but the penalty grows, so \(\bar{R}^2\) decreases.

\(\implies\) Use \(\bar{R}^2\) when comparing models with different numbers of regressors.
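Equation 14.3 in code, with made-up numbers: adding a regressor that nudges \(R^2\) from 0.300 to only 0.301 fails to offset the degrees-of-freedom penalty, so \(\bar{R}^2\) falls.

```javascript
// Adjusted R-squared from R-squared, sample size N, and parameter count K (Equation 14.3).
function adjR2(r2, N, K) {
  return 1 - ((N - 1) / (N - K)) * (1 - r2);
}

// Made-up scenario: a nearly useless regressor raises R-squared by only 0.001.
const before = adjR2(0.300, 100, 5); // model with K = 5 parameters
const after  = adjR2(0.301, 100, 6); // one extra, nearly useless regressor

console.log(before.toFixed(4), after.toFixed(4)); // 0.2705 vs 0.2638: adjusted R-squared drops
```

Note that both values sit below their respective \(R^2\)'s, as the formula guarantees whenever \(K > 1\).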

Caution: When to use which measure

Use \(R^2\) to describe fit of a single model. Use \(\bar{R}^2\) when comparing models with different numbers of regressors. Use the generalized \(R^2 = [\text{Corr}(y, \hat{y})]^2\) when comparing models with different dependent variables (e.g., \(y\) vs. \(\ln y\)).
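For a single linear model fit by OLS with an intercept, the generalized measure coincides with the ordinary \(R^2\), which is what makes it a safe common yardstick. A quick illustration on made-up data:

```javascript
// Made-up data with a roughly linear trend.
const x = [1, 2, 3, 4, 5, 6, 7];
const y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2];

// Bivariate OLS: regress ys on xs.
function ols(xs, ys) {
  const n = xs.length;
  const xbar = xs.reduce((a, b) => a + b) / n;
  const ybar = ys.reduce((a, b) => a + b) / n;
  let sxx = 0, sxy = 0;
  for (let i = 0; i < n; i++) {
    sxx += (xs[i] - xbar) ** 2;
    sxy += (xs[i] - xbar) * (ys[i] - ybar);
  }
  return { slope: sxy / sxx, intercept: ybar - (sxy / sxx) * xbar };
}

const m = ols(x, y);
const yhat = x.map(v => m.intercept + m.slope * v);

const mean = a => a.reduce((s, v) => s + v, 0) / a.length;
function corr(a, b) {
  const abar = mean(a), bbar = mean(b);
  let sab = 0, saa = 0, sbb = 0;
  for (let i = 0; i < a.length; i++) {
    sab += (a[i] - abar) * (b[i] - bbar);
    saa += (a[i] - abar) ** 2;
    sbb += (b[i] - bbar) ** 2;
  }
  return sab / Math.sqrt(saa * sbb);
}

const sse = y.reduce((s, v, i) => s + (v - yhat[i]) ** 2, 0);
const sst = y.reduce((s, v) => s + (v - mean(y)) ** 2, 0);
const r2 = 1 - sse / sst;
const genR2 = corr(y, yhat) ** 2;

console.log(r2, genR2); // identical up to floating-point error
```

The two measures diverge only when the fitted values do not come from OLS on the same \(y\), e.g., when \(\hat{y}\) is back-transformed from a model for \(\ln y\).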

14.5 Practice

A researcher estimates \(\text{crime}_i = \beta_1 + \beta_2 \text{police}_i + e_i\) and finds \(\hat{\beta}_2 > 0\): more police is associated with more crime. A colleague argues this is because the researcher omitted population density, which increases both crime and police presence. Sign the omitted variable bias.

Let the omitted variable be population density. Then: \(\beta_3 > 0\) (higher density increases crime) and \(\delta_1 > 0\) (denser areas hire more police). The bias is \((+)(+) = (+)\), upward. The simple regression overestimates the effect of police on crime. After controlling for density, the coefficient on police should be smaller (and potentially negative, consistent with a deterrent effect).
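The sign logic can be verified with arithmetic. A deterministic sketch with made-up coefficients (density and police are deliberately tied by an exact rule, so only the short regression is run):

```javascript
// Made-up structural story: density raises crime (beta3 = 0.8) and police staffing,
// while the true effect of police on crime is a deterrent (beta2 = -0.3).
const density = [1, 2, 3, 4, 5, 6, 7, 8];
const police  = density.map(d => 2 + 0.5 * d); // denser areas hire more police
const crime   = density.map((d, i) => 1 - 0.3 * police[i] + 0.8 * d);

// Bivariate OLS: regress ys on xs.
function ols(xs, ys) {
  const n = xs.length;
  const xbar = xs.reduce((a, b) => a + b) / n;
  const ybar = ys.reduce((a, b) => a + b) / n;
  let sxx = 0, sxy = 0;
  for (let i = 0; i < n; i++) {
    sxx += (xs[i] - xbar) ** 2;
    sxy += (xs[i] - xbar) * (ys[i] - ybar);
  }
  return { slope: sxy / sxx, intercept: ybar - (sxy / sxx) * xbar };
}

const short_ = ols(police, crime).slope;    // crime on police, density omitted
const delta1 = ols(police, density).slope;  // density on police: exactly 2, since d = 2*police - 4

console.log(short_, delta1); // slope = beta2 + beta3*delta1 = -0.3 + 0.8*2 = 1.3: the bias flips the sign
```

A true deterrent effect of \(-0.3\) shows up as \(+1.3\) in the short regression: the \((+)(+)\) bias of \(1.6\) swamps it.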
