How Precise Are Our Estimates, and What Can We Predict?
Regression
Standard Errors
Prediction
Author
Jake Anderson
Published
March 21, 2026
Modified
March 26, 2026
Abstract
The variance formula for the OLS slope contains the unknown error variance sigma-squared. This chapter shows how to estimate it from the residuals (dividing by N minus 2 to correct for degrees of freedom), construct standard errors, make point predictions, and compute elasticities that convert the slope into percentage terms.
NotePrerequisites
You should know the OLS formulas and variance result from Chapter 6 and Chapter 7.
9.1 The Problem
We derived \(\operatorname{Var}(b_2) = \sigma^2 / \sum(x_i - \bar{x})^2\) in Theorem 8.2, but \(\sigma^2\) is unknown. We cannot compute the variance of \(b_2\) (or construct confidence intervals, or test hypotheses) without it. We need to estimate \(\sigma^2\) from the data.
9.2 Estimating \(\sigma^2\)
Recall \(\sigma^2 = \operatorname{Var}(e_i) = E(e_i^2)\) (since \(E(e_i) = 0\) by SR2). A natural estimator would be \(\sum e_i^2 / N\), but we never observe the true errors \(e_i\). We observe the residuals \(\hat{e}_i = y_i - b_1 - b_2 x_i\), which are not the same thing.
Degrees of freedom: You lose one degree of freedom for each parameter estimated. In simple linear regression, you estimate \(b_1\) and \(b_2\), so \(df = N - 2\).
Why not divide \(\sum \hat{e}_i^2\) by \(N\)? Because OLS chooses \(b_1\) and \(b_2\) to minimize \(\sum \hat{e}_i^2\). Any other coefficients, including the true \(\beta_1\) and \(\beta_2\), would produce a larger sum. The squared residuals underestimate the true squared errors on average, so dividing by \(N\) gives an estimator biased downward.
The correction: divide by \(N - 2\) instead of \(N\). We estimated two parameters (\(b_1\) and \(b_2\)), which “uses up” 2 degrees of freedom. OLS forces two constraints on the residuals (\(\sum \hat{e}_i = 0\) and \(\sum x_i \hat{e}_i = 0\)), so only \(N - 2\) residuals are free to vary.
The corrected estimator is
\[
\hat{\sigma}^2 = \frac{\sum \hat{e}_i^2}{N - 2},
\]
and it is unbiased: \(E(\hat{\sigma}^2) = \sigma^2\).
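The bias from dividing by \(N\) instead of \(N - 2\) can be seen directly in simulation. The sketch below uses illustrative parameters (not the food expenditure data): it repeatedly generates samples from a known line, fits OLS, and averages both versions of the estimator.

```python
import random
import statistics

def sigma2_estimates(x, y):
    """Fit simple OLS and return (SSE/(n-2), SSE/n)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = ybar - b2 * xbar
    sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    return sse / (n - 2), sse / n

random.seed(42)
sigma = 89.52                       # illustrative true error s.d.
x = [10 + 0.5 * i for i in range(40)]  # fixed regressor values, N = 40

unbiased, biased = [], []
for _ in range(2000):
    y = [83.42 + 10.21 * xi + random.gauss(0, sigma) for xi in x]
    u, b = sigma2_estimates(x, y)
    unbiased.append(u)
    biased.append(b)

# The N-2 version averages close to sigma^2; the N version sits below it.
print(statistics.mean(unbiased))
print(statistics.mean(biased))
```

Since the divide-by-\(N\) estimator is exactly \((N-2)/N\) times the unbiased one, its average falls about 5% short here.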
The quantity \(\hat{\sigma} = \sqrt{\hat{\sigma}^2}\) is the standard error of the regression, which estimates the typical deviation of actual food spending from the regression line.
For the food expenditure data: \(\hat{\sigma}^2 = 304{,}505.2 / 38 = 8{,}013.29\), so \(\hat{\sigma} = 89.52\). The typical deviation of actual food spending from the fitted line is about $89.52.
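The arithmetic for the food expenditure data is a one-liner, using the sum of squared residuals and \(N - 2 = 38\) degrees of freedom reported above:

```python
import math

sse = 304_505.2   # sum of squared residuals for the food expenditure data
n = 40            # sample size, so N - 2 = 38 degrees of freedom

sigma2_hat = sse / (n - 2)       # estimated error variance
sigma_hat = math.sqrt(sigma2_hat)  # standard error of the regression

print(round(sigma2_hat, 2))  # 8013.29
print(round(sigma_hat, 2))   # 89.52
```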
9.3 Standard Errors of \(b_2\)
Now replace \(\sigma^2\) with \(\hat{\sigma}^2\) in the variance formula:
\[
\widehat{\operatorname{Var}}(b_2) = \frac{\hat{\sigma}^2}{\sum(x_i - \bar{x})^2},
\qquad
\operatorname{se}(b_2) = \frac{\hat{\sigma}}{\sqrt{\sum(x_i - \bar{x})^2}}.
\]
Two pieces, each with a clear interpretation. The numerator \(\hat{\sigma}\) captures how noisy the data are around the regression line; the denominator \(\sqrt{\sum(x_i - \bar{x})^2}\) captures how spread out the \(x\)-values are. More noise increases the standard error; more variation in \(x\) decreases it; larger \(N\) increases \(\sum(x_i - \bar{x})^2\) and therefore also decreases the standard error.
For the food expenditure data: \(\widehat{\operatorname{Var}}(b_2) = 8{,}013.29 / 1{,}828.79 = 4.382\), so \(\operatorname{se}(b_2) = 2.093\). The regression is reported as:
\[
\hat{y} = \underset{(43.41)}{83.42} + \underset{(2.093)}{10.21}\, x
\]
The numbers in parentheses below the coefficients are standard errors; this is standard notation in econometrics. The standard error tells us: if we repeated this study many times, the estimates of \(b_2\) would have a standard deviation of approximately 2.09.
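Plugging in the food expenditure values gives the standard error directly:

```python
import math

sigma2_hat = 8_013.29   # estimated error variance from Section 9.2
sxx = 1_828.79          # sum of squared deviations of x from its mean

var_b2 = sigma2_hat / sxx     # estimated variance of the slope
se_b2 = math.sqrt(var_b2)     # standard error of b2

print(round(var_b2, 3))  # 4.382
print(round(se_b2, 3))   # 2.093
```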
Interactive: standard error anatomy
Adjust the three factors that determine \(\operatorname{se}(b_2)\) and watch how the standard error responds. The bar chart shows which factor is most responsible for the current value.
Show code
viewof se_sigma = Inputs.range([10, 200], {value: 90, step: 5, label: "σ̂ (noise level)"})

viewof se_xspread = Inputs.range([5, 100], {value: 43, step: 1, label: "√Σ(xᵢ−x̄)² (x-spread)"})

viewof se_N = Inputs.range([10, 500], {value: 40, step: 10, label: "N (sample size)"})

se_val = se_sigma / se_xspread

Plot.plot({
  width: 640,
  height: 280,
  x: {label: ""},
  y: {label: "Value", domain: [0, Math.max(3, se_val + 1)]},
  marks: [
    Plot.barY(
      [
        {label: `σ̂ / 100 = ${(se_sigma / 100).toFixed(2)}`, value: se_sigma / 100, color: "#C41E3A"},
        {label: `10 / √Σ(x−x̄)² = ${(10 / se_xspread).toFixed(3)}`, value: 10 / se_xspread, color: "#1E5A96"},
        {label: `se(b₂) = ${se_val.toFixed(3)}`, value: se_val, color: "#2E8B57"}
      ],
      {x: "label", y: "value", fill: "color"}
    ),
    Plot.ruleY([0])
  ],
  color: {legend: false},
  caption: `se(b₂) = σ̂ / √Σ(xᵢ−x̄)² = ${se_sigma} / ${se_xspread} = ${se_val.toFixed(3)}. Increase N to grow Σ(xᵢ−x̄)² and shrink the standard error.`
})
Figure 9.1: Standard error anatomy: three factors control the precision of b₂. Adjust each to see its impact.
In practice: You cannot control \(\sigma\) (it is a property of the population). You can control \(N\) and the range of \(x\)-values you sample. Larger \(N\) and wider \(x\)-spread both improve precision.
flowchart LR
A["σ² (error variance)<br/>Numerator"] --> D["se(b₂) = σ̂ / √Σ(xᵢ−x̄)²"]
B["Σ(xᵢ−x̄)²<br/>(x-spread)"] --> D
C["N (sample size)"] --> B
D --> E["Confidence intervals<br/>Hypothesis tests"]
style A fill:#C41E3A,color:#fff
style B fill:#1E5A96,color:#fff
style C fill:#1E5A96,color:#fff
style D fill:#D4A84B,color:#fff
style E fill:#2E8B57,color:#fff
Figure 9.2: How the three components flow into the standard error formula.
9.4 Point Prediction
A household earns $2,000 per week (\(x_0 = 20\)). The point prediction is \(\hat{y}_0 = b_1 + b_2 x_0 = 83.42 + 10.21 \times 20 = 287.62\). We predict this household spends approximately $287.62 per week on food.
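The point prediction is a direct plug-in of \(x_0\) into the fitted line:

```python
b1, b2 = 83.42, 10.21   # fitted intercept and slope
x0 = 20                 # income of $2,000/week (x is measured in $100s)

y0_hat = b1 + b2 * x0   # point prediction of weekly food expenditure

print(round(y0_hat, 2))  # 287.62
```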
We estimated \(b_1\) and \(b_2\) from a sample of 40 households, and a different sample would give a different line. This source of prediction error shrinks with more data.
Even if we knew the true line perfectly, any individual household’s spending differs from \(E(y \mid x)\) by the error \(e_i\). This source of prediction error does not shrink with more data.
WarningPrediction is always less precise than estimation
Predicting a single household’s spending involves both estimation uncertainty and inherent variability. Estimating the average spending at a given income only involves estimation uncertainty. \(\implies\) Prediction intervals are always wider than confidence intervals.
9.5 Elasticity at the Means
The slope \(b_2 = 10.21\) is in dollar terms. Economists often prefer percentage terms: the elasticity.
Evaluated at the sample means, \(\hat{\varepsilon} = b_2 \, \bar{x} / \bar{y} \approx 0.71\): a 1% increase in income leads to approximately a 0.71% increase in food expenditure (at the mean income level). Since \(\hat{\varepsilon} < 1\), food is a necessity: spending grows, but slower than income. In a linear model, elasticity varies along the regression line (it depends on where you evaluate it), so always specify the evaluation point.
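The elasticity-at-the-means calculation can be sketched as follows. The sample means below are illustrative assumptions (the chapter does not report them), chosen to be consistent with the stated elasticity of about 0.71:

```python
b2 = 10.21      # fitted slope (dollars of food per $100 of income)
xbar = 19.60    # assumed sample mean income in $100s (illustrative)
ybar = 283.57   # assumed sample mean food expenditure (illustrative)

# Elasticity at the means: percentage change in y per 1% change in x
elasticity = b2 * xbar / ybar

print(round(elasticity, 2))  # 0.71
```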
CautionElasticity is not the slope
A common mistake: reporting \(b_2 = 10.21\) as “the elasticity.” The slope is constant in a linear model ($10.21 more food per $100 more income); the elasticity is not (it changes as \(x\) and \(E(y \mid x)\) change). Always specify where you evaluate the elasticity (“at the means,” “at \(x = 25\),” etc.).
Connection: In a log-log model (\(\ln y = \beta_1 + \beta_2 \ln x + e\)), the slope is the elasticity, constant along the line. This is one reason economists use log transformations.
9.6 Practice
A researcher reports \(\hat{y} = \underset{(12.5)}{50.0} + \underset{(0.8)}{3.2}\, x\) with \(N = 100\). (a) How many standard errors is \(b_2\) from zero? (b) Without formal testing, does the slope appear statistically significant?
TipShow Solution
(a) \(t = 3.2 / 0.8 = 4.0\); the estimate is 4 standard errors from zero. (b) With \(N - 2 = 98\) degrees of freedom, the 5% critical value for a two-sided test is approximately 1.98. Since \(4.0 \gg 1.98\), the slope is highly significant. We will formalize this in Chapter 10.
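The arithmetic behind the solution is a single ratio:

```python
b2, se_b2 = 3.2, 0.8      # reported slope and its standard error

t_ratio = b2 / se_b2      # number of standard errors from zero

print(t_ratio)  # 4.0
# With 98 df, the two-sided 5% critical value is about 1.98,
# so the estimate lies far outside it.
```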