10  Confidence Intervals

From Point Estimates to Ranges of Plausible Values

Categories: Inference, Confidence Intervals, t-distribution
Author

Jake Anderson

Published

March 21, 2026

Modified

March 26, 2026

Abstract

A point estimate without a confidence interval is incomplete. This chapter derives the confidence interval for regression coefficients by showing how replacing the unknown sigma with its estimate changes the standard normal to a t-distribution, constructs the interval formula, clarifies the repeated sampling interpretation, and extends the method to linear combinations of parameters.

Note: Prerequisites

You should know the standard error formulas from Chapter 8 and the \(t\)-distribution preview from Chapter 4.

10.1 The Precision Problem

From the food expenditure regression: \(\hat{y} = \underset{(43.41)}{83.42} + \underset{(2.09)}{10.21}\, \text{income}\). We know \(b_2 = 10.21\) with \(\operatorname{se}(b_2) = 2.09\). But what does that tell us? Could the true effect of income on food expenditure be $5? (That is \(5.21 / 2.09 = 2.49\) standard errors away.) Could it be $0? (That is \(10.21 / 2.09 = 4.89\) standard errors away.) The point estimate and the standard error are the ingredients; a confidence interval is the recipe that turns them into a range of plausible values for \(\beta_2\).
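The "standard errors away" arithmetic above is just a ratio:

```python
b2, se_b2 = 10.21, 2.09

# distance of each candidate value from the estimate, in standard-error units
print(round((b2 - 5) / se_b2, 2))   # 2.49: $5 is about 2.5 se away
print(round((b2 - 0) / se_b2, 2))   # 4.89: $0 is nearly 5 se away
```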

Why not just report \(b_2\)? Because a single number without context is meaningless. Is \(b_2 = 10.21\) precise? Noisy? Consistent with zero? The standard error and CI answer these questions.

10.2 From Normal to \(t\)

Under assumptions SR1 through SR6 (including normality of errors), the OLS slope is normally distributed: \(b_2 \sim N(\beta_2, \sigma^2 / \sum(x_i - \bar{x})^2)\). Standardizing:

\[ Z = \frac{b_2 - \beta_2}{\sqrt{\sigma^2 / \sum(x_i - \bar{x})^2}} \sim N(0, 1) \tag{10.1}\]

If we knew \(\sigma^2\), we could use \(P(-1.96 \le Z \le 1.96) = 0.95\) and rearrange to get an interval for \(\beta_2\). But we do not know \(\sigma^2\).

Replacing \(\sigma\) with \(\hat{\sigma}\) in the denominator introduces additional estimation uncertainty. The resulting statistic follows a \(t\)-distribution rather than the standard normal:

Theorem 10.1 (The \(t\)-statistic for OLS Coefficients) \[ t = \frac{b_2 - \beta_2}{\operatorname{se}(b_2)} \sim t_{(N-2)} \tag{10.2}\]

The degrees of freedom are \(N - 2\) because we used \(N\) residuals to estimate \(\hat{\sigma}^2\) but spent 2 degrees of freedom estimating \(b_1\) and \(b_2\).

The \(t\)-distribution has heavier tails than \(N(0,1)\), reflecting our uncertainty about \(\sigma\). As \(N \to \infty\), \(\hat{\sigma}^2 \to \sigma^2\) and \(t_{(N-2)} \to N(0,1)\).

\(\implies\) With small samples, we need wider intervals to compensate for estimating \(\sigma\).

\(t\) vs \(Z\): For \(N > 100\), the difference is negligible (\(t_{(98)} \approx 1.984\) vs \(z = 1.960\) for 95% CI). For small samples, \(t\) critical values are noticeably larger.
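The comparison is easy to verify numerically; a quick sketch with SciPy:

```python
from scipy import stats

z = stats.norm.ppf(0.975)              # 1.960: standard normal, 95% two-sided
for df in (5, 10, 38, 98):
    tc = stats.t.ppf(0.975, df)        # t critical value with df degrees of freedom
    print(f"df = {df:3d}: t_c = {tc:.3f}  (z = {z:.3f})")
# t_c decreases toward z as df grows: 2.571, 2.228, 2.024, 1.984
```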

10.3 The Confidence Interval Formula

Start with \(P(-t_c \le t \le t_c) = 1 - \alpha\) where \(t_c = t_{(1-\alpha/2, \; N-2)}\) is the critical value from the \(t\)-table. Multiply through by \(\operatorname{se}(b_k)\) and rearrange:

Theorem 10.2 (Confidence Interval for \(\beta_k\)) \[ \text{CI for } \beta_k: \qquad b_k \pm t_c \cdot \operatorname{se}(b_k) \tag{10.3}\]

where \(b_k\) is the OLS point estimate (center), \(\operatorname{se}(b_k)\) is the standard error (measures precision), and \(t_c = t_{(1-\alpha/2, \; N-2)}\) is the critical value.

Proof. Start from the probability statement: \[P(-t_c \le t \le t_c) = 1 - \alpha\]

Substitute \(t = (b_k - \beta_k) / \operatorname{se}(b_k)\): \[P\left(-t_c \le \frac{b_k - \beta_k}{\operatorname{se}(b_k)} \le t_c\right) = 1 - \alpha\]

Multiply all three parts by \(\operatorname{se}(b_k)\): \[P\left(-t_c \cdot \operatorname{se}(b_k) \le b_k - \beta_k \le t_c \cdot \operatorname{se}(b_k)\right) = 1 - \alpha\]

Subtract \(b_k\) and multiply by \(-1\) (flipping the inequalities): \[P\left(b_k - t_c \cdot \operatorname{se}(b_k) \le \beta_k \le b_k + t_c \cdot \operatorname{se}(b_k)\right) = 1 - \alpha\]

This gives the interval \(b_k \pm t_c \cdot \operatorname{se}(b_k)\). \(\square\)

10.4 Interpretation: The Repeated Sampling View

Warning: What “95% confidence” does and does not mean

“95% confidence” does not mean “there is a 95% probability that \(\beta_2\) is in the interval.” The parameter \(\beta_2\) is a fixed, unknown constant; it is either inside the interval or it is not. What is random is the interval itself: \(b_2\) changes from sample to sample (so the center moves), and \(\operatorname{se}(b_2)\) changes from sample to sample (so the width changes).

The correct interpretation: if we drew many samples and built a 95% CI from each one, approximately 95% of those intervals would contain the true \(\beta_2\). Our confidence is in the procedure, not in any single interval.

Interactive: CI coverage simulator

This widget draws 50 confidence intervals from repeated samples. Each interval either captures the true \(\beta_2 = 10\) (green) or misses it (red). Adjust the confidence level to see coverage change.

viewof ci_level = Inputs.range([0.80, 0.99], {value: 0.95, step: 0.01, label: "Confidence level"})

ci_true_beta = 10

ci_data = {
  const rng = d3.randomLcg(2026);
  const normal_gen = d3.randomNormal.source(rng)(0, 80);
  const n = 40;
  const nci = 50;
  const intervals = [];

  // Critical value: inverse-normal rational approximation (Abramowitz & Stegun 26.2.23)
  // followed by a small finite-df correction toward the t distribution
  const alpha = 1 - ci_level;
  const df = n - 2;
  const z = (() => {
    const p = 1 - alpha / 2;
    const t_val = Math.sqrt(-2 * Math.log(1 - p));
    const c0 = 2.515517, c1 = 0.802853, c2 = 0.010328;
    const d1 = 1.432788, d2 = 0.189269, d3 = 0.001308;
    return t_val - (c0 + c1*t_val + c2*t_val*t_val) / (1 + d1*t_val + d2*t_val*t_val + d3*t_val*t_val*t_val);
  })();
  const tc = z * (1 + 1 / (4 * df));

  for (let r = 0; r < nci; r++) {
    const data = [];
    for (let i = 0; i < n; i++) {
      const x = 5 + 30 * rng();
      const y = 83 + ci_true_beta * x + normal_gen();
      data.push({x, y});
    }
    const xbar = d3.mean(data, d => d.x);
    const ybar = d3.mean(data, d => d.y);
    const num = d3.sum(data, d => (d.x - xbar) * (d.y - ybar));
    const den = d3.sum(data, d => (d.x - xbar) ** 2);
    const b2 = num / den;
    const sse = d3.sum(data, d => {
      const yhat = (ybar - b2 * xbar) + b2 * d.x;
      return (d.y - yhat) ** 2;
    });
    const sigma_hat = Math.sqrt(sse / (n - 2));
    const se = sigma_hat / Math.sqrt(den);
    const lo = b2 - tc * se;
    const hi = b2 + tc * se;
    const covers = (lo <= ci_true_beta && ci_true_beta <= hi);
    intervals.push({sample: r + 1, b2, lo, hi, covers});
  }
  return intervals;
}

ci_coverage_rate = (d3.sum(ci_data, d => d.covers) / ci_data.length * 100).toFixed(0)

Plot.plot({
  width: 640,
  height: 500,
  x: {label: "β₂", domain: [ci_true_beta - 12, ci_true_beta + 12]},
  y: {label: "Sample #", domain: [0, 51], reverse: true},
  marks: [
    Plot.ruleX([ci_true_beta], {stroke: "#C41E3A", strokeWidth: 2, strokeDasharray: "6 3"}),
    Plot.link(ci_data, {
      x1: "lo", x2: "hi", y1: "sample", y2: "sample",
      stroke: d => d.covers ? "#2E8B57" : "#C41E3A",
      strokeWidth: 2
    }),
    Plot.dot(ci_data, {x: "b2", y: "sample", fill: d => d.covers ? "#2E8B57" : "#C41E3A", r: 3}),
    Plot.text([`Coverage: ${ci_coverage_rate}% (nominal: ${(ci_level*100).toFixed(0)}%)`],
      {x: ci_true_beta, y: 51, textAnchor: "middle", fill: "#1E5A96", fontWeight: "bold", fontSize: 13})
  ],
  caption: `50 confidence intervals at the ${(ci_level*100).toFixed(0)}% level. Red dashed line = true β₂ = ${ci_true_beta}.`
})
Figure 10.1: CI coverage simulator. Each horizontal bar is a confidence interval from one sample. Green = captures β₂; red = misses. The coverage rate tracks the nominal level.

Try it: Lower the confidence level from 95% to 80%. More intervals miss the true value; coverage drops from about 95% to about 80%. This is the tradeoff: narrower intervals give less coverage.
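The widget’s experiment can also be replicated outside the browser. Here is a minimal Monte Carlo sketch in Python (NumPy/SciPy); the sample size, error spread, and replication count are illustrative choices, not the widget’s exact settings:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2026)
n, reps, beta2 = 40, 2000, 10.0
tc = stats.t.ppf(0.975, n - 2)          # 95% critical value, df = 38

covered = 0
for _ in range(reps):
    x = rng.uniform(5, 35, n)
    e = rng.normal(0, 80, n)            # illustrative error spread
    y = 83 + beta2 * x + e
    xd = x - x.mean()
    b2 = (xd * (y - y.mean())).sum() / (xd ** 2).sum()
    resid = y - (y.mean() - b2 * x.mean()) - b2 * x
    se = np.sqrt((resid ** 2).sum() / (n - 2) / (xd ** 2).sum())
    covered += (b2 - tc * se) <= beta2 <= (b2 + tc * se)

coverage = covered / reps
print(f"Empirical coverage: {coverage:.1%}")   # close to the nominal 95%
```

Rerunning with `stats.t.ppf(0.90, n - 2)` in place of the 95% critical value drops the coverage toward 80%, mirroring the slider in the widget.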

10.5 Food Expenditure Example

With \(N = 40\), \(df = 38\), \(b_2 = 10.21\), \(\operatorname{se}(b_2) = 2.09\), and \(t_c = t_{(0.975, 38)} = 2.024\):

\[ 10.21 \pm 2.024 \times 2.09 = 10.21 \pm 4.23 = [5.98, \; 14.44] \]

We estimate with 95% confidence that an additional $100 of weekly income increases food expenditure by between $5.98 and $14.44. There is a tradeoff: more confidence requires a wider interval. At 90% the interval narrows to \([6.69, 13.73]\); at 99% it widens to \([4.54, 15.88]\).
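These intervals are straightforward to reproduce; a quick check with SciPy:

```python
from scipy import stats

b2, se, df = 10.21, 2.09, 38

for level in (0.90, 0.95, 0.99):
    tc = stats.t.ppf(1 - (1 - level) / 2, df)   # critical value t_(1-alpha/2, 38)
    print(f"{level:.0%} CI: [{b2 - tc * se:.2f}, {b2 + tc * se:.2f}]")
```

The output reproduces the three intervals quoted above, and makes the confidence/width tradeoff visible at a glance.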

flowchart LR
    A["Point estimate b₂"] --> D["CI = b₂ ± tc · se(b₂)"]
    B["Standard error se(b₂)"] --> D
    C["Critical value tc<br/>from t(N−2)"] --> D
    D --> E["Interpretation:<br/>procedure captures β₂<br/>in 100(1−α)% of samples"]

    style A fill:#1E5A96,color:#fff
    style B fill:#1E5A96,color:#fff
    style C fill:#1E5A96,color:#fff
    style D fill:#D4A84B,color:#fff
    style E fill:#2E8B57,color:#fff
Figure 10.2: Constructing a confidence interval: three ingredients combine into a range of plausible values.

10.6 Confidence Intervals for Linear Combinations

What if we want a CI for expected food expenditure at a specific income, say \(x_0 = 20\)? This is \(\lambda = \beta_1 + 20\beta_2\), a linear combination of both parameters. The point estimate is \(\hat{\lambda} = b_1 + 20 b_2 = 287.61\), but its variance requires the full covariance matrix:

\[ \operatorname{Var}(\hat{\lambda}) = c_1^2 \operatorname{Var}(b_1) + c_2^2 \operatorname{Var}(b_2) + 2c_1 c_2 \operatorname{Cov}(b_1, b_2) \tag{10.4}\]

where \(\hat{\lambda} = c_1 b_1 + c_2 b_2\); here \(c_1 = 1\) and \(c_2 = 20\).

Warning: Do not forget the covariance term

The estimators \(b_1\) and \(b_2\) are correlated (both come from the same data), so you cannot just add the individual variances. This is the same principle as Theorem 16.1 from Chapter 3.

From the food expenditure data, \(\operatorname{se}(\hat{\lambda}) = 14.18\), and the 95% CI is \(287.61 \pm 2.024 \times 14.18 = [258.91, \; 316.31]\). We estimate with 95% confidence that a household earning $2,000 per week spends between $258.91 and $316.31 on food.
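Equation 10.4 can be sketched in code. The diagonal entries below come from the standard errors quoted earlier (\(43.41^2\) and \(2.09^2\), approximately); the off-diagonal covariance of \(-85.90\) is not reported above and is an assumed value, chosen to be consistent with \(\operatorname{se}(\hat{\lambda}) \approx 14.18\):

```python
import numpy as np

# Covariance matrix of (b1, b2); the off-diagonal -85.90 is an assumed
# value consistent with se(lambda-hat) = 14.18 in the text
V = np.array([[1884.44, -85.90],
              [-85.90,    4.38]])
c = np.array([1.0, 20.0])       # lambda = b1 + 20 * b2

var_lam = c @ V @ c             # c1^2 Var(b1) + c2^2 Var(b2) + 2 c1 c2 Cov(b1,b2)
var_wrong = (c ** 2 * np.diag(V)).sum()   # omits the covariance term

print(np.sqrt(var_lam))         # close to 14.18
print(np.sqrt(var_wrong))       # far too large: ignores that Cov(b1, b2) < 0
```

The quadratic form `c @ V @ c` is the matrix version of Equation 10.4; dropping the covariance term roughly quadruples the reported standard error here, which is exactly the mistake the warning above flags.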

Connection to Chapter 11: If the CI for \(\beta_2\) excludes a value \(c\), then the hypothesis test \(H_0: \beta_2 = c\) rejects at the same significance level. This link between CIs and hypothesis tests is formalized in the next chapter.

10.7 Practice

A researcher estimates \(b_2 = 5.0\) with \(\operatorname{se}(b_2) = 2.0\) and \(N = 25\) (so \(df = 23\)). The critical value \(t_{(0.95, 23)} = 1.714\). Construct a 90% confidence interval for \(\beta_2\). Does the interval contain zero?

\(5.0 \pm 1.714 \times 2.0 = 5.0 \pm 3.43 = [1.57, \; 8.43]\). Zero is not in the interval, so at the 10% significance level, we would reject \(H_0: \beta_2 = 0\). This connection between confidence intervals and hypothesis testing is developed in Chapter 11.
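The practice answer can be verified directly with SciPy:

```python
from scipy import stats

b2, se, df = 5.0, 2.0, 23
tc = stats.t.ppf(0.95, df)          # 90% two-sided CI uses the 0.95 quantile

lo, hi = b2 - tc * se, b2 + tc * se
print(f"t_c = {tc:.3f}, CI = [{lo:.2f}, {hi:.2f}]")
print("contains zero:", lo <= 0 <= hi)
```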
