From Point Estimates to Ranges of Plausible Values
Inference
Confidence Intervals
t-distribution
Author
Jake Anderson
Published
March 21, 2026
Modified
March 26, 2026
Abstract
A point estimate without a confidence interval is incomplete. This chapter derives the confidence interval for regression coefficients by showing how replacing the unknown sigma with its estimate changes the standard normal to a t-distribution, constructs the interval formula, clarifies the repeated sampling interpretation, and extends the method to linear combinations of parameters.
NotePrerequisites
You should know the standard error formulas from Chapter 8 and the \(t\)-distribution preview from Chapter 4.
10.1 The Precision Problem
From the food expenditure regression: \(\hat{y} = \underset{(43.41)}{83.42} + \underset{(2.09)}{10.21}\, \text{income}\). We know \(b_2 = 10.21\) with \(\operatorname{se}(b_2) = 2.09\). But what does that tell us? Could the true effect of income on food expenditure be $5? (That value is \(5.21 / 2.09 = 2.49\) standard errors away.) Could it be $0? (That is \(10.21 / 2.09 = 4.89\) standard errors away.) A point estimate and a standard error are the ingredients; a confidence interval is the recipe that turns them into a range of plausible values for \(\beta_2\).
Why not just report \(b_2\)? Because a single number without context is meaningless. Is \(b_2 = 10.21\) precise? Noisy? Consistent with zero? The standard error and CI answer these questions.
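The "standard errors away" arithmetic above can be sketched in a few lines. This is a minimal check using the reported \(b_2\) and standard error, not code from the chapter:

```python
# Distance of candidate values for beta_2 from the point estimate,
# measured in standard errors (values from the food expenditure fit).
b2, se = 10.21, 2.09

for candidate in (5.0, 0.0):
    distance = abs(b2 - candidate) / se
    print(f"beta2 = {candidate:>4}: {distance:.2f} standard errors from b2")
```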
10.2 From Normal to \(t\)
Under assumptions SR1 through SR6 (including normality of errors), the OLS slope is normally distributed: \(b_2 \sim N(\beta_2, \sigma^2 / \sum(x_i - \bar{x})^2)\). Standardizing:
\[
Z = \frac{b_2 - \beta_2}{\sqrt{\sigma^2 / \sum(x_i - \bar{x})^2}} \sim N(0, 1)
\tag{10.1}\]
If we knew \(\sigma^2\), we could use \(P(-1.96 \le Z \le 1.96) = 0.95\) and rearrange to get an interval for \(\beta_2\). But we do not know \(\sigma^2\).
Replacing \(\sigma\) with \(\hat{\sigma}\) in the denominator introduces additional estimation uncertainty. The resulting statistic follows a \(t\)-distribution rather than the standard normal:
Theorem 10.1 (The \(t\)-statistic for OLS Coefficients)\[
t = \frac{b_2 - \beta_2}{\operatorname{se}(b_2)} \sim t_{(N-2)}
\tag{10.2}\]
The degrees of freedom are \(N - 2\) because we used \(N\) residuals to estimate \(\hat{\sigma}^2\) but spent 2 degrees of freedom estimating \(b_1\) and \(b_2\).
The \(t\)-distribution has heavier tails than \(N(0,1)\), reflecting our uncertainty about \(\sigma\). As \(N \to \infty\), \(\hat{\sigma}^2 \to \sigma^2\) and \(t_{(N-2)} \to N(0,1)\).
\(\implies\) With small samples, we need wider intervals to compensate for estimating \(\sigma\).
\(t\) vs \(Z\): For \(N > 100\), the difference is negligible (\(t_{(98)} \approx 1.984\) vs \(z = 1.960\) for 95% CI). For small samples, \(t\) critical values are noticeably larger.
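The \(t\)-versus-\(z\) comparison is easy to verify numerically. A quick sketch using `scipy.stats` (Python is my choice here; the chapter's own code is Observable JS):

```python
# 97.5th percentiles: t critical values shrink toward z as df grows.
from scipy.stats import norm, t

z_crit = norm.ppf(0.975)  # 1.960 for a 95% interval
for df in (5, 20, 38, 98):
    print(f"df = {df:3d}: t_c = {t.ppf(0.975, df):.3f}  vs  z = {z_crit:.3f}")
```

At `df = 98` the two critical values agree to two decimal places, matching the claim that the difference is negligible for \(N > 100\).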
10.3 The Confidence Interval Formula
Start with \(P(-t_c \le t \le t_c) = 1 - \alpha\) where \(t_c = t_{(1-\alpha/2, \; N-2)}\) is the critical value from the \(t\)-table. Multiply through by \(\operatorname{se}(b_k)\) and rearrange:
Theorem 10.2 (Confidence Interval for \(\beta_k\))\[
\text{CI for } \beta_k: \qquad b_k \pm t_c \cdot \operatorname{se}(b_k)
\tag{10.3}\]
where \(b_k\) is the OLS point estimate (center), \(\operatorname{se}(b_k)\) is the standard error (measures precision), and \(t_c = t_{(1-\alpha/2, \; N-2)}\) is the critical value.
NoteDerivation of the CI formula (click to expand)
Start from the probability statement: \[P(-t_c \le t \le t_c) = 1 - \alpha\]
Multiply all three parts by \(\operatorname{se}(b_k)\): \[P\left(-t_c \cdot \operatorname{se}(b_k) \le b_k - \beta_k \le t_c \cdot \operatorname{se}(b_k)\right) = 1 - \alpha\]
Subtract \(b_k\) and multiply by \(-1\) (flipping the inequalities): \[P\left(b_k - t_c \cdot \operatorname{se}(b_k) \le \beta_k \le b_k + t_c \cdot \operatorname{se}(b_k)\right) = 1 - \alpha\]
This gives the interval \(b_k \pm t_c \cdot \operatorname{se}(b_k)\). \(\square\)
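Theorem 10.2 translates directly into a small helper. The function name and the illustrative numbers below are my own, not from the text:

```python
# CI of Theorem 10.2: b_k +/- t_c * se(b_k), with t_c from t_(N-2).
from scipy.stats import t

def conf_interval(b, se, df, level=0.95):
    """Two-sided confidence interval for a single OLS coefficient."""
    tc = t.ppf(1 - (1 - level) / 2, df)
    return b - tc * se, b + tc * se

lo, hi = conf_interval(b=2.5, se=1.0, df=30)  # illustrative numbers
print(f"95% CI: [{lo:.3f}, {hi:.3f}]")
```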
10.4 Interpretation: The Repeated Sampling View
WarningWhat “95% confidence” does and does not mean
“95% confidence” does not mean “there is a 95% probability that \(\beta_2\) is in the interval.” The parameter \(\beta_2\) is a fixed, unknown constant; it is either inside the interval or it is not. What is random is the interval itself: \(b_2\) changes from sample to sample (so the center moves), and \(\operatorname{se}(b_2)\) changes from sample to sample (so the width changes).
The correct interpretation: if we drew many samples and built a 95% CI from each one, approximately 95% of those intervals would contain the true \(\beta_2\). Our confidence is in the procedure, not in any single interval.
Interactive: CI coverage simulator
This widget draws 50 confidence intervals from repeated samples. Each interval either captures the true \(\beta_2 = 10\) (green) or misses it (red). Adjust the confidence level to see coverage change.
Show code
viewof ci_level = Inputs.range([0.80, 0.99], {value: 0.95, step: 0.01, label: "Confidence level"})

ci_true_beta = 10

ci_data = {
  const rng = d3.randomLcg(2026);
  const normal_gen = d3.randomNormal.source(rng)(0, 80);
  const n = 40;
  const nci = 50;
  const df = n - 2;
  const alpha = 1 - ci_level;
  // Inverse-normal via a rational approximation (Abramowitz & Stegun 26.2.23)
  const z = (() => {
    const p = 1 - alpha / 2;
    const t_val = Math.sqrt(-2 * Math.log(1 - p));
    const c0 = 2.515517, c1 = 0.802853, c2 = 0.010328;
    const d1 = 1.432788, d2 = 0.189269, d3 = 0.001308;
    return t_val - (c0 + c1 * t_val + c2 * t_val * t_val) /
                   (1 + d1 * t_val + d2 * t_val * t_val + d3 * t_val * t_val * t_val);
  })();
  // Small finite-df correction toward the t critical value (close enough for visualization)
  const tc = z * (1 + 1 / (4 * df));
  const intervals = [];
  for (let r = 0; r < nci; r++) {
    // Draw one sample and fit OLS by hand
    const data = [];
    for (let i = 0; i < n; i++) {
      const x = 5 + 30 * rng();
      const y = 83 + ci_true_beta * x + normal_gen();
      data.push({x, y});
    }
    const xbar = d3.mean(data, d => d.x);
    const ybar = d3.mean(data, d => d.y);
    const num = d3.sum(data, d => (d.x - xbar) * (d.y - ybar));
    const den = d3.sum(data, d => (d.x - xbar) ** 2);
    const b2 = num / den;
    const sse = d3.sum(data, d => {
      const yhat = (ybar - b2 * xbar) + b2 * d.x;
      return (d.y - yhat) ** 2;
    });
    const sigma_hat = Math.sqrt(sse / (n - 2));
    const se = sigma_hat / Math.sqrt(den);
    const lo = b2 - tc * se;
    const hi = b2 + tc * se;
    const covers = (lo <= ci_true_beta && ci_true_beta <= hi);
    intervals.push({sample: r + 1, b2, lo, hi, covers});
  }
  return intervals;
}

ci_coverage_rate = (d3.sum(ci_data, d => d.covers) / ci_data.length * 100).toFixed(0)

Plot.plot({
  width: 640,
  height: 500,
  x: {label: "β₂", domain: [ci_true_beta - 12, ci_true_beta + 12]},
  y: {label: "Sample #", domain: [0, 51], reverse: true},
  marks: [
    Plot.ruleX([ci_true_beta], {stroke: "#C41E3A", strokeWidth: 2, strokeDasharray: "6 3"}),
    Plot.link(ci_data, {
      x1: "lo", x2: "hi", y1: "sample", y2: "sample",
      stroke: d => d.covers ? "#2E8B57" : "#C41E3A",
      strokeWidth: 2
    }),
    Plot.dot(ci_data, {x: "b2", y: "sample", fill: d => d.covers ? "#2E8B57" : "#C41E3A", r: 3}),
    Plot.text([`Coverage: ${ci_coverage_rate}% (nominal: ${(ci_level * 100).toFixed(0)}%)`], {
      x: ci_true_beta, y: 51, textAnchor: "middle",
      fill: "#1E5A96", fontWeight: "bold", fontSize: 13
    })
  ],
  caption: `50 confidence intervals at the ${(ci_level * 100).toFixed(0)}% level. Red dashed line = true β₂ = ${ci_true_beta}.`
})
Figure 10.1: CI coverage simulator. Each horizontal bar is a confidence interval from one sample. Green = captures β₂; red = misses. The coverage rate tracks the nominal level.
Try it: Lower the confidence level from 95% to 80%. More intervals miss the true value; coverage drops from about 95% to about 80%. This is the tradeoff: narrower intervals give less coverage.
10.5 Food Expenditure Example
With \(N = 40\), \(df = 38\), \(b_2 = 10.21\), \(\operatorname{se}(b_2) = 2.09\), and \(t_c = t_{(0.975, 38)} = 2.024\):
\[
10.21 \pm 2.024 \times 2.09 = 10.21 \pm 4.23 = [5.98, \; 14.44]
\]
We estimate with 95% confidence that an additional $100 of weekly income increases food expenditure by between $5.98 and $14.44. There is a tradeoff: more confidence requires a wider interval. At 90% the interval narrows to \([6.69, 13.73]\); at 99% it widens to \([4.54, 15.88]\).
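The three intervals can be reproduced with `scipy.stats` (a sketch from the reported estimates; the chapter's own interactive code is Observable JS):

```python
# Food expenditure CIs at three confidence levels (b2 = 10.21, se = 2.09, df = 38).
from scipy.stats import t

b2, se, df = 10.21, 2.09, 38
for level in (0.90, 0.95, 0.99):
    tc = t.ppf(1 - (1 - level) / 2, df)
    print(f"{level:.0%}: [{b2 - tc * se:.2f}, {b2 + tc * se:.2f}]")
```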
flowchart LR
A["Point estimate b₂"] --> D["CI = b₂ ± t_c · se(b₂)"]
B["Standard error se(b₂)"] --> D
C["Critical value t_c<br/>from t(N−2)"] --> D
D --> E["Interpretation:<br/>procedure captures β₂<br/>in 100(1−α)% of samples"]
style A fill:#1E5A96,color:#fff
style B fill:#1E5A96,color:#fff
style C fill:#1E5A96,color:#fff
style D fill:#D4A84B,color:#fff
style E fill:#2E8B57,color:#fff
Figure 10.2: Constructing a confidence interval: three ingredients combine into a range of plausible values.
10.6 Confidence Intervals for Linear Combinations
What if we want a CI for expected food expenditure at a specific income, say \(x_0 = 20\)? This is \(\lambda = \beta_1 + 20\beta_2\), a linear combination of both parameters. The point estimate is \(\hat{\lambda} = b_1 + 20 b_2 = 287.61\), but its variance requires the full covariance matrix:
\[
\operatorname{var}(\hat{\lambda}) = \operatorname{var}(b_1) + x_0^2 \operatorname{var}(b_2) + 2 x_0 \operatorname{cov}(b_1, b_2)
\]
The estimators \(b_1\) and \(b_2\) are correlated (both come from the same data), so you cannot just add the individual variances. This is the same principle as Theorem 16.1 from Chapter 3.
From the food expenditure data, \(\operatorname{se}(\hat{\lambda}) = 14.18\), and the 95% CI is \(287.61 \pm 2.024 \times 14.18 = [258.91, \; 316.31]\). We estimate with 95% confidence that a household earning $2,000 per week spends between $258.91 and $316.31 on food.
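Since the text reports \(\operatorname{se}(\hat{\lambda}) = 14.18\) but not \(\operatorname{cov}(b_1, b_2)\), the covariance in the sketch below is an assumed value chosen to be consistent with that standard error; the formula itself is the standard variance of a linear combination:

```python
# var(lambda_hat) = var(b1) + x0^2 * var(b2) + 2 * x0 * cov(b1, b2)
import math

var_b1 = 43.41 ** 2   # from se(b1) = 43.41
var_b2 = 2.09 ** 2    # from se(b2) = 2.09
cov_b1b2 = -85.77     # ASSUMED: not reported in the text
x0 = 20

var_lam = var_b1 + x0 ** 2 * var_b2 + 2 * x0 * cov_b1b2
print(f"se(lambda_hat) = {math.sqrt(var_lam):.2f}")
```

Note that the covariance term is large and negative: ignoring it (adding the two variances alone) would badly overstate the standard error.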
Connection to Chapter 11: If the CI for \(\beta_2\) excludes a value \(c\), then the hypothesis test \(H_0: \beta_2 = c\) rejects at the same significance level. This link between CIs and hypothesis tests is formalized in the next chapter.
10.7 Practice
A researcher estimates \(b_2 = 5.0\) with \(\operatorname{se}(b_2) = 2.0\) and \(N = 25\) (so \(df = 23\)). The critical value \(t_{(0.95, 23)} = 1.714\). Construct a 90% confidence interval for \(\beta_2\). Does the interval contain zero?
TipShow Solution
\(5.0 \pm 1.714 \times 2.0 = 5.0 \pm 3.43 = [1.57, \; 8.43]\). Zero is not in the interval, so at the 10% significance level, we would reject \(H_0: \beta_2 = 0\). This connection between confidence intervals and hypothesis testing is developed in the next chapter.