A confidence interval tells us where a parameter plausibly lives. A hypothesis test asks a sharper question: is the data compatible with a specific claim, or should we reject it? This chapter develops the anatomy of a hypothesis test, walks through two-sided and one-sided examples using the food expenditure data, shows that three decision methods always agree, defines p-values precisely, and clarifies the tradeoff between Type I and Type II errors.
Note: Prerequisites
You should be comfortable with the \(t\)-distribution and confidence intervals from Chapter 9.
11.1 From Estimation to Testing
From the food expenditure regression (\(N = 40\), \(df = 38\)): \(b_2 = 10.21\), \(\operatorname{se}(b_2) = 2.09\), 95% CI \(= [5.97, 14.45]\). A confidence interval tells us where \(\beta_2\) plausibly lives. A hypothesis test asks a yes/no question: is the data compatible with a specific claim about \(\beta_2\), or should we reject that claim?
Consider two regressions. Regression A has \(b_2 = 12.50\) with \(\operatorname{se}(b_2) = 8.40\); Regression B has \(b_2 = 2.10\) with \(\operatorname{se}(b_2) = 0.35\). Regression A has the bigger coefficient, but its estimate is noisy (\(b_2\) could easily be zero). Regression B’s estimate is small but precise. \(\implies\) We need a formal measure of evidence relative to noise.
Signal vs noise: The \(t\)-statistic measures signal (how far \(b_k\) is from the null value) relative to noise (\(\operatorname{se}(b_k)\)). A large \(|t|\) means the signal dominates the noise.
11.2 Anatomy of a Hypothesis Test
Every hypothesis test has five components:
Definition 11.1 (Five Components of a Hypothesis Test)
Null hypothesis (\(H_0: \beta_k = c\)): the claim we put on trial; it always contains an equality
Alternative hypothesis (\(H_1\)): what we accept if we reject \(H_0\) (one-sided or two-sided)
Test statistic: \(t = (b_k - c) / \operatorname{se}(b_k)\)
Decision rule: rejection region or \(p\)-value threshold
Conclusion: reject or do not reject \(H_0\)
The test statistic:
\[
t = \frac{b_k - c}{\operatorname{se}(b_k)} \sim t_{(N-2)} \quad \text{if } H_0 \text{ is true}
\tag{11.1}\]
The numerator measures how far our estimate is from the null value; the denominator scales this distance by the estimation precision. If \(H_0\) is true, \(t\) should be close to zero. If \(H_0\) is false, \(|t|\) will tend to be large. A decision rule (rejection region or \(p\)-value threshold) determines whether \(|t|\) is large enough to reject, and we reach a conclusion.
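The test statistic is just arithmetic on the reported estimates. A minimal sketch, assuming Python with `scipy` (the chapter's own interactive code uses Observable JS, so this language choice is an assumption), plugging in the food expenditure numbers from Section 11.1:

```python
from scipy import stats

# Food expenditure estimates quoted in the text
b2, se_b2, N = 10.21, 2.09, 40
df = N - 2

def t_stat(b, c, se):
    """Test statistic for H0: beta_k = c (Equation 11.1)."""
    return (b - c) / se

t0 = t_stat(b2, 0.0, se_b2)   # signal 10.21 relative to noise 2.09
print(f"t = {t0:.2f}")        # about 4.89 (the text truncates to 4.88)
```

If \(H_0\) is true, a draw from \(t_{(38)}\) this far from zero is very unlikely, which is what the decision rules below formalize.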
Think of it as a trial: \(H_0\) is “innocent until proven guilty.” We need strong evidence to convict.
11.3 Two-Sided Tests
Testing \(H_0: \beta_2 = 0\) (does income affect food spending at all?). The test statistic is \(t = 10.21 / 2.09 = 4.88\). At \(\alpha = 0.05\), the critical value is \(t_c = 2.024\). Since \(|4.88| \ge 2.024\), we reject \(H_0\). There is a statistically significant relationship between income and food expenditure.
Testing \(H_0: \beta_2 = 7.5\) (consultant’s claim). The test statistic is \(t = (10.21 - 7.5) / 2.09 = 1.29\). Since \(|1.29| < 2.024\), we do not reject \(H_0\). The data are consistent with \(\beta_2 = 7.5\), but also with \(\beta_2 = 8.5\) (\(t = 0.82\)) or any value inside the confidence interval. Not rejecting does not prove the null is true.
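Both tests can be reproduced in a few lines; a sketch assuming Python with `scipy.stats` (not the chapter's own code):

```python
from scipy import stats

b2, se_b2, df, alpha = 10.21, 2.09, 38, 0.05
t_c = stats.t.ppf(1 - alpha / 2, df)   # two-tail critical value, ~2.024

for c in (0.0, 7.5):                   # the two null values tested above
    t = (b2 - c) / se_b2
    verdict = "reject" if abs(t) >= t_c else "do not reject"
    print(f"H0: beta2 = {c}: t = {t:.2f} -> {verdict} H0")
```

The same critical value serves both tests; only the null value \(c\) in the numerator changes.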
11.4 Three Equivalent Decision Methods
For a two-sided test at level \(\alpha\), three methods always give the same answer:

1. Rejection region: reject if \(|t| \ge t_c\). For \(H_0: \beta_2 = 0\): \(|4.88| \ge 2.024 \implies\) reject.
2. \(p\)-value: reject if \(p \le \alpha\). For \(H_0: \beta_2 = 0\): \(p = 0.00002 \le 0.05 \implies\) reject.
3. Confidence interval: reject if the null value \(c\) falls outside the CI. For \(H_0: \beta_2 = 0\): \(0 \notin [5.97, 14.45] \implies\) reject.
The three windows onto the same test always agree because they are algebraic rearrangements of the same inequality: \(|b_k - c| \ge t_c \cdot \operatorname{se}(b_k)\).
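The equivalence can be checked numerically; a sketch assuming Python with `scipy` (an assumption, not the chapter's code), using the food expenditure test of \(H_0: \beta_2 = 0\):

```python
from scipy import stats

b2, se_b2, df, alpha, c = 10.21, 2.09, 38, 0.05, 0.0
t = (b2 - c) / se_b2
t_c = stats.t.ppf(1 - alpha / 2, df)

reject_region = abs(t) >= t_c                   # method 1: rejection region
p_value = 2 * stats.t.sf(abs(t), df)            # method 2: p-value
reject_p = p_value <= alpha
lo, hi = b2 - t_c * se_b2, b2 + t_c * se_b2     # method 3: confidence interval
reject_ci = not (lo <= c <= hi)

# All three rearrange |b2 - c| >= t_c * se(b2), so they must agree
assert reject_region == reject_p == reject_ci
print(reject_region, f"CI = [{lo:.2f}, {hi:.2f}]")
```

Changing `c` to 7.5 flips all three verdicts simultaneously, never just one.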
Note: Why the three methods are equivalent (click to expand)
All three methods test the same condition. Start from the rejection region: reject when \(|t| = |b_k - c| / \operatorname{se}(b_k) \ge t_c\), i.e. when \(|b_k - c| \ge t_c \cdot \operatorname{se}(b_k)\).
The CI method: the interval \(b_k \pm t_c \cdot \operatorname{se}(b_k)\) excludes \(c\) exactly when \(|b_k - c| > t_c \cdot \operatorname{se}(b_k)\), which is the same condition.
The \(p\)-value method: \(p \le \alpha\) exactly when \(|t| \ge t_c\) (since \(t_c\) is defined as the value where the tail area equals \(\alpha/2\)).
All three are equivalent statements of \(|b_k - c| \ge t_c \cdot \operatorname{se}(b_k)\). \(\square\)
Interactive: rejection region visualizer
Adjust the significance level \(\alpha\) and choose one-tail or two-tail. Enter an observed \(t\)-statistic to see whether it falls in the rejection region.
Show code
```js
viewof alpha_level = Inputs.range([0.01, 0.20], {value: 0.05, step: 0.01, label: "Significance level α"})

viewof tail_type = Inputs.radio(["Two-tail", "Right one-tail", "Left one-tail"], {label: "Test type", value: "Two-tail"})

viewof obs_t = Inputs.range([-5, 5], {value: 2.5, step: 0.1, label: "Observed t-statistic"})

t_critical = {
  // Approximate t critical value using a normal-quantile approximation (good for df > 30)
  const p = tail_type === "Two-tail" ? 1 - alpha_level / 2 : 1 - alpha_level;
  const t_val = Math.sqrt(-2 * Math.log(1 - p));
  const c0 = 2.515517, c1 = 0.802853, c2 = 0.010328;
  const d1 = 1.432788, d2 = 0.189269, d3 = 0.001308;
  return t_val - (c0 + c1 * t_val + c2 * t_val * t_val) /
    (1 + d1 * t_val + d2 * t_val * t_val + d3 * t_val * t_val * t_val);
}

reject_decision = {
  if (tail_type === "Two-tail") return Math.abs(obs_t) >= t_critical;
  if (tail_type === "Right one-tail") return obs_t >= t_critical;
  return obs_t <= -t_critical;
}

t_density_data = {
  const pts = [];
  for (let x = -5; x <= 5; x += 0.02) {
    // t density approximated by the standard normal (reasonable at df = 38)
    const density = Math.exp(-0.5 * x * x) / Math.sqrt(2 * Math.PI);
    let shaded = false;
    if (tail_type === "Two-tail") {
      shaded = Math.abs(x) >= t_critical;
    } else if (tail_type === "Right one-tail") {
      shaded = x >= t_critical;
    } else {
      shaded = x <= -t_critical;
    }
    pts.push({x, density, shaded, shade_density: shaded ? density : 0});
  }
  return pts;
}

Plot.plot({
  width: 640,
  height: 380,
  x: {label: "t", domain: [-5, 5]},
  y: {label: "Density"},
  marks: [
    Plot.areaY(t_density_data, {x: "x", y: "density", fill: "#1E5A96", fillOpacity: 0.15}),
    Plot.areaY(t_density_data, {x: "x", y: "shade_density", fill: "#C41E3A", fillOpacity: 0.4}),
    Plot.line(t_density_data, {x: "x", y: "density", stroke: "#1E5A96", strokeWidth: 2}),
    Plot.ruleX([obs_t], {stroke: reject_decision ? "#C41E3A" : "#2E8B57", strokeWidth: 3}),
    Plot.ruleY([0]),
    Plot.text([reject_decision ? "REJECT H₀" : "Do not reject H₀"],
      {x: obs_t, y: 0.42, fill: reject_decision ? "#C41E3A" : "#2E8B57", fontWeight: "bold", fontSize: 15, textAnchor: "middle"}),
    Plot.text([`t_c = ±${t_critical.toFixed(3)}`],
      {x: 3.5, y: 0.35, fill: "#888", fontSize: 12})
  ],
  caption: `α = ${alpha_level}, ${tail_type}. Critical value ≈ ${t_critical.toFixed(3)}. Observed t = ${obs_t.toFixed(1)}. ${reject_decision ? "Reject H₀." : "Do not reject H₀."}`
})
```
Figure 11.1: Rejection region visualizer. The shaded area(s) show where we reject H₀. Enter your t-statistic to see the verdict.
Try it: Set \(\alpha = 0.05\) (two-tail) and slide the observed \(t\) from 0 to 3. The vertical line changes from green (“do not reject”) to red (“reject”) when \(|t|\) crosses the critical value near 1.96.
11.5 The \(p\)-Value
Definition 11.2 (\(p\)-Value) The \(p\)-value is the probability of observing a test statistic at least as extreme as the one we calculated, assuming \(H_0\) is true.
Small \(p\) means the observed \(t\) would be very unlikely under \(H_0\), providing strong evidence against it. Large \(p\) means the observed \(t\) is not unusual under \(H_0\), giving no reason to doubt it.
The direction of \(H_1\) determines which tail(s) to measure:
Computing \(p\)-values for each type of alternative.

| Alternative | \(p\)-value formula | Tail(s) |
|---|---|---|
| \(H_1: \beta_k > c\) | \(p = P(t_{(N-2)} \ge t)\) | Right |
| \(H_1: \beta_k < c\) | \(p = P(t_{(N-2)} \le t)\) | Left |
| \(H_1: \beta_k \neq c\) | \(p = 2 \cdot P(t_{(N-2)} \ge \lvert t \rvert)\) | Both |
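The table's three formulas map directly onto code; a minimal sketch, assuming Python with `scipy.stats` (the function name and string labels are illustrative choices, not from the text):

```python
from scipy import stats

df = 38  # N - 2 for the food expenditure data

def p_value(t, alternative):
    """p-value for an observed t-statistic, mirroring the table above."""
    if alternative == "greater":        # H1: beta_k > c, right tail
        return stats.t.sf(t, df)
    if alternative == "less":           # H1: beta_k < c, left tail
        return stats.t.cdf(t, df)
    return 2 * stats.t.sf(abs(t), df)   # H1: beta_k != c, both tails

print(f"{p_value(4.88, 'two-sided'):.5f}")   # about 0.00002, as in the text
```

`sf` is the survival function \(P(T \ge t)\), so the right-tail and two-tail cases need no manual `1 - cdf` subtraction.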
Caution: One-tail \(p\)-value from software output
Software regression output reports the two-tail \(p\)-value for \(H_0: \beta_k = 0\) by default. For a one-tail test, divide by 2 only when the sign of \(t\) agrees with your alternative. If \(t\) has the wrong sign, the one-tail \(p\)-value is \(1 - p_{\text{two-tail}}/2 > 0.5\), and you cannot reject.
11.6 One-Sided Tests
When economic theory predicts the sign of the effect (income should increase food spending), a one-sided test concentrates all \(\alpha\) in a single tail. For \(H_1: \beta_k > c\), reject if \(t \ge t_{(1-\alpha, N-2)}\). For \(H_1: \beta_k < c\), reject if \(t \le -t_{(1-\alpha, N-2)}\). One-sided tests have a lower critical value (for example, 1.686 vs 2.024 at \(\alpha = 0.05\) with \(df = 38\)), making it easier to reject in the predicted direction. Use one-sided only when theory gives a clear directional prediction before seeing the data.
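The two critical values quoted above come straight from the \(t\) quantile function; a sketch assuming Python with `scipy` (an assumption, not the chapter's code):

```python
from scipy import stats

df, alpha = 38, 0.05
t_two = stats.t.ppf(1 - alpha / 2, df)   # two-tail: ~2.024
t_one = stats.t.ppf(1 - alpha, df)       # one-tail: ~1.686
print(f"two-tail t_c = {t_two:.3f}, one-tail t_c = {t_one:.3f}")

# A t-statistic of 1.8 rejects one-sided but not two-sided
t = 1.8
print(t >= t_one, abs(t) >= t_two)       # True False
```

The gap between the two critical values is exactly the extra power a one-sided test buys in the predicted direction.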
One-tail vs two-tail: One-tail is more powerful in the predicted direction, but cannot detect effects in the opposite direction. Use one-tail only with strong prior theoretical justification.
11.7 Type I and Type II Errors
The two types of errors in hypothesis testing.

| Decision | \(H_0\) is actually true | \(H_0\) is actually false |
|---|---|---|
| Reject \(H_0\) | Type I error (probability \(= \alpha\)) | Correct |
| Do not reject \(H_0\) | Correct | Type II error (probability \(= \beta\)) |
A Type I error (false positive) means rejecting a true \(H_0\); its probability is \(\alpha\), which we control by choosing the significance level. A Type II error (false negative) means failing to reject a false \(H_0\); its probability depends on the true parameter value and is not directly controlled. Power \(= 1 - \beta\) is the probability of correctly rejecting a false \(H_0\).
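Both error rates can be made concrete with a small simulation. A sketch assuming Python with numpy/scipy: under \(H_0\) the \(t\)-statistic follows a central \(t_{(38)}\), and under an alternative it follows a noncentral \(t\) with noncentrality \(\beta_2 / \operatorname{se}(b_2)\) (a standard result, used here as a shortcut to avoid simulating full regressions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
df, se, alpha = 38, 2.09, 0.05
t_c = stats.t.ppf(1 - alpha / 2, df)
reps = 100_000

def rejection_rate(beta_true):
    # Noncentrality 0 gives the central t (H0 true); otherwise H0 is false
    nc = beta_true / se
    draws = stats.nct.rvs(df, nc, size=reps, random_state=rng)
    return np.mean(np.abs(draws) >= t_c)

print(rejection_rate(0.0))   # Type I error rate, close to alpha = 0.05
print(rejection_rate(5.0))   # power = 1 - beta against beta2 = 5
```

Rerunning with a smaller \(\alpha\) shows the tradeoff directly: the first rate falls while the second one does too.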
Warning: The tradeoff between Type I and Type II errors
There is always a tradeoff: lowering \(\alpha\) reduces false positives but increases false negatives. The choice of \(\alpha\) should reflect the relative costs of the two error types. In the supermarket example from the slides (testing whether income raises food spending by more than $5.50), the cost of a false positive (building an unprofitable store) is high, so a conservative \(\alpha = 0.01\) is appropriate.
\(\implies\) “Do not reject” is weaker than “reject.” It means the data cannot distinguish \(\beta_2\) from the null value, not that \(\beta_2\) is the null value. Always report the magnitude of \(b_k\) alongside the \(t\)-statistic; statistical significance tells you whether the effect is distinguishable from zero, not whether it is large enough to care about.
flowchart TD
A["State H₀ and H₁"] --> B["Compute t = (bₖ − c) / se(bₖ)"]
B --> C{"Two-tail or<br/>one-tail?"}
C -->|Two-tail| D["Reject if |t| ≥ t_c"]
C -->|Right one-tail| E["Reject if t ≥ t_c"]
C -->|Left one-tail| F["Reject if t ≤ −t_c"]
D --> G{"Reject?"}
E --> G
F --> G
G -->|Yes| H["Evidence against H₀<br/>at α level"]
G -->|No| I["Cannot reject H₀<br/>(does NOT prove H₀)"]
style A fill:#1E5A96,color:#fff
style B fill:#1E5A96,color:#fff
style H fill:#C41E3A,color:#fff
style I fill:#2E8B57,color:#fff
A researcher estimates \(b_2 = 3.5\) with \(\operatorname{se}(b_2) = 1.4\) and \(N = 30\) (\(df = 28\)). Test \(H_0: \beta_2 = 0\) vs \(H_1: \beta_2 \neq 0\) at \(\alpha = 0.05\). The critical value is \(t_{(0.975, 28)} = 2.048\).
Tip: Show Solution
\(t = 3.5 / 1.4 = 2.5\). Since \(|2.5| = 2.5 \ge 2.048\), reject \(H_0\). Equivalently, the 95% CI is \(3.5 \pm 2.048 \times 1.4 = 3.5 \pm 2.87 = [0.63, 6.37]\); since \(0 \notin [0.63, 6.37]\), we reject. The \(p\)-value is \(2 \cdot P(t_{28} \ge 2.5) \approx 0.019 < 0.05\), confirming the rejection. All three methods agree.
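The full solution can be verified end to end; a sketch assuming Python with `scipy` (not part of the exercise itself):

```python
from scipy import stats

b2, se_b2, df, alpha = 3.5, 1.4, 28, 0.05
t = b2 / se_b2                               # 2.5
t_c = stats.t.ppf(1 - alpha / 2, df)         # ~2.048
p = 2 * stats.t.sf(abs(t), df)               # ~0.019
lo, hi = b2 - t_c * se_b2, b2 + t_c * se_b2  # ~[0.63, 6.37]
print(f"t = {t}, t_c = {t_c:.3f}, p = {p:.3f}, CI = [{lo:.2f}, {hi:.2f}]")
print(abs(t) >= t_c, p <= alpha, not (lo <= 0 <= hi))   # True True True
```

All three booleans on the last line are the three decision methods, and as Section 11.4 guarantees, they agree.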