11  Hypothesis Testing

Is the Relationship Real, or Just Noise?

Inference
Hypothesis Testing
p-values
Author

Jake Anderson

Published

March 21, 2026

Modified

March 26, 2026

Abstract

A confidence interval tells us where a parameter plausibly lives. A hypothesis test asks a sharper question: is the data compatible with a specific claim, or should we reject it? This chapter develops the anatomy of a hypothesis test, walks through two-sided and one-sided examples using the food expenditure data, shows that three decision methods always agree, defines p-values precisely, and clarifies the tradeoff between Type I and Type II errors.

Note: Prerequisites

You should be comfortable with the \(t\)-distribution and confidence intervals from Chapter 9.

11.1 From Estimation to Testing

From the food expenditure regression (\(N = 40\), \(df = 38\)): \(b_2 = 10.21\), \(\operatorname{se}(b_2) = 2.09\), 95% CI \(= [5.97, 14.45]\). A confidence interval tells us where \(\beta_2\) plausibly lives. A hypothesis test asks a yes/no question: is the data compatible with a specific claim about \(\beta_2\), or should we reject that claim?

Consider two regressions. Regression A has \(b_2 = 12.50\) with \(\operatorname{se}(b_2) = 8.40\); Regression B has \(b_2 = 2.10\) with \(\operatorname{se}(b_2) = 0.35\). Regression A has the bigger coefficient, but its estimate is noisy (\(b_2\) could easily be zero). Regression B’s estimate is small but precise. \(\implies\) We need a formal measure of evidence relative to noise.

Signal vs noise: The \(t\)-statistic measures signal (how far \(b_k\) is from the null value) relative to noise (\(\operatorname{se}(b_k)\)). A large \(|t|\) means the signal dominates the noise.
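A quick numerical check makes the contrast concrete. This is a small Python sketch using the two pairs of numbers quoted above (the variable names are mine):

```python
# Signal vs noise for the two regressions above (values from the text).
b2_A, se_A = 12.50, 8.40   # Regression A: large coefficient, noisy estimate
b2_B, se_B = 2.10, 0.35    # Regression B: small coefficient, precise estimate

# t-statistic against the null value c = 0: signal relative to noise
t_A = (b2_A - 0) / se_A
t_B = (b2_B - 0) / se_B

print(round(t_A, 2))  # about 1.49: the signal barely exceeds the noise
print(round(t_B, 2))  # 6.0: the signal dominates the noise
```

Despite its larger coefficient, Regression A's \(|t|\) is below any conventional critical value, while Regression B's is far above.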

11.2 Anatomy of a Hypothesis Test

Every hypothesis test has five components:

Definition 11.1 (Five Components of a Hypothesis Test)  

  1. Null hypothesis (\(H_0: \beta_k = c\)): the claim we put on trial; it always contains an equality
  2. Alternative hypothesis (\(H_1\)): what we accept if we reject \(H_0\) (one-sided or two-sided)
  3. Test statistic: \(t = (b_k - c) / \operatorname{se}(b_k)\)
  4. Decision rule: rejection region or \(p\)-value threshold
  5. Conclusion: reject or do not reject \(H_0\)

The test statistic:

\[ t = \frac{b_k - c}{\operatorname{se}(b_k)} \sim t_{(N-2)} \quad \text{if } H_0 \text{ is true} \tag{11.1}\]

The numerator measures how far our estimate is from the null value; the denominator scales this distance by the estimation precision. If \(H_0\) is true, \(t\) should be close to zero. If \(H_0\) is false, \(|t|\) will tend to be large. A decision rule (rejection region or \(p\)-value threshold) determines whether \(|t|\) is large enough to reject, and we reach a conclusion.

Think of it as a trial: \(H_0\) is “innocent until proven guilty.” We need strong evidence to convict.

11.3 Two-Sided Tests

Testing \(H_0: \beta_2 = 0\) (does income affect food spending at all?). The test statistic is \(t = 10.21 / 2.09 = 4.88\). At \(\alpha = 0.05\), the critical value is \(t_c = 2.024\). Since \(|4.88| \ge 2.024\), we reject \(H_0\). There is a statistically significant relationship between income and food expenditure.

Testing \(H_0: \beta_2 = 7.5\) (consultant’s claim). The test statistic is \(t = (10.21 - 7.5) / 2.09 = 1.29\). Since \(|1.29| < 2.024\), we do not reject \(H_0\). The data are consistent with \(\beta_2 = 7.5\), but also with \(\beta_2 = 8.5\) (\(t = 0.82\)) or any value inside the confidence interval. Not rejecting does not prove the null is true.
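Both tests follow the same mechanical recipe, which can be sketched as a short Python helper (the function name is mine; the inputs are the rounded values reported above, so the \(t\)-statistics can differ from the text in the last digit):

```python
def t_test_two_sided(b, se, c, t_crit):
    """Return the t-statistic and the reject decision
    for H0: beta = c against H1: beta != c."""
    t = (b - c) / se
    return t, abs(t) >= t_crit

# Food expenditure regression: b2 = 10.21, se(b2) = 2.09, t_c = 2.024 (df = 38)
t1, reject1 = t_test_two_sided(10.21, 2.09, 0.0, 2.024)  # H0: beta2 = 0
t2, reject2 = t_test_two_sided(10.21, 2.09, 7.5, 2.024)  # H0: beta2 = 7.5

print(round(t1, 2), reject1)  # ~4.89, True: reject
print(round(t2, 2), reject2)  # ~1.30, False: do not reject
```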

11.4 Three Equivalent Decision Methods

For a two-sided test at level \(\alpha\), three methods always give the same answer:

  1. Rejection region: reject if \(|t| \ge t_c\). For \(H_0: \beta_2 = 0\): \(|4.88| \ge 2.024\) \(\implies\) reject.
  2. \(p\)-value: reject if \(p \le \alpha\). For \(H_0: \beta_2 = 0\): \(p = 0.00002 \le 0.05\) \(\implies\) reject.
  3. Confidence interval: reject if the null value \(c\) falls outside the CI. For \(H_0: \beta_2 = 0\): \(0 \notin [5.97, 14.45]\) \(\implies\) reject.

These are three windows onto the same test. They always agree because they are algebraic rearrangements of the same inequality: \(|b_k - c| \ge t_c \cdot \operatorname{se}(b_k)\).

All three methods test the same condition. Start from the rejection region:

\[|t| \ge t_c \iff \left|\frac{b_k - c}{\operatorname{se}(b_k)}\right| \ge t_c \iff |b_k - c| \ge t_c \cdot \operatorname{se}(b_k)\]

The CI method: the interval \(b_k \pm t_c \cdot \operatorname{se}(b_k)\) excludes \(c\) exactly when \(|b_k - c| > t_c \cdot \operatorname{se}(b_k)\), which is the same condition (the strict and weak inequalities differ only in the boundary case \(|t| = t_c\), which has probability zero).

The \(p\)-value method: \(p \le \alpha\) exactly when \(|t| \ge t_c\) (since \(t_c\) is defined as the value where the tail area equals \(\alpha/2\)).

All three are equivalent statements of \(|b_k - c| \ge t_c \cdot \operatorname{se}(b_k)\). \(\square\)
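The equivalence of the rejection-region and confidence-interval methods is also easy to verify numerically. A minimal Python check using the food expenditure numbers (the null values tried are arbitrary, chosen away from the exact boundary \(|t| = t_c\)):

```python
b, se, t_crit = 10.21, 2.09, 2.024   # food expenditure example, df = 38

def reject_by_t(c):
    """Rejection-region method: reject when |t| >= t_c."""
    return abs((b - c) / se) >= t_crit

def reject_by_ci(c):
    """CI method: reject when the null value c falls outside b +/- t_c * se."""
    lo, hi = b - t_crit * se, b + t_crit * se
    return c < lo or c > hi

# Away from the measure-zero boundary, the two rules coincide
for c in [0.0, 5.5, 7.5, 8.5, 14.45, 20.0]:
    assert reject_by_t(c) == reject_by_ci(c)
```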

Interactive: rejection region visualizer

Adjust the significance level \(\alpha\) and choose one-tail or two-tail. Enter an observed \(t\)-statistic to see whether it falls in the rejection region.

viewof alpha_level = Inputs.range([0.01, 0.20], {value: 0.05, step: 0.01, label: "Significance level α"})
viewof tail_type = Inputs.radio(["Two-tail", "Right one-tail", "Left one-tail"], {label: "Test type", value: "Two-tail"})
viewof obs_t = Inputs.range([-5, 5], {value: 2.5, step: 0.1, label: "Observed t-statistic"})

t_critical = {
  // Approximate t critical value using normal approximation (good for df > 30)
  const p = tail_type === "Two-tail" ? 1 - alpha_level/2 : 1 - alpha_level;
  const t_val = Math.sqrt(-2 * Math.log(1 - p));
  const c0 = 2.515517, c1 = 0.802853, c2 = 0.010328;
  const d1 = 1.432788, d2 = 0.189269, d3 = 0.001308;
  return t_val - (c0 + c1*t_val + c2*t_val*t_val) / (1 + d1*t_val + d2*t_val*t_val + d3*t_val*t_val*t_val);
}

reject_decision = {
  if (tail_type === "Two-tail") return Math.abs(obs_t) >= t_critical;
  if (tail_type === "Right one-tail") return obs_t >= t_critical;
  return obs_t <= -t_critical;
}

t_density_data = {
  const pts = [];
  for (let x = -5; x <= 5; x += 0.02) {
    // t density approximation (normal for simplicity at df=38)
    const density = Math.exp(-0.5 * x * x) / Math.sqrt(2 * Math.PI);
    let shaded = false;
    if (tail_type === "Two-tail") {
      shaded = Math.abs(x) >= t_critical;
    } else if (tail_type === "Right one-tail") {
      shaded = x >= t_critical;
    } else {
      shaded = x <= -t_critical;
    }
    pts.push({x, density, shaded, shade_density: shaded ? density : 0});
  }
  return pts;
}

Plot.plot({
  width: 640,
  height: 380,
  x: {label: "t", domain: [-5, 5]},
  y: {label: "Density"},
  marks: [
    Plot.areaY(t_density_data, {x: "x", y: "density", fill: "#1E5A96", fillOpacity: 0.15}),
    Plot.areaY(t_density_data, {x: "x", y: "shade_density", fill: "#C41E3A", fillOpacity: 0.4}),
    Plot.line(t_density_data, {x: "x", y: "density", stroke: "#1E5A96", strokeWidth: 2}),
    Plot.ruleX([obs_t], {stroke: reject_decision ? "#C41E3A" : "#2E8B57", strokeWidth: 3}),
    Plot.ruleY([0]),
    Plot.text(
      [reject_decision ? "REJECT H₀" : "Do not reject H₀"],
      {x: obs_t, y: 0.42, fill: reject_decision ? "#C41E3A" : "#2E8B57", fontWeight: "bold", fontSize: 15, textAnchor: "middle"}
    ),
    Plot.text(
      [`t_c = ${tail_type === "Two-tail" ? "±" : ""}${t_critical.toFixed(3)}`],
      {x: 3.5, y: 0.35, fill: "#888", fontSize: 12}
    )
  ],
  caption: `α = ${alpha_level}, ${tail_type}. Critical value ≈ ${t_critical.toFixed(3)}. Observed t = ${obs_t.toFixed(1)}. ${reject_decision ? "Reject H₀." : "Do not reject H₀."}`
})
Figure 11.1: Rejection region visualizer. The shaded area(s) show where we reject H₀. Enter your t-statistic to see the verdict.

Try it: Set \(\alpha = 0.05\) (two-tail) and slide the observed \(t\) from 0 to 3. The vertical line changes from green (“do not reject”) to red (“reject”) when \(|t|\) crosses the critical value near 1.96 (the widget uses a normal approximation; the exact \(t_{(0.975, 38)}\) critical value is 2.024).

11.5 The \(p\)-Value

Definition 11.2 (\(p\)-Value) The \(p\)-value is the probability of observing a test statistic at least as extreme as the one we calculated, assuming \(H_0\) is true.

Small \(p\) means the observed \(t\) would be very unlikely under \(H_0\), providing strong evidence against it. Large \(p\) means the observed \(t\) is not unusual under \(H_0\), giving no reason to doubt it.

The direction of \(H_1\) determines which tail(s) to measure:

Computing \(p\)-values for each type of alternative.

| Alternative | \(p\)-value formula | Tail(s) |
|---|---|---|
| \(H_1: \beta_k > c\) | \(p = P(t_{(N-2)} \ge t)\) | Right |
| \(H_1: \beta_k < c\) | \(p = P(t_{(N-2)} \le t)\) | Left |
| \(H_1: \beta_k \neq c\) | \(p = 2 \cdot P(t_{(N-2)} \ge \lvert t \rvert)\) | Both |

Caution: One-tail \(p\)-value from software output

Software regression output reports the two-tail \(p\)-value for \(H_0: \beta_k = 0\) by default. For a one-tail test, divide by 2 only when the sign of \(t\) agrees with your alternative. If \(t\) has the wrong sign, the one-tail \(p\)-value is \(1 - p_{\text{two-tail}}/2 > 0.5\), and you cannot reject.
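This sign check is easy to get wrong by hand, so here is a small Python helper encoding the rule (the function name and example numbers are mine, not from the chapter's data):

```python
def one_tail_p(p_two_tail, t, right_tail=True):
    """Convert a reported two-tail p-value (for H0: beta_k = c) into a
    one-tail p-value. right_tail=True means the alternative H1: beta_k > c."""
    sign_agrees = (t > 0) if right_tail else (t < 0)
    if sign_agrees:
        return p_two_tail / 2      # halve only when t matches H1's direction
    return 1 - p_two_tail / 2      # wrong sign: p > 0.5, cannot reject

# Hypothetical software output: two-tail p = 0.04 with t = 2.1
print(round(one_tail_p(0.04, 2.1, right_tail=True), 2))   # 0.02: reject at 5%
print(round(one_tail_p(0.04, 2.1, right_tail=False), 2))  # 0.98: cannot reject
```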

11.6 One-Sided Tests

When economic theory predicts the sign of the effect (income should increase food spending), a one-sided test concentrates all \(\alpha\) in a single tail. For \(H_1: \beta_k > c\), reject if \(t \ge t_{(1-\alpha, N-2)}\). For \(H_1: \beta_k < c\), reject if \(t \le -t_{(1-\alpha, N-2)}\). One-sided tests have a lower critical value (for example, 1.686 vs 2.024 at \(\alpha = 0.05\) with \(df = 38\)), making it easier to reject in the predicted direction. Use one-sided only when theory gives a clear directional prediction before seeing the data.

One-tail vs two-tail: One-tail is more powerful in the predicted direction, but cannot detect effects in the opposite direction. Use one-tail only with strong prior theoretical justification.
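The power difference shows up whenever the \(t\)-statistic lands between the two critical values. A hypothetical illustration in Python, using the \(\alpha = 0.05\), \(df = 38\) critical values quoted above (the value \(t = 1.90\) is made up):

```python
t = 1.90                           # hypothetical observed t-statistic
t_c_one, t_c_two = 1.686, 2.024    # one-tail and two-tail critical values

print(t >= t_c_one)        # True: the right one-tail test rejects
print(abs(t) >= t_c_two)   # False: the two-tail test does not reject
```

The same estimate is “significant” under one rule and not the other, which is why the choice between them must be made on theoretical grounds before seeing the data.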

11.7 Type I and Type II Errors

The two types of errors in hypothesis testing.

| Decision | \(H_0\) is actually true | \(H_0\) is actually false |
|---|---|---|
| Reject \(H_0\) | Type I error (probability \(= \alpha\)) | Correct |
| Do not reject \(H_0\) | Correct | Type II error (probability \(= \beta\)) |

A Type I error (false positive) means rejecting a true \(H_0\); its probability is \(\alpha\), which we control by choosing the significance level. A Type II error (false negative) means failing to reject a false \(H_0\); its probability depends on the true parameter value and is not directly controlled. Power \(= 1 - \beta\) is the probability of correctly rejecting a false \(H_0\).
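The claim that the Type I error rate equals \(\alpha\) can be checked by simulation. A sketch in Python: to stay within the standard library it uses a z-test with known variance rather than the chapter's \(t\)-test, and all simulation settings (seed, sample size, number of replications) are mine:

```python
import random

random.seed(42)                  # reproducible
alpha, z_crit = 0.05, 1.96       # two-tail critical value (standard normal)
n_sims, n = 10_000, 50

rejections = 0
for _ in range(n_sims):
    # Draw a sample from N(0, 1), so H0: mu = 0 is TRUE by construction
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(sample) / n) / (1 / n ** 0.5)   # z = mean / se, sigma known
    if abs(z) >= z_crit:
        rejections += 1                       # a Type I error

print(rejections / n_sims)       # close to alpha = 0.05
```

Rerunning with a smaller \(\alpha\) (and the matching critical value) drives the false-positive rate down, at the cost of more Type II errors when \(H_0\) is false.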

Warning: The tradeoff between Type I and Type II errors

There is always a tradeoff: lowering \(\alpha\) reduces false positives but increases false negatives. The choice of \(\alpha\) should reflect the relative costs of the two error types. In the supermarket example from the slides (testing whether income raises food spending by more than $5.50), the cost of a false positive (building an unprofitable store) is high, so a conservative \(\alpha = 0.01\) is appropriate.

\(\implies\) “Do not reject” is weaker than “reject.” It means the data cannot distinguish \(\beta_2\) from the null value, not that \(\beta_2\) is the null value. Always report the magnitude of \(b_k\) alongside the \(t\)-statistic; statistical significance tells you whether the effect is distinguishable from zero, not whether it is large enough to care about.

flowchart TD
    A["State H₀ and H₁"] --> B["Compute t = (bₖ − c) / se(bₖ)"]
    B --> C{"Two-tail or<br/>one-tail?"}
    C -->|Two-tail| D["Reject if |t| ≥ t_c"]
    C -->|Right one-tail| E["Reject if t ≥ t_c"]
    C -->|Left one-tail| F["Reject if t ≤ −t_c"]
    D --> G{"Reject?"}
    E --> G
    F --> G
    G -->|Yes| H["Evidence against H₀<br/>at α level"]
    G -->|No| I["Cannot reject H₀<br/>(does NOT prove H₀)"]

    style A fill:#1E5A96,color:#fff
    style B fill:#1E5A96,color:#fff
    style H fill:#C41E3A,color:#fff
    style I fill:#2E8B57,color:#fff
Figure 11.2: Hypothesis testing decision flowchart.

11.8 Practice

A researcher estimates \(b_2 = 3.5\) with \(\operatorname{se}(b_2) = 1.4\) and \(N = 30\) (\(df = 28\)). Test \(H_0: \beta_2 = 0\) vs \(H_1: \beta_2 \neq 0\) at \(\alpha = 0.05\). The critical value is \(t_{(0.975, 28)} = 2.048\).

\(t = 3.5 / 1.4 = 2.5\). Since \(|2.5| = 2.5 \ge 2.048\), reject \(H_0\). Equivalently, the 95% CI is \(3.5 \pm 2.048 \times 1.4 = 3.5 \pm 2.87 = [0.63, 6.37]\); since \(0 \notin [0.63, 6.37]\), we reject. The \(p\)-value is \(2 \cdot P(t_{28} \ge 2.5) \approx 0.019 < 0.05\), confirming the rejection. All three methods agree.
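The arithmetic in this solution can be verified with a few lines of Python (only the numbers given in the exercise are used):

```python
b2, se, t_crit = 3.5, 1.4, 2.048   # estimate, std. error, t_(0.975, 28)

t = b2 / se
print(round(t, 2), abs(t) >= t_crit)       # 2.5 True: reject H0

half = t_crit * se                          # CI half-width = 2.87
ci_lo, ci_hi = b2 - half, b2 + half
print(round(ci_lo, 2), round(ci_hi, 2))     # 0.63 6.37: zero is outside
```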
