18  F-Tests and Joint Hypothesis Testing

When Individual t-Tests Are Not Enough

F-Test
Joint Hypotheses
Multiple Regression
Author

Jake Anderson

Published

March 21, 2026

Modified

March 26, 2026

Abstract

Individual \(t\)-tests can each fail to reject even when two variables are jointly significant. The \(F\)-test solves this by comparing a restricted model (imposing \(H_0\)) to an unrestricted model and asking whether the restrictions cause a statistically meaningful increase in the sum of squared errors.

Note: Prerequisites

You should be comfortable with individual \(t\)-tests in multiple regression and the distinction between individual and joint significance before reading this chapter.

18.1 Why We Need a New Test

In the previous chapter, we saw that EXPER and EXPER\(^2\) can both be individually insignificant while jointly capturing the curvature of the wage-experience profile. Each \(t\)-test asks: “Does this variable contribute, given that the other is already in the model?” When two regressors share most of their variation, neither looks significant alone. The \(F\)-test evaluates them together.

The \(t\)-test limitation: It tests one coefficient at a time. For joint hypotheses (\(J \geq 2\) restrictions), the \(F\)-test is the correct tool.

18.2 The F-Statistic: Restricted vs. Unrestricted

The \(F\)-test compares two regressions. The unrestricted model includes all variables. The restricted model imposes \(H_0\) (for example, setting \(\beta_3 = 0\) and \(\beta_4 = 0\)). Each model produces a sum of squared errors, \(SSE_U\) and \(SSE_R\), and because the unrestricted model has more flexibility, \(SSE_R \geq SSE_U\) always. The question is whether the difference is large enough to be statistically meaningful.

Theorem 18.1 (The F-Statistic) \[ F = \frac{(SSE_R - SSE_U) / J}{SSE_U / (N - K)} \tag{18.1}\]

Under \(H_0\) (with assumptions MR1 through MR6), \(F \sim F_{(J, N-K)}\). Reject \(H_0\) if \(F \geq F_c\).

The numerator measures the fit lost per restriction. The denominator is the baseline noise level per residual degree of freedom.

Components of the \(F\)-statistic:

\(SSE_R\): Sum of squared errors from the restricted model
\(SSE_U\): Sum of squared errors from the unrestricted model
\(J\): Number of restrictions (count the “=” signs in \(H_0\))
\(K\): Number of parameters in the unrestricted model (including intercept)
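Equation 18.1 is straightforward to compute by hand or in code. A minimal sketch in plain JavaScript (the function name is my own):

```javascript
// Compute the F-statistic of Equation 18.1 from restricted and
// unrestricted SSE values. J = number of restrictions, N = sample size,
// K = parameters in the unrestricted model (including the intercept).
function fStat(sseR, sseU, J, N, K) {
  const numerator = (sseR - sseU) / J; // fit lost per restriction
  const denominator = sseU / (N - K);  // noise per residual degree of freedom
  return numerator / denominator;
}

// Numbers from the worked example in Section 18.6:
console.log(fStat(100847.0, 97882.5, 2, 1000, 4).toFixed(2)); // "15.08"
```

Note that fewer restrictions spread the same SSE gap over a smaller \(J\), so the statistic rises; the critical value rises too, which is why the full comparison matters.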

The \(F\)-test is always one-sided (right tail only) because \(SSE_R \geq SSE_U\) guarantees \(F \geq 0\). A large \(F\) means the restrictions cause a big increase in errors relative to noise, so the restricted variables were doing real work.

Right-tail only: The \(F\)-distribution is always non-negative. We reject when \(F\) is large (restrictions hurt the fit), never when it is small.

For a single two-sided restriction, \(F = t^2\) and the critical values satisfy \(F_{(1, N-K)} = t_{(N-K)}^2\). The \(p\)-values are identical. Use \(t\)-tests for single restrictions (they handle one-sided alternatives); use \(F\)-tests for joint restrictions (\(J \geq 2\)).

Interactive: F-Test Calculator

Enter the sum of squared errors from your restricted and unrestricted models, along with the number of restrictions and sample size. The widget computes the \(F\)-statistic and plots it on the \(F\)-distribution with the rejection region shaded.

viewof sse_r_input = Inputs.range([50000, 200000], {value: 100847, step: 100, label: "SSE_R (restricted)"})
viewof sse_u_input = Inputs.range([50000, 200000], {value: 97882, step: 100, label: "SSE_U (unrestricted)"})
viewof j_input = Inputs.range([1, 10], {value: 2, step: 1, label: "J (restrictions)"})
viewof n_f_input = Inputs.range([20, 2000], {value: 1000, step: 10, label: "N (sample size)"})
viewof k_f_input = Inputs.range([2, 20], {value: 4, step: 1, label: "K (parameters in unrestricted)"})

f_result = {
  const J = j_input;
  const nk = n_f_input - k_f_input;
  const F_num = (sse_r_input - sse_u_input) / J;
  const F_den = sse_u_input / nk;
  const F_stat = Math.max(F_num / F_den, 0);

  // Approximate 5% F critical value: lookup for small J (df2 large),
  // Wilson-Hilferty chi-square approximation otherwise
  // (for large df2, F_c is roughly chi2_{0.95}(J) / J)
  let F_crit;
  if (J === 1) F_crit = 3.84;
  else if (J === 2) F_crit = 3.00;
  else if (J === 3) F_crit = 2.60;
  else if (J === 4) F_crit = 2.37;
  else if (J === 5) F_crit = 2.22;
  else {
    const z = 1.645; // 95th percentile of the standard normal
    F_crit = Math.pow(1 - 2 / (9 * J) + z * Math.sqrt(2 / (9 * J)), 3);
  }

  const reject = F_stat >= F_crit;

  // Evaluate the exact F(df1, df2) density on a grid via its log-density
  const df1 = J, df2 = nk;
  const maxX = Math.max(F_stat * 1.5, F_crit * 2, 6);
  const pdf = d3.range(0.01, maxX, 0.05).map(x => {
    // F PDF approximation using the formula
    const logPdf = (df1/2) * Math.log(df1) + (df2/2) * Math.log(df2)
      + (df1/2 - 1) * Math.log(x)
      - ((df1 + df2)/2) * Math.log(df2 + df1 * x)
      - lnBeta(df1/2, df2/2);
    return {x, y: Math.exp(logPdf)};
  });

  function lnBeta(a, b) {
    return lnGamma(a) + lnGamma(b) - lnGamma(a + b);
  }
  function lnGamma(z) {
    // Lanczos approximation of ln Γ(z)
    if (z < 0.5) return Math.log(Math.PI / Math.sin(Math.PI * z)) - lnGamma(1 - z);
    z -= 1;
    const c = [76.18009172947146, -86.50532032941677, 24.01409824083091,
               -1.231739572450155, 0.001208650973866179, -0.000005395239384953];
    let x = 1.000000000190015;
    for (let i = 0; i < 6; i++) x += c[i] / (z + i + 1);
    const t = z + 5.5;
    return 0.5 * Math.log(2 * Math.PI) + (z + 0.5) * Math.log(t) - t + Math.log(x);
  }

  return {F_stat, F_crit, reject, pdf, maxX, df1, df2};
}

Plot.plot({
  width: 650, height: 280,
  x: {label: "F-statistic", domain: [0, f_result.maxX]},
  y: {label: "Density"},
  marks: [
    Plot.areaY(f_result.pdf, {x: "x", y: "y", fill: "#eee"}),
    Plot.areaY(f_result.pdf.filter(d => d.x >= f_result.F_crit), {x: "x", y: "y", fill: "#C41E3A", fillOpacity: 0.3}),
    Plot.line(f_result.pdf, {x: "x", y: "y", stroke: "#333", strokeWidth: 1.5}),
    Plot.ruleX([f_result.F_stat], {stroke: "#1E5A96", strokeWidth: 3}),
    Plot.ruleX([f_result.F_crit], {stroke: "#C41E3A", strokeDasharray: "4,4", strokeWidth: 1.5})
  ]
})
html`<div style="padding:1em; background:${f_result.reject ? '#fde8e8' : '#e8f5e8'}; border-radius:8px; margin-top:0.5em">
  <strong>F-statistic:</strong> ${f_result.F_stat.toFixed(3)} &nbsp;|&nbsp;
  <strong>F-critical (approx, α=0.05):</strong> ${f_result.F_crit.toFixed(2)} &nbsp;|&nbsp;
  <strong>df:</strong> (${f_result.df1}, ${f_result.df2})<br/>
  <strong>Decision:</strong> ${f_result.reject ? "Reject H₀: the restricted variables are jointly significant" : "Fail to reject H₀: restrictions are consistent with the data"}
</div>`
Figure 18.1: F-test calculator. Enter SSE values and restrictions to compute the F-statistic. The shaded region is the 5% rejection zone.

18.3 Testing Overall Significance

The most common application of the \(F\)-test asks whether the model has any explanatory power at all:

\[ H_0: \beta_2 = \beta_3 = \cdots = \beta_K = 0 \qquad H_1: \text{at least one } \beta_k \neq 0 \]

The restricted model under \(H_0\) contains only the intercept (\(y_i = \beta_1 + e_i\)), so \(SSE_R = SST\) and \(J = K - 1\). The \(F\)-statistic simplifies to:

\[ F = \frac{(SST - SSE) / (K - 1)}{SSE / (N - K)} \tag{18.2}\]

The overall \(F\)-test is reported automatically by regression software, usually in an Analysis of Variance (ANOVA) table that shows the regression mean square, the residual mean square (\(\hat{\sigma}^2\)), and their ratio as the overall \(F\)-statistic. Always check it first: if the overall \(F\)-test fails to reject, the regressors are jointly insignificant and the model has no detectable explanatory power.
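Equation 18.2 can also be written in terms of \(R^2\), since \(SST - SSE = R^2 \cdot SST\). A short sketch in plain JavaScript (the SST value is illustrative, not from the chapter):

```javascript
// Overall significance test: restricted model is intercept-only,
// so SSE_R = SST and J = K - 1 (Equation 18.2).
function overallF(sst, sse, N, K) {
  return ((sst - sse) / (K - 1)) / (sse / (N - K));
}

// Equivalent form in terms of R^2 = 1 - SSE/SST:
function overallFfromR2(r2, N, K) {
  return (r2 / (K - 1)) / ((1 - r2) / (N - K));
}

// Illustrative: SST = 220,000 assumed for the N = 1000, K = 4 wage model.
const r2 = 1 - 97882.5 / 220000;
console.log(overallF(220000, 97882.5, 1000, 4).toFixed(1)); // "414.2"
console.log(overallFfromR2(r2, 1000, 4).toFixed(1));        // "414.2"
```

Both forms give the same statistic; the \(R^2\) version is handy when software reports \(R^2\) but not the raw sums of squares.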

flowchart TD
    A["State H₀<br/>(J equality restrictions)"] --> B["Estimate unrestricted model<br/>Get SSE_U, K, N"]
    B --> C["Estimate restricted model<br/>Get SSE_R"]
    C --> D["Compute F = (SSE_R - SSE_U)/J<br/>÷ SSE_U/(N-K)"]
    D --> E{"F ≥ F_c?"}
    E -->|Yes| F["Reject H₀<br/>Restrictions not<br/>supported by data"]
    E -->|No| G["Fail to reject H₀<br/>Restrictions are<br/>consistent with data"]

    style A fill:#1E5A96,color:#fff
    style F fill:#C41E3A,color:#fff
    style G fill:#2E8B57,color:#fff
Figure 18.2: Decision guide for using the F-test. Start with the hypothesis, identify restricted and unrestricted models, compute F, and compare to the critical value.

18.4 More General Restrictions

The \(F\)-test handles any linear equality hypothesis, not just “coefficient equals zero.” For example, in the Big Andy’s Burger Barn model, you might test whether $1,900/month is the optimal advertising level by testing \(H_0: \beta_3 + 3.8\beta_4 = 1\). To obtain \(SSE_R\), substitute the restriction into the original model, rearrange, and run OLS on the transformed equation. The \(F\)-statistic follows the same formula as Equation 18.1.

Warning: Setting up the restricted model

For restrictions of the form \(c_1 \beta_2 + c_2 \beta_3 = r\), solve for one coefficient (e.g., \(\beta_2 = (r - c_2 \beta_3)/c_1\)), substitute into the original model, and rearrange into a form estimable by OLS. The \(SSE\) from this transformed regression is \(SSE_R\).

18.5 Restricted Least Squares

Sometimes economic theory provides parameter restrictions before looking at the data. For example, demand theory implies that if all prices and income double, quantity demanded should not change (homogeneity of degree zero). If you trust the restriction, you can impose it during estimation rather than test it. Solve the restriction for one parameter, substitute, and estimate by OLS on the transformed model. This yields restricted least squares estimates.

If the restriction is correct, the restricted estimator is unbiased with lower variance than unrestricted OLS. If the restriction is wrong, it introduces bias. Use the \(F\)-test to check whether the data support the restriction before relying on restricted estimates.

Caution: The trade-off of imposing restrictions

Correct restriction \(\implies\) lower variance, same bias (efficiency gain). Wrong restriction \(\implies\) bias introduced. Always test the restriction before imposing it.
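The efficiency gain is easy to see in a small simulation. A sketch in plain JavaScript (the data-generating process and all numbers are my own assumptions): when the restriction \(\beta_3 = 0\) is true and the regressors are correlated, the restricted estimator of \(\beta_2\) has a visibly smaller sampling spread than unrestricted OLS, while both stay centered on the truth.

```javascript
// Monte Carlo: true model y = 1.0*x2 + 0*x3 + e (no intercept),
// so the restriction beta3 = 0 is CORRECT. Compare the spread of the
// unrestricted and restricted estimators of beta2 across replications.
function normal() { // Box-Muller standard normal draw
  return Math.sqrt(-2 * Math.log(1 - Math.random())) *
         Math.cos(2 * Math.PI * Math.random());
}

function simulate(reps = 2000, N = 50) {
  const unrestricted = [], restricted = [];
  for (let r = 0; r < reps; r++) {
    const x2 = Array.from({ length: N }, normal);
    const x3 = x2.map(v => 0.8 * v + 0.6 * normal()); // corr(x2, x3) = 0.8
    const y = x2.map(v => v + normal());              // beta2 = 1, beta3 = 0
    // Cross-product sums for OLS without intercept
    let S22 = 0, S33 = 0, S23 = 0, S2y = 0, S3y = 0;
    for (let i = 0; i < N; i++) {
      S22 += x2[i] * x2[i]; S33 += x3[i] * x3[i]; S23 += x2[i] * x3[i];
      S2y += x2[i] * y[i];  S3y += x3[i] * y[i];
    }
    unrestricted.push((S33 * S2y - S23 * S3y) / (S22 * S33 - S23 * S23));
    restricted.push(S2y / S22); // imposes beta3 = 0 (x3 dropped)
  }
  const sd = a => {
    const m = a.reduce((s, v) => s + v, 0) / a.length;
    return Math.sqrt(a.reduce((s, v) => s + (v - m) ** 2, 0) / a.length);
  };
  return { sdU: sd(unrestricted), sdR: sd(restricted) };
}

const { sdU, sdR } = simulate();
console.log(sdU > sdR); // restricted estimator has the smaller spread
```

Rerunning with a wrong restriction (e.g., a nonzero \(\beta_3\)) would shift the restricted estimates away from 1, illustrating the bias half of the trade-off.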

18.6 Practice

A researcher estimates a wage model with EDUC, EXPER, and EXPER\(^2\) (\(K = 4\), \(N = 1000\)) and gets \(SSE_U = 97{,}882.50\). Dropping both experience terms gives \(SSE_R = 100{,}847.00\). Test whether experience is jointly significant at \(\alpha = 0.05\).

There are \(J = 2\) restrictions (\(\beta_3 = 0\) and \(\beta_4 = 0\)). The \(F\)-statistic is:

\[ F = \frac{(100{,}847.00 - 97{,}882.50) / 2}{97{,}882.50 / 996} = \frac{2{,}964.50 / 2}{98.27} = \frac{1{,}482.25}{98.27} = 15.08 \]

The critical value \(F_{(0.95, 2, 996)} \approx 3.00\). Since \(15.08 > 3.00\), we reject \(H_0\). The experience terms are jointly significant. Do not drop them, even though individual \(t\)-tests might be insignificant.
