5  The Normal Distribution, Sampling, and the CLT

Why the Bell Curve Shows Up Everywhere

Probability
Distributions
CLT
Author

Jake Anderson

Published

March 21, 2026

Modified

March 26, 2026

Abstract

Household income is right-skewed, yet we constantly assume regression errors are normally distributed. The Central Limit Theorem justifies this: regardless of the population shape, the sample mean is approximately normal for large samples. This chapter develops the CLT through simulation, proves the sampling distribution results, and previews the chi-squared, t, and F distributions that power hypothesis testing.

Note: Prerequisites

You should be comfortable with expected value, variance, and the normal distribution from Chapter 2 and Chapter 3.

5.1 The Normal Distribution Revisited

The normal distribution \(N(\mu, \sigma^2)\) is symmetric and bell-shaped, fully determined by two parameters: \(\mu\) (center) and \(\sigma^2\) (spread). Changing \(\mu\) shifts the curve left or right; changing \(\sigma\) makes it wider or narrower. The shape is always the same bell. The 68-95-99.7 rule gives a quick summary: about 68% of values fall within one standard deviation of the mean, 95% within two, and 99.7% within three.

A useful property: linear combinations of independent normal random variables are also normal. If \(X_1 \sim N(\mu_1, \sigma_1^2)\) and \(X_2 \sim N(\mu_2, \sigma_2^2)\) are independent:

\[ Y = a_1 X_1 + a_2 X_2 \sim N\!\left(a_1\mu_1 + a_2\mu_2,\; a_1^2\sigma_1^2 + a_2^2\sigma_2^2\right) \tag{5.1}\]

If \(X_1\) and \(X_2\) are not independent, add the covariance term \(2a_1 a_2 \operatorname{Cov}(X_1, X_2)\) to the variance (recall Theorem 16.1 from Chapter 3). This closure property is essential when we study the sampling distribution of \(\bar{X}\).

Closure property: Sums of normals are normal. This is why the sampling distribution of \(\bar{X}\) is exactly normal when the population is normal, for any sample size.
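The variance rule in Equation 5.1 can be checked by Monte Carlo. Below is a minimal sketch in plain JavaScript (runnable with Node), separate from the chapter's Observable widget code; the coefficients and parameters are illustrative choices, not values from the text.

```javascript
// Monte Carlo check of Eq. (5.1): Y = a1*X1 + a2*X2 for independent normals.

// Box-Muller: one draw from N(mu, sigma^2). Using log(1 - u1) avoids log(0).
function randNormal(mu, sigma) {
  const u1 = Math.random(), u2 = Math.random();
  const z = Math.sqrt(-2 * Math.log(1 - u1)) * Math.cos(2 * Math.PI * u2);
  return mu + sigma * z;
}

// Illustrative parameters (assumptions for this sketch)
const a1 = 2, a2 = -1;
const mu1 = 1, sigma1 = 2, mu2 = 3, sigma2 = 1;
const reps = 200000;

let sum = 0, sumSq = 0;
for (let r = 0; r < reps; r++) {
  const y = a1 * randNormal(mu1, sigma1) + a2 * randNormal(mu2, sigma2);
  sum += y;
  sumSq += y * y;
}
const meanY = sum / reps;
const varY = sumSq / reps - meanY * meanY;

console.log(meanY.toFixed(2)); // theory: a1*mu1 + a2*mu2 = 2(1) - 1(3) = -1
console.log(varY.toFixed(2));  // theory: a1^2*sigma1^2 + a2^2*sigma2^2 = 16 + 1 = 17
```

The simulated mean and variance should land close to the theoretical values \(-1\) and \(17\); the small discrepancy is Monte Carlo error of order \(1/\sqrt{\text{reps}}\).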

5.2 Standardization

Any \(X \sim N(\mu, \sigma^2)\) can be converted to a standard normal \(Z \sim N(0,1)\) via \(Z = (X - \mu)/\sigma\). This centers the variable at zero and scales to unit variance. Probabilities for any normal distribution are computed by standardizing both endpoints and looking up the standard normal CDF \(\Phi(z) = P(Z \le z)\), as described in Chapter 2.
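A small worked example of standardizing both endpoints, sketched in plain JavaScript. JavaScript has no built-in normal CDF, so this sketch approximates \(\Phi\) via the Abramowitz-Stegun erf formula (accurate to about \(1.5 \times 10^{-7}\)); the \(N(100, 15^2)\) parameters are illustrative.

```javascript
// Phi(z) = 0.5 * (1 + erf(z / sqrt(2))), with erf from the
// Abramowitz-Stegun 7.1.26 approximation.
function phi(z) {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
               - 0.284496736) * t + 0.254829592) * t;
  const p = 0.5 * (1 + (1 - poly * Math.exp(-x * x)));
  return z >= 0 ? p : 1 - p;
}

// Example: X ~ N(100, 15^2). Standardize both endpoints of P(85 < X < 115).
const mu = 100, sigma = 15;
const zLo = (85 - mu) / sigma;   // -1
const zHi = (115 - mu) / sigma;  //  1
const prob = phi(zHi) - phi(zLo);
console.log(prob.toFixed(4));    // ≈ 0.6827, matching the 68-95-99.7 rule
```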

5.3 Sampling Distributions

The sample mean \(\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i\) is a function of the random sample, so it is itself a random variable. If we drew a different sample, we would get a different \(\bar{X}\). The sampling distribution of \(\bar{X}\) describes its probability distribution across all possible samples of size \(n\).

Do not confuse: The population distribution is the distribution of individual observations. The sampling distribution is the distribution of a statistic (like \(\bar{X}\)) across repeated samples.

Two results follow directly from the expectation rules in Chapter 3:

Theorem 5.1 (Mean and Variance of the Sample Mean) \[ E(\bar{X}) = \mu \qquad \text{(unbiased: centered at the truth)} \tag{5.2}\]

\[ \operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n} \qquad \text{(shrinks as } n \text{ grows)} \tag{5.3}\]

Larger samples produce more precise estimates. But what shape does the distribution of \(\bar{X}\) have? If the population is normal, then \(\bar{X}\) is a linear combination of independent normal random variables, so \(\bar{X} \sim N(\mu, \sigma^2/n)\) exactly, for any \(n\).
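Theorem 5.1 holds for any population with finite variance, normal or not. A quick simulation check in plain JavaScript, using a Uniform(0,1) population (\(\mu = 0.5\), \(\sigma^2 = 1/12\)) and an illustrative sample size of \(n = 25\):

```javascript
// Draw many samples of size n from Uniform(0,1) and examine the mean and
// variance of the resulting sample means (the sampling distribution of X-bar).
const n = 25, reps = 100000;
const means = [];
for (let r = 0; r < reps; r++) {
  let sum = 0;
  for (let i = 0; i < n; i++) sum += Math.random();
  means.push(sum / n);
}
const grandMean = means.reduce((a, b) => a + b, 0) / reps;
const varOfMeans = means.reduce((a, b) => a + (b - grandMean) ** 2, 0) / reps;

console.log(grandMean.toFixed(3));    // theory: E(X-bar) = mu = 0.5
console.log((1 / 12 / n).toFixed(5)); // theory: sigma^2 / n ≈ 0.00333
console.log(varOfMeans.toFixed(5));   // should be close to the theory value
```

The simulated variance of the sample means is roughly \(1/n\)-th the population variance, exactly as Equation 5.3 predicts.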

5.4 The Central Limit Theorem

What if the population is not normal? Household income is right-skewed, binary outcomes are discrete, test scores may be bounded. The Central Limit Theorem (CLT) handles all of these cases.

Theorem 5.2 (Central Limit Theorem) Let \(X_1, X_2, \ldots, X_n\) be independent and identically distributed (i.i.d.) random variables with mean \(\mu\) and finite variance \(\sigma^2\). Then as \(n \to \infty\): \[ \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \;\text{ is approximately distributed as }\; N(0, 1) \tag{5.4}\] Equivalently, \(\bar{X} \overset{\text{approx.}}{\sim} N(\mu, \sigma^2/n)\) for large \(n\).

The population can be skewed, bimodal, discrete, continuous, bounded, or unbounded. It does not need to be normal. The CLT applies as long as the population has finite variance. Simulations confirm this: even drawing from an Exponential(1) distribution (heavily right-skewed), the histogram of \(\bar{X}\) is approximately bell-shaped by \(n = 30\) and very close to normal by \(n = 100\).

Warning: How large is “large enough”?

For symmetric populations, \(n \ge 5\) often suffices. For moderately skewed populations, \(n \ge 30\) is a common rule of thumb. For heavily skewed populations, \(n \ge 100\) or more may be needed. In econometrics, sample sizes are typically in the hundreds or thousands, so the CLT approximation is usually excellent.

The CLT is the reason we can do inference in econometrics without knowing the true error distribution. The OLS estimator \(\hat{\beta}\) is roughly a weighted average of the data, similar to a sample mean. The CLT tells us \(\hat{\beta}\) is approximately normally distributed in large samples, even if the error terms are not normal.

Interactive: CLT simulator

This is the flagship widget of the course. Choose a population shape, set a sample size, and watch the sampling distribution of \(\bar{X}\) converge to normal as \(n\) increases.

viewof pop_shape = Inputs.select(["Uniform", "Exponential", "Bimodal"], {label: "Population shape", value: "Exponential"})

viewof clt_n = Inputs.range([1, 200], {value: 5, step: 1, label: "Sample size n"})

clt_reps = 2000

clt_means = {
  const rng = d3.randomLcg(123);
  const means = [];
  for (let r = 0; r < clt_reps; r++) {
    let sum = 0;
    for (let i = 0; i < clt_n; i++) {
      let x;
      if (pop_shape === "Uniform") {
        x = rng();  // Uniform(0,1): μ=0.5, σ²=1/12
      } else if (pop_shape === "Exponential") {
        x = -Math.log(1 - rng());  // Exp(1): μ=1, σ²=1
      } else {
        // Bimodal: mixture of N(-2,0.5²) and N(2,0.5²)
        const component = rng() < 0.5 ? -2 : 2;
        // Box-Muller for normal
        const u1 = rng(), u2 = rng();
        const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
        x = component + 0.5 * z;
      }
      sum += x;
    }
    means.push({xbar: sum / clt_n});
  }
  return means;
}

clt_stats = {
  const mu = pop_shape === "Uniform" ? 0.5 : pop_shape === "Exponential" ? 1.0 : 0.0;
  const sigma2 = pop_shape === "Uniform" ? 1/12 : pop_shape === "Exponential" ? 1.0 : 4.25;
  const se = Math.sqrt(sigma2 / clt_n);
  return {mu, se};
}

clt_normal_overlay = {
  const pts = [];
  const lo = clt_stats.mu - 4 * clt_stats.se;
  const hi = clt_stats.mu + 4 * clt_stats.se;
  const step = (hi - lo) / 200;
  for (let x = lo; x <= hi; x += step) {
    const z = (x - clt_stats.mu) / clt_stats.se;
    const density = Math.exp(-0.5 * z * z) / (clt_stats.se * Math.sqrt(2 * Math.PI));
    pts.push({x, density});
  }
  return pts;
}

clt_histogram = {
  const vals = clt_means.map(d => d.xbar);
  const lo = d3.min(vals);
  const hi = d3.max(vals);
  const nBins = 50;
  const binWidth = (hi - lo) / nBins;
  const bins = d3.bin().domain([lo, hi]).thresholds(nBins)(vals);
  return bins.map(b => ({
    x0: b.x0,
    x1: b.x1,
    density: b.length / (clt_reps * binWidth)
  }));
}

Plot.plot({
  width: 640,
  height: 380,
  x: {label: "Sample mean x̄"},
  y: {label: "Density"},
  marks: [
    Plot.rectY(clt_histogram, {x1: "x0", x2: "x1", y: "density", fill: "#1E5A96", fillOpacity: 0.5}),
    Plot.line(clt_normal_overlay, {x: "x", y: "density", stroke: "#E87722", strokeWidth: 2.5}),
    Plot.ruleY([0])
  ],
  caption: `Histogram of ${clt_reps} sample means (blue) vs N(μ, σ²/n) density (orange curve). n = ${clt_n}.`
})
Figure 5.1: CLT simulator: the distribution of sample means converges to a bell curve regardless of the population shape.

Try it: Start with \(n = 1\) (the histogram matches the population). Slowly increase \(n\) and watch the bell curve emerge. The Exponential case is the most dramatic: a heavily skewed population produces a nearly symmetric sampling distribution by \(n \approx 30\).

5.5 Distributions Built from Normals

The CLT says \(\bar{X}\) is approximately normal in large samples. But there is a practical problem: \(\sigma^2\) is unknown. When we replace \(\sigma\) with the sample standard deviation \(S\), the distribution changes. Three distributions built from normal random variables handle this and related situations.

Let \(Z, Z_1, \ldots, Z_m\) be independent \(N(0,1)\) variables.

\(W = Z_1^2 + \cdots + Z_m^2 \sim \chi^2(m)\). This is the sum of squared standard normals; it is always positive and right-skewed, with \(E(W) = m\) and \(\operatorname{Var}(W) = 2m\).

If \(W \sim \chi^2(m)\) is independent of \(Z\), then \(t = Z / \sqrt{W/m} \sim t(m)\). The \(t\)-distribution is symmetric and bell-shaped like \(N(0,1)\), but with heavier tails. As \(m \to \infty\), \(t(m) \to N(0,1)\).

If \(W_1 \sim \chi^2(m_1)\) and \(W_2 \sim \chi^2(m_2)\) are independent, then \(F = (W_1/m_1) / (W_2/m_2) \sim F(m_1, m_2)\). It is always positive and right-skewed.
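The chi-squared construction can be checked directly from its definition: square and sum \(m\) standard normal draws and compare the sample moments to \(E(W) = m\) and \(\operatorname{Var}(W) = 2m\). A sketch in plain JavaScript with an illustrative \(m = 5\):

```javascript
// Box-Muller draw from N(0,1)
function randStdNormal() {
  const u1 = Math.random(), u2 = Math.random();
  return Math.sqrt(-2 * Math.log(1 - u1)) * Math.cos(2 * Math.PI * u2);
}

// One chi-squared(m) draw: sum of m squared independent N(0,1) draws
function randChiSq(m) {
  let w = 0;
  for (let i = 0; i < m; i++) w += randStdNormal() ** 2;
  return w;
}

const m = 5, reps = 100000;
let sum = 0, sumSq = 0;
for (let r = 0; r < reps; r++) {
  const w = randChiSq(m);
  sum += w;
  sumSq += w * w;
}
const meanW = sum / reps;
const varW = sumSq / reps - meanW * meanW;
console.log(meanW.toFixed(2)); // theory: E(W) = m = 5
console.log(varW.toFixed(2));  // theory: Var(W) = 2m = 10
```

The same recipe extends to \(t\) and \(F\) draws: divide an independent \(N(0,1)\) draw by \(\sqrt{W/m}\), or take a ratio of two independent scaled chi-squared draws.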

flowchart TD
    N["Z ~ N(0,1)<br/>Standard Normal"] --> C["χ² = Z₁² + ... + Zₘ²<br/>Chi-Squared(m)"]
    N --> T["t = Z / √(W/m)<br/>Student's t(m)"]
    C --> T
    C --> F["F = (W₁/m₁) / (W₂/m₂)<br/>F(m₁, m₂)"]

    style N fill:#1E5A96,color:#fff
    style C fill:#D4A84B,color:#fff
    style T fill:#D4A84B,color:#fff
    style F fill:#D4A84B,color:#fff
Figure 5.2: The three distributions derived from the standard normal. Each appears in the inference chapters.

These distributions are the tools for confidence intervals (Chapter 9) and hypothesis tests (Chapter 10). You do not need to memorize their PDFs; you need to know when each arises.


5.6 Practice

A factory fills cereal boxes. The fill weight per box has \(\mu = 368\) g and \(\sigma = 15\) g. The distribution of individual box weights is right-skewed. A quality inspector weighs \(n = 36\) boxes. What is the probability that \(\bar{X} > 375\) g?

By Theorem 5.2: \(\bar{X} \overset{\text{approx.}}{\sim} N(368, 15^2/36) = N(368, 6.25)\). Standardize: \(Z = (375 - 368)/\sqrt{6.25} = 7/2.5 = 2.8\). Then \(P(\bar{X} > 375) = 1 - \Phi(2.8) = 1 - 0.9974 = 0.0026\). Even though individual box weights are skewed, we can compute probabilities about \(\bar{X}\) using the normal distribution.
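The same calculation, sketched in plain JavaScript. The normal CDF here uses the Abramowitz-Stegun erf approximation, since JavaScript has no built-in \(\Phi\):

```javascript
// Phi(z) via the Abramowitz-Stegun 7.1.26 erf approximation
function phi(z) {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
               - 0.284496736) * t + 0.254829592) * t;
  const p = 0.5 * (1 + (1 - poly * Math.exp(-x * x)));
  return z >= 0 ? p : 1 - p;
}

// Cereal-box problem: mu = 368, sigma = 15, n = 36, threshold 375 g
const mu = 368, sigma = 15, n = 36;
const se = sigma / Math.sqrt(n);  // standard error = 15/6 = 2.5
const z = (375 - mu) / se;        // (375 - 368)/2.5 = 2.8
const p = 1 - phi(z);             // P(X-bar > 375)
console.log(z.toFixed(1));        // 2.8
console.log(p.toFixed(4));        // ≈ 0.0026
```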
