flowchart LR
A["PMF f(x)<br/>(discrete)"] -->|"Sum probabilities"| C["CDF F(x)<br/>= P(X ≤ x)"]
B["PDF f(x)<br/>(continuous)"] -->|"Integrate"| C
C -->|"F(b) − F(a)"| D["P(a < X ≤ b)"]
C -->|"1 − F(x)"| E["P(X > x)"]
style A fill:#1E5A96,color:#fff
style B fill:#1E5A96,color:#fff
style C fill:#D4A84B,color:#fff
style D fill:#2E8B57,color:#fff
style E fill:#2E8B57,color:#fff
3 Random Variables and Distributions
The Language of Uncertainty
Before we can build a statistical model, we need a language for describing uncertainty. This chapter reviews random variables (discrete and continuous), probability distributions (PMFs, PDFs, and CDFs), and introduces the normal distribution and standardization. These are the building blocks for every estimator and test in the course.
3.1 Motivation: What Will You Earn?
You are about to graduate from UCLA with an economics degree. What will your starting salary be? You cannot know the exact number, but you can say something useful: “probably between $45,000 and $75,000,” or “most likely around $55,000.” You are already thinking in terms of a probability distribution. Econometrics formalizes this intuition.
3.2 Random Variables
Definition 3.1 (Random Variable) A random variable is a variable whose value is unknown until it is observed. Before you check your first paycheck, \(\text{Wage}\) is a random variable. After you see it, \(\text{wage} = \$58{,}000\) is a realization.
The convention: uppercase letters (\(X\), \(Y\)) denote the random variable (uncertain); lowercase (\(x\), \(y\)) denote a specific realized value.
Notation reminder: \(X\) = random variable (uncertain), \(x\) = observed value (fixed). This convention appears throughout the course.
Random variables come in two types. A discrete random variable takes a countable number of values: the number of job offers a graduate receives (\(X \in \{0, 1, 2, 3, \ldots\}\)), the number of cars owned by a household, or an indicator variable (\(D = 1\) if the person is a college graduate, \(D = 0\) otherwise). A continuous random variable can take any value in an interval: starting salary, stock returns, GDP growth rate, household expenditure. Most economic variables we study in econometrics (wages, prices, GDP, returns) are treated as continuous.
3.3 Discrete Distributions: The Probability Mass Function
For a discrete random variable, the probability mass function (PMF) gives the probability of each possible value:
\[ f(x) = P(X = x) \tag{3.1}\]
Every PMF must satisfy two properties: \(0 \le f(x) \le 1\) for all \(x\), and the probabilities sum to one (\(\sum_{\text{all } x} f(x) = 1\)). For example, if the number of job offers \(X\) has PMF \(f(0) = 0.10\), \(f(1) = 0.30\), \(f(2) = 0.40\), \(f(3) = 0.20\), you can read probabilities directly: \(P(X \le 2) = 0.10 + 0.30 + 0.40 = 0.80\).
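A PMF like this is just a lookup table, and both properties are easy to check in code. A minimal sketch in plain JavaScript, using the job-offers numbers from the example (variable names are ours):

```javascript
// PMF for the number of job offers, stored as a lookup table
const pmf = {0: 0.10, 1: 0.30, 2: 0.40, 3: 0.20};

// Property check: probabilities sum to one
const total = Object.values(pmf).reduce((s, p) => s + p, 0);

// Read probabilities straight off the table, e.g. P(X <= 2)
const pAtMost2 = pmf[0] + pmf[1] + pmf[2]; // 0.80
```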
In practice: Discrete distributions arise whenever the outcome is a count or a category. Binary outcomes (0/1) use the Bernoulli distribution; count data use the Poisson or Binomial.
3.4 Continuous Distributions: The Probability Density Function
For continuous random variables, \(P(X = x) = 0\) for every single value \(x\). (There are uncountably many possible values in any interval; if each got positive probability, the total would be infinite.) Instead, probabilities are areas under a curve:
\[ P(a < X < b) = \text{area under } f(x) \text{ between } a \text{ and } b \tag{3.2}\]
The probability density function (PDF) satisfies \(f(x) \ge 0\) for all \(x\) and total area under the curve equals 1. Notice that \(f(x)\) is not itself a probability; it is a density. Only the area under \(f(x)\) over an interval gives a probability.
For continuous random variables, \(f(x)\) can exceed 1 (e.g., a Uniform(0, 0.5) distribution has \(f(x) = 2\) on its support). What must equal 1 is the total area, not the height of the curve at any point.
For continuous \(X\), since \(P(X = c) = 0\), strict and weak inequalities give the same answer: \(P(a < X < b) = P(a \le X \le b)\).
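The Uniform(0, 0.5) example can be verified numerically: the density sits at 2 everywhere on its support, yet a Riemann sum over the support still returns a total area of 1. A sketch in plain JavaScript (grid size and names are our own choices):

```javascript
// Uniform(0, 0.5): density f(x) = 1/(b - a) = 2 on the support
const a = 0, b = 0.5;
const f = (x) => (x >= a && x <= b ? 1 / (b - a) : 0);

// Midpoint Riemann sum approximating the total area under f
const n = 10000;
const h = (b - a) / n;
let area = 0;
for (let i = 0; i < n; i++) area += f(a + (i + 0.5) * h) * h;
// area is (numerically) 1, even though f(x) = 2 > 1 on the support
```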
3.5 The Cumulative Distribution Function
The cumulative distribution function (CDF) gives the probability that \(X\) is at most a given value:
\[ F(x) = P(X \le x) \tag{3.3}\]
The CDF works identically for both discrete and continuous random variables. It is always non-decreasing, approaches 0 as \(x \to -\infty\), and approaches 1 as \(x \to +\infty\). Probabilities over intervals come directly from the CDF: \(P(a < X \le b) = F(b) - F(a)\).
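The interval formula \(P(a < X \le b) = F(b) - F(a)\) is easiest to see with a distribution whose CDF has a closed form. A sketch using the Exponential CDF \(F(x) = 1 - e^{-\lambda x}\) (the rate \(\lambda = 1\) is an arbitrary choice):

```javascript
// Exponential(λ) CDF: F(x) = 1 - exp(-λx) for x >= 0, else 0
const lambda = 1;
const F = (x) => (x >= 0 ? 1 - Math.exp(-lambda * x) : 0);

// Interval probability straight from the CDF
const pInterval = F(2) - F(1); // P(1 < X <= 2)

// Complement rule: P(X > 1) = 1 - F(1)
const pTail = 1 - F(1);
```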
3.6 The Normal Distribution
The normal distribution dominates econometrics for three reasons. First, the Central Limit Theorem (covered in Chapter 4) says sample averages are approximately normal for large samples. Since many econometric estimators are functions of averages, normality shows up everywhere. Second, sums of independent normal random variables are normal, which makes deriving estimator distributions tractable. Third, many economic variables (log wages, measurement errors, test scores) have approximately bell-shaped distributions.
Definition 3.2 (Normal Distribution) If \(X \sim N(\mu, \sigma^2)\), the distribution is completely determined by two parameters: the mean \(\mu\) (center) and the variance \(\sigma^2\) (spread). The distribution is symmetric around \(\mu\).
The 68-95-99.7 rule applies: about 68% of values fall within one standard deviation, 95% within two, and 99.7% within three.
The 68-95-99.7 rule gives a quick sanity check for any bell-shaped distribution. If a value is more than 3 standard deviations from the mean, it is extremely unusual.
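A quick Monte Carlo check of the rule, as a sketch: standard normal draws come from the Box-Muller transform, and the sample size is an arbitrary choice.

```javascript
// Draw a standard normal via the Box-Muller transform
function randn() {
  const u = 1 - Math.random(); // shift to (0, 1] to avoid log(0)
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Count how many |Z| fall within 1, 2, and 3 standard deviations
const n = 200000;
const within = [0, 0, 0];
for (let i = 0; i < n; i++) {
  const z = Math.abs(randn());
  if (z <= 1) within[0]++;
  if (z <= 2) within[1]++;
  if (z <= 3) within[2]++;
}
const shares = within.map((c) => c / n); // close to 0.68, 0.95, 0.997
```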
Any normal variable can be converted to a standard normal \(Z \sim N(0, 1)\) via:
\[ Z = \frac{X - \mu}{\sigma} \tag{3.4}\]
This centers the variable at zero and scales to unit variance. For any \(X \sim N(\mu, \sigma^2)\), probabilities are computed by standardizing and using \(\Phi(z) = P(Z \le z)\), the standard normal CDF:
\[ P(a \le X \le b) = \Phi\!\left(\frac{b - \mu}{\sigma}\right) - \Phi\!\left(\frac{a - \mu}{\sigma}\right) \tag{3.5}\]
Every normal probability reduces to the same two steps: standardize, then look up \(\Phi\).
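The standardize-then-look-up recipe in Equation 3.5 can be coded directly. The sketch below reuses the same polynomial approximation to \(\Phi\) that the interactive plot code further down employs; the function names are ours:

```javascript
// Standard normal CDF Φ(z), via a polynomial approximation
// (the same one used in the plotting code for this chapter)
function phi(z) {
  const t = 1 / (1 + 0.2316419 * Math.abs(z));
  const d = 0.3989422804 * Math.exp(-0.5 * z * z);
  const p = d * t * (0.3193815 + t * (-0.3565638 + t * (1.781478 +
            t * (-1.821256 + t * 1.330274))));
  return z > 0 ? 1 - p : p;
}

// Equation (3.5): P(a <= X <= b) for X ~ N(μ, σ²)
function normalProb(a, b, mu, sigma) {
  return phi((b - mu) / sigma) - phi((a - mu) / sigma);
}
```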
Interactive: distribution explorer
Choose a distribution and adjust its parameters to see how the Probability Density Function (PDF) and Cumulative Distribution Function (CDF) change in real time.
viewof dist_type = Inputs.select(["Normal", "Exponential", "Uniform"], {label: "Distribution", value: "Normal"})
viewof param1 = Inputs.range(
  dist_type === "Normal" ? [-5, 5] : dist_type === "Exponential" ? [0.1, 5] : [0, 5],
  {value: dist_type === "Normal" ? 0 : dist_type === "Exponential" ? 1 : 0,
   step: 0.1,
   label: dist_type === "Normal" ? "μ (mean)" : dist_type === "Exponential" ? "λ (rate)" : "a (lower bound)"}
)
viewof param2 = Inputs.range(
  dist_type === "Normal" ? [0.5, 5] : dist_type === "Uniform" ? [1, 10] : [0, 1],
  {value: dist_type === "Normal" ? 1 : dist_type === "Uniform" ? 5 : 0,
   step: 0.1,
   label: dist_type === "Normal" ? "σ (std dev)" : dist_type === "Uniform" ? "b (upper bound)" : "(unused)",
   disabled: dist_type === "Exponential"}
)
dist_data = {
  const xs = [];
  let lo, hi;
  if (dist_type === "Normal") {
    lo = param1 - 4 * param2;
    hi = param1 + 4 * param2;
  } else if (dist_type === "Exponential") {
    lo = 0;
    hi = 5 / param1 + 1;
  } else {
    lo = param1 - 0.5;
    hi = (param2 > param1 ? param2 : param1 + 1) + 0.5;
  }
  const step = (hi - lo) / 300;
  for (let x = lo; x <= hi; x += step) {
    let pdf, cdf;
    if (dist_type === "Normal") {
      const z = (x - param1) / param2;
      pdf = Math.exp(-0.5 * z * z) / (param2 * Math.sqrt(2 * Math.PI));
      // approximate the normal CDF with a polynomial approximation
      const t = 1 / (1 + 0.2316419 * Math.abs(z));
      const d = 0.3989422804 * Math.exp(-0.5 * z * z);
      const p = d * t * (0.3193815 + t * (-0.3565638 + t * (1.781478 + t * (-1.821256 + t * 1.330274))));
      cdf = z > 0 ? 1 - p : p;
    } else if (dist_type === "Exponential") {
      pdf = x >= 0 ? param1 * Math.exp(-param1 * x) : 0;
      cdf = x >= 0 ? 1 - Math.exp(-param1 * x) : 0;
    } else {
      const a = param1, b = param2 > param1 ? param2 : param1 + 1;
      pdf = (x >= a && x <= b) ? 1 / (b - a) : 0;
      cdf = x < a ? 0 : x > b ? 1 : (x - a) / (b - a);
    }
    xs.push({x, pdf, cdf});
  }
  return xs;
}
Plot.plot({
  width: 640,
  height: 300,
  // let the axes follow the data: the previous fixed domains clipped
  // wide normals and the Exponential density (f(0) = λ can exceed 1.2)
  x: {label: "x"},
  y: {label: "f(x)"},
  marks: [
    Plot.areaY(dist_data, {x: "x", y: "pdf", fill: "#1E5A96", fillOpacity: 0.2}),
    Plot.line(dist_data, {x: "x", y: "pdf", stroke: "#1E5A96", strokeWidth: 2}),
    Plot.ruleY([0])
  ],
  caption: "Probability Density Function (PDF)"
})
Plot.plot({
  width: 640,
  height: 300,
  x: {label: "x"},
  y: {label: "F(x)", domain: [0, 1.05]},
  marks: [
    Plot.line(dist_data, {x: "x", y: "cdf", stroke: "#2E8B57", strokeWidth: 2}),
    Plot.ruleY([0]),
    Plot.ruleY([1], {stroke: "#888", strokeDasharray: "4 4"})
  ],
  caption: "Cumulative Distribution Function (CDF)"
})
3.7 Population vs Sample
A probability distribution describes a population: the complete set of all possible values. The distribution of all possible starting salaries for econ graduates is the population distribution; the salaries of the 200 graduates we surveyed are a sample. Greek letters (\(\mu\), \(\sigma^2\)) denote population parameters (fixed but unknown); Roman letters (\(\bar{x}\), \(s^2\)) denote sample statistics (computed from data, vary from sample to sample). Before we collect the data, the sample mean \(\bar{X}\) is itself a random variable with its own distribution. How far \(\bar{X}\) is likely to be from \(\mu\), and what its distribution looks like, are the questions we take up in Chapter 3 and Chapter 4.
Greek vs Roman: \(\mu, \sigma^2\) = population (fixed, unknown). \(\bar{x}, s^2\) = sample (computed, random). This convention persists throughout the course.
3.8 Practice
Suppose starting wages follow \(\text{Wage} \sim N(55{,}000,\; 10{,}000^2)\). What fraction of graduates earn between $45,000 and $70,000?
Standardize both endpoints: \[z_1 = \frac{45{,}000 - 55{,}000}{10{,}000} = -1.0, \qquad z_2 = \frac{70{,}000 - 55{,}000}{10{,}000} = 1.5\] Look up (or compute): \(P(45{,}000 \le \text{Wage} \le 70{,}000) = \Phi(1.5) - \Phi(-1.0) = 0.9332 - 0.1587 = 0.7745\). About 77% of graduates earn between $45,000 and $70,000.
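The arithmetic above can be double-checked in code. A sketch reusing the polynomial approximation to \(\Phi\) from the plotting code (variable names are ours):

```javascript
// Φ(z) via the polynomial approximation used in the plotting code
function phi(z) {
  const t = 1 / (1 + 0.2316419 * Math.abs(z));
  const d = 0.3989422804 * Math.exp(-0.5 * z * z);
  const p = d * t * (0.3193815 + t * (-0.3565638 + t * (1.781478 +
            t * (-1.821256 + t * 1.330274))));
  return z > 0 ? 1 - p : p;
}

// Wage ~ N(55,000, 10,000²): standardize both endpoints
const mu = 55000, sigma = 10000;
const z1 = (45000 - mu) / sigma; // -1.0
const z2 = (70000 - mu) / sigma; //  1.5
const prob = phi(z2) - phi(z1);  // about 0.7745
```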