flowchart TD
A["Treatment group change<br/>Δ_T = trend + treatment effect"] --> C["DiD = Δ_T - Δ_C<br/>= treatment effect"]
B["Control group change<br/>Δ_C = trend only"] --> C
style A fill:#1E5A96,color:#fff
style B fill:#2E8B57,color:#fff
style C fill:#D4A84B,color:#fff
21 Treatment Effects and Difference-in-Differences
Everything You’ve Learned, Applied to Policy
This final chapter applies the full regression toolkit to policy evaluation. The difference estimator measures treatment effects under random assignment. When assignment is not random, difference-in-differences removes common time trends by comparing changes across groups. The Card-Krueger minimum wage study illustrates both the method and the parallel trends assumption that makes it credible.
21.1 The Problem: Did the Policy Work?
On April 1, 1992, New Jersey raised its minimum wage from $4.25 to $5.05. Neighboring Pennsylvania kept its minimum wage at $4.25. The standard prediction is that a higher minimum wage reduces low-wage employment. But how do we measure the causal effect?
We cannot observe what would have happened in NJ without the increase. This unobserved scenario is the counterfactual. A simple before-vs-after comparison in NJ confounds the policy effect with time trends (the national economy, seasonal hiring patterns, changes in consumer demand). We need a strategy that isolates the policy effect from everything else changing at the same time.
The fundamental problem of causal inference: We can never observe the same unit in both treated and untreated states at the same time. Every causal method is a strategy for constructing a credible counterfactual.
21.2 The Difference Estimator and Selection Bias
The simplest treatment effect model is a regression with a dummy variable:
Definition 21.1 (Treatment Effect Model) \[ y_i = \beta_1 + \beta_2 d_i + e_i \tag{21.1}\]
where \(d_i = 1\) for treated units and \(d_i = 0\) for control units. The OLS estimate \(b_2 = \bar{y}_1 - \bar{y}_0\) is the difference in sample means.
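As a quick illustration, the difference estimator needs no regression machinery at all: it is just a difference in group means. A minimal JavaScript sketch with invented outcome data:

```javascript
// Difference estimator: OLS on a single dummy regressor equals the
// difference in sample means. The data below are invented for illustration.
const treated = [5, 7, 6, 8]; // y_i for units with d_i = 1
const control = [3, 4, 5, 4]; // y_i for units with d_i = 0

const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;

// b2 = ybar_1 - ybar_0
const b2 = mean(treated) - mean(control);
console.log(b2); // 2.5
```

Whether 2.5 is a causal effect depends entirely on how the units ended up in each group.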
For this to be unbiased, we need \(E(\bar{e}_1) = E(\bar{e}_0)\): all other factors must be equal on average across groups.
When units choose their treatment status, this condition fails. Suppose we compared the health of people who visited a hospital with that of people who did not: sicker people are more likely to visit, so the average unobserved health factor differs between visitors and non-visitors. The difference estimator then captures the treatment effect plus this systematic difference, which is selection bias. Random assignment eliminates selection bias by ensuring \(E(\bar{e}_1) = E(\bar{e}_0)\).
When treatment is self-selected, \(\bar{y}_1 - \bar{y}_0\) confounds the treatment effect with pre-existing differences between groups. This is why observational studies require additional assumptions (like parallel trends) that experiments do not.
21.3 Natural Experiments
True randomized experiments are rare in economics. A natural experiment (or quasi-experiment) is a real-world situation where a policy change creates treatment and control groups that are plausibly comparable without deliberate randomization.
Card and Krueger (1994) exploited NJ’s minimum wage increase as a natural experiment. Both NJ and neighboring PA share similar labor markets, demographics, and fast-food chains. The policy change was not driven by NJ employment conditions. PA serves as a natural control group for NJ.
Card and Krueger (1994): Surveyed 410 fast-food restaurants in NJ and PA before and after NJ’s minimum wage increase. One of the most influential empirical papers in labor economics.
But we still have the time-trend problem. PA employment also changed between February and November 1992. We need a method that accounts for this.
21.4 Difference-in-Differences
Difference-in-Differences (DiD) uses two groups observed in two periods: before and after the policy. The logic proceeds in two steps. First, compute the change within each group over time: the control group’s change captures the time trend only, while the treatment group’s change captures the time trend plus the treatment effect. Second, take the difference of these changes:
Theorem 21.1 (The DiD Estimator) \[ \hat{\delta} = \underbrace{(\bar{y}_{T,\text{After}} - \bar{y}_{T,\text{Before}})}_{\text{treatment change}} - \underbrace{(\bar{y}_{C,\text{After}} - \bar{y}_{C,\text{Before}})}_{\text{control change}} \tag{21.2}\]
The time trend cancels, leaving only the treatment effect.
Using Card and Krueger’s fast-food employment data (410 restaurants, surveyed in February and November 1992):
| | Feb 1992 | Nov 1992 | Change |
|---|---|---|---|
| PA (Control) | 23.33 | 21.17 | \(-2.16\) |
| NJ (Treatment) | 20.44 | 21.03 | \(+0.59\) |
The DiD estimate: \(\hat{\delta} = (+0.59) - (-2.16) = 2.75\). Employment in NJ increased by about 2.75 full-time-equivalent (FTE) workers relative to PA. This is the opposite of the standard prediction.
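The two-step logic maps directly to code. A minimal JavaScript sketch using the cell means reported above (variable names are ours, not from the study):

```javascript
// Two-step DiD computation (Equation 21.2) from the Card-Krueger cell means.
const y = {
  control: { before: 23.33, after: 21.17 }, // PA
  treated: { before: 20.44, after: 21.03 }, // NJ
};

const controlChange = y.control.after - y.control.before; // time trend only
const treatedChange = y.treated.after - y.treated.before; // trend + effect
const did = treatedChange - controlChange;                // trend cancels

console.log(controlChange.toFixed(2)); // -2.16
console.log(treatedChange.toFixed(2)); // 0.59
console.log(did.toFixed(2));           // 2.75
```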
21.5 DiD as a Regression
The DiD estimate can be obtained from a single regression:
\[ y_{it} = \beta_1 + \beta_2\,\text{NJ}_i + \beta_3\,\text{After}_t + \delta\,(\text{NJ}_i \times \text{After}_t) + e_{it} \tag{21.3}\]
This uses indicator variables (\(\text{NJ}_i\) and \(\text{After}_t\)) and an interaction term (\(\text{NJ}_i \times \text{After}_t\)). DiD is not a new technique; it is a regression with two dummies and their interaction.
DiD synthesizes three earlier concepts: indicator variables for group and time, interaction terms for the treatment effect, and F-tests for significance.
The coefficients map directly to the 2x2 table: \(\beta_1 = 23.33\) (PA mean before), \(\beta_2 = -2.89\) (NJ was 2.89 FTE below PA before the policy), \(\beta_3 = -2.17\) (PA employment fell by about 2.17, the time trend), and \(\delta = 2.75\) (the treatment effect). The regression is fit to the unrounded data, so coefficients can differ in the last digit from values computed from the rounded cell means. You can test \(H_0: \delta = 0\) with a standard \(t\)-test.
| Coefficient | Formula | Card-Krueger value | Interpretation |
|---|---|---|---|
| \(\beta_1\) | \(\bar{y}_{C, \text{Before}}\) | 23.33 | PA mean FTE before |
| \(\beta_2\) | \(\bar{y}_{T, \text{Before}} - \bar{y}_{C, \text{Before}}\) | \(-2.89\) | NJ vs PA level difference |
| \(\beta_3\) | \(\bar{y}_{C, \text{After}} - \bar{y}_{C, \text{Before}}\) | \(-2.17\) | Time trend (from control) |
| \(\delta\) | DiD estimate | \(+2.75\) | Treatment effect |
Just like any regression, you can add control variables (chain dummies, regional indicators) to improve precision and test robustness. Card and Krueger found that \(\hat{\delta}\) barely changes when controls are added.
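To make the mapping concrete, here is a short JavaScript sketch that recovers all four coefficients of Equation 21.3 from the cell means and checks that the fitted values reproduce the cells (names are illustrative, and small rounding differences from the published regression output are ignored):

```javascript
// Recovering the Equation 21.3 coefficients from the 2x2 cell means.
const cells = {
  cBefore: 23.33, cAfter: 21.17, // PA (control)
  tBefore: 20.44, tAfter: 21.03, // NJ (treated)
};

const b1 = cells.cBefore;                  // control mean, before
const b2 = cells.tBefore - cells.cBefore;  // NJ vs PA level difference
const b3 = cells.cAfter - cells.cBefore;   // time trend (from control)
const delta =
  (cells.tAfter - cells.tBefore) - (cells.cAfter - cells.cBefore); // DiD

// With all four dummies "on", the fitted value is the NJ "after" cell mean:
const njAfterFitted = b1 + b2 + b3 + delta;
console.log(njAfterFitted.toFixed(2)); // 21.03
```

With two groups and two periods, the regression is saturated: its four coefficients reproduce the four cell means exactly, which is why the table and the regression agree.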
Interactive: DiD Estimator
Adjust the four cell means to see how the DiD estimate changes. Toggle “parallel trends violation” to see what happens when the treatment group was already on a different trajectory.
viewof y_cb = Inputs.range([15, 30], {value: 23.33, step: 0.25, label: "Control, Before"})
viewof y_ca = Inputs.range([15, 30], {value: 21.17, step: 0.25, label: "Control, After"})
viewof y_tb = Inputs.range([15, 30], {value: 20.44, step: 0.25, label: "Treated, Before"})
viewof y_ta = Inputs.range([15, 30], {value: 21.03, step: 0.25, label: "Treated, After"})
viewof violatePT = Inputs.toggle({label: "Parallel trends violation", value: false})
did_result = {
const cb = y_cb, ca = y_ca, tb = y_tb, ta = y_ta;
const controlChange = ca - cb;
const treatChange = ta - tb;
const did = treatChange - controlChange;
// Counterfactual: where treatment group would be without treatment
const counterfactual = tb + controlChange;
// If parallel trends violated, show a different counterfactual
const ptViolation = violatePT ? 1.5 : 0; // extra trend for treatment group
const trueCounterfactual = violatePT ? tb + controlChange + ptViolation : counterfactual;
const trueTreatmentEffect = violatePT ? ta - trueCounterfactual : did;
return {cb, ca, tb, ta, controlChange, treatChange, did, counterfactual, trueCounterfactual, trueTreatmentEffect, violatePT: violatePT};
}
Plot.plot({
width: 550, height: 380,
marginLeft: 60,
x: {label: "Period", domain: ["Before", "After"], padding: 0.4},
y: {label: "FTE Employment", domain: [Math.min(did_result.cb, did_result.tb, did_result.ca, did_result.ta) - 3, Math.max(did_result.cb, did_result.tb, did_result.ca, did_result.ta) + 3]},
color: {legend: true},
marks: [
// Control group line
Plot.line([
{x: "Before", y: did_result.cb},
{x: "After", y: did_result.ca}
], {x: "x", y: "y", stroke: "#2E8B57", strokeWidth: 2.5, marker: "circle"}),
// Treatment group line
Plot.line([
{x: "Before", y: did_result.tb},
{x: "After", y: did_result.ta}
], {x: "x", y: "y", stroke: "#1E5A96", strokeWidth: 2.5, marker: "circle"}),
// Counterfactual line (dashed)
Plot.line([
{x: "Before", y: did_result.tb},
{x: "After", y: did_result.counterfactual}
], {x: "x", y: "y", stroke: "#888", strokeWidth: 1.5, strokeDasharray: "6,4"}),
// True counterfactual if PT violated
did_result.violatePT ? Plot.line([
{x: "Before", y: did_result.tb},
{x: "After", y: did_result.trueCounterfactual}
], {x: "x", y: "y", stroke: "#C41E3A", strokeWidth: 1.5, strokeDasharray: "3,3"}) : null,
// Labels
Plot.text([{x: "After", y: did_result.ca}], {x: "x", y: "y", text: d => "Control", dx: 35, fill: "#2E8B57"}),
Plot.text([{x: "After", y: did_result.ta}], {x: "x", y: "y", text: d => "Treated", dx: 35, fill: "#1E5A96"}),
Plot.text([{x: "After", y: did_result.counterfactual}], {x: "x", y: "y", text: d => "Counterfactual", dx: 55, fill: "#888"})
]
})
html`<div style="padding:1em; background:#f8f8f8; border-radius:8px; margin-top:0.5em">
<strong>Control change:</strong> ${did_result.controlChange.toFixed(2)} |
<strong>Treatment change:</strong> ${did_result.treatChange.toFixed(2)}<br/>
<strong>DiD estimate (δ̂):</strong> ${did_result.did.toFixed(2)}
${did_result.violatePT ? html`<br/><span style="color:#C41E3A"><strong>Parallel trends violated:</strong> true treatment effect is ${did_result.trueTreatmentEffect.toFixed(2)}, but DiD estimates ${did_result.did.toFixed(2)}. The bias is ${(did_result.did - did_result.trueTreatmentEffect).toFixed(2)}.</span>` : html`<br/><em>Gray dashed line: counterfactual (where treated group would be without treatment, assuming parallel trends).</em>`}
</div>`

Toggle the parallel trends violation to see how DiD breaks down. When the treatment group has a steeper underlying trend, DiD overstates the treatment effect because it attributes the trend difference to the policy.
21.6 The Parallel Trends Assumption
DiD assumes that in the absence of treatment, both groups would have experienced the same change over time. This is the parallel trends assumption (also called common trends). The levels can differ; the trends must be the same.
When parallel trends hold, the control group’s change serves as a valid counterfactual for the treatment group. When they fail, DiD attributes pre-existing trend differences to the policy, producing a biased estimate.
Strictly, parallel trends is untestable because it involves the counterfactual (what would have happened), which we never observe. But we can check plausibility by plotting pre-treatment trends for both groups, running placebo tests on periods before the actual treatment, and adding controls for observable differences that might cause divergent trends. Strong pre-treatment evidence makes the assumption more convincing.
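A placebo test uses the same arithmetic as the DiD estimator itself, applied to two periods that both precede the policy. A minimal sketch with invented numbers:

```javascript
// Placebo check: compute the DiD formula over two pre-treatment periods.
// A placebo "effect" near zero supports parallel pre-trends; a large one
// is a warning sign. All numbers below are invented for illustration.
const pre = {
  control: { t1: 22.5, t2: 23.0 }, // two periods, both before treatment
  treated: { t1: 20.0, t2: 20.5 },
};

const placeboDid =
  (pre.treated.t2 - pre.treated.t1) - (pre.control.t2 - pre.control.t1);

console.log(placeboDid); // 0 here: both groups trend up by 0.5
```

A zero placebo estimate does not prove parallel trends, since the assumption concerns the unobserved post-treatment counterfactual, but it makes the assumption more plausible.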
21.7 DiD with Panel Data
Card and Krueger observed 384 restaurants in both periods. With panel data, first-differencing eliminates unobserved restaurant-specific characteristics (\(c_i\): location quality, manager ability, etc.):
\[ \Delta \text{FTE}_i = \beta_3 + \delta\,\text{NJ}_i + \Delta e_i \]
The restaurant fixed effect drops out along with \(\beta_1\) and \(\beta_2\,\text{NJ}_i\) (which is time-invariant). The estimate \(\hat{\delta} = 2.75\) confirms the earlier DiD result. This is the same idea as the fixed effects estimator, which is covered fully in Econ 104.
Panel data advantage: First-differencing removes all time-invariant unobserved heterogeneity. This is more credible than cross-sectional DiD if restaurant-level factors affect employment.
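The first-difference computation can be sketched with hypothetical restaurant-level panel data; only the method, not the numbers, comes from the text:

```javascript
// First-difference DiD with (invented) panel data: one row per restaurant,
// observed in both periods. Differencing removes the restaurant fixed
// effect c_i because it is the same in both periods.
const restaurants = [
  { nj: 1, before: 20.0, after: 21.0 },
  { nj: 1, before: 21.0, after: 21.5 },
  { nj: 0, before: 23.0, after: 21.0 },
  { nj: 0, before: 24.0, after: 22.0 },
];

const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;
const change = (r) => r.after - r.before; // first difference, c_i cancels

// Regressing the change on the NJ dummy gives:
//   beta3 = mean change among controls (the time trend)
//   delta = difference in mean changes (the DiD estimate)
const njChanges = restaurants.filter((r) => r.nj === 1).map(change);
const paChanges = restaurants.filter((r) => r.nj === 0).map(change);
const beta3 = mean(paChanges);
const delta = mean(njChanges) - mean(paChanges);
console.log(beta3, delta); // -2 2.75
```

With a balanced two-period panel, this dummy-variable regression on the changes reproduces the cross-sectional DiD estimate exactly.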
21.8 What Comes Next?
This course provided the complete regression toolkit: estimation, inference, functional forms, indicator variables, interaction terms, and policy evaluation. Econ 104 extends these ideas to settings where standard OLS breaks down:
- Panel data and fixed effects: you just saw first-differencing eliminate restaurant fixed effects; panel methods build on exactly this idea
- Instrumental variables: what to do when a regressor is endogenous (\(\text{Cov}(x, e) \neq 0\))
- Time series: forecasting, autocorrelation, stationarity
- Limited dependent variables: binary outcomes (logit/probit), counts, censoring
Beyond Econ 104, the frontier includes regression discontinuity designs, synthetic control methods, and the intersection of machine learning with causal inference. You now have the foundation for all of it.
flowchart LR
A["SLR<br/>(Ch 5-10)"] --> B["Functional Forms<br/>(Ch 12)"]
A --> C["Multiple Regression<br/>(Ch 13-15)"]
C --> D["Interactions<br/>(Ch 16)"]
C --> E["F-Tests<br/>(Ch 17)"]
C --> F["Indicators<br/>(Ch 19)"]
D --> G["DiD<br/>(Ch 21)"]
E --> G
F --> G
G --> H["Econ 104:<br/>Panel, IV, TS"]
style G fill:#1E5A96,color:#fff
style H fill:#D4A84B,color:#fff
21.9 Practice
A state implements a job training program. You observe average monthly earnings for participants (treatment) and non-participants (control) before and after the program:
| | Before | After |
|---|---|---|
| Control | $1,500 | $1,600 |
| Treatment | $1,200 | $1,550 |
Compute the DiD estimate of the training effect. What assumption is required for this estimate to be causal?
The DiD estimate is:
\[ \hat{\delta} = (1550 - 1200) - (1600 - 1500) = 350 - 100 = 250 \]
The training program increased monthly earnings by an estimated $250. This estimate is causal only if the parallel trends assumption holds: absent the program, the treatment group’s earnings would have grown by the same $100 as the control group. If the treatment group was already on a steeper trajectory (e.g., because they were more motivated), the DiD estimate overstates the program’s effect.
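The computation is a direct application of Equation 21.2 and can be verified mechanically:

```javascript
// Practice check: DiD estimate of the training effect from the 2x2 table.
const earnings = {
  control: { before: 1500, after: 1600 },
  treated: { before: 1200, after: 1550 },
};

const did =
  (earnings.treated.after - earnings.treated.before) - // +350
  (earnings.control.after - earnings.control.before);  // +100

console.log(did); // 250
```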