10  Random Effects: The Intuition

Fixed effects solved the omitted variable problem by giving each group its own intercept. But that solution has costs: it eats up degrees of freedom and can’t estimate the effect of anything that doesn’t vary within a group. Random effects takes a different approach — one that’s more efficient when its assumptions hold.

10.1 The Setup

A researcher tracks days to recovery and hours of physical therapy per week across patients at five hospitals: Mercy General, St. Luke’s, Valley Medical, Riverside, and County Memorial. Each hospital has a different baseline recovery time driven by staffing, equipment, and protocols. But the effect of therapy is the same everywhere — more therapy means fewer recovery days.

Fixed effects would estimate five separate intercepts, one per hospital. That’s manageable with five hospitals. But what about 50? Or 500? Each additional hospital adds another parameter to estimate. And if you want to ask “Do teaching hospitals have faster recovery?”, FE can’t answer — teaching status is constant within each hospital, so it gets absorbed.

10.2 From Fixed Parameters to Random Draws

Here’s the conceptual shift. Fixed effects treats each hospital’s baseline \(\alpha_j\) as a fixed, unknown parameter:

\[ y_{ij} = \alpha_j + \beta \, x_{ij} + e_{ij} \]

Random effects decomposes \(\alpha_j\) into a population mean and a random deviation:

\[ \alpha_j = \bar{\alpha} + u_j \qquad \text{where } u_j \sim (0, \sigma_u^2) \]

Three pieces to keep straight:

  • \(\bar{\alpha}\) is the average baseline across all hospitals
  • \(u_j\) is hospital \(j\)’s deviation from that average — it plays the same role as \(\alpha_j\), but instead of being a free parameter, it’s a random draw from a distribution
  • \(\sigma_u^2\) measures how spread out hospital baselines are

Instead of estimating five (or five hundred) separate \(\alpha_j\)’s, we estimate \(\bar{\alpha}\) and one variance parameter \(\sigma_u^2\). That’s a dramatic reduction in the number of unknowns.
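The model above is easy to simulate. Here is a minimal sketch of the data-generating process; every number in it (five hospitals, forty patients each, the parameter values) is an illustrative assumption, not something from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

n_hospitals, n_patients = 5, 40
alpha_bar = 30.0          # average baseline recovery days (illustrative)
beta = -2.0               # each therapy hour cuts recovery by 2 days (illustrative)
sigma_u, sigma_e = 4.0, 3.0

u = rng.normal(0.0, sigma_u, n_hospitals)             # hospital deviations u_j
hospital = np.repeat(np.arange(n_hospitals), n_patients)
x = rng.uniform(1.0, 10.0, n_hospitals * n_patients)  # therapy hours per week
e = rng.normal(0.0, sigma_e, n_hospitals * n_patients)

# y_ij = alpha_bar + u_j + beta * x_ij + e_ij
y = alpha_bar + u[hospital] + beta * x + e
```

However many hospitals we add, the unknowns stay at three: alpha_bar, beta, and sigma_u (plus the noise scale sigma_e).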

10.2.1 The Critical Assumption

Random effects requires:

\[ \text{Cov}(u_j, x_{ij}) = 0 \]

In words: patients at better (or worse) hospitals can’t systematically receive more (or less) therapy. If hospital quality is correlated with how much therapy patients get — because better hospitals have better protocols, or because sicker patients self-select into better hospitals — then RE is inconsistent.
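What the violation does to estimates can be seen directly. In the sketch below (all numbers illustrative), better hospitals also prescribe more therapy, so the pooled slope absorbs the hospital effect and overstates the benefit of therapy, while the within (FE) estimator does not:

```python
import numpy as np

rng = np.random.default_rng(1)

n_hospitals, n_patients = 50, 20
beta = -2.0
u = rng.normal(0.0, 4.0, n_hospitals)
hospital = np.repeat(np.arange(n_hospitals), n_patients)

# Violation: therapy hours depend on hospital quality, so Cov(u_j, x_ij) != 0
x = 5.0 - 0.5 * u[hospital] + rng.uniform(-1.0, 1.0, n_hospitals * n_patients)
y = 30.0 + u[hospital] + beta * x + rng.normal(0.0, 3.0, x.size)

pooled_slope = np.polyfit(x, y, 1)[0]        # biased: absorbs Cov(u, x)

# Within (FE) estimator: demeaning by hospital removes u_j entirely
xbar = np.bincount(hospital, weights=x) / n_patients
ybar = np.bincount(hospital, weights=y) / n_patients
xd = x - xbar[hospital]
yd = y - ybar[hospital]
fe_slope = (xd @ yd) / (xd @ xd)
```

With these numbers the pooled slope comes out well below the true -2, while the FE slope stays close to it.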

10.3 The Error Components Model

Substituting \(\alpha_j = \bar{\alpha} + u_j\) into the model:

\[ y_{ij} = \bar{\alpha} + \beta \, x_{ij} + \underbrace{u_j + e_{ij}}_{v_{ij}} \]

The composite error \(v_{ij} = u_j + e_{ij}\) has two parts: a hospital-specific component \(u_j\) (the same for all patients at hospital \(j\)) and an idiosyncratic component \(e_{ij}\) (patient-specific noise).

The problem is that \(v_{ij}\) is not iid. Two patients at the same hospital share \(u_j\), so their errors are correlated. Pooled OLS can still estimate the slope consistently under the RE assumption, but its standard errors are wrong — too small, because OLS treats within-hospital observations as independent when they aren’t.

10.3.1 The Intraclass Correlation

For two different patients \(i \neq k\) at the same hospital, the within-hospital correlation has a clean formula:

\[ \text{Corr}(v_{ij}, v_{kj}) = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2} = \rho \]

This ratio \(\rho\) — the intraclass correlation — tells you what share of total variance is explained by which hospital a patient is in. When \(\rho\) is close to 1, almost all variation is between hospitals. When it’s close to 0, hospitals are basically the same and grouping doesn’t help much.
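The formula checks out in simulation. A sketch with illustrative variance components, comparing the empirical correlation between two patients’ composite errors at the same hospital to the theoretical \(\rho\):

```python
import numpy as np

rng = np.random.default_rng(2)

sigma_u, sigma_e = 4.0, 3.0
rho = sigma_u**2 / (sigma_u**2 + sigma_e**2)   # theoretical ICC: 16/25 = 0.64

# composite errors v_ij = u_j + e_ij for two patients at each of many hospitals
n_hospitals = 2000
u = rng.normal(0.0, sigma_u, n_hospitals)
v = u[:, None] + rng.normal(0.0, sigma_e, (n_hospitals, 2))

empirical_rho = np.corrcoef(v[:, 0], v[:, 1])[0, 1]
```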

This is where random effects gets its power. Instead of estimating each hospital in isolation, RE borrows strength from the ensemble (Tukey, 1970; Efron & Morris, 1973): it pulls each hospital’s estimate toward the overall mean, especially when a hospital has few patients. A hospital with only five patients gets a better estimate by learning from the patients at every other hospital. The more hospitals look alike (small \(\sigma_u^2\), hence small \(\rho\)), the more we can borrow.
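The shrinkage idea can be sketched in a few lines. Below, raw hospital means (hypothetical numbers) are pulled toward the grand mean using the standard BLUP-style weights, with the variance components assumed known; the smallest hospitals get pulled hardest:

```python
import numpy as np

# assumed-known variance components (illustrative)
sigma_u2, sigma_e2 = 16.0, 100.0

# raw per-hospital mean recovery days and sample sizes (hypothetical)
ybar = np.array([24.0, 31.0, 28.0, 35.0, 22.0])
n = np.array([5, 120, 60, 8, 200])
grand_mean = np.average(ybar, weights=n)

# weight on each hospital's own mean: rises with n_j and with sigma_u2
w = sigma_u2 / (sigma_u2 + sigma_e2 / n)
shrunk = w * ybar + (1 - w) * grand_mean   # small hospitals shrink hardest
```

The weight on a hospital’s own mean grows with its sample size, so large hospitals keep roughly their raw mean while small ones move toward the ensemble.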

10.4 RE Estimation: Partial Demeaning

OLS ignores the correlation in \(v_{ij}\). GLS accounts for it by transforming the data. The transformation is controlled by a single parameter:

\[ \hat{\theta}_j = 1 - \frac{\sigma_e}{\sqrt{N_j \sigma_u^2 + \sigma_e^2}} \]

where \(N_j\) is the number of observations in group \(j\). The RE estimator partially demeans the data:

\[ y_{ij} - \hat{\theta}_j \bar{y}_j \qquad \text{and} \qquad x_{ij} - \hat{\theta}_j \bar{x}_j \]

Then you run OLS on the transformed data. This is feasible GLS.

10.4.1 The Spectrum from OLS to FE

What makes RE elegant is that it nests both pooled OLS and fixed effects as special cases:

  \(\hat{\theta}\)        Transformation                          Equivalent to
  0                       \(y_{ij} - 0 = y_{ij}\)                 Pooled OLS (no group effect)
  Between 0 and 1         \(y_{ij} - \hat{\theta}_j \bar{y}_j\)   RE: weighted average of within and between
  1                       \(y_{ij} - \bar{y}_j\)                  FE (full demeaning)

When \(\hat{\theta}\) is close to 0, RE barely adjusts the data — there’s little group-level variation to worry about. When \(\hat{\theta}\) is close to 1, RE essentially becomes FE — the group effects are so strong that you need full demeaning. In practice, it’s somewhere in between.

What pushes \(\hat{\theta}\) toward 1? Two things: large \(\sigma_u^2\) relative to \(\sigma_e^2\) (strong group effects), and large \(N_j\) (many observations per group). With strong group effects or large groups, RE converges toward FE.

As \(\hat{\theta}\) increases, we demean more aggressively, trusting each hospital’s own data rather than pulling it toward the overall mean.
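The whole procedure fits in a few lines. A sketch with the variance components treated as known (in practice they are estimated first, which is what makes the GLS "feasible"); all simulation numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

n_hospitals, n_patients = 50, 10
alpha_bar, beta = 30.0, -2.0
sigma_u, sigma_e = 4.0, 3.0   # treated as known for the sketch

g = np.repeat(np.arange(n_hospitals), n_patients)
u = rng.normal(0.0, sigma_u, n_hospitals)
x = rng.uniform(1.0, 10.0, g.size)
y = alpha_bar + u[g] + beta * x + rng.normal(0.0, sigma_e, g.size)

# transformation parameter (balanced groups, so it is the same for every j)
theta = 1.0 - sigma_e / np.sqrt(n_patients * sigma_u**2 + sigma_e**2)

# partial demeaning, then OLS on the transformed data
xbar = np.bincount(g, weights=x) / n_patients
ybar = np.bincount(g, weights=y) / n_patients
xt = x - theta * xbar[g]
yt = y - theta * ybar[g]
xtc = xt - xt.mean()
beta_re = (xtc @ (yt - yt.mean())) / (xtc @ xtc)
```

With these numbers the transformation parameter works out to \(1 - 3/13 \approx 0.77\): strong group effects and ten observations per group put RE much closer to the FE end of the spectrum than to pooled OLS.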

10.5 When to Use RE vs. FE

The choice comes down to whether you can defend \(\text{Cov}(u_j, x_{ij}) = 0\).

Use RE when:

  1. Groups are random draws from a larger population — you sampled hospitals from all hospitals in the country
  2. There’s no reason to think group effects correlate with regressors — random assignment, natural experiments
  3. You need to estimate effects of group-level variables that don’t vary within group — teaching hospital status, rural vs. urban
  4. Efficiency: RE uses both within and between variation, giving smaller standard errors

Use FE when:

  1. Groups are specific entities you care about — these particular five hospitals, not a random sample
  2. It’s plausible that \(\text{Cov}(u_j, x_{ij}) \neq 0\) — better hospitals may assign more or less therapy
  3. Micro data with individuals or firms, where unobserved heterogeneity almost certainly correlates with regressors
  4. You only care about within-group effects

10.6 The Hausman Test

If RE assumptions hold, both FE and RE are consistent but RE is more efficient. If they fail, only FE is consistent. The Hausman test checks which world we’re in.

The hypotheses:

  • \(H_0\): \(\text{Cov}(u_j, x_{ij}) = 0\) \(\iff\) RE is consistent and more efficient
  • \(H_1\): \(\text{Cov}(u_j, x_{ij}) \neq 0\) \(\iff\) only FE is consistent

The idea is simple: if both estimators are consistent, they should give similar answers. The test statistic for a single regressor is:

\[ t = \frac{\hat{\beta}_{FE} - \hat{\beta}_{RE}}{\sqrt{\widehat{\text{Var}}(\hat{\beta}_{FE}) - \widehat{\text{Var}}(\hat{\beta}_{RE})}} \]

Under \(H_0\), FE is less efficient than RE, so \(\text{Var}(\hat{\beta}_{FE}) > \text{Var}(\hat{\beta}_{RE})\) and the denominator is well-defined (in finite samples the estimated variance difference can occasionally come out negative — a known quirk of the test). A large test statistic means the two estimators disagree, which means something is wrong with RE.

The decision: reject \(H_0\) (p < 0.05) and use FE, or fail to reject and use RE. With multiple regressors, the test generalizes to a \(\chi^2\) statistic that software handles automatically.
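For a single regressor the statistic can be computed by hand. A sketch on simulated data where the RE assumption holds (variance components treated as known for brevity), so the test should typically fail to reject:

```python
import numpy as np

rng = np.random.default_rng(4)

n_hospitals, n_patients = 100, 10
beta, sigma_u, sigma_e = -2.0, 4.0, 3.0   # variance components treated as known

g = np.repeat(np.arange(n_hospitals), n_patients)
u = rng.normal(0.0, sigma_u, n_hospitals)
x = rng.uniform(1.0, 10.0, g.size)                # independent of u: H0 holds
y = 30.0 + u[g] + beta * x + rng.normal(0.0, sigma_e, g.size)

xbar = np.bincount(g, weights=x) / n_patients
ybar = np.bincount(g, weights=y) / n_patients

# FE: full demeaning
xd, yd = x - xbar[g], y - ybar[g]
b_fe = (xd @ yd) / (xd @ xd)
var_fe = sigma_e**2 / (xd @ xd)

# RE: partial demeaning
theta = 1.0 - sigma_e / np.sqrt(n_patients * sigma_u**2 + sigma_e**2)
xt, yt = x - theta * xbar[g], y - theta * ybar[g]
xtc = xt - xt.mean()
b_re = (xtc @ (yt - yt.mean())) / (xtc @ xtc)
var_re = sigma_e**2 / (xtc @ xtc)

# Hausman t-statistic for the single regressor
t = (b_fe - b_re) / np.sqrt(var_fe - var_re)
```

Because the RE transformation keeps the between-hospital variation that FE throws away, var_re is strictly smaller than var_fe here, so the denominator is positive by construction.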

10.6.1 Practical Advice

When in doubt, FE is the safe default. RE is the reward for being able to argue \(\text{Cov}(u_j, x_{ij}) = 0\).