6 Time Series

Modeling Temporal Dependence in Economic Data

Categories: Time Series, Forecasting, Serial Correlation

Author: Jake Anderson
Published: March 3, 2026
Modified: March 4, 2026

This chapter introduces econometric methods for data collected over time, where the order of observations matters and yesterday’s values can help predict today’s.

6.1 Motivation

Most of the regression models we’ve studied so far assume that observations are independent draws from some population — like randomly sampling people for a survey. But economic data collected over time (GDP, inflation, stock prices) has a special structure: observations close together in time tend to be similar.

Today’s GDP is a lot like yesterday’s GDP. This month’s inflation is related to last month’s inflation. This dependence across time is called serial correlation (or autocorrelation), and it requires different tools.

6.2 Stationarity

Before we can apply regression to time series data, we need the data to behave “nicely” over time. The key property is covariance stationarity:

  1. Constant mean: \(E(Y_t) = \mu\) for all \(t\)
  2. Constant variance: \(\text{Var}(Y_t) = \sigma^2\) for all \(t\)
  3. Covariance depends only on distance: \(\text{Cov}(Y_t, Y_{t-k})\) depends only on \(k\), not on \(t\)

Why does this matter? If the mean or variance is drifting over time, then a regression estimated on one period won’t apply to another. The coefficients themselves would be unstable. Stationarity ensures that the statistical properties we estimate are meaningful and stable.

Are stock prices stationary? Typically no: stock prices tend to drift upward (or downward) over time, so the mean is not constant. However, stock returns (percentage changes) are often approximately stationary. This is why financial economists work with returns rather than prices.
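As a quick numeric illustration (a sketch using a made-up price path, not data from the text), converting prices to log returns removes the drifting mean:

```python
import numpy as np

# Hypothetical price series growing 1% per period: non-stationary,
# since the mean drifts upward over time.
t = np.arange(100)
prices = 100 * 1.01 ** t

# Log returns in percent: r_t = 100 * (ln P_t - ln P_{t-1}).
# For this constant-growth path, every return is the same.
returns = 100 * np.diff(np.log(prices))
print(returns[:3])  # each value is 100*ln(1.01), about 0.995
```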

6.3 Autocorrelation

The autocorrelation function (ACF) measures how correlated a series is with its own past values:

\[ \rho_k = \frac{\text{Cov}(Y_t, Y_{t-k})}{\text{Var}(Y_t)} \tag{6.1}\]

  • \(\rho_0 = 1\) always (correlation with itself)
  • \(\rho_1\) = correlation between consecutive observations (lag 1)
  • \(\rho_k\) = correlation between observations \(k\) periods apart

The sample autocorrelation is:

\[ r_k = \frac{\sum_{t=k+1}^{T}(Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{T}(Y_t - \bar{Y})^2} \]

When reading an ACF plot, bars that extend beyond the dashed confidence bands indicate statistically significant autocorrelation at that lag. If residuals from a model show significant ACF spikes, the model is missing dynamics.
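The sample formula can be computed directly in a few lines. The sketch below uses NumPy and a simulated AR(1) series (the series, coefficient, and seed are illustrative assumptions, not data from the text):

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelations r_1..r_{max_lag} per the formula above."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()
    denom = np.sum(dev ** 2)
    return np.array([np.sum(dev[k:] * dev[:-k]) / denom
                     for k in range(1, max_lag + 1)])

# Simulate a stationary AR(1) with theta = 0.8; its theoretical ACF is 0.8^k
rng = np.random.default_rng(42)
y = np.zeros(2000)
for t in range(1, 2000):
    y[t] = 0.8 * y[t - 1] + rng.standard_normal()

r = sample_acf(y, 3)
print(r)  # roughly the theoretical values 0.8, 0.64, 0.51
```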

6.4 Autoregressive Models

An AR(p) model says that today’s value depends on its own past \(p\) values:

\[ Y_t = \delta + \theta_1 Y_{t-1} + \theta_2 Y_{t-2} + \cdots + \theta_p Y_{t-p} + v_t \tag{6.2}\]

where \(v_t\) is white noise (zero mean, constant variance, no serial correlation).

The simplest case is the AR(1): \(Y_t = \delta + \theta_1 Y_{t-1} + v_t\). Whether this is stationary depends on \(\theta_1\):

  • If \(|\theta_1| < 1\): stationary, shocks fade over time
  • If \(\theta_1 = 1\): random walk (non-stationary)
  • If \(|\theta_1| > 1\): explosive (non-stationary)
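A short simulation (with illustrative parameters, not from the text) shows the contrast between the stationary and random-walk cases:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5000
shocks = rng.standard_normal(T)

stationary = np.zeros(T)   # theta = 0.8: shocks fade over time
random_walk = np.zeros(T)  # theta = 1.0: shocks accumulate forever
for t in range(1, T):
    stationary[t] = 0.8 * stationary[t - 1] + shocks[t]
    random_walk[t] = random_walk[t - 1] + shocks[t]

# For |theta| < 1 the variance settles at sigma^2 / (1 - theta^2) = 1/0.36
print(stationary.std())   # close to sqrt(1/0.36), about 1.67
print(random_walk.std())  # much larger: the variance grows with t
```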

The AR(2) adds a second lag: \(Y_t = \delta + \theta_1 Y_{t-1} + \theta_2 Y_{t-2} + v_t\). This allows richer dynamics — oscillations, humps, etc.

6.4.1 Forecasting with AR Models

One-step-ahead forecast from an AR(2):

\[ \hat{Y}_{T+1} = \hat{\delta} + \hat{\theta}_1 Y_T + \hat{\theta}_2 Y_{T-1} \]

Plug in the most recent observed values and the estimated coefficients.

For example, suppose \(\hat{\delta} = 0.67\), \(\hat{\theta}_1 = 0.12\), \(\hat{\theta}_2 = -0.09\), and the most recent observations are \(Y_T = 0.8\) and \(Y_{T-1} = -0.2\):

\[\hat{Y}_{T+1} = 0.67 + 0.12(0.8) + (-0.09)(-0.2) = 0.67 + 0.096 + 0.018 = 0.784\]
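The same plug-in arithmetic in code, using the numbers from the computation above:

```python
# AR(2) one-step-ahead forecast with the estimates from the example above
delta, theta1, theta2 = 0.67, 0.12, -0.09
y_T, y_Tm1 = 0.8, -0.2

y_hat = delta + theta1 * y_T + theta2 * y_Tm1
print(round(y_hat, 3))  # 0.784
```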

6.4.2 Forecast Uncertainty

As we forecast further ahead, uncertainty grows:

\[ \hat{Y}_{T+j} \pm t_c \cdot \hat{\sigma}_j \]

where \(\hat{\sigma}_j\) increases with the horizon \(j\). Intuitively, each forecast builds on previous forecasts, compounding the uncertainty. Forecast intervals get wider the further out you go.
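For the AR(1) case, the \(j\)-step forecast-error variance has the closed form \(\sigma^2(1 + \theta_1^2 + \cdots + \theta_1^{2(j-1)})\), so interval widths can be computed directly. A sketch with illustrative values \(\theta_1 = 0.8\), \(\sigma^2 = 1\):

```python
import numpy as np

# j-step-ahead forecast-error variance for an AR(1):
# sigma^2 * (1 + theta^2 + ... + theta^(2(j-1)))
theta, sigma2 = 0.8, 1.0

def forecast_se(j):
    return np.sqrt(sigma2 * sum(theta ** (2 * i) for i in range(j)))

# 95% interval widths at horizons 1 through 4
widths = [2 * 1.96 * forecast_se(j) for j in (1, 2, 3, 4)]
print(widths)  # widths grow with the horizon
```

Because \(|\theta_1| < 1\), the widths increase but level off at a finite limit; for a random walk they would grow without bound.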

6.5 ARDL Models

An ARDL(p, q) model — autoregressive distributed lag — includes lags of both the dependent variable and an explanatory variable:

\[ Y_t = \delta + \theta_1 Y_{t-1} + \cdots + \theta_p Y_{t-p} + \delta_0 X_t + \delta_1 X_{t-1} + \cdots + \delta_q X_{t-q} + v_t \]

As an example, consider an ARDL(1,1) Phillips Curve relating inflation (\(INF\)) to changes in unemployment (\(DU\)):

\[ INF_t = \delta + \theta_1 INF_{t-1} + \delta_0 DU_t + \delta_1 DU_{t-1} + v_t \tag{6.3}\]

6.5.1 Multipliers

The effect of a one-unit change in \(X\) unfolds over time:

  • Impact multiplier: \(\delta_0\), the immediate effect this period
  • Interim multiplier (1 period): \(\delta_0 + \delta_1\), the cumulative effect after 1 period
  • Long-run multiplier: \(\frac{\delta_0 + \delta_1}{1 - \theta_1}\), the total effect after all dynamics play out

For example, suppose the estimated Phillips Curve (Equation 6.3) has \(\hat{\delta}_0 = -0.69\), \(\hat{\delta}_1 = 0.32\), and \(\hat{\theta}_1 = 0.56\):

  • Impact: \(-0.69\) (a 1-unit rise in \(DU_t\) immediately reduces inflation by 0.69)
  • Interim: \(-0.69 + 0.32 = -0.37\)
  • Long-run: \(\frac{-0.69 + 0.32}{1 - 0.56} = \frac{-0.37}{0.44} \approx -0.84\)

The long-run effect is larger in magnitude because the lagged dependent variable propagates the initial shock forward.
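The multiplier arithmetic is easy to verify in code (the coefficient values are the illustrative estimates from the example):

```python
# Multipliers implied by an estimated ARDL(1,1), using the hypothetical
# estimates delta0 = -0.69, delta1 = 0.32, theta1 = 0.56 from the example
delta0, delta1, theta1 = -0.69, 0.32, 0.56

impact = delta0
interim = delta0 + delta1
long_run = (delta0 + delta1) / (1 - theta1)

print(round(impact, 2), round(interim, 2), round(long_run, 2))  # -0.69 -0.37 -0.84
```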

6.6 Lagged Dependent Variables

One way to make models dynamic is to include a lagged dependent variable on the right-hand side:

\[ y_t = \beta_1 + \beta_2 y_{t-1} + \beta_3 x_t + e_t \]

The lagged variable \(y_{t-1}\) is a random regressor. As long as it is uncorrelated with the error term \(e_t\), OLS is consistent. However, if the errors are serially correlated, this assumption breaks down.

6.6.1 When \(y_{t-1}\) is Uncorrelated with \(e_t\) (OLS Works)

Example: Stock returns with i.i.d. shocks

Suppose daily stock returns follow: \[ r_t = \alpha + \beta r_{t-1} + e_t \] where \(e_t\) represents news arriving on day \(t\) (earnings surprises, Fed announcements, etc.). If today’s news is genuinely unpredictable and unrelated to past news, then \(e_t\) is i.i.d. and uncorrelated with \(r_{t-1}\). OLS consistently estimates \(\beta\).

Example: GDP growth with white noise errors

If quarterly GDP growth depends on last quarter’s growth plus truly random shocks (weather, one-off events), and these shocks don’t persist, then \(y_{t-1} \perp e_t\) and OLS is fine.

6.6.2 When \(y_{t-1}\) is Correlated with \(e_t\) (OLS Fails)

Now suppose errors follow an AR(1) process: \[ e_t = \rho e_{t-1} + v_t \] where \(v_t\) is white noise. Then \(y_{t-1}\) must be correlated with \(e_t\):

  • \(y_{t-1}\) depends directly on \(e_{t-1}\) (from the original equation at \(t-1\))
  • \(e_t\) depends on \(e_{t-1}\) (from the AR(1) structure)
  • Therefore, if \(\rho \neq 0\), we have \(\text{Cov}(y_{t-1}, e_t) \neq 0\)

Example: Inflation with persistent shocks

Suppose inflation follows: \[ \pi_t = \alpha + \beta \pi_{t-1} + e_t \] If \(e_t\) represents supply shocks (oil prices, supply chain disruptions), these shocks often persist—an oil price spike doesn’t resolve in one period. The shock at \(t-1\) affects both \(\pi_{t-1}\) (directly) and \(e_t\) (through persistence). OLS is biased.

Example: Consumption with habit formation

If consumption shocks reflect slowly-changing preferences or habits, the error term will be serially correlated. Yesterday’s consumption \(c_{t-1}\) was affected by yesterday’s preference shock \(e_{t-1}\), which is correlated with today’s shock \(e_t\).

6.6.3 Why Serial Correlation Breaks OLS

The intuition is straightforward:

  1. \(y_{t-1}\) contains information about \(e_{t-1}\) — since \(y_{t-1} = \beta_1 + \beta_2 y_{t-2} + \beta_3 x_{t-1} + e_{t-1}\)
  2. \(e_{t-1}\) predicts \(e_t\) — when errors are serially correlated (\(\rho \neq 0\))
  3. Therefore \(y_{t-1}\) predicts \(e_t\) — violating exogeneity

OLS attributes some of the error’s persistence to the coefficient \(\beta_2\), biasing it upward when \(\rho > 0\).
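A small Monte Carlo (with made-up parameters \(\beta_2 = 0.5\) and \(\rho = 0.5\)) makes the inconsistency visible:

```python
import numpy as np

# Simulate y_t = 1 + 0.5*y_{t-1} + e_t with AR(1) errors e_t = 0.5*e_{t-1} + v_t,
# then estimate by OLS. The estimate of the lag coefficient sits well above
# the true 0.5 because y_{t-1} is correlated with e_t.
rng = np.random.default_rng(1)
T = 20000
e = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    e[t] = 0.5 * e[t - 1] + rng.standard_normal()
    y[t] = 1.0 + 0.5 * y[t - 1] + e[t]

X = np.column_stack([np.ones(T - 1), y[:-1]])
beta = np.linalg.lstsq(X, y[1:], rcond=None)[0]
print(round(beta[1], 2))  # around 0.8, well above the true 0.5
```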

Warning: Testing is Essential

It is critical to test for serial correlation in models with lagged dependent variables. The standard Durbin-Watson test is unreliable here (it is biased toward failing to detect serial correlation when a lagged dependent variable is among the regressors); use the Breusch-Godfrey test instead.

6.7 Testing for Serial Correlation

6.7.1 Breusch-Godfrey Test

The BG test checks whether the model’s residuals are autocorrelated. We prefer it over the Durbin-Watson test because it works even with lagged dependent variables on the right-hand side.

To test up to order \(q\):

  1. Estimate the model by OLS, get residuals \(\hat{e}_t\)
  2. Run the auxiliary regression: \[\hat{e}_t = \gamma_0 + \gamma_1 (\text{all original regressors}) + \rho_1 \hat{e}_{t-1} + \cdots + \rho_q \hat{e}_{t-q} + v_t\]
  3. Test \(H_0: \rho_1 = \rho_2 = \cdots = \rho_q = 0\) using \(LM = T \times R^2\), where \(R^2\) is from the auxiliary regression
  4. Compare to \(\chi^2_{(q)}\)
Warning: The Auxiliary Regression Includes the Original Regressors

A common mistake is to regress \(\hat{e}_t\) on only the lagged residuals. The BG test requires including all regressors from the original model alongside the lagged residuals. This is what makes it valid with lagged dependent variables.
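The four steps can be sketched directly with NumPy and SciPy. The data-generating process below is a simulated example (not from the text), built with strongly autocorrelated errors so the test should reject:

```python
import numpy as np
from scipy import stats

def ols_resid(X, y):
    """OLS residuals from regressing y on X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return y - X @ b

# Simulate a regression whose errors follow an AR(1) with rho = 0.7
rng = np.random.default_rng(7)
T = 300
x = rng.standard_normal(T)
e = np.zeros(T)
for t in range(1, T):
    e[t] = 0.7 * e[t - 1] + rng.standard_normal()
y = 1.0 + 2.0 * x + e

# Step 1: estimate the original model, keep residuals
X = np.column_stack([np.ones(T), x])
ehat = ols_resid(X, y)

# Step 2: auxiliary regression of e_t on the ORIGINAL regressors plus e_{t-1}
Z = np.column_stack([np.ones(T - 1), x[1:], ehat[:-1]])
u = ols_resid(Z, ehat[1:])
r2 = 1 - (u @ u) / np.sum((ehat[1:] - ehat[1:].mean()) ** 2)

# Steps 3-4: LM statistic on the usable sample, compared to chi-squared(q=1)
lm = (T - 1) * r2
p_value = stats.chi2.sf(lm, df=1)
print(lm, p_value)  # large LM, tiny p-value: reject no-autocorrelation
```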

6.8 Model Selection: AIC vs. BIC

When choosing the lag order for an AR or ARDL model, we use information criteria:

\[ \text{AIC} = \ln(\hat{\sigma}^2) + \frac{2K}{T}, \quad \text{BIC} = \ln(\hat{\sigma}^2) + \frac{K \ln(T)}{T} \]

Both penalize model complexity (\(K\) = number of parameters), but BIC penalizes more heavily — it prefers simpler models. We choose the model with the lowest criterion value. When AIC and BIC disagree, BIC is typically preferred for consistent selection.
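A minimal sketch of the two formulas (the numeric inputs are placeholder values, not estimates from the text):

```python
import numpy as np

# Information criteria per the formulas above: sigma2_hat is the estimated
# error variance, K the number of parameters, T the sample size.
def aic(sigma2_hat, K, T):
    return np.log(sigma2_hat) + 2 * K / T

def bic(sigma2_hat, K, T):
    return np.log(sigma2_hat) + K * np.log(T) / T

# With T = 100, ln(T) is about 4.6 > 2, so BIC penalizes each parameter more
print(aic(1.5, 3, 100), bic(1.5, 3, 100))
```

Note that whenever \(T > e^2 \approx 7.4\), the BIC penalty per parameter exceeds the AIC penalty, which is why BIC tends to pick smaller models.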

6.9 Example: Inflation Forecasting

An AR(2) model for quarterly inflation gives:

  Parameter            Estimate
  \(\hat{\delta}\)     0.4523
  \(\hat{\theta}_1\)   0.6234
  \(\hat{\theta}_2\)   0.2145

With \(INF_T = 2.5\) and \(INF_{T-1} = 3.0\):

(a) Compute the one-step-ahead forecast.

\[\hat{INF}_{T+1} = 0.4523 + 0.6234(2.5) + 0.2145(3.0) = 0.4523 + 1.5585 + 0.6435 = 2.6543 \approx 2.65\]

(b) The researcher compares AR(1) through AR(4) models:

  Model    AIC       BIC
  AR(1)    245.23    252.45
  AR(2)    242.18    252.67
  AR(3)    241.95    255.71
  AR(4)    240.23    257.26

Which model should be selected by BIC?

AR(1) — it has the lowest BIC (252.45). Note that AIC would select AR(4) (240.23), illustrating that BIC prefers parsimony. In practice, the AR(1) is preferred unless there is strong theoretical reason for more lags.
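The selection rule is just a minimum over each column; a sketch using the table's values:

```python
# (AIC, BIC) pairs from the comparison table above
results = {"AR(1)": (245.23, 252.45), "AR(2)": (242.18, 252.67),
           "AR(3)": (241.95, 255.71), "AR(4)": (240.23, 257.26)}

# Pick the model with the lowest value of each criterion
by_aic = min(results, key=lambda m: results[m][0])
by_bic = min(results, key=lambda m: results[m][1])
print(by_aic, by_bic)  # AR(4) AR(1)
```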


Tip: What’s next?

When time series methods meet panel data, we get dynamic panel models with the Nickell bias and Arellano-Bond GMM. For supply-and-demand systems, see Simultaneous Equations. To review cross-sectional methods, see Heteroskedasticity.