6 Time Series
Modeling Temporal Dependence in Economic Data
This chapter introduces econometric methods for data collected over time, where the order of observations matters and yesterday’s values can help predict today’s.
6.1 Motivation
Most of the regression models we’ve studied so far assume that observations are independent draws from some population — like randomly sampling people for a survey. But economic data collected over time (GDP, inflation, stock prices) has a special structure: observations close together in time tend to be similar.
Today’s GDP is a lot like yesterday’s GDP. This month’s inflation is related to last month’s inflation. This dependence across time is called serial correlation (or autocorrelation), and it requires different tools.
6.2 Stationarity
Before we can apply regression to time series data, we need the data to behave “nicely” over time. The key property is covariance stationarity:
- Constant mean: \(E(Y_t) = \mu\) for all \(t\)
- Constant variance: \(\text{Var}(Y_t) = \sigma^2\) for all \(t\)
- Covariance depends only on distance: \(\text{Cov}(Y_t, Y_{t-k})\) depends only on \(k\), not on \(t\)
Why does this matter? If the mean or variance is drifting over time, then a regression estimated on one period won’t apply to another. The coefficients themselves would be unstable. Stationarity ensures that the statistical properties we estimate are meaningful and stable.
Are stock prices themselves stationary? Typically no — stock prices tend to drift upward (or downward) over time, so the mean is not constant. However, stock returns (percentage changes) are often approximately stationary. This is why financial economists work with returns rather than prices.
6.3 Autocorrelation
The autocorrelation function (ACF) measures how correlated a series is with its own past values:
\[ \rho_k = \frac{\text{Cov}(Y_t, Y_{t-k})}{\text{Var}(Y_t)} \tag{6.1}\]
- \(\rho_0 = 1\) always (correlation with itself)
- \(\rho_1\) = correlation between consecutive observations (lag 1)
- \(\rho_k\) = correlation between observations \(k\) periods apart
The sample autocorrelation is:
\[ r_k = \frac{\sum_{t=k+1}^{T}(Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{T}(Y_t - \bar{Y})^2} \]
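The sample formula above is straightforward to compute directly. A minimal numpy sketch, exercised on a simulated AR(1) series (the series, seed, and lag choice are illustrative assumptions):

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelations r_1..r_max_lag: cross-products of
    deviations from the sample mean, divided by the total sum of
    squares (so r_0 = 1 by construction)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    dev = y - y.mean()
    denom = np.sum(dev ** 2)
    return np.array([np.sum(dev[k:] * dev[:T - k]) / denom
                     for k in range(1, max_lag + 1)])

# A persistent series should show a large positive r_1 that decays with k.
rng = np.random.default_rng(0)
e = rng.normal(size=500)
y = np.empty(500)
y[0] = e[0]
for t in range(1, 500):
    y[t] = 0.8 * y[t - 1] + e[t]   # AR(1) with theta_1 = 0.8

r = sample_acf(y, 3)
```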
When reading an ACF plot, bars that extend beyond the dashed confidence bands indicate statistically significant autocorrelation at that lag. If residuals from a model show significant ACF spikes, the model is missing dynamics.
6.4 Autoregressive Models
An AR(p) model says that today’s value depends on its own past \(p\) values:
\[ Y_t = \delta + \theta_1 Y_{t-1} + \theta_2 Y_{t-2} + \cdots + \theta_p Y_{t-p} + v_t \tag{6.2}\]
where \(v_t\) is white noise (zero mean, constant variance, no serial correlation).
The simplest case is the AR(1): \(Y_t = \delta + \theta_1 Y_{t-1} + v_t\). Whether this is stationary depends on \(\theta_1\):
- If \(|\theta_1| < 1\): stationary, shocks fade over time
- If \(\theta_1 = 1\): random walk (non-stationary)
- If \(|\theta_1| > 1\): explosive (non-stationary)
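The three cases can be seen by tracing out the effect of a single unit shock: in an AR(1), the effect after \(j\) periods is \(\theta_1^j\). A small pure-Python sketch with illustrative values of \(\theta_1\):

```python
# Effect of a one-unit shock after successive periods in an AR(1):
# set v_t = 1 once, then iterate Y_t = theta_1 * Y_{t-1}.
def impulse_response(theta1, horizon):
    y, path = 1.0, []
    for _ in range(horizon):
        path.append(y)
        y = theta1 * y
    return path

stationary  = impulse_response(0.5, 10)   # shock fades: 1, 0.5, 0.25, ...
random_walk = impulse_response(1.0, 10)   # shock never fades: 1, 1, 1, ...
```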
The AR(2) adds a second lag: \(Y_t = \delta + \theta_1 Y_{t-1} + \theta_2 Y_{t-2} + v_t\). This allows richer dynamics — oscillations, humps, etc.
6.4.1 Forecasting with AR Models
One-step-ahead forecast from an AR(2):
\[ \hat{Y}_{T+1} = \hat{\delta} + \hat{\theta}_1 Y_T + \hat{\theta}_2 Y_{T-1} \]
Plug in the most recent observed values and the estimated coefficients. For example, with estimates \(\hat{\delta} = 0.67\), \(\hat{\theta}_1 = 0.12\), \(\hat{\theta}_2 = -0.09\) and the two most recent observations \(Y_T = 0.8\), \(Y_{T-1} = -0.2\):
\[\hat{Y}_{T+1} = 0.67 + 0.12(0.8) + (-0.09)(-0.2) = 0.67 + 0.096 + 0.018 = 0.784\]
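The same arithmetic as a quick check in Python, using the values from the example above:

```python
# One-step-ahead AR(2) forecast, mirroring the worked example.
delta, theta1, theta2 = 0.67, 0.12, -0.09   # estimated coefficients
y_T, y_Tm1 = 0.8, -0.2                      # two most recent observations

forecast = delta + theta1 * y_T + theta2 * y_Tm1
```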
6.4.2 Forecast Uncertainty
As we forecast further ahead, uncertainty grows:
\[ \hat{Y}_{T+j} \pm t_c \cdot \hat{\sigma}_j \]
where \(\hat{\sigma}_j\) increases with the horizon \(j\). Intuitively, each forecast builds on previous forecasts, compounding the uncertainty. Forecast intervals get wider the further out you go.
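For the AR(1) case the forecast-error variance has a closed form, \(\sigma_j^2 = \sigma^2\left(1 + \theta_1^2 + \cdots + \theta_1^{2(j-1)}\right)\), which grows with \(j\) and converges to the unconditional variance \(\sigma^2/(1-\theta_1^2)\). A sketch with illustrative parameter values:

```python
# j-step forecast-error standard deviation for an AR(1):
# sigma_j^2 = sigma^2 * (1 + theta^2 + ... + theta^(2(j-1))),
# increasing in j and bounded by sigma^2 / (1 - theta^2).
def forecast_sd(theta1, sigma, horizons):
    return [sigma * (sum(theta1 ** (2 * i) for i in range(j))) ** 0.5
            for j in horizons]

sds = forecast_sd(0.8, 1.0, [1, 2, 4, 8])   # widening intervals
```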
6.5 ARDL Models
An ARDL(p, q) model — autoregressive distributed lag — includes lags of both the dependent variable and an explanatory variable:
\[ Y_t = \delta + \theta_1 Y_{t-1} + \cdots + \theta_p Y_{t-p} + \delta_0 X_t + \delta_1 X_{t-1} + \cdots + \delta_q X_{t-q} + v_t \]
As an example, consider an ARDL(1,1) Phillips Curve relating inflation (\(INF\)) to changes in unemployment (\(DU\)):
\[ INF_t = \delta + \theta_1 INF_{t-1} + \delta_0 DU_t + \delta_1 DU_{t-1} + v_t \tag{6.3}\]
6.5.1 Multipliers
The effect of a one-unit change in \(X\) unfolds over time:
| Multiplier | Formula | Interpretation |
|---|---|---|
| Impact | \(\delta_0\) | Immediate effect this period |
| Interim (1 period) | \(\delta_0 + \delta_1\) | Cumulative effect after 1 period |
| Long-run | \(\frac{\delta_0 + \delta_1}{1 - \theta_1}\) | Total effect after all dynamics play out |
Suppose the estimated Phillips curve (Equation 6.3) gives \(\hat{\delta}_0 = -0.69\), \(\hat{\delta}_1 = 0.32\), and \(\hat{\theta}_1 = 0.56\). Then:
- Impact: \(-0.69\) (a 1-unit rise in \(DU_t\) immediately reduces inflation by 0.69)
- Interim: \(-0.69 + 0.32 = -0.37\)
- Long-run: \(\frac{-0.69 + 0.32}{1 - 0.56} = \frac{-0.37}{0.44} = -0.84\)
The long-run effect is larger in magnitude (\(-0.84\) versus the impact of \(-0.69\)) because the lagged dependent variable propagates the initial shock forward.
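A quick check of the three multipliers in Python, using the illustrative estimates above:

```python
# Impact, interim, and long-run multipliers for an ARDL(1,1),
# using the illustrative Phillips-curve estimates.
theta1, delta0, delta1 = 0.56, -0.69, 0.32

impact   = delta0                            # effect in the same period
interim1 = delta0 + delta1                   # cumulative effect after 1 period
long_run = (delta0 + delta1) / (1 - theta1)  # total effect once dynamics settle
```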
6.6 Lagged Dependent Variables
One way to make models dynamic is to include a lagged dependent variable on the right-hand side:
\[ y_t = \beta_1 + \beta_2 y_{t-1} + \beta_3 x_t + e_t \]
The lagged variable \(y_{t-1}\) is a random regressor. As long as it is uncorrelated with the error term \(e_t\), OLS is consistent. However, if the errors are serially correlated, this assumption breaks down.
6.6.3 Why Serial Correlation Breaks OLS
The intuition is straightforward:
- \(y_{t-1}\) contains information about \(e_{t-1}\) — since \(y_{t-1} = \beta_1 + \beta_2 y_{t-2} + \beta_3 x_{t-1} + e_{t-1}\)
- \(e_{t-1}\) predicts \(e_t\) — when errors are serially correlated (\(\rho \neq 0\))
- Therefore \(y_{t-1}\) predicts \(e_t\) — violating exogeneity
OLS attributes some of the error’s persistence to the coefficient \(\beta_2\), biasing it upward when \(\rho > 0\).
It is critical to test for serial correlation in models with lagged dependent variables. The standard Durbin-Watson test is biased toward finding no serial correlation in this setting; use the Breusch-Godfrey test instead.
6.7 Testing for Serial Correlation
6.7.1 Breusch-Godfrey Test
The BG test checks whether the model’s residuals are autocorrelated. We prefer it over the Durbin-Watson test because it works even with lagged dependent variables on the right-hand side.
To test up to order \(q\):
- Estimate the model by OLS, get residuals \(\hat{e}_t\)
- Run the auxiliary regression: \[\hat{e}_t = \gamma_0 + \gamma_1 (\text{all original regressors}) + \rho_1 \hat{e}_{t-1} + \cdots + \rho_q \hat{e}_{t-q} + v_t\]
- Test \(H_0: \rho_1 = \rho_2 = \cdots = \rho_q = 0\) using \(LM = T \times R^2\), where \(T\) is the sample size and \(R^2\) comes from the auxiliary regression
- Compare to \(\chi^2_{(q)}\)
A common mistake is to regress \(\hat{e}_t\) on only the lagged residuals. The BG test requires including all regressors from the original model alongside the lagged residuals. This is what makes it valid with lagged dependent variables.
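The steps above can be sketched directly in numpy (in practice you would use a packaged implementation such as statsmodels' `acorr_breusch_godfrey`; here the data are simulated with deliberately AR(1) errors, and setting pre-sample lagged residuals to zero is one common convention):

```python
import numpy as np

def breusch_godfrey(y, X, q):
    """LM = T * R^2 from regressing the OLS residuals on the ORIGINAL
    regressors X plus q lagged residuals (pre-sample lags set to 0)."""
    T = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    lags = np.column_stack([np.concatenate([np.zeros(j), e[:-j]])
                            for j in range(1, q + 1)])
    Z = np.column_stack([X, lags])           # original regressors AND lags
    gamma, *_ = np.linalg.lstsq(Z, e, rcond=None)
    u = e - Z @ gamma
    return T * (1 - (u @ u) / (e @ e))       # e has mean 0 (X has a constant)

rng = np.random.default_rng(1)
T = 200
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])

# Simulate AR(1) errors: the test should reject
# (chi-square(2) 5% critical value from tables: 5.99).
e = np.zeros(T)
shocks = rng.normal(size=T)
for t in range(1, T):
    e[t] = 0.8 * e[t - 1] + shocks[t]
y = 1.0 + 2.0 * x + e

lm = breusch_godfrey(y, X, q=2)
```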
6.8 Model Selection: AIC vs. BIC
When choosing the lag order for an AR or ARDL model, we use information criteria:
\[ \text{AIC} = \ln(\hat{\sigma}^2) + \frac{2K}{T}, \quad \text{BIC} = \ln(\hat{\sigma}^2) + \frac{K \ln(T)}{T} \]
Both penalize model complexity (\(K\) = number of parameters), but BIC penalizes more heavily — it prefers simpler models. We choose the model with the lowest criterion value. When AIC and BIC disagree, BIC is typically preferred for consistent selection.
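The two criteria differ only in the penalty term. A minimal sketch showing that BIC's penalty is the heavier one whenever \(\ln(T) > 2\), i.e. for any sample larger than about 7 observations (the \(\hat{\sigma}^2\), \(K\), and \(T\) values below are illustrative):

```python
from math import log

def aic(sigma2_hat, K, T):
    return log(sigma2_hat) + 2 * K / T

def bic(sigma2_hat, K, T):
    return log(sigma2_hat) + K * log(T) / T

# For T = 100, ln(T) ~ 4.6 > 2, so BIC penalizes each extra
# parameter more than twice as heavily as AIC does.
penalty_gap = bic(1.0, 4, 100) - aic(1.0, 4, 100)
```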
6.9 Example: Inflation Forecasting
An AR(2) model for quarterly inflation gives:
| Parameter | Estimate |
|---|---|
| \(\hat{\delta}\) | 0.4523 |
| \(\hat{\theta}_1\) | 0.6234 |
| \(\hat{\theta}_2\) | 0.2145 |
With \(INF_T = 2.5\) and \(INF_{T-1} = 3.0\):
(a) Compute the one-step-ahead forecast.
\[\hat{INF}_{T+1} = 0.4523 + 0.6234(2.5) + 0.2145(3.0) = 0.4523 + 1.5585 + 0.6435 = 2.65\]
(b) The researcher compares AR(1) through AR(4) models:
| Model | AIC | BIC |
|---|---|---|
| AR(1) | 245.23 | 252.45 |
| AR(2) | 242.18 | 252.67 |
| AR(3) | 241.95 | 255.71 |
| AR(4) | 240.23 | 257.26 |
Which model should be selected by BIC?
AR(1) — it has the lowest BIC (252.45). Note that AIC would select AR(4) (240.23), illustrating that BIC prefers parsimony. In practice, the AR(1) is preferred unless there is strong theoretical reason for more lags.
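The selection rule applied to the table above, as a small Python check:

```python
# Lowest criterion value wins; values copied from the comparison table.
aic_vals = {"AR(1)": 245.23, "AR(2)": 242.18, "AR(3)": 241.95, "AR(4)": 240.23}
bic_vals = {"AR(1)": 252.45, "AR(2)": 252.67, "AR(3)": 255.71, "AR(4)": 257.26}

aic_choice = min(aic_vals, key=aic_vals.get)   # AIC favors the richest model
bic_choice = min(bic_vals, key=bic_vals.get)   # BIC favors parsimony
```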
When time series methods meet panel data, we get dynamic panel models with the Nickell bias and Arellano-Bond GMM. For supply-and-demand systems, see Simultaneous Equations. To review cross-sectional methods, see Heteroskedasticity.