6 Advanced Times Series

6.1 Vector Autoregressions

The models we studied in Chapter 5 treat the variable of interest as dependent on its own past, and possibly on a set of exogenous variables. While this approach is useful, often, economic and financial variables may interact simultaneously rather than in isolation. For example, GDP and consumption influence each other, as do interest rates and inflation, stock returns and exchange rates, industrial production and trade flows, or sales and prices. These relationships are inherently dynamic, bidirectional, and endogenous, meaning that variables mutually influence one another. As a result, relying on a single equation model unrealistically forces the analyst to designate one variable as dependent and treat the others as exogenous. A vector Autoregression (VAR) treats all variables as endogenous. Instead of modeling one equation, we can model a system of equations where each variable depends on its own lags and the lags of all other variables. Formally:

A Vector AutoRegression (VAR) is a set of k time series regressions, in which the regressors are lagged values of all k series. With \(p\) number of lags in each of the equations is the system of equations is called a \(VAR(p)\). For \(k=2\):

\[ Z_t = c_{1} + \phi_{11}Z_{t-1} + \phi_{12}Z_{t-2} + \cdots + \phi_{1p}Z_{t-p} + \gamma_{11}X_{t-1} + \gamma_{12}X_{t-2} + \cdots + \gamma_{1p}X_{t-p} + \varepsilon_{1t} \]

\[ X_t = c_{2} + \phi_{21}Z_{t-1} + \phi_{22}Z_{t-2} + \cdots + \phi_{2p}Z_{t-p} + \gamma_{21}X_{t-1} + \gamma_{22}X_{t-2} + \cdots + \gamma_{2p}X_{t-p} + \varepsilon_{2t} \]

In a VAR with \(k\) variables and \(p\) lags, each equation includes \(k \times p\) regressors. You estimate \(k\) separate regressions, so the system becomes a collection of standard linear regressions with identical regressors.

For instance, a VAR(1) with two variables (\(k=2\)) can be expressed as:

\[ z_t = c_{1} + \phi_{1}z_{t-1} + \gamma_{1}x_{t-1} + \varepsilon_{1t} \]

\[ x_t = c_{2} + \phi_{2}z_{t-1} + \gamma_{2}x_{t-1} + \varepsilon_{2t} \]

The two equations can be estimated separately using ordinary least squares (OLS). The system can be written in matrix form as:

\[ Y = c + AY_{t-1} + \epsilon \]

where:

\[ Y_t = \begin{bmatrix} z_t \\ x_t \end{bmatrix}; \; c = \begin{bmatrix} c_1 \\ c_2 \\ \end{bmatrix}; \; A = \begin{bmatrix} \phi_1 & \gamma_1 \\ \phi_2 & \gamma_2 \end{bmatrix}; \; \epsilon_t = \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \\ \end{bmatrix} \]

Important assumptions:

Stationarity. Transform the data if needed so that the \(k\) series are stationary. It will be discussed further when studying cointegration.
White noise errors
No perfect multicollinearity

Contemporaneous correlation is allowed. Errors across equations can be correlated.

Variables in the system should be plausibly related, so they can meaningfully help forecast one another. Including unrelated variables adds estimation noise without contributing predictive information, ultimately reducing forecast accuracy.

The choice of lag lengths is also fundamental, we’d like to specify models with the ability to capture the dynamic relationships among variables while maintaining parsimony. As studied earlier, information criteria is useful for this purpose, by extending the single-equation information criterion, AIC or BIC, to a system of equations, the optimal lag length is the one that minimizes the criterion. The selection, however, should be complemented with residual diagnostics (e.g., Ljung-Box test) and interpretability, asking ourselves if the lag length make sense given the frequency of the data.

The Stability Condition

The stability condition requires all the eigenvalues of the coefficient matrix to lie inside the unit circle. The coefficient matrix in a VAR indicates how variables affect each other over time. The eigenvalues of this matrix then capture the persistence of the system:

Eigenvalues close to 1 \(\rightarrow\) highly persistent dynamics
Eigenvalues greater than 1 \(\rightarrow\) explosive behavior
Eigenvalues less than 1 \(\rightarrow\) stable, mean-reverting system

In other words, each eigenvalue must have modulus less than 1. To better understand the concept, we may think of eigenvalues as measuring how strongly shocks propagate through the system.

Recall the AR(1) model:

\[ \begin{align} Y_t &= \phi Y_{t-1} + \varepsilon_t \\ &= \phi (\phi Y_{t-2} + \varepsilon_{t-1}) + \varepsilon_t \\ &= \phi^2Y_{t-2} + \varepsilon_t + \phi\varepsilon_{t-1} \\ &= \phi^3Y_{t-3} + \varepsilon_t + \phi\varepsilon_{t-1} + \phi^2 \varepsilon_{t-2} \\ &= \vdots\\ &= \phi^{h+1}Y_{t-(h+1)} + \varepsilon_t + \phi\varepsilon_{t-1} + \phi^2 \varepsilon_{t-2}+ \phi^3 \varepsilon_{t-3} + \cdots + \phi^{h} \varepsilon_{t-h} \\ \text{without loss of generality,} \\ &= \varepsilon_t + \phi\varepsilon_{t-1} + \phi^2 \varepsilon_{t-2}+ \phi^3 \varepsilon_{t-3} + \cdots + \phi^{h} \varepsilon_{t-h} \end{align} \]

The key object is \(\phi^h\). If \(|\phi|<1\), then \(\phi^h \rightarrow 0\) as \(h \rightarrow \infty\), shocks die out. If \(|\phi| \ge 1\), then shocks persist or explode. This is why stationarity in the AR(1) process requires \(∣\phi∣<1\).

Similarly, the VAR(1), expressed as:

\[ Y_t =AY_{t-1} + \epsilon_t \]

can be rewritten as:

\[ Y_t = \epsilon_t + A\epsilon_{t-1} + A^2\epsilon_{t-2} + \cdots \]

They key now is \(A^h\), does \(A^h \rightarrow 0\) as \(h \rightarrow \infty\)?. The matrix \(A\) in a VAR system summarizes two key features of the dynamics, (1) the directions in which the system evolves, captured by the eigenvectors, and (2) the persistence of those dynamics, captured by the eigenvalues. If all eigenvalues have modulus less than 1, then \(A \rightarrow 0\), but if any eigenvalue has modulus \(\ge 1\), \(A^h\) does not decay.

Each eigenvalue tells us how shocks evolve along its corresponding direction. If the modulus of all eigenvalues is less than one, shocks die out over time and the system is stable (stationary).

While stability means that all eigenvalues, \(\lambda_i\) of the companion matrix must satisfy be \(<1\), sometimes statistical software reports the roots, not the eigenvalues directly. These roots are the inverse of the eigenvalues: \(z_i = \frac{1}{\lambda_i}\), so the condition \(\lambda<1\) is equivalent to, roots outside the unit circle, \(z_i >1\).

In short, stability is the multivariate extension of stationarity, it ensures that the system behaves well over time and that the statistical properties needed for inference and forecasting are valid, ensuring that:

Shocks to the system do not explode over time
The effect of shocks gradually dies out
The series fluctuate around a constant mean and variance

Example

Show code

import pandas as pd
import numpy  as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import plotly.graph_objects as go
import yfinance as yf
from statsmodels.tsa.api import VAR
from datetime import datetime
from pandas_datareader import data as pdr
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.stats.diagnostic import acorr_ljungbox

Variable Selection

Show code

vars = ['NGDPRSAXDCMXQ', 'LRHUTTTTMXQ156S']
ts = pdr.DataReader(vars, 'fred', start='1990', end = '2026')
ts = ts.reset_index() 
ts = ts.rename(columns={'DATE': 'Date',
                        'NGDPRSAXDCMXQ': 'GDP',
                        'LRHUTTTTMXQ156S': 'Unemploy'
                        })

ts['Date'] = pd.to_datetime(ts['Date'], dayfirst=True) + pd.offsets.QuarterEnd(0)
ts.set_index(pd.PeriodIndex(ts['Date'], freq='Q'), inplace=True)


ts['gGDP']  = np.log(ts['GDP']).diff() * 100
ts['cUnem'] = ts['Unemploy'].diff()

ts = ts.dropna()

Show code

fig = go.Figure()

# Add observed data
fig.add_trace(go.Scatter(
    x=ts['Date'],
    y=ts['gGDP'],
    mode='lines',
    name='GDP',
    hoverinfo='text',
    hovertemplate='<b>Date:</b> %{x|%Y-%m-%d}<br><b>Value:</b> %{y:.2f}<extra></extra>'
))

fig.add_trace(go.Scatter(
    x=ts['Date'],
    y=ts['Unemploy'],
    mode='lines',
    name='Unemployment',
    hoverinfo='text',
    hovertemplate='<b>Date:</b> %{x|%Y-%m-%d}<br><b>Value:</b> %{y:.2f}<extra></extra>'
))

fig.update_layout(
    title=f'Mexican GDP Growth and Unemployment',
    xaxis_title='Date',
    yaxis_title='Percentage',
    hovermode='x unified'
)

fig.show()

Show code

plt.figure(figsize=(7, 5))
plt.plot(ts['Date'], ts['cUnem'])
plt.plot(ts['Date'], ts['gGDP'])
plt.title('Quarterly Growth of GDP')
plt.xlabel('Date')
plt.ylabel('Percent')
plt.tight_layout()
plt.show()

Stationarity

Show code

adf_p  = adfuller(ts['gGDP'])[1]
kpss_p = kpss(ts['gGDP'], regression='c', nlags='auto')[1]

print(f"ADF p-value: {adf_p:.4f}")
print(f"KPSS p-value: {kpss_p:.4f}")

ADF p-value: 0.0000
KPSS p-value: 0.1000

/var/folders/wl/12fdw3c55777609gp0_kvdrh0000gn/T/ipykernel_69811/2165326570.py:2: InterpolationWarning:

The test statistic is outside of the range of p-values available in the
look-up table. The actual p-value is greater than the p-value returned.

Show code

adf_p  = adfuller(ts['cUnem'])[1]
kpss_p = kpss(ts['cUnem'], regression='c', nlags='auto')[1]

print(f"ADF p-value: {adf_p:.4f}")
print(f"KPSS p-value: {kpss_p:.4f}")

ADF p-value: 0.0000
KPSS p-value: 0.1000

/var/folders/wl/12fdw3c55777609gp0_kvdrh0000gn/T/ipykernel_69811/277256259.py:2: InterpolationWarning:

The test statistic is outside of the range of p-values available in the
look-up table. The actual p-value is greater than the p-value returned.

Estimation

var_data = ts[['gGDP', 'cUnem']]
model = VAR(var_data) # Prepares the structure of a VAR model
lag_selection = model.select_order(maxlags=8)
print(lag_selection.summary())

 VAR Order Selection (* highlights the minimums) 
=================================================
      AIC         BIC         FPE         HQIC   
-------------------------------------------------
0     -0.7597     -0.7140      0.4678     -0.7411
1     -1.090*    -0.9530*     0.3362*     -1.034*
2      -1.073     -0.8446      0.3419     -0.9804
3      -1.036     -0.7164      0.3548     -0.9064
4      -1.003     -0.5919      0.3668     -0.8363
5     -0.9440     -0.4410      0.3895     -0.7397
6     -0.8992     -0.3048      0.4075     -0.6578
7     -0.8392     -0.1533      0.4331     -0.5606
8     -0.7843   -0.006932      0.4581     -0.4685
-------------------------------------------------

FPE criteron measures how well the model that is expected to produce out-of-sample forecast error. The lowest value signals the smallest error. HQIC balances fit and parsimony, sitting between AIC and BIC in how strongly it penalizes model complexity.

The model with lag = 0 represents a model with no dynamics. It is included by default to verify that adding lags actually improves the model. Most criteria here select one lag.

p = lag_selection.selected_orders['aic']  # or 'bic', etc.
results = model.fit(p)
print(results.summary())

  Summary of Regression Results   
==================================
Model:                         VAR
Method:                        OLS
Date:           Thu, 23, Apr, 2026
Time:                     15:25:43
--------------------------------------------------------------------
No. of Equations:         2.00000    BIC:                  -0.861362
Nobs:                     130.000    HQIC:                 -0.939932
Log likelihood:          -298.333    FPE:                   0.370207
AIC:                    -0.993710    Det(Omega_mle):        0.353694
--------------------------------------------------------------------
Results for equation gGDP
===========================================================================
              coefficient       std. error           t-stat            prob
---------------------------------------------------------------------------
const            0.564815         0.233491            2.419           0.016
L1.gGDP         -0.189198         0.103076           -1.836           0.066
L1.cUnem        -0.681201         0.861660           -0.791           0.429
===========================================================================

Results for equation cUnem
===========================================================================
              coefficient       std. error           t-stat            prob
---------------------------------------------------------------------------
const            0.008871         0.026026            0.341           0.733
L1.gGDP         -0.024925         0.011489           -2.169           0.030
L1.cUnem         0.238346         0.096044            2.482           0.013
===========================================================================

Correlation matrix of residuals
             gGDP     cUnem
gGDP     1.000000 -0.620808
cUnem   -0.620808  1.000000

Residual diagnostics

residuals = results.resid

fig = plt.figure(figsize=(7, 7))
gs = fig.add_gridspec(2, 2)

ax1 = fig.add_subplot(gs[0, :]) 
sns.lineplot(x=np.arange(len(residuals['gGDP'])), y=residuals['gGDP'], ax=ax1)
ax1.set_title("Residuals for the gGDP equation")

ax2 = fig.add_subplot(gs[1, 0])
plot_acf(residuals['gGDP'], lags=30, zero=False, ax=ax2)
ax2.set_title("ACF of Residuals")

ax3 = fig.add_subplot(gs[1, 1])
sns.histplot(residuals['gGDP'], kde=True, ax=ax3)
ax3.set_title("Histogram of Residuals")

plt.tight_layout()
plt.show()

fig = plt.figure(figsize=(7, 7))
gs = fig.add_gridspec(2, 2)

ax1 = fig.add_subplot(gs[0, :]) 
sns.lineplot(x=np.arange(len(residuals['cUnem'])), y=residuals['cUnem'], ax=ax1, color='orange')
ax1.set_title("Residuals for the cUnem equation")

ax2 = fig.add_subplot(gs[1, 0])
plot_acf(residuals['cUnem'], lags=30, zero=False, ax=ax2, color='orange')
ax2.set_title("ACF of Residuals")

ax3 = fig.add_subplot(gs[1, 1])
sns.histplot(residuals['cUnem'], kde=True, ax=ax3, color='orange')
ax3.set_title("Histogram of Residuals")

plt.tight_layout()
plt.show()

You could also implement the Ljung-Box test:

lb_test = acorr_ljungbox(residuals['gGDP'], lags=[4], model_df = 2, return_df=True)
print(lb_test)

    lb_stat  lb_pvalue
4  4.813516   0.090107

lb_test = acorr_ljungbox(residuals['cUnem'], lags=[4], model_df = 2, return_df=True)
print(lb_test)

   lb_stat  lb_pvalue
4  0.70236   0.703857

At a 95% confidence level, the null hypothesis of no serial correlation can’t be rejected.

Note that, in principle, the Ljung-Box test should adjust for all dynamic parameters that capture serial dependence in the residuals. In a VAR, this could be captured by the number of lagged regressors in each equation. In practice, some implementations simplify this adjustment, but for consistency with SARIMA models, we counted the number of coefficients for the lagged variables.

Stability

eigvals = 1 / results.roots
print(np.abs(eigvals))

[0.22577981 0.27492769]

print(results.is_stable())

True

plt.figure(figsize=(4,4))
plt.scatter(eigvals.real, eigvals.imag)

circle = plt.Circle((0,0), 1, fill=True, alpha = 0.25)
plt.gca().add_artist(circle)

plt.axhline(0)
plt.axvline(0)

plt.gca().set_aspect('equal', adjustable='box')
plt.xlim(-1.25, 1.25)
plt.ylim(-1.25, 1.25)

plt.title("Eigenvalues (inside unit circle)")
plt.show()

Interpreting the results

print(results.summary())

  Summary of Regression Results   
==================================
Model:                         VAR
Method:                        OLS
Date:           Thu, 23, Apr, 2026
Time:                     15:25:44
--------------------------------------------------------------------
No. of Equations:         2.00000    BIC:                  -0.861362
Nobs:                     130.000    HQIC:                 -0.939932
Log likelihood:          -298.333    FPE:                   0.370207
AIC:                    -0.993710    Det(Omega_mle):        0.353694
--------------------------------------------------------------------
Results for equation gGDP
===========================================================================
              coefficient       std. error           t-stat            prob
---------------------------------------------------------------------------
const            0.564815         0.233491            2.419           0.016
L1.gGDP         -0.189198         0.103076           -1.836           0.066
L1.cUnem        -0.681201         0.861660           -0.791           0.429
===========================================================================

Results for equation cUnem
===========================================================================
              coefficient       std. error           t-stat            prob
---------------------------------------------------------------------------
const            0.008871         0.026026            0.341           0.733
L1.gGDP         -0.024925         0.011489           -2.169           0.030
L1.cUnem         0.238346         0.096044            2.482           0.013
===========================================================================

Correlation matrix of residuals
             gGDP     cUnem
gGDP     1.000000 -0.620808
cUnem   -0.620808  1.000000

Equation 1:

\[ gGDP_t = 0.565 - 0.19\cdot gGDP_{t-1} - 0.69 \cdot cUnem_{t-1} \]

The constant (0.565) indicates an average quarterly growth around 0.6%. The coefficient on lagged GDP (-0.19, p = 0.066), implies a negative relationship, growth spikes are often followed by a decrease, a reversion to the mean. Changes in unemployment do not predict GDP growth as the coefficient is not statistically different from zero.

Equation 2:

\[ cUnem_t = 0.009 - 0.025\cdot gGDP_{t-1} + 0.240 \cdot cUnem_{t-1} \]

The constant is not statistically different from zero, i.e., we cannot reject the hypothesis that the mean change in unemployment is zero. Over time, unemployment does not trend upward or downward systematically. The coefficient on lagged GDP (-0.025), suggest that higher GDP growth reduces unemployment next period. This is an important result consistent Okun’s Law. What does the coefficient on lagged unemployment indicate?

Once a VAR has been estimated, passed the residual diagnostics, and satisfied the stability condition, we can use it either for predictive purposes or to examine relationships among the variables (or both).

6.1.1 Forecasting

The VAR provides a law of motion for the system that can be used to generate joint forecasts of all variables in which each variable is forecast using its own past and the past of all other variables.

Consider a VAR(p) written in matrix form as:

\[ Y_t = c + A_1Y_{t-1} + A_2Y_{t-2} + \cdots + A_pY_{t-p} + \epsilon_t \]

Once the parameters have been estimated, forecasting is obtained by replacing future unknown values with their model-implied expectations conditional on the information available at the forecast origin. Suppose the last observed period is \(T\). The one-period forecast of the system is

\[ \hat{Y}_{T+1|T} = \hat{c} + \hat{A_1}Y_{T} + \hat{A_2}Y_{T-1} + \cdots + + \hat{A_p}Y_{T-p+1} \] the forecast uses only the deterministic part implied by the estimated model because the expected value of the future innovation is zero.

The multi-period forecast follows the same logic, but now a recursion appears. The two-period forecast is

\[ \hat{Y}_{T+2|T} = \hat{c} + \hat{A_1}\hat{Y}_{T+1} + \hat{A_2}Y_{T} + \cdots + + \hat{A_p}Y_{T-p+2} \]

since \(Y_{T+1}\) is not observed at \(T\), it was replaced by the forecast obtained in the previous step. This recursive structure continues for longer horizons.

Important, stability was required in order to ensure that the forecast path is well behaved. In a stable VAR, the effect of shocks will diminish over time, by contrast, forecasts may explode or behave erratically in an unstable VAR, which is why such a model is not suitable for forecasting or for substantive interpretation.

periods = 4
forecast = results.forecast(var_data.values[-p:], steps= periods)

Show code

forecast_df = pd.DataFrame(
    forecast,
    columns=var_data.columns
)

forecast_df

	gGDP	cUnem
0	0.430339	-0.022423
1	0.498671	-0.007199
2	0.475372	-0.005274
3	0.478468	-0.004234

6.1.2 Impulse Response Functions (IRF)

VAR models allow us to describe how economic systems respond to shocks. This is the purpose of impulse response analysis. In its moving average representation, the VAR model can be written as:

\[ Y_t = \mu + \sum_{h=0}^{\infty}\Phi_h \epsilon_{t-h} \]

\(\epsilon_t\): shocks (innovations, news) \(\Phi_h\): describes how shocks propagate over time

Thus, the current value of the system can be interpreted as the accumulation of past shocks and their dynamic effects.

An Impulse Response Function (IRF) traces the effect of a one time shock to one variable on the entire system over time. It allows us to understand how a shock today affects each variable in the system, the direction and magnitude of the response, and how persistent the effect is.

A key complication arises because the residuals, by construction, are correlated, \(Cov(\epsilon_t) = \Sigma \ne I\). As a result, a shock to one equation is not isolated, making interpretation difficult. To address this, we transform the residuals as: \[\epsilon_t = P u_t\] where \(u_t\) is the vector of orthogonal shocks (uncorrelated) and \(P\) is a matrix such that \(PP'=\Sigma\).

This transformation allows us to interpret shocks as independent innovations. Substituting into the Moving Average Representation of the VAR system:

\[ Y_t = \mu + \sum_{h=0}^{\infty}\Psi_h u_{t-h} \]

where: \(\Psi_h=\Phi_hP\). Each \(\Psi_h\) describes the response of the system at horizon \(h\) to a one-unit orthogonal shock to one variable, holding shocks to all other variables equal to zero.

Each column of \(\Psi_h\) corresponds to a specific shock, specifically, column \(i\) is the response to a shock in variable \(i\). The rows represent the responses of all variables in the system.

The impulse response functions for our model are obtained as follows:

Show code

irf = results.irf(10)
irf.plot(figsize=(7,7), signif = 0.10);

Each figure in the panel shows how one variable responds over time to a one-time shock in another variable. The horizon tells us how long the effect lasts.

Suppose Mexico experiences a sudden surge in manufacturing exports due to nearshoring demand from the U.S., stronger than anticipated, how does unemployment respond over the next quarters?

Practice

Apply Vector Autoregression (VAR) techniques to analyze the dynamic interaction between long-term interest rates and market uncertainty. Using data on the 10-year U.S. Treasury yield and the VIX index, this exercise explores how changes in financial conditions and perceived risk evolve jointly over time. Understanding the relationship between interest rates and volatility is essential in finance, as shifts in risk sentiment and term structure dynamics influence asset pricing, portfolio allocation, and hedging strategies.

We will estimate a VAR model, select the appropriate lag length, and evaluate its adequacy through diagnostic and stability tests. Finally, impulse response functions will be used to examine how unexpected changes in long-term yields affect market volatility, and vice versa. The goal of the exercise is to move beyond model estimation and develop the ability to interpret dynamic relationships in financial markets in a rigorous and economically meaningful way.

6.2 Cointegration

If we regress one nonstationary series on another, we may obtain apparently strong results even when the variables are unrelated. This is the problem of spurious regression. This is why we have emphasized the importance of testing for unit roots and, when needed, transforming the data by taking differences before modeling.

This, however, might create an important issue, if two nonstationary variables are linked by an economic equilibrium relationship in the long run, and we difference them mechanically, we may remove the long-run information contained in the levels of the variables, which eliminates precisely the information we care about most.

Take, for instance, income and consumption. Theory suggests both series grow over time, but may not be stationary on its own. Similarly, the prices of closely related financial assets, such as the interest rate on short vs long term bonds (Figure 6.1 top). Even though both series move over time, they do not drift apart indefinitely. Instead, they remain tied to one another by an equilibrium condition, so the spread (\(r_{10y} - r_{3m}\)) is relatively stable (Figure 6.1 bottom). That’s because they have a common stochastic trend.

Figure 6.1: US Treasury Maturity, 3-month vs 10-year

If we difference these variables blindly, they may lose meaningful long-run structure. Instead, we want a framework that tells us whether a long-run relationship exists. Cointegration is designed to address such cases, allowing us to work with variables that are individually nonstationary, while still capturing the possibility that they move together over time in a stable long-run relationship.

6.1 Vector Autoregressions

6.1.1 Forecasting

6.1.2 Impulse Response Functions (IRF)

6.2 Cointegration

6.3 Conditional Heteroskedaticity