Identification and Estimation When Causation Runs Both Ways
Simultaneous Equations
Identification
2SLS
Author
Jake Anderson
Published
March 3, 2026
Modified
March 4, 2026
This chapter deals with what happens when two or more variables are determined together at the same time.
7.1 Motivation
In most regressions, we think of \(x\) as causing \(y\). But sometimes causation runs both ways simultaneously. Think about supply and demand at a farmers’ market. The price of strawberries depends on how many people want them (demand) and how many farmers brought them (supply). But the quantity sold also depends on the price. So price determines quantity, and quantity determines price — they settle together in equilibrium.
When we observe price and quantity data from a market, each data point is an equilibrium — the intersection of supply and demand. If we just run OLS of \(Q\) on \(P\), we don’t recover the demand curve or the supply curve. We get some confused mixture of both.
7.2 Worked Example: Bruin Boba
7.2.1 Supply and Demand at Bruin Boba
Imagine you track weekly price per cup (\(P_t\)) and cups sold (\(Q_t\)) at Bruin Boba:
Week 1 (normal): Nothing special.
Week 2 (hot week): It’s 90°F all week \(\implies\) lots more students want boba (demand shifts out).
Week 3 (cost shock): Input costs rise \(\implies\) supply shifts left.
Here \(u_D\) denotes a demand shifter (hot weather, virality, etc.) and \(v_S\) denotes a supply shifter (input costs). When input costs rise, supply shifts left, which we model as a negative \(v_S\).
Week 2 and Week 3 have the same price (\(P=6.0\)) but very different quantities (140 vs 80) because one week is a demand shift and the other is a supply shift. A single OLS line through \((P_t, Q_t)\) can’t recover demand (or supply).
Equilibrium points from shifting demand and supply curves
The gray dashed line shows what OLS estimates when you naively regress \(Q\) on \(P\). It doesn’t match the true demand curve (blue) or the supply curve (red). This is simultaneity bias: because \(P\) and \(Q\) are jointly determined in equilibrium, \(\text{Cov}(P, u_D) \neq 0\) and OLS is biased and inconsistent.
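A small simulation can make the bias concrete. The sketch below assumes a linear demand curve \(Q = a - bP + u_D\) and a linear supply curve \(Q = c + dP + v_S\); the parameter values, the sample size, and the idea that the supply shifter is directly observed are all illustrative assumptions, not numbers from Bruin Boba.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical structural parameters (illustrative only)
a, b = 200.0, 10.0   # demand: Q = a - b*P + u_D, so the true demand slope is -b
c, d = 20.0, 15.0    # supply: Q = c + d*P + v_S, so the true supply slope is +d

u_D = rng.normal(0.0, 20.0, n)  # demand shifter
v_S = rng.normal(0.0, 20.0, n)  # supply shifter (assumed observed here)

# Equilibrium price solves a - b*P + u_D = c + d*P + v_S
P = (a - c + u_D - v_S) / (b + d)
Q = a - b * P + u_D

# Naive OLS slope from regressing Q on P
ols_slope = np.cov(P, Q)[0, 1] / np.var(P, ddof=1)

# IV slope for demand, using the supply shifter as the instrument for P
iv_slope = np.cov(v_S, Q)[0, 1] / np.cov(v_S, P)[0, 1]

print(f"true demand slope: {-b:.2f}")
print(f"true supply slope: {d:.2f}")
print(f"OLS slope:         {ols_slope:.2f}")
print(f"IV slope:          {iv_slope:.2f}")
```

With these settings the OLS slope lands between the two true slopes, while the IV slope is close to the demand slope, because the supply shifter moves the equilibrium along a fixed demand curve.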
Note
To estimate demand, we need an instrumental variable that shifts supply (like input costs) but is uncorrelated with demand shocks. This isolates movement along the demand curve rather than shifts of it.
7.3 Structural Equations
The structural form describes the behavioral relationships — what economic theory says about how agents make decisions.
Consider the truffle market:
\[
\text{Demand:} \quad Q = \alpha_1 + \alpha_2 P + \alpha_3 PS + \alpha_4 DI + e^d
\tag{7.3}\]
\[
\text{Supply:} \quad Q = \beta_1 + \beta_2 P + \beta_3 PF + e^s
\tag{7.4}\]
\(P\) = price of truffles, \(Q\) = quantity of truffles
\(PS\) = price of a substitute good (exogenous)
\(DI\) = disposable income of consumers (exogenous)
\(PF\) = price of a factor of production (exogenous)
\(e^d, e^s\) = demand and supply shocks
Endogenous variables (\(P\) and \(Q\)): determined within the system by the intersection of supply and demand. Their values depend on the shocks \(e^d\) and \(e^s\).
Exogenous variables (\(PS\), \(DI\), \(PF\)): determined outside the system. They shift the curves but are not affected by the market equilibrium.
Notice that in the demand equation, \(P\) is endogenous. It is correlated with \(e^d\) because a demand shock shifts the demand curve and therefore changes the equilibrium price. OLS requires \(\text{Cov}(P, e^d) = 0\), but here \(\text{Cov}(P, e^d) \neq 0\). So OLS on the structural equations is biased and inconsistent, which is the same endogeneity problem we saw in the IV chapter.
7.4 Reduced-Form Equations
The reduced form expresses each endogenous variable as a function of only exogenous variables. We get these by solving the structural system.
Derivation: Set demand equal to supply:
\[
\alpha_1 + \alpha_2 P + \alpha_3 PS + \alpha_4 DI + e^d = \beta_1 + \beta_2 P + \beta_3 PF + e^s
\]
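Collecting the \(P\) terms and dividing by \(\alpha_2 - \beta_2\) (assumed nonzero) gives one way to write out the reduced form. The labels \(\pi_{P1}, \dots, \pi_{Q4}\) and \(v_P, v_Q\) are notational assumptions for this sketch and may differ from the chapter's own numbering.

\[
\begin{aligned}
(\alpha_2 - \beta_2) P &= (\beta_1 - \alpha_1) - \alpha_3 PS - \alpha_4 DI + \beta_3 PF + (e^s - e^d) \\
P &= \underbrace{\frac{\beta_1 - \alpha_1}{\alpha_2 - \beta_2}}_{\pi_{P1}}
   + \underbrace{\frac{-\alpha_3}{\alpha_2 - \beta_2}}_{\pi_{P2}} PS
   + \underbrace{\frac{-\alpha_4}{\alpha_2 - \beta_2}}_{\pi_{P3}} DI
   + \underbrace{\frac{\beta_3}{\alpha_2 - \beta_2}}_{\pi_{P4}} PF
   + \underbrace{\frac{e^s - e^d}{\alpha_2 - \beta_2}}_{v_P}
\end{aligned}
\]

Substituting this expression for \(P\) into the supply equation gives the reduced form for \(Q\):

\[
Q = \underbrace{\frac{\alpha_2 \beta_1 - \alpha_1 \beta_2}{\alpha_2 - \beta_2}}_{\pi_{Q1}}
  + \underbrace{\frac{-\alpha_3 \beta_2}{\alpha_2 - \beta_2}}_{\pi_{Q2}} PS
  + \underbrace{\frac{-\alpha_4 \beta_2}{\alpha_2 - \beta_2}}_{\pi_{Q3}} DI
  + \underbrace{\frac{\alpha_2 \beta_3}{\alpha_2 - \beta_2}}_{\pi_{Q4}} PF
  + \underbrace{\frac{\alpha_2 e^s - \beta_2 e^d}{\alpha_2 - \beta_2}}_{v_Q}
\]

Both right-hand sides contain only exogenous variables and a composite error, and \(v_P\) depends on \(e^d\), which is why \(P\) is correlated with the demand error.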
A convenient property of reduced forms is that OLS is valid for them: the right-hand side contains only exogenous variables, so there is no endogeneity problem. The reduced-form equation for \(P\) is exactly the first-stage regression in 2SLS. The reduced-form parameters (\(\pi\)’s) are combinations of the structural parameters (\(\alpha\)’s and \(\beta\)’s).
Think: For a simpler model where Demand is \(Q = \alpha_1 P + \alpha_2 X + e^d\) and Supply is \(Q = \beta_1 P + e^s\), derive the reduced form for \(P\).
Set demand = supply: \(\alpha_1 P + \alpha_2 X + e^d = \beta_1 P + e^s\)
\[(\alpha_1 - \beta_1)P = -\alpha_2 X + (e^s - e^d)\]
Dividing by \(\alpha_1 - \beta_1\) gives the reduced form: \(P = \pi_1 X + v_1\), where \(\pi_1 = \alpha_2 / (\beta_1 - \alpha_1)\) and \(v_1 = (e^s - e^d) / (\alpha_1 - \beta_1)\).
7.5 The Identification Problem
So can we actually recover the structural parameters? This depends on whether the equations are identified.
7.5.1 The Order Condition
In a system of \(M\) simultaneous equations, a necessary condition for an equation to be identified is that it excludes at least \(M - 1\) variables that appear elsewhere in the system.
Intuition: To trace out the demand curve, we need something that shifts supply but not demand. If supply shifts while demand stays fixed, the equilibrium moves along the demand curve, revealing its slope.
Checking the truffle model:
Demand equation (Equation 7.3): \(PF\) is excluded (it’s in supply but not demand). One exclusion \(\geq M - 1 = 1\). \(\checkmark\) Identified.
Supply equation (Equation 7.4): \(PS\) and \(DI\) are excluded (they’re in demand but not supply). Two exclusions \(\geq M - 1 = 1\). \(\checkmark\) Identified (actually overidentified — more exclusions than needed).
Think: What if both equations contained all the same variables?
Then neither equation would exclude anything. We’d have zero exclusions, which is less than \(M - 1 = 1\). Both equations would be unidentified — no estimation method can recover the structural parameters. This is the original supply-and-demand identification problem: without shifters, you can’t tell the curves apart.
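The exclusion counting above is mechanical enough to sketch in code. The helper below, `check_order_condition`, is a hypothetical function written for this illustration. It only counts exclusions, so its labels reflect the order condition alone and not a full identification proof.

```python
def check_order_condition(equations):
    """Count excluded variables per equation and compare with M - 1.

    `equations` maps an equation name to the set of variables that appear in it.
    """
    m = len(equations)                            # number of equations, M
    all_vars = set().union(*equations.values())   # every variable in the system
    report = {}
    for name, included in equations.items():
        excluded = sorted(all_vars - included)
        if len(excluded) < m - 1:
            status = "fails order condition"
        elif len(excluded) == m - 1:
            status = "just identified"
        else:
            status = "overidentified"
        report[name] = (excluded, status)
    return report


truffles = {
    "demand": {"Q", "P", "PS", "DI"},
    "supply": {"Q", "P", "PF"},
}
result = check_order_condition(truffles)
for name, (excluded, status) in result.items():
    print(name, excluded, status)

# If both equations contain the same variables, nothing is excluded
same_vars = {"demand": {"Q", "P", "X"}, "supply": {"Q", "P", "X"}}
unidentified = check_order_condition(same_vars)
```

For the truffle system this reports that demand excludes `PF` and supply excludes `DI` and `PS`, matching the check done by hand.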
7.6 2SLS for Simultaneous Systems
Once we know an equation is identified, we estimate it with two-stage least squares — same method as the IV chapter, just applied to each structural equation separately.
Estimating the demand equation:
First stage: Regress \(P\) on all exogenous variables (\(PS\), \(DI\), \(PF\)) to get \(\hat{P}\)
Second stage: In the demand equation, replace \(P\) with \(\hat{P}\) and estimate by OLS
The instruments for \(P\) in the demand equation are the variables excluded from demand — here, \(PF\) (the supply shifter).
Estimating the supply equation:
First stage: Same regression — \(P\) on all exogenous variables
Second stage: In the supply equation, replace \(P\) with \(\hat{P}\)
The instruments for \(P\) in the supply equation are \(PS\) and \(DI\) (the demand shifters).
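The two stages can be sketched with plain NumPy on simulated data. Everything numeric here is an assumption made for illustration: the structural parameter values, the distributions of `PS`, `DI`, and `PF`, and the sample size are invented, and the helper `ols` is a hypothetical convenience function.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000

# Hypothetical structural parameters (illustrative only)
a1, a2, a3, a4 = 10.0, -0.5, 0.3, 0.4   # demand: Q = a1 + a2*P + a3*PS + a4*DI + e_d
b1, b2, b3 = 2.0, 0.6, -0.8             # supply: Q = b1 + b2*P + b3*PF + e_s

PS = rng.normal(20.0, 3.0, n)
DI = rng.normal(30.0, 4.0, n)
PF = rng.normal(15.0, 3.0, n)
e_d = rng.normal(0.0, 1.0, n)
e_s = rng.normal(0.0, 1.0, n)

# Equilibrium price from setting demand equal to supply, then quantity from supply
P = (b1 - a1 + b3 * PF - a3 * PS - a4 * DI + e_s - e_d) / (a2 - b2)
Q = b1 + b2 * P + b3 * PF + e_s


def ols(X, y):
    """Least-squares coefficients for y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


ones = np.ones(n)

# First stage: regress P on all exogenous variables and keep the fitted values
Z = np.column_stack([ones, PS, DI, PF])
P_hat = Z @ ols(Z, P)

# Second stage: replace P with P_hat in each structural equation
demand_2sls = ols(np.column_stack([ones, P_hat, PS, DI]), Q)
supply_2sls = ols(np.column_stack([ones, P_hat, PF]), Q)

# Naive OLS on the demand equation, for comparison
demand_ols = ols(np.column_stack([ones, P, PS, DI]), Q)

print("demand slope: true", a2, "OLS", demand_ols[1], "2SLS", demand_2sls[1])
print("supply slope: true", b2, "2SLS", supply_2sls[1])
```

Running the second stage by hand like this gives the 2SLS coefficients, but the standard errors an OLS routine would report for that stage are not the correct 2SLS standard errors, so in practice a dedicated IV routine is the safer choice.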
Same rule as in the IV chapter: check that the instruments are strong. The first-stage F-statistic for the joint significance of the excluded instruments should exceed 10 (Staiger-Stock rule of thumb). Weak instruments lead to biased and imprecise 2SLS estimates here too.
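One way to compute that first-stage F-statistic is to compare the first stage with and without the excluded instruments. The function `excluded_instrument_F` below is a hypothetical helper, and the simulated data (coefficients, sample size, standard normal regressors) are assumptions chosen only to contrast a strong instrument with a weak one.

```python
import numpy as np


def excluded_instrument_F(endog, included, excluded):
    """F-statistic for the excluded instruments in the first-stage regression.

    Restricted model: endog on a constant and the included exogenous variables.
    Unrestricted model: the same regressors plus the excluded instruments.
    """
    n = len(endog)
    ones = np.ones((n, 1))
    X_r = np.column_stack([ones, included])
    X_u = np.column_stack([ones, included, excluded])

    def ssr(X):
        resid = endog - X @ np.linalg.lstsq(X, endog, rcond=None)[0]
        return resid @ resid

    q = X_u.shape[1] - X_r.shape[1]          # number of excluded instruments
    df = n - X_u.shape[1]                    # residual degrees of freedom
    return ((ssr(X_r) - ssr(X_u)) / q) / (ssr(X_u) / df)


rng = np.random.default_rng(1)
n = 500
PS, DI, PF = rng.normal(size=(3, n))

# Demand equation: PS and DI are included, PF is the excluded instrument
P_strong = 1.0 + 0.5 * PS + 0.5 * DI + 0.8 * PF + rng.normal(size=n)
P_weak = 1.0 + 0.5 * PS + 0.5 * DI + 0.02 * PF + rng.normal(size=n)

included = np.column_stack([PS, DI])
F_strong = excluded_instrument_F(P_strong, included, PF)
F_weak = excluded_instrument_F(P_weak, included, PF)

print(f"F with a strong instrument: {F_strong:.1f}")
print(f"F with a weak instrument:   {F_weak:.1f}")
```

When `PF` barely moves the price, the statistic is typically far below the threshold of 10, which is the situation the rule of thumb is meant to flag.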
8 Example: The Truffle Market
The truffle market model:
\[\text{Demand:} \quad Q = \alpha_1 + \alpha_2 P + \alpha_3 PS + \alpha_4 DI + e^d\]
\[\text{Supply:} \quad Q = \beta_1 + \beta_2 P + \beta_3 PF + e^s\]
(a) Which variables are endogenous and which are exogenous?
Answer
Endogenous: \(P\) and \(Q\), which are jointly determined by the intersection of supply and demand.
Exogenous: \(PS\) (price of substitute), \(DI\) (disposable income), \(PF\) (price of factor of production), all determined outside the truffle market.
(b) Check identification for both equations.
Answer
With \(M = 2\) equations, we need at least \(M - 1 = 1\) exclusion per equation.
Demand: Excludes \(PF\) (1 exclusion \(\geq 1\)). \(\checkmark\) Just identified.
Supply: Excludes \(PS\) and \(DI\) (2 exclusions \(\geq 1\)). \(\checkmark\) Overidentified.
(c) A researcher naively runs OLS on the demand equation and gets \(\hat{\alpha}_2 = -0.37\). The 2SLS estimate is \(\hat{\alpha}_2 = -0.53\). Why are they different, and which should we trust?
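Answer
They differ because OLS is biased and inconsistent: \(P\) is endogenous, so \(\text{Cov}(P, e^d) \neq 0\). With downward-sloping demand and upward-sloping supply, a positive demand shock raises the equilibrium price, so \(\text{Cov}(P, e^d) > 0\) and the OLS estimate is biased upward. Here that pulls it toward zero, so \(-0.37\) understates the price sensitivity of demand.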
We should trust the 2SLS estimate (\(-0.53\)), which uses \(PF\) as an instrument for \(P\). Since \(PF\) shifts supply but not demand, it isolates movement along the demand curve. Assuming the instrument is valid and strong, 2SLS is consistent.
(d) For the supply equation, a Sargan overidentification test gives \(NR^2 = 0.43\) with 1 degree of freedom. The 5% critical value for \(\chi^2_1\) is 3.84. Interpret.
Answer
\(H_0\): All surplus instruments are valid (uncorrelated with \(e^s\)).
Since \(0.43 < 3.84\), we fail to reject \(H_0\). There is no evidence against the validity of the surplus instrument.
Note: The supply equation has \(L = 2\) instruments (\(PS\) and \(DI\)) for \(B = 1\) endogenous variable (\(P\)), so \(L - B = 1\) surplus instrument can be tested.
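The same conclusion can be reached through a p-value. For one degree of freedom, the upper-tail probability of a chi-square variable equals \(\text{erfc}(\sqrt{x/2})\), so the standard library is enough. The function name `chi2_1_pvalue` is a hypothetical helper for this sketch.

```python
import math


def chi2_1_pvalue(stat):
    """Upper-tail probability P(X > stat) for X ~ chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


sargan_stat = 0.43                      # NR^2 from the question
p_value = chi2_1_pvalue(sargan_stat)
reject_at_5pct = p_value < 0.05

print(f"p-value: {p_value:.3f}")
print("reject H0 at 5%:", reject_at_5pct)

# Sanity check: the 5% critical value 3.84 should give a p-value of about 0.05
p_at_critical = chi2_1_pvalue(3.84)
```

The p-value is far above 0.05, which matches the comparison of 0.43 with the critical value 3.84.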