flowchart TD
A["E(X): center"] --> D["Var(X) = E(X²) − [E(X)]²"]
A --> E["Cov(X,Y) = E(XY) − E(X)E(Y)"]
D --> F["Var(aX+bY)<br/>= a²Var(X) + b²Var(Y)<br/>+ 2ab·Cov(X,Y)"]
E --> F
E --> G["β₂ = Cov(X,Y) / Var(X)<br/>(population slope)"]
D --> G
style A fill:#1E5A96,color:#fff
style D fill:#1E5A96,color:#fff
style E fill:#1E5A96,color:#fff
style F fill:#D4A84B,color:#fff
style G fill:#2E8B57,color:#fff
4 Expectation, Variance, and Covariance
How Diversification Reduces Risk
Expected value tells you the center of a distribution, variance tells you the spread, and covariance tells you how two variables move together. This chapter develops all three through a portfolio problem that shows why diversification works only when stocks are not perfectly correlated. These formulas are the algebra behind every estimator, standard error, and test statistic in the course.
4.1 Motivation: The Portfolio Problem
Suppose you have $1,000 to invest. Stock A and Stock B both have an average return of 8% and the same volatility. Should you put everything in one stock, or split 50/50? Both options have the same expected return, so the answer depends on how the two stocks move relative to each other. To make this precise, we need three tools: expectation, variance, and covariance.
4.2 Expected Value
Definition 4.1 (Expected Value) The expected value (or mean) of a discrete random variable \(X\) is a probability-weighted average of its possible values: \[ E(X) = \sum_x x \cdot f(x) \tag{4.1}\]
If you repeated the experiment infinitely many times, the average outcome would converge to \(E(X)\). The expected value is a fixed number describing the distribution; do not confuse it with the sample mean \(\bar{x}\), which varies from sample to sample.
\(E(X)\) is a property of the population distribution, not a sample quantity. It is a fixed number, not a random variable. See Section 4.7 for how sample analogs replace population quantities in regression.
The rules for expected values are the foundation for every derivation in this course. Linearity says \(E(aX + b) = aE(X) + b\) for constants \(a\) and \(b\). Additivity says \(E(X + Y) = E(X) + E(Y)\), whether or not \(X\) and \(Y\) are independent. Combined: \(E(aX + bY + c) = aE(X) + bE(Y) + c\). Constants scale and shift the expected value; no surprises here.
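The rules above can be checked numerically. The sketch below (illustrative, not from the text) computes \(E(X)\) for a fair die and verifies linearity with the arbitrary constants \(a = 2\), \(b = 5\); the helper name `expectedValue` is just an illustrative choice.

```javascript
// Expected value of a discrete random variable: E(X) = sum of x * f(x)
function expectedValue(values, probs) {
  return values.reduce((sum, x, i) => sum + x * probs[i], 0);
}

const faces = [1, 2, 3, 4, 5, 6];        // fair six-sided die
const probs = faces.map(() => 1 / 6);

const EX = expectedValue(faces, probs);  // 3.5

// Linearity check: E(2X + 5) computed directly vs. 2*E(X) + 5
const EaXb = expectedValue(faces.map(x => 2 * x + 5), probs);
console.log(EX, EaXb, 2 * EX + 5);       // both routes give the same answer
```

The two routes agree because constants pass straight through the probability-weighted sum.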
If \(X\) and \(Y\) are independent, \(E(XY) = E(X) \cdot E(Y)\). If they are dependent, \(E(XY) \neq E(X) \cdot E(Y)\) in general. The gap between the two has a name: covariance.
4.3 Variance
Definition 4.2 (Variance) The variance measures how spread out a distribution is around its mean: \[ \operatorname{Var}(X) = E\!\left[(X - \mu)^2\right] = E(X^2) - [E(X)]^2 \tag{4.2}\]
The second form (the shortcut formula) is usually easier to compute. The standard deviation \(\sigma_X = \sqrt{\operatorname{Var}(X)}\) has the same units as \(X\). Under a linear transformation:
\[ \operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X) \tag{4.3}\]
Two things to notice: adding a constant does not change spread, and the multiplicative constant is squared. If you double every value, deviations from the mean also double, so squared deviations quadruple.
Quick check: \(\operatorname{Var}(2X + 5) = 4\operatorname{Var}(X)\). The “+5” disappears; the “2” gets squared. This is a frequent exam question.
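A minimal numeric sketch of the quick check (illustrative distribution, not from the text): the shortcut formula (4.2) gives \(\operatorname{Var}(X)\), and recomputing the variance of \(2X + 5\) from scratch confirms rule (4.3).

```javascript
// Probability-weighted mean
function expectation(values, probs) {
  return values.reduce((s, x, i) => s + x * probs[i], 0);
}

// Shortcut formula (4.2): Var(X) = E(X²) − [E(X)]²
function variance(values, probs) {
  const mu = expectation(values, probs);
  const EX2 = expectation(values.map(x => x * x), probs);
  return EX2 - mu * mu;
}

const xs = [0, 1, 2];                       // illustrative distribution
const ps = [0.25, 0.5, 0.25];

const varX = variance(xs, ps);              // E(X) = 1, E(X²) = 1.5, Var = 0.5
const varLin = variance(xs.map(x => 2 * x + 5), ps);

console.log(varX, varLin);                  // the "+5" vanishes, the "2" is squared
```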
4.4 Covariance and Correlation
The covariance measures the direction of linear association between two random variables:
\[ \operatorname{Cov}(X, Y) = E\!\left[(X - \mu_X)(Y - \mu_Y)\right] = E(XY) - E(X)E(Y) \tag{4.4}\]
Positive covariance means \(X\) and \(Y\) tend to move in the same direction; negative covariance means they tend to move in opposite directions; zero covariance means no linear association (there may still be nonlinear dependence). Independence implies zero covariance, but the reverse does not hold.
\(\operatorname{Cov}(X, Y) = 0\) means no linear association. \(X\) and \(Y\) can still be strongly related in a nonlinear way. For example, if \(X \sim N(0,1)\) and \(Y = X^2\), then \(\operatorname{Cov}(X, Y) = 0\) but \(Y\) is completely determined by \(X\).
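The \(X \sim N(0,1)\), \(Y = X^2\) example can be seen in simulation. This sketch (illustrative, not from the text) draws standard normals with a Box–Muller transform so no library is needed; the sample covariance lands near zero even though \(Y\) is a deterministic function of \(X\).

```javascript
// Draw one standard normal via the Box–Muller transform
function randNormal() {
  const u = 1 - Math.random();  // keep u in (0, 1] so log(u) is finite
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

const n = 200000;
const xs = Array.from({length: n}, randNormal);
const ys = xs.map(x => x * x);              // Y is completely determined by X

const mean = a => a.reduce((s, v) => s + v, 0) / a.length;
const mx = mean(xs), my = mean(ys);
const cov = mean(xs.map((x, i) => (x - mx) * (ys[i] - my)));

console.log(cov.toFixed(3));                // near 0: no *linear* association
```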
Because the magnitude of covariance depends on the units of \(X\) and \(Y\), we standardize to get a unitless measure:
\[ \rho_{XY} = \operatorname{Corr}(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \cdot \sigma_Y} \tag{4.5}\]
Correlation is bounded between \(-1\) and \(1\). \(|\rho| = 1\) means a perfect linear relationship; \(\rho = 0\) means no linear association.
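Covariance and correlation can be computed directly from a joint distribution. The joint pmf below is an illustrative choice, not from the text; the calculation follows the shortcut (4.4) and then standardizes via (4.5).

```javascript
// Each entry is one (x, y) outcome with its probability; probabilities sum to 1
const joint = [
  {x: 0, y: 0, p: 0.4},
  {x: 0, y: 1, p: 0.1},
  {x: 1, y: 0, p: 0.1},
  {x: 1, y: 1, p: 0.4}
];

// Probability-weighted expectation of any function of the outcome
const E = f => joint.reduce((s, o) => s + f(o) * o.p, 0);

const EX = E(o => o.x), EY = E(o => o.y);
const covXY = E(o => o.x * o.y) - EX * EY;             // shortcut (4.4)
const sigX = Math.sqrt(E(o => o.x * o.x) - EX * EX);
const sigY = Math.sqrt(E(o => o.y * o.y) - EY * EY);
const rho = covXY / (sigX * sigY);                     // correlation (4.5)

console.log(covXY, rho);                               // positive co-movement
```

Here \(X\) and \(Y\) tend to match (mass on \((0,0)\) and \((1,1)\)), so both the covariance and the correlation come out positive.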
4.5 Variance of a Linear Combination
Theorem 4.1 (Variance of a Linear Combination) \[ \operatorname{Var}(aX + bY) = a^2\operatorname{Var}(X) + b^2\operatorname{Var}(Y) + 2ab\,\operatorname{Cov}(X, Y) \tag{4.6}\]
The cross-term \(2ab\,\operatorname{Cov}(X, Y)\) is what makes portfolio risk depend on the relationship between assets. Without it, you would always predict \(\operatorname{Var} = a^2\operatorname{Var}(X) + b^2\operatorname{Var}(Y)\), which is correct only when \(\operatorname{Cov}(X,Y) = 0\).
“The variance of a sum is the sum of the variances” is only true when \(\operatorname{Cov} = 0\). Forgetting the covariance term is one of the most common errors in this course. For example, \(\operatorname{Var}(X - Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) - 2\operatorname{Cov}(X, Y)\); note that variances still add even for a difference.
Proof. Start from the definition: \(\operatorname{Var}(aX + bY) = E[(aX + bY - E(aX + bY))^2]\). Since \(E(aX + bY) = a\mu_X + b\mu_Y\):
\[\operatorname{Var}(aX + bY) = E[a(X - \mu_X) + b(Y - \mu_Y)]^2\]
Expand the square:
\[= E[a^2(X - \mu_X)^2 + 2ab(X - \mu_X)(Y - \mu_Y) + b^2(Y - \mu_Y)^2]\]
Take expectations term by term (linearity of \(E\)):
\[= a^2 E[(X - \mu_X)^2] + 2ab\,E[(X - \mu_X)(Y - \mu_Y)] + b^2 E[(Y - \mu_Y)^2]\]
\[= a^2\operatorname{Var}(X) + 2ab\,\operatorname{Cov}(X, Y) + b^2\operatorname{Var}(Y) \qquad \square\]
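The identity can also be checked numerically. In this sketch (illustrative joint pmf and weights, not from the text), the left side is computed straight from the variance definition and the right side from the theorem's decomposition; they must agree for any \(a\) and \(b\).

```javascript
// Illustrative joint distribution; probabilities sum to 1
const joint = [
  {x: 0, y: 0, p: 0.4},
  {x: 0, y: 1, p: 0.1},
  {x: 1, y: 0, p: 0.1},
  {x: 1, y: 1, p: 0.4}
];
const E = f => joint.reduce((s, o) => s + f(o) * o.p, 0);

function bothSides(a, b) {
  const EX = E(o => o.x), EY = E(o => o.y);
  const varX = E(o => o.x ** 2) - EX ** 2;
  const varY = E(o => o.y ** 2) - EY ** 2;
  const cov = E(o => o.x * o.y) - EX * EY;
  // Left side: Var(aX + bY) straight from the definition of variance
  const EZ = a * EX + b * EY;
  const lhs = E(o => (a * o.x + b * o.y) ** 2) - EZ ** 2;
  // Right side: the theorem's formula (4.6)
  const rhs = a ** 2 * varX + b ** 2 * varY + 2 * a * b * cov;
  return {lhs, rhs};
}

console.log(bothSides(2, 3));   // the two sides agree
console.log(bothSides(1, -1));  // Var(X − Y): variances add, covariance enters with a minus sign
```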
4.6 The Portfolio Problem: Solved
With portfolio return \(R_P = wR_A + (1-w)R_B\) and \(\operatorname{Var}(R_A) = \operatorname{Var}(R_B) = 0.04\), apply Theorem 4.1 with \(a = w\) and \(b = 1 - w\). At \(w = 0.5\), the weight terms contribute \((0.25)(0.04) + (0.25)(0.04) = 0.02\) and the cross-term coefficient is \(2(0.5)(0.5) = 0.5\), so the portfolio variance is:
\[ \operatorname{Var}(R_P) = 0.02 + 0.5\,\operatorname{Cov}(R_A, R_B) \tag{4.7}\]
| Scenario | \(\rho_{AB}\) | \(\operatorname{Var}(R_P)\) |
|---|---|---|
| Perfect co-movement | \(+1\) | 0.04 |
| Some co-movement | \(+0.5\) | 0.03 |
| Unrelated | \(0\) | 0.02 |
| Partial hedge | \(-0.5\) | 0.01 |
| Perfect hedge | \(-1\) | 0 |
Investing everything in one stock gives \(\operatorname{Var}(R_A) = 0.04\). Diversification reduces risk whenever \(\rho < 1\). With \(\rho = -1\), risk is eliminated entirely. Without the covariance term, you would predict \(\operatorname{Var}(R_P) = 0.02\) regardless, missing the entire story.
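The table's entries follow directly from equation (4.6). This sketch (not from the text's own code) reproduces them at \(w = 0.5\) with \(\operatorname{Var}(R_A) = \operatorname{Var}(R_B) = 0.04\), converting each \(\rho\) to a covariance via \(\operatorname{Cov} = \rho\,\sigma_A\sigma_B\).

```javascript
// Portfolio variance from equation (4.6) with Cov recovered from rho
function portfolioVariance(w, varA, varB, rho) {
  const cov = rho * Math.sqrt(varA) * Math.sqrt(varB);
  return w * w * varA + (1 - w) * (1 - w) * varB + 2 * w * (1 - w) * cov;
}

// The five scenarios from the table
for (const rho of [1, 0.5, 0, -0.5, -1]) {
  console.log(rho, portfolioVariance(0.5, 0.04, 0.04, rho).toFixed(2));
}
```

Each line matches the corresponding table row, from 0.04 at \(\rho = +1\) down to 0 at \(\rho = -1\).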
Interactive: portfolio variance calculator
Adjust the correlation \(\rho\) and weight \(w\) to see how portfolio variance changes. The bar chart decomposes the total into its three components: \(w^2\operatorname{Var}(A)\), \((1-w)^2\operatorname{Var}(B)\), and the \(2w(1-w)\operatorname{Cov}\) term.
viewof rho = Inputs.range([-1, 1], {value: 0.5, step: 0.05, label: "Correlation ρ"})
viewof w = Inputs.range([0, 1], {value: 0.5, step: 0.01, label: "Weight w in Stock A"})
portfolio_components = {
const varA = 0.04;
const varB = 0.04;
const sigA = Math.sqrt(varA);
const sigB = Math.sqrt(varB);
const covAB = rho * sigA * sigB;
const comp_a = w * w * varA;
const comp_b = (1 - w) * (1 - w) * varB;
const comp_cov = 2 * w * (1 - w) * covAB;
return [
{component: "w² Var(A)", value: comp_a, color: "#1E5A96"},
{component: "(1−w)² Var(B)", value: comp_b, color: "#2E8B57"},
{component: "2w(1−w) Cov(A,B)", value: comp_cov, color: comp_cov >= 0 ? "#D4A84B" : "#C41E3A"}
];
}
portfolio_total = d3.sum(portfolio_components, d => d.value)
Plot.plot({
width: 640,
height: 300,
x: {label: "Component", domain: portfolio_components.map(d => d.component)},
y: {label: "Variance", domain: [Math.min(-0.04, d3.min(portfolio_components, d => d.value) - 0.005), 0.05]},
color: {domain: portfolio_components.map(d => d.component), range: portfolio_components.map(d => d.color)},
marks: [
Plot.barY(portfolio_components, {x: "component", y: "value", fill: "component"}),
Plot.ruleY([0]),
Plot.ruleY([portfolio_total], {stroke: "#C41E3A", strokeWidth: 2, strokeDasharray: "6 3"}),
Plot.text([`Total Var(Rₚ) = ${portfolio_total.toFixed(4)}`],
{x: () => "2w(1−w) Cov(A,B)", y: portfolio_total + 0.003, fill: "#C41E3A", fontWeight: "bold", fontSize: 13})
]
})
Try it: Set \(\rho = -1\) and \(w = 0.5\). The covariance bar perfectly cancels the variance bars, giving \(\operatorname{Var}(R_P) = 0\): a perfect hedge.
4.7 Connection to Regression
These seven formulas (expected value definition, linearity of \(E\), variance definition, variance scaling, covariance shortcut, correlation, and variance of a sum) are the algebra behind every estimator, standard error, and test statistic in this course. The population regression slope is \(\beta_2 = \operatorname{Cov}(X, Y) / \operatorname{Var}(X)\), and the Ordinary Least Squares (OLS) estimator replaces these with sample analogs. Unbiasedness proofs use \(E(aX + b) = aE(X) + b\). Standard errors use \(\operatorname{Var}(aX + b) = a^2\operatorname{Var}(X)\).
Looking ahead: In Section 4.5 we derived \(\operatorname{Var}(aX + bY)\). In Chapter 7, you will see this exact formula applied to the OLS estimator \(b_2 = \sum w_i y_i\).
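The sample-analog idea can be sketched with made-up data (the numbers below are illustrative, not from the text): replace \(\operatorname{Cov}(X, Y)\) and \(\operatorname{Var}(X)\) with their sample counterparts and the ratio becomes the slope estimate \(b_2\).

```javascript
// Illustrative data: y is roughly 2x plus small deviations
const x = [1, 2, 3, 4, 5];
const y = [2.1, 3.9, 6.2, 7.8, 10.1];

const mean = a => a.reduce((s, v) => s + v, 0) / a.length;
const mx = mean(x), my = mean(y);

// Sample analogs of Cov(X, Y) and Var(X)
const sCov = mean(x.map((xi, i) => (xi - mx) * (y[i] - my)));
const sVar = mean(x.map(xi => (xi - mx) ** 2));

const b2 = sCov / sVar;       // sample analog of beta2 = Cov(X,Y) / Var(X)
console.log(b2.toFixed(2));   // close to the true slope of 2
```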
4.8 Practice
You invest equally in three uncorrelated stocks (\(w_1 = w_2 = w_3 = 1/3\)), each with \(\operatorname{Var}(R_i) = 0.09\). What is the portfolio variance?
Since all pairwise covariances are zero: \[\operatorname{Var}(R_P) = \left(\frac{1}{3}\right)^2 (0.09) + \left(\frac{1}{3}\right)^2 (0.09) + \left(\frac{1}{3}\right)^2 (0.09) = 3 \times \frac{0.09}{9} = 0.03\] The single-stock variance is 0.09. Splitting equally among three uncorrelated stocks cuts the variance to one-third. This generalizes: with \(n\) uncorrelated, equally weighted stocks, portfolio variance is \(\sigma^2 / n\).
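The \(\sigma^2/n\) pattern is quick to confirm in code. A minimal sketch (not from the text): with equal weights \(1/n\) and all pairwise covariances zero, only the \(n\) squared-weight terms survive.

```javascript
// Var(R_P) for n equally weighted, uncorrelated stocks, each with variance sigma2:
// n terms of (1/n)² · sigma2, and every covariance term is zero
function equalWeightVariance(n, sigma2) {
  const w = 1 / n;
  return n * w * w * sigma2;   // = sigma2 / n
}

console.log(equalWeightVariance(3, 0.09));  // the practice problem's answer
console.log(equalWeightVariance(6, 0.09));  // doubling n halves the variance
```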