4  Expectation, Variance, and Covariance

How Diversification Reduces Risk

Categories: Probability, Statistics

Author: Jake Anderson

Published: March 21, 2026

Modified: March 26, 2026

Abstract

Expected value tells you the center of a distribution, variance tells you the spread, and covariance tells you how two variables move together. This chapter develops all three through a portfolio problem that shows why diversification works only when stocks are not perfectly correlated. These formulas are the algebra behind every estimator, standard error, and test statistic in the course.

4.1 Motivation: The Portfolio Problem

Suppose you have $1,000 to invest. Stock A and Stock B both have an average return of 8% and the same volatility. Should you put everything in one stock, or split 50/50? Both options have the same expected return, so the answer depends on how the two stocks move relative to each other. To make this precise, we need three tools: expectation, variance, and covariance.

4.2 Expected Value

Definition 4.1 (Expected Value) The expected value (or mean) of a discrete random variable \(X\) is a probability-weighted average of its possible values: \[ E(X) = \sum_x x \cdot f(x) \tag{4.1}\]

If you repeated the experiment infinitely many times, the average outcome would converge to \(E(X)\). The expected value is a property of the population distribution: a fixed number, not a random variable. Do not confuse it with the sample mean \(\bar{x}\), which varies from sample to sample. See Section 4.7 for how sample analogs replace population quantities in regression.
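Equation 4.1 can be checked directly. A minimal sketch in plain JavaScript (the interactive cells later in the chapter use Observable, but the arithmetic is the same), using a fair six-sided die as the example distribution:

```javascript
// Expected value of a fair six-sided die via Equation 4.1:
// E(X) = sum over x of x * f(x), with f(x) = 1/6 for each face.
const pmf = [1, 2, 3, 4, 5, 6].map(x => ({ x, p: 1 / 6 }));

function expectedValue(dist) {
  // Probability-weighted average of the support.
  return dist.reduce((sum, { x, p }) => sum + x * p, 0);
}

console.log(expectedValue(pmf)); // ≈ 3.5
```

Note that 3.5 is not a value the die can ever show: \(E(X)\) describes the long-run average, not a typical single outcome.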

The rules for expected values are the foundation for every derivation in this course. Linearity says \(E(aX + b) = aE(X) + b\) for constants \(a\) and \(b\). Additivity says \(E(X + Y) = E(X) + E(Y)\), whether or not \(X\) and \(Y\) are independent. Combined: \(E(aX + bY + c) = aE(X) + bE(Y) + c\). Constants scale and shift the expected value; no surprises here.
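These rules can be verified numerically on a small joint distribution. The pmf below is made up for illustration; the point is that additivity holds even though \(X\) and \(Y\) are clearly dependent:

```javascript
// Numerical check of E(aX + bY + c) = a·E(X) + b·E(Y) + c on a toy
// joint pmf in which X and Y are dependent (they tend to move together).
const joint = [
  { x: 0, y: 0, p: 0.4 },
  { x: 1, y: 1, p: 0.4 },
  { x: 1, y: 0, p: 0.2 },
];

// E(g(X, Y)) = sum of g at each support point, weighted by probability.
const E = g => joint.reduce((s, pt) => s + g(pt) * pt.p, 0);

const EX = E(({ x }) => x); // 0.6
const EY = E(({ y }) => y); // 0.4
const lhs = E(({ x, y }) => 2 * x + 3 * y + 1); // direct computation
const rhs = 2 * EX + 3 * EY + 1;                // via linearity
console.log(lhs, rhs); // both ≈ 3.4
```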

Note: What About \(E(XY)\)?

If \(X\) and \(Y\) are independent, \(E(XY) = E(X) \cdot E(Y)\). If they are dependent, \(E(XY) \neq E(X) \cdot E(Y)\) in general. The gap between the two has a name: covariance.

4.3 Variance

Definition 4.2 (Variance) The variance measures how spread out a distribution is around its mean: \[ \operatorname{Var}(X) = E\!\left[(X - \mu)^2\right] = E(X^2) - [E(X)]^2 \tag{4.2}\]

The second form (the shortcut formula) is usually easier to compute. The standard deviation \(\sigma_X = \sqrt{\operatorname{Var}(X)}\) has the same units as \(X\). Under a linear transformation:

\[ \operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X) \tag{4.3}\]

Two things to notice: adding a constant does not change spread, and the multiplicative constant is squared. If you double every value, deviations from the mean also double, so squared deviations quadruple.

Quick check: \(\operatorname{Var}(2X + 5) = 4\operatorname{Var}(X)\). The “+5” disappears; the “2” gets squared. This is a frequent exam question.
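Both forms of Equation 4.2, and the scaling rule, can be confirmed on a small made-up pmf:

```javascript
// Variance two ways: the definition E[(X − μ)²] and the shortcut
// E(X²) − [E(X)]², plus a check that Var(2X + 5) = 4·Var(X).
const pmf = [
  { x: 1, p: 0.5 },
  { x: 3, p: 0.3 },
  { x: 7, p: 0.2 },
];
const E = g => pmf.reduce((s, { x, p }) => s + g(x) * p, 0);

const mu = E(x => x);                          // 2.8
const varDef = E(x => (x - mu) ** 2);          // definition form
const varShortcut = E(x => x * x) - mu * mu;   // shortcut form

// Var(2X + 5): transform the support, keep the probabilities.
const pmf2 = pmf.map(({ x, p }) => ({ x: 2 * x + 5, p }));
const E2 = g => pmf2.reduce((s, { x, p }) => s + g(x) * p, 0);
const var2 = E2(x => x * x) - E2(x => x) ** 2;

console.log(varDef, varShortcut, var2 / varDef); // ≈ 5.16, 5.16, 4
```

The "+5" shifts every value and the mean by the same amount, so deviations, and hence the variance, are untouched; only the "2" survives, squared.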

4.4 Covariance and Correlation

The covariance measures the direction of linear association between two random variables:

\[ \operatorname{Cov}(X, Y) = E\!\left[(X - \mu_X)(Y - \mu_Y)\right] = E(XY) - E(X)E(Y) \tag{4.4}\]

Positive covariance means \(X\) and \(Y\) tend to move in the same direction; negative covariance means they tend to move in opposite directions; zero covariance means no linear association (there may still be nonlinear dependence). Independence implies zero covariance, but the reverse does not hold.

Caution: Zero covariance does not imply independence

\(\operatorname{Cov}(X, Y) = 0\) means no linear association. \(X\) and \(Y\) can still be strongly related in a nonlinear way. For example, if \(X \sim N(0,1)\) and \(Y = X^2\), then \(\operatorname{Cov}(X, Y) = 0\) but \(Y\) is completely determined by \(X\).
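A discrete analog of this example (swapping the normal distribution for a symmetric four-point pmf so the computation is exact) makes the point computable by hand:

```javascript
// Zero covariance without independence: X uniform on {−2, −1, 1, 2},
// Y = X². Y is completely determined by X, yet Cov(X, Y) = 0 because
// the symmetry of X kills every product term.
const support = [-2, -1, 1, 2];
const points = support.map(x => ({ x, y: x * x, p: 0.25 }));

const E = g => points.reduce((s, pt) => s + g(pt) * pt.p, 0);

// Shortcut formula: Cov(X, Y) = E(XY) − E(X)E(Y).
// Here E(XY) = E(X³) = 0 and E(X) = 0 by symmetry.
const cov = E(({ x, y }) => x * y) - E(({ x }) => x) * E(({ y }) => y);
console.log(cov); // 0: no linear association, total nonlinear dependence
```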

Because the magnitude of covariance depends on the units of \(X\) and \(Y\), we standardize to get a unitless measure:

\[ \rho_{XY} = \operatorname{Corr}(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \cdot \sigma_Y} \tag{4.5}\]

Correlation is bounded between \(-1\) and \(1\). \(|\rho| = 1\) means a perfect linear relationship; \(\rho = 0\) means no linear association.
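The boundary case \(|\rho| = 1\) is easy to exhibit. The joint pmf below is made up so that \(Y = 2X\) exactly, a perfect linear relationship:

```javascript
// Covariance via the shortcut E(XY) − E(X)E(Y), then standardized to
// correlation by dividing by σ_X · σ_Y. Here Y = 2X, so ρ should be 1.
const joint = [
  { x: 1, y: 2, p: 0.25 },
  { x: 2, y: 4, p: 0.5 },
  { x: 3, y: 6, p: 0.25 },
];
const E = g => joint.reduce((s, pt) => s + g(pt) * pt.p, 0);

const cov = E(({ x, y }) => x * y) - E(({ x }) => x) * E(({ y }) => y);
const sdX = Math.sqrt(E(({ x }) => x * x) - E(({ x }) => x) ** 2);
const sdY = Math.sqrt(E(({ y }) => y * y) - E(({ y }) => y) ** 2);

const rho = cov / (sdX * sdY);
console.log(rho); // ≈ 1: Y is a perfect (increasing) linear function of X
```

Doubling \(Y\) would double the covariance but leave \(\rho\) unchanged, which is exactly why correlation, not covariance, is the comparable-across-datasets measure.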

```{mermaid}
flowchart TD
    A["E(X): center"] --> D["Var(X) = E(X²) − [E(X)]²"]
    A --> E["Cov(X,Y) = E(XY) − E(X)E(Y)"]
    D --> F["Var(aX+bY)<br/>= a²Var(X) + b²Var(Y)<br/>+ 2ab·Cov(X,Y)"]
    E --> F
    E --> G["β₂ = Cov(X,Y) / Var(X)<br/>(population slope)"]
    D --> G

    style A fill:#1E5A96,color:#fff
    style D fill:#1E5A96,color:#fff
    style E fill:#1E5A96,color:#fff
    style F fill:#D4A84B,color:#fff
    style G fill:#2E8B57,color:#fff
```

Figure 4.1: How expectation, variance, and covariance relate to each other and to regression.

4.5 Variance of a Linear Combination

Theorem 4.1 (Variance of a Linear Combination) \[ \operatorname{Var}(aX + bY) = a^2\operatorname{Var}(X) + b^2\operatorname{Var}(Y) + 2ab\,\operatorname{Cov}(X, Y) \tag{4.6}\]

The cross-term \(2ab\,\operatorname{Cov}(X, Y)\) is what makes portfolio risk depend on the relationship between assets. Without it, you would always predict \(\operatorname{Var} = a^2\operatorname{Var}(X) + b^2\operatorname{Var}(Y)\), which is correct only when \(\operatorname{Cov}(X,Y) = 0\).

Warning: The covariance term cannot be ignored

“The variance of a sum is the sum of the variances” is only true when \(\operatorname{Cov} = 0\). Forgetting the covariance term is one of the most common errors in this course. For example, \(\operatorname{Var}(X - Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) - 2\operatorname{Cov}(X, Y)\); note that variances still add even for a difference.

Proof. Start from the definition: \(\operatorname{Var}(aX + bY) = E[(aX + bY - E(aX + bY))^2]\). Since \(E(aX + bY) = a\mu_X + b\mu_Y\):

\[\operatorname{Var}(aX + bY) = E[a(X - \mu_X) + b(Y - \mu_Y)]^2\]

Expand the square:

\[= E[a^2(X - \mu_X)^2 + 2ab(X - \mu_X)(Y - \mu_Y) + b^2(Y - \mu_Y)^2]\]

Take expectations term by term (linearity of \(E\)):

\[= a^2 E[(X - \mu_X)^2] + 2ab\,E[(X - \mu_X)(Y - \mu_Y)] + b^2 E[(Y - \mu_Y)^2]\]

\[= a^2\operatorname{Var}(X) + 2ab\,\operatorname{Cov}(X, Y) + b^2\operatorname{Var}(Y) \qquad \square\]
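The theorem can be sanity-checked numerically: compute \(\operatorname{Var}(aX + bY)\) directly from a joint pmf, then again from the right-hand side of Equation 4.6, and confirm they agree. The pmf is made up for illustration:

```javascript
// Check Theorem 4.1: Var(aX + bY) = a²Var(X) + b²Var(Y) + 2ab·Cov(X, Y).
const joint = [
  { x: 0, y: 1, p: 0.2 },
  { x: 1, y: 0, p: 0.3 },
  { x: 2, y: 2, p: 0.5 },
];
const E = g => joint.reduce((s, pt) => s + g(pt) * pt.p, 0);
const Var = g => E(pt => g(pt) ** 2) - E(g) ** 2; // shortcut formula

const a = 2, b = -3;

// Direct: treat Z = aX + bY as a new random variable and take its variance.
const direct = Var(({ x, y }) => a * x + b * y);

// Via the formula, including the cross-term.
const cov = E(({ x, y }) => x * y) - E(({ x }) => x) * E(({ y }) => y);
const viaFormula =
  a ** 2 * Var(({ x }) => x) + b ** 2 * Var(({ y }) => y) + 2 * a * b * cov;

console.log(direct, viaFormula); // the two agree (≈ 4 for this pmf)
```

Setting \(b = -3\) rather than a positive weight also exercises the sign logic: with \(ab < 0\) and positive covariance, the cross-term subtracts.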

4.6 The Portfolio Problem: Solved

With portfolio return \(R_P = wR_A + (1-w)R_B\) and \(\operatorname{Var}(R_A) = \operatorname{Var}(R_B) = 0.04\), Theorem 4.1 at \(w = 0.5\) gives:

\[ \operatorname{Var}(R_P) = (0.5)^2(0.04) + (0.5)^2(0.04) + 2(0.5)(0.5)\operatorname{Cov}(R_A, R_B) = 0.02 + 0.5\,\operatorname{Cov}(R_A, R_B) \tag{4.7}\]

How correlation affects portfolio variance (single-stock variance is 0.04, \(w = 0.5\)):

| Scenario            | \(\rho_{AB}\) | \(\operatorname{Var}(R_P)\) |
|---------------------|---------------|------------------------------|
| Perfect co-movement | \(+1\)        | 0.04                         |
| Some co-movement    | \(+0.5\)      | 0.03                         |
| Unrelated           | \(0\)         | 0.02                         |
| Partial hedge       | \(-0.5\)      | 0.01                         |
| Perfect hedge       | \(-1\)        | 0.00                         |

Investing everything in one stock gives \(\operatorname{Var}(R_A) = 0.04\). Diversification reduces risk whenever \(\rho < 1\). With \(\rho = -1\), risk is eliminated entirely. Without the covariance term, you would predict \(\operatorname{Var}(R_P) = 0.02\) regardless, missing the entire story.
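The whole table follows from Equation 4.7 with \(\operatorname{Cov}(R_A, R_B) = \rho\,\sigma_A\sigma_B\). A short sketch that reproduces each row:

```javascript
// Reproduce the portfolio-variance table: for each correlation ρ,
// Var(R_P) = w²Var(A) + (1−w)²Var(B) + 2w(1−w)·ρ·σ_A·σ_B at w = 0.5.
const varA = 0.04, varB = 0.04, w = 0.5;
const sig = Math.sqrt(varA); // 0.2, same for both stocks

const portfolioVar = rho =>
  w ** 2 * varA + (1 - w) ** 2 * varB + 2 * w * (1 - w) * rho * sig * sig;

for (const rho of [1, 0.5, 0, -0.5, -1]) {
  console.log(rho, portfolioVar(rho).toFixed(2));
}
// ρ = 1 → 0.04, ρ = 0.5 → 0.03, ρ = 0 → 0.02, ρ = −0.5 → 0.01, ρ = −1 → 0.00
```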

Interactive: portfolio variance calculator

Adjust the correlation \(\rho\) and weight \(w\) to see how portfolio variance changes. The bar chart decomposes the total into its three components: \(w^2\operatorname{Var}(A)\), \((1-w)^2\operatorname{Var}(B)\), and the \(2w(1-w)\operatorname{Cov}\) term.

```{ojs}
viewof rho = Inputs.range([-1, 1], {value: 0.5, step: 0.05, label: "Correlation ρ"})
viewof w = Inputs.range([0, 1], {value: 0.5, step: 0.01, label: "Weight w in Stock A"})

portfolio_components = {
  const varA = 0.04;
  const varB = 0.04;
  const sigA = Math.sqrt(varA);
  const sigB = Math.sqrt(varB);
  const covAB = rho * sigA * sigB;
  const comp_a = w * w * varA;
  const comp_b = (1 - w) * (1 - w) * varB;
  const comp_cov = 2 * w * (1 - w) * covAB;
  return [
    {component: "w² Var(A)", value: comp_a, color: "#1E5A96"},
    {component: "(1−w)² Var(B)", value: comp_b, color: "#2E8B57"},
    {component: "2w(1−w) Cov(A,B)", value: comp_cov, color: comp_cov >= 0 ? "#D4A84B" : "#C41E3A"}
  ];
}

portfolio_total = d3.sum(portfolio_components, d => d.value)

Plot.plot({
  width: 640,
  height: 300,
  x: {label: "Component", domain: portfolio_components.map(d => d.component)},
  y: {label: "Variance", domain: [Math.min(-0.04, d3.min(portfolio_components, d => d.value) - 0.005), 0.05]},
  color: {domain: portfolio_components.map(d => d.component), range: portfolio_components.map(d => d.color)},
  marks: [
    Plot.barY(portfolio_components, {x: "component", y: "value", fill: "component"}),
    Plot.ruleY([0]),
    Plot.ruleY([portfolio_total], {stroke: "#C41E3A", strokeWidth: 2, strokeDasharray: "6 3"}),
    Plot.text([`Total Var(Rₚ) = ${portfolio_total.toFixed(4)}`],
      {x: "2w(1−w) Cov(A,B)", y: portfolio_total + 0.003, fill: "#C41E3A", fontWeight: "bold", fontSize: 13})
  ]
})
```
Figure 4.2: Portfolio variance decomposition. Adjust ρ and w to see how the covariance term reshapes total risk.

Try it: Set \(\rho = -1\) and \(w = 0.5\). The covariance bar perfectly cancels the variance bars, giving \(\operatorname{Var}(R_P) = 0\): a perfect hedge.

4.7 Connection to Regression

These seven formulas (expected value definition, linearity of \(E\), variance definition, variance scaling, covariance shortcut, correlation, and variance of a sum) are the algebra behind every estimator, standard error, and test statistic in this course. The population regression slope is \(\beta_2 = \operatorname{Cov}(X, Y) / \operatorname{Var}(X)\), and the Ordinary Least Squares (OLS) estimator replaces these with sample analogs. Unbiasedness proofs use \(E(aX + b) = aE(X) + b\). Standard errors use \(\operatorname{Var}(aX + b) = a^2\operatorname{Var}(X)\).
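To preview what "sample analogs" means, here is a minimal sketch: replace the population moments in \(\beta_2 = \operatorname{Cov}(X, Y) / \operatorname{Var}(X)\) with sample moments. The dataset is made up so that \(y = 3x - 1\) exactly, which pins down what the slope should be:

```javascript
// Sample analog of the population slope β₂ = Cov(X, Y) / Var(X).
// With y = 3x − 1 exactly, the recovered slope should be 3.
const xs = [1, 2, 3, 4];
const ys = xs.map(x => 3 * x - 1);

const mean = a => a.reduce((s, v) => s + v, 0) / a.length;
const xbar = mean(xs), ybar = mean(ys);

// Sample covariance and variance (any common divisor cancels in the ratio).
const sampleCov = mean(xs.map((x, i) => (x - xbar) * (ys[i] - ybar)));
const sampleVar = mean(xs.map(x => (x - xbar) ** 2));

console.log(sampleCov / sampleVar); // ≈ 3, the OLS slope b₂
```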

Looking ahead: In Section 4.5 we derived \(\operatorname{Var}(aX + bY)\). In Chapter 7, you will see this exact formula applied to the OLS estimator \(b_2 = \sum w_i y_i\).

4.8 Practice

You invest equally in three uncorrelated stocks (\(w_1 = w_2 = w_3 = 1/3\)), each with \(\operatorname{Var}(R_i) = 0.09\). What is the portfolio variance?

Since all pairwise covariances are zero: \[\operatorname{Var}(R_P) = \left(\frac{1}{3}\right)^2 (0.09) + \left(\frac{1}{3}\right)^2 (0.09) + \left(\frac{1}{3}\right)^2 (0.09) = 3 \times \frac{0.09}{9} = 0.03\] The single-stock variance is 0.09. Splitting equally among three uncorrelated stocks cuts the variance to one-third. This generalizes: with \(n\) uncorrelated, equally weighted stocks, portfolio variance is \(\sigma^2 / n\).
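The \(\sigma^2/n\) pattern is worth checking once in code, since it is the quantitative heart of diversification:

```javascript
// Portfolio variance for n uncorrelated, equally weighted stocks, each
// with variance sigma2. All covariance terms vanish, so the variances
// add with weights (1/n)², giving n · (1/n)² · sigma2 = sigma2 / n.
const portfolioVar = (n, sigma2) => n * (1 / n) ** 2 * sigma2;

console.log(portfolioVar(3, 0.09)); // ≈ 0.03, matching the worked answer
console.log(portfolioVar(9, 0.09)); // ≈ 0.01: more stocks, less risk
```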
