7  OLS Estimation

How to Draw the Best Line Through a Scatter Plot

Regression · OLS · Estimation
Author

Jake Anderson

Published

March 21, 2026

Modified

March 26, 2026

Abstract

Many lines could be drawn through a scatter plot. Ordinary Least Squares (OLS) picks the one that minimizes the sum of squared residuals. This chapter derives the OLS formulas from first principles, applies them to the food expenditure data, and establishes that estimators are random variables whose properties we can study.

Note: Prerequisites

You should be familiar with the simple linear regression model and assumptions SR1–SR6 from Chapter 5.

7.1 The Problem: Which Line?

Given \(N = 40\) data points from the food expenditure dataset, many lines could be drawn through the scatter plot. Three different people would draw three different “best” lines. We need an objective criterion for choosing.

7.2 The Least Squares Principle

For any candidate line \(\hat{y}_i = b_1 + b_2 x_i\), the residual for observation \(i\) is:

\[ \hat{e}_i = y_i - \hat{y}_i = y_i - b_1 - b_2 x_i \tag{7.1}\]

A good line should make these residuals small overall. Why not minimize \(\sum \hat{e}_i\)? Because positive and negative residuals cancel: a terrible line through the middle could have \(\sum \hat{e}_i = 0\). Minimizing \(\sum |\hat{e}_i|\) works (it gives median regression) but has no closed-form solution. Squaring gives us clean calculus.

Why squared? Squaring penalizes large residuals more than small ones, produces a smooth objective function (differentiable everywhere), and yields a closed-form solution. These are computational conveniences, not deep truths.
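To make the cancellation concrete, here is a minimal sketch in plain JavaScript with a hypothetical three-point dataset: a flat line through the mean of \(y\) has residuals that sum to exactly zero, yet its squared residuals expose the poor fit.

```javascript
// Hypothetical tiny dataset, for illustration only.
const data = [{x: 1, y: 2}, {x: 2, y: 8}, {x: 3, y: 5}];
const ybar = data.reduce((s, d) => s + d.y, 0) / data.length; // 5

// A flat line at the mean of y: positive and negative residuals cancel...
const residuals = data.map(d => d.y - ybar);          // [-3, 3, 0]
const sum = residuals.reduce((s, e) => s + e, 0);     // 0
// ...yet the squared residuals reveal how badly the line misses.
const ssr = residuals.reduce((s, e) => s + e * e, 0); // 18
```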

Definition 7.1 (Sum of Squared Residuals (SSR)) \[ S(b_1, b_2) = \sum_{i=1}^{N} \hat{e}_i^2 = \sum_{i=1}^{N} (y_i - b_1 - b_2 x_i)^2 \tag{7.2}\]

The least squares principle chooses \(b_1\) and \(b_2\) to minimize \(S(b_1, b_2)\). The values that achieve this minimum are the Ordinary Least Squares (OLS) estimators.

Caution: Errors vs residuals

Do not confuse \(e_i\) (the true error, which we never observe) with \(\hat{e}_i\) (the residual, which we compute from our fitted line). The error \(e_i = y_i - \beta_1 - \beta_2 x_i\) uses the true (unknown) parameters; the residual \(\hat{e}_i = y_i - b_1 - b_2 x_i\) uses our estimates.
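Definition 7.1 translates directly into code. Below is a plain-JavaScript sketch; the `ssr` helper and the three `demo` points are my own illustration, not the chapter's dataset.

```javascript
// Sum of squared residuals S(b1, b2) for a candidate line (Equation 7.2).
function ssr(data, b1, b2) {
  let s = 0;
  for (const {x, y} of data) {
    const resid = y - b1 - b2 * x;  // residual ê_i = y_i − b1 − b2·x_i
    s += resid * resid;
  }
  return s;
}

// Hypothetical points lying exactly on the line y = 1 + 2x.
const demo = [{x: 1, y: 3}, {x: 2, y: 5}, {x: 3, y: 7}];
const exact = ssr(demo, 1, 2);  // 0: the line fits perfectly
const off   = ssr(demo, 0, 2);  // 3: every residual is 1, so S = 1+1+1
```

Any candidate \((b_1, b_2)\) can be scored this way; OLS is simply the pair that drives this score as low as possible.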

7.3 Deriving the OLS Formulas

Take partial derivatives of \(S(b_1, b_2)\) with respect to \(b_1\) and \(b_2\), set them equal to zero:

\[\frac{\partial S}{\partial b_1} = -2\sum(y_i - b_1 - b_2 x_i) = 0\]

\[\frac{\partial S}{\partial b_2} = -2\sum x_i(y_i - b_1 - b_2 x_i) = 0\]

These are the normal equations. Solving them, in two steps, gives the OLS formulas.

From the first-order conditions, dividing by \(N\):

\[ b_1 = \bar{y} - b_2 \bar{x} \tag{7.3}\]

\(\implies\) The fitted line always passes through the point \((\bar{x}, \bar{y})\). Once we find \(b_2\), we get \(b_1\) for free.

Geometric interpretation: Among all lines passing through the point of means \((\bar{x}, \bar{y})\), the OLS line is the one that minimizes the sum of squared vertical distances to the data.

Substituting into the second normal equation and simplifying gives the slope in deviation-from-mean form:

Theorem 7.1 (OLS Slope Estimator) \[ b_2 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \tag{7.4}\]

The numerator measures how \(x\) and \(y\) co-vary around their means; the denominator measures how much \(x\) varies around its mean. In other words, \(b_2 = \text{sample covariance of } x \text{ and } y \; / \; \text{sample variance of } x\), with the \(N-1\) denominators canceling exactly.

\(\implies\) The slope estimate captures how \(y\) moves with \(x\), scaled by how much \(x\) moves on its own.
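Equations 7.3 and 7.4 can be packaged as one small helper. This is a plain-JavaScript sketch; the `ols` name and the three-point check are my own illustration.

```javascript
// OLS estimators in deviation-from-mean form (Equations 7.3 and 7.4).
function ols(data) {
  const n = data.length;
  const xbar = data.reduce((s, d) => s + d.x, 0) / n;
  const ybar = data.reduce((s, d) => s + d.y, 0) / n;
  let num = 0, den = 0;
  for (const {x, y} of data) {
    num += (x - xbar) * (y - ybar);  // co-variation of x and y
    den += (x - xbar) ** 2;          // variation of x
  }
  const b2 = num / den;              // slope (Eq. 7.4)
  const b1 = ybar - b2 * xbar;       // intercept (Eq. 7.3)
  return {b1, b2};
}

// Points constructed to lie on the line y = 2 + 3x, so OLS must recover it.
const fit = ols([{x: 0, y: 2}, {x: 1, y: 5}, {x: 2, y: 8}]);  // {b1: 2, b2: 3}
```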

Interactive: draw your own regression line

Use the sliders to choose an intercept and slope. The Sum of Squared Residuals (SSR) updates in real time. Can you find the values that minimize SSR? Compare your answer to the OLS solution.

viewof user_b1 = Inputs.range([0, 200], {value: 100, step: 1, label: "Intercept (b₁)"})
viewof user_b2 = Inputs.range([0, 20], {value: 5, step: 0.1, label: "Slope (b₂)"})

ols_scatter_data = [
  {x: 3.69, y: 115.22}, {x: 4.39, y: 135.98}, {x: 4.75, y: 119.34}, {x: 6.03, y: 114.96},
  {x: 12.47, y: 187.05}, {x: 12.98, y: 243.34}, {x: 16.42, y: 267.43}, {x: 17.58, y: 238.71},
  {x: 18.95, y: 295.94}, {x: 20.00, y: 317.78}, {x: 20.18, y: 216.00}, {x: 20.43, y: 269.30},
  {x: 21.41, y: 302.49}, {x: 23.66, y: 325.61}, {x: 24.87, y: 301.58}, {x: 25.13, y: 264.47},
  {x: 27.83, y: 342.75}, {x: 28.96, y: 339.01}, {x: 29.05, y: 365.52}, {x: 33.40, y: 424.96}
]

ols_ssr = {
  let ssr = 0;
  for (const d of ols_scatter_data) {
    const resid = d.y - user_b1 - user_b2 * d.x;
    ssr += resid * resid;
  }
  return ssr;
}

ols_best = {
  const n = ols_scatter_data.length;
  const xbar = d3.mean(ols_scatter_data, d => d.x);
  const ybar = d3.mean(ols_scatter_data, d => d.y);
  const num = d3.sum(ols_scatter_data, d => (d.x - xbar) * (d.y - ybar));
  const den = d3.sum(ols_scatter_data, d => (d.x - xbar) ** 2);
  const b2 = num / den;
  const b1 = ybar - b2 * xbar;
  let ssr = 0;
  for (const d of ols_scatter_data) {
    const resid = d.y - b1 - b2 * d.x;
    ssr += resid * resid;
  }
  return {b1, b2, ssr};
}

Plot.plot({
  width: 640,
  height: 400,
  x: {label: "x (income, $100s)", domain: [0, 36]},
  y: {label: "y (food expenditure, $)", domain: [50, 480]},
  marks: [
    Plot.dot(ols_scatter_data, {x: "x", y: "y", fill: "#1E5A96", r: 5}),
    // User's line
    Plot.line(
      [{x: 0, y: user_b1}, {x: 36, y: user_b1 + user_b2 * 36}],
      {x: "x", y: "y", stroke: "#D4A84B", strokeWidth: 2.5, strokeDasharray: "6 3"}
    ),
    // OLS line
    Plot.line(
      [{x: 0, y: ols_best.b1}, {x: 36, y: ols_best.b1 + ols_best.b2 * 36}],
      {x: "x", y: "y", stroke: "#2E8B57", strokeWidth: 2}
    ),
    // Residual segments for user line
    Plot.link(ols_scatter_data, {
      x1: "x", y1: "y",
      x2: "x", y2: d => user_b1 + user_b2 * d.x,
      stroke: "#C41E3A", strokeOpacity: 0.3
    }),
    Plot.text([`Your SSR: ${ols_ssr.toFixed(0)}`], {x: 28, y: 120, fill: "#D4A84B", fontWeight: "bold", fontSize: 13}),
    Plot.text([`OLS SSR: ${ols_best.ssr.toFixed(0)}`], {x: 28, y: 90, fill: "#2E8B57", fontWeight: "bold", fontSize: 13})
  ],
  caption: "Dashed gold = your line. Solid green = OLS line. Red segments = residuals for your line."
})
Figure 7.1: Draw your own regression line by adjusting the intercept and slope. The SSR updates live. OLS minimizes this quantity.

Challenge: Try to get your SSR below the OLS SSR. You will find it impossible: OLS, by construction, achieves the minimum.

7.4 Applying OLS to the Food Expenditure Data

From the \(N = 40\) observations: \(\bar{x} = 19.60\), \(\bar{y} = 283.57\), \(\sum(x_i - \bar{x})(y_i - \bar{y}) = 18{,}671.27\), and \(\sum(x_i - \bar{x})^2 = 1{,}828.79\). The slope is \(b_2 = 18{,}671.27 / 1{,}828.79 = 10.21\): a $100 increase in weekly income is associated with a $10.21 increase in expected weekly food expenditure. The intercept is \(b_1 = 283.57 - 10.21 \times 19.60 \approx 83.42\). The fitted regression line is \(\hat{y}_i = 83.42 + 10.21 \, x_i\).
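As a sanity check, the arithmetic can be reproduced from the quoted summary statistics. Note that because the sums above are already rounded, the intercept comes out near 83.46 here rather than the 83.42 obtained from the unrounded data.

```javascript
// Food expenditure estimates from the (rounded) summary statistics in the text.
const sxy = 18671.27;   // Σ(x_i − x̄)(y_i − ȳ)
const sxx = 1828.79;    // Σ(x_i − x̄)²
const xbar = 19.60;
const ybar = 283.57;

const b2 = sxy / sxx;          // ≈ 10.21
const b1 = ybar - b2 * xbar;   // ≈ 83.46 (rounding in the sums shifts it slightly)
```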

7.5 Fitted Values and Residuals

For each observation, OLS produces a fitted value \(\hat{y}_i = b_1 + b_2 x_i\) (the model’s prediction) and a residual \(\hat{e}_i = y_i - \hat{y}_i\) (the prediction error). Every data point decomposes as \(y_i = \hat{y}_i + \hat{e}_i\), splitting the observed value into an explained part and an unexplained part. A household earning $2,000 per week (\(x = 20\)) with actual food expenditure \(y = 350\) gets \(\hat{y} = 287.62\) and \(\hat{e} = 62.38\); it spends $62.38 more on food than the model predicts.

Two properties of OLS residuals: (1) \(\sum \hat{e}_i = 0\): residuals sum to zero. (2) \(\sum x_i \hat{e}_i = 0\): residuals are uncorrelated with \(x\) in the sample. These are consequences of the normal equations.
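Both properties are easy to verify numerically. The four-point dataset below is hypothetical; up to floating-point error, the residuals from its fitted line satisfy the two conditions exactly.

```javascript
// Verify Σê_i = 0 and Σx_i·ê_i = 0 on a hypothetical dataset.
const data = [{x: 1, y: 2}, {x: 2, y: 3}, {x: 4, y: 9}, {x: 5, y: 10}];
const n = data.length;
const xbar = data.reduce((s, d) => s + d.x, 0) / n;
const ybar = data.reduce((s, d) => s + d.y, 0) / n;
const b2 = data.reduce((s, d) => s + (d.x - xbar) * (d.y - ybar), 0) /
           data.reduce((s, d) => s + (d.x - xbar) ** 2, 0);
const b1 = ybar - b2 * xbar;

const resid = data.map(d => d.y - b1 - b2 * d.x);
const sumE  = resid.reduce((s, e) => s + e, 0);                // ≈ 0
const sumXE = data.reduce((s, d, i) => s + d.x * resid[i], 0); // ≈ 0
```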

flowchart LR
    Y["Observed yᵢ"] --> FIT["Fitted ŷᵢ = b₁ + b₂xᵢ<br/>(explained by model)"]
    Y --> RES["Residual êᵢ = yᵢ − ŷᵢ<br/>(unexplained)"]
    FIT --> CHECK["∑êᵢ = 0<br/>∑xᵢêᵢ = 0"]
    RES --> CHECK

    style Y fill:#1E5A96,color:#fff
    style FIT fill:#2E8B57,color:#fff
    style RES fill:#D4A84B,color:#fff
    style CHECK fill:#888,color:#fff
Figure 7.2: How OLS decomposes each observation into a fitted value and a residual.

7.6 Estimators Are Random Variables

The formula \(b_2 = \sum(x_i - \bar{x})(y_i - \bar{y}) / \sum(x_i - \bar{x})^2\) is an estimator (a random variable), while the number \(b_2 = 10.21\) from our particular sample is an estimate (a fixed number). A different random sample of 40 households would give different \(y_i\) values and therefore a different \(b_2\). The estimator has a probability distribution, a mean, and a variance. Understanding these properties is the subject of the next chapter.
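A quick simulation makes the point. The data-generating process below (uniform incomes, rough zero-mean noise, true slope 10.21) is entirely hypothetical, and the small linear congruential generator is there only for reproducibility: each simulated sample of 40 households yields a different \(b_2\), but the estimates center on the true slope.

```javascript
// Sampling variation in b2: the estimator is a random variable.
function lcg(seed) {  // tiny deterministic PRNG, uniform on [0, 1)
  let s = seed >>> 0;
  return () => ((s = (1664525 * s + 1013904223) >>> 0) / 4294967296);
}

function slope(data) {  // b2 in deviation-from-mean form (Eq. 7.4)
  const xbar = data.reduce((a, d) => a + d.x, 0) / data.length;
  const ybar = data.reduce((a, d) => a + d.y, 0) / data.length;
  return data.reduce((a, d) => a + (d.x - xbar) * (d.y - ybar), 0) /
         data.reduce((a, d) => a + (d.x - xbar) ** 2, 0);
}

const rand = lcg(42);
const estimates = [];
for (let rep = 0; rep < 1000; rep++) {
  const sample = [];
  for (let i = 0; i < 40; i++) {
    const x = 5 + 30 * rand();                       // income-like regressor
    const e = 40 * (rand() + rand() + rand() - 1.5); // rough zero-mean noise
    sample.push({x, y: 83.42 + 10.21 * x + e});      // true slope is 10.21
  }
  estimates.push(slope(sample));
}
// The 1000 estimates scatter from sample to sample but average near 10.21.
const mean_b2 = estimates.reduce((a, b) => a + b, 0) / estimates.length;
```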

7.7 Practice

A dataset of \(N = 5\) observations has \(\bar{x} = 4\), \(\bar{y} = 10\), \(\sum(x_i - \bar{x})(y_i - \bar{y}) = 20\), and \(\sum(x_i - \bar{x})^2 = 10\). Compute \(b_1\) and \(b_2\), and predict \(\hat{y}\) at \(x_0 = 6\).

\(b_2 = 20/10 = 2\). \(b_1 = 10 - 2 \times 4 = 2\). The fitted line is \(\hat{y} = 2 + 2x\). At \(x_0 = 6\): \(\hat{y}_0 = 2 + 2(6) = 14\).
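The same arithmetic, verified in plain JavaScript:

```javascript
// Practice check: slope, intercept, and prediction at x0 = 6.
const b2 = 20 / 10;          // Σ(x−x̄)(y−ȳ) / Σ(x−x̄)² = 2
const b1 = 10 - b2 * 4;      // ȳ − b2·x̄ = 2
const yhat0 = b1 + b2 * 6;   // fitted line evaluated at x0 = 6 → 14
```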
