joypauls.github.io

Why Use Squared Errors?

January 01, 2020

Gist

The function that minimizes the expected squared error loss is the conditional mean, $\mathbb{E}[Y \mid X]$.
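Here is a short derivation of this claim. $Y$ is the dependent variable, $X$ the independent variable, and $f$ any candidate predictor. For brevity, write $m(X) = \mathbb{E}[Y \mid X]$; the name $m$ is introduced here and is not from the original post. The derivation assumes $\mathbb{E}[Y^2]$ and $\mathbb{E}[f(X)^2]$ are finite. Add and subtract $m(X)$ inside the square, then expand:

$$
\begin{aligned}
\mathbb{E}\big[(Y - f(X))^2\big]
&= \mathbb{E}\big[(Y - m(X))^2\big] + 2\,\mathbb{E}\big[(Y - m(X))(m(X) - f(X))\big] + \mathbb{E}\big[(m(X) - f(X))^2\big] \\
&= \mathbb{E}\big[(Y - m(X))^2\big] + \mathbb{E}\big[(m(X) - f(X))^2\big] \\
&\geq \mathbb{E}\big[(Y - m(X))^2\big].
\end{aligned}
$$

The cross term vanishes because, once we condition on $X$, the factor $m(X) - f(X)$ is fixed and $\mathbb{E}[Y - m(X) \mid X] = 0$. The remaining term $\mathbb{E}[(m(X) - f(X))^2]$ is never negative. It equals zero exactly when $f(X) = m(X)$ with probability 1, so no choice of $f$ does better than the conditional mean.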

Good Problems, Bad Explanations

I came to statistics and machine learning from a background in proof-driven mathematics. The two are not in opposition at all, but most of my high school and undergrad exposure to stats certainly made it feel that way. One of the most significant stumbling blocks for me as a beginner was how early squared errors came up, with very little motivation! Many concepts in beginning statistics match intuition closely enough that the background is unnecessary. For me, this was not one of those concepts. Squared errors show up in other contexts too, but I think the best way to explore them is through the regression problem.

A Little Setup: Regression

Let’s consider a minimal version of the regression problem: given our data in the form of a dependent variable $Y$ and an independent variable $X$ [1], we want to find a function $f$ so that we can model the unknown process that generated our data. Our model looks like this,

$$Y = f(X) + \epsilon$$

where $\epsilon$ represents the prediction error, the part of $Y$ that $f(X)$ does not capture. We don’t usually want just any function, though; we want the best function! As mathematicians, that word “best” should raise a lot of questions, and this is where we need the concept of a loss function. Without going too deep, a loss function $L$ provides the framework for evaluating the performance of $f$. It defines what “best” means by answering the question: how far off is our guess $f$? We then have two tasks to solve in our simplified regression problem:

  1. Choose an appropriate loss function $L$
  2. Find the function $f$ such that the value of $L$ is the lowest
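The two tasks can be sketched in Python. This is a minimal illustration, not how regression is solved in practice. The loss is passed in as a parameter, which is task 1. Task 2 is done by brute-force search over a small grid of hypothetical linear candidates $f(x) = ax + b$. The helper names `average_loss` and `best_linear_f`, the grid, and the data are assumptions made for this example. `absolute_error` is included only to show that the loss really is a choice.

```python
from __future__ import annotations

import itertools
from typing import Callable, Iterable, Sequence, Tuple

# A loss scores a single prediction y_hat against the observed y.
Loss = Callable[[float, float], float]
Data = Sequence[Tuple[float, float]]


def squared_error(y: float, y_hat: float) -> float:
    return (y - y_hat) ** 2


def absolute_error(y: float, y_hat: float) -> float:
    # An alternative loss, included only to show that task 1 is a real choice.
    return abs(y - y_hat)


def average_loss(loss: Loss, f: Callable[[float], float], data: Data) -> float:
    """Empirical answer to "how far off is our guess f?" on the observed data."""
    return sum(loss(y, f(x)) for x, y in data) / len(data)


def best_linear_f(loss: Loss, data: Data, grid: Iterable[float]) -> Tuple[float, float]:
    """Task 2 by brute force (hypothetical helper): the (a, b) on the grid
    whose f(x) = a*x + b has the lowest average loss."""
    values = list(grid)  # materialize so the grid can be used for both a and b
    return min(
        itertools.product(values, values),
        key=lambda ab: average_loss(loss, lambda x: ab[0] * x + ab[1], data),
    )


# Hypothetical data lying exactly on y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
grid = [i / 2 for i in range(-4, 9)]  # -2.0, -1.5, ..., 4.0
print(best_linear_f(squared_error, data, grid))  # (2.0, 1.0)
```

Swapping `squared_error` for `absolute_error` changes which $f$ counts as “best” in general. On this noiseless toy data, both losses pick the same line.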

In introductory stats, the loss function we use is almost always the squared error loss, $L(Y, f(X)) = (Y - f(X))^2$.
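Here is a small numerical check of the gist on made-up data where $X$ takes only two values. For each value of $x$, the sketch searches a grid of constant guesses for the one with the lowest average squared error. It then compares that guess with the average of $Y$ in the same group, which is the empirical conditional mean. The total squared error is a sum of per-group terms, so picking the best constant separately for each $x$ also picks the best function of $x$ overall. The data, grid, and helper names are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Data = Sequence[Tuple[float, float]]


def squared_error(y: float, y_hat: float) -> float:
    return (y - y_hat) ** 2


def group_by_x(data: Data) -> Dict[float, List[float]]:
    """Collect the observed y values for each distinct x."""
    groups: Dict[float, List[float]] = defaultdict(list)
    for x, y in data:
        groups[x].append(y)
    return dict(groups)


def best_constant(ys: Sequence[float], candidates: Sequence[float]) -> float:
    """The candidate c minimizing the average squared error of predicting c for every y."""
    return min(candidates, key=lambda c: sum(squared_error(y, c) for y in ys) / len(ys))


def best_f_by_search(data: Data, candidates: Sequence[float]) -> Dict[float, float]:
    # Total squared error splits into one sum per x value, so minimizing each
    # group separately minimizes the total over all functions of x.
    return {x: best_constant(ys, candidates) for x, ys in group_by_x(data).items()}


def conditional_means(data: Data) -> Dict[float, float]:
    """Empirical E[Y | X = x]: the average y within each x group."""
    return {x: mean(ys) for x, ys in group_by_x(data).items()}


data = [(0.0, 1.0), (0.0, 2.0), (0.0, 6.0), (1.0, 4.0), (1.0, 5.0)]
candidates = [i / 10 for i in range(101)]  # 0.0, 0.1, ..., 10.0
print(best_f_by_search(data, candidates))  # {0.0: 3.0, 1.0: 4.5}
print(conditional_means(data))             # {0.0: 3.0, 1.0: 4.5}
```

The grid search only finds the minimizer up to the grid spacing. Here the group means land exactly on grid points, so the two results agree exactly.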

Squared Errors

Minimizing Squared Errors, Conditional Mean

Why not another loss function?



  1. Most of this stuff is generalizable, but for simplicity assume $Y, X \in \mathbb{R}$.


Written by Joy P a person and stuff, does things occasionally