Least Squares
Categories: Metrics
Don’t feel bad for the squares because people keep hanging the adjective “least” on them. They’re tough. Big squares don’t cry.
Least squares is the name of the process we use to find the line that best represents a two-variable (or bivariate) data set whose points appear to have some kind of linear relationship. Imagine a scatter plot of bivariate data that compares each employee’s overall annual review score from management with that employee’s overall productivity score, as measured by their actual work output. When plotted, the points might show an overall pattern that starts low on the left and rises as we move to the right, suggesting that more positive reviews go hand-in-hand with higher productivity. We might want to put a number (or better yet, an equation) on that possible linear relationship between evaluation score and productivity, so we can use it to help us analyze our business better. The points might hug an imaginary line very closely, or they might kinda wander away from it while still showing an overall linear pattern.
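When we decide to find the equation for the best possible line to represent those points, unironically called the best-fit line, we look for the line that makes the total of the squared vertical distances between the points and the line as small as possible. Each vertical distance (often called a residual) is measured from the line to a point: points below the line give us negative distances, while points above the line give us positive distances. If we simply added up those signed distances, the negatives would cancel out the positives, and even a badly placed line could total zero. Squaring each distance makes every term positive, so nothing cancels, and it also penalizes big misses more heavily than small ones.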
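For readers who want the formula behind that description, here’s a sketch in standard notation, assuming n data points (x_i, y_i) and a candidate line y = mx + b. The symbols m, b, e_i, S, x̄ and ȳ (the means of the x and y values) are our own labels, not something defined above. The best-fit line is the (m, b) pair that minimizes S, and setting both partial derivatives to zero gives the familiar closed-form answer (it requires that not all x_i are equal):

```latex
\begin{aligned}
e_i &= y_i - (m x_i + b) \\
S(m, b) &= \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_i - m x_i - b\right)^2 \\
\frac{\partial S}{\partial b} &= -2 \sum_{i=1}^{n} \left(y_i - m x_i - b\right) = 0
  \;\Rightarrow\; b = \bar{y} - m\,\bar{x} \\
\frac{\partial S}{\partial m} &= -2 \sum_{i=1}^{n} x_i \left(y_i - m x_i - b\right) = 0
  \;\Rightarrow\; m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
  \quad \text{(after substituting } b\text{)}
\end{aligned}
```

Notice that the condition for b says the signed distances of the best-fit line add up to zero. Any line through the point (x̄, ȳ) shares that property, which is exactly why the unsquared total can’t single out one line.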
So... we’re finding the smallest total of squared vertical distances (or least squares) to properly locate our best-fit line.
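Here’s a minimal Python sketch of the same idea, assuming the paired scores arrive as plain lists. The function names and the five data points are hypothetical, made up purely for illustration; they aren’t real review or productivity data. The sketch computes the slope and intercept with the standard closed-form formulas. It then shows that the signed distances cancel out while the squared ones don’t, and that moving the line away from the fit increases the squared total.

```python
# A minimal sketch of an ordinary least-squares best-fit line, y = m*x + b.
# Function names and data are hypothetical, for illustration only.


def least_squares_fit(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared vertical distances."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired (x, y) points")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of products of deviations from the means (numerator of the slope)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Sum of squared x deviations (denominator of the slope)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean  # the fitted line passes through (x_mean, y_mean)
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Signed vertical distances: positive above the line, negative below it."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


def sum_squared_residuals(xs, ys, slope, intercept):
    """The quantity least squares makes as small as possible."""
    return sum(r ** 2 for r in residuals(xs, ys, slope, intercept))


if __name__ == "__main__":
    review_scores = [1, 2, 3, 4, 5]           # hypothetical annual review scores
    productivity = [2.0, 4.0, 5.0, 4.0, 5.0]  # hypothetical productivity scores

    m, b = least_squares_fit(review_scores, productivity)
    print(f"best-fit line: y = {m:.2f}x + {b:.2f}")  # best-fit line: y = 0.60x + 2.20

    signed = residuals(review_scores, productivity, m, b)
    # The signed distances cancel to (essentially) zero, even though no point here sits on the line...
    print("sum of signed distances:", round(sum(signed), 10))
    # ...but squared distances can't cancel, so their total measures how well the line fits.
    best = sum_squared_residuals(review_scores, productivity, m, b)
    print("sum of squared distances:", round(best, 10))

    # Nudging the slope away from the fitted value makes the squared total larger.
    print(sum_squared_residuals(review_scores, productivity, m + 0.1, b) > best)
```

Writing out the closed-form formulas keeps the sketch dependency-free. If NumPy is available, `numpy.polyfit(x, y, 1)` returns the same slope and intercept (highest power first).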