Variance

Variance is just a way to measure how spread out the numbers in a set of data are.

Large variance means the numbers typically aren't grouped closely together. Think about the United Nations General Assembly. There's literally nothing there but people from every different country. Massive variance. Small variance means the numbers typically are grouped closely together. Think about your local Klan meeting. Everyone dresses the same, and uh, they probably look pretty similar underneath the robes as well. Not much variance in that group.

We all know that some people are attracted to certain "types." Maybe that type is a certain hair color. Maybe that type has to do with height. Maybe it's about IQ. Maybe it's a great sense of humor. Maybe it's a big pair of…eyes. Well, dating apps often track those characteristics for everyone you rate, whether they get an "Oh yeah!" or a swipe left and an "Ugh, pass!"

Then the app starts to display only results that fit your "type." The app assigns number values to different characteristics, like hair color. Then it finds the variance of your picks in each characteristic as a way to determine whether you have a preference in that area. A relatively large variance in a trait means your picks are spread out, so you probably aren't stuck on one type there. A relatively small variance means you probably do have a strong preference.

iHeart-mony (a dating app) assigns numbers to hair colors and shades within those colors. Light blonde is 1. Medium blonde is 2. And so on. Natasha just signed up for iHeart-mony and has been "Oh yeah!"-ing and "Ugh, pass!"-ing like mad. We're gonna peek at her first twenty "Oh yeah!" ratings, just for hair color. We'll have to calculate the mean, i.e. average, of her ratings first, before we can get to a variance; we'll need to add five 1s, eight 2s, three 3s, two 4s, one 7 and one 10. Then we'll divide by 20.
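Here's that mean worked out as a quick sketch in Python. The counts come straight from Natasha's twenty ratings above; the variable names are just ours for illustration.

```python
# Natasha's first twenty "Oh yeah!" hair-color codes, stored as
# code -> how many times that code showed up (from the text above).
counts = {1: 5, 2: 8, 3: 3, 4: 2, 7: 1, 10: 1}

n = sum(counts.values())                             # number of ratings
total = sum(code * k for code, k in counts.items())  # sum of all ratings
mean = total / n                                     # the average rating

print(n, total, mean)  # 20 55 2.75
```

So the twenty ratings add up to 55, and 55 divided by 20 is 2.75.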

Now we can get to Natasha's variance for hair color. Since we only took a small sample of Natasha's furious "Oh yeah!"-ing, we should use the sample version of the formula. The symbol for a sample variance is s-squared, and its formula divides by n minus 1: s² = Σ(x – x̄)² / (n – 1). We use sigma squared for a population variance, and that formula has just n on the bottom, not n minus 1: σ² = Σ(x – μ)² / n. In both formulas, n is the number of data points; x-bar is the symbol for the average of a sample, and mu is the symbol for the average of a population. The formula looks ugly, but the process is pretty simple. First, we subtract the mean from each data point, then we square each of those values, then we total those answers, and finally we divide the result by one less than the number of data points…20 – 1 equals 19. The squared differences total 93.75, and 93.75 divided by 19 gives Natasha a variance of 4.9342 in hair color, with a mean of 2.75.
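The steps above can be sketched in Python like so. The function names are our own, not anything iHeart-mony uses; the last lines cross-check the result against Python's built-in statistics module, whose variance() is the sample (n minus 1) version and whose pvariance() is the population (n) version.

```python
import math
import statistics

# Natasha's twenty hair-color ratings: five 1s, eight 2s, three 3s,
# two 4s, one 7 and one 10.
ratings = [1] * 5 + [2] * 8 + [3] * 3 + [4] * 2 + [7] + [10]


def sum_of_squared_deviations(data):
    """Subtract the mean from each point, square it, and total the results."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data)


def sample_variance(data):
    """s-squared: divide by one less than the number of data points."""
    return sum_of_squared_deviations(data) / (len(data) - 1)


def population_variance(data):
    """sigma-squared: divide by the number of data points."""
    return sum_of_squared_deviations(data) / len(data)


print(sum_of_squared_deviations(ratings))      # 93.75
print(round(sample_variance(ratings), 4))      # 4.9342
print(population_variance(ratings))            # 4.6875

# Cross-check against the standard library.
assert math.isclose(sample_variance(ratings), statistics.variance(ratings))
assert math.isclose(population_variance(ratings), statistics.pvariance(ratings))
```

Notice the population version comes out a bit smaller (4.6875 versus 4.9342), because dividing by 20 instead of 19 shrinks the answer.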

By themselves, those numbers don't tell us much, but let's say we took 20 hair color results from a different iHeart-mony user. Donna's variance is 8.2344, with a mean of 2.84. Both ladies seem to gravitate toward blonds overall, since they both have a mean in the blond region (1 to 3), but Donna's variance is much higher than Natasha's.

Natasha seems to really prefer hair on the lighter end of the spectrum…and doesn't like much that's not blond-ish. Donna seems to be all over the place with different hair colors, with only a slight blond preference. It's likely that hair color isn't a deal-breaker for Donna, but is a deal-breaker for Natasha. Online platforms like Facebook and Google use similar reasoning to target ads to what you like-slash-comment-on-slash-search-for, personalizing your experience.

The servers at Insta-Book assign a category number to each tag on a picture. Over a week or so, the bots determined that Donna has a mean of 41.21 (squarely in the "partying with friends" category) with a variance of 18.45. Natasha, on the other hand, has a mean of 65.83 (smack dab in "outdoor adventure" land) with a variance of 5.21.

What can the bots do with this info to send targeted ads to the two users? Donna's variance is much higher than Natasha's, indicating that she seems to like a wider variety of, well, stuff. Sending her mostly ads related to partying with friends, with some ads from other groups mixed in, seems wise. On the other hand, Natasha only seems to care about livin' la vida…outdoors. She should get ads that are almost exclusively targeted at that lifestyle.

One last bit about variance. Many people mistakenly believe that a higher mean always goes hand in hand with a higher variance. Not true at all. Variance is about how far the points are from the mean, no matter where the mean is.

Anyway, something must be wrong with iHeart-mony's algorithm, because all we get are pics of guys with huge mullets, bell-shaped birthmarks above their right eye, and excessively large ears. They’d better have some awfully good personalities.
