Correlation Coefficient

  

Categories: Metrics, Trading

The correlation coefficient, or r value, measures the strength and direction of the linear relationship between two variables. On a scatterplot, it describes how closely the points cluster around the line of best fit found by linear regression.

Values of r range from -1 to 1. Values close to -1 or 1 mean the data points lie very close to the best fit line, which slopes downward for negative r and upward for positive r. Values close to 0 mean the points sit farther from the line and look more like a cloud.
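For reference, one standard way to write the sample correlation coefficient uses standardized values (z-scores). The symbols here are notation assumed for illustration: n is the number of data points, x̄ and ȳ are the means, and s_x and s_y are the sample standard deviations (computed with an n - 1 divisor).

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)
```

When every point falls exactly on a line with nonzero slope, this sum works out to exactly 1 or -1, matching the endpoints of the range.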

We typically calculate r using technology. Almost no one does it by hand. Seriously, use a graphing calculator, spreadsheet, or website to do it for you.

If the r-value for data relating annual salary to days of vacation per year is 0.94, we can expect the scatterplot to show points falling close to a straight line that rises from the lower left of the plot to the upper right. We can also conclude there's a very strong positive correlation between those two variables. One variable doesn't have to cause the other to change, but they are correlated... somehow.
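As a rough illustration of letting technology do the work, here is a minimal Python sketch that computes r from standardized values using only the standard library. The function name `pearson_r` and the wait-time and tip numbers are hypothetical, made up for this example; they are not data from the article or the video.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Sample correlation coefficient via products of z-scores."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if len(xs) < 2:
        raise ValueError("need at least two data points")
    mean_x, mean_y = mean(xs), mean(ys)
    sd_x, sd_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    # Standardize each value, multiply matched pairs, then average over n - 1.
    z_products = [
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
        for x, y in zip(xs, ys)
    ]
    return sum(z_products) / (len(xs) - 1)


# Hypothetical data, invented for illustration only.
wait_minutes = [10, 12, 15, 18, 20, 25]
tip_percent = [20, 18, 17, 15, 14, 10]

r = pearson_r(wait_minutes, tip_percent)
print(f"r = {r:.3f}")
```

With this made-up data the result is close to -1, since the tips fall almost in lockstep as the wait grows. A spreadsheet or calculator regression function should give the same value up to rounding.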

Related or Semi-related Video

Finance: What are correlation coefficien...

00:00

Finance a la Shmoop what are correlation coefficients Kind of sounds

00:08

like a new card game from the makers of cards

00:10

against humanity or an exotic disease that spreads like wildfire

00:15

on a cruise ship you know been there But a

00:17

correlation coefficient is actually a measure of how strongly connected

00:20

or correlated two different variables are It's also a measure

00:24

of how close the points on a scatter plot are

00:26

to the best fit line this thing running through them

00:30

A correlation coefficient is kind of like a ranch hand

00:32

who's in charge of herding data Okay so let's take

00:35

a closer look at the data points in our corral

00:38

taken from Wild Pete's Pizza restaurant Yeah they're a set of

00:41

bivariate or two variable data In this case

00:45

the data points on the x axis are the number

00:47

of minutes a table has to wait for their food

00:49

since ordering and the data points on the y axis

00:52

are the percentage of the total bill left as a

00:54

tip Interesting correlation here Pete the owner namesake of wild

00:57

pete's pizza believes there's a relationship between how long a

01:00

table waits for the food and how much they tip

01:02

generally the first step in finding a correlation coefficient is

01:05

to determine if the data points are in a roughly

01:07

linear pattern So we need to whip up a

01:09

quick scatter plot like this thing If the data points

01:12

don't have an obvious linear pattern we really shouldn't even bother

01:15

to calculate the correlation coefficient because it's not meaningful Once

01:18

there appears to be a linear or roughly linear pattern

01:21

to the data it's time to get calculating there partner

01:24

okay The formula for the correlation coefficient which is denoted

01:27

by the variable r here is a bit unwieldy and

01:30

typically the correlation coefficient is calculated using an actual calculator of

01:33

some kind But still it's nice to know where these

01:35

numbers come from so we'll do it by hand and

01:37

double check our work So the process goes like this

01:39

First we find the mean and standard deviation in the

01:42

x data and the y data treating each set

01:44

of data as its own list separate from each other

01:46

We'll use a calculator just to shortcut this part of

01:49

the process and now we need to take each data

01:51

point in the x list Subtract the mean from it

01:53

and divide that result by the standard deviation so twelve

01:57

minus fifteen point one six six seven which is negative

01:59

Three point one six seven divided by five point six

02:01

blah blah blah which is negative about a half then

02:04

twenty minus fifteen point one six seven which is four

02:07

point eight three three divided by five point yabba

02:09

blah blah blah which is point eight six and change

02:11

and so on But we need to lather rinse repeat

02:13

that same process of subtracting the mean of the y

02:16

data from each y value and then dividing by the standard

02:18

deviation in the y values Right Well that'll be sixteen

02:21

minus fourteen which in California is two divided by three

02:24

point two eight blah blah blah which is point six

02:26

and change So we have thirteen minus fourteen which is

02:28

negative one divided by three point two eight six which

02:31

is well negative point three ish So now we need

02:33

to multiply each matched x and y value from our

02:36

previous calculations That'll be negative Point five six and change

02:39

times a point six blah blah blah which is negative

02:42

Point three four four one Then we have point eight

02:44

six three times negative point three oh four which is

02:47

a negative point two six two Then negative point seven

02:50

four four times one point two one seven two which

02:53

is Well what is that Negative point nine and so

02:56

on Now we sum the values we just got which

02:58

is all this stuff We add 'em all up and it

03:00

comes out to negative Four point four five five four

03:04

Okay one last step here Cowpokes We just need to

03:06

divide by one less than the number of data points We

03:09

have six data points So we divide negative four

03:11

point four five five four yeah by five Divide that

03:14

And that means our correlation coefficient or r value is

03:18

negative Point eight nine one one Interesting Excellent Well now

03:22

we have a real correlation coefficient So what does it

03:25

mean Well for starters we can interpret what it actually

03:28

means here See what we did there The correlation coefficient or r

03:31

value is a measure of how strong the relationship is

03:34

between the two variables Assuming that linear ish pattern exists

03:37

It does not however mean that the one variable causes

03:40

the other It just means there's some kind of relationship

03:43

between them To actually put a value on how strong

03:45

the correlation is We need to examine the continuum of

03:48

correlation Positive correlations represent situations where the scatter plot appears

03:52

to climb from left to right Negative correlations represent situations

03:56

where the scatter plot appears to fall from left to

03:58

right like our tips versus time data Well strong correlations

04:02

are values between point seven and one for positive correlations

04:06

and between negative point seven and negative one for negative correlations

04:09

That's just rough Numbers They're about point 7 And if

04:11

it's a one to one relationship it means that if

04:14

you let go of the apple it will fall every

04:16

time we're assuming they're on earth Scatter plot points will

04:20

be pretty darn close to the best fit line through

04:22

the points there Medium correlations are in the point four

04:25

to point seven range and they got the negative ones

04:28

And so on Scatter plot points will be a wee

04:30

distance from the best fit line They're not quite

04:33

as tightly packed around that line and then weak correlations

04:36

and just looks like a cloud It's like values from

04:38

zero to point four and zero to negative point four and

04:41

they're just kind of like maybe there's a line through

04:43

there but maybe not well in our case it's our

04:45

r value is negative point eight nine one one Well

04:49

it's very very negatively correlated between the two time of

04:52

ordering the food and when it shows up and the

04:55

tip paid at least the tip percentage of the meal

04:58

Which means that as it takes longer and longer for

05:01

food to arrive after ordering in general the tip percentage

05:04

goes down Also because this pattern is a strong correlation

05:08

this pattern is likely to be predictable in terms of

05:10

a certain wait time leading to a certain percentage A

05:12

while back we mentioned that r values aren't often whipped

05:15

up by hand Instead we use graphing calculators spreadsheets websites

05:18

any of them you know to whip up a mess

05:20

of r values in no time Pop the data into

05:22

list one and two in a TI graphing

05:25

calculator Go to the calc menu in the stat function

05:27

and run a LinReg Linear regression You know we

05:30

see an r value of r is negative point eight

05:32

nine one which is very close to our by the

05:35

hand value of point eight nine one one yeah negative

05:38

and is only different due to rounding So yeah

05:40

when you need to rustle up an r value y'all

05:42

should probably grab something tech unless you want to go

05:44

through the headache of finding that r value by hand

05:47

remember that the r value just suggests a relationship between

05:49

the variables rather than saying one causes the other correlation does

05:53

not equal causation Remember that tattoo that somewhere but not

05:57

on your own body Also remember that the stronger correlations

06:00

are closer to negative one and one and farther from

06:02

zero in the middle And finally when y'all go

06:05

to a restaurant and it takes a spell to get your order

06:07

Don't take it out on the server by stiffing them

06:09

on the tip There's a strong positive correlation between stiffing

06:13

servers on tips and you know getting your food spat

06:16

in next time And while just being a massive
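The last step of the by-hand calculation in the video, and the rough strength bands it describes (about 0.7 and 0.4), can be sketched in Python as follows. The function name `correlation_strength` is hypothetical, and the cutoffs are the video's rough numbers rather than a formal standard. The sum of products (-4.4554) and the count of six data points are the values quoted in the transcript.

```python
def correlation_strength(r):
    """Label an r value using the rough 0.7 / 0.4 bands from the video."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")
    size = abs(r)
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.4:
        strength = "medium"
    else:
        strength = "weak"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction


# Values quoted in the video's worked example.
sum_of_products = -4.4554  # sum of the matched z-score products
n_points = 6               # number of data points

# Divide by one less than the number of data points.
r = sum_of_products / (n_points - 1)
print(round(r, 4), correlation_strength(r))
```

The division reproduces the video's r value of about -0.8911, which lands in the strong negative band: longer waits go with lower tip percentages.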
