Regression Analysis

Categories: Regulations, Trading

Regression analysis. No. It’s not a therapy session in which your psychiatrist tries to figure out why you’ve gone back to using pacifiers. It’s simply this: the process by which a series of different independent variables are compared to a dependent variable to see which might have the greatest effect on the value of the dependent variable.

Well okay. That’s the theory of it. But...what about a practical example?

All right…let’s take Pete, the pizza joint guy. How does he know what’s bringing in customers? Is it his new “burrito pizza”? Or the virtual skee-ball machines?

Well, we can use some math to find an equation (usually a linear one) that best matches the pattern in the data. Then we can see how close the points are to that line. And that will solve our burrito pizza/skee-ball conundrum.

The closer the data points are to the line, the more likely there's some kind of link between the independent and dependent variables. It doesn’t mean one variable causes another. It just means that they're linked somehow.
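Here's a rough sketch of that idea in Python. It fits the least-squares line through a handful of points and computes the correlation coefficient r, which measures how tightly the points hug the line. The `orders` and `revenue` numbers are made up for illustration (they are not Pete's actual data), and `fit_line` is a hypothetical helper name.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-deviations from the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r

# Hypothetical data: orders per day and revenue (in dollars) on the same day
orders = [10, 20, 30, 40, 50]
revenue = [520, 710, 880, 1120, 1290]

slope, intercept, r = fit_line(orders, revenue)
print(f"revenue = {slope:.2f} * orders + {intercept:.2f}, r = {r:.4f}")
```

An r close to 1 or -1 means the points sit close to the line. An r near 0 means they don't. Either way, r says nothing about which variable, if any, is doing the causing.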

Like, what about the link between ice cream sales and drowning deaths? A morbid connection, but see how close the data points are to that special line? So yeah. There's absolutely a link between ice cream sales and drowning deaths. Greater ice cream sales on a given day tend to go along with more drowning deaths on that day.

Why? What’s the linking factor? Flavor of ice cream? Accessibility of public swimming pools?

Clearly ice cream isn’t some insidious killer drowning people who get in the water without waiting the requisite hour. But there is a link between these two variables. As it turns out, higher ice cream sales happen on hotter days. So heat is the linking factor.

More people go swimming on hotter days. When more people swim, there are going to be more drowning possibilities. So ice cream sales and drowning deaths are linked, but ice cream sales don’t cause drowning deaths. By contrast, there's no link between your shoe size and your GPA. Unless you buy huge shoes, build a mini-computer that fits in the extra space in your shoes, and use that to help you, um...cheat. Don’t do that, by the way. Always cite Shmoop.

Anyway, back to Pete, the owner of Zah's Pizza. Pete almost has more customers lately than he can handle. While the lightning is striking, Pete wants to find a way to...bottle it.

The thing is, he's made two significant changes to his restaurant, and he's not sure which one is most responsible for the influx of people tossing money at him. Is it the Virtual Skee-Ball machines...or his new Burrito Pizza? Is there, in fact, a link at all?

It could be both that are responsible, but that's beyond Pete's skill to determine. He can only compare one at a time to the increased revenue. Pete picks different days and plots the number of Burrito Pizza orders against the total money made that same day.

He can see that low Burrito Pizza order numbers are paired with lower daily revenues. Also, high Burrito Pizza orders are paired with higher daily revenues. The closer the points on Pete's graph are to an imaginary line running from the lower left to the upper right, the more likely it is that the independent variable (Burrito Pizza sales) is at least related in some meaningful way to the dependent variable (total daily revenue).

From here...Pete starts getting into some pretty complicated math. Hope you've got your TI-84 handy.

All right, all right. We won't do it to you. We'll skip the numbers and formulas. For now.

Long story short, Pete used a regression analysis on the two different variables he thought might influence his bank account the most. The Burrito Pizza points hugged the line closely, while the Skee-Ball points looked more like a cloud. His conclusion: any decisions he makes going forward should probably be menu-focused, as opposed to attraction-focused.
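To make Pete's one-at-a-time comparison concrete, here's a minimal sketch in Python. Every number below is invented for illustration, and `pearson_r` is a hypothetical helper. The idea is simply to compute r for each candidate variable against the same daily revenue figures and see which one has the larger magnitude.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Hypothetical figures for six different days (not Pete's real numbers)
revenue = [900, 1400, 1100, 1800, 1250, 1600]   # dependent variable
candidates = {
    "Burrito Pizza orders": [12, 30, 18, 44, 25, 37],
    "Skee-Ball plays": [40, 35, 52, 41, 30, 50],
}

# Compare each independent variable to revenue, one at a time
r_values = {name: pearson_r(xs, revenue) for name, xs in candidates.items()}
for name, r in r_values.items():
    print(f"{name}: r = {r:.3f}")

# The variable whose r is farthest from zero has the strongest linear link
strongest = max(r_values, key=lambda name: abs(r_values[name]))
print("Strongest link to revenue:", strongest)
```

With these made-up numbers, the pizza orders track revenue closely and the Skee-Ball plays barely track it at all. That's evidence of a stronger link, not proof of a cause.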

His secondary conclusion: people get mad when you roll up a pizza and call it a burrito.

Related or Semi-related Video

Finance: What is Regression Analysis?

00:00

Finance a la shmoop what is regression analysis Regression analysis

00:08

no it's not a therapy session in which your psychiatrist

00:12

tries to figure out why you've gone back to using

00:14

pacifiers It's simply this the process by which a

00:17

series of different independent variables are compared to a

00:21

dependent variable to see which might have the greatest effect

00:25

on the value of the dependent variable All right Well

00:28

okay That's The theory of it anyway But what about

00:31

some practical examples Well what are these graphs And what

00:34

do they tell us Well let's take pete the pizza

00:36

joint guy How does he know what's bringing in customers

00:40

Is it his new burrito pizza or the virtual skee

00:44

ball machines he put in the back Well we can

00:46

use some math here to find an equation Usually a

00:49

linear one Linear regression Very fine Mathematic sport That best

00:53

matches the pattern in the data Then we can see

00:56

how close the points are to that line and that

00:59

you know will solve our burrito pizza skee ball conundrum

01:02

and help pete manage his business better Well the closer

01:06

the data points are to the line the more likely

01:08

there's some kind of link between the independent and dependent

01:12

variables well it doesn't mean one variable causes another It

01:16

just means they're linked somehow Like what about the link

01:20

between ice cream sales and drowning deaths that's a morbid

01:23

connection but see how close the data points are to

01:26

that special line So yeah there's absolutely some meaningful link

01:30

between ice cream sales and drowning deaths greater ice cream

01:33

sales on a given day is always linked to more

01:36

drowning deaths on that day Why what's the linking factor

01:40

Flavor of ice cream or the amount of sugar in

01:43

the ice cream Too much in ice cream fat and

01:46

crap and stuff accessibility to public swimming pools Well clearly

01:50

ice cream isn't some insidious killer drowning people who get

01:53

in the water without waiting the requisite you know

01:56

one hour But there is a link between those two

01:58

variables Think about it As it turns out higher ice

02:01

cream sales happen on hotter days so heat or sunshine

02:05

is the linking factor More people go swimming on hotter

02:10

days when more people swim well there are going to be

02:12

more drowning possibilities anyway so ice cream sales and drowning

02:16

Deaths are linked but ice cream sales don't cause drowning

02:20

deaths Got it No causal link there Similarly check out

02:24

how the points in this graph are not really close

02:26

to the line at all There's no link between your

02:29

shoe size and your GPA you know unless

02:32

you buy huge shoes build a mini computer that fits

02:34

in the extra space in your shoes and use that

02:36

to help you you know cheat Don't do that by

02:39

the way Always cite shmoop anyway back to pete the

02:42

owner of Zah's Pizza Pete almost has more customers lately

02:45

than he can handle while the lightning is striking Pete

02:48

wants to find a way to you know bottle it

02:50

The thing is he's made two significant changes to his

02:53

restaurant and he's not sure which one is more responsible

02:57

for the influx of people tossing money at him Is

03:00

it the virtual skee ball machines Or is it his

03:03

new burrito pizza Is there in fact any link at

03:06

all Well it could be both that are responsible but

03:09

that's beyond pete's skill and this course to determine he

03:12

can only compare one at a time to the increased

03:14

Revenue so pete picks different days and plots the number

03:17

of burrito pizza orders against the total money made that

03:20

day Notice how the data points seem closely to follow

03:23

an imaginary line there from the lower left to the upper

03:27

right In general we can see that low burrito pizza

03:30

order numbers are paired with lower daily revenues Also high

03:34

burrito pizza orders are paired with higher daily revenues high

03:39

against high low against low well the closer the points

03:42

are to that imaginary line the more likely it is

03:45

that the independent variable in this case burrito pizza sales

03:49

is at least related in some meaningful way to the

03:52

dependent variable like it's dependent on sales of total

03:56

daily revenue on our TI-84 there or

03:59

phone or computer or whatever you're using first we pop

04:02

up our data into the list by pressing the stat

04:04

button Then enter we put in the x data in

04:07

list one there L1 and the y data in list

04:10

two L2 Now we press the second key and

04:13

the mode key to get out of that menu If

04:15

we don't get out of that menu well we're just

04:18

begging to screw the pooch here so get out Get

04:19

out now we bash stat move over to the calc

04:22

menu and choose option four which is LinReg

04:25

ax plus b all right That's in texas shorthand

04:28

for linear regression Yeah on the menu it brings up

04:32

move down to calculate and then press enter if your

04:36

calc doesn't show the r squared and r values Well

04:39

you need to hit youtube and search for how to

04:41

turn on stat diagnostics TI-84 there's a

04:45

bunch of important info in the results that we need

04:47

to check out most importantly for pete's sake is the

04:50

value of r the closer that r value is to

04:53

one or negative one The closer the points are to

04:57

that best possible line Well the closer the r

05:00

value is to one for graphs with positive slopes or

05:03

negative one for graphs with negative slope the stronger the

05:06

link between the independent and dependent variables there right That link

05:10

is called a correlation right They correlate it doesn't mean

05:13

higher daily revenues are absolutely caused by burrito pizza lovers

05:17

but it does suggest they're somehow correlated and that correlation

05:21

is strong anyway The a and b values that you

05:23

see on the display happen to be the slope And

05:25

y intercept of the equation of the best possible line

05:28

pete can use these to predict daily revenues if he

05:30

knows the number of burrito pizza sales in a day

05:33

But that's a different video Pete still needs to know

05:36

if virtual skee ball is so exciting that it might

05:39

be more responsible for daily revenue jumps He also plotted

05:42

the number of times virtual skee ball was played in

05:45

a day versus those same daily revenue figures Well guess

05:48

what The points look like a cloud instead of having

05:51

any obvious linear pattern Well if we pop that data

05:54

into the calc and run the same linear regression process

05:57

again we get a very different r value We can

06:00

also just see that the points aren't that close to

06:02

the line that r value is not close to one

06:05

at all In fact it's cozying up to zero like

06:08

it's Ah you know frat boy and zero is well

06:11

every girl within a forty meter radius when the r

06:14

value is sniffing around zero like that Well it means

06:17

there's some kind of very weak correlation between the independent

06:20

and dependent variables We can't stress enough that

06:23

this isn't proof of any kind of cause no

06:25

matter how weak between the two variables just that some

06:28

kind of correlation exists and that it's weak pete has

06:32

some evidence that the increased daily revenue is almost all

06:35

about the burrito pizza and only a tiny bit due

06:37

to the virtual skee ball crowd But this is a

06:40

big but pete does not have proof the r value

06:43

just suggests that there's some kind of link between the

06:46

two variables Not that a change in one variable causes

06:49

a change in the other Still with that significant of

06:52

a difference in r values pete is pretty safe in

06:55

thinking burrito pizza is probably more important in driving higher

06:58

revenues than virtual skee ball Pete used a regression analysis

07:02

on the two different variables he thought might influence his

07:05

bank account the most any decisions he makes going forward

07:08

should probably be menu focused as opposed to you know

07:11

attraction focused and still he can't forget the virtual skee

07:14

ball entirely It is probably a teeny bit responsible for

07:17

the increased moolah in pete's case the correlation between the

07:20

variables was positive which means that as burrito pizza sales

07:24

or virtual skee ball plays increase well so does daily

07:28

revenue there are also negative correlations here as well where as

07:32

one variable increases the other variable decreases Case in point

07:36

carla's customs right next to pete's place carla's customs

07:40

takes broken down golf carts and soups them up

07:43

They recently made three distinct changes to their builds and

07:46

have noticed a huge decrease in the time it takes

07:48

one of their carts to complete the forty yard dash

07:51

well carlota wanted to figure out which change

07:54

might have been the most responsible for the decreased times

07:57

Carlota plotted forty yard dash times versus the size of

08:01

the rims that these things right here their diameter and

08:04

got an r value of negative point one seven nine

08:08

when she ran a linear regression of the data then

08:11

forty yard dash times versus the cylinder diameter there and

08:14

got an r value of negative point six two eight

08:18

when she ran a linear regression of that data then

08:21

the forty yard dash times versus the nitrous oxide concentration

08:24

Is what she ran and she got an r value

08:27

of negative point nine four eight when she ran a

08:29

linear regression of the data Well guess what The simple

08:32

fact here all three plots have some kind of linear

08:35

relationship It does mean that there's some kind of correlation

08:38

between each of these three variables rim size cylinder diameter

08:43

and nitrous oxide concentration you know in the forty yard

08:46

dash time of the golf carts which are mostly electric

08:49

But we won't get technical here since all the graphs

08:52

have negative slopes and the correlation with nitrous oxide is

08:55

the closest of the values to negative one The nitrous

08:57

oxide concentration has the strongest correlation to decreased forty yard

09:02

dash times like it's bad for speed reduced nitrous oxide

09:05

in your golf cart it's important to remember that carlotta

09:08

can't say that the nitrous oxide concentration is the direct

09:11

cause of the faster times All she knows is that

09:14

there's a link or a correlation between them Still with

09:17

further experimentation carlota could establish a causal relationship Carlotta explored

09:22

the relationship between three different variables and their possible effect

09:25

on the time to run the forty yard dash using

09:27

regression analysis She determined all three variables had some kind

09:30

of negative correlation of the times To run the course

09:32

as the nitrous concentration or the rim size or the

09:35

cylinder diameter increased well the forty yard dash times decreased

09:39

Clearly the nitrous concentration had the strongest correlation Carla should

09:44

probably focus on that concentration for the greatest decrease in

09:47

times She knows she can't ignore the rim size nor

09:50

can she ignore the cylinder diameter as they all contribute

09:53

to overall golf cart forty yard dash speed times Right

09:57

regression analysis will never tell us which variable is the

10:00

actual cause It just kind of gives us hints along

10:03

the way it's best to make decisions informed by all

10:06

the variables that are correlated to the dependent variable And

10:09

as kelly clarkson famously sang you know miss independent variable

10:13

something like that miss independent variable
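For the golf cart example in the video, the comparison boils down to ranking the three reported r values by how far each one sits from zero. Here's a small Python sketch of that step. The r values are the ones quoted in the transcript. The variable labels and the ranking code are our own illustration, not something shown in the video.

```python
# r values quoted in the video, each versus forty yard dash time
r_values = {
    "rim diameter": -0.179,
    "cylinder diameter": -0.628,
    "nitrous oxide concentration": -0.948,
}

# The sign of r gives the direction of the link; its magnitude gives the strength
ranked = sorted(r_values.items(), key=lambda item: abs(item[1]), reverse=True)

for name, r in ranked:
    direction = "negative" if r < 0 else "positive"
    print(f"{name}: r = {r:+.3f} ({direction} correlation, |r| = {abs(r):.3f})")

strongest = ranked[0][0]
print("Strongest correlation with dash time:", strongest)
```

All three correlations are negative, so larger values of each variable go with shorter dash times. The ranking only says which link is strongest, not which change caused the faster times.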
