See: Probability Distribution.
A probability density function, a.k.a. PDF (and no, that’s not the doc format), is a formula or expression that describes how likely each possible outcome is for a continuous random variable. It's the Dr. Strange of functions.
When we plot the function, the resulting graph pairs the outcomes, on the x-axis, with their probability densities, on the y-axis. The height of the curve at a single point isn't a probability by itself; the area under the curve is. The area between any two outcomes represents the likelihood of getting an outcome between those beginning and ending values, and the area under the entire curve always adds up to 1, or 100%.
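To see the "area equals probability" idea with real numbers, here is a minimal Python sketch. It assumes a normal distribution and borrows the drink-cup lid figures from the video transcript below (mean 3.8125 inches, standard deviation 0.051 inches); the helper names `area_between` and `riemann_area` are made up for illustration.

```python
from statistics import NormalDist

# Assumption: lid diameters follow a normal distribution with the
# mean and standard deviation quoted in the transcript.
MEAN, SD = 3.8125, 0.051
lids = NormalDist(mu=MEAN, sigma=SD)

def area_between(dist, low, high):
    """Probability of landing between low and high (area under the PDF)."""
    return dist.cdf(high) - dist.cdf(low)

def riemann_area(dist, low, high, steps=10_000):
    """Approximate the same area by adding up thin rectangles (midpoint rule)."""
    width = (high - low) / steps
    return sum(dist.pdf(low + (i + 0.5) * width) for i in range(steps)) * width

# The curve's height is a density, not a probability, so it can exceed 1.
print(f"density at the mean: {lids.pdf(MEAN):.2f}")

# Areas within one, two, and three standard deviations of the mean.
for k in (1, 2, 3):
    low, high = MEAN - k * SD, MEAN + k * SD
    print(f"within {k} sd: cdf {area_between(lids, low, high):.4f}, "
          f"rectangles {riemann_area(lids, low, high):.4f}")
```

The three areas come out at roughly 68%, 95%, and 99.7%, which lines up with the percentages quoted in the transcript. Other continuous distributions would give different areas for the same cutoffs.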
Related or Semi-related Video
Finance: What is probability distributio...
finance a la shmoop what is probability distribution this graph with the
curvy line down the middle sitting nicely on an x and y axis it represents [standard distribution bell curve]
the total sum of probabilities of all these outcomes out here on the far right
is the probability that in three years you sell your screenplay for five million [check changes hands]
dollars and over here still far to the right of the middle is the probability you
sell your screenplay in the next five years for a hundred grand over here just
right of the middle is the probability you sell your screenplay for a dollar
but to the shady guy at coffee bean who's promising you a picture deal
at Paramount and over here just left of the middle is the probability that
you're still a barista forever the most likely stuff lives in the middle as we
slide toward either end things get less and less likely so why is it called a
distribution well because the potential outcomes ie things like winning a
lottery selling a screenplay yeah they're kind of the same thing or [happy people with money]
meeting someone on tinder whose picture was taken less than ten years and twenty [old man walks into park]
pounds ago carry a range that is the potential outcomes are distributed on a
long line that then gets visually mapped to explain the character or feelings
that describes this set of potentialities well the most common
continuous probability distribution is the normal curve or normal distribution
you may know it better by its somewhat common nickname this hilly looking thing
called the bell curve well the mean located in the middle where the peak is [bell curve analysis]
right there is usually labeled mu which represents a population mean the units
on each side are plus and minus one two and three standard deviations Sigma is
the symbol for a population standard deviation well the normal curve was
developed when researchers started comparing tons of measurements of things
like heights of giraffes or diameters of plastic lids for drink cups or lengths
of well just say it got a little competitive there in the lab turns out
that tons of things both man-made and nature made tend to end up having a normal
curve shape to their measurements
with heights of women they found that a certain height 5 foot 4 inches showed up [woman on a graph]
more than any other that one height showed up with the greatest probability
Heights taller than 5'4 and heights shorter than 5'4 showed up less often
well the farther the height was from 5'4 the less likely it was to occur because
well really tall women and really short women aren't that common and average [short, medium and tall women in a row]
height women are very common when they plotted the heights and their associated
probabilities with thousands of results they got a shape that became known as the
normal curve because of the shape of the normal curve 68% of all the possible
data lands between the first tick marks on each side of the mean plus and minus
1 Sigma 95 percent of all possible data lands between the second tick marks on
each side of the mean plus and minus 2 Sigma and ninety-nine point seven
percent of all the possible data lands between the third tick marks on each
side of the mean plus and minus the three Sigma there so think about the
height of women where they would map here and we're going to show you for no
extra charge where they go on one two and three Sigma there yeah those are the [sleeping man falls out of chair]
heights well graphically the empirical rule
shakes out like this in the words of Master Yoda worth memorizing this curve
is well we can use these percentages to determine how much of the possible data
lands between different values on the normal curve so let's say we get curious
and decide to measure the length of every tail of every ring-tailed lemur we [lemurs playing in grass]
come across which on the streets around here in Silicon Valley is actually more [lab technician measuring tail]
than you would think all right well when we plot those tail lengths along with
how often they showed up we'd get a normal curve of tail lengths the mean or
average tail length would be at the peak in the middle meaning that it was the
measurement we got most often well the tick marks on the x-axis would be found
by adding the standard deviation of the tail lengths to the mean once twice and
thrice and then subtracting the standard deviation from the mean once twice and
thrice about 68% of the lemurs we measured would have tail lengths between
one sigma and negative one sigma there you go
95% of the lemurs we measured would have tail lengths between two sigma and
negative two Sigma there we go ninety-nine point seven percent of the
lemurs we measured would have tail lengths between three sigma there and
negative three sigma right all in that area as another example the machine
that makes the lids for drink cups doesn't make them the same size every [drinking lid production line]
time because of variations in the temperature of the plastic and of the
mold and of the quality of the plastic and because a butterfly flapped its [butterfly on flower]
wings in Jamaica the machine will produce lids that are usually around a
targeted diameter but also slightly larger or slightly smaller in fact the
diameters of plastic lids for a certain size of drink cups are known to be
normally distributed those lids have a mean diameter of mu equals 3.8125
inches and a standard deviation of sigma equals 0.051 inches
well this means that we can create a normal curve with actual numbers on the
x axis the mean value in the middle will be the 3.8125
inches we'll then add 0.051 inches once twice and three times a lady
to the mean to get the values on the right and subtract 0.051 from
3.8125 three times to get the values on the left only
lids in a range of the sweetspot diameters will fit tightly on the cup
this sweet spot ranges between 3.7105 inches and 3.9145
inches well what percentage of lids will be between
3.7105 inches and 3.9145 inches in
diameter and therefore usable well we're trying to find the percentage of
lids that will be produced that are between those values at negative two
sigma 3.7105 and two sigma 3.9145 yeah well
according to the empirical rule ninety-five percent of the data lies
between these two values an empirical rule that's the Empire rule [hand places lid on cup successfully]
the rule of the trying things out and seeing what happens so that's what the
data is telling us well 95 percent of the lids produced on
this machine will be in the sweetspot range and fit tightly on the cups the [surfer holding up]
empirical rule isn't the only game in town when it comes to normal curving but
we'll save the other ways to play on the normal curve for a separate video where [monopoly game]
the normal curve gets the spotlight all to itself well there are other kinds of
probability distributions that don't cover every possible number decimal
and fraction they're called discrete probability distributions they usually
hang out in tables and sometimes in formulas turning 18 is great you can
vote you can be drafted you can buy lottery tickets one quick scratch off
and you could be on easy street right and maybe not so easy grab a magnifying
glass peep at the backside of the lottery ticket yep there's a probability
distribution on the back it shows all the prizes you could win it also shows the
probabilities or likelihood that you win those prizes and it's a total downer so [woman disappointed with numbers]
maybe you should ignore it and just scratch and pray you want that new
jacuzzi with shiatsu massaging jets and it ain't cheap right well specifically [fancy jacuzzi]
this is a discrete probability distribution which just means we have a
fixed number of outcomes in this case there are six possible outcomes we can
win five different dollar amounts and we can also win zilch well check out the
probability of winning $0 happens 78% of the time you get nothing and then
there's a one in two thousand or 0.05 percent chance of winning $100 you know
it would have been better if grandma had just given us the money she used to buy
the ticket instead of the tickets themselves yeah there are a few other
kinds of discrete probability distributions here all of them can be
placed in tables if we want to one specifically has a swaggy formula that
helps us generate the probabilities for each possible outcome and it's known as
the binomial probability distribution or BPD for short well to see this thing in
action we need to have a situation where there are only two things that can
happen we'll call winning any kind of moolah on [woman happy with being given money]
that lottery ticket a success we'll call ending up with squat failure well the
BPD requires exactly two possible outcomes if there are more than two we
can't use the BPD the BPD also requires that the chance of a success always
stays the same if there's a twenty two percent chance of winning on the first
ticket of that kind well there needs to be a twenty two percent chance of [stack of scratch and win tickets]
winning for all the same kinds of tickets right so it can't be like you're
picking cards off a deck and all of a sudden there's one less jack so the odds [card dealer fanning cards]
change every card well the BPD also requires that we don't just spend the
rest of our lives scratching off those tickets we have to pick a set number of
tickets we're gonna scratch and you know stick to one meeting those conditions is
vital if we meet them we can answer questions like if you splurge on ten
tickets how likely is it that you win on five of them if there's a 22 percent
chance of winning on each ticket all right well we'd pop in 10 for n 5 for k
and 0.22 for p there all right with all the numbers plugged in we have
this ugly looking equation yeah grab a calculator there but we need to
deal with that combinations thing you know the 10C5 thing yeah it also has [formulas on screen]
its own formula involving factorials which actually finds all the different
orders of what could happen on those 10 cards like we could go win win loss loss
loss win loss win loss win or maybe we get loss loss loss loss win win win win
loss win yeah well the 10C5 finds the total
number of possible ways the card combos could shake out for those of you with a
TI graphing calculator like a TI-84 or similar we've got you type in the n 10 [hand using graphing calculator]
in this case press math go over to the PRB menu choose nCr then type in the k it's
5 in this instance and it should look like 10 nCr 5 on your screen and then
hit enter well with our combinations number 252 there safely in hand we can
knock out the rest our answer will tell us how likely it is to win on half of [formulas on screen]
the 10 tickets you buy when there's a 22 percent chance of winning on each one
yikes only a 3.75 percent chance of winning on half those tickets you'd have
been better off spending the money on gas station nachos at least then you'd [woman with nachos]
have a stomachache to remember your money by maybe just toss your money in
the trash and then skip the middleman [woman throws nachos away]
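
The ten-ticket calculation from the transcript can be sketched in Python for anyone without a graphing calculator handy. This assumes the standard binomial formula, P(X = k) = C(n, k) × p^k × (1 − p)^(n − k), which the video shows on screen but the transcript never spells out; `binomial_pmf` is a hypothetical helper name, not something from the video.

```python
from math import comb

def binomial_pmf(n, k, p):
    """Chance of exactly k successes in n independent tries,
    each with the same success probability p (assumed binomial setup)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Numbers from the transcript: 10 tickets, 5 wins, 22% chance per ticket.
n, k, p = 10, 5, 0.22

ways = comb(n, k)                 # the "10 nCr 5" step
chance = binomial_pmf(n, k, p)

print(f"ways to order 5 wins among 10 tickets: {ways}")
print(f"chance of exactly 5 wins: {chance:.4f}")
```

The combinations step gives 252 and the final probability rounds to 0.0375, matching the 3.75 percent in the transcript. The sketch only holds under the conditions the video lists: two outcomes per ticket, the same chance on every ticket, and a fixed number of tickets.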