Probability in statistics

Imed Krisna Gupta

January 14, 2024

Profile

  • I Made Krisna Gupta (Imed)

  • Politeknik APP Jakarta, Universitas Indonesia, Center for Indonesian Policy Studies

  • PhD at the Australian National University, Master's at UI/VU Amsterdam

  • Research focus on international trade and public policy (particularly industrial policy)

  • more at krisna.or.id or @imedkrisna

About the course

  • Built on the “Introduction to Probability and Statistics” materials from ocw.mit.edu by Jeremy Orloff and Jonathan Bloom.

  • The content is the same as in various standard intros to probability for statistics & econometrics.

slides here

On today's session

  • Left out counting and motivation because these are trivial.

  • Didn’t have time to go through central tendency and variance properly.

  • Borrowed materials from Dr. Uka Wikarya.

  • Use English and proper notation so you can have a smooth transition to the VU program.

Frequentist vs Bayesian

  • Frequentists: probability measures the frequency of various outcomes of an experiment.

    • e.g., a 50% probability of heads means that if we toss the coin \(N\) times, roughly \(0.5N\) of the tosses are heads.
  • Bayesians: probability is an abstract concept that measures a state of knowledge or a degree of belief in a given proposition.

    • there is no single value of \(P(\text{Heads})\). Instead we ask: what is the probability that \(P(\text{Heads})=0.5\)? 0.4? 0.1? etc.
  • Most of our tools were developed by frequentists, but increasingly powerful computers have led to a resurgence of Bayesian methods.

Terminology

  • Experiment: a repeatable procedure with well-defined possible outcomes.

  • Sample space: the set of all possible outcomes, denoted sometimes by \(\Omega\), sometimes by \(S\).

  • Fair: all \(\omega \in \Omega\) have the same probability.

  • Event: a subset of the sample space.

  • Probability function: a function giving the probability of each outcome.

Examples

Tossing a fair coin.

Experiment: toss the coin, report if it lands heads or tails. Sample space: \(\Omega=\{H,T\}\). Probability function: \(P(H)=0.5, P(T)=0.5\).

Toss a fair coin 3 times.

Experiment: toss the coin 3 times, report outcomes. Sample space: \(\Omega=\{HHH,HHT,HTH,HTT,THH,THT,TTH,TTT\}\). Probability function: \(P(\omega)=\frac{1}{8} \ \forall \ \omega \in \Omega\).

Examples

Taxis (An infinite discrete sample space)

Experiment: count the number of taxis that pass UI Salemba during class. Sample space: \(\Omega=\{0,1,2,3,4, \dots \}\). Probability function: the Poisson distribution \(P(k)=e^{-\lambda} \frac{\lambda^k}{k!}\), where \(\lambda\) is the average number of taxis.

| Outcome     | 0                | 1                        | 2                                     | 3                                     | \(\dots\) | \(k\)                                 |
|-------------|------------------|--------------------------|---------------------------------------|---------------------------------------|-----------|---------------------------------------|
| Probability | \(e^{-\lambda}\) | \(e^{-\lambda} \lambda\) | \(e^{-\lambda} \frac{\lambda^2}{2!}\) | \(e^{-\lambda} \frac{\lambda^3}{3!}\) | \(\dots\) | \(e^{-\lambda} \frac{\lambda^k}{k!}\) |
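
As a quick numerical illustration, here is a minimal Python sketch of the Poisson pmf for the taxi example; the value \(\lambda = 3\) is an arbitrary assumption, not part of the original example.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(k) = e^(-lambda) * lambda^k / k!"""
    return exp(-lam) * lam**k / factorial(k)

lam = 3  # assumed average number of taxis per class; any positive value works
for k in range(5):
    print(k, round(poisson_pmf(k, lam), 4))

# Rule 2 check: the probabilities over all k sum to 1
print(sum(poisson_pmf(k, lam) for k in range(100)))  # ~1.0
```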

Events

  • An event \(E\) is a collection of outcomes. i.e., a subset of the sample space \(\Omega\).

  • For example, using the 3 coin experiment, what is the probability that exactly two heads show up?

  • We can write it as \(E=\)’exactly 2 heads’, or \(E=\{HHT,HTH,THH\}\). Note that \(E \subset \Omega\).

  • Since we know that \(P(\omega)=\frac{1}{8} \ \forall \ \omega \in \Omega\), we can compute \(P(E)=3/8\).
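
To see where 3/8 comes from, a minimal Python sketch (purely illustrative) that enumerates the 8 equally likely outcomes and counts those in \(E\):

```python
from itertools import product

# All 8 equally likely outcomes of tossing a fair coin 3 times
omega = list(product("HT", repeat=3))

# Event E: exactly two heads
E = [w for w in omega if w.count("H") == 2]

print(E)                    # [('H','H','T'), ('H','T','H'), ('T','H','H')]
print(len(E) / len(omega))  # 0.375 = 3/8
```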

Discrete sample space

  • A discrete sample space is one that is listable; it can be either finite or infinite.

  • \(\{H,T\}, \{1,2,3,4,5,6\}, \{1,2,3,\dots \}\) are all discrete sets. The first two are finite, the last one is infinite.

  • The interval \(0\leq x \leq 1\) is not discrete. It is continuous.

Probability function

  • For a discrete sample space \(S\), a probability function \(P\) assigns to each outcome \(\omega\) a number \(P(\omega)\) called the probability of \(\omega\).

  • \(P\) must satisfy two rules:

    • Rule 1: \(0 \leq P(\omega) \leq 1\)

    • Rule 2: the sum of the probabilities of all possible outcomes is 1.

  • Rule 2 in symbols: if \(S=\{\omega_1, \omega_2,...,\omega_n\}\), then \(\sum_{j=1}^n P(\omega_j)=1\)

  • The probability of an event is the sum of the probabilities of its outcomes: \(P(E)=\sum_{\omega \in E} P(\omega) \leq 1\)

Probability rules

For events \(A\), \(L\), and \(R\) contained in a sample space \(\Omega\):

Rule 1. \(P(A^c)=1-P(A)\)

Rule 2. If \(L\) and \(R\) are disjoint then \(P(L \cup R)=P(L)+P(R)\)

Rule 3. Inclusion-exclusion principle: if \(L\) and \(R\) are not disjoint (i.e., they overlap), then \(P(L \cup R)=P(L)+P(R)-P(L \cap R)\)
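
A quick worked example of Rule 3 (my own illustration, not from the original slides): roll one fair die, and let \(L\) = ‘the roll is even’ \(=\{2,4,6\}\) and \(R\) = ‘the roll is at least 4’ \(=\{4,5,6\}\). Then \(L \cap R=\{4,6\}\) and

\[ P(L \cup R)=P(L)+P(R)-P(L \cap R)=\frac{3}{6}+\frac{3}{6}-\frac{2}{6}=\frac{4}{6} \]

which matches counting \(L \cup R=\{2,4,5,6\}\) directly.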

Conditional probability

Conditional probability answers the question: ‘how does the probability of an event change if we have extra information?’

Example 1. Toss a fair coin 3 times.

  1. What is the probability of 3 heads? \(\Omega=\{HHH,HHT,HTH,HTT,THH,THT,TTH,TTT\}\). Since only one outcome, \(HHH\), gives 3 heads, \(P(HHH)=1/8\).

Conditional probability

  2. What is \(P(HHH)\) if we know the first toss is H? We have a new, reduced sample space \(\Omega'=\{HHH,HHT,HTH,HTT\}\). Can you answer \(P(HHH | \text{first toss is H})\)?

This is called conditional probability, since it takes into account additional conditions.

Conditional probability

Rephrase question 2 as events: Let \(A\) be the event ‘all three tosses are heads’ = \(\{HHH\}\). Let \(B\) be the event ‘the first toss is heads’ = \(\{HHH,HHT,HTH,HTT\}\).

The conditional probability of A knowing that B has happened is written \(P(A|B)\).

This is read as ‘the conditional probability of A given B’, or ‘the probability of A conditioned on B’, or simply ‘the probability of A given B’.

Conditional probability

We can give a formal definition of conditional probability:

Let \(A\) and \(B\) be events. The conditional probability of \(A\) given \(B\) is defined as

\[ P(A|B)=\frac{P(A \cap B)}{P(B)},\text{ provided } P(B) \neq 0 \]

Conditional probability

Let’s redo our previous calculation of 3 heads using this definition. Recall that \(A=\{HHH\}\) and \(B=\)‘the first toss is \(H\)’.

\[ P(A|B)=\frac{P(A \cap B)}{P(B)}=\frac{1/8}{1/2}=\frac{1}{4} \]

For more complicated events, using this formula is often preferred to counting.

Multiplication rule

Rearranging the definition of conditional probability gives the multiplication rule:

\[ P(A \cap B)=P(A|B) \cdot P(B) \]

Example: draw two cards from a deck. Using the multiplication rule, show that the chance of drawing two spades is \(3/51\).
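
A worked sketch of this exercise: let \(S_1\) = ‘first card is a spade’ and \(S_2\) = ‘second card is a spade’. Then

\[ P(S_1 \cap S_2)=P(S_2 | S_1) \cdot P(S_1)=\frac{12}{51} \cdot \frac{13}{52}=\frac{12}{51} \cdot \frac{1}{4}=\frac{3}{51} \]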

Law of total probability

Suppose the sample space \(\Omega\) is divided into 3 disjoint events \(B_1, B_2, B_3\), then for any event \(A\):

\[\begin{align*} P(A)&=P(A \cap B_1)+P(A \cap B_2)+P(A \cap B_3) \\ P(A)&=P(A|B_1)P(B_1)+P(A|B_2)P(B_2)+P(A|B_3)P(B_3) \end{align*}\]

The first line says that if \(A\) is divided into 3 disjoint pieces (one inside each \(B_i\)), then \(P(A)\) is the sum of the probabilities of the pieces. The second line, which applies the multiplication rule to each piece, is called the law of total probability.

Probability urn

An urn contains 5 red balls and 2 green balls. We draw 2 balls, one after the other and without replacement. What is the probability that the second ball is red?

Sample space \(\Omega=\{rr,rg,gr,gg\}\). Let \(R_1\)=‘first ball red’, \(R_2\)=‘second ball red’, \(G_1\)=‘first ball green’, \(G_2\)=‘second ball green’. The question is \(P(R_2)\).

\[ P(R_2)=P(R_2 | R_1)P(R_1)+P(R_2|G_1)P(G_1)=\frac{4}{6} \cdot \frac{5}{7}+\frac{5}{6} \cdot \frac{2}{7}=\frac{30}{42} \]

Probability urn

Under a slightly more complex rule, we can no longer count on simple counting.

Suppose that if the first draw is green, a red ball is added to the urn, and if the first draw is red, a green ball is added. The first ball isn’t returned. Find \(P(R_2)\).

Now \(P(R_2 | R_1)=4/7\) and \(P(R_2|G_1)=6/7\), therefore

\[ P(R_2)=P(R_2 | R_1)P(R_1)+P(R_2|G_1)P(G_1)=\frac{4}{7} \cdot \frac{5}{7}+\frac{6}{7} \cdot \frac{2}{7}=\frac{32}{49} \]

A probability tree is useful for this type of question.
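
As a sanity check, here is a minimal Monte Carlo sketch of the modified urn rule (my own illustration; the counts follow the slide). The estimate should land near \(32/49 \approx 0.653\).

```python
import random

def second_draw_is_red():
    """One trial: 5 red and 2 green balls; the first ball is not returned,
    and a ball of the opposite colour is added before the second draw."""
    urn = ["r"] * 5 + ["g"] * 2
    first = random.choice(urn)
    urn.remove(first)                         # first ball is not returned
    urn.append("g" if first == "r" else "r")  # add a ball of the other colour
    return random.choice(urn) == "r"

n = 100_000
estimate = sum(second_draw_is_red() for _ in range(n)) / n
print(estimate, 32 / 49)  # simulated vs exact P(R_2)
```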

Independence

  • Two events are independent if knowledge that one occurred does not change the probability that the other occurred.

  • \(A\) is independent of \(B\) if \(P(A|B)=P(A)\)

  • If \(A\) is independent of \(B\), then \(P(A \cap B)=P(A|B)P(B)=P(A)P(B)\).

  • Equivalently, two events \(A\) and \(B\) are independent if \(P(A \cap B)=P(A) \cdot P(B)\).

  • \(A\) is independent of \(B\) if and only if \(B\) is independent of \(A\).

Testing independence

Toss a fair coin twice. \(H_1\)=‘first toss is H’ and \(H_2\)=‘second toss is H’. Are \(H_1\) and \(H_2\) independent?

Toss a fair coin 3 times. Let \(A\)=‘total 2 heads’. Are \(H_1\) and \(A\) independent? Hint: find \(P(A)\) then check if \(P(A)=P(A|H_1)\).
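
For the second question, a minimal brute-force sketch following the hint (enumerating all 8 outcomes):

```python
from itertools import product

omega = list(product("HT", repeat=3))         # 8 equally likely outcomes

A  = [w for w in omega if w.count("H") == 2]  # A: exactly 2 heads in total
H1 = [w for w in omega if w[0] == "H"]        # H1: first toss is heads

p_A       = len(A) / len(omega)               # P(A)    = 3/8
p_A_given = sum(w in H1 for w in A) / len(H1) # P(A|H1) = 2/4

print(p_A, p_A_given)  # 0.375 vs 0.5, so A and H1 are not independent
```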

Bayes’ rule

For two events A and B, Bayes’ rule says

\[ P(B|A)=\frac{P(A|B)\cdot P(B)}{P(A)} \]

Bayes’ rule tells us how to ‘invert’ conditional probabilities. In practice, \(P(A)\) is often computed using the law of total probability.

Bayes’ rule

It is common to confuse \(P(A|B)\) and \(P(B|A)\).

Toss a fair coin 5 times. Let \(H_1=\)‘first toss is heads’ and let \(H_A=\)‘all 5 tosses are heads’. Then \(P(H_1|H_A)=1\) but \(P(H_A|H_1)=1/16\).

\[ P(H_1 | H_A)=\frac{P(H_A | H_1)P(H_1)}{P(H_A)}=\frac{1/16 \cdot 1/2}{1/32}=1 \]

The base rate fallacy

Consider a routine screening test for a disease. Suppose the frequency of the disease in the population (the base rate) is 0.5%. The test is fairly accurate, with a 5% false positive rate and a 10% false negative rate. You take the test and it comes back positive. What’s the probability you actually have the disease?

Let’s define the events: \(D^+=\) ‘you have the disease’, \(D^-=\) ‘you don’t have the disease’, \(T^+=\) ‘you tested positive’, and \(T^-=\) ‘you tested negative’.

\(P(D^+)=0.005\), therefore \(P(D^-)=0.995\). The false positive and false negative rates are conditional probabilities.

The base rate fallacy

\(P(\text{false positive})=P(T^+|D^-)=0.05\) and \(P(\text{false negative})=P(T^-|D^+)=0.1\)

The complements are true negative and true positive rates, which are:

\(P(T^-|D^-)=1-P(T^+|D^-)=0.95\) and \(P(T^+|D^+)=1-P(T^-|D^+)=0.9\)

You can actually put this in a probability tree.

The base rate fallacy

The question is: what’s the probability that you have the disease given that your test is positive, i.e., what is the value of \(P(D^+|T^+)\)? We don’t have this value directly, but we can use Bayes’ rule:

\[ P(D^+|T^+)=\frac{P(T^+|D^+) \cdot P(D^+)}{P(T^+)} \]

We use the law of total probability to compute \(P(T^+)\) (or just use the tree):

\[ P(T^+)=P(T^+|D^-)P(D^-)+P(T^+|D^+)P(D^+)=0.05 \times 0.995+0.9 \times 0.005=0.05425 \]

The answer is \(P(D^+|T^+)=\frac{0.9 \times 0.005}{0.05425} \approx 0.083\), i.e., about 8.3%.
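
The same arithmetic in a minimal Python sketch (the numbers come straight from the slide):

```python
p_disease  = 0.005               # base rate P(D+)
p_healthy  = 1 - p_disease       # P(D-)
p_pos_dis  = 0.90                # true positive rate  P(T+ | D+)
p_pos_heal = 0.05                # false positive rate P(T+ | D-)

# Law of total probability: P(T+)
p_pos = p_pos_dis * p_disease + p_pos_heal * p_healthy  # 0.05425

# Bayes' rule: P(D+ | T+)
print(p_pos_dis * p_disease / p_pos)  # ~0.083, i.e. about 8.3%
```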

The base rate fallacy

This is called “the base rate fallacy” because the base rate of the disease in the population is so low that the vast majority of people taking the test are actually healthy, so most positive results are false positives. To summarize:

“95% of all tests are accurate” does not imply “95% of positive tests are accurate”.

The base rate fallacy is also often worked out using a table of counts, here out of 10,000 people:

The base rate fallacy

|         | \(D^+\)          | \(D^-\)          | total |
|---------|------------------|------------------|-------|
| \(T^+\) | \(D^+ \cap T^+\) | \(D^- \cap T^+\) |       |
| \(T^-\) | \(D^+ \cap T^-\) | \(D^- \cap T^-\) |       |
| total   | 50               | 9950             | 10000 |

|         | \(D^+\) | \(D^-\) | total |
|---------|---------|---------|-------|
| \(T^+\) | 45      | 498     | 543   |
| \(T^-\) | 5       | 9452    | 9457  |
| total   | 50      | 9950    | 10000 |

Discrete random variables

A random variable assigns a number to each outcome in a sample space.

Let \(\Omega\) be a sample space. A discrete random variable is a function

\[ X : \Omega \rightarrow \mathbb{R} \]

that takes a discrete set of values. It’s random because its value depends on a random outcome of an experiment.

A game of dice

For any value \(a\), we write \(X=a\) to mean the event consisting of all outcomes \(\omega\) with \(X(\omega)=a\).

Roll a fair die twice and record the outcome as \((i,j)\), where \(i\) is the outcome of the first roll and \(j\) is the outcome of the second roll. The sample space is thus

\[ \Omega=\{(1,1),(1,2),\dots,(6,6)\}=\{(i,j) \mid i,j=1,\dots,6\} \]

A game of dice

In this game, you win $500 if the sum is 7 and lose $100 otherwise. The payoff function is \[ X(i,j)= \begin{cases} 500 & \text{if } i+j=7 \\ -100 & \text{if } i+j \neq 7 \end{cases} \]

The event \(X=500\) is the set \(\{(1,6),(2,5),(3,4),(4,3),(5,2),(6,1)\}\), so \(P(X=500)=1/6\).
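
A minimal enumeration sketch of the payoff variable \(X\) (illustrative only):

```python
from itertools import product
from collections import Counter

# All 36 equally likely rolls (i, j) of two fair dice
rolls = list(product(range(1, 7), repeat=2))

# Payoff: win 500 if the sum is 7, lose 100 otherwise
payoffs = Counter(500 if i + j == 7 else -100 for i, j in rolls)

for value, count in sorted(payoffs.items()):
    print(value, count, "/ 36")  # P(X=500) = 6/36 = 1/6, P(X=-100) = 30/36
```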

Probability mass function

The probability mass function (pmf) of a discrete random variable \(X\) is the function \(p(a)=P(X=a)\). Note that:

  1. \(p(a)\) always satisfies \(0\leq p(a) \leq 1\).

  2. \(a\) can be any number; if \(a\) is a value that \(X\) never takes, then \(p(a)=0\).

Let \(\Omega\) be the sample space for rolling 2 dice. Let \(M\) be the maximum value of the 2 dice. The pmf of \(M\) is:

| \(a\)        | 1    | 2    | 3    | 4    | 5    | 6     |
|--------------|------|------|------|------|------|-------|
| pmf \(p(a)\) | 1/36 | 3/36 | 5/36 | 7/36 | 9/36 | 11/36 |

Cumulative distribution function

The cumulative distribution function (cdf) of a random variable \(X\) is the function \(F\) given by \(F(a)=P(X\leq a)\).

| \(a\)        | 1    | 2    | 3    | 4     | 5     | 6     |
|--------------|------|------|------|-------|-------|-------|
| pmf \(p(a)\) | 1/36 | 3/36 | 5/36 | 7/36  | 9/36  | 11/36 |
| cdf \(F(a)\) | 1/36 | 4/36 | 9/36 | 16/36 | 25/36 | 36/36 |

\(F(a)\) is called the cumulative distribution function because \(F(a)\) gives the total probability that accumulates by adding up the probabilities \(p(b)\) as \(b\) runs from \(-\infty\) to \(a\).
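
A minimal sketch that reproduces both rows of the table by enumeration and accumulation:

```python
from itertools import product
from fractions import Fraction

rolls = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes

# pmf of M = maximum of the two dice
pmf = {a: Fraction(sum(max(r) == a for r in rolls), 36) for a in range(1, 7)}

# cdf F(a) = P(M <= a), accumulated from the pmf as b runs up to a
cdf, running = {}, Fraction(0)
for a in range(1, 7):
    running += pmf[a]
    cdf[a] = running

print(pmf)  # 1/36, 1/12 (=3/36), 5/36, 7/36, 1/4 (=9/36), 11/36
print(cdf)  # 1/36, 1/9 (=4/36), 1/4 (=9/36), 4/9 (=16/36), 25/36, 1
```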

Various distributions