STT 861 Theory of Prob and STT I Lecture Note - 5
2017-10-04
Sample mean and sample variance, biased and unbiased estimation; covariance; the hypergeometric distribution and an example; correlation coefficients; discrete distributions; the Poisson distribution; the Poisson approximation to the binomial distribution.
Portal to all the other notes
- Lecture 01 - 2017.09.06
- Lecture 02 - 2017.09.13
- Lecture 03 - 2017.09.20
- Lecture 04 - 2017.09.27
- Lecture 05 - 2017.10.04 -> This post
- Lecture 06 - 2017.10.11
- Lecture 07 - 2017.10.18
- Lecture 08 - 2017.10.25
- Lecture 09 - 2017.11.01
- Lecture 10 - 2017.11.08
- Lecture 11 - 2017.11.15
- Lecture 12 - 2017.11.20
- Lecture 13 - 2017.11.29
- Lecture 14 - 2017.12.06
Lecture 05 - Oct 04 2017
Sample mean and sample variance
Recall the proposition $\mathrm{Var}(X) = E(X^2) - (E(X))^2$.
Now consider some data $x_1, \dots, x_n$. We imagine that this data comes from an experiment which is repeated $n$ times independently. This means that each $x_i$ represents a r.v. $X_i$, where the $X_i$'s are i.i.d.
We are accustomed to using the notation
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$. This is called the “sample mean”.
$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$. This is called the “sample variance”.
Now investigate the statistical properties of these two “estimators”. Replace $x_i$ by $X_i$ and try this.
Notation: lowercase $x_i$ is for data points, while uppercase $X_i$ is the model notation.
Find $E(\bar{X})$. If $E(\bar{X}) = \mu$, then we say $\bar{X}$ is unbiased. Indeed $E(\bar{X}) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \mu$, so $\bar{X}$ is unbiased.
Find $E(\hat{\sigma}^2)$. Is it $\sigma^2$? It might be biased.
The left-hand side of the formula in the previous proposition, applied to a r.v. which is equal to $X_i$ with prob $\frac{1}{n}$, is exactly $\hat{\sigma}^2$, so $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2$.
The stuff inside the parenthesis, after taking expectations, gives $E(\hat{\sigma}^2) = E(X_1^2) - E(\bar{X}^2) = (\sigma^2 + \mu^2) - \left(\frac{\sigma^2}{n} + \mu^2\right) = \frac{n-1}{n}\sigma^2$.
As a result, this is not exactly $\sigma^2$. Thus $\hat{\sigma}^2$ is biased.
Let’s define an unbiased estimator for $\sigma^2$. We just need to take
$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{n}{n-1}\hat{\sigma}^2.$
It is an unbiased estimator of $\sigma^2$: $E(S^2) = \frac{n}{n-1}\cdot\frac{n-1}{n}\sigma^2 = \sigma^2$.
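As a quick sanity check (my own simulation, not part of the lecture; the sample size $n = 5$ and $\sigma^2 = 4$ are arbitrary choices), we can average the $\frac{1}{n}$ and $\frac{1}{n-1}$ estimators over many repeated samples and see the bias appear:

```python
# Simulation illustrating the bias of the 1/n variance estimator
# versus the unbiased 1/(n-1) version.
import random

random.seed(0)

def sample_variances(n, trials, mu=0.0, sigma=2.0):
    """Average the 1/n and 1/(n-1) variance estimators over many samples."""
    biased_sum, unbiased_sum = 0.0, 0.0
    for _ in range(trials):
        xs = [random.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)
        biased_sum += ss / n          # expectation is (n-1)/n * sigma^2
        unbiased_sum += ss / (n - 1)  # expectation is sigma^2
    return biased_sum / trials, unbiased_sum / trials

biased_avg, unbiased_avg = sample_variances(n=5, trials=200_000)
print(biased_avg, unbiased_avg)  # roughly 3.2 (= 4 * 4/5) and 4
```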
Covariance (Chapter 1.7)
Definition: Let $X$ & $Y$ be two r.v.’s living on the same prob space. Their covariance is $\mathrm{Cov}(X, Y) = E\big[(X - E(X))(Y - E(Y))\big] = E(XY) - E(X)E(Y).$
Property: If $X$ & $Y$ are independent, then $\mathrm{Cov}(X, Y) = 0$. Be aware, the converse is usually false: $\mathrm{Cov}(X, Y) = 0$ does not imply that $X$ and $Y$ are independent.
Note: if $Y = X$, then $\mathrm{Cov}(X, X) = \mathrm{Var}(X)$.
Property: Let $X_1, \dots, X_n$ be r.v.’s. Then $\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \mathrm{Var}(X_i) + 2\sum_{i<j} \mathrm{Cov}(X_i, X_j).$
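The variance-of-a-sum formula can be checked numerically. The sketch below (my own example, with arbitrary simulated data) verifies the two-variable case $\mathrm{Var}(X+Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$, which holds exactly even for empirical moments since it is an algebraic identity:

```python
# Check Var(X+Y) = Var(X) + Var(Y) + 2 Cov(X,Y) on correlated data.
import random

random.seed(1)
n = 100_000
x = [random.gauss(0, 1) for _ in range(n)]
# Build Y correlated with X: Y = X + independent noise.
y = [xi + random.gauss(0, 1) for xi in x]

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Empirical covariance (1/n convention); cov(u, u) is the variance."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

s = [a + b for a, b in zip(x, y)]
lhs = cov(s, s)  # Var(X+Y) computed directly
rhs = cov(x, x) + cov(y, y) + 2 * cov(x, y)
print(lhs, rhs)  # the two agree up to floating-point error
```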
Hypergeometric distribution
Application of the previous formula: The variance of the Hypergeometric distribution (no details here, see the book).
Definition: The hypergeometric distribution with parameters $(N, m, n)$ is the distribution of the r.v. counting the number of elements from a distinguished subset of size $m$ contained in a sample of size $n$ picked without replacement from the $N$ elements.
Example 1
The number of women in a sample of size $n$ taken without replacement from a group with 8 women & 12 men has this hypergeometric distribution with $N = 20$ and $m = 8$.
It turns out that $E(X) = n\frac{m}{N}$ and $\mathrm{Var}(X) = n\frac{m}{N}\left(1 - \frac{m}{N}\right)\frac{N-n}{N-1}.$
Comments: using the notation $p = \frac{m}{N}$, then $\mathrm{Var}(X) = np(1-p)\frac{N-n}{N-1}$.
Notice: If $N$ is large, the factor $\frac{N-n}{N-1}$ is almost $1$. So this variance is almost the variance of a binomial with success parameter $p$. This is because if $n$ is much smaller than $N$, sampling without replacement is almost like sampling with replacement.
This “binomial approximation to the hypergeometric law” works well if $n$ is much smaller than $N$, except if $p$ is too close to $0$ or $1$.
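To see the binomial approximation at work, here is a small sketch (my own numbers: $N = 1000$, $m = 300$, $n = 10$, so $n$ is much smaller than $N$) comparing the two PMFs directly:

```python
# Compare the hypergeometric PMF with its binomial approximation
# when the sample size n is much smaller than the population N.
from math import comb

N, m, n = 1000, 300, 10
p = m / N  # success fraction m/N

def hyper_pmf(k):
    """P(k successes) when sampling n items without replacement."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

def binom_pmf(k):
    """P(k successes) for Bin(n, p), i.e. sampling with replacement."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

max_gap = max(abs(hyper_pmf(k) - binom_pmf(k)) for k in range(n + 1))
print(max_gap)  # small: without replacement ~ with replacement here
```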
Correlation coefficients
Let $X$ and $Y$ be two r.v.’s. We standardize them: let
$\tilde{X} = \frac{X - \mu_X}{\sigma_X}, \qquad \tilde{Y} = \frac{Y - \mu_Y}{\sigma_Y},$
where $\mu_X = E(X)$, $\mu_Y = E(Y)$, $\sigma_X = \sqrt{\mathrm{Var}(X)}$, $\sigma_Y = \sqrt{\mathrm{Var}(Y)}$.
Notice that $E(\tilde{X}) = E(\tilde{Y}) = 0$ and $\mathrm{Var}(\tilde{X}) = \mathrm{Var}(\tilde{Y}) = 1$.
Definition: The correlation coefficient between $X$ and $Y$ is $\rho(X, Y) = \mathrm{Cov}(\tilde{X}, \tilde{Y}) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}.$
Note: The correlation between $X$ and $Y$ is a value in $[-1, 1]$.
Example 2
Let $Y = X$, then $\rho(X, Y) = 1$.
What if $Y = aX + b$, where $a$ and $b$ are constants?
$\rho(X, Y) = 1$ if $a > 0$, and $\rho(X, Y) = -1$ if $a < 0$.
If $X$ and $Y$ are independent, $\rho(X, Y) = 0$.
In general, $\rho$ measures the linear relationship between $X$ and $Y$.
Main idea: If we have a scatter plot of $X$ and $Y$ data which lines up very well along a straight line, then $\rho$ will be close to $1$ if the line slopes up and close to $-1$ if it slopes down.
Property: Because $\rho$ is defined using the standardized $\tilde{X}$ and $\tilde{Y}$, it is unchanged under affine transformations up to the sign of the slopes: $\rho(aX + b, cY + d) = \mathrm{sign}(ac)\,\rho(X, Y)$.
Discrete Distributions (Chapter 2)
Some distributions: $\mathrm{Bern}(p)$, $\mathrm{Bin}(n, p)$, $\mathrm{Geom}(p)$, $\mathrm{Poisson}(\lambda)$.
Important expressions:
- $\mathrm{Bern}(p)$: $E(X) = p$, $\mathrm{Var}(X) = p(1-p)$.
- $\mathrm{Bin}(n, p)$: $E(X) = np$, $\mathrm{Var}(X) = np(1-p)$.
- $\mathrm{Geom}(p)$: $E(X) = \frac{1}{p}$, $\mathrm{Var}(X) = \frac{1-p}{p^2}$.
- $\mathrm{Poisson}(\lambda)$: $E(X) = \lambda$, $\mathrm{Var}(X) = \lambda$.
Recall: the intuition behind the formula $E(X) = \frac{1}{p}$ for the geometric distribution: for example, if $p = \frac{1}{20}$ for a success, we should expect to wait 20 units of time until the first success.
Exercise at home
Prove the formulas $E(X) = \frac{1}{p}$ and $\mathrm{Var}(X) = \frac{1-p}{p^2}$ for $X \sim \mathrm{Geom}(p)$.
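Before proving them, a simulation can make the two formulas plausible. This sketch (my addition, with an arbitrary $p = 0.2$) draws geometric variables by counting trials until the first success:

```python
# Simulation supporting E[X] = 1/p and Var(X) = (1-p)/p^2 for X ~ Geom(p),
# where X counts the number of trials up to and including the first success.
import random

random.seed(3)
p, trials = 0.2, 200_000
draws = []
for _ in range(trials):
    k = 1
    while random.random() >= p:  # failure: keep waiting
        k += 1
    draws.append(k)

mean = sum(draws) / trials
var = sum((d - mean) ** 2 for d in draws) / trials
print(mean, var)  # expect about 1/0.2 = 5 and 0.8/0.04 = 20
```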
Poisson Distribution
Definition: $X$ is Poisson distributed with parameter $\lambda$ if $X$ takes the values $0, 1, 2, \dots$ and $P(X = k) = e^{-\lambda}\frac{\lambda^k}{k!}$.
Compute the expectation:
$E(X) = \sum_{k=0}^{\infty} k\, e^{-\lambda}\frac{\lambda^k}{k!} = \lambda e^{-\lambda}\sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} = \lambda.$
(Recall the Taylor series $e^{\lambda} = \sum_{k=0}^{\infty}\frac{\lambda^k}{k!}$.)
It turns out $\mathrm{Var}(X) = \lambda$ [prove it at home; it is easier to calculate $E(X(X-1))$ first].
Quick question: What is $P(X = 0)$? $P(X = 0) = e^{-\lambda}$.
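Both $E(X) = \lambda$ and $\mathrm{Var}(X) = \lambda$ can be checked numerically by summing the PMF directly. This is my own sketch; the choice $\lambda = 3.5$ and the truncation at $k < 100$ (where the tail is negligible) are arbitrary:

```python
# Numerical check of E[X] = lambda and Var(X) = lambda for X ~ Poisson(lambda),
# summing the PMF e^{-lam} lam^k / k! over k.
from math import exp

lam = 3.5
pmf = [exp(-lam)]        # P(X = 0) = e^{-lambda}
for k in range(1, 100):  # recursion: P(X = k) = P(X = k-1) * lambda / k
    pmf.append(pmf[-1] * lam / k)

total = sum(pmf)
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
print(total, mean, var)  # ~1, ~3.5, ~3.5
```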
Poisson approximation for the Binomial distribution
Idea: if events are rare, they usually follow a Poisson law.
Fact: Let $X_n$ be $\mathrm{Bin}(n, p_n)$ and assume $p_n$ is proportional to $\frac{1}{n}$: $p_n = \frac{\lambda}{n}$.
Then the PMF of $X_n$ is almost the same as for $\mathrm{Poisson}(\lambda)$. Specifically we mean this:
If $p$ is small (of order $\frac{1}{n}$), then $\binom{n}{k} p^k (1-p)^{n-k} \approx e^{-\lambda}\frac{\lambda^k}{k!}$, with $\lambda = np$.
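A numerical illustration of this convergence (my own sketch, with $\lambda = 2$ fixed and two values of $n$; the gap between the two PMFs shrinks as $n$ grows):

```python
# The Bin(n, lam/n) PMF approaches the Poisson(lam) PMF as n grows.
from math import comb, exp, factorial

lam = 2.0

def binom_pmf(n, k):
    """PMF of Bin(n, lam/n) at k."""
    p = lam / n
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k):
    """PMF of Poisson(lam) at k."""
    return exp(-lam) * lam**k / factorial(k)

gap_small = max(abs(binom_pmf(10, k) - poisson_pmf(k)) for k in range(10))
gap_large = max(abs(binom_pmf(10_000, k) - poisson_pmf(k)) for k in range(10))
print(gap_small, gap_large)  # the gap shrinks as n increases
```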
Because of this, the Poisson distribution is a good model for the number of arrivals (of some phenomenon) in a fixed interval of time.
This, interpreting $n$ as successive units of time (e.g. minutes) in an interval of time, also explains the next property:
Fact: let $X$ and $Y$ be two independent Poisson r.v.’s with parameters $\lambda$ and $\mu$; then $X + Y$ is Poisson too, with parameter $\lambda + \mu$.
Because $X$ and $Y$ count arrivals from two independent streams over the same interval, $X + Y$ counts all arrivals, at total rate $\lambda + \mu$.
We can use the Binomial approximation above to see why $X + Y$ is Poisson.
Exercise
Try to prove that $X + Y$ is Poisson using only the PMFs (i.e. by convolution).
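The computation the exercise asks for is a convolution of the two PMFs: $P(X + Y = k) = \sum_{j=0}^{k} P(X = j)\,P(Y = k - j)$, which should equal the $\mathrm{Poisson}(\lambda + \mu)$ PMF. Here is a numerical sketch of that identity (my addition, with arbitrary $\lambda = 1.3$, $\mu = 2.1$):

```python
# Convolving Poisson(lam) with Poisson(mu) should give Poisson(lam + mu).
from math import exp, factorial

lam, mu = 1.3, 2.1

def pois(rate, k):
    """Poisson(rate) PMF at k."""
    return exp(-rate) * rate**k / factorial(k)

max_err = max(
    abs(sum(pois(lam, j) * pois(mu, k - j) for j in range(k + 1))
        - pois(lam + mu, k))
    for k in range(10)
)
print(max_err)  # essentially zero: the convolution matches exactly
```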