
STT 861 Theory of Prob and STT I Lecture Note - 5

2017-10-04

Sample mean and sample variance, biased and unbiased estimation; covariance, the Hypergeometric distribution and an example; correlation coefficients; discrete distributions, the Poisson distribution, and the Poisson approximation for the Binomial distribution.

Portal to all the other notes

Lecture 05 - Oct 05 2017

Sample mean and sample variance

Recall the proposition: for any constant $c$, $E((X-c)^2) = \mathrm{Var}(X) + (E(X)-c)^2$.

Now consider some data $x_i$, $i = 1, 2, 3, \dots, n$. We imagine that this data comes from an experiment which is repeated $n$ times independently. This means that each $x_i$ is a realization of a r.v. $X_i$, where the $X_i$'s are i.i.d.

We are accustomed to using the notation

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

This is called “sample mean”.

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

This is called “sample variance”.

Now we investigate the statistical properties of the two “estimators”. Replace $x_i$ by $X_i$ and try this.

Notation: $\bar{x}$ is for the data points, while $\bar{X}$ is for the model notation.

Find $E(\bar{X})$. If $E(\bar{X}) = \mu$, then we say $\bar{X}$ is unbiased.

Find $E(\hat{\sigma}^2)$. Is it $\mathrm{Var}(X)$? It might be biased.

$$E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \mu$$

$$E(\hat{\sigma}^2) = E\left(\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2\right)$$

The expression inside the parentheses is the left-hand side of the formula in the previous proposition, applied to a r.v. $X$ which is equal to $X_i$ with probability $1/n$: under this empirical distribution, $\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^2$ is the expectation of the r.v. $(X-\bar{X})^2$.

$$\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^2 &= \frac{1}{n}\sum_{i=1}^{n}(X_i-c)^2 - (\bar{X}-c)^2 \\
&= \frac{1}{n}\sum_{i=1}^{n}(X_i-\mu)^2 - (\bar{X}-\mu)^2 \qquad (\text{taking } c = \mu)
\end{aligned}$$

$$\begin{aligned}
E[\hat{\sigma}^2] &= \frac{1}{n}\sum_{i=1}^{n}E\big((X_i-\mu)^2\big) - E\big((\bar{X}-\mu)^2\big) \\
&= \frac{1}{n}\cdot n\,\mathrm{Var}(X) - E\left(\Big(\frac{1}{n}\sum_{i=1}^{n}(X_i-\mu)\Big)^2\right) \\
&= \mathrm{Var}(X) - \frac{1}{n^2}\sum_{i=1}^{n}E\big((X_i-\mu)^2\big) \qquad (\text{the cross terms vanish by independence}) \\
&= \mathrm{Var}(X) - \frac{1}{n}\mathrm{Var}(X) = \Big(1-\frac{1}{n}\Big)\mathrm{Var}(X)
\end{aligned}$$

As a result, this is not exactly $=\mathrm{Var}(X)$. Thus, $\hat{\sigma}^2$ is biased.

Let’s define an unbiased estimator for $\mathrm{Var}(X)$. We just need to take

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$

It is an unbiased estimator of $\mathrm{Var}(X)$: indeed, $E(S^2) = \frac{n}{n-1}\,E(\hat{\sigma}^2) = \frac{n}{n-1}\Big(1-\frac{1}{n}\Big)\mathrm{Var}(X) = \mathrm{Var}(X)$.
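As a quick numerical sanity check (a sketch in Python with NumPy; the sample size, distribution, and number of trials are my own choices, not from the lecture):

```python
import numpy as np

# Simulate many samples of size n from a distribution with known variance
# and compare the averages of sigma_hat^2 (divide by n) and S^2 (divide by n-1).
rng = np.random.default_rng(0)
n, trials = 5, 200_000
samples = rng.normal(loc=0.0, scale=2.0, size=(trials, n))  # Var(X) = 4

sigma_hat2 = samples.var(axis=1, ddof=0)  # biased: E = (1 - 1/n) * Var(X)
s2 = samples.var(axis=1, ddof=1)          # unbiased: E = Var(X)

print(sigma_hat2.mean())  # ~ (1 - 1/5) * 4 = 3.2
print(s2.mean())          # ~ 4.0
```

NumPy's `ddof` argument selects exactly this divisor: `ddof=0` gives the biased $\hat{\sigma}^2$ and `ddof=1` gives $S^2$.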


Covariance (Chapter 1.7)

Definition: Let $X$ and $Y$ be two r.v.'s living on the same probability space.

$$\mathrm{Cov}(X,Y) = E\big((X - E(X))(Y - E(Y))\big)$$

Property: If $X$ and $Y$ are independent, then $\mathrm{Cov}(X,Y) = 0$. Be aware: the converse statement is usually false, i.e. $\mathrm{Cov}(X,Y) = 0$ does not imply that $X$ and $Y$ are independent.

Note: if $X = Y$, then $\mathrm{Cov}(X,Y) = \mathrm{Var}(X)$.

Property: Let $X_i$, $i = 1, 2, \dots, n$, be r.v.'s. Then

$$\mathrm{Var}\left(\sum_{i=1}^{n}X_i\right) = \sum_{i=1}^{n}\sum_{j=1}^{n}\mathrm{Cov}(X_i,X_j) = \sum_{i=1}^{n}\mathrm{Var}(X_i) + \sum_{i=1}^{n}\sum_{\substack{j=1\\ j\neq i}}^{n}\mathrm{Cov}(X_i,X_j)$$
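As a small numerical illustration (my own sketch with NumPy, not from the notes), the double sum says that the variance of a sum equals the sum of all entries of the covariance matrix:

```python
import numpy as np

# Check Var(sum_i X_i) = sum over all (i, j) of Cov(X_i, X_j)
# for three correlated random variables.
rng = np.random.default_rng(1)
z = rng.normal(size=(100_000, 3))
mix = np.array([[1.0, 0.5, 0.0],
                [0.0, 1.0, 0.5],
                [0.0, 0.0, 1.0]])
x = z @ mix  # columns of x are now correlated

lhs = x.sum(axis=1).var(ddof=1)      # Var of the row sums
rhs = np.cov(x, rowvar=False).sum()  # sum of all covariance-matrix entries
print(lhs, rhs)  # the two values agree up to sampling noise
```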

Hypergeometric distribution

Application of the previous formula: The variance of the Hypergeometric distribution (no details here, see the book).

Definition: The hypergeometric distribution with parameters $(n, N)$ is the distribution of the r.v. $X$ counting the number of elements from a distinguished subset of size $n$, when one picks a sample of size $k$ without replacement from the $N$ elements.

Example 1

The number $X$ of women in a sample of size $k = 5$ taken without replacement from a group with 8 women and 12 men has this hypergeometric distribution with $N = 8 + 12 = 20$ and $n = 8$.

It turns out that

$$\mathrm{Var}(X) = k\,\frac{n}{N}\left(1-\frac{n}{N}\right)\frac{N-k}{N-1}$$

Comments: use the notation $p = n/N$; then

$$\mathrm{Var}(X) = k\,p(1-p)\,\frac{1-k/N}{1-1/N}$$

Notice: if $N$ is large, then $\frac{1}{1-1/N} = \frac{N}{N-1}$ is almost $=1$, and if $k$ is much smaller than $N$, then $1-k/N$ is almost $=1$ as well. So this variance is almost the variance of a binomial with success parameter $p$. This is because if $k$ is much smaller than $N$, sampling without replacement is almost like sampling with replacement.

This “binomial approximation to the hypergeometric law” works well if $k \ll N$, except if $p = n/N$ is too close to 0 or 1.
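As a sketch with `scipy.stats`, using the numbers from Example 1 (note that SciPy's `hypergeom` takes the population size, the number of tagged elements, and the number of draws, in that order):

```python
from scipy.stats import hypergeom, binom

# Example 1: N = 20 people, n = 8 women, sample of k = 5 without replacement.
N, n, k = 20, 8, 5
p = n / N

X = hypergeom(N, n, k)  # SciPy order: (population, tagged, draws)
print(X.var())                              # variance from SciPy
print(k * p * (1 - p) * (N - k) / (N - 1))  # same value from the formula

# Binomial approximation, good when k << N (here k/N = 0.25, so only rough):
Y = binom(k, p)
for j in range(k + 1):
    print(j, X.pmf(j), Y.pmf(j))
```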

Correlation coefficients

Let $X$ and $Y$ be two r.v.'s. We standardize them: let

$$Z_X = (X - \mu_X)/\sigma_X, \qquad Z_Y = (Y - \mu_Y)/\sigma_Y$$

where $\mu_X = E(X)$, $\mu_Y = E(Y)$, $\sigma_X = \sqrt{\mathrm{Var}(X)}$, and $\sigma_Y = \sqrt{\mathrm{Var}(Y)}$.

Notice that $E(Z_X) = E(Z_Y) = 0$ and $\mathrm{Var}(Z_X) = \mathrm{Var}(Z_Y) = 1$.

Definition: The correlation coefficient between $X$ and $Y$ is

$$\mathrm{Corr}(X,Y) = \mathrm{Cov}(Z_X, Z_Y)$$

Note: the correlation between $X$ and $Y$ is a value in $[-1, 1]$.

Example 2

Let $X = Y$; then $\mathrm{Corr}(X,Y) = 1$.

What if $Y = aX + b$, where $a$ and $b$ are constants?

$\mathrm{Corr}(X,Y) = 1$ if $a > 0$, and $= -1$ if $a < 0$.

If $X$ and $Y$ are independent, then $\mathrm{Corr}(X,Y) = 0$.

In general, $\mathrm{Corr}(X,Y)$ measures the linear relationship between $X$ and $Y$.

Main idea: if we have a scatter plot of $x$ and $y$ data which lines up very well along a straight line, then $\mathrm{Corr}(X,Y) = \rho$ will be close to 1 if the line slopes up and close to $-1$ if it slopes down.

Property: Because $\mathrm{Corr}(X,Y)$ is defined using the standardized $Z_X$ and $Z_Y$, we have

$$\mathrm{Corr}(aX + b,\ cY + d) = \mathrm{Corr}(X,Y)$$

for constants with $a, c > 0$ (the sign flips if $ac < 0$).
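A quick check of this invariance (my own NumPy sketch; the coefficients are arbitrary):

```python
import numpy as np

# Correlation is unchanged by affine maps with positive scale factors,
# and flips sign when a scale factor is negative.
rng = np.random.default_rng(2)
x = rng.normal(size=50_000)
y = 0.6 * x + 0.8 * rng.normal(size=50_000)  # correlated with x

print(np.corrcoef(x, y)[0, 1])                  # Corr(X, Y)
print(np.corrcoef(3 * x + 7, 2 * y - 1)[0, 1])  # same value (a, c > 0)
print(np.corrcoef(-3 * x, y)[0, 1])             # opposite sign (a < 0)
```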

Discrete Distributions (Chapter 2)

Some distributions: $\mathrm{Binom}(n,p)$, $\mathrm{Geom}(p)$.

Important expressions: $E(\mathrm{Geom}(p)) = \frac{1}{p}$ and $\mathrm{Var}(\mathrm{Geom}(p)) = \frac{1-p}{p^2}$.

Recall the intuition behind the formula $E(\mathrm{Geom}(p)) = 1/p$: for example, if $p = 1/20$ for a success, then we should expect to wait 20 units of time until the first success.

Exercise at home

Prove the formulas for the expectation and variance of $\mathrm{Geom}(p)$.
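A numerical check of both formulas (a sketch; NumPy's `geometric` counts the number of trials up to and including the first success, matching the convention here):

```python
import numpy as np

# Empirical check of E(Geom(p)) = 1/p and Var(Geom(p)) = (1 - p) / p^2.
rng = np.random.default_rng(3)
p = 1 / 20
draws = rng.geometric(p, size=1_000_000)

print(draws.mean(), 1 / p)                # both ~ 20
print(draws.var(ddof=1), (1 - p) / p**2)  # both ~ 380
```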

Poisson Distribution

Definition: $X$ is Poisson distributed with parameter $\lambda$ if $X$ takes the values $k = 0, 1, 2, \dots$ and

$$P(X = k) = e^{-\lambda}\frac{\lambda^k}{k!}$$

Compute the expectation,

$$E(X) = \sum_{k=0}^{\infty} k\, e^{-\lambda}\frac{\lambda^k}{k!} = e^{-\lambda}\sum_{k=1}^{\infty}\frac{\lambda^k}{(k-1)!} = \lambda e^{-\lambda}\sum_{k=0}^{\infty}\frac{\lambda^k}{k!} = \lambda e^{-\lambda}\times e^{\lambda} = \lambda$$

(Recall the Taylor series $e^x = \sum_{k=0}^{\infty}\frac{x^k}{k!}$.)

It turns out that $\mathrm{Var}(X) = \lambda$ [prove it at home; it is easier to calculate $E(X(X-1))$ first].

Quick question: what is $E(X^2)$? Since $E(X^2) = \mathrm{Var}(X) + (E(X))^2$, it equals $\lambda + \lambda^2$.
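These three facts can be confirmed with `scipy.stats` (a small sketch; $\lambda = 3.5$ is an arbitrary choice):

```python
from scipy.stats import poisson

# Check E(X) = lambda, Var(X) = lambda, and E(X^2) = lambda + lambda^2.
lam = 3.5
X = poisson(lam)
print(X.mean(), X.var())          # 3.5 3.5
print(X.moment(2), lam + lam**2)  # 15.75 15.75
```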

Poisson approximation for the Binomial distribution

Idea: if events are rare, they usually follow a Poisson law.

Fact: Let $X$ be $\mathrm{Bin}(n,p)$ and assume $p$ is proportional to $1/n$: $p = \lambda/n$.

Then the PMF of $\mathrm{Binom}(n,p)$ is almost the same as that of $\mathrm{Poi}(\lambda)$. Specifically, we mean this:

$$\lim_{n\to\infty} C_n^k \left(\frac{\lambda}{n}\right)^k \left(1-\frac{\lambda}{n}\right)^{n-k} = e^{-\lambda}\frac{\lambda^k}{k!}$$

If $p$ is small (of order $1/n$), then $\#\mathrm{success} \sim \mathrm{Bin}(n,p) \approx \mathrm{Poi}(\lambda)$, with $\lambda = E(\#\mathrm{success}) = np$.
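A numerical view of this limit (my own sketch with `scipy.stats`; $\lambda = 2$ and $k = 3$ are arbitrary):

```python
from scipy.stats import binom, poisson

# The Bin(n, lambda/n) PMF at k approaches the Poi(lambda) PMF as n grows.
lam, k = 2.0, 3
for n in (10, 100, 1_000, 10_000):
    print(n, binom(n, lam / n).pmf(k))
print("limit:", poisson(lam).pmf(k))  # e^{-2} * 2^3 / 3!
```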

Because of this, the Poisson distribution is a good model for the number of arrivals (of some phenomenon) in a fixed interval of time.

This interpretation, with successive units of time (e.g. minutes) in an interval of time, also explains the next property:

Fact: let $N$ and $M$ be two independent Poisson r.v.'s with parameters $\lambda$ and $\mu$; then $X = N + M$ is Poisson too, with parameter $\lambda + \mu$.

This is consistent, because $E(X) = E(N) + E(M) = \lambda + \mu$.

We can use the Binomial approximation picture above (arrivals in many small time slots) to see why $X$ is Poisson.

Exercise

Try to prove that $X$ is Poisson using only the PMF.
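A numerical companion to the exercise (a sketch, not the proof itself): the convolution of the two PMFs should match the $\mathrm{Poi}(\lambda+\mu)$ PMF.

```python
import numpy as np
from scipy.stats import poisson

# P(N + M = k) = sum_j P(N = j) * P(M = k - j): a discrete convolution.
lam, mu = 1.5, 2.5
k = np.arange(60)  # truncate the support; the tail mass is negligible

conv = np.convolve(poisson(lam).pmf(k), poisson(mu).pmf(k))[:60]
direct = poisson(lam + mu).pmf(k)
print(np.max(np.abs(conv - direct)))  # ~ 1e-16, i.e. they agree
```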


