LanternD's Castle

STT 861 Theory of Prob and STT I Lecture Note - 13

2017-11-29

Recap of the linear predictor; almost sure convergence, convergence in probability, convergence in distribution; central limit theorem; theorem of DeMoivre-Laplace.

Portal to all the other notes

Lecture 13 - Nov 29 2017


Linear Prediction

Recall that for r.v.'s $X$ and $Y$, we let

$$g(x) = E[Y \mid X = x]$$

The r.v. $g(X)$ is the best predictor of $Y$ given $X$ in the least-squares sense: $g$ minimizes $\mathrm{MSE} = E\big((g(X) - Y)^2\big)$.

But what about making this MSE as small as possible when $g$ is restricted to be linear? In that case use the notation $h(x) = a + bx$ (instead of $g$).

We want to minimize

$$\mathrm{MSE} = E\big((Y - (a + bX))^2\big)$$

Find a and b to make this as small as possible. Let

$$Z_Y = \frac{Y - \mu_Y}{\sigma_Y}, \qquad Z_X = \frac{X - \mu_X}{\sigma_X}$$

We also know,

$$E(Z_X Z_Y) = \mathrm{Corr}(X, Y) = \rho$$

$$Y - (a + bX) = (Z_Y - cZ_X)\sigma_Y + (d - a)$$

where $c = b\,\sigma_X/\sigma_Y$ and $d = \mu_Y - b\mu_X$. Then

$$\begin{aligned}
\mathrm{MSE} &= \sigma_Y^2\, E\big((Z_Y - cZ_X)^2\big) + (d - a)^2 \\
&= \sigma_Y^2 \big(1 - 2c\rho + c^2\big) + (d - a)^2 \\
&= \sigma_Y^2 \big(1 - \rho^2 + (c - \rho)^2\big) + (d - a)^2
\end{aligned}$$

We see immediately that this is minimal for $a = d$ and $c = \rho$.

Therefore,

$$b = \rho\,\frac{\sigma_Y}{\sigma_X}$$

This answers the question of what the best linear predictor of $Y$ given $X$ is in the mean-square sense.

We see the smallest MSE is therefore $\sigma_Y^2(1 - \rho^2)$.

Therefore, we see that the proportion of Y’s variance which is not explained by X is

$$\frac{\mathrm{MSE}}{\sigma_Y^2} = 1 - \rho^2$$

Finally, the proportion of Y’s variance which is explained by X is ρ2.
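These formulas are easy to sanity-check numerically. Below is a minimal sketch (the setup $Y = 2X + \text{noise}$ is an illustrative choice of mine): it computes $b = \rho\,\sigma_Y/\sigma_X$ and $a = \mu_Y - b\mu_X$ from a sample and compares the achieved MSE with $\sigma_Y^2(1 - \rho^2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Correlated pair (illustrative choice): Y = 2X + independent noise
X = rng.normal(1.0, 2.0, n)
Y = 2.0 * X + rng.normal(0.0, 3.0, n)

mu_x, mu_y = X.mean(), Y.mean()
s_x, s_y = X.std(), Y.std()
rho = np.corrcoef(X, Y)[0, 1]

# Best linear predictor h(x) = a + b*x with b = rho*sigma_Y/sigma_X, a = mu_Y - b*mu_X
b = rho * s_y / s_x
a = mu_y - b * mu_x

mse = np.mean((Y - (a + b * X)) ** 2)   # achieved MSE
mse_theory = s_y**2 * (1 - rho**2)      # sigma_Y^2 * (1 - rho^2)
```

Perturbing the slope $b$ in either direction makes the empirical MSE strictly worse, matching the minimization argument above.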


Chapter 6 - Convergences

Definition: We say that the sequence of r.v.'s $(X_n)_{n \in \mathbb{N}}$ converges almost surely ("a.s.") to the r.v. $X$ if

$$\lim_{n \to \infty} X_n = X$$

with probability 1. In other words,

$$P\Big(\lim_{n \to \infty} |X_n - X| = 0\Big) = 1$$

Definition: (A weaker notion of convergence) A sequence of r.v.'s $(X_n)_{n \in \mathbb{N}}$ converges in probability to $X$ if

$$\forall \varepsilon > 0: \quad P(|X - X_n| > \varepsilon) \to 0 \quad \text{as } n \to \infty$$

Note: [Convince yourself as an exercise at home] Convergence in probability is easier to achieve than convergence a.s.; that is, a.s. convergence implies convergence in probability, but not conversely.

Definition: (an even weaker notion) Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of r.v.'s as above, but now let $F$ be the CDF of some distribution. We say $X_n$ converges in distribution to the law $F$ if

$$F_{X_n}(x) \to F(x)$$

as $n \to \infty$ for every fixed $x$ where $F$ is continuous.

Note: unlike the previous two notions, here there is no need for a limiting r.v. $X$, and the $X_n$'s do not need to share a probability space with $X$ or with each other.

Example 1

Let $Y_i$, $i = 1, 2, 3, \ldots$ be i.i.d. with mean $\mu$ and finite variance $\sigma^2$. We proved that, with

$$X_n = \frac{Y_1 + Y_2 + \cdots + Y_n}{n}$$

then

$$P(|X_n - \mu| > \varepsilon) \le \frac{\sigma^2}{\varepsilon^2\, n}$$

(By Chebyshev’s inequality)

The whole bound goes to 0 as $n \to \infty$; this proves that $X_n \to \mu$ in probability.

Note: assuming only that $\mu$ exists ($\sigma^2$ may be infinite), the conclusion still holds. See W. Feller's book (1950).
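As a sketch of this convergence (with Exp(1) summands, my own choice, which have $\mu = \sigma^2 = 1$), we can compare the empirical probability $P(|X_n - \mu| > \varepsilon)$ with the Chebyshev bound $\sigma^2 / (\varepsilon^2 n)$ for growing $n$:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2 = 1.0, 1.0          # Exp(1) has mean 1 and variance 1
eps, trials = 0.1, 4_000

results = {}
for n in (100, 400, 1600):
    # sample mean X_n, repeated over many independent experiments
    Xn = rng.exponential(1.0, (trials, n)).mean(axis=1)
    p_emp = np.mean(np.abs(Xn - mu) > eps)
    bound = sigma2 / (eps**2 * n)     # Chebyshev bound, -> 0 as n grows
    results[n] = (p_emp, bound)
```

The empirical exceedance probability sits below the Chebyshev bound at every $n$ and shrinks toward 0, as the weak law requires.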

Let $X_n = \mathrm{Uniform}\{1, 2, \ldots, n\}$. Its CDF is a step function with an increment of $1/n$ at each step: $F_{X_n}(x) = \frac{\lfloor x \rfloor}{n}$ for $x \in [0, n]$, where $\lfloor x \rfloor$ is the floor function (the largest integer $\le x$).

For fixed $x \in \mathbb{R}^+$, as $n \to \infty$, $F_{X_n}(x) \to 0$.

Since the function that is identically $0$ is not the CDF of any random variable, this proves that $X_n$ does not converge in distribution, and therefore $X_n$ cannot converge in any stronger sense (in probability or a.s.).

How about $Y_n = \mathrm{Uniform}\{1/n, 2/n, \ldots, n/n\}$?

Since

$$F_{Y_n}(x) \to x$$

for $x \in [0, 1]$, and this is the CDF of $\mathrm{Uniform}(0, 1)$, we get $Y_n \to \mathrm{Uniform}(0, 1)$ in distribution.
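Both CDFs here can be written down exactly, so the pointwise convergence is directly checkable. A small sketch (function name is mine): $F_{Y_n}(x) = \lfloor nx \rfloor / n$ on $[0, 1]$, whose distance to $x$ is at most $1/n$:

```python
import numpy as np

def cdf_Yn(x, n):
    # Y_n uniform on {1/n, ..., n/n}: F_{Y_n}(x) = floor(n*x)/n for x in [0, 1]
    return np.clip(np.floor(n * x) / n, 0.0, 1.0)

rng = np.random.default_rng(2)
xs = rng.random(1_000)                 # test points in (0, 1)

# sup-distance to the Uniform(0,1) CDF F(x) = x shrinks like 1/n
sup_err = {n: np.max(np.abs(cdf_Yn(xs, n) - xs)) for n in (10, 100, 1000)}
```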

Example 2

Let $U_i \sim \mathrm{Unif}(0, 1)$ be i.i.d. Let $M_n = \max_{i=1,\ldots,n} U_i$. We can tell that $M_n \to 1$ in some sense. Let's prove it in probability:

Let $\varepsilon > 0$. Then

$$\begin{aligned}
P(|M_n - 1| > \varepsilon) &= P(1 - M_n > \varepsilon) = P(M_n < 1 - \varepsilon) \\
&= P\Big(\max_{i=1,\ldots,n} U_i < 1 - \varepsilon\Big) = P(\forall i: U_i < 1 - \varepsilon) \\
&= P\Big(\bigcap_{i=1}^n \{U_i < 1 - \varepsilon\}\Big) = \prod_{i=1}^n P(U_i < 1 - \varepsilon) = (1 - \varepsilon)^n
\end{aligned}$$

Hence $\lim_{n \to \infty} P(|M_n - 1| > \varepsilon) = 0$.

We proved that Mn1 in probability.

Now consider $Y_n = n(1 - M_n)$. Let's look at the CDF of $Y_n$ (for $0 \le y \le n$):

$$1 - F_{Y_n}(y) = P\big(n(1 - M_n) > y\big) = P(1 - M_n > y/n) = P(M_n < 1 - y/n) = P\Big(\bigcap_{i=1}^n \{U_i < 1 - y/n\}\Big) = (1 - y/n)^n \to e^{-y}$$

So $F_{Y_n}(y) \to 1 - e^{-y}$, which is the exponential CDF. Thus $Y_n \to \mathrm{Exp}(\lambda = 1)$ in distribution.
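A quick simulation (the parameters are my own choices) agrees on both counts: $M_n \to 1$ and $n(1 - M_n) \to \mathrm{Exp}(1)$:

```python
import numpy as np

rng = np.random.default_rng(3)
trials, n = 10_000, 500

Mn = rng.random((trials, n)).max(axis=1)   # max of n Uniform(0,1) draws

# P(M_n < 1 - eps) = (1 - eps)^n, already tiny for n = 500
eps = 0.01
p_emp = np.mean(Mn < 1 - eps)
p_exact = (1 - eps) ** n

# Y_n = n(1 - M_n) should be close to Exp(1): compare CDFs at a few points
Yn = n * (1 - Mn)
cdf_gap = max(abs(np.mean(Yn <= y) - (1 - np.exp(-y))) for y in (0.5, 1.0, 2.0))
```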


Theorem (6.3.6): Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of r.v.'s. If $X_n$ has MGF $M_{X_n}(t)$ and $M_{X_n}(t) \to M_X(t)$ for all $t$ in a neighborhood of $0$, then $X_n \to X$ in distribution.

Example 3

Let $X_n \sim \mathrm{Bin}(n, p = \lambda/n)$. We know that the PMF of $X_n$ converges to the PMF of $\mathrm{Poisson}(\lambda)$. Let's try it again here with MGFs. We will find

$$M_{X_n}(t) = \Big(1 + \frac{\lambda(e^t - 1)}{n}\Big)^n \to e^{\lambda(e^t - 1)}$$

and we recognize the limit as the MGF of $\mathrm{Poisson}(\lambda)$. By Theorem 6.3.6, $X_n \to \mathrm{Poisson}(\lambda)$ in distribution.
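Both the PMF and the MGF claims can be checked numerically; a small sketch with $\lambda = 3$ (my choice of parameters):

```python
from math import comb, exp, factorial

lam, n = 3.0, 10_000
p = lam / n

def binom_pmf(k):
    # Bin(n, p) PMF
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k):
    # Poisson(lam) PMF
    return exp(-lam) * lam**k / factorial(k)

# PMFs of Bin(n, lam/n) and Poisson(lam) are already very close for n = 10000
pmf_gap = max(abs(binom_pmf(k) - poisson_pmf(k)) for k in range(20))

# MGFs: (1 - p + p e^t)^n = (1 + lam(e^t - 1)/n)^n vs exp(lam(e^t - 1))
t = 0.5
mgf_bin = (1 - p + p * exp(t)) ** n
mgf_poisson = exp(lam * (exp(t) - 1))
```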

Let $X_i$, $i = 1, 2, \ldots, n$ be i.i.d. with $E(X_i) = \mu$ and variance $\mathrm{Var}(X_i) = \sigma^2$.

Let $Z_i = \frac{X_i - \mu}{\sigma}$ and $S_n = \frac{1}{\sqrt{n}} \sum_{i=1}^n Z_i$.

(We divide by $\sqrt{n}$ as a standardization, so that $\mathrm{Var}(S_n) = 1$.)

Central Limit Theorem (CLT)

As $n \to \infty$, $S_n \to \mathrm{Normal}(0, 1)$ in distribution.

This means:

$$\lim_{n \to \infty} P(S_n \le x) = F_{N(0,1)}(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz$$

The proof just combines the Taylor formula up to order 2 and Theorem 6.3.6.
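A simulation sketch of the CLT with deliberately skewed summands ($X_i \sim \mathrm{Exp}(1)$, so $\mu = \sigma = 1$; the choice of distribution and sizes is mine):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(4)
trials, n = 20_000, 500

# S_n = sum(Z_i)/sqrt(n) with Z_i = (X_i - mu)/sigma
X = rng.exponential(1.0, (trials, n))
Sn = (X - 1.0).sum(axis=1) / np.sqrt(n)

def Phi(x):
    # N(0,1) CDF via the error function
    return 0.5 * (1 + erf(x / sqrt(2)))

# Empirical CDF of S_n vs the standard normal CDF at a few points
cdf_gap = max(abs(np.mean(Sn <= x) - Phi(x)) for x in (-1.0, 0.0, 1.0, 1.96))
```

Even though each $X_i$ is heavily skewed, the distribution of $S_n$ is already close to $N(0, 1)$ at $n = 500$.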


Next, consider

$$Y_n = \frac{\sum_{i=1}^n X_i - n\mu}{\sqrt{n}} = \sigma S_n$$

A corollary of the CLT says this:

$$Y_n \to N(0, \sigma^2)$$

in distribution.

Proof:

$$P(Y_n \le x) = P(\sigma S_n \le x) = P\Big(S_n \le \frac{x}{\sigma}\Big) \to \int_{-\infty}^{x/\sigma} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz$$

and conclude by a change of variable.


Theorem of DeMoivre-Laplace

Let $X_i$, $i = 1, 2, 3, \ldots, n$ be i.i.d. Bernoulli($p$). Let $W_n = X_1 + X_2 + \cdots + X_n$. We know $W_n \sim \mathrm{Bin}(n, p)$.

Let $S_n = \frac{W_n - np}{\sqrt{np(1-p)}}$.

Therefore, by the CLT, $S_n \to N(0, 1)$ in distribution as $n \to \infty$.

In other words, to be precise,

$$\lim_{n \to \infty} P(S_n \le x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-z^2/2}\, dz$$

$$P\left(\frac{W_n - np}{\sqrt{np(1-p)}} \le x\right) = P\big(W_n \le x\sqrt{np(1-p)} + np\big)$$

The speed of convergence in the CLT is given by the Berry-Esseen theorem. The convergence in the binomial CLT is much faster: a rule of thumb is that for $p \in [1/10, 9/10]$ and $n \ge 30$, the CLT approximation is good to within about 1% error.
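The rule of thumb can be checked exactly, since the binomial CDF is computable. A sketch at $n = 100$, $p = 0.3$ (my choice of parameters), using the standard continuity correction $k \mapsto k + \tfrac{1}{2}$:

```python
from math import comb, erf, sqrt

def binom_cdf(k, n, p):
    # exact P(W_n <= k) for W_n ~ Bin(n, p)
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def Phi(x):
    # N(0,1) CDF via the error function
    return 0.5 * (1 + erf(x / sqrt(2)))

n, p = 100, 0.3
sd = sqrt(n * p * (1 - p))

# Worst-case error of the (continuity-corrected) normal approximation over all k
worst = max(
    abs(binom_cdf(k, n, p) - Phi((k + 0.5 - n * p) / sd)) for k in range(n + 1)
)
```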


Example 4

Let

$$M_n = \frac{U_1 + U_2 + \cdots + U_n}{n}$$

where the $U_i$'s are i.i.d. $\mathrm{Uniform}(0, 1)$. We know by the weak law of large numbers that $M_n \to \frac{1}{2}$ in probability as $n \to \infty$. But how spread out is $M_n$ around $1/2$? For example, can we estimate the chance that $M_n$ is more than $0.02$ away from its mean value $1/2$?

Answer:

Recalling that $\mathrm{Var}(U_i) = 1/12$,

$$\begin{aligned}
P\Big(\Big|M_n - \frac{1}{2}\Big| > 0.02\Big) &= 1 - P\Big({-0.02} < M_n - \frac{1}{2} < 0.02\Big) \\
&= 1 - P\Big({-0.02\,n} < \sum_{i=1}^n \big(U_i - \tfrac{1}{2}\big) < 0.02\,n\Big) \\
&= 1 - P\left(-\frac{0.02\sqrt{n}}{\sqrt{1/12}} < \frac{\sum_{i=1}^n (U_i - \frac{1}{2})}{\sqrt{n}\sqrt{1/12}} < \frac{0.02\sqrt{n}}{\sqrt{1/12}}\right) \\
&\approx 1 - P\left(-\frac{0.02\sqrt{n}}{\sqrt{1/12}} < N(0, 1) < \frac{0.02\sqrt{n}}{\sqrt{1/12}}\right)
\end{aligned}$$

We will feel comfortable if $n$ is large enough to make $P(|M_n - \frac{1}{2}| \le 0.02)$ greater than $0.95$. What value should $n$ at least be?

For this, the right-hand value $\frac{0.02\sqrt{n}}{\sqrt{1/12}}$ should be at least $1.96$.

This value comes from

$$0.975 = P(N(0, 1) \le 1.96)$$

so $1.96$ is known as the 97.5th percentile of $N(0, 1)$. Therefore, we must take

$$\frac{0.02\sqrt{n}}{\sqrt{1/12}} \ge 1.96 \iff n \ge 800.33\ldots$$

so we should take $n \ge 801$.
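As a closing check (the code and its parameter choices are mine), the normal approximation gives coverage $2\Phi\big(0.02\sqrt{n}/\sqrt{1/12}\big) - 1$, which first exceeds $0.95$ at $n = 801$, and a Monte Carlo run at $n = 801$ agrees:

```python
import numpy as np
from math import erf, sqrt

def Phi(x):
    # N(0,1) CDF via the error function
    return 0.5 * (1 + erf(x / sqrt(2)))

def coverage(n):
    # P(|M_n - 1/2| <= 0.02) ~ P(|N(0,1)| <= 0.02*sqrt(n)/sqrt(1/12))
    return 2 * Phi(0.02 * sqrt(n) / sqrt(1 / 12)) - 1

cov_800, cov_801 = coverage(800), coverage(801)   # the 0.95 threshold sits between

# Monte Carlo at n = 801: sample means of 801 uniforms
rng = np.random.default_rng(5)
Mn = rng.random((10_000, 801)).mean(axis=1)
emp = np.mean(np.abs(Mn - 0.5) <= 0.02)
```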
