

STT 861 Theory of Prob and Stat I Lecture Note - 13

2017-11-29

Recap of the linear predictor; almost sure convergence, convergence in probability, convergence in distribution; central limit theorem, theorem of De Moivre-Laplace.

Portal to all the other notes

Lecture 13 - Nov 29 2017

For Video Recording

Linear Prediction

Recall r.v.'s X and Y , and let

g(x)=E[Y|X=x]

The r.v. g(X) is the best predictor of Y given X in the least-squares sense: g minimizes E((g(X)-Y)^2) = MSE.

But what about making this MSE as small as possible when g is restricted to be linear? We use the notation h(x)=a+bx (instead of g ).

We want to minimize

MSE=E\left((Y-(a+bX))^2\right)

Find a and b to make this as small as possible. Let

Z_Y=\frac{Y-\mu_Y}{\sigma_Y}, \quad Z_X=\frac{X-\mu_X}{\sigma_X}

We also know,

E(Z_XZ_Y)=Corr(X,Y)=\rho, \qquad Y-(a+bX)=(Z_Y-cZ_X)\sigma_Y+(d-a)

where c=\frac{b\sigma_X}{\sigma_Y} , and d=\mu_Y-b\mu_X ,

MSE=\sigma_Y^2E\left((Z_Y-cZ_X)^2\right)+(d-a)^2=\sigma_Y^2(1-2c\rho+c^2)+(d-a)^2=\sigma_Y^2\left(1-\rho^2+(c-\rho)^2\right)+(d-a)^2

We see immediately that this is minimal for a=d and c=ρ.

Therefore,

b=\rho\frac{\sigma_Y}{\sigma_X}

This answers the question of what the best linear predictor of Y given X is in the mean-square sense.

We see the smallest MSE is therefore \sigma_Y^2(1-\rho^2) .

Therefore, we see that the proportion of Y’s variance which is not explained by X is

\frac{MSE}{\sigma_Y^2}=1-\rho^2

Finally, the proportion of Y’s variance which is explained by X is \rho^2 .
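As a quick numerical check (my addition, not part of the lecture), here is a minimal Python sketch: it simulates a correlated pair (X, Y) under an arbitrary linear-plus-noise model, builds the best linear predictor with b=\rho\sigma_Y/\sigma_X and a=\mu_Y-b\mu_X , and compares its MSE against a couple of other slopes and against \sigma_Y^2(1-\rho^2) . The variable names and simulation setup are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a correlated pair: Y = 2 + 3X + noise (an arbitrary illustrative model)
n = 100_000
x = rng.normal(loc=1.0, scale=2.0, size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=4.0, size=n)

mu_x, mu_y = x.mean(), y.mean()
sigma_x, sigma_y = x.std(), y.std()
rho = np.corrcoef(x, y)[0, 1]

# Best linear predictor: b = rho * sigma_Y / sigma_X, a = mu_Y - b * mu_X
b = rho * sigma_y / sigma_x
a = mu_y - b * mu_x
mse_best = np.mean((y - (a + b * x)) ** 2)

print(f"rho = {rho:.3f}, a = {a:.3f}, b = {b:.3f}")
print(f"MSE of best linear predictor: {mse_best:.3f}")
print(f"sigma_Y^2 (1 - rho^2)       : {sigma_y**2 * (1 - rho**2):.3f}")

# Any other slope (with the matching intercept) should do worse
for b_alt in (0.8 * b, 1.2 * b):
    a_alt = mu_y - b_alt * mu_x
    mse_alt = np.mean((y - (a_alt + b_alt * x)) ** 2)
    print(f"slope {b_alt:.3f}: MSE = {mse_alt:.3f}")
```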


Chapter 6 - Convergences

Definition: We say that the sequence of r.v.’s (X_n)_{n\in\mathbb{N}} converges “a.s.” (almost surely) to the r.v. X if

\lim_{n\rightarrow\infty}X_n=X

with probability 1. In other words,

P\left(\lim_{n\rightarrow\infty}|X_n-X|=0\right) = 1

Definition: (A weaker notion of convergence) A sequence of r.v.’s (X_n)_{n\in\mathbb{N}} converges in probability to X if

\forall \varepsilon >0: P(|X-X_n|>\varepsilon) \rightarrow 0 \quad \text{as } n\rightarrow\infty

Note [convince yourself as an exercise at home]: convergence in probability is weaker (easier to achieve) than convergence a.s.

Definition: (even weaker version) Let (X_n)_{n\in\mathbb{N}} be a sequence of r.v.’s as above, but now let F be the CDF of some distribution. We say X_n converges in distribution to the law F if

F_{X_n}(x) \rightarrow F(x)

as n\rightarrow\infty for every fixed x where F(x) is continuous.

Note: unlike the previous two notions, here there is no need for a limiting r.v. X , and the X_n ’s do not need to share a probability space with X or with one another.

Example 1

Let Y_i , i=1,2,3,… be i.i.d. with mean \mu and finite variance \sigma^2 . We proved that, with

X_n= \frac{Y_1 + Y_2 + \cdots + Y_n}{n} ,

then

P(|X_n-\mu|>\varepsilon) \leq \frac{\sigma^2}{\varepsilon^2n}

(By Chebyshev’s inequality)

The right-hand side goes to 0 as n\rightarrow \infty ; this proves that X_n\rightarrow\mu in probability.

Note: assuming only that \mu exists ( \sigma^2 could be infinite), the conclusion still holds. See W. Feller’s book (1950).
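To see the convergence in probability numerically, here is a small Monte Carlo sketch (my addition). It assumes, purely for illustration, that the Y_i are Exponential(1), so \mu=1 and \sigma^2=1 , and compares the empirical P(|X_n-\mu|>\varepsilon) with the Chebyshev bound \sigma^2/(\varepsilon^2 n) .

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2 = 1.0, 1.0      # Exponential(1) has mean 1 and variance 1
eps = 0.1
reps = 2_000               # Monte Carlo repetitions per sample size

for n in (100, 1_000, 10_000):
    # Each row is one realization of (Y_1, ..., Y_n); X_n is the row mean
    xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    prob = np.mean(np.abs(xbar - mu) > eps)
    bound = sigma2 / (eps**2 * n)
    print(f"n={n:>6}: P(|X_n - mu| > {eps}) ~ {prob:.4f}   Chebyshev bound {bound:.4f}")
```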

Let X_n=Uniform\{1,2,…,n\} . Its CDF is a step function that increases by 1/n at each integer. We find F_{X_n}(x) =\frac{1}{n}[x] for x\in [0,n] , where [x] is the floor of x , i.e. the largest integer not exceeding x .

For fixed x\in\mathbb{R^+} , as n\rightarrow\infty , F_{X_n}(x)\rightarrow 0(\star) .

Since the limit function in (\star) is not the CDF of any random variable, X_n does not converge in distribution, and therefore X_n cannot converge in any stronger sense (in probability or a.s.).

How about Y_n=Uniform \{1/n, 2/n, …, n/n\} ?

Since

F_{Y_n}(x)\rightarrow x

for x\in[0,1] , which is the CDF of Uniform(0,1), we conclude Y_n\rightarrow Uniform(0,1) in distribution.
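A tiny sketch (my addition) makes this concrete: F_{Y_n}(x)=\lfloor nx\rfloor/n can be evaluated directly and compared with the Uniform(0,1) CDF, which is just x on [0,1] .

```python
import numpy as np

def cdf_Yn(x, n):
    """CDF of the discrete uniform on {1/n, 2/n, ..., n/n}: floor(n x)/n, clipped to [0, 1]."""
    return np.clip(np.floor(n * np.asarray(x)) / n, 0.0, 1.0)

xs = np.array([0.1, 0.25, 0.5, 0.9])
for n in (5, 50, 500):
    print(f"n={n:>4}: F_Yn =", np.round(cdf_Yn(xs, n), 4), "  limit (Uniform(0,1) CDF):", xs)
```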

Example 2

Let U_n\sim Unif(0,1) , i.i.d. Let M_n =\max_{i=1,2,…,n}(U_i). We can tell that M_n\rightarrow1 in some sense. Let’s prove it in probability:

Let \varepsilon>0 , \begin{align*} P(\vert M_n-1\vert > \varepsilon) &= P(1-M_n>\varepsilon) \\ &=P(M_n<1-\varepsilon)\\ &=P(\max_{i=1,2,...,n}U_i<1-\varepsilon) \\ &=P(\forall i: U_i<1-\varepsilon)\\ &=P(\cap_{i=1}^{n}\{U_i<1-\varepsilon\}) \\ &=\prod_{i=1}^{n}P(U_i<1-\varepsilon) =(1-\varepsilon)^n\\ \lim P(\vert M_n-1 \vert >\varepsilon) &=0 \end{align*}

We proved that M_n\rightarrow1 in probability.

Now consider Y_n=n(1-M_n) . Let us look at the CDF of Y_n .

\begin{align*} 1-F_{Y_n}(y) & = P(n(1-M_n)>y) \\ &= P(1-M_n>\frac{y}{n}) \\ &=P(M_n<1-y/n) \\ &=P(\cap_{i=1}^{n} \{U_i<1-y/n\}) \\ &=(1-y/n)^n\\ &\rightarrow e^{-y} \end{align*}

So F_{Y_n}(y)\rightarrow 1-e^{-y} , which is the CDF of the exponential distribution. Thus Y_n \rightarrow Exp(\lambda = 1) in distribution.
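Both claims are easy to check by simulation. The following sketch (my addition; the sample size and number of repetitions are arbitrary) estimates P(|M_n-1|>\varepsilon) and compares a few empirical quantiles of Y_n=n(1-M_n) with the Exp(1) quantiles -\ln(1-q) .

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 1_000, 20_000

u = rng.uniform(size=(reps, n))
m = u.max(axis=1)            # one M_n per replication
y = n * (1.0 - m)            # Y_n = n (1 - M_n)

eps = 0.01
print(f"P(|M_n - 1| > {eps}) ~ {np.mean(np.abs(m - 1) > eps):.2e}"
      f"   exact (1 - eps)^n = {(1 - eps) ** n:.2e}")

# Quantiles of Y_n versus the Exp(1) quantile -ln(1 - q)
for q in (0.25, 0.5, 0.9):
    print(f"q = {q}: empirical {np.quantile(y, q):.3f}   Exp(1) {-np.log(1 - q):.3f}")
```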


Theorem (6.3.6): Let (X_n)_{n\in\mathbb{N}} be a sequence of r.v.’s. If X_n has MGF M_{X_n}(t) and M_{X_n}(t)\rightarrow M_X(t) for all t in a neighborhood of 0, then X_n \rightarrow X in distribution.

Example 3

Let X_n be Bin( n,p=\lambda/n ). We know that the PMF of X_n converges to the PMF of Poisson( \lambda ). Let us verify it again here with MGFs. We will find

M_{X_n}(t)=\left(1-p+pe^t\right)^n=\left(1+\frac{\lambda(e^t-1)}{n}\right)^n \rightarrow e^{\lambda(e^t-1)}

and we recognize that the limit is the MGF of Poisson( \lambda ). By Theorem 6.3.6, X_n\rightarrow Poisson(\lambda) in distribution.
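A quick numeric check of the same limit (my addition, using scipy, which the lecture does not rely on): compare the Bin( n,\lambda/n ) PMF with the Poisson( \lambda ) PMF for a fixed \lambda as n grows.

```python
import numpy as np
from scipy import stats

lam = 3.0
ks = np.arange(0, 10)

for n in (10, 100, 10_000):
    binom_pmf = stats.binom.pmf(ks, n, lam / n)
    poisson_pmf = stats.poisson.pmf(ks, lam)
    gap = np.max(np.abs(binom_pmf - poisson_pmf))
    print(f"n={n:>6}: max |Bin(n, lam/n) - Poisson(lam)| over k=0..9 = {gap:.5f}")
```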

Let X_i, i=1,2,…,n be i.i.d with E(X_i)=\mu and variance Var(X_i)=\sigma^2 .

Let Z_i =\frac{X_i-\mu}{\sigma} , S_n=\frac{\sum_{i=1}^{n}Z_i}{\sqrt{n}} ,

(We divide by \sqrt{n} as a standardization, and we know Var(S_n)=1 .)

Central Limit Theorem (CLT)

As n\rightarrow\infty , S_n\rightarrow Normal(0,1) in distribution.

This means:

\lim P(S_n\leq x) = F_{N(0,1)}(x) = \int_{-\infty}^{x}\frac{1}{\sqrt{2\pi}}\exp(-z^2/2) dz

The proof combines Taylor’s formula up to order 2 (applied to the MGF) with Theorem 6.3.6.
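Here is a small simulation sketch of the CLT (my addition). It uses Exponential(1) summands purely as an arbitrary non-normal example ( \mu=\sigma=1 ) and checks that P(S_n\leq 1) approaches \Phi(1)\approx 0.8413 .

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 1.0, 1.0          # Exponential(1): mean 1, standard deviation 1
reps = 100_000

for n in (2, 10, 100):
    x = rng.exponential(scale=1.0, size=(reps, n))
    s = ((x - mu) / sigma).sum(axis=1) / np.sqrt(n)   # S_n for each replication
    print(f"n={n:>4}: P(S_n <= 1) ~ {np.mean(s <= 1.0):.4f}   (N(0,1) gives 0.8413)")
```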


Next, consider

Y_n = \frac{(\sum_{i=1}^{n}X_i)-n\mu}{\sqrt{n}}=\sigma S_n

A corollary of the CLT says this:

Y_n\rightarrow N(0,\sigma^2)

in distribution.

Proof:

\begin{align*} P(Y_n\leq x) & = P(\sigma S_n\leq x) \\ &=P(S_n\leq\frac{x}{\sigma}) \\ &=\int_{-\infty}^{x/\sigma}\frac{1}{\sqrt{2\pi}}\exp(-z^2/2) dz \end{align*}

and we conclude by a change of variable.


Theorem of DeMoivre-Laplace

Let X_i, i=1,2,3,\ldots,n be i.i.d. Bernoulli( p ). Let W_n = X_1+X_2+\cdots+ X_n . We know W_n\sim Bin(n,p) .

Let S_n =\frac{W_n-np}{\sqrt{np(1-p)}},

Therefore, by CLT, S_n\rightarrow N(0,1) as n\rightarrow\infty in distribution.

In other words, to be precise,

\lim P(S_n\leq x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}\exp(-z^2/2)dz

where

P\left(\frac{W_n-np}{\sqrt{np(1-p)}}\leq x\right) =P(W_n\leq x\sqrt{np(1-p)}+np)

The speed of convergence in the CLT is quantified by the Berry-Esseen theorem. The convergence in the binomial CLT is much faster; a rule of thumb is that for p\in [1/10, 9/10] and n\geq 30 , the CLT approximation is good to within a 1\% error.
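To illustrate the quality of the approximation inside that rule-of-thumb range, here is a minimal sketch (my addition, using scipy; the +0.5 continuity correction is a standard refinement not discussed in the lecture).

```python
import numpy as np
from scipy import stats

n, p = 30, 0.3                     # inside the rule-of-thumb range
k = 12                             # evaluate P(W_n <= 12)

exact = stats.binom.cdf(k, n, p)
# De Moivre-Laplace normal approximation, with a +0.5 continuity correction
approx = stats.norm.cdf((k + 0.5 - n * p) / np.sqrt(n * p * (1 - p)))

print(f"exact  P(W_n <= {k}) = {exact:.4f}")
print(f"normal approximation = {approx:.4f}")
```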


Example 4

Let

M_n =\frac{U_1+U_2+\cdots+U_n}{n}

where the U_i ’s are i.i.d. Uniform(0, 1). We know, by the weak law of large numbers, that M_n\rightarrow \frac{1}{2} in probability as n\rightarrow\infty . But how spread out is M_n around 1/2? For example, can we estimate the chance that M_n is more than 0.02 away from its mean value 1/2?

Answer:

\begin{align*} P(|M_n-\frac{1}{2}|>0.02) & = 1- P(-0.02 < M_n-\frac{1}{2} <0.02) \\ &= 1-P(-0.02<\frac{\sum (U_i-1/2)}{n}<0.02) \\ &=1- P(-0.02\sqrt{n}<\frac{\sum (U_i-1/2)}{\sqrt{n}}<0.02\sqrt{n}) \\ &=1- P\left(-\frac{0.02\sqrt{n}}{\sqrt{1/12}}<\frac{\sum (U_i-1/2)}{\sqrt{n}\sqrt{1/12}}<\frac{0.02\sqrt{n}}{\sqrt{1/12}}\right) \\ &\approx 1-P\left(-\frac{0.02\sqrt{n}}{\sqrt{1/12}}<N(0,1)<\frac{0.02\sqrt{n}}{\sqrt{1/12}}\right) \end{align*}

(Here \sqrt{1/12} is the standard deviation of U_i , so the middle term is standardized and the CLT applies.)

We will feel comfortable if n is large enough to make this probability at most 0.05, i.e. to make the two-sided normal probability at least 0.95. What value should n at least be?

For this, it is known that the right-hand endpoint \frac{0.02\sqrt{n}}{\sqrt{1/12}} should be at least 1.96.

This comes from the value

0.975 = P(N(0,1)\leq 1.96)

So 1.96 is known as the 97.5^{th} percentile of N(0,1) . Therefore, we must take

\frac{0.02\sqrt{n}}{\sqrt{1/12}}\geq 1.96 \Rightarrow n\geq 800.33 , i.e. n\geq 801 .
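A quick sanity check of that arithmetic, plus a Monte Carlo confirmation at n = 801 (my addition; the number of repetitions is arbitrary):

```python
import numpy as np

# Solve 0.02 * sqrt(n) / sqrt(1/12) >= 1.96 for n
n_min = (1.96 * np.sqrt(1 / 12) / 0.02) ** 2
print(f"n must be at least {n_min:.2f}")        # ~800.33, so take n >= 801

# Monte Carlo check at n = 801
rng = np.random.default_rng(4)
n, reps = 801, 20_000
m = rng.uniform(size=(reps, n)).mean(axis=1)    # M_n for each replication
print(f"P(|M_n - 1/2| > 0.02) ~ {np.mean(np.abs(m - 0.5) > 0.02):.4f}   (target <= 0.05)")
```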
