# STT 861 Theory of Prob and STT I Lecture Note - 13

2017-11-29

Recap of the linear predictor; almost sure convergence, convergence in probability, convergence in distribution; the central limit theorem; the theorem of DeMoivre-Laplace.

# Portal to all the other notes

- Lecture 01 - 2017.09.06
- Lecture 02 - 2017.09.13
- Lecture 03 - 2017.09.20
- Lecture 04 - 2017.09.27
- Lecture 05 - 2017.10.04
- Lecture 06 - 2017.10.11
- Lecture 07 - 2017.10.18
- Lecture 08 - 2017.10.25
- Lecture 09 - 2017.11.01
- Lecture 10 - 2017.11.08
- Lecture 11 - 2017.11.15
- Lecture 12 - 2017.11.20
- Lecture 13 - 2017.11.29 -> This post
- Lecture 14 - 2017.12.06

# Lecture 13 - Nov 29 2017

## For Video Recording

### Linear Prediction

Recall that for r.v.'s $ X, Y $ we let

$$ g(x) = E(Y\mid X=x). $$

The r.v. $ g(X) $ is the best predictor of $ Y $ given $ X $ in the least-squares sense: $ g $ minimizes the mean squared error $ E\left[(g(X)-Y)^2\right] $ (MSE) over all functions of $ X $.

But what if we only allow linear predictors? We use the notation $ h(x)=a+bx $ (instead of $ g $) and try to make the MSE as small as possible among such $ h $.

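We want to minimize

$$ E\left[(Y-h(X))^2\right] = E\left[(Y-a-bX)^2\right]. $$

Find $ a $ and $ b $ to make this as small as possible. Let

$$ \tilde X=\frac{X-\mu_X}{\sigma_X}, \qquad \tilde Y=\frac{Y-\mu_Y}{\sigma_Y}, $$

so that $ \tilde X $ and $ \tilde Y $ have mean 0, variance 1, and $ E(\tilde X\tilde Y)=\rho $, the correlation coefficient of $ X $ and $ Y $. We also know

$$ Y-a-bX = \sigma_Y\left(\tilde Y - c\tilde X\right) + (d-a), $$

where $ c=b \frac{\sigma_X}{\sigma_Y} $ and $ d=\mu_Y-b\mu_X $. Since $ \tilde Y - c\tilde X $ has mean 0, the cross term vanishes and

$$ E\left[(Y-a-bX)^2\right] = \sigma_Y^2\left(1-2c\rho+c^2\right) + (d-a)^2 = \sigma_Y^2\left[(c-\rho)^2+1-\rho^2\right] + (d-a)^2. $$

We see immediately that this is minimal for $ a=d $ and $ c=\rho $.

Therefore,

$$ b=\rho\frac{\sigma_Y}{\sigma_X}, \qquad a=\mu_Y-\rho\frac{\sigma_Y}{\sigma_X}\mu_X, \qquad h(x)=\mu_Y+\rho\frac{\sigma_Y}{\sigma_X}(x-\mu_X). $$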

This answers the question of what the best linear predictor of $ Y $ given $ X $ is in the mean-square sense.

We see the smallest MSE is therefore $ \sigma_Y^2(1-\rho^2) $.

Therefore, the proportion of $ Y $'s variance which is not explained by $ X $ is

$$ \frac{\sigma_Y^2(1-\rho^2)}{\sigma_Y^2} = 1-\rho^2. $$

Finally, the proportion of $ Y $’s variance which is explained by $ X $ is $ \rho^2 $.
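As a numerical sanity check of these formulas, here is a short Python/NumPy sketch. It assumes a made-up joint law for $ (X, Y) $ ($ X $ normal, $ Y $ quadratic in $ X $ plus noise; nothing in the lecture fixes this) and plugs sample moments into $ b=\rho\sigma_Y/\sigma_X $, $ a=\mu_Y-b\mu_X $. On a sample, these plug-in values coincide with the least-squares line from `np.polyfit`, and the residual MSE equals $ \hat\sigma_Y^2(1-\hat\rho^2) $ up to rounding, so the explained proportion is $ \hat\rho^2 $.

```python
import numpy as np

def best_linear_predictor(x, y):
    """Plug-in estimates of a and b in h(x) = a + b*x, using
    b = rho * sigma_Y / sigma_X and a = mu_Y - b * mu_X."""
    rho = np.corrcoef(x, y)[0, 1]
    b = rho * y.std() / x.std()
    a = y.mean() - b * x.mean()
    return a, b, rho

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(loc=2.0, scale=3.0, size=n)
# Hypothetical joint law: Y is a nonlinear function of X plus noise,
# so the best linear predictor is not the best predictor overall.
y = 1.0 + 0.5 * x + 0.1 * x**2 + rng.normal(scale=1.0, size=n)

a, b, rho = best_linear_predictor(x, y)
mse = np.mean((y - (a + b * x)) ** 2)      # empirical MSE of h(X)
mse_formula = y.var() * (1 - rho**2)       # sigma_Y^2 (1 - rho^2)
explained = 1 - mse / y.var()              # proportion explained, should equal rho^2

# Cross-check against an ordinary least-squares line fit
b_ls, a_ls = np.polyfit(x, y, deg=1)

print(f"a={a:.4f}, b={b:.4f} | polyfit: a={a_ls:.4f}, b={b_ls:.4f}")
print(f"MSE={mse:.4f}, sigma_Y^2(1-rho^2)={mse_formula:.4f}")
print(f"explained={explained:.4f}, rho^2={rho**2:.4f}")
```

The quadratic term is deliberate: it shows that $ h $ is only the best *linear* predictor, and that the leftover MSE $ \sigma_Y^2(1-\rho^2) $ can be well above the noise variance when the true relationship is not linear.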

## Chapter 6 - Convergences

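**Definition**: We say that the sequence of r.v.'s $ (X_n)_{n\in\mathbb{N}} $ converges almost surely (a.s.) to the r.v. $ X $ if

$$ \lim_{n\rightarrow\infty} X_n = X $$

with probability 1. In other words,

$$ P\left(\left\{\omega : \lim_{n\rightarrow\infty} X_n(\omega) = X(\omega)\right\}\right) = 1. $$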

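**Definition**: (A weaker notion of convergence) A sequence of r.v.'s $ (X_n)_{n\in\mathbb{N}}$ converges in probability to $X$ if, for every $ \varepsilon>0 $,

$$ \lim_{n\rightarrow\infty} P\left(|X_n-X|>\varepsilon\right) = 0. $$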

Note (exercise: convince yourself at home): convergence in probability is weaker, i.e. easier to achieve, than convergence a.s. Convergence a.s. implies convergence in probability, but not conversely.

**Definition**: (an even weaker notion) Let $ (X_n)_{n\in\mathbb{N}} $ be a sequence of r.v.'s as above, but now let $ F $ be the CDF of some distribution. We say $ X_n $ converges in distribution to the law $ F $ if

$$ F_{X_n}(x) = P(X_n\le x) \rightarrow F(x) $$

as $ n\rightarrow\infty$ for every fixed $ x $ where $ F $ is continuous.

Note: unlike the previous two notions, here there is no need for a limiting r.v. $ X $, and the $ X_n $'s do not need to share a probability space with $ X $ or with one another.

### Example 1

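Let $ Y_i $, $ i=1,2,3,… $ be i.i.d. with mean $ \mu $ and finite variance $ \sigma^2 $. We proved that, with

$$ X_n = \frac{1}{n}\sum_{i=1}^{n} Y_i, $$

we have, for every $ \varepsilon>0 $,

$$ P\left(|X_n-\mu|>\varepsilon\right) \le \frac{Var(X_n)}{\varepsilon^2} = \frac{\sigma^2}{n\varepsilon^2} $$

(by Chebyshev's inequality).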

The right-hand side goes to 0 as $ n\rightarrow \infty $, which proves that $ X_n\rightarrow\mu $ in probability.
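A small simulation can make the Chebyshev bound concrete. The sketch below assumes $ Y_i\sim Exp(1) $ (so $ \mu=\sigma^2=1 $; the distribution, $ \varepsilon=0.1 $ and the number of replicates are arbitrary choices for illustration), estimates $ P(|X_n-\mu|>\varepsilon) $ by Monte Carlo for a few values of $ n $, and prints it next to $ \sigma^2/(n\varepsilon^2) $. The bound is crude (it even exceeds 1 for small $ n $), but it is enough to force the probability to 0.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2 = 1.0, 1.0   # Exp(1) has mean 1 and variance 1
eps = 0.1
reps = 2_000            # number of simulated sample means per n

results = {}
for n in (10, 100, 1_000):
    samples = rng.exponential(scale=1.0, size=(reps, n))
    x_bar = samples.mean(axis=1)                          # reps copies of X_n
    empirical = np.mean(np.abs(x_bar - mu) > eps)         # estimate of P(|X_n - mu| > eps)
    bound = sigma2 / (n * eps**2)                         # Chebyshev bound
    results[n] = (empirical, bound)
    print(f"n={n:5d}  P(|X_n - mu| > {eps}) ~ {empirical:.4f}   bound = {bound:.4f}")
```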

Note: the conclusion still holds assuming only that $ \mu $ exists ($ \sigma^2 $ may be infinite); see W. Feller's book (1950).

Let $ X_n\sim Uniform\{1,2,…,n\} $. Its CDF is a step function with a jump of $ 1/n $ at each integer $ 1,2,\dots,n $: $ F_{X_n}(x) =\frac{1}{n}[x]$ for $ x\in [0,n] $, where $ [x] $ is the integer part of $ x $ (the largest integer not exceeding $ x $).

For fixed $ x\in\mathbb{R}^+ $, as $ n\rightarrow\infty $, $ F_{X_n}(x)\rightarrow 0 $. $(\star)$

Since the limit function in $(\star)$, which is identically 0, is not the CDF of any random variable (a CDF must tend to 1 as $ x\rightarrow\infty $), $ X_n $ does not converge in distribution. Therefore, $ X_n $ cannot converge in any stronger sense (in probability or a.s.) either.

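How about $ Y_n\sim Uniform \{1/n, 2/n, …, n/n\} $?

Since

$$ F_{Y_n}(x) = \frac{[nx]}{n} \rightarrow x \quad (n\rightarrow\infty) $$

for $ x\in[0,1] $ (because $ nx-1<[nx]\le nx $), and $ F(x)=x $ on $ [0,1] $ is the CDF of $Uniform(0,1)$, we get $ Y_n\rightarrow Uniform(0,1) $ in distribution.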
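The two CDFs in this example are easy to tabulate. This sketch (the helper names `cdf_x` and `cdf_y` are just illustrative) evaluates $ F_{X_n}(5)=5/n $, which goes to 0, and $ F_{Y_n}(0.37)=[0.37n]/n $, which stays within $ 1/n $ of $ 0.37 $.

```python
import math

def cdf_x(x, n):
    """CDF of X_n ~ Uniform{1, ..., n}: [x]/n, capped at 1."""
    if x < 0:
        return 0.0
    return min(math.floor(x), n) / n

def cdf_y(x, n):
    """CDF of Y_n ~ Uniform{1/n, ..., n/n}: [n x]/n, capped at 1."""
    if x < 0:
        return 0.0
    return min(math.floor(n * x), n) / n

for n in (10, 100, 1_000, 10_000):
    print(f"n={n:6d}  F_Xn(5) = {cdf_x(5, n):.4f}   F_Yn(0.37) = {cdf_y(0.37, n):.4f}")
```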

### Example 2

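Let $ U_1, U_2, \dots $ be i.i.d. $ Unif(0,1) $ and let $ M_n =\max_{i=1,2,…,n}U_i$. Intuitively, $ M_n\rightarrow1 $ in some sense. Let's prove it in probability:

Let $ 0<\varepsilon<1 $ (for $ \varepsilon\ge1 $ the probability below is 0). Since $ M_n\le1 $,

$$ P\left(|M_n-1|>\varepsilon\right) = P(M_n<1-\varepsilon) = \prod_{i=1}^{n}P(U_i<1-\varepsilon) = (1-\varepsilon)^n \rightarrow 0. $$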

We proved that $ M_n\rightarrow1 $ in probability.

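Now consider $ Y_n=n(1-M_n) $. For $ 0\le y\le n $, its CDF is

$$ F_{Y_n}(y) = P\left(M_n\ge 1-\frac{y}{n}\right) = 1-\left(1-\frac{y}{n}\right)^n \rightarrow 1-e^{-y}. $$

The limit is the CDF of the exponential distribution with rate 1. Thus $ Y_n \rightarrow Exp(\lambda = 1)$ in distribution.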
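Both limits in this example can be checked by simulation. The sketch below draws $ n=200 $ uniforms per replicate (sizes and $ \varepsilon=0.01 $ chosen arbitrarily), compares the simulated $ P(|M_n-1|>\varepsilon) $ with $ (1-\varepsilon)^n $, and compares the simulated CDF of $ Y_n=n(1-M_n) $ with the exact $ 1-(1-y/n)^n $ and the $ Exp(1) $ limit $ 1-e^{-y} $.

```python
import numpy as np

rng = np.random.default_rng(2)
reps, n = 20_000, 200
m_n = rng.random((reps, n)).max(axis=1)     # reps copies of M_n = max(U_1, ..., U_n)

# Convergence in probability: P(|M_n - 1| > eps) = (1 - eps)^n
eps = 0.01
p_far_sim = np.mean(1 - m_n > eps)
p_far_exact = (1 - eps) ** n
print(f"P(|M_n - 1| > {eps}): simulated {p_far_sim:.4f}, exact {p_far_exact:.4f}")

# Convergence in distribution: Y_n = n (1 - M_n) -> Exp(1)
y_n = n * (1 - m_n)
rows = []
for y in (0.5, 1.0, 2.0):
    sim = np.mean(y_n <= y)
    exact_n = 1 - (1 - y / n) ** n        # exact CDF of Y_n for 0 <= y <= n
    limit = 1 - np.exp(-y)                # Exp(1) CDF
    rows.append((y, sim, exact_n, limit))
    print(f"y={y}: simulated {sim:.4f}, exact {exact_n:.4f}, Exp(1) {limit:.4f}")
```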

**Theorem** (6.3.6): Let $ (X_n)_{n\in\mathbb{N}} $ be a sequence of r.v.'s. If $ X_n $ has MGF $ M_{X_n}(t) $ and $ M_{X_n}(t)\rightarrow M_X(t) $ for all $ t $ in an open interval around 0 (on which $ M_X $ is finite), then $ X_n \rightarrow X$ in distribution.

### Example 3

Let $ X_n\sim Bin(n,p=\lambda/n) $. We know that the PMF of $ X_n $ converges to the PMF of Poisson($ \lambda $). Let us prove it again here with MGFs. We find

$$ M_{X_n}(t) = \left(1-p+pe^t\right)^n = \left(1+\frac{\lambda(e^t-1)}{n}\right)^n \rightarrow e^{\lambda(e^t-1)}, $$

and here we recognize the MGF of Poisson($ \lambda $). By Theorem 6.3.6, $ X_n\rightarrow Poisson(\lambda) $ in distribution.
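Here is a numerical sketch of the same convergence, with $ \lambda=3 $ and $ t=0.5 $ picked arbitrarily. It prints the relative gap between the Binomial($ n,\lambda/n $) MGF $ (1-p+pe^t)^n $ and the Poisson MGF $ e^{\lambda(e^t-1)} $, and, as a cross-check, the largest PMF difference over $ k=0,\dots,30 $ (the cutoff 30 is an assumption that captures essentially all of the Poisson(3) mass).

```python
import math

lam, t = 3.0, 0.5

def mgf_binomial(t, n, p):
    return (1 - p + p * math.exp(t)) ** n

def mgf_poisson(t, lam):
    return math.exp(lam * (math.exp(t) - 1))

def max_pmf_gap(n, lam, k_max=30):
    """Largest |Bin(n, lam/n) PMF - Poisson(lam) PMF| over k = 0..min(k_max, n)."""
    p = lam / n
    gap = 0.0
    for k in range(min(k_max, n) + 1):
        binom = math.comb(n, k) * p**k * (1 - p) ** (n - k)
        poisson = lam**k * math.exp(-lam) / math.factorial(k)
        gap = max(gap, abs(binom - poisson))
    return gap

rel_gaps = []
for n in (10, 100, 10_000):
    rel = abs(mgf_binomial(t, n, lam / n) / mgf_poisson(t, lam) - 1)
    rel_gaps.append(rel)
    print(f"n={n:6d}  MGF relative gap {rel:.2e}  max PMF gap {max_pmf_gap(n, lam):.2e}")
```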

Let $ X_i, i=1,2,…,n $ be i.i.d. with $ E(X_i)=\mu $ and finite variance $ Var(X_i)=\sigma^2>0 $.

Let $ Z_i =\frac{X_i-\mu}{\sigma} $ and $ S_n=\frac{\sum_{i=1}^{n}Z_i}{\sqrt{n}} $.

Then $ E(S_n)=0 $ and $ Var(S_n)=\frac{1}{n}\sum_{i=1}^{n}Var(Z_i)=1 $; dividing by $ \sqrt{n} $ is exactly the standardization.

## Central Limit Theorem (CLT)

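As $ n\rightarrow\infty $, $ S_n\rightarrow N(0,1)$ in distribution.

This means:

$$ \lim_{n\rightarrow\infty} P(S_n\le x) = \Phi(x) = \int_{-\infty}^{x}\frac{1}{\sqrt{2\pi}}e^{-u^2/2}\,du \quad \text{for every } x\in\mathbb{R}. $$

The proof combines a Taylor expansion up to order 2 with Theorem 6.3.6: assuming the MGF $ M_Z $ of the $ Z_i $'s exists near 0, and since $ E(Z_i)=0 $ and $ E(Z_i^2)=1 $,

$$ M_{S_n}(t) = \left[M_Z\left(\tfrac{t}{\sqrt{n}}\right)\right]^n = \left[1+\frac{t^2}{2n}+o\left(\tfrac{1}{n}\right)\right]^n \rightarrow e^{t^2/2}, $$

which is the MGF of $ N(0,1) $.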
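The CLT can be watched in action with a short Monte Carlo sketch. It assumes $ X_i\sim Uniform(0,1) $ (so $ \mu=1/2 $, $ \sigma^2=1/12 $) and $ n=30 $, both arbitrary choices, builds $ S_n $ as defined above, and compares $ P(S_n\le x) $ with $ \Phi(x) $ computed through `math.erf`.

```python
import math
import numpy as np

def std_normal_cdf(x):
    """Phi(x), the N(0,1) CDF, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

rng = np.random.default_rng(3)
mu, sigma = 0.5, math.sqrt(1 / 12)     # mean and sd of Uniform(0, 1)
n, reps = 30, 100_000

x = rng.random((reps, n))
s_n = (x.sum(axis=1) - n * mu) / (sigma * math.sqrt(n))   # standardized sums S_n

gaps = []
for q in (-1.5, 0.0, 1.0, 2.0):
    empirical = np.mean(s_n <= q)
    gaps.append(abs(empirical - std_normal_cdf(q)))
    print(f"P(S_n <= {q:4.1f}) ~ {empirical:.4f}   Phi = {std_normal_cdf(q):.4f}")
print(f"mean(S_n) ~ {s_n.mean():.4f}, var(S_n) ~ {s_n.var():.4f}")
```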

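Next, consider the sample mean $ \bar X_n = \frac{1}{n}\sum_{i=1}^{n}X_i $.

A corollary of the CLT says this:

$$ \sqrt{n}\left(\bar X_n-\mu\right) \rightarrow N(0,\sigma^2) $$

in distribution.

Proof:

$$ \sqrt{n}\left(\bar X_n-\mu\right) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}(X_i-\mu) = \sigma S_n, \quad\text{so}\quad P\left(\sqrt{n}(\bar X_n-\mu)\le x\right) = P\left(S_n\le\frac{x}{\sigma}\right) \rightarrow \Phi\left(\frac{x}{\sigma}\right), $$

and conclude by a change of variable: $ \Phi(x/\sigma) $ is the CDF of $ N(0,\sigma^2) $.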

## Theorem of DeMoivre-Laplace

Let $ X_i, i=1,2,\dots,n $ be i.i.d. Bernoulli($ p $). Let $ W_n = X_1+X_2+\cdots+ X_n $. We know $ W_n\sim Bin(n,p) $.

Let $ S_n =\frac{W_n-np}{\sqrt{np(1-p)}}$. Since $ E(X_i)=p $ and $ Var(X_i)=p(1-p) $, this is exactly the $ S_n $ of the CLT.

Therefore, by the CLT, $ S_n\rightarrow N(0,1) $ in distribution as $ n\rightarrow\infty $.

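In other words, to be precise, for every $ x\in\mathbb{R} $,

$$ \lim_{n\rightarrow\infty} P\left(\frac{W_n-np}{\sqrt{np(1-p)}}\le x\right) = \Phi(x), $$

where $ \Phi $ is the CDF of $ N(0,1) $.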

The speed of convergence in the CLT is quantified by the Berry-Esseen theorem. For the binomial case, convergence is fast in practice, and a rule of thumb is that for $ p\in [1/10, 9/10] $ and $ n\geq 30 $ the normal approximation is good to within about $ 1\% $.
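To see how close the normal approximation is at the edge of this rule of thumb, the sketch below computes the exact Bin($ n,p $) CDF with `math.comb` and its largest deviation from $ \Phi\left((k-np)/\sqrt{np(1-p)}\right) $ over $ k=0,\dots,n $, for $ n=30 $, $ p=0.3 $ (an arbitrary pair inside the rule). It also shows the half-unit continuity correction $ \Phi\left((k+0.5-np)/\sqrt{np(1-p)}\right) $, a standard refinement that is not covered in this lecture; without it, the error near the center is of the order of half a binomial point mass.

```python
import math

def std_normal_cdf(x):
    """Phi(x), the N(0,1) CDF, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def binom_cdf(k, n, p):
    """Exact P(W_n <= k) for W_n ~ Bin(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def max_cdf_error(n, p, continuity=False):
    """Largest |P(W_n <= k) - normal approximation| over k = 0..n."""
    sd = math.sqrt(n * p * (1 - p))
    shift = 0.5 if continuity else 0.0   # half-unit continuity correction (optional)
    return max(
        abs(binom_cdf(k, n, p) - std_normal_cdf((k + shift - n * p) / sd))
        for k in range(n + 1)
    )

n, p = 30, 0.3
err_plain = max_cdf_error(n, p)
err_cc = max_cdf_error(n, p, continuity=True)
print(f"n={n}, p={p}: max error {err_plain:.4f} (plain), {err_cc:.4f} (continuity-corrected)")
```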

### Example 4

Let

$$ M_n = \frac{U_1+U_2+\cdots+U_n}{n}, $$

where the $ U_i $'s are i.i.d. Uniform(0, 1). We know by the weak law of large numbers that $ M_n\rightarrow \frac{1}{2} $ in probability as $ n\rightarrow\infty $. But how spread out is $ M_n $ around 1/2? For example, can we estimate the chance that $ M_n $ is more than 0.02 away from its mean value 1/2?

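Answer: since $ Var(U_i)=\frac{1}{12} $, the standardized sum is $ S_n=\sqrt{12n}\left(M_n-\frac{1}{2}\right) $, and the CLT gives

$$ P\left(\left|M_n-\tfrac{1}{2}\right|\le 0.02\right) = P\left(|S_n|\le 0.02\sqrt{12n}\right) \approx 2\Phi\left(0.02\sqrt{12n}\right)-1, $$

where $ \Phi $ is the CDF of $ N(0,1) $.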

We will feel comfortable if $ n $ is large enough to make this probability at least 0.95. How large must $ n $ be?

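We need $ 2\Phi(z)-1\ge 0.95 $ with $ z=0.02\sqrt{12n} $, i.e. $ \Phi(z)\ge 0.975 $. It is known that this requires $ z\ge 1.96 $, because

$$ \Phi(1.96)\approx 0.975. $$

So 1.96 is known as the 97.5$^{th}$ percentile of $ N(0,1) $. Therefore, we must take

$$ 0.02\sqrt{12n}\ge 1.96 \iff n\ge\frac{1}{12}\left(\frac{1.96}{0.02}\right)^2=\frac{98^2}{12}\approx 800.3, $$

i.e. $ n\ge 801 $.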
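The same computation can be sketched in Python: `statistics.NormalDist` from the standard library supplies the 97.5th percentile, and a Monte Carlo check (5,000 replicates, an arbitrary choice) estimates $ P(|M_n-1/2|\le 0.02) $ at the resulting $ n $, which should land close to 0.95.

```python
import math
from statistics import NormalDist
import numpy as np

delta = 0.02                       # allowed distance from 1/2
sigma = math.sqrt(1 / 12)          # sd of Uniform(0, 1)
z = NormalDist().inv_cdf(0.975)    # 97.5th percentile of N(0, 1), about 1.96

# Need delta * sqrt(n) / sigma >= z, i.e. n >= (z * sigma / delta)^2
n_min = math.ceil((z * sigma / delta) ** 2)
print(f"z = {z:.4f}, smallest n = {n_min}")

# Monte Carlo check of P(|M_n - 1/2| <= delta) at n = n_min
rng = np.random.default_rng(4)
m_n = rng.random((5_000, n_min)).mean(axis=1)
coverage = np.mean(np.abs(m_n - 0.5) <= delta)
print(f"simulated P(|M_n - 1/2| <= {delta}) ~ {coverage:.3f}")
```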
