# STT 861 Theory of Prob and STT I Lecture Note - 13

2017-11-29

Recap of linear predictor; almost surely convergence, converge in probability, converge in distribution; central limit theorem, theorem of DeMoivre-Laplace.

# Lecture 13 - Nov 29 2017

## For Video Recording

### Linear Prediction

Recall $X, Y$ let

The r.v. $g(x)$ is the best predictor of $Y$ given $X$ in the least square sense: $g$ minimizes $E(g(X)-Y)^2)$ = MSE.

But what about making this MSE as small as possible when $g$ is linear? So use notation $h(x)=a+bx$ (instead of $g$).

We want to minimize

Find $a$ and $b$ to make this as small as possible. Let

We also know,

where $c=b \frac{\sigma_Y}{\sigma_X}$, and $d=\mu_Y-b\mu_X$,

We see immediately that this is minimal for $a=d$ and $c=\rho$.

Therefore,

This answers the question of what the best linear predictor of $Y$ given $X$ is the mean square sense.

We see the smallest MSE is therefore $\sigma_Y^2(1-\rho^2)$.

Therefore, we see that the proportion of $Y$’s variance which is not explained by $X$ is

Finally, the proportion of $Y$’s variance which is explained by $X$ is $\rho^2$.

## Chapter 6 - Convergences

Definition: We say that the sequence of r.v.’s $(X_n)_{n\in\mathbb{N}}$ converges (“a.s.”(almost surely)) to the r.v. $X$ if

with probability 1. In other word,

Definition: (A weaker notion of convergence) A sequence of r.v.’s $(X_n)_{n\in\mathbb{N}}$ converges in probability to $X$ if

Note: [Convince yourself as an exercise at home] Convergence in probability is (easier to achieve) than converge a.s.

Definition: (even weaker version) Let sequence of r.v.’s $(X_n)_{n\in\mathbb{N}}$ as above but now let $F$ be the CDF of some distribution. We say $X_n$ converges in distribution to the law $F$ is

as $n\rightarrow\infty$ for every fixed $x$ where $F(x)$ is continuous.

Note: unlike the previous two notions, here there is no need for a limiting r.v. $X$ and the $X_n$’s. Do not need to share a probability space with $X$ or anyone else.

### Example 1

Let $Y_i$, $i=1,2,3,…$ be i.i.d with $\mu$ and $\sigma^2$ is finite. We proved that, with

then

(By Chebyshev’s inequality)

The whole thing goes to 0 as $n\rightarrow \infty$, this proves that $X_n\rightarrow\mu$ in probability.

Note: assuming only $\mu$ exists ($\sigma^2$ could be infinity), conclusion still holds. See W. Feller’s book in 1950.

Let $X_n=Uniform\{1,2,…,n\}$. This is a stepper function, with $1/n$ increment each step. Let’s try to find out $F_{X_n}(x) =\frac{1}{n}[x]$ for $x\in [0,n]$ (integer function, the integer larger than $x$ ).

For fixed $x\in\mathbb{R^+}$, as $n\rightarrow\infty$, $F_{X_n}(x)\rightarrow 0(\star)$.

Since the function $\star$, is not the CDF of any random variable, this proves that $X_n$ does not converge in distribution, And therefore, $X_n$ cannot converges in any stronger sense (in probability or a.s.).

How about $Y_n=Uniform \{1/n, 2/n, …, n/n\}$?

Since

for $x\in[0,1]$, this is the CDF of $Uniform(0,1)$.

### Example 2

Let $U_n\sim Unif(0,1)$, i.i.d. Let $M_n =\max_{i=1,2,…,n}(U_i)$. We can tell that $M_n\rightarrow1$ in some sense. Let’s prove it in probability:

Let $\varepsilon>0$, % \varepsilon) &= P(1-M_n>\varepsilon) \\ &=P(M_n<1-\varepsilon)\\ &=P(\max_{i=1,2,...,n}U_i<1-\varepsilon) \\ &=P(\forall i: U_i<1-\varepsilon)\\ &=P(\cap_{i=1}^{n}\{U_i<1-\varepsilon\}) \\ &=\prod_{i=1}^{n}P(U_i<1-\varepsilon) =(1-\varepsilon)^n\\ \lim P(\vert M_n-1 \vert >\varepsilon) &=0 \end{align*} %]]>

We proved that $M_n\rightarrow1$ in probability.

Now consider $Y_n=(1-M_n)n$. Let’s see about CDF of $Y_n$.

This CDF is exponential. Thus $Y_n \rightarrow Exp(\lambda = 1)$ in distribution.

Theorem (6.3.6): Let $(X_n)_{n\in\mathbb{N}}$ be a sequence of r.v.’s. If $X_n$ has MGF $M_{X_n}(t)$ and $M_{X_n}(t)\rightarrow M_X(t)$ for $t$ not 0, then $X_n \rightarrow X$ in distribution.

### Example 3

Let $X_n$ be Bin($n,p=\lambda/n$). We know that the PMF of $X_n$ converges to the PMF of Poisson($\lambda$). Try it again here. We will find

and here we recognize that this is the MGF of Poisson($\lambda$). By Theorem 6.3.6, $X_n\rightarrow Poiss(\lambda)$ in distribution.

Let $X_i, i=1,2,…,n$ be i.i.d with $E(X_i)=\mu$ and variance $Var(X_i)=\sigma^2$.

Let $Z_i =\frac{X_i-\mu}{\sigma}$, $S_n=\frac{\sum_{i=1}^{n}Z_i}{\sqrt{n}}$,

(We divide by $\sqrt(n)$ as a standardization and we know $Var(S_n)=1$).

## Central Limit Theorem (CLT)

As $n\rightarrow\infty$, then $S_n\rightarrow Normal(0,1)$ in distribution.

This means:

The proof just combines the Taylor formula up to the order 2 and Theorem 6.3.6.

Next, consider

A corollary of the CLT says this:

in distribution.

Proof:

and conclude by changing of variable.

## Theorem of DeMoivre-Laplace

Let $X_i, i=1,2,3,..,n$ be i.i.d Bernoulli. Let $W_n = X_1+X_2+\cdots+ X_n$. We know $W_n\sim Bin(n,p)$.

Let $S_n =\frac{W_n-np}{\sqrt{np(1-p)}}$,

Therefore, by CLT, $S_n\rightarrow N(0,1)$ as $n\rightarrow\infty$ in distribution.

In other words, to be precise,

The speed of convergence in the CLT is known as a “Berry-Esseen” theorem. But the speed of convergence for Binomial CLT is much faster and rule of thrum is $p\in [1/10, 9/10]$ and $n\geq 30$. CLT is good within $1\%$ convergence error.

### Example 4

Let

where $U_i$’s are i.i.d Uniform(0, 1). We know by Weak law of large numbers, that $M_n\rightarrow \frac{1}{2}$ in probability as $n\rightarrow\infty$. But how spread out is $M_n$ around 1/2? For example, can we estimate the chance that $M_n$ is more than 0.02 away from its mean value 1/2?

We will feel comfortable if $n$ is large enough to make this greater than 0.95. What value should $n$ at least be.
So 1.96 is therefore know as the 97.5$^{th}$ percentile of $N(0,1)$. Therefore, we must take