# STT 861 Theory of Prob and STT I Lecture Note - 14

2017-12-06

Proof of central limit theorem; multivariate normal distribution and its example; review of the second half of the semester, Poisson approximation vs. normal approximation to binomial distribution; miscellaneous notes of convergence.

# Lecture 14 - Dec 06 2017

## Proof of CLT (Central Limit Theorem)

This is the last lecture of this course.

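Let $X_1, X_2, \dots, X_n$ be i.i.d. random variables with $\mu=E(X_i)$ and $\sigma^2=Var(X_i)$. WLOG (without loss of generality), $\mu=0$ and $\sigma^2=1$; otherwise replace $X_i$ by $(X_i-\mu)/\sigma$. Let $S_n=\frac{X_1+\cdots+X_n}{\sqrt{n}}$.

It is sufficient to show that MGF of $S_n \rightarrow$ MGF of $Normal(0,1)$ for all $t$ near 0. By independence,

$$M_{S_n}(t)=E\left(e^{tS_n}\right)=\prod_{i=1}^{n}E\left(e^{\frac{t}{\sqrt{n}}X_i}\right)=\left(M_X(u)\right)^n,$$

where $u=t/\sqrt{n}$.

We will use the following fundamental property of MGF:

$$M_X^{(k)}(0)=E(X^k).$$

In our case: $M_X(0)=1$, $M_X'(0)=E(X_1)=0$, $M_X''(0)=E(X_1^2)=1$.

Then expand $M_X(u)$ using Taylor’s formula:

$$M_X(u)=M_X(0)+M_X'(0)\,u+\frac{1}{2}M_X''(0)\,u^2+o(u^2).$$

(This is for $u$ near 0.)

Therefore,

$$M_X(u)=1+\frac{u^2}{2}+o(u^2)$$

for small $u$.

Next,

$$M_{S_n}(t)=\left(M_X\left(\frac{t}{\sqrt{n}}\right)\right)^n=\left(1+\frac{t^2}{2n}+o\left(\frac{1}{n}\right)\right)^n.$$

When $n\rightarrow\infty$,

$$M_{S_n}(t)\rightarrow e^{t^2/2},$$

which is the MGF of N(0,1).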
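The limit in this proof can be checked numerically in a case where the MGF is known in closed form. The sketch below assumes one specific choice that is not from the lecture: $X_i=\pm 1$ with probability $1/2$ each, so that $\mu=0$, $\sigma^2=1$ and $M_X(u)=\cosh(u)$. The function names are illustrative only.

```python
import math


def mgf_standardized_sum(t, n):
    """MGF at t of S_n = (X_1 + ... + X_n) / sqrt(n).

    Assumption (not from the lecture): X_i = +1 or -1 with probability 1/2,
    so each X_i has MGF cosh(u) and M_{S_n}(t) = cosh(t / sqrt(n)) ** n.
    """
    u = t / math.sqrt(n)
    return math.cosh(u) ** n


def mgf_standard_normal(t):
    """MGF of N(0, 1) at t, namely exp(t^2 / 2)."""
    return math.exp(t * t / 2.0)


if __name__ == "__main__":
    t = 1.0
    for n in (1, 10, 100, 10_000):
        # The first value should approach the second as n grows.
        print(n, mgf_standardized_sum(t, n), mgf_standard_normal(t))
```

For this choice of $X_i$ the gap to $e^{t^2/2}$ shrinks roughly like $1/n$, which matches the $o(1/n)$ remainder in the expansion.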

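Calculation of the MGF of $Normal(0,\sigma^2)$: for $X\sim Normal(0,\sigma^2)$,

$$M_X(t)=E\left(e^{tX}\right)=\int_{-\infty}^{\infty}e^{tx}\,\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{x^2}{2\sigma^2}}\,dx=e^{\sigma^2t^2/2}.$$

(This part is in Homework 6.)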
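As a sanity check on the closed form $e^{\sigma^2t^2/2}$, the defining integral $E(e^{tX})$ can be approximated numerically. This is a minimal sketch using a plain trapezoid rule; the truncation width and step count are arbitrary assumptions, and the function name is illustrative.

```python
import math


def normal_mgf_numeric(t, sigma, half_width=12.0, steps=20_000):
    """Trapezoid-rule approximation of E[exp(t X)] for X ~ Normal(0, sigma^2).

    The integral is truncated to [-half_width * sigma, half_width * sigma];
    this is adequate only for moderate t, where the integrand is negligible
    outside that range.
    """
    lo = -half_width * sigma
    hi = half_width * sigma
    h = (hi - lo) / steps

    def integrand(x):
        density = math.exp(-x * x / (2.0 * sigma**2)) / math.sqrt(2.0 * math.pi * sigma**2)
        return math.exp(t * x) * density

    total = 0.5 * (integrand(lo) + integrand(hi))
    for i in range(1, steps):
        total += integrand(lo + i * h)
    return total * h


if __name__ == "__main__":
    t, sigma = 0.7, 1.5
    print(normal_mgf_numeric(t, sigma), math.exp(sigma**2 * t**2 / 2.0))
```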

## Multivariate Normal

Quick review from a new example.

### Example 1

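Let $X=(X_1,X_2,\dots,X_k)$ be a normal vector. Assume $E(X_i)=\mu_i$ and $Cov(X_i,X_j)=\Sigma_{ij}$. Find the MGF of the entire vector, $M_X(t)=E\left(e^{t^TX}\right)$ for $t=(t_1,\dots,t_k)$.

Let $Y=t^TX=\sum_{i=1}^{k}t_iX_i$, so that $M_X(t)=E(e^{Y})$. Since $X$ is multivariate normal, $Y$ is normal. We just need to compute the expectation and variance of $Y$:

$$E(Y)=\sum_{i=1}^{k}t_i\mu_i,\qquad Var(Y)=\sum_{i=1}^{k}\sum_{j=1}^{k}t_it_j\Sigma_{ij}=t^T\Sigma t.$$

Hence $Y=\mu_Y+\sigma Z$, where $Z\sim N(0,1)$.

We have proved that

$$E\left(e^{\mu_Y+\sigma Z}\right)=e^{\mu_Y+\sigma^2/2},$$

where $\mu_Y=\sum_{i=1}^{k}\mu_it_i$, $\sigma^2=t^T\Sigma t$. Finally, the full form of the MGF of $X$ is:

$$M_X(t)=\exp\left(\sum_{i=1}^{k}t_i\mu_i+\frac{1}{2}t^T\Sigma t\right).$$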

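We can identify what it means for subvectors of a multivariate normal vector to be independent of each other.

Write $X=(X^{(A)},X^{(B)})^T$ with length $k=k_A+k_B$, and split $t=(t^{(A)},t^{(B)})^T$ in the same way. If $X^{(A)}$ is independent of $X^{(B)}$, then every covariance between a component of $X^{(A)}$ and a component of $X^{(B)}$ is 0, so

$$\Sigma=\begin{pmatrix}\Sigma^{(A)} & 0\\ 0 & \Sigma^{(B)}\end{pmatrix}.$$

Conversely, the MGF of a normal vector involves $\Sigma$ only through $t^T\Sigma t$. For a block diagonal $\Sigma$, $t^T\Sigma t=(t^{(A)})^T\Sigma^{(A)}t^{(A)}+(t^{(B)})^T\Sigma^{(B)}t^{(B)}$, so the MGF factors:

$$M_X(t)=M_{X^{(A)}}(t^{(A)})\,M_{X^{(B)}}(t^{(B)}).$$

This implies that if a normal vector has a block diagonal covariance matrix, then the corresponding two subvectors are independent.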
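The factorization behind the block diagonal statement can be illustrated numerically. The sketch below evaluates the normal-vector MGF $\exp(t^T\mu+\frac{1}{2}t^T\Sigma t)$ for a made-up 3-dimensional example with $k_A=2$ and $k_B=1$; the numbers in `mu`, `sigma` and `t` are assumptions chosen only for illustration.

```python
import math


def mvn_mgf(t, mu, sigma):
    """MGF of a normal vector at t: exp(t^T mu + 0.5 * t^T Sigma t)."""
    k = len(t)
    linear = sum(t[i] * mu[i] for i in range(k))
    quad = sum(t[i] * sigma[i][j] * t[j] for i in range(k) for j in range(k))
    return math.exp(linear + 0.5 * quad)


# Hypothetical example: block A = first two coordinates, block B = the third.
mu = [1.0, -2.0, 0.5]
sigma = [
    [2.0, 0.3, 0.0],
    [0.3, 1.0, 0.0],
    [0.0, 0.0, 4.0],
]
t = [0.2, -0.1, 0.3]

full = mvn_mgf(t, mu, sigma)
part_a = mvn_mgf(t[:2], mu[:2], [row[:2] for row in sigma[:2]])
part_b = mvn_mgf(t[2:], mu[2:], [row[2:] for row in sigma[2:]])

if __name__ == "__main__":
    # With a block diagonal covariance the two printed numbers agree.
    print(full, part_a * part_b)
```

If a nonzero entry is placed in the off-diagonal block (keeping the matrix symmetric), the product no longer matches the full MGF, which is the dependence showing up in the cross term of $t^T\Sigma t$.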

### Really Important Example - Example 2

Let $X=(X_1, X_2,\dots,X_n)$ be i.i.d. $N(\mu,\sigma^2)$. Let $\bar{X}=\frac{1}{n}(X_1+X_2+\cdots+X_n)$ (empirical mean).

• What is the covariance $Cov(\bar{X}, X_i)$?

$$Cov(\bar{X},X_i)=\frac{1}{n}\sum_{j=1}^{n}Cov(X_j,X_i)=\frac{1}{n}Var(X_i)=\frac{\sigma^2}{n}.$$

• That was a warm-up. Next, what is $Cov(\bar{X}, X_i-\bar{X})$ (this is the actual important question)?

$$Cov(\bar{X},X_i-\bar{X})=Cov(\bar{X},X_i)-Var(\bar{X})=\frac{\sigma^2}{n}-\frac{\sigma^2}{n}=0.$$

The pair $(\bar{X}, X_i-\bar{X})$ is bivariate normal, since it is a linear transformation of the multivariate normal vector $X$. Because their covariance is 0, $\bar{X}$ and $X_i-\bar{X}$ are independent.

• Next part of the example: let $S^2=$ empirical variance $=\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2$. What’s the relation between $\bar{X}$ and $S^2$?

Since $X_i-\bar{X}$ and $\bar{X}$ are independent for fixed $i$, the same is true of $(X_i-\bar{X})^2$ and $\bar{X}$.

More is true: the vector $(\bar{X}, X_1-\bar{X},\dots,X_n-\bar{X})$ is multivariate normal, and the covariance between $\bar{X}$ and every $X_i-\bar{X}$ is 0, so its covariance matrix is block diagonal. Therefore, the entire vector $Y=(X_i-\bar{X})^n_{i=1}$ is independent of $\bar{X}$. In fact, any function of $Y$ is independent of $\bar{X}$. In particular $S^2$ is independent of $\bar{X}$.

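Also,

$$Z_i=\frac{X_i-\mu}{\sigma}\sim N(0,1).$$

These are all i.i.d., and

$$\frac{(n-1)S^2}{\sigma^2}=\sum_{i=1}^{n}\left(\frac{X_i-\bar{X}}{\sigma}\right)^2=\sum_{i=1}^{n}(Z_i-\bar{Z})^2.$$

This has the Chi-Square distribution $\chi^2(n-1)$.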
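The two facts in this example can be checked with a short sketch. The first part computes the covariance coefficient exactly: for i.i.d. $X_j$ with variance $\sigma^2$, $Cov(a^TX,b^TX)=\sigma^2\sum_j a_jb_j$. The second part simulates $(n-1)S^2/\sigma^2$, whose mean should be close to $n-1$ and whose variance should be close to $2(n-1)$ if it is $\chi^2(n-1)$. The helper names, sample sizes and seed are assumptions made for illustration.

```python
import random
from fractions import Fraction


def cov_coefficient(a, b):
    """Return sum_j a_j * b_j, i.e. Cov(a.X, b.X) / sigma^2 for i.i.d. X_j."""
    return sum(x * y for x, y in zip(a, b))


def mean_and_deviation_weights(n, i):
    """Weight vectors that express X-bar and X_i - X-bar as linear combinations of X."""
    mean_w = [Fraction(1, n)] * n
    dev_w = [(1 if j == i else 0) - Fraction(1, n) for j in range(n)]
    return mean_w, dev_w


def simulate_scaled_variance(n, mu, sigma, reps, seed=0):
    """Draw reps samples of size n from N(mu, sigma^2); return (n-1) S^2 / sigma^2 for each."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
        values.append((n - 1) * s2 / sigma**2)
    return values


if __name__ == "__main__":
    mean_w, dev_w = mean_and_deviation_weights(5, 2)
    print("Cov(X-bar, X_i - X-bar) / sigma^2 =", cov_coefficient(mean_w, dev_w))
    vals = simulate_scaled_variance(n=5, mu=10.0, sigma=2.0, reps=20_000)
    print("simulated mean of (n-1) S^2 / sigma^2:", sum(vals) / len(vals))
```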

## Review of the Second Part of the Semester

When to use Poisson and when to use Normal to approximate a series of Bernoulli trials?

### Poisson Approximation Case

$X_i \sim Bernoulli(p)$ i.i.d. For example, $X_i=1$ if voter $i$ votes for Dr. Jill Stein (or Gary Johnson), $X_i = 0$ otherwise.

$p=P(X_i=1)$ is small. What if $p=\lambda/n$ where $n$ is the # of trials?

Then $Y_n=X_1+X_2+\cdots + X_n$, $Y_n\sim Bin(n,p)$, and

$$E(Y_n)=np=\lambda.$$

Since the average # of successes is the constant $\lambda$, $Bin(n,p)\approx Poisson(\lambda)$. Range: $0.2 \leq \lambda \leq 10$.

Note: $Var(Y_n)=np(1-p) = \lambda(1-\frac{\lambda}{n}) \approx\lambda$. If $p=\frac{\lambda}{n}+o(\frac{1}{n})$ then $Bin(n,p) \approx Poisson(\lambda)$ still holds.
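To see how close $Bin(n,\lambda/n)$ is to $Poisson(\lambda)$, one can compute the total variation distance between the two PMFs directly. This is a minimal sketch; the function names are illustrative, and the PMFs are built by recursion to avoid huge factorials.

```python
import math


def binomial_pmf_list(n, p):
    """PMF of Bin(n, p) for k = 0..n via the ratio P(k) / P(k-1). Requires 0 < p < 1."""
    pmf = [(1.0 - p) ** n]
    for k in range(1, n + 1):
        pmf.append(pmf[-1] * (n - k + 1) / k * p / (1.0 - p))
    return pmf


def poisson_pmf_list(lam, kmax):
    """PMF of Poisson(lam) for k = 0..kmax via the ratio P(k) / P(k-1) = lam / k."""
    pmf = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * lam / k)
    return pmf


def total_variation(n, lam):
    """Total variation distance between Bin(n, lam/n) and Poisson(lam)."""
    b = binomial_pmf_list(n, lam / n)
    q = poisson_pmf_list(lam, n)
    tail = max(0.0, 1.0 - sum(q))  # Poisson mass above n, where the binomial has none
    return 0.5 * (sum(abs(x - y) for x, y in zip(b, q)) + tail)


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(n, total_variation(n, lam=2.0))
```

The distance should shrink as $n$ grows with $\lambda$ held fixed, which is the regime described above.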

### Normal Approximation Case

It works for the Binomial and many other "partial sums" (like the Negative Binomial).

In the Binomial case, assume $n\rightarrow \infty$ and $p$ is fixed. Here $np$ is not a constant $\lambda$; it grows with $n$.

Let $Y_n=X_1+\cdots+X_n \sim Bin(n,p)$.

Let

$$Z_n=\frac{Y_n-np}{\sqrt{np(1-p)}},$$

then $Z_n\rightarrow N(0,1)$ in distribution, by CLT.

Note: Unlike Poisson approximation, CLT (Normal approximation) works for any $X_i$ i.i.d as long as $Var(X_i)=\sigma^2<\infty$.
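The normal approximation can be checked the same way, by comparing the exact CDF of $Y_n\sim Bin(n,p)$ with the standard normal CDF evaluated at the standardized point $(k-np)/\sqrt{np(1-p)}$. This is a sketch with illustrative function names; no continuity correction is applied, so the gap reflects the raw CLT statement.

```python
import math


def binomial_pmf(n, p, k):
    """P(Y = k) for Y ~ Bin(n, p), computed on the log scale to avoid underflow."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


def std_normal_cdf(z):
    """CDF of N(0, 1), written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def max_cdf_gap(n, p):
    """Largest |P(Y_n <= k) - Phi((k - np) / sqrt(np(1-p)))| over k = 0..n."""
    mean = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    cdf = 0.0
    gap = 0.0
    for k in range(n + 1):
        cdf += binomial_pmf(n, p, k)
        gap = max(gap, abs(cdf - std_normal_cdf((k - mean) / sd)))
    return gap


if __name__ == "__main__":
    for n in (100, 400, 1_600):
        print(n, max_cdf_gap(n, p=0.3))
```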

$X_n\rightarrow X$ in distribution $\Leftrightarrow F_{X_n}(x)\rightarrow F_X(x)$ for all $x$

($X_n$ is a sequence of random variables, $X$ is a random variable)

Quick notes:

• If $F_X$ is not continuous, the convergence does not need to hold at those $x$ where $F_X$ is discontinuous.

• If $F_{X_n}(x)\rightarrow 0$ for all $x\in \mathbb{R}$, then $X_n$ does not converge in distribution, because there is no r.v. $X$ such that $F_X(x)=0$ for all $x$ (a CDF must tend to 1 as $x\rightarrow\infty$).

• Convergence in probability: here $X_n$ and $X$ must all live in the same probability space, because we need to evaluate $X_n - X$. $X_n\rightarrow X$ in probability if, for any $\varepsilon>0$, no matter how small, $P(\vert X_n-X\vert> \varepsilon)\rightarrow 0$ as $n\rightarrow\infty$.

• $X_n\rightarrow X$ almost surely if $P(\lim X_n=X) = 1$.
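The definition of convergence in probability can be made concrete with an example that is assumed here and was not worked in the lecture: take $X_n=Y_n/n$ with $Y_n\sim Bin(n,p)$, and the constant limit $X=p$. The sketch computes $P(\vert X_n-p\vert>\varepsilon)$ exactly from the binomial PMF; the function name and parameter values are illustrative.

```python
import math


def prob_far_from_p(n, p, eps):
    """Exact P(|Y_n / n - p| > eps) for Y_n ~ Bin(n, p), summing the PMF on the log scale."""
    total = 0.0
    for k in range(n + 1):
        if abs(k / n - p) > eps:
            log_pmf = (
                math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(p) + (n - k) * math.log(1.0 - p)
            )
            total += math.exp(log_pmf)
    return total


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        # For fixed eps this probability should shrink toward 0 as n grows.
        print(n, prob_far_from_p(n, p=0.3, eps=0.05))
```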