STT 861 Theory of Prob and STT I Lecture Note - 14

2017-12-06

Proof of the central limit theorem; multivariate normal distribution and examples; review of the second half of the semester: Poisson approximation vs. normal approximation to the binomial distribution; miscellaneous notes on convergence.

Lecture 14 - Dec 06 2017

Proof of CLT (Central Limit Theorem)

This is the last lecture of this course.

Let $X_1,\ldots,X_n$ be i.i.d. random variables with $\mu=E(X_i)$ and $\sigma^2=Var(X_i)$. WLOG (without loss of generality), $\mu=0$ and $\sigma^2=1$; otherwise replace $X_i$ by $(X_i-\mu)/\sigma$.

Claim: $S_n=\frac{\sum_{i=1}^{n}X_i}{\sqrt{n}} \rightarrow N(0,1)$ in distribution.

It is sufficient to show that the MGF of $S_n$ converges to the MGF of $Normal(0,1)$ for all $t$ near 0.

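\begin{align*} M_{S_n}(t)=E(e^{tS_n})&=E\Big(\prod_{i=1}^{n}e^{\frac{t}{\sqrt{n}} X_i}\Big) \\ &=\prod_{i=1}^{n} E\big(e^{\frac{t}{\sqrt{n}}X_i}\big) \quad \text{(by independence)} \\ &=\Big(E\big(e^{\frac{t}{\sqrt{n}}X_1}\big)\Big)^n \quad \text{(identically distributed)} \\ &=\big(M_{X_1}(u)\big)^n \end{align*}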

where $u=t/\sqrt{n}$.

We will use the following fundamental property of MGF.

$\frac{d}{du}M_X(0)=E(\frac{d}{du}e^{uX})|_{u=0}=E(X)$, $\frac{d^2}{du^2}M_X(0)=E(X^2)$, and in general $\frac{d^p}{du^p}M_X(0)=E(X^p)$.

In our case: $E(X_1)=0$, $E(X_1^2)=1$.

Then expand $M_X(u)$ using Taylor’s formula.

$M_{X_1}(u)=M_{X_1}(0) + \frac{dM_{X_1}(0)}{du}u + \frac{1}{2}\frac{d^2M_{X_1}(0)}{du^2}u^2+\cdots$

(This is for $u$ near 0. )

Therefore,

$M_{X_1}(u)=1+\frac{1}{2}u^2+O(u^3)$

for small $u$.

Next,

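$M_{S_n}(t)=\big(1+\frac{1}{2}(\frac{t}{\sqrt{n}})^2+\varepsilon(u)\big)^n=\big(1+\frac{t^2}{2n}+\varepsilon(u)\big)^n$

where $\varepsilon(u)=O(u^3)=O(n^{-3/2})$ is the Taylor remainder. Since $(1+\frac{a}{n}+o(\frac{1}{n}))^n\rightarrow e^a$, when $n\rightarrow\infty$,

$M_{S_n}(t) \rightarrow \exp(\frac{1}{2}t^2)$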

which is the MGF of N(0,1).
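
As a quick numerical sanity check of this limit, here is a minimal sketch under one concrete assumption that is not from the lecture: $X_i=\pm 1$ with probability $1/2$ each, so that $\mu=0$, $\sigma^2=1$ and $M_{X_1}(u)=\cosh(u)$. The function names `mgf_sn` and `mgf_std_normal` are hypothetical.

```python
import math

def mgf_sn(t, n):
    """MGF of S_n = (X_1 + ... + X_n) / sqrt(n) at t.

    Assumption (illustration only): X_i = +1 or -1 with probability 1/2
    each, so M_{X_1}(u) = cosh(u) and M_{S_n}(t) = cosh(t / sqrt(n)) ** n.
    """
    u = t / math.sqrt(n)
    return math.cosh(u) ** n

def mgf_std_normal(t):
    """MGF of N(0, 1): exp(t^2 / 2)."""
    return math.exp(0.5 * t * t)

# Print M_{S_n}(t) next to the N(0,1) MGF for growing n.
t = 1.0
for n in (1, 10, 100, 10_000, 1_000_000):
    print(n, mgf_sn(t, n), mgf_std_normal(t))
```

The gap between the two printed columns should shrink as $n$ grows, which is what the proof predicts for this particular choice of $X_i$.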

Calculation of the MGF of $Normal(0,\sigma^2)$:

\begin{align*} M_Z(t) & = E(e^{tZ}) \\ &= \int_{-\infty}^{\infty} \exp(zt) \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big(-\frac{z^2}{2\sigma^2}\Big)dz \\ &= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big(-\frac{1}{2\sigma^2}(z^2-2\sigma^2tz+\sigma^4t^2-\sigma^4t^2)\Big)dz \\ &= \exp\Big(\frac{1}{2}\sigma^2t^2\Big)\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big(-\frac{1}{2\sigma^2}(z-\sigma^2t)^2\Big) dz \\ &= \exp\Big(\frac{1}{2}\sigma^2t^2\Big) \end{align*}

(This part is in Homework 6.)
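
The closed form $\exp(\frac{1}{2}\sigma^2t^2)$ can be checked numerically. The sketch below integrates $e^{tz}$ against the $N(0,\sigma^2)$ density with a simple midpoint rule; the function names, the integration window and the values of $t$ and $\sigma$ are illustrative assumptions.

```python
import math

def normal_mgf_numeric(t, sigma, half_width=12.0, steps=20_000):
    """Approximate E[exp(t Z)] for Z ~ N(0, sigma^2) with a midpoint rule.

    The integrand exp(t z) * density(z) is a Gaussian bump centred at
    sigma^2 * t, so we integrate over that centre +/- half_width * sigma.
    """
    centre = sigma * sigma * t
    lo = centre - half_width * sigma
    hi = centre + half_width * sigma
    h = (hi - lo) / steps
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma * sigma)
    total = 0.0
    for k in range(steps):
        z = lo + (k + 0.5) * h  # midpoint of the k-th subinterval
        total += math.exp(t * z - z * z / (2.0 * sigma * sigma))
    return norm * total * h

def normal_mgf_closed(t, sigma):
    """Closed form derived above: exp(sigma^2 t^2 / 2)."""
    return math.exp(0.5 * sigma * sigma * t * t)

print(normal_mgf_numeric(0.7, 1.5), normal_mgf_closed(0.7, 1.5))
```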

Multivariate Normal

Quick review from a new example.

Example 1

Let $X=(X_1,X_2,\ldots,X_k)$ be a normal vector. Assume $E(X_i)=\mu_i$ and $Cov(X_i,X_j)=\Sigma_{ij}$. Find the MGF of the entire vector.

$M_X(t_1,...,t_k)=E(\exp (\sum_{i=1}^{k}t_iX_i))$

Let $Y=\sum_{i=1}^{k}t_iX_i$. Since $X$ is multivariate normal, $Y$ is normal. We just need to compute the expectation and variance of $Y$, which we call $\mu$ and $\sigma^2$.

\begin{align*} E(Y) & = \sum_{i=1}^{k} t_iE(X_i) = \sum_{i=1}^{k}t_i\mu_i =: \mu \\ Var(Y)&= \sum_{i=1}^{k}\sum_{j=1}^{k}Cov(t_iX_i,t_jX_j) \\ &=\sum_{i=1}^{k}\sum_{j=1}^{k} t_it_j \Sigma_{ij} =: \sigma^2\\ M_X(t)&=E(e^Y)=E(e^{\mu+\sigma Z}) \end{align*}

where $Z\sim N(0,1)$.

$M_X(t) = e^\mu E(e^{\sigma Z}) = e^\mu M_Z(\sigma) = e^{\mu+\frac{1}{2}\sigma^2}$

We have proved that

$M_X(t) = e^{\mu+\frac{1}{2}\sigma^2}$

where $\mu=\sum_{i=1}^{k}\mu_it_i$ and $\sigma^2=t^T\Sigma t$. Finally, the full form of the MGF of $X$ is:

$M_X(t) = \exp(\sum_{i=1}^{k}\mu_it_i + \frac{1}{2}t^T\Sigma t)$

We can identify what it means for multivariate normal vectors to be independent of each other.

Write $X=(X^{(A)}, X^{(B)})^T$ with length $k=k_A+k_B$. If $X^{(A)}$ is independent of $X^{(B)}$, then

$\Sigma= \begin{pmatrix} \Sigma^{(A)} & 0\\ 0 & \Sigma^{(B)} \end{pmatrix}$

Conversely, the MGF formula shows that if a normal vector has a block-diagonal covariance matrix, then $M_X(t)$ factors into a function of $t^{(A)}$ times a function of $t^{(B)}$, so the two corresponding subvectors are independent.
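
To make the factorization concrete, here is a small sketch that evaluates the multivariate normal MGF formula for a made-up 3-dimensional example with a block-diagonal covariance and compares it with the product of the two block MGFs. The helper `mvn_mgf` and all numbers are hypothetical choices for illustration.

```python
import math

def mvn_mgf(t, mu, cov):
    """MGF of a multivariate normal: exp(sum_i mu_i t_i + 0.5 * t^T cov t)."""
    k = len(t)
    linear = sum(mu[i] * t[i] for i in range(k))
    quad = sum(t[i] * cov[i][j] * t[j] for i in range(k) for j in range(k))
    return math.exp(linear + 0.5 * quad)

# Illustrative example: block A = (X_1, X_2), block B = (X_3).
mu = [1.0, -0.5, 2.0]
cov = [
    [2.0, 0.6, 0.0],
    [0.6, 1.0, 0.0],
    [0.0, 0.0, 0.5],
]
t = [0.3, -0.2, 0.4]

joint = mvn_mgf(t, mu, cov)
block_a = mvn_mgf(t[:2], mu[:2], [row[:2] for row in cov[:2]])
block_b = mvn_mgf(t[2:], mu[2:], [row[2:] for row in cov[2:]])

# With a block-diagonal covariance the joint MGF equals the product.
print(joint, block_a * block_b)
```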

Really Important Example - Example 2

Let $X=(X_1, X_2,\ldots,X_n)$ be i.i.d. $N(\mu,\sigma^2)$. Let $\bar{X}=\frac{1}{n}(X_1+X_2+\cdots+X_n)$ (empirical mean).

• What is the covariance $Cov(\bar{X}, X_i)$?

\begin{align*} Cov(\bar{X}, X_i) & =E\Big((\bar{X}-\mu)(X_i-\mu)\Big) \\ &=\frac{1}{n}E\Big(\big(\sum_{j=1}^{n}(X_j-\mu)\big)(X_i-\mu)\Big) \\ &=\frac{1}{n}\sum_{j=1}^{n}E\Big((X_j-\mu)(X_i-\mu)\Big)\\ &=\frac{1}{n}\Big(\sum_{j\neq i} 0 + \sigma^2\Big) = \frac{\sigma^2}{n} \end{align*}
• Next, the actually important question: what is $Cov(\bar{X}, X_i-\bar{X})$?
\begin{align*} Cov(\bar{X},X_i-\bar{X}) & = Cov(\bar{X}, X_i) -Cov(\bar{X}, \bar{X})\\ &= Cov(\bar{X}, X_i) - Var(\bar{X}) \\ &= \frac{\sigma^2}{n} - \frac{\sigma^2}{n} = 0 \end{align*}

The pair $(\bar{X}, X_i-\bar{X})$ is bivariate normal, since it is a linear transformation of the multivariate normal vector $X$. Because their covariance is 0, $\bar{X}$ and $X_i-\bar{X}$ are independent.
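
The two covariances can also be verified exactly by writing $\bar{X}$ and $X_i-\bar{X}$ as linear combinations $a\cdot X$ and $b\cdot X$ and using $Cov(a\cdot X, b\cdot X)=\sigma^2\sum_j a_jb_j$ for i.i.d. components. The sketch below does this with exact fractions; `cov_linear`, the sample size and the variance value are illustrative assumptions.

```python
from fractions import Fraction

def cov_linear(a, b, sigma2):
    """Cov(a.X, b.X) for i.i.d. X_j with variance sigma2: sigma2 * sum_j a_j b_j."""
    return sigma2 * sum(x * y for x, y in zip(a, b))

n = 5
i = 2                  # any fixed index (0-based here)
sigma2 = Fraction(3)   # arbitrary illustrative variance

mean_w = [Fraction(1, n)] * n                        # X_bar = mean_w . X
unit_i = [Fraction(int(j == i)) for j in range(n)]   # X_i = unit_i . X
resid_w = [u - m for u, m in zip(unit_i, mean_w)]    # X_i - X_bar = resid_w . X

cov_mean_xi = cov_linear(mean_w, unit_i, sigma2)       # equals sigma^2 / n
cov_mean_resid = cov_linear(mean_w, resid_w, sigma2)   # equals 0

print(cov_mean_xi, cov_mean_resid)
```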

• Next part of the example, let $S^2=$ empirical variance $=\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2$. What’s the relation between $\bar{X}$ and $S^2$?

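Since $X_i-\bar{X}$ and $\bar{X}$ are independent for fixed $i$, the same is true of $(X_i-\bar{X})^2$ and $\bar{X}$.

Moreover, the vector $(\bar{X}, X_1-\bar{X},\ldots,X_n-\bar{X})$ is jointly normal (a linear transformation of $X$), and every cross-covariance $Cov(\bar{X}, X_i-\bar{X})$ is 0. By the block-diagonal criterion, the entire vector $Y=(X_i-\bar{X})^n_{i=1}$ is independent of $\bar{X}$. Hence any function of $Y$ is independent of $\bar{X}$. In particular, $S^2$ is independent of $\bar{X}$.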

Also,

$\frac{(n-1)S^2}{\sigma^2}=\sum_{i=1}^{n}(\frac{X_i-\bar{X}}{\sigma})^2$

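Here the $X_i$ are i.i.d., but the summands $\frac{X_i-\bar{X}}{\sigma}$ are not independent, since they sum to zero. This single linear constraint is why one degree of freedom is lost.

This is the Chi-Square distribution $\chi^2(n-1)$.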
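
A small simulation can illustrate both claims. The sketch below draws many normal samples and looks at the mean and variance of $\frac{(n-1)S^2}{\sigma^2}$ (a $\chi^2(n-1)$ variable has mean $n-1$ and variance $2(n-1)$) and at the sample correlation between $\bar{X}$ and $S^2$. A near-zero correlation is only consistent with independence, not a proof of it. The sample size, parameters and seed are arbitrary assumptions.

```python
import random

def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Empirical variance with the 1/(n-1) normalisation, as in S^2."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def correlation(xs, ys):
    """Plain sample correlation coefficient."""
    mx, my = sample_mean(xs), sample_mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate(n=5, mu=1.0, sigma=2.0, reps=20_000, seed=0):
    """Return lists of X_bar and (n-1) S^2 / sigma^2, one entry per replication."""
    rng = random.Random(seed)
    means, scaled_vars = [], []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(sample_mean(xs))
        scaled_vars.append((n - 1) * sample_variance(xs) / sigma ** 2)
    return means, scaled_vars

means, scaled_vars = simulate()
print("mean of (n-1)S^2/sigma^2:    ", sample_mean(scaled_vars))      # theory: n - 1 = 4
print("variance of (n-1)S^2/sigma^2:", sample_variance(scaled_vars))  # theory: 2(n - 1) = 8
print("corr(X_bar, S^2):            ", correlation(means, scaled_vars))
```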

Review of the Second Part of the Semester

When to use Poisson and when to use Normal to approximate a series of Bernoulli trials?

Poisson Approximation Case

$X_i \sim Bernoulli(p)$ i.i.d. For example, $X_i=1$ if voter $i$ votes for Dr. Jill Stein (or Gary Johnson), and $X_i = 0$ otherwise.

$p=P(X_i=1)$ is small. What if $p=\lambda/n$, where $n$ is the number of trials?

Then $Y_n=X_1+X_2+\cdots + X_n$, $Y_n\sim Bin (n,p)$,

$E(Y_n) = np=\lambda$

Since the average number of successes is the constant $\lambda$, $Bin(n,p)\approx Poisson(\lambda)$. Range: $0.2 \leq \lambda \leq 10$.

Note: $Var(Y_n)=np(1-p) = \lambda(1-\frac{\lambda}{n}) \approx\lambda$. If $p=\frac{\lambda}{n}+o(\frac{1}{n})$, then $np\rightarrow\lambda$ and $Bin(n,p) \approx Poisson(\lambda)$ still holds.
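
To see the quality of the Poisson approximation, the sketch below computes the total variation distance between $Bin(n,\lambda/n)$ and $Poisson(\lambda)$ for increasing $n$. The function names, the value $\lambda=3$ and the truncation point `k_max` are assumptions made for illustration.

```python
import math

def binom_pmf(k, n, p):
    """Binomial(n, p) pmf at k, computed on the log scale to avoid overflow."""
    if k < 0 or k > n:
        return 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def poisson_pmf(k, lam):
    """Poisson(lam) pmf at k, also on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def tv_distance(n, lam, k_max=100):
    """Total variation distance between Bin(n, lam/n) and Poisson(lam),
    truncated to k = 0..k_max (the neglected tail is tiny for small lam)."""
    p = lam / n
    return 0.5 * sum(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam))
                     for k in range(k_max + 1))

for n in (10, 100, 1000):
    print(n, tv_distance(n, 3.0))
```

The distance should decrease as $n$ grows with $\lambda$ held fixed, matching the statement that the approximation is good when $p$ is small.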

Normal Approximation Case

It works for Binomial and many other "partial sums" (like Negative Binomial).

In the Binomial case, assume $n\rightarrow \infty$ with $p$ fixed. Here $np$ is not a constant $\lambda$; it grows with $n$.

Let $Y_n=X_1+\cdots+X_n \sim Bin(n,p)$. Then

$E(Y_n) = np\rightarrow\infty$, $Var(Y_n)= np(1-p)\rightarrow\infty$.

Let

$Z_n=\frac{Y_n-np}{\sqrt{np(1-p)}}$

then $Z_n\rightarrow N(0,1)$ in distribution, by CLT.

Note: Unlike Poisson approximation, CLT (Normal approximation) works for any i.i.d. $X_i$ as long as $Var(X_i)=\sigma^2<\infty$.
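
The normal approximation can be checked in the same spirit. The sketch below compares the exact CDF of the standardized binomial $Z_n$ with the standard normal CDF $\Phi$ at the points $z_k=\frac{k-np}{\sqrt{np(1-p)}}$ and reports the largest gap; no continuity correction is applied. The function names and the choice $p=0.3$ are illustrative assumptions.

```python
import math

def std_normal_cdf(z):
    """Phi(z), written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_pmf(k, n, p):
    """Binomial(n, p) pmf at k, computed on the log scale to avoid overflow."""
    if k < 0 or k > n:
        return 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def max_cdf_error(n, p):
    """Max over k of |P(Z_n <= z_k) - Phi(z_k)|, z_k = (k - np) / sqrt(np(1-p))."""
    mean = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    cdf = 0.0
    worst = 0.0
    for k in range(n + 1):
        cdf += binom_pmf(k, n, p)   # running value of P(Y_n <= k)
        z = (k - mean) / sd
        worst = max(worst, abs(cdf - std_normal_cdf(z)))
    return worst

for n in (100, 1000, 10_000):
    print(n, max_cdf_error(n, 0.3))
```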

$X_n\rightarrow X$ in distribution $\Leftrightarrow F_{X_n}(x)\rightarrow F_X(x)$ for all $x$

($X_n$ is a sequence of random variables, $X$ is a random variable)

Quick notes:

• If $F_X$ is not continuous, the convergence does not need to hold at those $x$ where $F_X$ is discontinuous.

• If $F_{X_n}(x)\rightarrow 0$ for all $x\in \mathbb{R}$, then $X_n$ does not converge in distribution, because there is no r.v. $X$ such that $F_X(x)=0$ for all $x$ (a CDF must tend to 1 as $x\rightarrow\infty$).

• Convergence in probability: $X_n$ and $X$ must all live on the same probability space, because we need to evaluate $X_n - X$. Then $X_n\rightarrow X$ in probability if, for any $\varepsilon>0$ no matter how small, $P(\vert X_n-X\vert> \varepsilon)\rightarrow 0$ as $n\rightarrow\infty$.

• $X_n\rightarrow X$ almost surely if $P(\lim X_n=X) = 1$.
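
As an illustration of convergence in probability, consider an assumed example that is not from the lecture: $X_n=Y_n/n$ with $Y_n\sim Bin(n,p)$, and the constant limit $X=p$. The sketch below computes $P(\vert X_n-p\vert>\varepsilon)$ exactly from the binomial pmf for a few values of $n$. The function names and the default values $p=1/2$, $\varepsilon=1/20$ are hypothetical choices.

```python
import math
from fractions import Fraction

def binom_pmf(k, n, p):
    """Binomial(n, p) pmf at k, computed on the log scale to avoid overflow."""
    if k < 0 or k > n:
        return 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def deviation_prob(n, p=Fraction(1, 2), eps=Fraction(1, 20)):
    """P(|Y_n / n - p| > eps) for Y_n ~ Bin(n, p).

    p and eps are Fractions so that the comparison |k/n - p| > eps is exact
    (no floating-point trouble when k/n - p lands exactly on eps).
    """
    p_float = float(p)
    return sum(binom_pmf(k, n, p_float)
               for k in range(n + 1)
               if abs(Fraction(k, n) - p) > eps)

for n in (100, 1000, 10_000):
    print(n, deviation_prob(n))
```

For this example the printed probabilities should shrink toward 0 as $n$ grows, which is exactly the condition in the definition of convergence in probability.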