STT 861 Theory of Probability and Statistics I Lecture Note - 14
2017-12-06
Proof of the central limit theorem; the multivariate normal distribution and examples; review of the second half of the semester: Poisson approximation vs. normal approximation to the binomial distribution; miscellaneous notes on modes of convergence.
Portal to all the other notes
- Lecture 01 - 2017.09.06
- Lecture 02 - 2017.09.13
- Lecture 03 - 2017.09.20
- Lecture 04 - 2017.09.27
- Lecture 05 - 2017.10.04
- Lecture 06 - 2017.10.11
- Lecture 07 - 2017.10.18
- Lecture 08 - 2017.10.25
- Lecture 09 - 2017.11.01
- Lecture 10 - 2017.11.08
- Lecture 11 - 2017.11.15
- Lecture 12 - 2017.11.20
- Lecture 13 - 2017.11.29
- Lecture 14 - 2017.12.06 -> This post
Lecture 14 - Dec 06 2017
Proof of CLT (Central Limit Theorem)
This is the last lecture of this course.
Let $X_1, X_2, \ldots$ be i.i.d. with $E[X_i] = \mu$ and $\mathrm{Var}(X_i) = \sigma^2 < \infty$. WLOG (without loss of generality), assume $\mu = 0$, $\sigma = 1$ (otherwise replace $X_i$ by $(X_i - \mu)/\sigma$).
It is sufficient to show that the MGF of $S_n/\sqrt{n}$ converges to the MGF of $N(0,1)$ for all $t$ near 0,
where $S_n = X_1 + \cdots + X_n$.
We will use the following fundamental property of MGFs: for independent random variables the MGF of a sum is the product of the MGFs, and $M_{aX}(t) = M_X(at)$.
In our case: $M_{S_n/\sqrt{n}}(t) = \left[ M_{X_1}(t/\sqrt{n}) \right]^n$.
Then expand $M_{X_1}$ using Taylor's formula:
$$M_{X_1}(s) = 1 + s\,E[X_1] + \frac{s^2}{2} E[X_1^2] + o(s^2) = 1 + \frac{s^2}{2} + o(s^2).$$
(This is for $s$ near 0, using $E[X_1] = 0$ and $E[X_1^2] = \mathrm{Var}(X_1) = 1$.)
Therefore,
$$M_{X_1}\!\left(\frac{t}{\sqrt{n}}\right) = 1 + \frac{t^2}{2n} + o\!\left(\frac{t^2}{n}\right)$$
for small $t/\sqrt{n}$.
Next,
$$M_{S_n/\sqrt{n}}(t) = \left(1 + \frac{t^2}{2n} + o\!\left(\frac{t^2}{n}\right)\right)^{n}.$$
When $n \to \infty$,
$$M_{S_n/\sqrt{n}}(t) \to e^{t^2/2},$$
which is the MGF of $N(0,1)$.
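The limit in the last step can be checked numerically; a minimal sketch in Python (the value $t = 1.5$ is an arbitrary illustration, and the $o(t^2/n)$ term is dropped since it does not affect the limit):

```python
import math

# check that (1 + t^2/(2n))^n -> e^{t^2/2} as n grows, for a fixed t
t = 1.5
target = math.exp(t**2 / 2)
vals = {n: (1 + t**2 / (2 * n))**n for n in (10, 100, 10_000)}
print(target, vals)
```

The gap between the finite-$n$ value and $e^{t^2/2}$ shrinks as $n$ grows.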
Calculation of the MGF of $N(0,1)$: $M_Z(t) = E[e^{tZ}] = e^{t^2/2}$
(this part is in Homework 6).
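The theorem itself can also be illustrated by simulation; a sketch with NumPy, assuming Bernoulli(1/2) summands (any i.i.d. choice with finite variance works):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 1000, 100_000

# S_n = sum of n i.i.d. Bernoulli(1/2): mu = 1/2, sigma^2 = 1/4
s_n = rng.binomial(n, 0.5, size=reps)

# standardize: Z_n = (S_n - n*mu) / (sigma * sqrt(n))
z = (s_n - n * 0.5) / (0.5 * np.sqrt(n))

# the empirical CDF at 1 should be close to Phi(1) ~ 0.8413
prob = np.mean(z <= 1.0)
print(z.mean(), z.std(), prob)
```

The standardized sums have mean about 0, standard deviation about 1, and empirical probabilities matching the standard normal CDF.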
Multivariate Normal
Quick review from a new example.
Example 1
Let $X = (X_1, \ldots, X_n)$ be a normal vector. Assume $E[X] = \mu$ and $\mathrm{Cov}(X) = \Sigma$. Find the MGF of the entire vector, $M_X(t) = E\left[e^{t^T X}\right]$ for $t \in \mathbb{R}^n$.
Let $Y = t^T X = t_1 X_1 + \cdots + t_n X_n$. Since $X$ is multivariate normal, then $Y$ is normal. We just need to compute the expectation and variance of $Y$:
$$E[Y] = t^T \mu, \qquad \mathrm{Var}(Y) = t^T \Sigma t,$$
where $\mu = E[X]$ and $\Sigma = \mathrm{Cov}(X)$.
We have proved that
$$M_Y(s) = \exp\!\left(s m + \frac{s^2 v}{2}\right) \quad \text{for } Y \sim N(m, v),$$
where $m = t^T \mu$, $v = t^T \Sigma t$. Finally, the full form of the MGF of $X$ is:
$$M_X(t) = E\left[e^{t^T X}\right] = E\left[e^{Y}\right] = M_Y(1) = \exp\!\left(t^T \mu + \frac{1}{2} t^T \Sigma t\right).$$
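The closed form $M_X(t) = \exp(t^T \mu + \frac{1}{2} t^T \Sigma t)$ can be sanity-checked by Monte Carlo; a sketch with NumPy, where $\mu$, $\Sigma$, and $t$ are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)

# illustrative parameters (any mean vector / valid covariance matrix works)
mu = np.array([1.0, -0.5])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
t = np.array([0.2, 0.1])

# closed form: M_X(t) = exp(t.mu + 0.5 * t' Sigma t)
closed = np.exp(t @ mu + 0.5 * t @ Sigma @ t)

# Monte Carlo estimate of E[exp(t.X)]
X = rng.multivariate_normal(mu, Sigma, size=200_000)
mc = np.mean(np.exp(X @ t))
print(closed, mc)
```

The Monte Carlo average agrees with the closed form to well under a percent at this sample size.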
We can identify what it means for multivariate normal vectors to be independent of each other.
Write $X = (U, V)$, where $U = (X_1, \ldots, X_k)$ has length $k$ and $V = (X_{k+1}, \ldots, X_n)$ has length $n - k$. Then if $U$ is independent of $V$, all the cross-covariances $\mathrm{Cov}(U_i, V_j)$ are 0, so $\Sigma$ is block diagonal.
This also goes the other way for normal vectors: if a normal vector has a block diagonal covariance matrix, then the MGF factors into the two corresponding pieces, so the two subvectors are independent.
Really Important Example - Example 2
Let $X_1, \ldots, X_n$ be i.i.d. $N(\mu, \sigma^2)$. Let $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ (empirical mean).
- What is the covariance $\mathrm{Cov}(X_i, \bar{X})$?
Answer: $\mathrm{Cov}(X_i, \bar{X}) = \frac{1}{n} \sum_{j=1}^{n} \mathrm{Cov}(X_i, X_j) = \frac{\sigma^2}{n}$.
- Forget the above one. Next, what is $\mathrm{Cov}(X_i - \bar{X}, \bar{X})$ (this is the actual important question)?
$$\mathrm{Cov}(X_i - \bar{X}, \bar{X}) = \mathrm{Cov}(X_i, \bar{X}) - \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} - \frac{\sigma^2}{n} = 0.$$
The pair $(X_i - \bar{X}, \bar{X})$ is bivariate normal, since the original vector is multivariate normal; moreover $X_i - \bar{X}$ and $\bar{X}$ are independent because their covariance is 0.
- Next part of the example: let the empirical variance be $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$. What's the relation between $S^2$ and $\bar{X}$?
Since $X_i - \bar{X}$ and $\bar{X}$ are independent for each fixed $i$, this is still true of the whole vector $(X_1 - \bar{X}, \ldots, X_n - \bar{X})$ and $\bar{X}$, because all the covariances are 0 and the joint vector is multivariate normal.
Therefore, the entire vector $(X_1 - \bar{X}, \ldots, X_n - \bar{X})$ is independent of $\bar{X}$. In fact, any function of $(X_1 - \bar{X}, \ldots, X_n - \bar{X})$ is independent of $\bar{X}$. In particular, $S^2$ is independent of $\bar{X}$.
Also,
$$\sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2_n,$$
since the $\frac{X_i - \mu}{\sigma}$ are all i.i.d. $N(0,1)$.
This is the Chi-Square distribution with $n$ degrees of freedom; replacing $\mu$ by $\bar{X}$ costs one degree of freedom, giving $\frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n} \left(\frac{X_i - \bar{X}}{\sigma}\right)^2 \sim \chi^2_{n-1}$.
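Both conclusions of this example (independence of $\bar{X}$ and $S^2$, and the $\chi^2_{n-1}$ law of $(n-1)S^2/\sigma^2$) can be checked by simulation; a sketch with NumPy, where $n = 5$, $\mu = 1$, $\sigma = 2$ are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 5, 200_000
mu, sigma = 1.0, 2.0

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)                  # empirical mean
s2 = x.var(axis=1, ddof=1)             # empirical variance S^2

# independence implies zero correlation between Xbar and S^2
r = np.corrcoef(xbar, s2)[0, 1]

# (n-1) S^2 / sigma^2 ~ chi^2_{n-1}: mean n-1 = 4, variance 2(n-1) = 8
q = (n - 1) * s2 / sigma**2
print(r, q.mean(), q.var())
```

The sample correlation is near 0, and the mean and variance of $(n-1)S^2/\sigma^2$ match those of $\chi^2_4$.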
Review of the Second Part of the Semester
When to use Poisson and when to use Normal to approximate a series of Bernoulli trials?
Poisson Approximation Case
$X_i$ i.i.d. $\mathrm{Bernoulli}(p)$. For example, $X_i = 1$ if voter $i$ votes for Dr. Jill Stein (or Gary Johnson), $X_i = 0$ otherwise.
$p$ is small. What if $p = \lambda/n$, where $n$ is the # of trials?
Then $S_n = X_1 + \cdots + X_n \sim \mathrm{Bin}(n, p)$,
$$S_n \xrightarrow{d} \mathrm{Poisson}(\lambda) \quad \text{as } n \to \infty.$$
Since the average # of successes $E[S_n] = np = \lambda$ is constant, the limit is $\mathrm{Poisson}(\lambda)$. The range of the Poisson limit is $\{0, 1, 2, \ldots\}$.
Note: $\mathrm{Var}(S_n) = np(1-p) \to \lambda$. If $np \to \lambda$ rather than $np = \lambda$ exactly, the approximation still holds.
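A quick numerical comparison of the two pmfs, using only the standard library ($\lambda = 3$ and $n = 1000$ are illustrative choices):

```python
import math

lam, n = 3.0, 1000
p = lam / n

def binom_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam**k / math.factorial(k)

# Bin(n, lambda/n) should be close to Poisson(lambda) pointwise
for k in range(6):
    print(k, binom_pmf(k, n, p), poisson_pmf(k, lam))
```

At $n = 1000$ the two pmfs agree to about three decimal places at every point.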
Normal Approximation Case
It works for the Binomial and many other "partial sums" (like the Negative Binomial).
In the Binomial case, assume $n \to \infty$ and $p \in (0, 1)$ is fixed. Here $np \to \infty$, not constant.
Let $S_n = X_1 + \cdots + X_n \sim \mathrm{Bin}(n, p)$.
Let
$$Z_n = \frac{S_n - np}{\sqrt{np(1-p)}};$$
then $Z_n \xrightarrow{d} N(0, 1)$, by CLT.
Note: Unlike the Poisson approximation, CLT (Normal approximation) works for any i.i.d. $X_i$ as long as $\mathrm{Var}(X_i) < \infty$.
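A sketch comparing the exact Binomial CDF with its normal approximation, using only the standard library ($n = 400$, $p = 0.5$, $k = 210$ are illustrative choices):

```python
import math

n, p, k = 400, 0.5, 210

def phi(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# exact P(S_n <= k) for S_n ~ Bin(n, p)
exact = sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

# CLT approximation: Phi((k - np) / sqrt(np(1-p)))
approx = phi((k - n * p) / math.sqrt(n * p * (1 - p)))
print(exact, approx)
```

A continuity correction (using $k + 0.5$ in place of $k$) tightens the approximation further.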
$X_n \to X$ in distribution if $F_{X_n}(x) \to F_X(x)$ for all $x$.
($X_n$ is a sequence of random variables, $X$ is a random variable.)
Quick notes:
- If $F_X$ is not continuous, the convergence does not need to hold for those $x$'s where $F_X$ is not continuous.
- If there exists a sequence $X_n$ such that $F_{X_n}(x) \to 0$ for all $x$ (e.g. $X_n = n$, whose mass escapes to $+\infty$), then $X_n$ does not converge in distribution, because there is no r.v. $X$ such that $F_X(x) = 0$ all the time.
- Convergence in probability: $X_n \to X$ in probability requires that $X_n$ and $X$ all live in the same probability space, because we need to evaluate $P(|X_n - X| > \epsilon)$. And we ask that, for any $\epsilon > 0$ no matter how small, $P(|X_n - X| > \epsilon)$ is really small when $n$ is large, i.e. $P(|X_n - X| > \epsilon) \to 0$ as $n \to \infty$.
- $X_n \to X$ almost surely if $P\!\left(\lim_{n \to \infty} X_n = X\right) = 1$.
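The weak law of large numbers gives a concrete instance of convergence in probability; a simulation sketch with NumPy, using Uniform(0,1) samples (so $\mu = 1/2$) and an illustrative $\epsilon = 0.05$:

```python
import numpy as np

rng = np.random.default_rng(3)
eps, reps = 0.05, 5_000

# estimate P(|Xbar_n - mu| > eps) for growing n; it should shrink toward 0
probs = {}
for n in (10, 100, 1000):
    xbar = rng.uniform(0, 1, size=(reps, n)).mean(axis=1)
    probs[n] = np.mean(np.abs(xbar - 0.5) > eps)
print(probs)
```

The estimated probability decreases in $n$ and is essentially 0 by $n = 1000$, as convergence in probability requires.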