STT 861 Theory of Probability and Statistics I Lecture Note - 9
2017-11-01
Review of the important concepts from previous sections; moment generating function; Gamma distribution; chi-square distribution.
Portal to all the other notes
- Lecture 01 - 2017.09.06
- Lecture 02 - 2017.09.13
- Lecture 03 - 2017.09.20
- Lecture 04 - 2017.09.27
- Lecture 05 - 2017.10.04
- Lecture 06 - 2017.10.11
- Lecture 07 - 2017.10.18
- Lecture 08 - 2017.10.25
- Lecture 09 - 2017.11.01 -> This post
- Lecture 10 - 2017.11.08
- Lecture 11 - 2017.11.15
- Lecture 12 - 2017.11.20
- Lecture 13 - 2017.11.29
- Lecture 14 - 2017.12.06
Lecture 09 - Nov 01 2017
Quick Review Session (For the mid-term exam)
Bayes’ theorem
Suppose we have data: an event $B$ that happened.
Possible outcomes: $A_1,A_2,…,A_n$.
Model for each $A_i$: $P(A_i)$ is given. This is the “prior” model.
Model for the relation between each $A_i$ and $B$: $P(B|A_i)$ is given. This is the “likelihood” model.
Theorem:
\[P(A_i|B)=\frac{P(B|A_i)P(A_i)}{\sum_{j=1}^{n}P(B|A_j)P(A_j)}\]
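A minimal Python sketch of this formula (not from the lecture; the prior and likelihood values below are made-up numbers for illustration only):

```python
# Minimal sketch of Bayes' theorem with made-up numbers (illustration only).
# priors[i] = P(A_i), likelihoods[i] = P(B | A_i); returns P(A_i | B) for each i.

def posterior(priors, likelihoods):
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)            # P(B) by the law of total probability
    return [j / total for j in joint]

# Hypothetical example with three models A_1, A_2, A_3.
print(posterior([0.5, 0.3, 0.2], [0.1, 0.4, 0.8]))
# -> [0.1515..., 0.3636..., 0.4848...]
```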
Quick Example (the Chevalier de Méré example in Note 3).
$P$(at least one six in 4 rolls of a die) = 1 - $P$(no six in 4 rolls of a die) = $1-(\frac{5}{6})^4\approx 0.5177$
$P$(at least one double-six in 24 rolls of 2 dice) = $1-(\frac{35}{36})^{24}\approx 0.4914$
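A one-line numerical check of these two numbers (a quick sketch, not part of the original notes):

```python
# Verify the two Chevalier de Méré probabilities quoted above.
p_one_six = 1 - (5 / 6) ** 4          # at least one six in 4 rolls
p_double_six = 1 - (35 / 36) ** 24    # at least one double-six in 24 rolls of 2 dice
print(round(p_one_six, 4), round(p_double_six, 4))  # 0.5177 0.4914
```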
Discrete and continuous variables
Discrete r.v.’s: $P(X=x_k)=p_k$. $E(X)=\sum_{k}x_kp_k$.
Continuous case: $P(a\leq X\leq b)=\int_{a}^{b}f(x)dx$. $E(X)=\int_{-\infty}^{\infty}xf(x)dx$.
Linearity
- $E(\sum a_iX_i)=\sum a_iE(X_i)$.
- If the $X_i$’s are independent, then $Var(\sum a_iX_i)=\sum a_i^2Var(X_i)$.
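A small simulation sketch of the variance formula for independent $X_i$ (assuming numpy; the coefficients and the three distributions are arbitrary choices):

```python
import numpy as np

# Check Var(sum a_i X_i) = sum a_i^2 Var(X_i) for independent X_i (simulation sketch).
rng = np.random.default_rng(0)
a = np.array([2.0, -1.0, 0.5])
# Three independent samples: Exponential(mean 1), Uniform(0,1), Normal(0,1).
X = np.stack([rng.exponential(1.0, 100_000),
              rng.uniform(0.0, 1.0, 100_000),
              rng.standard_normal(100_000)])
S = a @ X                                             # sum_i a_i X_i, sample by sample
theory = np.sum(a**2 * np.array([1.0, 1/12, 1.0]))    # sum a_i^2 Var(X_i)
print(S.var(), theory)                                # both close to 4.333
```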
Chebyshev and Weak law of large numbers
Let $X$ be a r.v. such that $Var(X)$ exists. Then
\[P(|X-E(X)|>\varepsilon)\leq \frac{Var(X)}{\varepsilon^2}\]This is true no matter how small $\varepsilon>0$ is.
Apply this to $\bar{X}=\frac{1}{n}\sum_{i=1}^{n} X_i$, where the $X_i$’s are i.i.d. and $Var(X_i)<\infty$.
Note $E(\bar{X})=\mu=E(X_i)$ and $Var(\bar{X})=\frac{\sigma^2}{n}$.
By Chebyshev:
\[P(|\bar{X}-\mu|>\varepsilon)\leq \frac{\sigma^2}{n\varepsilon^2}\]As $n\rightarrow \infty$, this probability $\rightarrow0$.
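A simulation sketch of the weak law (assuming numpy; Bernoulli(0.5) is an arbitrary choice here, so $\mu=0.5$ and $\sigma^2=0.25$):

```python
import numpy as np

# Weak law of large numbers: P(|Xbar - mu| > eps) shrinks, bounded by sigma^2 / (n eps^2).
rng = np.random.default_rng(1)
mu, sigma2, eps = 0.5, 0.25, 0.05
for n in [100, 1_000, 10_000]:
    xbar = rng.binomial(n, 0.5, size=5_000) / n     # 5000 sample means of n Bernoulli(0.5)
    prob = np.mean(np.abs(xbar - mu) > eps)         # empirical P(|Xbar - mu| > eps)
    print(n, prob, sigma2 / (n * eps**2))           # Chebyshev bound on the right
```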
Special discrete distributions
- Bernoulli: $X_1\sim Ber(p)$, $E(X_1)=p$, $Var(X_1)=p(1-p)$.
- Binomial: $X_2=X_{11}+X_{12}+\cdots+X_{1n}\sim Bin(n,p)$, $E(X_2)=np$, $Var(X_2)=np(1-p)$.
- Geometric: $X_3\sim Geom(p)$, $E(X_3)=\frac{1}{p}$, $Var(X_3)=\frac{1-p}{p^2}$.
- Hypergeometric: pass.
- Multinomial: pass.
- Negative binomial: $X_6\sim NB(r,p)$, the sum of $r$ i.i.d. Geometric($p$) random variables. $E(X_6)=\frac{r}{p}$, $Var(X_6)=\frac{r(1-p)}{p^2}$.
- Poisson: $X_7\sim Poi(\lambda)$ with $P(X_7=k)=e^{-\lambda}\frac{\lambda^k}{k!}$, $E(X_7)=\lambda$, $Var(X_7)=\lambda$. It models the # of arrivals in a fixed interval of time, where $\lambda$ is the average frequency of arrivals.
- Property: let $N_1\sim Poi(\lambda_1)$ and $N_2\sim Poi(\lambda_2)$, with $N_1$ and $N_2$ independent. Then $N=N_1+N_2\sim Poi(\lambda_1+\lambda_2)$.
- Property 2: assume each arrival falls into one of $k$ different categories. It turns out that if the total # of arrivals $N\sim Poi(\lambda)$, the category of each arrival is independent of $N$, and $P(\text{arrival is category } i)=p_i$, then $N_i$, the # of arrivals of category $i$, satisfies $N_i\sim Poi(\lambda p_i)$. (Both properties are checked numerically in the sketch after this list.)
- More: the relation between Poisson and Exponential.
- Let $X\sim Exp(\lambda)$; the density is $f(x)=\lambda e^{-\lambda x}$ for $x\geq 0$. Let the $X_i$ be i.i.d. $Exp(\lambda)$ and let $N(t)$ be the # of arrivals in the time interval $[0,t]$. Assume $N$ is a Poisson process. Then $X_i$ is a model for the amount of time between the $(i-1)$th and the $i$th arrivals.
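A simulation sketch of the two Poisson properties above (assuming numpy; the $\lambda$ values and the category probability are arbitrary):

```python
import numpy as np

# Property: the sum of independent Poissons is Poisson(lambda_1 + lambda_2).
# Property 2 (thinning): keeping each of Poisson(lambda) arrivals with prob p gives Poisson(lambda * p).
rng = np.random.default_rng(2)
lam1, lam2, p = 3.0, 5.0, 0.25

N1 = rng.poisson(lam1, 100_000)
N2 = rng.poisson(lam2, 100_000)
N = N1 + N2
print(N.mean(), N.var())            # both close to lam1 + lam2 = 8 (Poisson: mean = variance)

Ntot = rng.poisson(lam1, 100_000)
Ni = rng.binomial(Ntot, p)          # each of the Ntot arrivals is category i with prob p
print(Ni.mean(), Ni.var())          # both close to lam1 * p = 0.75
```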
[Use step functions to illustrate.]
Theorem: if $N(t)$ is a Poisson($\lambda$) process, the $T_i$’s are its jump times (arrival times), and $X_i=T_i-T_{i-1}$, then the $X_i\sim Exp(\lambda)$ (i.i.d.).
What about the distribution of $T_i$? $T_i\sim \Gamma(i,\theta=\frac{1}{\lambda})$.
Here recall $\lambda$ is a rate parameter, so $\theta$ is a scale parameter.
The density of $T_n$ is
\[f(x)=\frac{\lambda}{\Gamma(n)}(\lambda x)^{n-1}e^{-\lambda x}\]where $x\geq 0$.
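A simulation sketch of this fact (assuming numpy; $\lambda=2$ and $n=5$ are arbitrary choices): build arrival times as cumulative sums of i.i.d. $Exp(\lambda)$ inter-arrival times and check that $T_n$ has the Gamma$(n,\theta=\frac{1}{\lambda})$ mean and variance.

```python
import numpy as np

# T_n = X_1 + ... + X_n with X_i i.i.d. Exp(lambda) should be Gamma(n, theta = 1/lambda):
# mean n/lambda, variance n/lambda^2.
rng = np.random.default_rng(3)
lam, n = 2.0, 5
X = rng.exponential(scale=1 / lam, size=(100_000, n))   # inter-arrival times
T = X.cumsum(axis=1)                                     # arrival (jump) times T_1, ..., T_n
Tn = T[:, -1]
print(Tn.mean(), n / lam)            # ~2.5
print(Tn.var(), n / lam**2)          # ~1.25
```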
Moment generating function
Method for doing problem 2.2.5.
Let $X$ have the binomial distribution with parameters $n$ and $p$. Conditionally on $X = k$, let $Y$ have the binomial distribution with parameters $k$ and $r$. What is the marginal distribution of $Y$?
There are lots of ways to solve this problem; here we use moment generating functions (mgf).
Definition: Let $X$ be a r.v. Let
\[M_X(t)=E(e^{tX})\]where $t$ is a fixed real number. $M_X(t)$ is the moment generating function of $X$.
It turns out that this function usually characterizes the distribution of $X$.
Example: let $X\sim Bin(n,p)$. We know $X=X_1+X_2+\cdots+X_n$, where the $X_i$ are i.i.d. Bernoulli($p$).
Now,
\[\begin{align*} M_X(t) &= E(e^{tX})\\ &=E(e^{t(X_1+X_2+\cdots+X_n)}) \\ &= E(e^{tX_1})E(e^{tX_2})\cdots E(e^{tX_n}) \quad\text{(by independence)}\\ &= (E(e^{tX_1}))^n \end{align*}\]and
\[E(e^{tX_1})=p(e^t)+(1-p)\times 1=1+p(e^t-1)\]Therefore,
\[M_X(t)=(1+p(e^t-1))^n\]
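A quick numerical sanity check of this formula (a sketch; the values of $n$, $p$, $t$ are arbitrary): compute $E(e^{tX})$ directly from the binomial pmf and compare with $(1+p(e^t-1))^n$.

```python
import math

# Direct computation of the Bin(n, p) mgf versus the closed form (1 + p(e^t - 1))^n.
n, p, t = 10, 0.3, 0.7
direct = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) * math.exp(t * k)
             for k in range(n + 1))
closed = (1 + p * (math.exp(t) - 1)) ** n
print(direct, closed)   # the two values agree
```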
Now look at Problem 2.2.5. Conditionally on $X\sim Bin(n,p)$ we have $Y\sim Bin(X,r)$, written informally as
\[Y\sim Bin(Bin(n,p),r)\]Therefore
\[Y=Y_1+Y_2+\cdots+Y_X\]where $Y_i$ are i.i.d Bernoulli$(r)$.
Hunch: $Y$ is $Binomial(a, b)$ for some $a$ and $b$. To prove it: compute $M_Y(t)$ and show it has the form $(1+b(e^t-1))^a$.
\[\begin{align*} M_Y(t) &=E(e^{tY})\\ &=E(e^{t(Y_1+Y_2+\cdots+Y_X)})\\ &= \sum_{k=0}^{n}E(e^{tY}|X=k)P(X=k)\\ &= \sum_{k=0}^{n}E(e^{t\cdot Bin(k,r)})P(X=k)\\ &= \sum_{k=0}^{n}(1+r(e^t-1))^kP(X=k)\\ &= \sum_{k=0}^{n}e^{k\ln(1+r(e^t-1))}P(X=k) \\ &= \sum_{k=0}^{n}e^{ku}P(X=k) \\ \end{align*}\]where $u=\ln(1+r(e^t-1))$. This last sum is the definition of $E(e^{uX})\triangleq M_X(u)$.
\[\begin{align*} M_X(u) &= (1+p(e^u-1))^n \\ &= (1+p(e^{\ln(1+r(e^t-1))}-1))^n \\ &=(1+p(1+r(e^t-1)-1))^n \\ &=(1+pr(e^t-1))^n \end{align*}\]Therefore $M_Y(t)=(1+pr(e^t-1))^n$, and we recognize that $Y\sim Bin(n,pr)$.
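A simulation sketch of this conclusion (assuming numpy; $n$, $p$, $r$ are arbitrary): draw $X\sim Bin(n,p)$, then $Y\,|\,X\sim Bin(X,r)$, and compare the moments of $Y$ with those of $Bin(n,pr)$.

```python
import numpy as np

# Y | X ~ Bin(X, r) with X ~ Bin(n, p) should be marginally Bin(n, p*r).
rng = np.random.default_rng(4)
n, p, r = 20, 0.6, 0.5
X = rng.binomial(n, p, size=200_000)
Y = rng.binomial(X, r)                       # second-stage "thinning" of the successes
print(Y.mean(), n * p * r)                   # ~6.0
print(Y.var(), n * p * r * (1 - p * r))      # ~4.2
```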
Gamma Distribution
Go back to the Gamma distribution.
Example
Let $Z\sim N(0,1)$ (standard normal). Then $f_Z(z)=\frac{1}{\sqrt{2\pi}}\exp(-\frac{z^2}{2})$. Find the density of $Y=Z^2$.
\[\begin{align*} F_Y(y) &= P(Y\leq y) = P(Z^2\leq y)\\ &= P(-\sqrt{y}\leq Z \leq \sqrt{y}) \\ &=F_Z(\sqrt{y})-F_Z(-\sqrt{y}) \end{align*}\]Use chain rule to compute $f_Y(y)$.
\[\begin{align*} f_Y(y)&=\frac{dF_Y}{dy}=\frac{d}{dy}F_Z(\sqrt{y})-\frac{d}{dy}F_Z(-\sqrt{y}) \\ &= f_Z(\sqrt{y})\frac{1}{2\sqrt{y}} - f_Z(-\sqrt{y})\frac{-1}{2\sqrt{y}}\\ &=\frac{1}{\sqrt{2\pi}}y^{-\frac{1}{2}}e^{-\frac{y}{2}} \end{align*}\]We recognize this as the density of $\Gamma(\alpha=\frac{1}{2},\theta = 2)$.
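A numerical sketch of this example (assuming numpy; note that the CDF computation above gives $P(Z^2\le y)=2\Phi(\sqrt{y})-1=\mathrm{erf}(\sqrt{y/2})$):

```python
import math
import numpy as np

# Compare the empirical CDF of Y = Z^2 with P(Z^2 <= y) = erf(sqrt(y/2)),
# i.e. the CDF of Gamma(alpha = 1/2, theta = 2).
rng = np.random.default_rng(5)
Y = rng.standard_normal(200_000) ** 2
for y in [0.5, 1.0, 2.0]:
    print(y, np.mean(Y <= y), math.erf(math.sqrt(y / 2)))
```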
Chi-square distribution and degrees of freedom
This Gamma, and every Gamma for which $\alpha=\frac{n}{2}$ (with $n$ an integer) and $\theta=2$, is called $\chi^2(n)$ (“chi-squared” with $n$ degrees of freedom).
We see $\chi^2(n)\sim Z_1^2+Z_2^2+\cdots +Z_n^2$, where the $Z_i$ are i.i.d. $N(0,1)$.
\[\chi^2(n)\equiv \Gamma(\frac{n}{2},2)\]Q: What about $\chi^2(2)$?
A: $\chi^2(2)\sim Gamma(1,2)$, which is the exponential distribution with rate parameter $\lambda=\frac{1}{2}$.
Q: How to create $X\sim Exp(\lambda=1)$ using only i.i.d. normals $N(0,1)$?
Try this: $Z_1^2+Z_2^2\sim Exp(\frac{1}{2})$, so
\[X=\frac{1}{2}(Z_1^2+Z_2^2)\sim Exp(1)\]When we need to multiply a scale parameter $\theta$ by a constant $c$, we multiply the random variable by $c$.
Equivalently, when we need to multiply a rate parameter $\lambda$ by $c$, just divide the random variable by $c$.
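A final simulation sketch of this construction (assuming numpy): $X=\frac{1}{2}(Z_1^2+Z_2^2)$ should have the $Exp(1)$ mean, variance, and tail.

```python
import numpy as np

# X = (Z1^2 + Z2^2) / 2 with Z1, Z2 i.i.d. N(0,1) should be Exp(lambda = 1).
rng = np.random.default_rng(6)
Z1 = rng.standard_normal(200_000)
Z2 = rng.standard_normal(200_000)
X = 0.5 * (Z1**2 + Z2**2)
print(X.mean(), X.var())                 # both close to 1 for Exp(1)
print(np.mean(X > 2.0), np.exp(-2.0))    # tail P(X > 2) vs e^{-2}
```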