STT 861 Theory of Prob and STT I Lecture Note - 5

2017-10-04

Sample mean and sample variance, biased and unbiased estimation; covariance, Hypergeometric distribution and its example; correlation coefficients; discrete distribution, Poisson distribution, Poisson approximation for the Binomial distribution.


Lecture 05 - Oct 05 2017

Sample mean and sample variance

Recall the proposition: for a r.v. $X$ and any constant $c$ , $E ((X - c)^{2}) = Var (X) + (E (X) - c)^{2}$ .

Now consider some data $x_{i}$ , $i = 1, 2, 3, \dots, n$ . We imagine that this data comes from an experiment which is repeated $n$ times independently. This means that each data point $x_{i}$ is an observed value of a r.v. $X_{i}$ , where the $X_{i}$ s are i.i.d.

We are accustomed to using the notation

\bar{x} = \frac{1}{n} \sum_{i = 1}^{n} x_{i}

This is called “sample mean”.

{\hat{σ}}^{2} = \frac{1}{n} \sum_{i = 1}^{n} (x_{i} - \bar{x})^{2}

This is called “sample variance”.

Now investigate the statistical properties of the two “estimators”. Replace $x_{i}$ by $X_{i}$ and try this.

Notation: $\bar{x}$ is for data points, while $\bar{X}$ is for the model notation.

Find $E (\bar{X})$ . If $E (\bar{X}) = μ$ , where $μ = E (X)$ is the common mean of the $X_{i}$ s, then we say $\bar{X}$ is unbiased.

Find $E ({\hat{σ}}^{2})$ . Is it $V a r (X)$ ? It might be biased.

E (\bar{X}) = E (\frac{1}{n} \sum_{i = 1}^{n} X_{i}) = \frac{1}{n} \sum_{i = 1}^{n} E (X_{i}) = μ

E ({\hat{σ}}^{2}) = E (\frac{1}{n} \sum_{i = 1}^{n} (X_{i} - \bar{X})^{2})

For fixed values of the $X_{i}$ s, apply the previous proposition to a r.v. which is equal to $X_{i}$ with prob $= \frac{1}{n}$ for each $i$ . That r.v. has mean $\bar{X}$ and variance $\frac{1}{n} \sum_{i = 1}^{n} (X_{i} - \bar{X})^{2}$ , which is exactly the quantity inside the expectation above.

The proposition then holds for any constant $c$ , and we take $c = μ$ :

\begin{aligned}
\frac{1}{n} \sum_{i = 1}^{n} (X_{i} - \bar{X})^{2} & = \frac{1}{n} \sum_{i = 1}^{n} (X_{i} - c)^{2} - (\bar{X} - c)^{2} \\
& = \frac{1}{n} \sum_{i = 1}^{n} (X_{i} - μ)^{2} - (\bar{X} - μ)^{2} \\
\Rightarrow E ({\hat{σ}}^{2}) & = \frac{1}{n} \sum_{i = 1}^{n} E ((X_{i} - μ)^{2}) - E ((\bar{X} - μ)^{2}) \\
& = \frac{1}{n} n Var (X) - E ((\frac{1}{n} \sum_{i = 1}^{n} X_{i} - μ)^{2}) \\
& = Var (X) - E ((\frac{1}{n} \sum_{i = 1}^{n} (X_{i} - μ))^{2}) \\
& = Var (X) - \frac{1}{n^{2}} E ((\sum_{i = 1}^{n} (X_{i} - μ))^{2}) \\
& = Var (X) - \frac{1}{n^{2}} n Var (X) \\
& = (1 - \frac{1}{n}) Var (X)
\end{aligned}

The second-to-last step uses independence: the cross terms $E ((X_{i} - μ) (X_{j} - μ))$ with $i \neq j$ vanish, so only the $n$ squared terms remain.

As a result, $E ({\hat{σ}}^{2})$ is not exactly $Var (X)$ ; it is smaller by the factor $1 - \frac{1}{n}$ . Thus, ${\hat{σ}}^{2}$ is biased.

Let’s define an unbiased estimator for $Var (X)$ . We just need to take

S^{2} = \frac{1}{n - 1} \sum_{i = 1}^{n} (X_{i} - \bar{X})^{2}

It is an unbiased estimator of $Var (X)$ , since $S^{2} = \frac{n}{n - 1} {\hat{σ}}^{2}$ .
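The bias factor $1 - \frac{1}{n}$ can be checked by simulation. The sketch below is only an illustration: the choice of Uniform(0, 1) data (true variance $1 / 12$ ), the sample size $n = 5$ , the number of trials and the function names are all assumptions made for this example, not part of the lecture.

```python
import random

def biased_var(xs):
    """Sample variance with the 1/n factor (the estimator sigma-hat squared)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

def unbiased_var(xs):
    """Sample variance with the 1/(n-1) factor (the estimator S squared)."""
    n = len(xs)
    return biased_var(xs) * n / (n - 1)

def average_estimates(n=5, trials=20000, seed=0):
    """Average both estimators over many independent samples of size n."""
    rng = random.Random(seed)
    total_biased = 0.0
    total_unbiased = 0.0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]  # i.i.d. Uniform(0, 1), Var = 1/12
        total_biased += biased_var(xs)
        total_unbiased += unbiased_var(xs)
    return total_biased / trials, total_unbiased / trials

true_var = 1 / 12
avg_biased, avg_unbiased = average_estimates()
print("average of sigma-hat^2:", avg_biased, "vs (1 - 1/n) Var(X) =", (1 - 1 / 5) * true_var)
print("average of S^2:        ", avg_unbiased, "vs Var(X) =", true_var)
```

With small $n$ the gap between the two averages is clearly visible; it shrinks as $n$ grows, since the factor $1 - \frac{1}{n}$ tends to 1.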

Covariance (Chapter 1.7)

Definition: Let $X$ & $Y$ be two r.v.s living on the same prob space.

cov (X, Y) = E ((X - E (X)) (Y - E (Y)))

Property: If $X$ & $Y$ are independent, then $cov (X, Y) = 0$ . Be aware: the converse is usually false, i.e. $cov (X, Y) = 0$ does not imply that $X$ & $Y$ are independent.

Note: if $X = Y$ , $c o v (X, Y) = V a r (X)$ .

Property: Let $X_{i}$ , $i = 1, 2, \dots, n$ be r.v.’s.

\begin{aligned}
Var (\sum_{i = 1}^{n} X_{i}) & = \sum_{i = 1}^{n} \sum_{j = 1}^{n} cov (X_{i}, X_{j}) \\
& = \sum_{i = 1}^{n} Var (X_{i}) + \sum_{i = 1}^{n} \sum_{j = 1, j \neq i}^{n} cov (X_{i}, X_{j})
\end{aligned}
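As a small exact check of this formula, the sketch below uses a toy model chosen only for illustration (it is not from the lecture): draw two of the values 1, 2, 3 without replacement, so that $X_{1}$ and $X_{2}$ are dependent. The helper names `expect` and `cov` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

# All equally likely ordered outcomes (X1, X2) when drawing two of the
# values 1, 2, 3 without replacement.
outcomes = list(permutations([1, 2, 3], 2))
prob = Fraction(1, len(outcomes))

def expect(f):
    """Exact expectation of f(outcome) under the uniform law on outcomes."""
    return sum(prob * f(o) for o in outcomes)

def cov(i, j):
    """Exact covariance between coordinates i and j of the outcome."""
    mean_i = expect(lambda o: o[i])
    mean_j = expect(lambda o: o[j])
    return expect(lambda o: (o[i] - mean_i) * (o[j] - mean_j))

# Variance of the sum, computed directly from the definition.
mean_sum = expect(lambda o: o[0] + o[1])
var_sum_direct = expect(lambda o: (o[0] + o[1] - mean_sum) ** 2)

# Variance of the sum, computed as the double sum of covariances.
var_sum_formula = sum(cov(i, j) for i in range(2) for j in range(2))

print("cov(X1, X2) =", cov(0, 1))
print("direct:", var_sum_direct, " formula:", var_sum_formula)
```

Here the covariance is negative (a large first draw makes a large second draw less likely), so the variance of the sum is smaller than the sum of the variances.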

Hypergeometric distribution

Application of the previous formula: The variance of the Hypergeometric distribution (no details here, see the book).

Definition: The hypergeometric distribution with parameters $(n, N)$ and sample size $k$ is the distribution of the r.v. $X$ counting the number of elements from a distinguished subset of size $n$ , when one picks a sample of size $k$ without replacement from the $N$ elements.

Example 1

The number $X$ of women in a sample of size $k = 5$ taken without replacement from a group with 8 women & 12 men has this hypergeometric distribution with $N = 8 + 12 = 20$ and $n = 8$ .

It turns out that

Var (X) = k \frac{n}{N} \frac{N - n}{N} \frac{N - k}{N - 1}

Comments: use notation $p = n / N$ , then

Var (X) = k p (1 - p) \frac{N - k}{N - 1}

Notice: If $k$ is much smaller than $N$ , the factor $\frac{N - k}{N - 1}$ is almost $= 1$ . So this variance is almost $k p (1 - p)$ , the variance of a binomial with $k$ trials and success parameter $p$ . This is because if $k$ is much smaller than $N$ , sampling without replacement is almost like sampling with replacement.

This “binomial approximation to the hypergeometric law” works well if $k ≪ N$ , except if $p = n / N$ is too close to 0 or 1.
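For Example 1 ( $k = 5$ , $n = 8$ , $N = 20$ ) the variance can be computed exactly from the PMF and compared with the standard closed form $k p (1 - p) \frac{N - k}{N - 1}$ and with the binomial value $k p (1 - p)$ . This is a minimal sketch; the function name `hypergeom_pmf` is an assumption made for the example.

```python
from math import comb

def hypergeom_pmf(x, k, n, N):
    """P(X = x): x elements of the distinguished subset (size n) in a
    sample of size k drawn without replacement from N elements."""
    return comb(n, x) * comb(N - n, k - x) / comb(N, k)

k, n, N = 5, 8, 20          # Example 1: 8 women, 12 men, sample of 5
p = n / N

pmf = [hypergeom_pmf(x, k, n, N) for x in range(k + 1)]
mean = sum(x * q for x, q in enumerate(pmf))
var_exact = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))

var_formula = k * p * (1 - p) * (N - k) / (N - 1)   # hypergeometric closed form
var_binomial = k * p * (1 - p)                      # sampling with replacement

print("mean:", mean)
print("variance from PMF:", var_exact)
print("closed form:      ", var_formula)
print("binomial variance:", var_binomial)
```

Since $k = 5$ is not tiny compared with $N = 20$ , the correction factor $\frac{N - k}{N - 1} = \frac{15}{19}$ is noticeably below 1 here.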

Correlation coefficients

Let $X$ and $Y$ be two r.v.’s. We standardize them: let

Z_{X} = (X - μ_{X}) / σ_{X}

Z_{Y} = (Y - μ_{Y}) / σ_{Y}

where $μ_{X} = E (X)$ , $μ_{Y} = E (Y)$ , $σ_{X} = \sqrt{V a r (X)}$ , $σ_{Y} = \sqrt{V a r (Y)}$ .

Notice that $E (Z_{X}) = E (Z_{Y}) = 0$ , $V a r (Z_{X}) = V a r (Z_{Y}) = 1$ .

Definition: The correlation coefficient between $X$ and $Y$ is

C o r r (X, Y) = C o v (Z_{X}, Z_{Y})

Note: The correlation between $X$ and $Y$ is a value $\in [- 1, 1]$ .

Example 2

Let $X = Y$ , then $C o r r (X, Y) = 1$ .

What if $Y = a X + b$ , where $a$ and $b$ are constants?

$C o r r (X, Y) = 1$ if $a > 0$ , and $= - 1$ if $a < 0$ .

If $X$ and $Y$ are independent, $C o r r (X, Y) = 0$ .

In general, $C o r r (X, Y)$ measures the linear relationship between $X$ and $Y$ .

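Main idea: If we have a scatter plot of $x$ and $y$ data which lines up very well along a straight line, then $Corr (X, Y) ≜ ρ$ will be close to 1 if the line slopes up and close to -1 if it slopes down.

Property: Because $Corr (X, Y)$ is defined using the standardized $Z_{X}$ and $Z_{Y}$ , it does not change under shifts and positive rescalings. For constants $a, b, c, d$ with $a c > 0$ ,

Corr (a X + b, c Y + d) = Corr (X, Y)

If $a c < 0$ , the sign flips: $Corr (a X + b, c Y + d) = - Corr (X, Y)$ . This agrees with Example 2.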
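The effect of linear changes of variable on the correlation can be seen on a small data set. The sketch below computes the sample correlation (the covariance of the standardized data); the data values and the function name `corr` are made up for illustration.

```python
import math

def corr(xs, ys):
    """Sample correlation: covariance of the data divided by the product
    of the standard deviations (all computed with the 1/n factor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov_xy / (sd_x * sd_y)

# Hypothetical data lying close to an upward-sloping line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

rho = corr(xs, ys)
# Same-sign scalings (a = 3, c = 2): correlation unchanged.
rho_same = corr([3 * x + 1 for x in xs], [2 * y - 5 for y in ys])
# Opposite-sign scalings (a = -3, c = 2): correlation changes sign.
rho_flip = corr([-3 * x + 1 for x in xs], [2 * y - 5 for y in ys])
# Exact linear relation Y = 2X + 1: correlation equal to 1.
rho_line = corr(xs, [2 * x + 1 for x in xs])

print(rho, rho_same, rho_flip, rho_line)
```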

Discrete Distributions (Chapter 2)

Some distributions: $B i n o m (n, p)$ , $G e o m (p)$

Important expression:

$E (B e r (p)) = p$ , $V a r (B e r (p)) = p (1 - p)$ .
$E (B i n o m (n, p)) = n p$ , $V a r (B i n o m (n, p)) = n p (1 - p)$ .
$E (G e o m (p)) = \frac{1}{p}$ , $V a r (G e o m (p)) = \frac{1 - p}{p^{2}}$ .
$E (N B (r, p)) = \frac{r}{p}$ , $V a r (N B (r, p)) = r \frac{1 - p}{p^{2}}$ .

Recall: the intuition behind the formula $E (Geom (p)) = 1 / p$ : For example, if $p = 1 / 20$ for a success, then we should expect to wait 20 units of time until the first success.
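The values $1 / p$ and $\frac{1 - p}{p^{2}}$ can be checked numerically for $p = 1 / 20$ by summing the series directly. This is a sanity check, not a proof, and it assumes the convention that $Geom (p)$ counts the number of trials up to and including the first success, so $P (X = k) = p (1 - p)^{k - 1}$ for $k = 1, 2, \dots$ . The function name and the truncation point are choices made for this sketch.

```python
def geom_moments(p, terms=5000):
    """Mean and variance of Geom(p) from a truncated series.

    Uses P(X = k) = p * (1 - p)**(k - 1) for k = 1, 2, ..., terms.
    The neglected tail is negligible when (1 - p)**terms is tiny.
    """
    mean = 0.0
    second_moment = 0.0
    for k in range(1, terms + 1):
        pk = p * (1 - p) ** (k - 1)
        mean += k * pk
        second_moment += k * k * pk
    return mean, second_moment - mean ** 2

p = 1 / 20
mean, var = geom_moments(p)
print("mean:", mean, "vs 1/p =", 1 / p)
print("variance:", var, "vs (1 - p)/p^2 =", (1 - p) / p ** 2)
```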

Exercise at home

Prove the formulas for $E$ and $Var$ of $Geom (p)$ .

Poisson Distribution

Definition: $X$ is Poisson distributed with parameter $λ$ if $X$ takes the values $k = 0, 1, 2, \dots$ and

P (X = k) = e^{- λ} \frac{λ^{k}}{k!}

Compute the expectation,

\begin{aligned}
E (X) & = \sum_{k = 0}^{\infty} k e^{- λ} \frac{λ^{k}}{k!} \\
& = e^{- λ} \sum_{k = 1}^{\infty} λ^{k} \frac{1}{(k - 1)!} \\
& = λ e^{- λ} \sum_{k = 0}^{\infty} λ^{k} \frac{1}{k!} \\
& = λ e^{- λ} \times e^{λ} = λ
\end{aligned}

(Recall: $e^{x} = \sum_{k = 0}^{\infty} x^{k} \frac{1}{k!}$ , Taylor series)

It turns out $V a r (X) = λ$ [Prove it at home, easier to calculate $E (X (X - 1))$ ].

Quick question: What is $E (X^{2})$ ? $= λ + λ^{2}$ .
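These moments can be checked numerically by truncating the sums. The sketch below is a sanity check rather than a proof; the value $λ = 3.5$ , the truncation at $k = 100$ and the function name are arbitrary choices for illustration.

```python
import math

def poisson_pmf_list(lam, kmax):
    """Return [P(X = 0), ..., P(X = kmax)] for X ~ Poisson(lam).

    Uses the recursion P(X = k) = P(X = k - 1) * lam / k, which avoids
    computing large factorials.
    """
    probs = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        probs.append(probs[-1] * lam / k)
    return probs

lam = 3.5
probs = poisson_pmf_list(lam, 100)   # the tail beyond k = 100 is negligible here

mean = sum(k * q for k, q in enumerate(probs))
factorial_moment = sum(k * (k - 1) * q for k, q in enumerate(probs))  # E(X(X-1))
second_moment = sum(k * k * q for k, q in enumerate(probs))           # E(X^2)
var = second_moment - mean ** 2

print("E(X)      =", mean, "vs", lam)
print("E(X(X-1)) =", factorial_moment, "vs", lam ** 2)
print("E(X^2)    =", second_moment, "vs", lam + lam ** 2)
print("Var(X)    =", var, "vs", lam)
```

The numbers suggest $E (X (X - 1)) = λ^{2}$ , which is the identity hinted at for the home exercise.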

Poisson approximation for the Binomial distribution

Idea: if events are rare, the number of them that occur usually follows a Poisson law.

Fact: Let $X$ be $B i n (n, p)$ and assume $p$ is proportional to $1 / n$ : $p = λ / n$ .

Then the PMF of $Binom (n, p)$ is almost the same as for $Poi (λ)$ . Specifically we mean this: for each fixed $k$ ,

\lim_{n \to \infty} C_{n}^{k} (\frac{λ}{n})^{k} (1 - \frac{λ}{n})^{n - k} = e^{- λ} \frac{λ^{k}}{k!}

If $p$ is small (of order $1 / n$ ), then $\# \text{successes} \sim Bin (n, p) \approx Poi (λ)$ , with $λ = E (\# \text{successes}) = n p$ .
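The quality of the approximation can be seen numerically by fixing $λ$ and letting $n$ grow with $p = λ / n$ . In the sketch below, the value $λ = 2$ , the list of $n$ values and the range of $k$ inspected are assumptions chosen for illustration.

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binom(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam ** k / factorial(k)

lam = 2.0

def max_gap(n, kmax=10):
    """Largest absolute PMF difference over k = 0..kmax, with p = lam / n."""
    p = lam / n
    return max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(kmax + 1))

gaps = {n: max_gap(n) for n in (10, 100, 1000)}
for n, gap in gaps.items():
    print(f"n = {n:4d}, p = {lam / n:.4f}, max |Binom - Poisson| = {gap:.6f}")
```

The largest gap shrinks as $n$ grows, in line with the limit above.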

Because of this, the Poisson distribution is a good model for the number of arrivals (of some phenomenon) in a fixed interval of time.

This interpretation, with the $n$ trials viewed as successive units of time (e.g. minutes) within the interval, also explains the next property:

Fact: let $N$ and $M$ be two independent Poisson r.v.’s with parameters $λ$ and $μ$ , then $X = N + M$ is Poisson too, with parameter $λ + μ$ .

The parameter has to be $λ + μ$ because $E (X) = E (N) + E (M) = λ + μ$ .

We can use the Binomial approximation above to justify the fact that $X$ is Poisson.

Exercise

Try to prove $X$ is Poisson using only PMF.
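Before attempting the proof, the claim can be checked numerically: for independent $N$ and $M$ , the PMF of $N + M$ is the convolution $P (N + M = k) = \sum_{j = 0}^{k} P (N = j) P (M = k - j)$ . The sketch below compares this sum with the $Poi (λ + μ)$ PMF; the parameter values and function names are assumptions for illustration, and the check does not replace the proof.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam ** k / factorial(k)

def sum_pmf(k, lam, mu):
    """P(N + M = k) for independent N ~ Poisson(lam), M ~ Poisson(mu),
    computed by convolving the two PMFs."""
    return sum(poisson_pmf(j, lam) * poisson_pmf(k - j, mu)
               for j in range(k + 1))

lam, mu = 1.5, 2.5
max_err = max(abs(sum_pmf(k, lam, mu) - poisson_pmf(k, lam + mu))
              for k in range(20))
print("largest difference over k = 0..19:", max_err)
```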
