STT 861 Theory of Prob and STT I Lecture Note - 13
2017-11-29
Recap of the linear predictor; almost sure convergence, convergence in probability, convergence in distribution; central limit theorem, theorem of DeMoivre-Laplace.
Portal to all the other notes
- Lecture 01 - 2017.09.06
- Lecture 02 - 2017.09.13
- Lecture 03 - 2017.09.20
- Lecture 04 - 2017.09.27
- Lecture 05 - 2017.10.04
- Lecture 06 - 2017.10.11
- Lecture 07 - 2017.10.18
- Lecture 08 - 2017.10.25
- Lecture 09 - 2017.11.01
- Lecture 10 - 2017.11.08
- Lecture 11 - 2017.11.15
- Lecture 12 - 2017.11.20
- Lecture 13 - 2017.11.29 -> This post
- Lecture 14 - 2017.12.06
Lecture 13 - Nov 29 2017
For Video Recording
Linear Prediction
Recall: let $X, Y$ be two r.v.'s with means $\mu_X, \mu_Y$, variances $\sigma_X^2, \sigma_Y^2$, and correlation $\rho$.

The r.v. $E(Y \mid X)$ is the best predictor of $Y$ given $X$ in the least-squares sense: it minimizes $E\left[(Y - g(X))^2\right]$ = MSE over all functions $g$.

But what about making this MSE as small as possible when the predictor is linear? So use the notation $\hat{Y} = a + bX$ (instead of $g(X)$).

We want to minimize

$$E\left[(Y - a - bX)^2\right].$$

Find $a$ and $b$ to make this as small as possible.

We also know,

$$E\left[(Y - a - bX)^2\right] = \mathrm{Var}(Y - bX) + (\mu_Y - a - b\mu_X)^2 = \sigma_Y^2 - 2b\rho\sigma_X\sigma_Y + b^2\sigma_X^2 + (\mu_Y - a - b\mu_X)^2,$$

where $\mathrm{Cov}(X, Y) = \rho\sigma_X\sigma_Y$.

We see immediately that this is minimal for $b = \rho\frac{\sigma_Y}{\sigma_X}$ and $a = \mu_Y - b\mu_X$.

Therefore,

$$\hat{Y} = \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(X - \mu_X).$$

This answers the question of what the best linear predictor of $Y$ given $X$ is in the mean-square sense.

We see the smallest MSE is therefore $\sigma_Y^2(1 - \rho^2)$.

Therefore, we see that the proportion of $Y$'s variance which is not explained by $X$ is $1 - \rho^2$.

Finally, the proportion of $Y$'s variance which is explained by $X$ is $\rho^2$.
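As a sanity check (my own sketch, not from the lecture; the simulated model for $Y$ is arbitrary), the following NumPy snippet builds $\hat{Y} = a + bX$ from the formulas above and compares the resulting MSE with $\sigma_Y^2(1 - \rho^2)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustration only: verify b = rho*sigma_Y/sigma_X, a = mu_Y - b*mu_X
# by simulation, using an arbitrary linear-plus-noise model for Y.
n = 100_000
x = rng.normal(2.0, 1.5, size=n)               # X with mu_X = 2, sigma_X = 1.5
y = 1.0 + 0.8 * x + rng.normal(0.0, 1.0, n)    # some Y correlated with X

mu_x, mu_y = x.mean(), y.mean()
sigma_x, sigma_y = x.std(), y.std()
rho = np.corrcoef(x, y)[0, 1]

b = rho * sigma_y / sigma_x
a = mu_y - b * mu_x
y_hat = a + b * x

print("empirical MSE:", np.mean((y - y_hat) ** 2))
print("theory sigma_Y^2 (1 - rho^2):", sigma_y**2 * (1 - rho**2))
print("np.polyfit:", np.polyfit(x, y, 1))      # returns [slope, intercept]
```

A plain least-squares fit (`np.polyfit`) recovers the same $(b, a)$ up to sampling noise, as expected: linear regression solves exactly this minimization.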
Chapter 6 - Convergences
Definition: We say that the sequence of r.v.'s $(X_n)_{n \ge 1}$ converges almost surely ("a.s.") to the r.v. $X$ if $X_n \to X$ as $n \to \infty$ with probability 1. In other words,

$$P\left(\lim_{n \to \infty} X_n = X\right) = 1.$$
Definition: (A weaker notion of convergence) A sequence of r.v.'s $(X_n)$ converges in probability to $X$ if for every $\varepsilon > 0$,

$$\lim_{n \to \infty} P\left(|X_n - X| > \varepsilon\right) = 0.$$
Note: [Convince yourself as an exercise at home] Convergence in probability is easier to achieve (i.e. weaker) than convergence a.s.
Definition: (an even weaker notion) Let $(X_n)$ be a sequence of r.v.'s as above, but now let $F$ be the CDF of some distribution. We say $X_n$ converges in distribution to the law $F$ if

$$F_{X_n}(x) \to F(x)$$

as $n \to \infty$ for every fixed $x$ where $F$ is continuous.
Note: unlike the previous two notions, here there is no need for a limiting r.v., and the $X_n$'s do not need to share a probability space with each other or with anything else.
Example 1
Let $X_i$, $i = 1, 2, \ldots$ be i.i.d. with $E(X_i) = \mu$ and $\sigma^2 = \mathrm{Var}(X_i)$ finite. We proved that, with

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,$$

then

$$P\left(|\bar{X}_n - \mu| > \varepsilon\right) \le \frac{\mathrm{Var}(\bar{X}_n)}{\varepsilon^2} = \frac{\sigma^2}{n\varepsilon^2}$$

(by Chebyshev's inequality).

The whole thing goes to 0 as $n \to \infty$; this proves that $\bar{X}_n \to \mu$ in probability (the weak law of large numbers).
Note: assuming only that $\mu$ exists ($\sigma^2$ could be infinite), the conclusion still holds. See W. Feller's book (1950).
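A quick simulation (my own illustration, not from the lecture; the choice $X_i \sim \mathrm{Exponential}(1)$ is arbitrary, giving $\mu = \sigma^2 = 1$) shows the probability shrinking with $n$, well below the Chebyshev bound:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustration only: compare P(|Xbar_n - mu| > eps) with the
# Chebyshev bound sigma^2 / (n * eps^2), for X_i ~ Exponential(1).
mu, sigma2, eps, reps = 1.0, 1.0, 0.1, 1_000
for n in [100, 1_000, 10_000]:
    xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
    print(n, "empirical:", np.mean(np.abs(xbar - mu) > eps),
          "Chebyshev bound:", sigma2 / (n * eps**2))
```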
Now let $S_n = X_1 + \cdots + X_n$ and let $F_{S_n}$ be its CDF. This is a step function, with one increment at each jump point. Let's try to find out $\lim_{n \to \infty} F_{S_n}(x)$ for fixed $x$ (when the $X_i$'s are integer-valued, $F_{S_n}(x) = P(S_n < \lceil x \rceil)$, where $\lceil x \rceil$ is the smallest integer larger than $x$).

For fixed $x$, as $n \to \infty$,

$$F_{S_n}(x) = P(S_n \le x) \to 0$$

(assuming $\mu > 0$, so that $S_n \to \infty$).

Since the limit function $F \equiv 0$ is not the CDF of any random variable, this proves that $S_n$ does not converge in distribution, and therefore cannot converge in any stronger sense (in probability or a.s.).

How about $\bar{X}_n = S_n / n$?

Since

$$F_{\bar{X}_n}(x) \to \begin{cases} 0, & x < \mu, \\ 1, & x > \mu, \end{cases}$$

for every fixed $x \neq \mu$, this limit is the CDF of the constant random variable $\mu$.
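Here is a small numerical illustration of both claims (my own sketch, taking $X_i \sim \mathrm{Bernoulli}(1/2)$, so $\mu = 1/2 > 0$):

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustration only: with X_i ~ Bernoulli(0.5), F_{S_n}(x) at a fixed x
# drains to 0, while F_{Xbar_n} tends to a step function at mu = 0.5.
reps = 50_000
for n in [10, 100, 1000]:
    s = rng.binomial(n, 0.5, size=reps)            # S_n ~ Bin(n, 1/2)
    print(n,
          "F_Sn(3):", np.mean(s <= 3),             # -> 0
          "F_Xbar(0.4):", np.mean(s / n <= 0.4),   # -> 0, since 0.4 < mu
          "F_Xbar(0.6):", np.mean(s / n <= 0.6))   # -> 1, since 0.6 > mu
```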
Example 2
Let $U_i \sim \mathrm{Uniform}(0, 1)$, i.i.d. Let $M_n = \max(U_1, \ldots, U_n)$. We can tell that $M_n \to 1$ in some sense. Let's prove it in probability:

Let $\varepsilon \in (0, 1)$,

$$P\left(|M_n - 1| > \varepsilon\right) = P\left(M_n < 1 - \varepsilon\right) = (1 - \varepsilon)^n \to 0.$$

We proved that $M_n \to 1$ in probability.

Now consider $Z_n = n(1 - M_n)$. Let's see about the CDF of $Z_n$: for $x \ge 0$,

$$P(Z_n \le x) = P\left(M_n \ge 1 - \frac{x}{n}\right) = 1 - \left(1 - \frac{x}{n}\right)^n \to 1 - e^{-x}.$$

This CDF is exponential. Thus $n(1 - M_n) \to \mathrm{Exponential}(1)$ in distribution.
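A quick check by simulation (my own sketch; the values of $n$ and the evaluation points are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustration only: empirical CDF of Z_n = n(1 - M_n) vs 1 - e^{-x}.
n, reps = 500, 20_000
m = rng.random((reps, n)).max(axis=1)   # one M_n per replication
z = n * (1.0 - m)
for x in [0.5, 1.0, 2.0]:
    print(x, "empirical:", np.mean(z <= x), "limit:", 1 - np.exp(-x))
```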
Theorem (6.3.6): Let $(X_n)$ be a sequence of r.v.'s. If $X_n$ has MGF $M_{X_n}$ and $M_{X_n}(t) \to M_X(t)$ as $n \to \infty$ for every $t \neq 0$ in a neighborhood of $0$, then $X_n \to X$ in distribution.
Example 3
Let $X_n$ be $\mathrm{Bin}(n, \lambda/n)$. We know that the PMF of $X_n$ converges to the PMF of $\mathrm{Poisson}(\lambda)$. Let's try it again here with MGFs. We will find

$$M_{X_n}(t) = \left(1 - \frac{\lambda}{n} + \frac{\lambda}{n}e^t\right)^n = \left(1 + \frac{\lambda(e^t - 1)}{n}\right)^n \to e^{\lambda(e^t - 1)},$$

and here we recognize that this is the MGF of $\mathrm{Poisson}(\lambda)$. By Theorem 6.3.6, $X_n \to \mathrm{Poisson}(\lambda)$ in distribution.
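Since the MGFs here are explicit, the convergence is easy to check numerically (my own sketch; $\lambda = 2$ and $t = 0.3$ are arbitrary choices):

```python
import numpy as np

# Illustration only: the Bin(n, lam/n) MGF approaches the
# Poisson(lam) MGF exp(lam * (e^t - 1)) as n grows.
lam, t = 2.0, 0.3
for n in [10, 100, 10_000]:
    p = lam / n
    print(n, (1 - p + p * np.exp(t)) ** n)
print("Poisson limit:", np.exp(lam * (np.exp(t) - 1)))
```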
Let $X_i$ be i.i.d. with $E(X_i) = \mu$ and variance $\sigma^2 < \infty$.

Let $S_n = \sum_{i=1}^{n} X_i$,

$$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}.$$

(We divide by $\sigma\sqrt{n}$ as a standardization: we know $\mathrm{Var}(S_n) = n\sigma^2$, so $E(Z_n) = 0$ and $\mathrm{Var}(Z_n) = 1$.)
Central Limit Theorem (CLT)
As $n \to \infty$, $Z_n \to N(0, 1)$ in distribution.

This means: for every fixed $x$,

$$P(Z_n \le x) \to \Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy.$$
The proof just combines the Taylor formula up to order 2 (applied to the MGF of $Z_n$) and Theorem 6.3.6.
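Here is a simulation of the statement (my own sketch, using $X_i \sim \mathrm{Uniform}(0, 1)$, so $\mu = 1/2$ and $\sigma = 1/\sqrt{12}$):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(4)

# Illustration only: empirical CDF of Z_n for sums of Uniform(0,1) vs Phi.
phi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))

n, reps = 500, 20_000
mu, sigma = 0.5, sqrt(1 / 12)
s = rng.random((reps, n)).sum(axis=1)
z = (s - n * mu) / (sigma * sqrt(n))
for x in [-1.0, 0.0, 1.96]:
    print(x, "empirical:", np.mean(z <= x), "Phi(x):", phi(x))
```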
Next, consider $\bar{X}_n = \frac{S_n}{n}$.

A corollary of the CLT says this:

$$\sqrt{n}\,(\bar{X}_n - \mu) \to N(0, \sigma^2)$$

in distribution.

Proof:

$$\sqrt{n}\,(\bar{X}_n - \mu) = \sigma \cdot \frac{S_n - n\mu}{\sigma\sqrt{n}} = \sigma Z_n,$$

and conclude by a change of variable.
Theorem of DeMoivre-Laplace
Let $X_i$ be i.i.d. $\mathrm{Bernoulli}(p)$. Let $S_n = \sum_{i=1}^{n} X_i$. We know $S_n \sim \mathrm{Bin}(n, p)$, with $E(S_n) = np$ and $\mathrm{Var}(S_n) = np(1-p)$.

Let

$$Z_n = \frac{S_n - np}{\sqrt{np(1 - p)}}.$$

Therefore, by the CLT, $Z_n \to N(0, 1)$ as $n \to \infty$ in distribution.

In other words, to be precise,

$$P\left(\frac{S_n - np}{\sqrt{np(1 - p)}} \le x\right) \to \Phi(x) \quad \text{for every fixed } x.$$
The speed of convergence in the CLT is the subject of the "Berry–Esseen" theorem. But the speed of convergence for the Binomial CLT is much faster; a common rule of thumb is that the normal approximation is reliable when $np \ge 10$ and $n(1-p) \ge 10$. The CLT is good within a convergence error of order $1/\sqrt{n}$.
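To see the quality of the approximation under the rule of thumb, here is a small check using only the standard library (my own sketch; $n = 100$ and $p = 0.3$ are arbitrary, giving $np = 30$ and $n(1-p) = 70$):

```python
from math import comb, erf, sqrt

# Illustration only: exact Bin(n, p) CDF vs the normal approximation
# P(S_n <= k) ~ Phi((k - np) / sqrt(np(1-p))).
phi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))

n, p = 100, 0.3
for k in [20, 25, 30, 35, 40]:
    exact = sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))
    z = (k - n * p) / sqrt(n * p * (1 - p))
    print(k, "exact:", round(exact, 4), "normal approx:", round(phi(z), 4))
```

A continuity correction (using $k + \frac{1}{2}$ in place of $k$ in the numerator) commonly improves the approximation further.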
Example 4
Let

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} U_i,$$

where the $U_i$'s are i.i.d. $\mathrm{Uniform}(0, 1)$. We know by the weak law of large numbers that $\bar{X}_n \to \frac{1}{2}$ in probability as $n \to \infty$. But how spread out is $\bar{X}_n$ around $1/2$? For example, can we estimate the chance that $\bar{X}_n$ is more than $0.02$ away from its mean value $1/2$?

Answer: here $\mathrm{Var}(U_i) = \frac{1}{12}$, so $\sigma = \frac{1}{\sqrt{12}}$, and by the CLT,

$$P\left(\left|\bar{X}_n - \frac{1}{2}\right| \le 0.02\right) = P\left(|Z_n| \le 0.02\sqrt{12n}\right) \approx 2\Phi\left(0.02\sqrt{12n}\right) - 1.$$

We will feel comfortable if $n$ is large enough to make this greater than $0.95$. What value should $n$ at least be?

In order to get this, it is known that the right-hand value should satisfy $0.02\sqrt{12n} = 1.96$. This value satisfies

$$\Phi(1.96) = 0.975.$$

So $1.96$ is therefore known as the 97.5th percentile of $N(0, 1)$. Therefore, we must take

$$0.02\sqrt{12n} \ge 1.96, \quad \text{i.e. } n \ge \frac{1.96^2}{12 \times 0.02^2} \approx 800.3, \quad \text{so } n \ge 801.$$
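The computation, with a simulation check of the resulting $n$ (my own sketch; `NormalDist` is Python's built-in standard normal distribution):

```python
import numpy as np
from statistics import NormalDist

# Illustration only: solve for n, then check the coverage by simulation.
z = NormalDist().inv_cdf(0.975)             # 1.9599...
n = int(np.ceil(z**2 / (12 * 0.02**2)))     # -> 801
print("n =", n)

rng = np.random.default_rng(5)
xbar = rng.random((10_000, n)).mean(axis=1)
print("P(|Xbar - 1/2| <= 0.02):", np.mean(np.abs(xbar - 0.5) <= 0.02))
```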