

STT 861 Theory of Probability and Statistics I Lecture Note - 11

2017-11-15

Proof of the "tower" property; discrete conditional distributions; continuous conditional distributions, their expectation and variance, and examples; linear predictor and mean squared error.

Portal to all the other notes

Lecture 11 - Nov 15 2017

Reinterpretation of the last part of item (c) in the proof of Theorem 5.2.1 in the textbook.

Recall: let $X$, $Y$ be two random variables. To keep things simple, assume $X$ and $Y$ are discrete, with PMFs $p_X$ and $p_Y$ and joint PMF $p_{X,Y}$.

Generally, for $h$ a function $\mathbb{R} \to \mathbb{R}$,

$$E\big(h(X)\,Y\big) = E\big(h(X)\,g(X)\big)$$

where $g(x) = E(Y \mid X = x)$.

We want to think of that result in the following way:

$$g(X) = E(Y \mid X)$$

(think of this as a definition.)

Now we reinterpret the formula above like this:

$$
\begin{aligned}
E\big(h(X)\,Y\big) &= E\Big(E\big(h(X)\,Y \mid X\big)\Big) \qquad (\star) \\
&= E\big(h(X)\,E(Y \mid X)\big)
\end{aligned}
$$

The first line means “An expectation can always be written as the expectation of a conditional expectation”. It is known as the “tower” property of conditional expectation.

The second line means: “when conditioning by X, X can be considered as known (non-random) and any factor depending on X can be pulled out of the conditional expectation”.

Proof of the star ($\star$):

$$
\begin{aligned}
\text{RHS} &= E\big(h(X)\,g(X)\big) = E\big(h(X)\,E(Y \mid X)\big) \\
&= \sum_x h(x)\, E(Y \mid X = x)\, p_X(x) \\
&= \sum_x h(x) \sum_y y\, \frac{p_{X,Y}(x,y)}{p_X(x)}\, p_X(x) \\
&= \sum_x h(x) \sum_y y\, p_{X,Y}(x,y) \\
&= \sum_x \sum_y h(x)\, y\, p_{X,Y}(x,y) \\
&= E\big(h(X)\,Y\big)
\end{aligned}
$$

(RHS = right-hand side.) This completes the proof of ($\star$).
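As a quick sanity check, here is a small numerical illustration of ($\star$); the joint PMF table and the choice $h(x) = x^2$ below are made up purely for the check.

```python
# Numerical check of the "star" identity E(h(X) Y) = E(h(X) E(Y|X))
# on a small, arbitrary discrete joint PMF (values chosen only for illustration).

# joint PMF p_{X,Y}(x, y); the probabilities sum to 1
p_xy = {(0, 1): 0.10, (0, 2): 0.20,
        (1, 1): 0.25, (1, 2): 0.15,
        (2, 1): 0.05, (2, 2): 0.25}

h = lambda x: x ** 2          # any function h: R -> R works here

# marginal PMF of X
p_x = {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p

# g(x) = E(Y | X = x) = sum_y y * p_{X,Y}(x, y) / p_X(x)
g = {x: sum(y * p for (xx, y), p in p_xy.items() if xx == x) / p_x[x]
     for x in p_x}

lhs = sum(h(x) * y * p for (x, y), p in p_xy.items())    # E(h(X) Y)
rhs = sum(h(x) * g[x] * p_x[x] for x in p_x)             # E(h(X) E(Y|X))

print(lhs, rhs)   # the two numbers agree (up to floating point)
```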

Next, we use ($\star$) to compute the unconditional variance $\mathrm{Var}(Y)$ by conditioning on $X$.

$$\mathrm{Var}(Y) = E\big((Y - E(Y))^2\big) = E\big((Y - g(X) + g(X) - E(Y))^2\big) = E\big(A^2 + 2AB + B^2\big)$$

where $A = Y - g(X)$ and $B = g(X) - E(Y)$.

We will first compute

$$
\begin{aligned}
E(AB) &= E\big((Y - g(X))\,(g(X) - E(Y))\big) \\
&= E\Big(E\big((Y - g(X))\,(g(X) - E(Y)) \mid X\big)\Big) \\
&= E\Big(E\big(Y - g(X) \mid X\big)\,\big(g(X) - E(Y)\big)\Big) \\
&= E\big((g(X) - g(X))\,(g(X) - E(Y))\big) \\
&= 0
\end{aligned}
$$

We have proved

$$\mathrm{Var}(Y) = E(A^2) + E(B^2)$$

Now we interpret the two terms.

$$E(A^2) = E\big((Y - g(X))^2\big) = E\Big(E\big((Y - g(X))^2 \mid X\big)\Big) = E\big(\mathrm{Var}(Y \mid X)\big)$$

$\mathrm{Var}(Y \mid X)$ is also written $v(X)$.

So we see $E(A^2) = E(v(X))$ (the expectation of the conditional variance).

Finally,

$$E(B^2) = E\big((g(X) - E(Y))^2\big) = E\Big(\big(g(X) - E(E(Y \mid X))\big)^2\Big) = E\big((g(X) - E(g(X)))^2\big) = \mathrm{Var}\big(g(X)\big)$$

This is the variance of the conditional expectation. Putting the two pieces together gives the law of total variance: $\mathrm{Var}(Y) = E\big(\mathrm{Var}(Y \mid X)\big) + \mathrm{Var}\big(E(Y \mid X)\big) = E(v(X)) + \mathrm{Var}(g(X))$.
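The decomposition can be checked numerically in the same spirit as before; again, the joint PMF below is just a made-up example.

```python
# Numerical check of Var(Y) = E(Var(Y|X)) + Var(E(Y|X))
# on a small, arbitrary discrete joint PMF.

p_xy = {(0, 1): 0.10, (0, 3): 0.20,
        (1, 1): 0.25, (1, 3): 0.15,
        (2, 1): 0.05, (2, 3): 0.25}

p_x = {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p

def cond_moment(x, k):
    """E(Y^k | X = x)."""
    return sum((y ** k) * p for (xx, y), p in p_xy.items() if xx == x) / p_x[x]

g = {x: cond_moment(x, 1) for x in p_x}                # g(x) = E(Y | X = x)
v = {x: cond_moment(x, 2) - g[x] ** 2 for x in p_x}    # v(x) = Var(Y | X = x)

ey  = sum(y * p for (x, y), p in p_xy.items())
ey2 = sum(y * y * p for (x, y), p in p_xy.items())
var_y = ey2 - ey ** 2                                  # unconditional Var(Y)

e_v   = sum(v[x] * p_x[x] for x in p_x)                # E(Var(Y|X))
eg    = sum(g[x] * p_x[x] for x in p_x)
var_g = sum((g[x] - eg) ** 2 * p_x[x] for x in p_x)    # Var(E(Y|X))

print(var_y, e_v + var_g)   # the two sides agree
```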

Homework Problem 5.2.5 Part b

$X$ takes the values $0, 1, 2$ with probabilities $0.3, 0.4, 0.3$; $\varepsilon = \pm 1$ with probability $0.5$ each, independent of $X$; and $Y = 5X^2 + \varepsilon$.

Q: Find $\rho = \rho(X, Y)$.

$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$

$$
\begin{aligned}
\mathrm{Cov}(X, Y) &= E\big((X - \mu_X)(Y - \mu_Y)\big) = E(XY) - \mu_X \mu_Y, \\
E(XY) &= E\big(X(5X^2 + \varepsilon)\big) = E(5X^3 + X\varepsilon) = 5E(X^3) + E(X)\,E(\varepsilon) = 5E(X^3),
\end{aligned}
$$

since $\varepsilon$ is independent of $X$ and $E(\varepsilon) = 0$. With $\mu_Y = 5E(X^2)$, this gives $\mathrm{Cov}(X, Y) = 5E(X^3) - 5\mu_X E(X^2)$.
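Assuming this reading of the problem ($Y = 5X^2 + \varepsilon$ with $\varepsilon$ independent of $X$), the remaining work is plugging in the moments of $X$. A short exact computation sketch:

```python
# Exact computation of rho(X, Y) for Homework 5.2.5(b), under the reading
# X in {0, 1, 2} with probs 0.3, 0.4, 0.3; eps = +/-1 with prob 1/2 each; Y = 5*X**2 + eps.

from math import sqrt

px = {0: 0.3, 1: 0.4, 2: 0.3}

def m(k):
    """E(X^k)."""
    return sum((x ** k) * p for x, p in px.items())

ex, ex2, ex3, ex4 = m(1), m(2), m(3), m(4)
var_x = ex2 - ex ** 2

ey = 5 * ex2                          # E(Y) = 5 E(X^2) + E(eps), with E(eps) = 0
var_y = 25 * (ex4 - ex2 ** 2) + 1     # Var(Y) = 25 Var(X^2) + Var(eps), with Var(eps) = 1

cov_xy = 5 * ex3 - ex * ey            # Cov(X, Y) = 5 E(X^3) - E(X) * 5 E(X^2)
rho = cov_xy / sqrt(var_x * var_y)
print(rho)                            # approximately 0.946
```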

Example 1

$N$ people come into a store on a given day, and customer $i$ spends $X_i$ dollars. Let $T$ be the total dollar amount of sales for the day.

$$T = X_1 + X_2 + \cdots + X_N = \sum_{i=1}^{N} X_i$$

Q: Find $E(T)$ and $\mathrm{Var}(T)$.

Assume: the $X_i$ are i.i.d. and independent of $N$.

Let's first compute $E(T)$ using the tower property:

$$E(T) = E\big(E(T \mid N)\big) = E\Big(E\Big(\sum_{i=1}^{N} X_i \;\Big|\; N\Big)\Big) = E\big(N\,E(X_1)\big) = E(X_1)\,E(N)$$

We know $T$ depends on $N$, so for $\mathrm{Var}(T)$ we use the law of total variance and first compute the conditional variance:

$$\mathrm{Var}(T \mid N = n) = \mathrm{Var}\Big(\sum_{i=1}^{n} X_i \;\Big|\; N = n\Big) = \sum_{i=1}^{n} \mathrm{Var}(X_i) = n\,\mathrm{Var}(X_1)$$

We have just proved that $v(n) = n\,\mathrm{Var}(X_1)$.

Next,

$$E(T \mid N = n) = E\Big(\sum_{i=1}^{n} X_i \;\Big|\; N = n\Big) = \sum_{i=1}^{n} E(X_i) = n\,E(X_1)$$

This shows $g(n) = n\,E(X_1)$. Now, finally, go back to the total-variance formula:

$$\mathrm{Var}(T) = E\big(v(N)\big) + \mathrm{Var}\big(g(N)\big) = E\big(N\,\mathrm{Var}(X_1)\big) + \mathrm{Var}\big(N\,E(X_1)\big) = \mathrm{Var}(X_1)\,E(N) + E(X_1)^2\,\mathrm{Var}(N)$$

where $T = \sum_{i=1}^{N} X_i$, the $X_i$ are i.i.d., and $N$ is independent of the $X_i$'s.
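A quick Monte Carlo sanity check of these two formulas; the particular choices $N \sim \mathrm{Poi}(8)$ and $X_i$ exponential with mean $20$ are arbitrary, made up just for the check.

```python
# Monte Carlo check of E(T) = E(X1) E(N) and
# Var(T) = Var(X1) E(N) + E(X1)^2 Var(N) for a random sum T = X1 + ... + XN.
# N ~ Poisson(8) and X_i ~ Exponential(mean 20) are arbitrary test choices.

import numpy as np

rng = np.random.default_rng(0)
lam, mean_x = 8.0, 20.0        # E(N) = Var(N) = lam; E(X1) = mean_x, Var(X1) = mean_x**2
n_days = 200_000

N = rng.poisson(lam, size=n_days)
T = np.array([rng.exponential(mean_x, size=n).sum() for n in N])

print(T.mean(), mean_x * lam)                              # E(T) vs E(X1) E(N)
print(T.var(ddof=1), mean_x**2 * lam + mean_x**2 * lam)    # Var(T) vs the formula
```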

Exercise: prove the following (using a method of proof similar to the one for the $\mathrm{Var}(T)$ formula):

$$\mathrm{Cov}(N, T) = E(X_1)\,\mathrm{Var}(N)$$

and therefore,

$$\rho(N, T) = \frac{1}{\sqrt{1 + \theta}}$$

where $\theta = \dfrac{\mathrm{Var}(X_1)\,E(N)}{E(X_1)^2\,\mathrm{Var}(N)}$.

Also, for $N \sim \mathrm{Poi}(\lambda)$ and $X_i \sim \mathrm{Ber}(\theta)$, compute $\mathrm{Var}(T)$ and $\rho(N, T)$; see the simulation sketch below.
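Before attempting the proof, the claimed formulas can be sanity-checked by simulation in this Poisson–Bernoulli case. In the sketch below the Bernoulli parameter is renamed $q$ to avoid clashing with the $\theta$ in the $\rho(N, T)$ formula, and the values of $\lambda$ and $q$ are arbitrary.

```python
# Monte Carlo check of Cov(N, T) = E(X1) Var(N) and rho(N, T) = 1 / sqrt(1 + theta)
# for N ~ Poisson(lam) and X_i ~ Bernoulli(q). lam and q are arbitrary test values;
# the Bernoulli parameter is called q here (it is theta in the exercise statement).

import numpy as np

rng = np.random.default_rng(1)
lam, q = 6.0, 0.3
reps = 300_000

N = rng.poisson(lam, size=reps)
T = rng.binomial(N, q)                 # sum of N i.i.d. Bernoulli(q) draws

cov_nt = np.cov(N, T, ddof=1)[0, 1]
rho_nt = np.corrcoef(N, T)[0, 1]

print(cov_nt, q * lam)                             # Cov(N, T) vs E(X1) Var(N) = q * lam
theta = (q * (1 - q) * lam) / (q**2 * lam)         # Var(X1) E(N) / (E(X1)^2 Var(N))
print(rho_nt, 1 / np.sqrt(1 + theta))              # rho(N, T) vs 1 / sqrt(1 + theta)
```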

Example 2

Let $X \sim \mathrm{Geom}(p)$ and $D \sim \mathrm{NegBin}(p, r = X)$. Then $D$ is a random sum of the type $T$ above, where the role of $N$ is played by $X$ and each $X_i \sim \mathrm{Geom}(p)$, i.i.d.

Let $Y = X + D$; find $E(Y)$.

$$E(Y) = E(X) + E(D) = \frac{1}{p} + E(X_1)\,E(X) = \frac{1}{p} + \frac{1}{p^2}$$

$$\mathrm{Var}(Y) = \mathrm{Var}(X + D) = \mathrm{Var}(X) + \mathrm{Var}(D) + 2\,\mathrm{Cov}(X, D)$$
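A simulation sketch of Example 2 that checks $E(Y) = 1/p + 1/p^2$; the value $p = 0.4$ is arbitrary, and $D$ is drawn directly as a sum of $X$ i.i.d. geometric variables.

```python
# Monte Carlo check of E(Y) = 1/p + 1/p^2 in Example 2, where X ~ Geom(p),
# D is a sum of X i.i.d. Geom(p) variables, and Y = X + D.
# Geom(p) here counts trials up to and including the first success, so E(Geom(p)) = 1/p.
# p = 0.4 is an arbitrary test value.

import numpy as np

rng = np.random.default_rng(2)
p = 0.4
reps = 200_000

X = rng.geometric(p, size=reps)
D = np.array([rng.geometric(p, size=x).sum() for x in X])
Y = X + D

print(Y.mean(), 1 / p + 1 / p**2)   # ~ 2.5 + 6.25 = 8.75
```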

Continuous Case

Example 5.3.2

$X \sim \Gamma(\alpha, 1)$ and $Y \sim \Gamma(\beta, 1)$, independent.

Let $U = X + Y$; then $U \sim \Gamma(\alpha + \beta, 1)$.

Let $V = \dfrac{X}{X + Y}$; this is called a Beta random variable, $V \sim B(\alpha, \beta)$.

Let’s now try to prove that U and V are independent.

A: Let $g(u) = E(X \mid U = u)$. It turns out (see Wikipedia) that $E(V) = \dfrac{\alpha}{\alpha + \beta}$.

Therefore
$$E(UV \mid U = u) = u\,E(V \mid U = u) = u\,E(V) = \frac{\alpha}{\alpha + \beta}\,u.$$

This gives us an example where the function $g$ is linear as a function of $u$, because $X = UV$ and so $E(X \mid U = u) = E(UV \mid U = u) = \frac{\alpha}{\alpha + \beta}\,u$.

This situation, where $E(X \mid U = u)$ is linear in $u$, is pretty exceptional.
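A simulation sketch of Example 5.3.2 with arbitrary parameter values: it illustrates that $E(V) \approx \frac{\alpha}{\alpha + \beta}$, that $U$ and $V$ are (empirically) uncorrelated, and that $E(X \mid U = u)$ tracks the line $\frac{\alpha}{\alpha + \beta}\,u$.

```python
# Simulation sketch for Example 5.3.2: X ~ Gamma(alpha, 1), Y ~ Gamma(beta, 1) independent,
# U = X + Y, V = X / (X + Y). alpha = 2, beta = 5 are arbitrary test values.

import numpy as np

rng = np.random.default_rng(3)
alpha, beta = 2.0, 5.0
reps = 300_000

X = rng.gamma(alpha, 1.0, size=reps)
Y = rng.gamma(beta, 1.0, size=reps)
U, V = X + Y, X / (X + Y)

print(V.mean(), alpha / (alpha + beta))    # E(V) ~ alpha / (alpha + beta)
print(np.corrcoef(U, V)[0, 1])             # ~ 0, consistent with U and V independent

# E(X | U = u) should be roughly (alpha / (alpha + beta)) * u: compare the sample mean
# of X within a thin band of U-values around u0 = 8 with the line's value at u0.
u0, band = 8.0, 0.1
mask = np.abs(U - u0) < band
print(X[mask].mean(), alpha / (alpha + beta) * u0)
```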

We call $g(x) = E(Y \mid X = x)$ the predictor of $Y$ given $X$. But what is the linear predictor?

Linear Predictor and Mean Squared Error

We would like to predict Y using a linear function of X.

Let $aX + b$ be the linear predictor. Consider the error in replacing $Y$ by $aX + b$.

We can choose $a$ and $b$ such that $E(Y - aX - b) = 0$.

More systematically, let's consider what statisticians call the mean squared error (MSE):

$$E\big((Y - (aX + b))^2\big)$$

We want to minimize the MSE over all possible choices of the two values $a$ and $b$. It turns out that the best $a$ is $a^* = \mathrm{Corr}(X, Y)\,\dfrac{\sigma_Y}{\sigma_X}$ and the best $b$ is $b^* = E(Y) - a^* E(X)$.

Note: this is closely allied to the question of linear regression. It turns out the MSE for that pair $(a, b)$ is

$$(1 - \rho^2)\,\mathrm{Var}(Y)$$

This says: the total uncertainty in $Y$ is $\mathrm{Var}(Y)$. The portion of that variance which is explained by $X$ is the variance of $aX + b$:

$$\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X) = \rho^2\,\frac{\sigma_Y^2}{\sigma_X^2}\,\sigma_X^2 = \rho^2\,\sigma_Y^2$$

and what is not explained by $X$ is the MSE, $(1 - \rho^2)\,\mathrm{Var}(Y)$.

Summary: with $(a, b)$ as above, $\sigma_X^2 = \mathrm{Var}(X)$, and $\sigma_Y^2 = \mathrm{Var}(Y)$, we see that the amount of variance of $Y$ explained by $X$ is $\mathrm{Var}(aX) = \rho^2\,\sigma_Y^2$. The MSE $(1 - \rho^2)\,\sigma_Y^2$ is the variance of $Y$ unexplained by $X$.
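A short numerical illustration of the best linear predictor: compute $a^*$ and $b^*$ from the formulas above on simulated $(X, Y)$ pairs and check that the resulting MSE matches $(1 - \rho^2)\,\mathrm{Var}(Y)$. The data-generating model below is made up just for the demonstration.

```python
# Best linear predictor demo: a* = Cov(X, Y) / Var(X) = rho * sigma_Y / sigma_X,
# b* = E(Y) - a* E(X), and MSE of (a* X + b*) ~ (1 - rho^2) Var(Y).
# The data-generating model Y = 3 X + noise is arbitrary; any joint (X, Y) would do.

import numpy as np

rng = np.random.default_rng(4)
n = 200_000
X = rng.normal(0.0, 2.0, size=n)
Y = 3.0 * X + rng.normal(0.0, 4.0, size=n)

a = np.cov(X, Y, ddof=1)[0, 1] / X.var(ddof=1)   # a* = Cov(X, Y) / Var(X)
b = Y.mean() - a * X.mean()                       # b* = E(Y) - a* E(X)

mse = np.mean((Y - (a * X + b)) ** 2)
rho = np.corrcoef(X, Y)[0, 1]

print(a, b)                                  # ~ 3 and ~ 0 for this particular model
print(mse, (1 - rho**2) * Y.var(ddof=1))     # the two values are close
```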


