ISLR notes and exercises from *An Introduction to Statistical Learning*

# 5. Resampling Methods

## Conceptual Exercises

### Exercise 1: Minimize the variance of a weighted sum of two random variables

Using basic statistical properties of the variance, as well as single-variable calculus, derive (5.6). In other words, prove that the $\alpha$ given by (5.6) does indeed minimize $\text{Var}(\alpha X + (1 - \alpha)Y)$.

Using properties of variance we have

$$\text{Var}(\alpha X + (1 - \alpha) Y) = \alpha^2\sigma^2_X + (1 - \alpha)^2\sigma^2_Y + 2\alpha(1-\alpha)\sigma_{XY}$$

Taking the derivative with respect to $\alpha$ and setting it to zero gives

$$2\alpha\sigma^2_X - 2(1 - \alpha)\sigma^2_Y + 2(1-2\alpha)\sigma_{XY} = 0$$

Solving for $\alpha$, we find

$$\alpha = \frac{\sigma^2_Y - \sigma_{XY}}{\sigma^2_X + \sigma^2_Y - 2\sigma_{XY}}$$

Since the second derivative is $2(\sigma^2_X + \sigma^2_Y - 2\sigma_{XY}) = 2\,\text{Var}(X - Y) \geq 0$, this critical point is indeed a minimum.
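As a quick numerical sanity check (not part of the exercise), we can simulate correlated $X$ and $Y$, plug the sample estimates of $\sigma^2_X$, $\sigma^2_Y$, and $\sigma_{XY}$ into the formula, and confirm by brute-force grid search that the resulting $\alpha$ minimizes the sample variance of $\alpha X + (1 - \alpha)Y$. The covariance matrix below is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# simulate correlated X and Y; with these parameters the true optimum is
# alpha = (2 - 0.5) / (1 + 2 - 1) = 0.75
X, Y = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 2.0]], size=100_000).T

# plug the sample estimates into the formula for alpha
cov = np.cov(X, Y, ddof=0)
var_x, var_y, cov_xy = cov[0, 0], cov[1, 1], cov[0, 1]
alpha = (var_y - cov_xy) / (var_x + var_y - 2 * cov_xy)

# brute-force check: alpha should (approximately) minimize the sample variance
grid = np.linspace(0, 1, 1001)
best = grid[np.argmin([np.var(a * X + (1 - a) * Y) for a in grid])]
print(alpha, best)  # the two values should agree to grid precision
```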

### Exercise 2: Derive the probability an observation appears in a bootstrap sample

#### a.

What is the probability that the first bootstrap observation is not the jth observation from the original sample? Justify your answer.

Since the bootstrap observations are chosen uniformly at random from the $n$ original observations,

$$\begin{aligned} P(\text{first bootstrap observation is not the } j\text{th observation}) &= 1 - P(\text{first bootstrap observation is the } j\text{th observation})\\ &= 1 - \frac{1}{n} \end{aligned}$$
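As a quick illustration (not part of the exercise), a simulation with $n = 10$ and an arbitrary fixed index $j$ confirms that the first bootstrap draw misses $j$ with probability about $1 - \frac{1}{10} = 0.9$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, j, trials = 10, 3, 100_000  # n observations; j is an arbitrary fixed index

# the first bootstrap observation in each of `trials` bootstrap samples
first_draws = rng.integers(0, n, size=trials)
print(np.mean(first_draws != j))  # should be close to 1 - 1/n = 0.9
```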

#### b.

What is the probability that the second bootstrap observation is not the jth observation from the original sample?

The probability is still $1 - \frac{1}{n}$, since the bootstrap observations are drawn with replacement.

#### c.

Let

$$A = \text{the } j\text{th observation is not in the bootstrap sample}$$

$$A_k = \text{the } k\text{th bootstrap observation is not the } j\text{th observation}$$

Then, since the bootstrap observations are drawn independently (with replacement) and uniformly at random, the $A_k$ are independent with $P(A_k) = 1 - \frac{1}{n}$, hence

$$\begin{aligned} P(A) &= P\left(\bigcap_{k = 1}^n A_k\right)\\ &= \prod_{k = 1}^n P(A_k)\\ &= \prod_{k = 1}^n \left(1 - \frac{1}{n}\right)\\ &= \left(1 - \frac{1}{n}\right)^n \end{aligned}$$

#### d.

We have

$$A^c = \text{the } j\text{th observation is in the bootstrap sample}$$

So

$$P(A^c) = 1 - P(A) = 1 - \left(1 - \frac{1}{n}\right)^n$$

When $n = 5$, $P(A^c)$ is

```python
1 - (1 - 1/5)**5
```

    0.6723199999999999

#### e.

When $n = 100$, $P(A^c)$ is

```python
1 - (1 - 1/100)**100
```

    0.6339676587267709

#### f.

When $n = 10^4$, $P(A^c)$ is

```python
1 - (1 - 1/10**4)**10**4
```

    0.6321389535...

#### g.

```python
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline
plt.style.use('seaborn-white')

x = np.arange(1, 100000, 1)
y = 1 - (1 - 1/x)**x

plt.plot(x, y, color='r')
```

*[plot: $P(A^c)$ as a function of $n$ for $1 \leq n < 10^5$]*

The probability rapidly drops to around $\frac{2}{3}$

```python
x = np.arange(1, 10, 1)
y = 1 - (1 - 1/x)**x

plt.plot(x, y, color='r')
```

*[plot: $P(A^c)$ as a function of $n$ for $1 \leq n < 10$]*

and then slowly approaches the limit

$$\lim_{n \rightarrow \infty} \left(1 - \left(1 - \frac{1}{n}\right)^n\right) = 1 - e^{-1} \approx 0.6321$$
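As a quick numerical check of this limit (using only the standard library):

```python
import math

# 1 - (1 - 1/n)^n decreases toward 1 - 1/e as n grows
for n in [10, 100, 10_000, 1_000_000]:
    print(n, 1 - (1 - 1/n)**n)

print("limit:", 1 - math.exp(-1))  # 0.6321205588285577
```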

#### h.

```python
data = np.arange(1, 101, 1)

# estimate P(A^c) for n = 100: the fraction of 10,000 bootstrap samples
# that contain the (arbitrarily chosen) observation j = 4
sum([4 in np.random.choice(data, size=100, replace=True) for i in range(10000)]) / 10000
```

    0.6308

This is very close to the theoretical probability

```python
1 - (1 - 1/100)**100
```

    0.6339676587267709

### Exercise 3: $k$-fold Cross-Validation

See section 5.1.3 in the notes

### Exercise 4: Estimate the standard deviation of a predicted response

Suppose that, given $(X, Y)$, we predict $\hat{Y}$. This is an estimator[^0]. To estimate its standard error using data $(x_1, y_1), \dots, (x_n, y_n)$, use the “plug-in” estimator

$$\hat{se}(\hat{Y}) = \sqrt{\frac{1}{n} \sum_{i = 1}^n \left(\hat{y}_i - \overline{\hat{y}}\right)^2}$$

where $\hat{y}_i$ is the predicted value for $x_i$ and $\overline{\hat{y}}$ is the mean predicted value.

In other words, use the sample standard deviation of the predicted values.
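A minimal sketch of this computation, assuming a hypothetical scikit-learn model fit on simulated data (any statistical learning method and dataset could stand in here):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# simulated data, purely for illustration: y = 2x + noise
X = rng.normal(size=(200, 1))
y = 2 * X[:, 0] + rng.normal(size=200)

y_hat = LinearRegression().fit(X, y).predict(X)

# plug-in estimate: the sample standard deviation of the predicted values,
# using 1/n as in the formula above
se_hat = np.sqrt(np.mean((y_hat - y_hat.mean()) ** 2))
print(se_hat)  # equals np.std(y_hat), which also uses ddof=0 by default
```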

## Footnotes

[^0]: An estimator is a statistic (a function of the data) used to estimate a population quantity. It is a random variable that depends on the statistical learning method used and on the observed data.