5. Resampling Methods

Exercise 6: Bootstrap estimates of standard errors of logistic regression coefficient estimates

Get estimates of standard errors from statsmodels
Get boostrap estimates of standard errors

Given the differences between R and Python in this case, I’m not following the structure of this exercise

Get estimates of standard errors from `statsmodels`

import pandas as pd
import statsmodels.formula.api as smf


#import data
default = pd.read_csv("../../datasets/Default.csv", index_col=0)

# add constant
default['const'] = 1
columns = list(default.columns)
columns.remove('const')
default = default[['const'] + columns]

# convert to numeric
default['default'] = [int(value=='Yes') for value in default['default']]
default['student'] = [int(value=='Yes') for value in default['student']]

# fit model
logit = smf.logit(formula='default ~ income + balance', 
                  data=default).fit(disp=0)

logit.summary()

Logit Regression Results
Dep. Variable:	default	No. Observations:	10000
Model:	Logit	Df Residuals:	9997
Method:	MLE	Df Model:	2
Date:	Mon, 03 Dec 2018	Pseudo R-squ.:	0.4594
Time:	09:29:05	Log-Likelihood:	-789.48
converged:	True	LL-Null:	-1460.3
		LLR p-value:	4.541e-292

	coef	std err	z	P>\|z\|	[0.025	0.975]
Intercept	-11.5405	0.435	-26.544	0.000	-12.393	-10.688
income	2.081e-05	4.99e-06	4.174	0.000	1.1e-05	3.06e-05
balance	0.0056	0.000	24.835	0.000	0.005	0.006

Possibly complete quasi-separation: A fraction 0.14 of observations can be
perfectly predicted. This might indicate that there is complete
quasi-separation. In this case some parameters will not be identified.

The estimated standard errors of the coefficient estimates are

logit.bse

Intercept    0.434772
income       0.000005
balance      0.000227
dtype: float64

Get boostrap estimates of standard errors

from sklearn.utils import resample

boot_std_errs = {}
n_boot_samples = 1000

for i in range(n_boot_samples):
    default_boot_sample = resample(default)
    logit = smf.logit(formula='default ~ income + balance', 
                      data=default_boot_sample).fit(disp=0)
    boot_std_errs[i] = logit.bse

df = pd.DataFrame.from_dict(boot_std_errs, orient='index')
df.head()

	Intercept	income	balance
0	0.455075	0.000005	0.000242
1	0.486832	0.000005	0.000253
2	0.454962	0.000005	0.000236
3	0.440095	0.000005	0.000230
4	0.420974	0.000005	0.000220

df.std()

Intercept    2.213157e-02
income       1.415595e-07
balance      1.109741e-05
dtype: float64

These estimates are considerably smaller, and likely more precise.

For more details, see the chapter on bootstrapping in Wasserman’s All of Statistics

islr notes and exercises from An Introduction to Statistical Learning

5. Resampling Methods

Exercise 6: Bootstrap estimates of standard errors of logistic regression coefficient estimates

Get estimates of standard errors from statsmodels

Get boostrap estimates of standard errors

Get estimates of standard errors from `statsmodels`