# Chapter 1 - Location Estimates and Estimates of Variability

This blog post is my attempt to 1) learn the material in the book Practical Statistics for Data Scientists by Bruce and Bruce and 2) get more familiar with the Python packages needed for statistics and data science.

I have tried to convert the code supplied in the book from R to Python correctly. Any mistakes are mine alone.

#### Objectives

• Compute mean, trimmed mean, and median
• Compute standard deviation, IQR, percentiles

#### Data

• state.csv

#### Packages

• pandas
• scipy (for trim_mean)
• NumPy (installed automatically as a pandas dependency)
In [9]:
import pandas as pd
from scipy import stats
import numpy as np

In [ ]:
state = pd.read_csv('../data/state.csv')

In [3]:
# Show the first rows; compare with Table 1-2, p. 12
state.head(8)

Out[3]:
State Population Murder.Rate Abbreviation
0 Alabama 4779736 5.7 AL
2 Arizona 6392017 4.7 AZ
3 Arkansas 2915918 5.6 AR
4 California 37253956 4.4 CA
6 Connecticut 3574097 2.4 CT
7 Delaware 897934 5.8 DE
In [4]:
state["Population"].mean()

Out[4]:
6162876.2999999998
In [7]:
# Trimmed mean: the mean after dropping the lowest 10% and the highest 10% of the sorted values
stats.trim_mean(state["Population"], 0.1)

Out[7]:
4783697.125
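
To make the trimming step concrete, here is a minimal sketch that computes the same kind of trimmed mean by hand and compares it with `stats.trim_mean`. The `manual_trim_mean` helper and the `toy` list are made up for illustration and are not from the book.

```python
import numpy as np
from scipy import stats


def manual_trim_mean(values, proportion):
    """Mean after dropping `proportion` of the sorted values from each end.

    Hypothetical helper, written only to illustrate the idea.
    """
    x = np.sort(np.asarray(values, dtype=float))
    k = int(proportion * len(x))  # number of points cut from each end
    return x[k:len(x) - k].mean()


# Made-up data with one large outlier
toy = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

manual = manual_trim_mean(toy, 0.1)   # drops the 1 and the 100
library = stats.trim_mean(toy, 0.1)   # SciPy's version, same cut
plain = float(np.mean(toy))           # ordinary mean, pulled up by the outlier

print(manual, library, plain)
```

With ten values and a proportion of 0.1, one value is dropped from each end, so the outlier no longer affects the result.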
In [8]:
state["Population"].median()

Out[8]:
4436369.5
In [16]:
# Weighted mean of the murder rate, weighted by state population, using NumPy's average function
np.average(state["Murder.Rate"], weights=state["Population"])

Out[16]:
4.4458339811233927
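
As a rough check on what `np.average` does with `weights`, the sketch below computes a weighted mean by hand as sum(w * x) / sum(w). The `rates` and `weights` arrays are made-up numbers, not values from state.csv.

```python
import numpy as np

# Made-up values and weights, for illustration only
rates = np.array([2.0, 4.0, 6.0])
weights = np.array([1.0, 1.0, 2.0])

# Weighted mean by hand: each value counts in proportion to its weight
by_hand = (rates * weights).sum() / weights.sum()

# The same calculation with NumPy
with_numpy = np.average(rates, weights=weights)

# The unweighted mean, for comparison
unweighted = rates.mean()

print(by_hand, with_numpy, unweighted)
```

Because the largest value carries twice the weight of the others, the weighted mean ends up above the unweighted one.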

#### Standard Deviation

In [17]:
state["Population"].std()

Out[17]:
6848235.3474011421
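
One thing to watch when moving between packages: pandas `std()` defaults to the sample standard deviation (`ddof=1`, dividing by n - 1), while NumPy's `np.std` defaults to `ddof=0` (dividing by n). A small sketch on a made-up series shows the difference.

```python
import numpy as np
import pandas as pd

# Made-up data, for illustration only
toy = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

sample_sd = toy.std()                    # pandas default: ddof=1
population_sd = np.std(toy.to_numpy())   # NumPy default: ddof=0
pandas_ddof0 = toy.std(ddof=0)           # pandas, asked to divide by n

print(sample_sd, population_sd, pandas_ddof0)
```

Passing `ddof` explicitly in either package removes the ambiguity.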

#### Percentiles and IQR

In [19]:
# First quartile: the 0.25 quantile, i.e. the 25th percentile
Q1 = state["Population"].quantile(0.25)
Q1

Out[19]:
1833004.25
In [21]:
# Third quartile: the 0.75 quantile, i.e. the 75th percentile
Q3 = state["Population"].quantile(0.75)
Q3

Out[21]:
6680312.25
In [23]:
# pandas does not have an IQR function but it is easy to compute
# it is just the difference between Q3 and Q1
IQR = Q3 - Q1
IQR

Out[23]:
4847308.0
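
For comparison, the same quantity can also be obtained with `np.percentile` or with `scipy.stats.iqr`. The sketch below uses a made-up series rather than state.csv, and relies on all three approaches using linear interpolation by default.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Made-up data, for illustration only
toy = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])

# pandas: difference between the 0.75 and 0.25 quantiles
iqr_pandas = toy.quantile(0.75) - toy.quantile(0.25)

# NumPy: percentiles are given on a 0-100 scale
q1, q3 = np.percentile(toy, [25, 75])
iqr_numpy = q3 - q1

# SciPy: a ready-made IQR function
iqr_scipy = stats.iqr(toy)

print(iqr_pandas, iqr_numpy, iqr_scipy)
```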