This blog post is my attempt to 1) learn the material in the book Practical Statistics for Data Scientists by Bruce and Bruce and 2) get more comfortable with the different Python packages needed for statistics and data science.
I have made every attempt to convert the code supplied in the book from R to Python correctly. Any mistakes and errors must be assumed to be mine and mine alone.
import pandas as pd
from scipy import stats
import numpy as np
state = pd.read_csv('../data/state.csv')
# See Table 1-2, pg 12
state.head(8)
# Trimmed mean: the mean after removing 10% of the sorted data points from each end
stats.trim_mean(state["Population"], 0.1)
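To check my own understanding of what a trimmed mean does, here is a minimal hand-rolled sketch in plain Python. The helper name `trimmed_mean` and the toy numbers are mine, not from the book, and I am assuming the number of points cut from each end is truncated to an integer, which is how I understand `stats.trim_mean` to behave.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the sorted values from each end."""
    ordered = sorted(values)
    # Number of points to drop from each end (assumption: truncated to an integer)
    k = int(proportion * len(ordered))
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Toy data (made up for illustration) with one extreme value
toy = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

plain = sum(toy) / len(toy)        # 14.5, pulled up by the outlier
trimmed = trimmed_mean(toy, 0.1)   # 5.5, drops the 1 and the 100

print(plain, trimmed)
```

The outlier drags the plain mean far away from the bulk of the data, while the trimmed mean stays close to it. That is the reason to reach for a trimmed mean in the first place.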
# Need to use NumPy's average function to get a weighted mean
np.average(state["Murder.Rate"], weights=state["Population"])
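The weighted mean is just the sum of each value times its weight, divided by the sum of the weights. A small sketch in plain Python, using a hypothetical helper `weighted_mean` and made-up numbers (not the state data), shows the arithmetic that `np.average` with `weights=` performs:

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    total = sum(v * w for v, w in zip(values, weights))
    return total / sum(weights)


# Toy data (made up): two rates, the second one backed by three times the weight
rates = [2.0, 4.0]
sizes = [1, 3]

unweighted = sum(rates) / len(rates)   # 3.0
weighted = weighted_mean(rates, sizes) # (2.0*1 + 4.0*3) / (1 + 3) = 3.5

print(unweighted, weighted)
```

With the state data, weighting the murder rate by population means the larger states count for more in the average.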
# The first quartile (Q1) is the 0.25 quantile, i.e. the 25th percentile
Q1 = state["Population"].quantile(0.25)
Q1
# The third quartile (Q3) is the 0.75 quantile, i.e. the 75th percentile
Q3 = state["Population"].quantile(0.75)
Q3
# pandas does not have an IQR function but it is easy to compute:
# it is just the difference between Q3 and Q1
IQR = Q3 - Q1
IQR
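To see where the quartiles come from, here is a rough sketch of a quantile computed by linear interpolation between sorted values. As far as I know, this is the default method of pandas' `quantile`. The helper name `quantile_linear` and the toy data are my own, for illustration only.

```python
def quantile_linear(values, q):
    """Quantile by linear interpolation between the two nearest sorted values."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)   # fractional position in the sorted list
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


# Toy data (made up for illustration)
toy = [1, 2, 3, 4, 5, 6, 7, 8, 9]

q1 = quantile_linear(toy, 0.25)   # 3.0
q3 = quantile_linear(toy, 0.75)   # 7.0
iqr = q3 - q1                     # 4.0

print(q1, q3, iqr)
```

SciPy also ships `scipy.stats.iqr`, which should save the subtraction step, though I have not compared its output against the book's numbers here.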