Book: Practical Statistics for Data Scientists

For the past several months I have been trying to get a better understanding of statistics and a better feel for when, where and how it should be applied. Several years ago I took an online statistics course from the local community college. The course didn't have video lectures; it was basically read the book, do some problems, take tests, and, oh, answer a question and post a question on the forum. It was a decent course as far as "plug-n-chug" equations go, but I didn't get a good grasp of where the equations were applied.

I looked at some online courses on Coursera and edX, but they usually required R, and I'm really not interested in learning R right now.

While on vacation I stopped at my favorite brick-and-mortar store, Barnes & Noble, and came across Practical Statistics for Data Scientists by Peter Bruce and Andrew Bruce. Flipping through it, I got the feeling this was more of what I was looking for as far as HOW to use statistics. I also liked the format of the book. There were fairly detailed explanations of each topic, graphs and code, as well as a section at the end of each topic listing other resources if you wanted to go deeper into what was just covered.

Once again though, the code examples were written in R.

I have jumped on the Python bandwagon as of late. I am fairly comfortable with straight imperative programming and getting more comfortable with the OOP aspects of the language. With some packages, like matplotlib, I have to Google heavily to get anything done. Other packages, like scikit-learn, I haven't even touched.

So I figured, what better way to learn the material in the book and get better at using different packages than to convert the code in the book from R to Python?

The first thing I did was create a script to download the data files: Data Download
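For anyone who wants to write something similar, here is a minimal sketch of what a download script like this could look like. It is not the script linked above. The function name `download_files` and the `data` folder are my own placeholders, and you would need to supply the real URLs of the data files yourself. It uses only the standard library.

```python
import shutil
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def download_files(urls, dest_dir="data"):
    """Download each URL into dest_dir and return the list of local paths.

    Hypothetical helper for illustration: files that already exist
    locally are skipped so the script can be re-run safely.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    saved = []
    for url in urls:
        # Use the last component of the URL path as the local file name.
        filename = Path(urlparse(url).path).name
        target = dest / filename

        if not target.exists():
            # Stream the response to disk instead of loading it all in memory.
            with urlopen(url) as response, open(target, "wb") as out:
                shutil.copyfileobj(response, out)

        saved.append(target)
    return saved
```

Calling `download_files(list_of_urls)` with a list of data file URLs would put the files in a local `data` folder and skip anything that has already been downloaded.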

Here is a link to the book: Practical Statistics for Data Scientists