Practical Statistics for Data Scientists (PSDS): Download data sets

The following code downloads the data sets used throughout the book.

The Google Drive file ids were copied from download_data.r, which is in the src/ directory of https://github.com/andrewgbruce/statistics-for-data-scientists

GitHub repository for this notebook: https://github.com/fm2606/python-psds

In [6]:
import urllib.request as ur
In [9]:
# key = filename
# value = id of data file on google drive
files = {
    "state": "0B98qpkK5EJembFc5RmVKVVJPdGc",
    "dfw_airline": "0B98qpkK5EJemcmZYX2VhMHBXelE",
    "sp500_px": "0B98qpkK5EJemV2htZWdhVFRMNlU",
    # NOTE: same id as sp500_px; verify against download_data.r
    "sp500_sym": "0B98qpkK5EJemV2htZWdhVFRMNlU",
    "kc_tax": "0B98qpkK5EJemck5VWkszN3F3RGM",
    "lc_loans": "0B98qpkK5EJemRXpfa2lONlFRSms",
    "loan200": "0B98qpkK5EJemd0JnQUtjb051dTA",
    "loan300": "0B98qpkK5EJemQXYtYmJUVkdsN1U",
    "loan_data": "0B98qpkK5EJemZzdoQ2I3SWlBYzg",
    "loans_income": "0B98qpkK5EJemRXVld0NSbWhYNVU",
    "web_page_data": "0B98qpkK5EJemOC0xMHBTTEowYzg",
    "four_sessions": "0B98qpkK5EJemOFdZM1JsaEF0Mnc",
    "click_rates": "0B98qpkK5EJemVHB0ZzdtUG9SeTg",
    "imanishi_data": "0B98qpkK5EJemZTJnUDd5Ri1vRDA",
    "LungDisease": "0B98qpkK5EJemb25YYUFJZnZVSnM",
    "County_Zhvi_AllHomes": "0B98qpkK5EJemWGRWOEhYN1RabVk",
    "house_sales": "0B98qpkK5EJemVTRRN0dLakxwTmM"
}
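Because the ids are copied by hand, it can be worth checking whether any id is listed under more than one name; in the dictionary above, sp500_px and sp500_sym carry the same id. The following is a minimal sketch of such a check. The names `find_duplicate_ids` and `sample` are hypothetical and introduced here only for illustration; they are not part of the original notebook.

```python
from collections import defaultdict

def find_duplicate_ids(mapping):
    """Return {id: [names...]} for every id used by more than one name."""
    names_by_id = defaultdict(list)
    for name, file_id in mapping.items():
        names_by_id[file_id].append(name)
    return {fid: names for fid, names in names_by_id.items() if len(names) > 1}

# Small excerpt of the dictionary above, used only for illustration.
sample = {
    "state": "0B98qpkK5EJembFc5RmVKVVJPdGc",
    "sp500_px": "0B98qpkK5EJemV2htZWdhVFRMNlU",
    "sp500_sym": "0B98qpkK5EJemV2htZWdhVFRMNlU",
}
print(find_duplicate_ids(sample))
```

In the notebook itself, `find_duplicate_ids(files)` would run the same check on the full dictionary. An empty result means every name maps to a distinct id.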
In [15]:
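import os

# Download the data set with the given Google Drive id and save it
# as <name>.csv in the data directory.
def download_from_google_drive(file_id, name):
    path = "./data/"
    os.makedirs(path, exist_ok=True)  # urlretrieve fails if the directory is missing
    fname = os.path.join(path, name + ".csv")
    url = "https://drive.google.com/uc?export=download&id={}".format(file_id)
    ur.urlretrieve(url, fname)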
In [16]:
for name, file_id in files.items():
    print("Downloading " + name + "...")
    download_from_google_drive(file_id, name)
Downloading house_sales...
Downloading County_Zhvi_AllHomes...
Downloading state...
Downloading loan200...
Downloading loan300...
Downloading web_page_data...
Downloading lc_loans...
Downloading loan_data...
Downloading four_sessions...
Downloading click_rates...
Downloading LungDisease...
Downloading dfw_airline...
Downloading kc_tax...
Downloading sp500_px...
Downloading sp500_sym...
Downloading loans_income...
Downloading imanishi_data...
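The loop prints a line before each download but does not confirm that a usable CSV file was saved. A download link can sometimes return an HTML page (for example a sign-in or confirmation page) instead of the file, so a quick check afterwards may help. The sketch below is an assumption-laden heuristic, not part of the original notebook: `looks_like_html` and `check_downloads` are hypothetical helper names, and the demonstration uses a temporary directory with made-up files so that it runs without network access.

```python
import os
import tempfile

def looks_like_html(path, probe_bytes=512):
    """Heuristic: True if the file starts with an HTML doctype or <html> tag."""
    with open(path, "rb") as fh:
        head = fh.read(probe_bytes).lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")

def check_downloads(names, data_dir="./data"):
    """Return {name: status}; status is 'ok', 'missing', 'empty', or 'html'."""
    report = {}
    for name in names:
        path = os.path.join(data_dir, name + ".csv")
        if not os.path.isfile(path):
            report[name] = "missing"
        elif os.path.getsize(path) == 0:
            report[name] = "empty"
        elif looks_like_html(path):
            report[name] = "html"
        else:
            report[name] = "ok"
    return report

# Self-contained demonstration with made-up files (no network access).
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "good.csv"), "w") as fh:
        fh.write("a,b\n1,2\n")
    with open(os.path.join(tmp, "bad.csv"), "w") as fh:
        fh.write("<!DOCTYPE html><html><body>Sign in</body></html>")
    demo_report = check_downloads(["good", "bad", "absent"], data_dir=tmp)
    print(demo_report)
```

After the download loop has run, `check_downloads(files)` would report the status of each data set in ./data. Any entry other than 'ok' is worth inspecting by hand.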