Analyzing NYC Citi Bike data with Azure Databricks

Let’s get started. I am using a subset of the Citi Bike System Data, which you can explore and download here: tripdata. For this demo I am using data from 2017 only. Once I had downloaded the required dataset, I uploaded it to my Azure Blob Storage account.
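If you want to read the uploaded files from a notebook, something like the following should work. It is only a minimal sketch: the storage account name, container name, and blob path are made-up placeholders, and I am assuming the trip data was uploaded as CSV. The helper just builds the wasbs:// URL and the Spark configuration key that holds the storage account access key.

```python
def wasbs_location(account, container, blob_path=""):
    """Build the Spark config key and wasbs:// URL for an Azure Blob Storage container."""
    host = f"{account}.blob.core.windows.net"
    # Spark looks up the storage account access key under this setting
    config_key = f"fs.azure.account.key.{host}"
    url = f"wasbs://{container}@{host}/{blob_path.lstrip('/')}"
    return config_key, url


# Hypothetical account, container and path; substitute your own
config_key, url = wasbs_location("mystorageaccount", "citibike", "2017/*.csv")
print(config_key)
print(url)

# In a Databricks notebook, where `spark` is already defined:
# spark.conf.set(config_key, "<storage-account-access-key>")
# trips = spark.read.csv(url, header=True, inferSchema=True)
```

The last two lines are commented out because they only run inside a Databricks notebook and need a real access key.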

Next I created a cluster named “citibike”, disabled autoscaling, configured 4 workers, and set auto termination to 60 minutes.
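I did this through the UI, but the same settings can be written down as the kind of JSON body the Databricks Clusters API accepts when creating a cluster. Treat this as a rough sketch: the Spark version and node type are placeholders because they depend on your workspace, and the `cluster_spec` helper is just something I made up for illustration.

```python
import json


def cluster_spec(name, workers, idle_minutes, spark_version, node_type_id):
    """Describe a fixed-size cluster as a dict shaped like a Clusters API create request."""
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # A fixed worker count with no "autoscale" block means autoscaling is off
        "num_workers": workers,
        # Shut the cluster down after this many idle minutes
        "autotermination_minutes": idle_minutes,
    }


# Placeholders: pick a Spark version and node type available in your workspace
spec = cluster_spec("citibike", 4, 60, "<spark-version>", "<node-type-id>")
print(json.dumps(spec, indent=2))
```

Leaving out the autoscale settings and giving a plain worker count is what keeps the cluster at exactly 4 workers.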

Then I created a Python notebook named “citibike_nyc_analysis”, which you can download from GitHub here, import into your Azure Databricks workspace, and execute. The following is the ipynb file (IPython Notebook) rendered as HTML in an iframe for you to explore.
BTW, for those of you near Ottawa, ON (Canada), I will be presenting “Introducing Microsoft Azure Databricks” next Thursday, February 15, 2018, at the Ottawa Data User Group Meetup.
Enjoy!