Taxi trips in NYC

Chris Whong (http://chriswhong.com/open-data/foil_nyc_taxi/) released a great dataset on taxi trips in NYC, which he got through FOIL. The files are distributed rather inefficiently and without checksums, so here is what I felt was missing.

I mirrored the files at https://archive.org/details/nycTaxiTripData2013 and added much smaller 7z files.

MD5 checksums of the original zip files:
16d7ea9735fc8806f2cba51e95f96c4b trip_data_1.csv.zip
5933a699bf289ec53de83970fcb7b4f6 trip_data_2.csv.zip
daed88fc85b71f4d36fe798edc938223 trip_data_3.csv.zip
a7d1a1ec83c95686d22e06b8668f938c trip_data_4.csv.zip
1421686e977907ab5566766660783742 trip_data_5.csv.zip
c5a6c614decd607f38686ea3a7f8a429 trip_data_6.csv.zip
73eebfd6dc7906f2bc758a4d4859ff9b trip_data_7.csv.zip
1ff1ffc336eb6517bc44cce25498eae6 trip_data_8.csv.zip
6eebd8c46ed66ad73b59f3f1445b8200 trip_data_9.csv.zip
052d12e2f9a825563b20e748da257845 trip_data_10.csv.zip
3df0f8292ee92bb0c96145b6f9069c3d trip_data_11.csv.zip
1c5a3a9353e7192a4cf32273bbd1458a trip_data_12.csv.zip
a3b8a092f9062c0431ea40031f2faf03 trip_fare_1.csv.zip
b1fe72a3bbd884e58618657c8150b179 trip_fare_2.csv.zip
d44fde041b643b05da799f3f60880690 trip_fare_3.csv.zip
8566bbb0084044139ac5ff125bc8c45a trip_fare_4.csv.zip
cebe886d4f8e6ef4a02c0453ade714d6 trip_fare_5.csv.zip
b39c3de4825a2e0199771d3499133773 trip_fare_6.csv.zip
b6660dbf87138bc1a03f78c892d2f150 trip_fare_7.csv.zip
e457df4423e910968e9cb0c437e89390 trip_fare_8.csv.zip
f300ff601fdfb8b08fce7165c598029d trip_fare_9.csv.zip
1f54ecba09415ab62ad4c2e2dbed8122 trip_fare_10.csv.zip
505e5b5da25abb0b30de4ab548b22714 trip_fare_11.csv.zip
af54f05a09540685e143f0f362f167ba trip_fare_12.csv.zip

MD5 checksums of the extracted files:
cb4cbb58fd0a679c2cc8b54e6b122752 trip_data_1.csv
6a23454e051bc791b5e9191e231623c6 trip_data_2.csv
c56b41f57cdeebf3f048bbd5b2cb0b22 trip_data_3.csv
d8d58033dfaeaaac4d9b0c4c2db65392 trip_data_4.csv
f3760655c5a86660e3da7689ab1a4d36 trip_data_5.csv
6ead7e108720ef7e5d42401e2e24446a trip_data_6.csv
5029e89ca6a4b9e1c5f7e41f7d9be7f7 trip_data_7.csv
0ba05fc2d13d1c565dc855a335204e59 trip_data_8.csv
266e326535f704a43c4d27f599599a3a trip_data_9.csv
484062a41cdf77a560ac22689e178dd9 trip_data_10.csv
82d1871807bfa0317dc6a655fa2e0e60 trip_data_11.csv
622bf18954cfa28c8dad4275163d437c trip_data_12.csv
8de2725ae9ebd0716c79a00cd7152f75 trip_fare_1.csv
e7a7b8c68dc752e1af1aa1338f6300e7 trip_fare_2.csv
ca76bfdf5216db38c2a632ed55b88a51 trip_fare_3.csv
c52e5ac23011c6e10ffa22601782a025 trip_fare_4.csv
f260ff7a0c97d023ff74a35ee21ee74c trip_fare_5.csv
a0080a3d6003aa1b67bea5efc0377c84 trip_fare_6.csv
588cce29ab1ff422770dd45c07afabad trip_fare_7.csv
ad3cc028b12dab8b20fbffcd75523db5 trip_fare_8.csv
af53ca5f2a6c517a3066ee8dadeb72b7 trip_fare_9.csv
c7c57109241825128781eb3b9968c689 trip_fare_10.csv
e9ce76ebbe19ce786e2cc378fe97bbb6 trip_fare_11.csv
1b6579fc2dad108ac27a1ce1b6c6d9b6 trip_fare_12.csv

The original release consisted of CSV files inside zip files inside another zip file. I extracted them and recompressed them with the free 7-Zip. The results are much smaller: the original zip file for the trip data is 11 gigabytes, while the 7z archive is 3.9 gigabytes; the original zip file for the fare data is 7.7 gigabytes, while the 7z archive is 1.7 gigabytes. I also tried bzip2, but it was not as efficient.
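To put the savings in perspective, here is a small Python sketch that turns the sizes quoted above into a relative reduction and a compression ratio. The function name size_reduction and the dictionary layout are just illustrative choices, and the inputs are the rounded gigabyte figures from this post, so the results are approximate.

```python
def size_reduction(original_gb, compressed_gb):
    """Return the fraction of space saved relative to the original size."""
    return 1.0 - compressed_gb / original_gb


# Rounded sizes in gigabytes as quoted in the post: (original zip, 7z archive).
archives = {
    "trip_data": (11.0, 3.9),
    "trip_fare": (7.7, 1.7),
}

for name, (zip_gb, sevenz_gb) in archives.items():
    saved = size_reduction(zip_gb, sevenz_gb)
    ratio = zip_gb / sevenz_gb
    print(f"{name}: {saved:.0%} smaller, about {ratio:.1f}x reduction")
```

With these rounded figures the trip data shrinks by roughly two thirds and the fare data by more than three quarters.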

MD5 checksums of the 7z files:
f03d0a7749f44db2a8999cc592e2c828 trip_data.7z
52cf3fdfc2af2705db40fc1cd5d6b079 trip_fare.7z
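
If you want to check your downloads against the lists above, something like the following Python sketch should do. It assumes the checksum lines are pasted in the same "checksum filename" form used in this post and that the files sit in the current directory; the function names (md5_of_file, parse_checksum_lines, verify) are made up for this example.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksum_lines(text):
    """Parse lines of the form '<md5> <filename>' into {filename: md5}."""
    expected = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and len(parts[0]) == 32:
            expected[parts[1]] = parts[0].lower()
    return expected


def verify(expected, directory="."):
    """Return {filename: 'ok' | 'mismatch' | 'missing'} for each entry."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Example input: the two 7z checksums from this post.
    listing = """\
f03d0a7749f44db2a8999cc592e2c828 trip_data.7z
52cf3fdfc2af2705db40fc1cd5d6b079 trip_fare.7z
"""
    for name, status in verify(parse_checksum_lines(listing)).items():
        print(f"{status:8} {name}")
```

Reading in chunks keeps memory use flat even for the multi-gigabyte archives. The same listing format works for the zip and csv checksums; just paste the relevant lines into the string.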

I planned to simply color the “pickup” and “dropoff” locations in different colors, but of course the Mapbox media machine beat me to it by a long shot. See e.g. https://twitter.com/enf/status/479402050497691649 and the gorgeous https://twitter.com/enf/status/479689969590472704. Stay tuned for an epileptic CartoDB Torque flicker tomorrow, I guess. ;)
