{"id":193,"date":"2014-06-20T14:20:58","date_gmt":"2014-06-20T14:20:58","guid":{"rendered":"http:\/\/hannes.enjoys.it\/blog\/?p=193"},"modified":"2014-06-29T20:11:04","modified_gmt":"2014-06-29T20:11:04","slug":"nyc-taxi","status":"publish","type":"post","link":"https:\/\/hannes.enjoys.it\/blog\/2014\/06\/nyc-taxi\/","title":{"rendered":"Taxi trips in NYC"},"content":{"rendered":"<p><a href=\"http:\/\/chriswhong.com\/open-data\/foil_nyc_taxi\/\">Chris Whong<\/a> released a great dataset on taxi trips in NYC, which he got through FOIL. The files are distributed rather inefficiently and without checksums, so here is what I felt was missing.<\/p>\n<p>I mirrored the files at <a href=\"https:\/\/archive.org\/details\/nycTaxiTripData2013\">https:\/\/archive.org\/details\/nycTaxiTripData2013<\/a> and added much smaller 7z files.<\/p>\n<p>MD5 checksums of the original zip files:<br \/>\n<code>16d7ea9735fc8806f2cba51e95f96c4b  trip_data_1.csv.zip<br \/>\n5933a699bf289ec53de83970fcb7b4f6  trip_data_2.csv.zip<br \/>\ndaed88fc85b71f4d36fe798edc938223  trip_data_3.csv.zip<br \/>\na7d1a1ec83c95686d22e06b8668f938c  trip_data_4.csv.zip<br \/>\n1421686e977907ab5566766660783742  trip_data_5.csv.zip<br \/>\nc5a6c614decd607f38686ea3a7f8a429  trip_data_6.csv.zip<br \/>\n73eebfd6dc7906f2bc758a4d4859ff9b  trip_data_7.csv.zip<br \/>\n1ff1ffc336eb6517bc44cce25498eae6  trip_data_8.csv.zip<br \/>\n6eebd8c46ed66ad73b59f3f1445b8200  trip_data_9.csv.zip<br \/>\n052d12e2f9a825563b20e748da257845  trip_data_10.csv.zip<br \/>\n3df0f8292ee92bb0c96145b6f9069c3d  trip_data_11.csv.zip<br \/>\n1c5a3a9353e7192a4cf32273bbd1458a  trip_data_12.csv.zip<br \/>\na3b8a092f9062c0431ea40031f2faf03  trip_fare_1.csv.zip<br \/>\nb1fe72a3bbd884e58618657c8150b179  trip_fare_2.csv.zip<br \/>\nd44fde041b643b05da799f3f60880690  trip_fare_3.csv.zip<br \/>\n8566bbb0084044139ac5ff125bc8c45a  trip_fare_4.csv.zip<br \/>\ncebe886d4f8e6ef4a02c0453ade714d6  trip_fare_5.csv.zip<br 
\/>\nb39c3de4825a2e0199771d3499133773  trip_fare_6.csv.zip<br \/>\nb6660dbf87138bc1a03f78c892d2f150  trip_fare_7.csv.zip<br \/>\ne457df4423e910968e9cb0c437e89390  trip_fare_8.csv.zip<br \/>\nf300ff601fdfb8b08fce7165c598029d  trip_fare_9.csv.zip<br \/>\n1f54ecba09415ab62ad4c2e2dbed8122  trip_fare_10.csv.zip<br \/>\n505e5b5da25abb0b30de4ab548b22714  trip_fare_11.csv.zip<br \/>\naf54f05a09540685e143f0f362f167ba  trip_fare_12.csv.zip<\/code><\/p>\n<p>MD5 checksums of the extracted files:<br \/>\n<code>cb4cbb58fd0a679c2cc8b54e6b122752  trip_data_1.csv<br \/>\n6a23454e051bc791b5e9191e231623c6  trip_data_2.csv<br \/>\nc56b41f57cdeebf3f048bbd5b2cb0b22  trip_data_3.csv<br \/>\nd8d58033dfaeaaac4d9b0c4c2db65392  trip_data_4.csv<br \/>\nf3760655c5a86660e3da7689ab1a4d36  trip_data_5.csv<br \/>\n6ead7e108720ef7e5d42401e2e24446a  trip_data_6.csv<br \/>\n5029e89ca6a4b9e1c5f7e41f7d9be7f7  trip_data_7.csv<br \/>\n0ba05fc2d13d1c565dc855a335204e59  trip_data_8.csv<br \/>\n266e326535f704a43c4d27f599599a3a  trip_data_9.csv<br \/>\n484062a41cdf77a560ac22689e178dd9  trip_data_10.csv<br \/>\n82d1871807bfa0317dc6a655fa2e0e60  trip_data_11.csv<br \/>\n622bf18954cfa28c8dad4275163d437c  trip_data_12.csv<br \/>\n8de2725ae9ebd0716c79a00cd7152f75  trip_fare_1.csv<br \/>\ne7a7b8c68dc752e1af1aa1338f6300e7  trip_fare_2.csv<br \/>\nca76bfdf5216db38c2a632ed55b88a51  trip_fare_3.csv<br \/>\nc52e5ac23011c6e10ffa22601782a025  trip_fare_4.csv<br \/>\nf260ff7a0c97d023ff74a35ee21ee74c  trip_fare_5.csv<br \/>\na0080a3d6003aa1b67bea5efc0377c84  trip_fare_6.csv<br \/>\n588cce29ab1ff422770dd45c07afabad  trip_fare_7.csv<br \/>\nad3cc028b12dab8b20fbffcd75523db5  trip_fare_8.csv<br \/>\naf53ca5f2a6c517a3066ee8dadeb72b7  trip_fare_9.csv<br \/>\nc7c57109241825128781eb3b9968c689  trip_fare_10.csv<br \/>\ne9ce76ebbe19ce786e2cc378fe97bbb6  trip_fare_11.csv<br \/>\n1b6579fc2dad108ac27a1ce1b6c6d9b6  trip_fare_12.csv<\/code><\/p>\n<p>The original release was csv files inside zip files inside a zip file. 
I extracted them and recompressed them with the free <a href=\"http:\/\/7-zip.org\/\">7-zip<\/a>. The results are much smaller: the original zip file for the trip data is 11 gigabytes, while the 7z archive is <strong>3.9 gigabytes<\/strong>. The original zip file for the fare data is 7.7 gigabytes, while the 7z archive is <strong>1.7 gigabytes<\/strong>. I also tried bzip2, but it was not as efficient.<\/p>\n<p>MD5 checksums of the 7z files:<br \/>\n<code>f03d0a7749f44db2a8999cc592e2c828  trip_data.7z<br \/>\n52cf3fdfc2af2705db40fc1cd5d6b079  trip_fare.7z<\/code><\/p>\n<p>I planned to simply color the &#8220;pickup&#8221; and &#8220;dropoff&#8221; locations in different colors, but of course the Mapbox media machine beat me to it by a long shot. See e.g. <a href=\"https:\/\/twitter.com\/enf\/status\/479402050497691649\">https:\/\/twitter.com\/enf\/status\/479402050497691649<\/a> and the gorgeous <a href=\"https:\/\/twitter.com\/enf\/status\/479689969590472704\">https:\/\/twitter.com\/enf\/status\/479689969590472704<\/a>. Stay tuned for an epileptic CartoDB Torque flicker tomorrow, I guess. ;)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>http:\/\/chriswhong.com\/open-data\/foil_nyc_taxi\/ released a great dataset on taxi trips in NYC which he got through FOIL. The files are distributed rather inefficiently and without checksums so here is what I felt was missing. I mirrored the files at https:\/\/archive.org\/details\/nycTaxiTripData2013 and added much smaller 7z files. 
MD5 checksums of the original zip files: 16d7ea9735fc8806f2cba51e95f96c4b trip_data_1.csv.zip 5933a699bf289ec53de83970fcb7b4f6 trip_data_2.csv.zip [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7,10],"tags":[],"class_list":["post-193","post","type-post","status-publish","format-standard","hentry","category-archive-org","category-open-data"],"_links":{"self":[{"href":"https:\/\/hannes.enjoys.it\/blog\/wp-json\/wp\/v2\/posts\/193","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hannes.enjoys.it\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hannes.enjoys.it\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hannes.enjoys.it\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/hannes.enjoys.it\/blog\/wp-json\/wp\/v2\/comments?post=193"}],"version-history":[{"count":13,"href":"https:\/\/hannes.enjoys.it\/blog\/wp-json\/wp\/v2\/posts\/193\/revisions"}],"predecessor-version":[{"id":208,"href":"https:\/\/hannes.enjoys.it\/blog\/wp-json\/wp\/v2\/posts\/193\/revisions\/208"}],"wp:attachment":[{"href":"https:\/\/hannes.enjoys.it\/blog\/wp-json\/wp\/v2\/media?parent=193"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hannes.enjoys.it\/blog\/wp-json\/wp\/v2\/categories?post=193"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hannes.enjoys.it\/blog\/wp-json\/wp\/v2\/tags?post=193"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}