Fancy Free File Formats For Hamburg’s Open Geodata

While this is about data for the city of Hamburg, Germany, I decided to post in English, as GeoPackage propaganda should be accessible. ;)

I am casually working on converting open geodata released via the Transparenzportal Hamburg into more usable GeoPackages, GeoTIFFs and similar formats, using free and open-source tools like GDAL and GMT where possible. This includes things like orthophotos, ALKIS, addresses, DEM, districts etc. You can get a list of most of the available source data here, but some datasets are “hidden” in other categories as well. The data is usually released as GML or as gridded files (e.g. JPEG or XYZ tiles/files). While these are pretty much perfect as source formats, working with them is cumbersome. My goal is to make this data more accessible for anyone in tools like QGIS.
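
To give an idea of what such a conversion looks like, here is a minimal sketch using GDAL's standard tools (the file names are made up):

# GML (e.g. ALKIS) to GeoPackage
ogr2ogr -f GPKG alkis.gpkg alkis_source.gml
# gridded XYZ elevation data to a compressed GeoTIFF
gdal_translate -of GTiff -co COMPRESS=DEFLATE dgm1_source.xyz dgm1.tif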

For now you can find the 20cm orthophotos for 2013-2015 and the 1m DEM at https://www.datenatlas.de/geodata/public/hamburg/. Mind the licenses; see the readme file for a bit of info. More to come, though I want to plan the pipeline a bit better first. There should be a full script & log from source to GeoPackage for each file.

I will also provide mirrors of the source files. If you want to collaborate, please contact me. Apart from Hamburg I will also add free/open datasets for the whole of Germany, things related to (nearby) bathymetry and some global ones. If you want Shapefiles, ECW, MrSID or similar, you can pay me to convert them.

A bad map about gender differences in literacy

Wrote this in 2014; not sure why I did not publish it. It was a response to this bad map.

[Image: world bank 2011 female vs male literacy]

No need for expensive software; you can use the free and open-source QGIS for this: https://qgis.org/

1. Install QGIS

2. Download and unzip the data http://databank.worldbank.org/data/download/WDI_excel.zip (not sure what license, they want attribution “World Development Indicators, The World Bank”)

3. Download and unzip country geometries http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/50m/cultural/ne_50m_admin_0_countries.zip (public domain but be nice and add attribution “Geometries from Natural Earth”)

4. Open QGIS, Layer -> Add Vector Layer -> choose ne_50m_admin_0_countries.shp

5. Unfortunately the CSV is not simple: it has more than one row per country because it includes time series, and it does not have the value we want to map precalculated. It looks like this:

Afghanistan,AFG,"Literacy rate, adult female (% of females ages 15 and above)",SE.ADT.LITR.FE.ZS,,,,,,,,,,,,,,,,,,,,4.98746100000000E+00,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Afghanistan,AFG,"Literacy rate, adult male (% of males ages 15 and above)",SE.ADT.LITR.MA.ZS,,,,,,,,,,,,,,,,,,,,3.03077500000000E+01,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Afghanistan,AFG,"Literacy rate, adult total (% of people ages 15 and above)",SE.ADT.LITR.ZS,,,,,,,,,,,,,,,,,,,,1.81576800000000E+01,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Afghanistan,AFG,"Literacy rate, youth female (% of females ages 15-24)",SE.ADT.1524.LT.FE.ZS,,,,,,,,,,,,,,,,,,,,1.11428000000000E+01,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Afghanistan,AFG,"Literacy rate, youth male (% of males ages 15-24)",SE.ADT.1524.LT.MA.ZS,,,,,,,,,,,,,,,,,,,,4.57960200000000E+01,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Afghanistan,AFG,"Literacy rate, youth total (% of people ages 15-24)",SE.ADT.1524.LT.ZS,,,,,,,,,,,,,,,,,,,,3.00663500000000E+01,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

This step is probably the hardest. I will use some Unix tools, as I am used to them and they work well. Sorry! You can probably do this with a good text editor or spreadsheet application as well.

Running csvcut -c 2 WDI_Data.csv | uniq | wc -l gives 253 lines, i.e. 252 country codes (without the header). We have 6 lines per country for the literacy data, so we should end up with at most 252 unique values per field. TheZeitgeist only used data from 2011.

To make it more convenient to work with, I first split off the literacy data into a new file:

head -n 1 WDI_Data.csv > Literacy.csv; grep Literacy WDI_Data.csv >> Literacy.csv

No idea if TheZeitgeist mixed adult and youth rates; let’s just use the adult data for now.

head -n 1 WDI_Data.csv > Literacy_adult.csv; grep -E "Literacy rate, adult (fe)*male" Literacy.csv >> Literacy_adult.csv

Next, let’s isolate the data for 2011. csvcut seems to have a bug with numerically named columns, so we have to use the field’s index (56) instead of its name “2011”.

csvcut -c 2,3,56 Literacy_adult.csv > Literacy_adult_2011.csv
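
To double-check that 56 really is the 2011 column (csvcut -n lists the column names with their indices):

csvcut -n Literacy_adult.csv | grep -w 2011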

We need to get the data into one line per country. I am lazy, so:

grep "Literacy rate, adult female" Literacy_adult_2011.csv > Literacy_adult_2011_female.csv
grep "Literacy rate, adult male" Literacy_adult_2011.csv | sed 's/.*15 and above)",/,/' > Literacy_adult_2011_male.csv
echo "Country Name,Country Code, Literacy Female, Literacy Male" > Literacy_adult_2011_oneline.csv; paste -d "" Literacy_adult_2011_female.csv Literacy_adult_2011_male.csv >> Literacy_adult_2011_oneline.csv

Enough of that commandline mumbojumbo! QGIS time!

Natural Earth has a column named “wb_a3” which contains the World Bank 3-letter country codes, yay! Load Literacy_adult_2011_oneline.csv into QGIS as well (Layer -> Add Vector Layer works for CSV), then join it to the countries layer in Layer Properties -> Joins, using “Country Code” as the join field and “wb_a3” as the target field. CSV fields come in as text and the joined fields get the layer name as prefix, so the difference we want to map is:

toreal("Literacy_adult_2011_oneline_Literacy Female") - toreal("Literacy_adult_2011_oneline_Literacy Male")

Figure out the rest yourself. This is where I apparently lost interest in writing back then. ;)
-----

Now make the map better by choosing a projection that does not make Greenland as big as Africa. I would also try adding another “attribute” to the display: change the alpha value depending on the absolute literacy.

And finally realise that a map is not a good visualisation because you cannot see the values of tiny countries. Make a bar chart instead. ;)

[Image: world bank 2011 female vs male literacy plus bar chart]

e-foto (free GNU/GPL educational digital photogrammetric workstation) on Archlinux

I wrote this a year ago and meant to write more. Turns out I did not, but it still works. You might need to figure out dependencies yourself. Here you go:

e-foto is a free GNU/GPL educational digital photogrammetric workstation in active development.

# get the e-foto source
svn checkout https://svn.code.sf.net/p/e-foto/code/trunk e-foto-code
# e-foto expects a bundled shapelib in its c/ directory
cd e-foto-code/c
wget http://download.osgeo.org/shapelib/shapelib-1.3.0.tar.gz
tar xfv shapelib-1.3.0.tar.gz
mv shapelib-1.3.0 shapelib
cd ..
# out-of-source Qt4 build
mkdir build
cd build
qmake-qt4 ../e-foto.pro
make

Ready to run in

../bin/e-foto

A simple and hacky column chart in HTML & CSS

This is a draft from two years ago, I just thought I would publish it in case it is useful for anyone. I don’t remember anything about it, not even what it was for. Feel free to use it in any way you like.

It looks like this:

[Image: the resulting HTML/CSS column chart]

#histogramcontainer {
height: 20%;
width: 10%;
background-color: #deebf7;
position: relative; /* to make the children's percentage heights relative to this */

/* setting position:absolute; bottom:0 on the bars would make them all overlay each other horizontally */

/* http://stackoverflow.com/questions/2147303/how-can-i-send-an-inner-div-to-the-bottom-of-its-parent-div */
-moz-transform: rotate(180deg); -webkit-transform: rotate(180deg); -ms-transform: rotate(180deg); transform: rotate(180deg);
}
 
div.histogrambar {
float:left;
width:19%;
background-color:#3182bd;
}
 
div#histogrambar_a {height: 20%; margin-left: 1%;}
div#histogrambar_b {height: 10%; margin-left: 1%;}
div#histogrambar_c {height: 40%; margin-left: 1%;}
div#histogrambar_d {height: 100%; margin-left: 1%;}
div#histogrambar_e {height: 30%;}
<div id="histogramcontainer">
<div class="histogrambar" id="histogrambar_e"></div>
<div class="histogrambar" id="histogrambar_d"></div>
<div class="histogrambar" id="histogrambar_c"></div>
<div class="histogrambar" id="histogrambar_b"></div>
<div class="histogrambar" id="histogrambar_a"></div>
</div>

ffmpeg on raspbian / Raspberry Pi

Since http://www.jeffreythompson.org/blog/2014/11/13/installing-ffmpeg-for-raspberry-pi/ is a bit messy, here is how you can compile ffmpeg with x264 on raspbian. Changes from that guide: building in your home directory, fetching just a shallow git clone, and building with all CPU cores. Also no unnecessary sudo…

Read the comments below!

# In a directory of your choosing (I used ~/ffmpeg):

# build and install x264
git clone --depth 1 git://git.videolan.org/x264
cd x264
./configure --host=arm-unknown-linux-gnueabi --enable-static --disable-opencl
make -j 4
sudo make install
 
# build and install ffmpeg
git clone --depth=1 git://source.ffmpeg.org/ffmpeg.git
cd ffmpeg
./configure --arch=armel --target-os=linux --enable-gpl --enable-libx264 --enable-nonfree
make -j4
sudo make install
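
To verify that the build actually picked up x264 (just a sanity check, not part of the original guide):

ffmpeg -encoders 2>/dev/null | grep x264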


Hopefully someone, somewhere will provide a repository for this kind of stuff some day.

It takes just 25 minutes on a Raspberry Pi 3, not hours or days as some old internet sources about older Raspberry Pis claim.

In case you are wondering: v4l2 should work with this.
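
A capture test might look like this (a sketch; device path, resolution and framerate depend on your camera):

# record 10 seconds from a V4L2 device to H.264
ffmpeg -f v4l2 -framerate 25 -video_size 640x480 -i /dev/video0 -c:v libx264 -t 10 test.mp4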

WFS-T on GeoServer

If you encounter an error like

ERROR 1: Error returned by server :
java.lang.IllegalArgumentException: argument type mismatch
argument type mismatch


you probably have a mismatch between WFS versions. Try pinning your WFS URL to a specific version. Using http://www.example.com/geoserver/sf/ows?strict=true&version=1.0.0 worked for me after I had spent hours on that stupid error…
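
You can test which version a server speaks without touching your application, e.g. with GDAL's WFS driver (URL as in the example above):

ogrinfo "WFS:http://www.example.com/geoserver/sf/ows?strict=true&version=1.0.0"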

If you add features to your store but nothing new appears on the map, check whether the features actually end up with geometries on your server. E.g. if you try to add LineStrings to a MultiLineString layer via GeoServer WFS-T, you will get objects without geometries. This happens silently, at least when using the osgeo ogr module in Python; forcing each geometry to the layer’s type first (e.g. with ogr.ForceToMultiLineString) avoids it.

If every request takes 10 seconds, your server might be timing out while trying to parse your schema. Check your GeoServer logs for things like

2016-03-10 15:48:00,486 WARN [geotools.xml] - Error parsing: http://123.123.123.123.:8080/geoserver/ows?strict=true&VERSION=1.0.0&SERVICE=WFS&REQUEST=DescribeFeatureType&TYPENAME=my:layer

If you know how to debug this, please tell me:
2016-03-10 15:48:00,501 ERROR [geoserver.wfs] - Transaction failed
org.geoserver.wfs.WFSTransactionException: Error performing insert: null
at org.geoserver.wfs.InsertElementHandler.execute(InsertElementHandler.java:215)
at org.geoserver.wfs.Transaction.execute(Transaction.java:322)
(... a million more steps truncated...)

Properly splitting a file at specific intervals with ffmpeg

Ever wanted to split a media file (video, audio or both) into segments of 10 minutes or something like that? The internet is full of terrible hacks and shitty Stack Overflow answers for this. So here is how you easily and properly split a file into same-length segments with ffmpeg.

-f segment -segment_time SECONDS fileprefix%04d.ext

Done. segment_time takes seconds as its argument. (With stream copy of video, cuts can only happen at keyframes, so video segments may not come out exactly equal in length.) For example:

ffmpeg -i recording.opus -c:a libvorbis -f segment -segment_time 3600 recording_%04d.ogg

or

ffmpeg -i huge.wav -c:a flac -f segment -segment_time 600 huge-%04d.flac

Not so hard, is it?

Let’s Encrypt with Lighttpd

1. Save the page https://gethttpsforfree.com/ locally and use a clean browser to open the HTML file.

2. Create your keys and everything.

3. Cat domain.key and chained.pem into a new pemfile (see the command below).

4. Grab the Let’s Encrypt Authority X1 (IdenTrust cross-signed) pemfile from https://letsencrypt.org/certificates/

5. Put those two files in a safe place on your server and don’t forget to set proper permissions on them.
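
The concatenation in step 3 is just this (file names as gethttpsforfree.com hands them out; adjust to yours):

cat domain.key chained.pem > pemfile.pem

Then point lighttpd at the two files: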
ssl.ca-file = "/path/to/lets-encrypt-x1-cross-signed.pem" # the Let's Encrypt certificate
ssl.pemfile = "/path/to/pemfile.pem" # your pemfile

Restart lighttpd.
Done.
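
To check that the full chain is actually being served, a quick sanity check with openssl (replace the host with yours):

echo | openssl s_client -connect www.example.com:443 -servername www.example.com 2>/dev/null | grep -E 'subject|issuer'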

Oakland: All license plate reader data (ALPR)

WTF!

https://data.oaklandnet.com/browse?q=alpr via https://news.ycombinator.com/item?id=10642139

817159 timestamped car plate locations.

[Image: oakland scans]
[Image: oakland selected scans]

Pages:

https://data.oaklandnet.com/Public-Safety/All-License-Plate-Reader-Data-ALPR-September-23-20/6dab-n9nd
https://data.oaklandnet.com/Public-Safety/All-license-plate-reader-data-ALPR-04-01-2014-thru/k76g-27ne
https://data.oaklandnet.com/Public-Safety/All-license-plate-reader-data-ALPR-3-1-2013-4-1-20/m64r-jeei
https://data.oaklandnet.com/Public-Safety/All-license-plate-reader-data-ALPR-3-19-2014-3-31-/wy2w-ue82
https://data.oaklandnet.com/Public-Safety/All-license-plate-reader-data-ALPR-4-1-14-thru-5-3/7axi-hi5i
https://data.oaklandnet.com/Public-Safety/All-license-plate-reader-data-ALPR-5-13-2012-7-3-2/f28j-9q95
https://data.oaklandnet.com/Public-Safety/All-license-plate-reader-data-ALPR-7-6-2011-12-19-/7xz6-yzxz
https://data.oaklandnet.com/Public-Safety/All-license-plate-reader-data-ALPR-8-6-2012-8-12-2/gyu7-qpwz
https://data.oaklandnet.com/Public-Safety/All-license-plate-reader-data-ALPR-8-26-2012-9-19-/bd9f-4pn8
https://data.oaklandnet.com/Public-Safety/All-license-plate-reader-data-ALPR-9-15-2012/vwcs-329i
https://data.oaklandnet.com/Public-Safety/All-license-plate-reader-data-ALPR-10-20-2012-11-2/cyhz-jk8v
https://data.oaklandnet.com/Public-Safety/All-license-plate-reader-data-ALPR-11-20-2012-12-3/jxx3-67d2
https://data.oaklandnet.com/Public-Safety/All-license-plate-reader-data-ALPR-12-19-2013-1-31/h4qp-hsyy
https://data.oaklandnet.com/Public-Safety/All-license-plate-reader-data-ALPR-12-23-2010-thru/t7dd-x7dh

grab.sh:

ids="6dab-n9nd
7axi-hi5i
7xz6-yzxz
bd9f-4pn8
cyhz-jk8v
f28j-9q95
gyu7-qpwz
h4qp-hsyy
jxx3-67d2
k76g-27ne
m64r-jeei
t7dd-x7dh
vwcs-329i
wy2w-ue82"

for id in ${ids}
do
echo "Grabbing ${id}"
wget -a wget.log -x --content-disposition "https://data.oaklandnet.com/api/views/${id}/rows.csv?accessType=DOWNLOAD"
# screw excel: wget -a wget.log -x --content-disposition "https://data.oaklandnet.com/api/views/${id}/rows.csv?accessType=DOWNLOAD&bom=true"
wget -a wget.log -x --content-disposition "https://data.oaklandnet.com/api/views/${id}/rows.json?accessType=DOWNLOAD"
wget -a wget.log -x --content-disposition "https://data.oaklandnet.com/api/views/${id}/rows.pdf?accessType=DOWNLOAD"
wget -a wget.log -x --content-disposition "https://data.oaklandnet.com/api/views/${id}/rows.rdf?accessType=DOWNLOAD"
wget -a wget.log -x --content-disposition "https://data.oaklandnet.com/api/views/${id}/rows.rss?accessType=DOWNLOAD"
wget -a wget.log -x --content-disposition "https://data.oaklandnet.com/api/views/${id}/rows.xls?accessType=DOWNLOAD"
wget -a wget.log -x --content-disposition "https://data.oaklandnet.com/api/views/${id}/rows.xlsx?accessType=DOWNLOAD"
wget -a wget.log -x --content-disposition "https://data.oaklandnet.com/api/views/${id}/rows.xml?accessType=DOWNLOAD"
done

exit

Be aware that only the CSV format is guaranteed to have all records. At least that’s what some files say.

# count unique plates (first CSV column)
$ sed 's#,.*##g' *.csv | sort | uniq | wc -l
817159

# the most frequently scanned plates
sed 's#,.*##g' *.csv | sort | uniq -c | sort -h | tail
# all scans for a specific plate, with the coordinate pair unquoted
grep -h '^PLATENUMBER,' *.csv | sed 's#,"(#,#g' | sed 's#)"##'
# extract just the coordinate pairs
grep -hEo "\(37.*\)\"" *.csv | sed 's#[()" ]*##g'

Happy stalking :(

How not to create a bivariate color scale

I voiced this criticism in passing at the (amazingly great) Daten-Labor 2015 and, given the interest, promised to write down my thoughts. Here they are, at long last.

“Geld zieht Ärzte an” (“money attracts doctors”) was the headline Zeit Online ran a few months ago above a piece of research on how the spatial distribution of doctors relates to various demographic factors. Complex maps and charts are an integral part of the article. The editors attempted a bivariate class/color scale, but unfortunately the choice of colors went wrong, so the end product is ineffective and misleading. I am only talking about the cartographic presentation here; I cannot comment on the content or the data analysis!

[Image: the Zeit map]
[Image: the principle behind the color scheme]
How the maps work: grey stands for privately insured patients, green for doctors. The three brightness levels (the darker, the higher the share of privately insured people) are combined with the color (the more intense, the more doctors per inhabitant). This yields nine different values for coloring the maps.
Source: http://www.zeit.de/feature/gesundheit-arzt-privat-versicherung-praxis

A bivariate scale displays the relationship between two variables in full detail. Instead of a single ratio, several axes are in use, so the individual values of both variables remain legible. Such scales are nothing new in cartography, but they are used rather rarely (rightly so, in my opinion, given their complexity). In spring 2015 Joshua Stevens published a fantastic article on the topic, which I highly recommend reading before you continue.

Joshua shows there how a mixed “matrix” emerges from the color ramps of the two attributes. Its diagonal becomes a new sequential color ramp representing a neutral relationship between the variables. The two color ramps must therefore be chosen carefully, so that mixing them produces a meaningful, ordered and “distinct” scale.

[Image: js_bivariateMix]
[Image: js_bivariatePaths]
Source: http://www.joshuastevens.net/cartography/make-a-bivariate-choropleth-map/

Joshua’s example yields (relatively) clearly differentiable and identifiable axes that allow the map reader (with some effort) to interpret the map correctly. From a color you can read both the relationship and the absolute values, and the color axes can be put in the right order intuitively.

So how does Zeit’s color scheme fare? Not well, unfortunately.

For one variable the editors chose a ramp from grey to green, for the other one from grey to dark grey (see above). The diagonal color ramp is therefore a mixture of green and grey. And what do you get when you mix green and grey? Shades between green and grey… The colors on the diagonal thus become very similar to at least one of the main axes. The map ends up with colors whose ordering the reader has no chance of working out intuitively, and hardly even with the help of the legend. And that is exactly what we can see here:

[Image: the Zeit map, exploded into its nine classes]

As a small demonstration of how much detail and structure actually lies in the data, I simply threw a bivariate color scale by Cynthia Brewer at it. Caveat: I was too lazy to reproduce the exact class breaks of the original map! The overall message of the map should still hold, though. Aesthetics come second for now. ;)

[Image: the data with a bivariate color scale by Cynthia Brewer]

[Image: the Zeit map next to the version with the Brewer scale]