A GeoTIFF of the German 2011 census population density

tl;dr: GMT is documented for people who use it since the 80s.

update 2: You can use gmtinfo to calculate the extents, using -I- will make it output the -R parameter! xyz2grd $(gmtinfo -I- *.xyz) ...

update: xyz2grd supports GeoTIFF via its GDAL driver (now?)! For example -Gfile.tif=gd:GTiff. See eg http://gmt.soest.hawaii.edu/doc/5.2.1/grdconvert.html. So the way below is overly complicated. I do not know how to apply the advanced settings of GDAL though like compression and prediction etc.

zensus2011_berlinzensus2011_irgendwozensus2011_bremen

The Statistische Ämter des Bundes und der Länder offer a 100 meter grid of Germany’s population density: csv_Bevoelkerung_100m_Gitter.zip (110MB). Datensatzbeschreibung_Bevoelkerung_100m_Gitter.xlsx provides additional information.

Let’s turn that dataset into a GeoTIFF so we can use it in our GIS. We will use free and open-source tools from GMT and GDAL. GDAL loves to interpolate values but our data is discrete/regular. We do not want any kind of interpolation. So xyz2grd from GMT is the best choice for turning the xyz data into a “continuous” GIS format (tell me if not).

Getting to know the data

Inside the zip is a 1.3GB file Zensus_Bevoelkerung_100m-Gitter.csv with about 36 million lines.

Gitter_ID_100m;x_mp_100m;y_mp_100m;Einwohner
100mN26840E43341;4334150;2684050;-1
100mN26840E43342;4334250;2684050;-1

100mN27407E44044;4404450;2740750;3
100mN27407E44045;4404550;2740750;31
100mN27407E44046;4404650;2740750;13
100mN27407E44047;4404750;2740750;14
100mN27407E44048;4404850;2740750;10

Datensatzbeschreibung_Bevoelkerung_100m_Gitter.xlsx says the coordinates are in ETRS89-LAEA Europe – EPSG:3035.

First we need to find out the geographic extends of the data, you could use your favourite cli tools for that, I wrote a quick .vrt file and used ogrinfo on that:

$ cat Zensus_Bevoelkerung_100m-Gitter.csv.vrt

<OGRVRTDataSource>
 <OGRVRTLayer name="Zensus_Bevoelkerung_100m-Gitter">
  <LayerSRS>EPSG:3035</LayerSRS>
  <SrcDataSource>Zensus_Bevoelkerung_100m-Gitter.csv</SrcDataSource>
  <GeometryType>wkbPoint</GeometryType>
  <GeometryField encoding="PointFromColumns" x="x_mp_100m" y="y_mp_100m" />
 </OGRVRTLayer>
</OGRVRTDataSource>

$ ogrinfo -al Zensus_Bevoelkerung_100m-Gitter.csv.vrt

INFO: Open of `Zensus_Bevoelkerung_100m-Gitter.csv.vrt’ using driver `VRT’ successful.

Layer name: Zensus_Bevoelkerung_100m-Gitter
Geometry: Point
Feature Count: 35785840
Extent: (4031350.000000, 2684050.000000) – (4672550.000000, 3551450.000000)

Creating a netcdf/grd file from the xyz data with xyz2grd

xyz2grd wants xyz, nothing else. The Gitter_ID_100m column is redundant in any case, you can calculate it yourself from the x and y fields if needed. So first let’s convert it to a “x y z” format with awk. The separator is ;

awk 'FS=";" {print $2" "$3" "$4}' Zensus_Bevoelkerung_100m-Gitter.csv > Zensus_Bevoelkerung_100m-Gitter.xyz

Now we can write our xyz2grd commandline.

We have our extends:
-R4031350/4672550/2684050/3551450

We know the spacing is 100 units (meters):
-I100

There is one header line:
-h1

And of course we know that we want a “classic” netcdf4 chunk size, whatever that means. We knew that right away, not after googling helplessly for an hour and eventually finding the hint on some mailing list. Not knowing this might have lead to QGIS only seeing NaN values for z, R’s ncdf/raster saying “Error in substr(w, 1, 3) : invalid multibyte string at ‘<89>HDF” and GDAL “0ERROR 1: nBlockYSize = 130, only 1 supported when reading bottom-up dataset”.
--IO_NC4_CHUNK_SIZE=c

The resulting commandline:
xyz2grd -Vl -R4031350/4672550/2684050/3551450 -I100 -h1 --IO_NC4_CHUNK_SIZE=c -GZensus_Bevoelkerung_100m-Gitter.cdf Zensus_Bevoelkerung_100m-Gitter.xyz

You can inspect the file with grdinfo and gdalinfo now if you want.

Creating a GeoTIFF from the netcdf/grd file

Let’s turn it into a GeoTIFF with gdal_translate. We will need a bunch of commandline parameters.

The spatial reference system is EPSG:3035, so:
-a_srs EPSG:3035

Values of -1 mean “no data”:
-a_nodata -1

TIFF is uncompressed by default, we want good lossless compression:
-co COMPRESS=DEFLATE

The resulting commandline:
gdal_translate -co COMPRESS=DEFLATE -a_srs EPSG:3035 -a_nodata -1 Zensus_Bevoelkerung_100m-Gitter.cdf Zensus_Bevoelkerung_100m-Gitter.tif

The resulting file is about 8 Megabytes and should work in any reasonable GIS. Have fun!

Downloads

http://hannes.enjoys.it/opendata/Zensus_Bevoelkerung_100m-Gitter.tif
TODO: What license is this now?

zensus2011_hamburg
zensus2011_altersheimzensus2011_schrebergärten

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.