Category Archives: open data

How to build your own aerial image Twitter bot

About a year ago I built a small Twitter bot that posts an aerial image of Hamburg, Germany every day.

As I will shut it down (no reason, just decluttering) here’s how it works so you can run your own:

  1. Have aerial images with a permissive license
  2. Register a Twitter account and set it up for posting via API access
  3. Write code to pick an image and post it

Done!

Okok, I am kidding.

I used https://suche.transparenz.hamburg.de/dataset/digitale-orthophotos-belaubt-hamburg3 as data source because they have a CC-BY like license, allowing such a project without any legal complications. As a side benefit these images are already provided in tiles: There is not one big image of the whole region but 1km² image tiles. Perfect for random selection, quick resizing and posting.

Registering a Twitter account and setting it up is a privacy nightmare and, being a good human being as you are, you ought hate any part of it. I followed the Twython tutorial for OAuth1 in a live Python interpreter session which was fairly painless. If you do not want to link a phone number to your Twitter account (see above for how you should feel about that), this approach worked for me in the past but I bet they use regional profiling or worse so your luck might differ.

Next you need some code to do the work for you automatically. Here is what I wrote with PIL for the image processing and Twython for posting:

import glob
import random
from io import BytesIO

from PIL import Image
from twython import Twython

## Initialise Twitter
# use a Python interpreter to follow the stops on 
# https://twython.readthedocs.io/en/latest/usage/starting_out.html#oauth-1-user-authentication 
# don't rush, there are some intermediate keys iirc...
APP_KEY = '1234567890ABCDEFGHIJKLMNO'
APP_SECRET = '12345678901234567890123456F8U0CAKATAWAIATATAEARAAA'
OAUTH_TOKEN = '1234567890123456789123456F8U0CAKATAWAIATATAEARAAAA'
OAUTH_TOKEN_SECRET = '123456789012345678901234567890FFSFFSFFSFFSFFS'
twitter = Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)

## Prepare the image
# via https://twython.readthedocs.io/en/latest/usage/advanced_usage.html#posting-a-status-with-an-editing-image

# Pick the image
jpegs = glob.glob("DOP20_HH_sommerbefliegung_2019.zip/*.jpg")
todays_image = random.choice(jpegs)
print(f"Today we will post {todays_image}")
photo = Image.open(todays_image)

# Resize the image
basewidth = 1000
wpercent = (basewidth / float(photo.size[0]))
height = int((float(photo.size[1]) * float(wpercent)))
resized_photo = photo.resize((basewidth, height), Image.ANTIALIAS)

# "Save" the resulting image in temporary memory
stream = BytesIO()
resized_photo.save(stream, format='JPEG')
stream.seek(0)

## We've got what we need, let's tweet
license = "dl-de/by-2-0 (Freie und Hansestadt Hamburg, Landesbetrieb Geoinformation und Vermessung)"
tweet = f"Das ist #IrgendwoInHH, aber wo denn nur?\n\n#codeforhamburg\n\nBild: {license}"

response = twitter.upload_media(media=stream)
twitter.update_status(status=tweet, media_ids=[response['media_id']])

First we initialize that Twython thingie, then we pick a random image from a directory of .jpg files, then we resize it to a maximum of 1000 pixels (you can simplify that if your images are square…) and finally we post it to Twitter.

Set up a cronjob or systemd timer or alarm clock to run the script as often as you like and that’s it.

Digitales Oberflächenmodell von Hamburg

Ach nee, der LGV hat ein bildbasiertes DOM von Hamburg im Transparenzportal veröffentlicht… Und sogar gleich zwei, eins von 2018, eins von 2020. Die Rasterweite ist 1 Meter, ob das wohl so vorlag oder für die Veröffentlichung gefiltert wurde?

Egal. Das ist ja großartig! Da werden eine Menge von Anwendungen ermöglicht (Sichtachsen! Verschattungen! Vermaschung! VR! AR!) und verschiedenste Akteure werden die Daten absolut feiern. Auch wenn es mit 1 Meter Auflösung wirklich mies grob ist, auf ein 1 Meter Gitter gerastert ist (nicht ausgedünnt, d. h. es ist teilweise stärker verfälscht und “daneben”) und “nur” bildbasiert (nicht gescannt) ist, geht da schon einiges mit.

Ausprobieren! Im Browser!

Achtung, frickelige Bedienung! Am besten den WASD-Möwen-Modus nutzen, mit Speed 1000. Oder mit einem Doppelklick irgendwo hinzoomen.

https://hamburg.datenatlas.de/DOM1_XYZ_HH_2018_04_30-colored.html
https://hamburg.datenatlas.de/DOM1_XYZ_HH_2020_04_30-colored.html

Datenaufbereitung als LAZ

Für 2018 liegen die Daten als 12768 einzelne XYZ-Kacheln vor, also als super ineffiziente Textdateien. Insgesamt sind es rund 22 Gigabyte. Für 2020 sind es stattdessen 827 größere Kacheln, aber ebenfalls in XYZ mit einem ähnlichem Platzbedarf.

Schönerweise gibt es freie Tools wie txt2las, was sie schnell und einfach ins super effiziente LAZ-Format umwandeln kann:

txt2las -i DOM1_XYZ_HH_2018_04_30/*.xyz -epsg 25832 -odir /tmp/laz/ -olaz

Hat bei mir ungefähr 5 Minuten gebraucht und da waren es nur noch 700 Megabyte. Das ZIP war übrigens mehr als 3 Gigabyte groß.

Zusammengefasst werden können die einzelnen Dateien mit lasmerge:

lasmerge -i /tmp/laz/*.laz -olaz -o DOM1_XYZ_HH_2018_04_30.laz

Interaktive 3D-Webanwendung

Dann noch schnell in den großartigen PotreeConverter von Markus Schütz geschmissen mit

PotreeConverter DOM1_XYZ_HH_2018_04_30.laz -o web/ --encoding BROTLI \
  --generate-page DOM1_XYZ_HH_2018_04_30 --title DOM1_XYZ_HH_2018_04_30

und 4 Minuten später ist die interaktive 3D-Webanwendung fertig (siehe unten), wegen der zusätzlichen Octree-Struktur jetzt bei rund 3 Gigabyte.

Punktwolke mit Farben aus Orthophoto einfärben

Die bereitgestellten Oberflächenmodelle sind so schlicht wie es nur geht, es sind reine XYZ-Daten ohne weitere Dimensionen wie Farbe o. ä.

Glücklicherweise gibt es ja auch die Orthophotos, eventuell wurde sogar dasselbe Bildmaterial genutzt? Da müsste mal jemand durch den Datenwust wühlen, die bei den DOPs werden die relevanten Metadaten nicht mitgeliefert…

Theoretisch könnte man sie also einfärben. Leider ist lascolor proprietär und kommt mit gruseligen, bösartigen Optionen, wenn man es wagt es “unlizenziert” zu nutzen (“Please note that the unlicensed version will (…) slightly change the LAS point order, and randomly add a tiny bit of white noise to the points coordinates once you exceed a certain number of points in the input file.”) und kann JPEG in GeoTIFF nicht lesen (so hab ich mir die DOPs aufbereitet). Eine Alternative ist das geniale PDAL. Mit einer Pipeline wie

{
    "pipeline": [
        "DOM1_XYZ_HH_2020_04_30.laz",
        {
            "type": "filters.colorization",
            "raster": "DOP20_HH_fruehjahrsbefliegung_2020.tif"
        },
        {
            "type": "writers.las",
            "compression": "true",
            "minor_version": "2",
            "dataformat_id": "3",
            "filename":"DOM1_XYZ_HH_2020_04_30-colored.laz"
        }
    ]
}

und

pdal pipeline DOM1_XYZ_HH_2020_04_30.laz+DOP20_HH_fruehjahrsbefliegung_2020_90.cog.tif.json

ist die Punktwolke innerhalb von Minuten coloriert und kann dann wie gehabt mit PotreeConverter in einen interaktiven 3D-Viewer gesteckt werden.

Das Ergebnis ist besser als erwartet, da es scheinbar tatsächlich die selben Bilddaten sind (für beide Jahre). Andererseits ist es auch nicht wirklich schick, da die DOPs nicht als True Orthophoto vorliegen und damit höhere Gebäude gekippt in den Bilder abgebildet sind. Sieht man hier schön am Planetarium.

DOM als GeoTIFF

Wer es lieber als GeoTIFF haben möchte, hat es etwas schwerer, denn GDAL kommt mit dieser Art von Kacheln (mit Lücken und in der bereitgestellten Sortierung) nicht gut klar. Mein Goto-Tool dafür ist GMT.

gmt xyz2grd $(gmt gmtinfo -I- *.xyz) -Vl -I1 -G/tmp/gmt.tif=gd:GTiff *.xyz

Das so erstellte GeoTIFF kann anschließend mit GDAL optimiert werden (ab GDAL 3.2.3/3.3):

gdal_translate -of COG -co COMPRESS=DEFLATE -co PREDICTOR=2 \
  --config GDAL_NUM_THREADS ALL_CPUS --config GDAL_CACHEMAX 50%  \
  -a_srs EPSG:25832 /tmp/gmt.tif DOM1_XYZ_HH_2020_04_30.tif

Hier mal im Vergleich mit dem DGM1 als Schummerungen:

Vermaschung als 3D-Modell

Leider habe ich keine gute Lösung für die 3D-Vermaschung gefunden. tin-terrain und dem2mesh kommen nicht mit so großen Datenmengen auf einmal klar und weiter hab ich nicht geschaut. Wer da was gutes weiß kann sich bei mir bei nächster Gelegenheit Kekse oder Bier abholen. ;)

Daten hinter den Bildern und den Viewern
Datenlizenz Deutschland Namensnennung 2.0, Freie und Hansestadt Hamburg, Landesbetrieb Geoinformation und Vermessung (LGV)

Interaktive Karte der Baugenehmigungen in Hamburg

Ich habe endlich mal ein über mehrere Jahre gereiftes (eher gealtertes und verfrickeltes) Projekt in einen einigermaßen vorzeigbaren Zustand gebracht: Eine interaktive Karte der Baugenehmigungen in Hamburg.

Je weniger transparent eine Fläche dargestellt ist, desto mehr Dokumente sind mit ihr verknüpft (ja, es ist ein Feature je Dokument D;). Eigentlich war die Seite anders aufgebaut, mit einem PDF-Viewer auf der rechten Seite. Aber da daten.transparenz.hamburg.de kein HTTPS kann (seid ihr auch so gespannt auf die UMTS-Auktion nächste Woche?), geht das aus Sicherheitsgründen nicht ohne ein Spiegeln der Daten oder einen Proxy.

Die Daten kommen größtenteils aus dem Transparenzportal. Für das Matching der angebenen Flurstücks-“IDs” zu den tatsächlichen Flurstücken war aber ein erheblicher Aufwand nötig. Das Drama ging bis hin zum Parsen aus PDFs, die mal so, mal so formatiert waren und natürlich auch voller Eingabefehler auf Behördenseite. Vielleicht schreibe ich da noch beizeiten mal einen Rant. TL;DR: Ohne die zugehörige Gemarkung ist mit einer Flurstücks-“ID” wie in den Daten angegeben keine räumliche Zuordnung möglich. In den veröffentlichten Daten stecken nur die Nenner der Flurstücksnummern, nicht aber die Gemarkungsnummern. Ziemlich absurd.

Das ganze ist nur ein Prototyp, vermutlich voller Fehler und fehlender Daten. Aber interessant und spaßig ist es, viel Freude also!

Es wäre noch eine MENGE zu tun, um das ganze rund zu machen. Falls du Lust hast, melde dich gerne. Es geht vom wilden Parsen, über Sonderregeln für kaputte Dokumente, zu Kartenstyling bis zur UI. Schön wäre es auch alles in einer anständigen Datenbank zu halten und nicht nur nach der räumlichen Dimension durchsuchen zu können.

#30DayMapChallenge: Day 2 – Lines

Thanks Topi.

Addresses in Hamburg: http://suche.transparenz.hamburg.de/dataset/alkis-adressen-hamburg15

Convert the housenumber from string to integer: to_int("hausnr")

Processing -> Points to Path using above housenumber as order field and strname as group field.

Color those ziggyzaggety lines as you like.

Maybe explode the lines and style each segment by its housenumber?

And put them on an interactive slippy map.

Average Earth from Space 2018

Due to popular request, I now provide a high-resolution, digital download as an experiment. Any profit from this will go towards bug-fixes in GDAL, the free and open-source software I used for this project.

I collected all the daily images of the satellite-based Soumi VIIRS sensor and calculated the per-pixel average over the whole year of 2018.

You can explore it in full resolution in an interactive webmap.

WGS84 / Equirectangular
Web Mercator

You can see fascinating patterns, e.g. “downstream” of islands in the ocean or following big rivers like the Amazon. Be wary though as snow and clouds look the same here.

It’s also fun to look at the maximum to see cloudless regions (this image is highly exaggerated and does not make too much sense anyways):

Many thanks to Joshua Stevens for motivation and infos on data availability! The initial idea was of course inspired by Charlie ‘vruba’ Lyod‘s Cloudless atlas works. I explored imagery on https://worldview.earthdata.nasa.gov/, grabbed Soumi VIIRS images via https://wiki.earthdata.nasa.gov/display/GIBS/GIBS+API+for+Developers and processed it in GDAL. Poke me repeatedly to blog about the process please.

Full resolution, georeferenced imagery (12288×6144 resp. 8192×8192) in PNG available on request.

I said average but of course it is the median. The average or mean does not make much sense to use nor does it look any good. ;-)

Lots of CityGML files to a single obj to draco with localising coordinates to 0,0,0

I used the CityGML LoD2 model of Hamburg here, http://daten-hamburg.de/geographie_geologie_geobasisdaten/3d_stadtmodell_lod2/LoD2-DE_HH_2017-12-31.zip from http://suche.transparenz.hamburg.de/dataset/3d-stadtmodell-lod2-de-hamburg3. (If you do this for Hamburg, you might want to not include Neuwerk’s tiles ;))

1. https://github.com/tudelft3d/CityGML2OBJs/

python2 CityGML2OBJs.py -i LoD2-DE_HH_2017-12-31.zip/ -o LoD2-DE_HH_2017-12-31.zip_obj/

converts each CityGML file to a obj (takes about half an hour on my system).

2. http://gfx.cs.princeton.edu/proj/trimesh2/

mesh_cat LoD2-DE_HH_2017-12-31.zip_obj/*.obj -o LoD2-DE_HH_2017-12-31.zip.obj

turns them into one single obj.

mesh_filter LoD2-DE_HH_2017-12-31.zip.obj -center -scale 0.001 -rot 90 -1 0 0 LoD2-DE_HH_2017-12-31.zip_localised.obj

translates “so center of mass is at (0,0,0)” and scales to something smaller and rotates so that z is up. trimesh2’s documentation on the rot parameter is insufficient, I think my solution means “rotate by 90 degrees -1 times around x, 0 times around y, 0 times around z” or something like that. Look into the source if that helps you.

3. https://google.github.io/draco/

draco_encoder -i LoD2-DE_HH_2017-12-31.zip_localised.obj -o LoD2-DE_HH_2017-12-31.zip_localised.obj.drc

makes it tiny (8 Megabyte).

I did not do any error checking, used a epsilon of 1 in CityGML2OBJs and just wanted to play around.

(Not) Solving #quiztime with a spatial relationship query in Overpass-Turbo

I saw a #quiztime on Twitter and it looked like a perfect geospatial puzzle so I had to give it a try: https://twitter.com/Sector035/status/987631780147679233

So we have something called “Lidl” which I know is a supermarket. And we have something else, with a potential name ending in “nden”. And that’s it.

If you have a global OSM database, this would be a simple query away. If you don’t have a global OSM database, the Overpass API let’s you query one. And overpass turbo can make it fun (not in this case though, I had a lot of trial and error until I found these nice examples in the Wiki).

I ended up looking for things within 50 meters to a Lidl named “-nden”

[out:json][timeout:180][maxsize:2000000000];

{{radius=50}}

node[name="Lidl"]->.lidls;

( 
   way(around.lidls:{{radius}})["name"~".*nden$"];
  node(around.lidls:{{radius}})["name"~".*nden$"];
);

(._;>;);  // whatever? 

Too far away from the Lidl.

But driving around the place in Google StreetView I could not find the spot and it does not look very much like the photo.

So I guess my query is fine but either the "thing" or the Lidl are not in OSM yet.

Oh well, I did learn something new (about Overpass) and it was fun. :)

One SRTMGL1 GeoTIFF to rule them all

NOTE: In up to date GDAL things might be easier AND I messed up with --config and -co below. Poke me to update it please.

So about half a year ago Lukas Martinelli asked about a global SRTM GeoTIFF. The SRTM elevation data is usually shared in many small tiles which can be ideal for some cases but annoying for others. I like downloading and processing big files so I took that as a challenge. It’s probably some mental thing. I never finished this blog post back then. It’s probably another mental thing. 8) Read on if you are interested in some GNU coreutils hackery as well as GDAL magick.

Here is how I did it. Endless thanks as usual to Even Rouault who fixed GDAL bugs and gave great hints. I learned more about GeoTIFFs and GDAL than I should have needed to know.

Downloading the source files

The source files are available at http://e4ftl01.cr.usgs.gov/SRTM/SRTMGL1.003/2000.02.11/ (warning, visiting this URL will make your browser cry and potentially render your system unresponsive). To download them you must create an NASA Earthdata Login at https://urs.earthdata.nasa.gov/users/new/. :\

********************************************************************************

                         U.S. GOVERNMENT COMPUTER

This US Government computer is for authorized users only.  By accessing this
system you are consenting to complete monitoring with no expectation of privacy.
Unauthorized access or use may subject you to disciplinary action and criminal
prosecution.

OMGOMGOMG. I am sure they would have used <blink> if accessibility guidelines allowed it. As I am not sure what their Terms of Service are, I will not give you a copy’n’paste ready line to download them all. wget or aria2 work well. You should get 14297 files with a total size of 98G.

Inspecting the ZIP files and preparing for GDAL’s vsizip

Each of those ZIP files has just a single file “.hgt” inside. GDAL specifically has support for the “SRTM HGT Format“, so we know it can read those files. We just need to extract them. (If you clicked that link you see that with GDAL 2.2 it will support directly loading the data from the ZIPs, that’s just Even being awesome.)

We don’t want to uncompress all those files just because we want to build a GeoTIFF from them later, do we? Luckily GDAL has a vsizip thingie which allows it to read files inside zip archives. Wicked! For this we need the “deep” paths to the files though, for example e4ftl01.cr.usgs.gov/SRTM/SRTMGL1.003/2000.02.11/N29E000.SRTMGL1.hgt.zip/N29E000.hgt.

The files inside the archives are always simply the same filename minus the “.SRTMGL1” and the .zip extension. Perfect, now we know all the files from inside the ZIPs! Right? Nope. Unfortunately some (17) of the archives do include the “.SRTMGL1” bit in their files, for example N38E051.SRTMGL1.hgt… >:(

That’s why we need to look into each zip file and determine the filename inside. (Alternatively *you* could extract them all after all, you with your fancy, huge SSD.)

We can simply use `unzip` with the `-t` switch to make it show what’s inside (and as a “free” benefit it will also check the file’s integrity for us). `tee` is used here to save the output to a file. This will take a while as it needs to read through all the files…

$ unzip -t "e4ftl01.cr.usgs.gov/SRTM/SRTMGL1.003/2000.02.11/*.SRTMGL1.hgt.zip" | tee unzip-t.log

...

Archive: e4ftl01.cr.usgs.gov/SRTM/SRTMGL1.003/2000.02.11/N29E000.SRTMGL1.hgt.zip
testing: N29E000.hgt OK
No errors detected in compressed data of e4ftl01.cr.usgs.gov/SRTM/SRTMGL1.003/2000.02.11/N29E000.SRTMGL1.hgt.zip.

Archive: e4ftl01.cr.usgs.gov/SRTM/SRTMGL1.003/2000.02.11/N27E033.SRTMGL1.hgt.zip
testing: N27E033.hgt OK
No errors detected in compressed data of e4ftl01.cr.usgs.gov/SRTM/SRTMGL1.003/2000.02.11/N27E033.SRTMGL1.hgt.zip.

Archive: e4ftl01.cr.usgs.gov/SRTM/SRTMGL1.003/2000.02.11/N45E066.SRTMGL1.hgt.zip
testing: N45E066.hgt OK
No errors detected in compressed data of e4ftl01.cr.usgs.gov/SRTM/SRTMGL1.003/2000.02.11/N45E066.SRTMGL1.hgt.zip.

14297 archives were successfully processed.
14297 archives were successfully processed.

Yay, no errors!

In the output we see the path to the archive and we see the name of the file inside. With some sed and grep we can easily construct “deep” paths.

First we remove the status lines, the blank lines and the very last full status line with `grep`:

$ cat unzip-t.log | grep -v -e "No errors" -e '^$' -e 'successfully processed'

Archive: e4ftl01.cr.usgs.gov/SRTM/SRTMGL1.003/2000.02.11/N12E044.SRTMGL1.hgt.zip
testing: N12E044.hgt OK
Archive: e4ftl01.cr.usgs.gov/SRTM/SRTMGL1.003/2000.02.11/N29W002.SRTMGL1.hgt.zip
testing: N29W002.hgt OK
Archive: e4ftl01.cr.usgs.gov/SRTM/SRTMGL1.003/2000.02.11/N45E116.SRTMGL1.hgt.zip
testing: N45E116.hgt OK

Then we concatenate every consecutive two lines into one line with `paste` (I LOVE THIS 1 TRICK!):

$ cat unzip-t.log | grep -v -e "No errors" -e '^$' -e 'successfully processed' | paste - -

Archive: e4ftl01.cr.usgs.gov/SRTM/SRTMGL1.003/2000.02.11/N12E044.SRTMGL1.hgt.zip testing: N12E044.hgt OK
Archive: e4ftl01.cr.usgs.gov/SRTM/SRTMGL1.003/2000.02.11/N29W002.SRTMGL1.hgt.zip testing: N29W002.hgt OK
Archive: e4ftl01.cr.usgs.gov/SRTM/SRTMGL1.003/2000.02.11/N45E116.SRTMGL1.hgt.zip testing: N45E116.hgt OK

Finally we strip things away with sed (I don’t care that you would do it differently, this is how I quickly did it with my flair of hammering) and direct the output into a new file called `zips`:

$ cat unzip-t.log | grep -v -e "No errors" -e '^$' -e 'successfully processed' | paste - - | sed -e 's#Archive:[ ]*##' -e 's#\t[ ]*testing: #/#' -e 's#[ ]*OK##' > zips

e4ftl01.cr.usgs.gov/SRTM/SRTMGL1.003/2000.02.11/N12E044.SRTMGL1.hgt.zip/N12E044.hgt
e4ftl01.cr.usgs.gov/SRTM/SRTMGL1.003/2000.02.11/N29W002.SRTMGL1.hgt.zip/N29W002.hgt
e4ftl01.cr.usgs.gov/SRTM/SRTMGL1.003/2000.02.11/N45E116.SRTMGL1.hgt.zip/N45E116.hgt

Yay, we have all the actual inside-zip paths now!

If you are curious, you can try gdalinfo with those now. You need to prepend /vsizip/ to the path to make it read inside zip files.

$ gdalinfo /vsizip/e4ftl01.cr.usgs.gov/SRTM/SRTMGL1.003/2000.02.11/N12E044.SRTMGL1.hgt.zip/N12E044.hgt

Driver: SRTMHGT/SRTMHGT File Format
Files: /vsizip/e4ftl01.cr.usgs.gov/SRTM/SRTMGL1.003/2000.02.11/N12E044.SRTMGL1.hgt.zip/N12E044.hgt
Size is 3601, 3601
Coordinate System is:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0,
        AUTHORITY["EPSG","8901"]],
    UNIT["degree",0.0174532925199433,
        AUTHORITY["EPSG","9122"]],
    AUTHORITY["EPSG","4326"]]
Origin = (43.999861111111109,13.000138888888889)
Pixel Size = (0.000277777777778,-0.000277777777778)
Metadata:
  AREA_OR_POINT=Point
Corner Coordinates:
Upper Left  (  43.9998611,  13.0001389) ( 43d59'59.50"E, 13d 0' 0.50"N)
Lower Left  (  43.9998611,  11.9998611) ( 43d59'59.50"E, 11d59'59.50"N)
Upper Right (  45.0001389,  13.0001389) ( 45d 0' 0.50"E, 13d 0' 0.50"N)
Lower Right (  45.0001389,  11.9998611) ( 45d 0' 0.50"E, 11d59'59.50"N)
Center      (  44.5000000,  12.5000000) ( 44d30' 0.00"E, 12d30' 0.00"N)
Band 1 Block=3601x1 Type=Int16, ColorInterp=Undefined
  NoData Value=-32768
  Unit Type: m

Hooray! GDAL reads the hgt file from inside its zip!

We need to prepend all the paths with `/vsizip/` so let’s do that:

$ sed 's#^#/vsizip/##' zips > vsizips

/vsizip/e4ftl01.cr.usgs.gov/SRTM/SRTMGL1.003/2000.02.11/N12E044.SRTMGL1.hgt.zip/N12E044.hgt
/vsizip/e4ftl01.cr.usgs.gov/SRTM/SRTMGL1.003/2000.02.11/N29W002.SRTMGL1.hgt.zip/N29W002.hgt
/vsizip/e4ftl01.cr.usgs.gov/SRTM/SRTMGL1.003/2000.02.11/N45E116.SRTMGL1.hgt.zip/N45E116.hgt

Unfortunately those 17 misnamed files from earlier need special treatment… This is what my notes say, not sure what it was supposed to do. If you are recreating this all, please just ask me and I will look at it again. For now, let’s just pretend that this leads to a file called `hgtfiles` in which all the paths are perfect and all the files are perfect.


grep -v 'SRTMGL1.hgt$' vsizips > vsizipswithoutmisnamedfiles
mkdir misnamedfiles
cd misnamedfiles/
nano ../listofmisnamedfiles # insert the paths to those 17 zips here #TODO
while read filename; do unzip "../${filename}"; done < ../listofmisnamedfiles
rename 's/SRTMGL1.//' *.hgt
cd ..
find misnamedfiles/ -type f > listofmisnamedfileshgt
cat listofmisnamedfileshgt vsizipswithoutmisnamedfiles > hgtfiles

As I said above though, go with a recent GDAL and this is all much easier. Even even included a check for the different filenames inside, how can you not like that guy!

Building a Virtual Raster Table

Virtual Raster Tables (VRT) are some kind of files that pretend to be rasters. They are awesome. Here we use a VRT that simply turns all the small rasters we have into one big ass raster.

Ok, ready to create a Virtual Raster Table!

$ time gdalbuildvrt -input_file_list hgtfiles srtmgl1.003.vrt

0...10...20...30...40...50...60...70...80...90...100 - done.

real 0m22.679s
user 0m19.250s
sys 0m3.105s

Done.

You could go ahead and use this for your work/leisure now. But remember, it is tens of gigabytes of data so if you do not use it at a 1:1 scale things will not be fun and might fry your cat. You want overviews/pyramids.

Turning the VRT into a GeoTIFF (Optional)

Let’s make a HUGEGEOTIFF because that’s cool! You don’t have to, instead you could build overviews for the .vrt file.

We want it quick to read and small so I used DEFLATE, TILED and the horizontal predictor. I ran this on a weak i7 with 2G of RAM and can’t remember what the worst bottleneck was. Probably CPU.

$ time gdal_translate -co NUM_THREADS=ALL_CPUS --config PREDICTOR 2 -co COMPRESS=DEFLATE -co TILED=YES -co BIGTIFF=YES srtmgl1.003.vrt srtmgl1.003.vrt.tif

Input file size is 1296001, 417601
0...10...20...30...40...50...60...70...80...90...100 - done.

real 231m57.961s
user 503m23.044s
sys 3m14.600s

If you are courageous you can load that file in your GIS now. But again, unless you watch it at a 1:1 scale or something close to that it WILL not be much fun and potentially expose weaknesses in your GIS and fill up your system’s memory and crash and you had no unsaved work open, didn’t you?

Building Overviews

Overviews aka pyramids are usually about 1/3 the size of the full image. If they are not, you probably used different compression settings. This was the step that I expected to be just boring to wait for, but it turned out the most problematic. I tried all available free tools but none worked properly with an input this big. With GDAL we found a workaround after a while.

gdaladdo has problems building multiple overview levels with a file this big… The trick is to built the overviews sequentially, not in one invocation. The best solution at the moment was building overviews for overviews for overviews and so on until you reach a comfortable size. Something like:


gdaladdo -ro srtmgl1.003.tif 2
gdaladdo -ro srtmgl1.003.tif.ovr 2
gdaladdo -ro srtmgl1.003.tif.ovr.ovr 2
gdaladdo -ro srtmgl1.003.tif.ovr.ovr.ovr 2
gdaladdo -ro srtmgl1.003.tif.ovr.ovr.ovr.ovr 2 4 8 16 32 64 128 256 512

I used the following parameters:
gdaladdo -oo NUM_THREADS=ALL_CPUS --config GDAL_NUM_THREADS ALL_CPUS --config GDAL_CACHEMAX 2048 --config COMPRESS_OVERVIEW DEFLATE --config PREDICTOR_OVERVIEW 2 --config BIGTIFF_OVERVIEW YES -ro

ovrovrovrovrovrovrovrovrovrovr

Apparently gdaladdo automatically makes them tiled, which is good. I used https://gist.github.com/springmeyer/3007985 to find out.

Creating overviews took just about 3 hours with this weird trick. So, in total the processing just takes half a day.

Download

Enjoy! https://www.datenatlas.de/geodata/public/srtmgl1/. I impose no license/copyright/whatever on this derivative work.

TODO

Hillshading? Slope? You do it!

Found a better, faster way? You blog it!

We could add the overviews to the main image with `gdal_translate srtmgl1.003.tif srtmgl1.003.withovr.tif -co COPY_SRC_OVERVIEWS=YES [other options]` says Even.

Transparenzportal Hamburg API: Alle Datensätze eines bestimmten Hosts

Let’s take it to the next level: Wir wollen alle auf daten-hamburg.de gehosteten Datensätze, weil da die ganzen schicken Geodaten sind. Wir müssen also einen Query bauen, der uns alle Datensätze gibt, die “^http://daten-hamburg.de/” in der resources.url haben.

Mit http://wiki.apache.org/solr/CommonQueryParameters kann man komplexe Queries schreiben, sagt http://docs.ckan.org/en/latest/api/index.html#ckan.logic.action.get.package_search . Mit ein bisschen Scrollen stößt man gegebenenfalls auf resource_search und über Google nach “ckan resource_search” auf https://github.com/ckan/ckan/issues/1494, dessen Query man dann nimmt und sich damit nach http://suche.transparenz.hamburg.de/api/action/resource_search?query=url:http://daten-hamburg.de/ durchhangelt. Voll einfach! … Der Query dauert mehrere Sekunden und scheint ALLE Hits zurückzugeben, super!

Download läuft, so langsam wie daten-hamburg.de eben leider ist: https://www.datenatlas.de/geodata/public/sources/

Insgesamt sind es rund 104 Gigabyte, allerdings inklusive einiger Duplikate. Übrigens stecken auch SHA256-Hashes in den Daten, praktisch zum Überprüfen der Downloads.

Ugly-but-does-the-job URLs rausziehen:
$ cat suche.transparenz.hamburg.de/api/action/resource_search@query\=url%3Ahttp%3A%2F%2Fdaten-hamburg.de%2F | json_pp | grep '"url"' | grep -Eo 'http.*"' | sed 's#"$##' > urls