Von wo wurde der Hamburger Fernsehturm fotografiert

Vom großartigen @Fernsehturm_HH-Account inspiriert hab ich mal ein 4 Jahre altes Projektchen aufgewärmt: Verortete Fotos auf Flickr, die mit “Hamburg” und “Fernsehturm” getaggt sind. Alternativ als Linien vom (angeblichen) Aufnahmeort zum Turm.

Image

Ungefiltert und voller Mist, aber man kann z. B. toll sehen, wie einladend die Brückem vorm Kuhmühlenteich für Fotos sind.

Kostenlose Internetpunkteabsahnidee: Dasselbe mit dem Eiffelturm, dem Londoner Dildo u. ä. machen, Konkave Hülle außen drum, schick aufbereiten, als “Nur wo man $Wahrzeichen sieht, ist $Stadt” vermarkten, €€€€.

Satellite composite of Earth 2020

A follow-up to Average Earth from Space 2018 with a how-to. For each day of 2020 I took one global true color image of the whole planet and merged them together by using the most typical color per pixel. You can see cloud patterns in astonishing detail, global wind, permafrost (careful, white can be ice and/or clouds here) and more. Scroll to the bottom for interactive full resolution viewers.

Basically we will want to overlay one satellite image per day into one image for the whole year. You need two things: The images and the GDAL suite of geospatial processing tools.

Imagery

You can get a daily satellite composite of (almost) the whole earth from NASA. For example of the Soumi NPP / VIIRS instrument. Check it out at WorldView.

You can download those images via Global Imagery Browse Services (GIBS).

As the API I used two years ago is gone, Joshua Stevens was so nice to share code he used previously. It was easy to adapt:

set -e
set -u

# run like: $ bash gibs_viirs.sh 2020-10-05
# you get: VIIRS_SNPP_CorrectedReflectance_TrueColor-2020-10-05.tif
# in ~15 minutes and at ~600 megabytes for 32768x16384 pixels

# based on https://gist.github.com/jscarto/6c0413f4820ed5141744e96e19f31205

# https://gibs.earthdata.nasa.gov/wmts/epsg4326/best/1.0.0/WMTSCapabilities.xml
# VIIRS_SNPP_CorrectedReflectance_TrueColor is not served as PNG by GIBS
# so this is using JPEG tiles as source

# -outsize 65536 32768 took ~50 minutes
# -outsize 32768 16384 took ~15 minutes
# you can run multiple instances at once without issues to reduce total time

# TODO probably should be using a less detailed tileset than 250m to put
# less stress on the server...!

layer=VIIRS_SNPP_CorrectedReflectance_TrueColor
caldate=$1  # 2020-09-09
tilelevel=8  # 8 is the highest for 250m, see Capa -> 163840 81920 would be the full outsize
# 2022 says: Dude, check what gdal says for "Input file size is x, y" and then compare it to the outsize. Use the tilelevel that gives 2x the outsize, that seems to be what's needed

# ready? let's go!
gdal_translate \
-outsize 32768 16384 \
-projwin -180 90 180 -90 \
-of GTIFF \
-co TILED=YES \
-co COMPRESS=DEFLATE \
-co PREDICTOR=2 \
-co NUM_THREADS=ALL_CPUS \
"<GDAL_WMS>
 <Service name=\"TMS\">
 <ServerUrl>https://gibs.earthdata.nasa.gov/wmts/epsg4326/best/"${layer}"/default/"${caldate}"/250m/\${z}/\${y}/\${x}.jpg</ServerUrl>
</Service>
<DataWindow>
 <UpperLeftX>-180.0</UpperLeftX><!-- makes sense -->
 <UpperLeftY>90</UpperLeftY><!-- makes sense -->
 <LowerRightX>396.0</LowerRightX><!-- wtf -->
 <LowerRightY>-198</LowerRightY><!-- wtf -->
 <TileLevel>"${tilelevel}"</TileLevel>
 <TileCountX>2</TileCountX>
 <TileCountY>1</TileCountY>
 <YOrigin>top</YOrigin>
</DataWindow>
<Projection>EPSG:4326</Projection>
<BlockSizeX>512</BlockSizeX><!-- correct for VIIRS_SNPP_CorrectedReflectance_TrueColor-->
<BlockSizeY>512</BlockSizeY><!-- correct for VIIRS_SNPP_CorrectedReflectance_TrueColor -->
<BandsCount>3</BandsCount>
</GDAL_WMS>" \
${layer}-${caldate}.tif

As this was no scientific project, please note that I have spent no time checking e. g.:

  • If one could reduce the (significantly) compression artifacts of imagery received through this (the imagery is only provided as JPEG using this particular API),
  • if the temporal queries are actually getting the correct dates,
  • if there might be more imagery available
  • or even if the geographic referencing is correct.

As it takes a long time to fetch an image this way, I decided to go for a resolution of 32768×16384 pixels instead of 65536×32768 because the latter took about 50 minutes per image. A day of 32768×16384 pixels took me about 15 minutes to download.

Overlaying the images

There are lots of options to overlay images. imagemagick/graphicsmagick might be the obvious choice but they are unfit for imagery of these dimensions (exhausting RAM). VIPS/nips2 is awesome but might require some getting used to and/or manual processing. GDAL is the hot shit and very RAM friendly if you are careful. So I used GDAL for this.

Make sure to store the images somewhere sensible for lots of I/O.

Got all the images you want to process? Cool, build a VRT for them:

gdalbuildvrt \
VIIRS_SNPP_CorrectedReflectance_TrueColor-2020.vrt \
VIIRS_SNPP_CorrectedReflectance_TrueColor-2020-*.tif

This takes some seconds.

We want to overlay the images with some fancy, highly complex mathematical formula (or not ;) ) and since GDAL’s VRT driver supports custom Python functions to manipulate pixel values, we can use numpy for that. Put this in a file called functions.py and remember the path to that file:

import numpy as np

def median(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
      raster_ysize, buf_radius, gt,  **kwargs):
    out_ar[:] = np.median(in_ar, axis = 0)

def mean(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
      raster_ysize, buf_radius, gt,  **kwargs):
    out_ar[:] = np.mean(in_ar, axis = 0)

def max(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
      raster_ysize, buf_radius, gt,  **kwargs):
    out_ar[:] = np.amax(in_ar, axis = 0)

def min(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
      raster_ysize, buf_radius, gt,  **kwargs):
    out_ar[:] = np.amin(in_ar, axis = 0)

You can now use a median, mean, min or max function for aggregating the images per pixel. For that you have to modify the VRT to include the function you want it to use. I used sed for that:

sed -e 's#<VRTRasterBand#<VRTRasterBand subClass="VRTDerivedRasterBand"#' \
-e 's#</ColorInterp>#</ColorInterp>\n<PixelFunctionLanguage>Python</PixelFunctionLanguage>\n<PixelFunctionType>functions.median</PixelFunctionType>#' \
VIIRS_SNPP_CorrectedReflectance_TrueColor-2020.vrt \
> VIIRS_SNPP_CorrectedReflectance_TrueColor-2020_median.vrt

That’s it, we are ready to use GDAL to build an image that combines all the daily images into one median image. For this to work you have to set the PYTHONPATH environment variable to include the directory of the functions.py file. If it is in the same directory where you launch gdal, you can use $PWD, otherwise enter the full path to the directory. Adjust the rest of the options as you like, e. g. to choose a different output format. If you use COG, enabling ALL_CPUS is highly recommended or building overviews will take forever.

PYTHONPATH=$PWD gdal_translate \
--config CPL_DEBUG VRT --config GDAL_CACHEMAX 25% \
--config GDAL_VRT_ENABLE_PYTHON YES \
-of COG -co NUM_THREADS=ALL_CPUS \
-co COMPRESS=DEFLATE -co PREDICTOR=2 \
VIIRS_SNPP_CorrectedReflectance_TrueColor-2020_median.vrt \
VIIRS_SNPP_CorrectedReflectance_TrueColor-2020_median.vrt.tif

This will take many hours. 35 hours for me on a Ryzen 3600 with lots of RAM and the images on a cheap SSD. The resulting file is about the same size as the single images (makes sense, doesn’t it) at ~800 megabytes.

Alternative, faster approach

A small note while we are at it: GDAL calculates overviews from the source data. And since we are using a custom VRT function here, on a lot of raster images, that takes a long time. To save a lot of that time, you can build the file without overviews first, then calculate them in a second step. With this approach they will be calculated from the final raster instead of the initial input which, when ever there is non-trivial processing involved, is way quicker:

PYTHONPATH=$PWD gdal_translate \
--config CPL_DEBUG VRT --config GDAL_CACHEMAX 25% \
--config GDAL_VRT_ENABLE_PYTHON YES \
-of GTiff -co NUM_THREADS=ALL_CPUS \
VIIRS_SNPP_CorrectedReflectance_TrueColor-2020_median.vrt \
VIIRS_SNPP_CorrectedReflectance_TrueColor-2020_median.vrt.noovr.tif

Followed by the conversion to a COG (which automatically will build the overviews as mandatory for that awesome format):

gdal_translate \
--config GDAL_CACHEMAX 25% \
-of COG -co NUM_THREADS=ALL_CPUS \
-co COMPRESS=DEFLATE -co PREDICTOR=2 \
/tmp/VIIRS_SNPP_CorrectedReflectance_TrueColor-2020_median.vrt.noovr.tif \
VIIRS_SNPP_CorrectedReflectance_TrueColor-2020_median.vrt.cog.tif

This took “just” 10 hours for the initial raster and then an additional 11 seconds for the conversion COG and the building of overviews. And the resulting file is bit-by-bit identical to the one from the direct-to-COG approach. So one third of processing time for the same result. Nice!

Result

Check it out in full, zoomable resolution:

Or download the Cloud-Optimized GeoTIFF file for your own software:

Closing remarks

Please do not consider a true representation of the typical weather or cloud cover throughout the year. The satellite takes the day imagery at local noon if I recall correctly so the rest of the day is not part of this “analysis”. I did zero plausibility or consistency checks. The data was probably reprojected multiple times through out the full (sensor->composite) pipeline. The composite is based on color alone, anything bright will lead to a white-ish color, be it snow, ice, clouds, algae, sand, …

It’s just some neat imagery to love our planet.

Update 2020-01-05

Added compression to GeoTIFF creation where useful, not sure how I missed that here. Reduces filesizes to 1/2 or 1/3 even.

Your own little internet speed monitor

I wanted to monitor my ISP’s service over time and could not find any available simple tool for that. The usual system monitoring tools are usually displaying averages, not min/max values. So I used WD40 (speedtest-cli) and duct tape (cron) to make my own.

You need to have a cron daemon set up and speedtest-cli installed.

Then prepare an empty csv file with a header like this (don’t forget a trailing newline!) and store it in a path of your choice:

Server ID,Sponsor,Server Name,Timestamp,Distance,Ping,Download,Upload,Share,IP Address

Set up a cronjob at an interval of your choice (don’t be a dick) to run a speed test and log the results to the csv file:

@hourly speedtest --csv >> /home/user/path/to/speedtest.csv

If you have a fast connection you might spot slow test servers that would badly bias your results, so exclude them using the --exclude option if necessary.

That’s all, you get a nice log of internet ping, upload and download speeds, ready to be visualized in your software of choice (like the best spreadsheet software in existence). I will have to complain to my ISP for that drop since mid December for sure:

And now that I have written this, I realise that for plotting I could also just use a min/max function for a moving time window in Grafana I guess? The speedtests would still be triggered and provide nice bursts of usage. Anyone got pointers on how to do that?

Finding the most popular reaction in Slack

This can be run against a Slack export. It will count the reactions used and display them in an ordered list. Written for readability not speed or efficiency. No guarantees that this isn’t terribly broken. Enjoy and use responsibly!

import json
import glob
import collections

# collect messages
messages = []
for filename in glob.glob('*/*.json'):
    with open(filename) as f:
        messages += json.load(f)

# extract reactions
reactions = []
for message in messages:
    if "reactions" in message:
        reactions += message["reactions"]

# count reactions
reaction_counter = collections.Counter()
for reaction in reactions:
    reaction_counter.update({reaction["name"]: reaction["count"]})

# done, print them
print(reaction_counter.most_common())

Brother DCP-L2530DW printer/scanner on Archlinux

Connect your Brother DCP-L2530DW to your WLAN/Wifi network. Find the printer’s IP (make sure it is static, e. g. by setting it up accordingly in your router). Adjust the IP in the lines below.

Scanning

Install brscan4 and xsane.
As root run: brsaneconfig4 -a name="DCP-L2530DW" model="DCP-L2530DW" ip=192.168.1.123

Printing

Install cups.
As root run: lpadmin -p DCP-L2530DW-IPPeverywhere -E -v "ipp://192.168.1.123/ipp/print" -m everywhere

And you are ready to go, enjoy!

PS: If you change the IP, you might need to edit /etc/opt/brother/scanner/brscan5/brsanenetdevice.cfg and /etc/cups/printers.conf.


For printing via USB (e.g. because you don’t want to have a 2.4GHz network anymore but this sad printer supports no 5GHz), simply install brother-dcp-l2530dw from AUR, “find new printers” in CUPS and add it.

Interaktive Karte der Baugenehmigungen in Hamburg

Ich habe endlich mal ein über mehrere Jahre gereiftes (eher gealtertes und verfrickeltes) Projekt in einen einigermaßen vorzeigbaren Zustand gebracht: Eine interaktive Karte der Baugenehmigungen in Hamburg.

Je weniger transparent eine Fläche dargestellt ist, desto mehr Dokumente sind mit ihr verknüpft (ja, es ist ein Feature je Dokument D;). Eigentlich war die Seite anders aufgebaut, mit einem PDF-Viewer auf der rechten Seite. Aber da daten.transparenz.hamburg.de kein HTTPS kann (seid ihr auch so gespannt auf die UMTS-Auktion nächste Woche?), geht das aus Sicherheitsgründen nicht ohne ein Spiegeln der Daten oder einen Proxy.

Die Daten kommen größtenteils aus dem Transparenzportal. Für das Matching der angebenen Flurstücks-“IDs” zu den tatsächlichen Flurstücken war aber ein erheblicher Aufwand nötig. Das Drama ging bis hin zum Parsen aus PDFs, die mal so, mal so formatiert waren und natürlich auch voller Eingabefehler auf Behördenseite. Vielleicht schreibe ich da noch beizeiten mal einen Rant. TL;DR: Ohne die zugehörige Gemarkung ist mit einer Flurstücks-“ID” wie in den Daten angegeben keine räumliche Zuordnung möglich. In den veröffentlichten Daten stecken nur die Nenner der Flurstücksnummern, nicht aber die Gemarkungsnummern. Ziemlich absurd.

Das ganze ist nur ein Prototyp, vermutlich voller Fehler und fehlender Daten. Aber interessant und spaßig ist es, viel Freude also!

Es wäre noch eine MENGE zu tun, um das ganze rund zu machen. Falls du Lust hast, melde dich gerne. Es geht vom wilden Parsen, über Sonderregeln für kaputte Dokumente, zu Kartenstyling bis zur UI. Schön wäre es auch alles in einer anständigen Datenbank zu halten und nicht nur nach der räumlichen Dimension durchsuchen zu können.

Eyes that follow the cursor in QGIS

I did this all in some random EPSG:25832 location and scale, this uses several magic numbers that make it work for that. I did not make it work for any CRS or canvas size. If you do, please share. But this is just silly fun so …..

Have two polygons for the eyes.

Set their Symbol Layer Type to Geometry Generator and smooth them:

smooth($geometry, 3)

Add another Geometry Generator symbol layer to it and throw in the following magic expression to build the pupils. It calculates the distance from your cursor to the centroids of the polygons and it prepares a line from each centroid to the cursor. Then it places a point geometry onto that line at a fraction of the distance. Use the Geometry Type “Point / MultiPoint” for this Geometry Generator.

with_variable(
  'distance',
  distance(@canvas_cursor_point, centroid($geometry)),
  with_variable(
    'line',
    make_line(centroid($geometry), @canvas_cursor_point),
    line_interpolate_point(@line, @distance/5)
  )
)

Set the layer itself to automatically refresh its rendering in the layer’s properties:

Now the eyes will follow your cursor as it moves across the map canvas!

Task for you: The pupils are not clipped and can exit the eyes. Oops! Head over to Topi’s for a hint how to solve this: https://twitter.com/tjukanov/status/1278689814288760837

And then you paint the rest of the fucking owl:

Those are lines with 3 vertices each:

And the middle vertex is moved vertically using the expression below. Basically the line is reconstructed.

smooth(
  make_line(
    start_point($geometry),  -- first point is kept
    translate(
      point_n($geometry, 2),  -- second point is translated
      0,
      distance(
        @canvas_cursor_point, centroid($geometry)
      )/10  -- move according the cursor distance to centroid
    ),
    end_point($geometry)  -- last point is kept
  ),
  4
)

Here is my project file including the temporary scratch layers (use the awesome Memory Layer Saver plugin to have them loaded automatically):

Das eigene kleine Deutschlandradio Archiv

Mediatheken des Öffentlich-rechtlichen Rundfunks müssen wegen asozialen Arschlöchern ihre Inhalte depublizieren. Wegen anderer Arschlöcher sind die Inhalte nicht konsequent unter freien Lizenzen, aber das ist ein anderes Thema.

Ich hatte mir irgendwann mal angesehen, was es eigentlich für ein Aufwand wäre, die Inhalte verschiedener Mediatheken in ein privates Archiv zu spiegeln. Mit dem Deutschlandradio hatte ich angefangen und mit den üblichen Tools täglich die neuen Audiobeiträge in ein Google Drive geschoben. Dieses Setup läuft jetzt seit mehr als 2 Jahren ohne Probleme und vielleicht hat ja auch wer anders Spaß dran:

Also:

  • rclone einrichten oder mit eigener Infrastruktur arbeiten (dann die rclone-Zeile mit z.B. rsync ersetzen)
  • <20 GB Platz haben
  • Untenstehendes Skript als täglichen Cronjob einrichten (und sich den Output zu mailen lassen)
#!/bin/bash

# exit if anything fails
# not a good idea as downloads might 404 :D
set -e

cd /home/dradio/deutschlandradio

# get all available files
wget -nv -nc -x "http://srv.deutschlandradio.de/aodlistaudio.1706.de.rpc?drau:page="{0..100}"&drau:limit=1000"
grep -hEo 'http.*mp3' srv.deutschlandradio.de/* | sort | uniq > urls

# check which ones are new according to the list of done files
comm -13 urls_done urls > todo

numberofnewfiles=$(wc -l todo | awk '{print $1}')
echo "${numberofnewfiles} new files"

if (( numberofnewfiles < 1 )); then
        echo "exiting"
        exit
fi

# get the new ones
echo "getting new ones"
wget -i todo -nv -x -nc || echo "true so that set -e does not exit here :)"
echo "new ones downloaded"

# copy them to remote storage
rclone copy /home/dradio_scraper/deutschlandradio remote:deutschlandradio && echo "rclone done"

## clean up
# remove files
echo "cleaning up"
rm -r srv.deutschlandradio.de/
rm -rv ondemand-mp3.dradio.de/
rm urls

# update list of done files
cat urls_done todo | sort | uniq > /tmp/urls_done
mv /tmp/urls_done urls_done

# save todo of today
mv todo urls_$(date +%Y%m%d)

echo "done"

Pro Tag sind es so 2-3 Gigabyte neuer Beiträge.

In zwei Jahren sind rund 2,5 Terabyte zusammengekommen und ~300.000 Dateien, aber da sind eventuell auch die Seiten des Feeds mitgezählt worden und Beiträge, die schon älter waren.

Wer mehr will nimmt am besten direkt die Mediathekview-Datenbank als Grundlage.

Nächster Schritt wäre das eigentlich auch täglich nach archive.org zu schieben.

Highlight current timeslice in a QGIS Atlas layout

Did this for an ex-colleague some months ago and forgot to share the how-to publically. We needed a visual representation of the current time in a layout that showed both a raster map (different layer per timeslice) and a timeseries plot of an aspect of the data (this was created outside QGIS).

Have lots of raster layers you want to iterate through. I have:

./ECMWF_ERA_40_subset/2019-01-01.tif
./ECMWF_ERA_40_subset/2019-01-02.tif
./ECMWF_ERA_40_subset/2019-01-03.tif
./ECMWF_ERA_40_subset/2019-01-04.tif
./ECMWF_ERA_40_subset/2019-01-05.tif
./ECMWF_ERA_40_subset/2019-01-06.tif
...

Create a new layer for your map extent. Draw your extent as geometry. Duplicate that geometry as many times as you have days. Alternatively you could of course have different geometries per day. Whatever you do, you need a layer with one feature per timeslice for the Atlas to iterate though. I have 30 days to visualise so I duplicated my extent 30 times.

Open the Field Calculator. Add a new field called date as string type (not as date type until some bug is fixed (sorry, did not make a note here, maybe sorting is/was broken?)) with an expression that represents time and orders chronologically if sorted by QGIS. For example: '2019-01-' || lpad(@row_number,2,0) (assuming your records are in the correct order if you have different geometries…)

Have your raster layers named the same way as the date attribute values.

Make a new layout.

For your Layout map check “Lock layers” and use date as expression for the “Lock layers” override. This will now select the appropriate raster layer, based on the attribute value <-> layer name, to display for each Atlas page.

Cool, if you preview the Atlas now you got a nice animation through your raster layers. Let’s do part 2:

In your layout add your timeseries graph. Give it a unique ID, e. g. “plot box”. Set its width and height via new variables (until you can get those via an expression this is needed for calculations below).

Create a box to visualise the timeslice. Set its width to map_get(item_variables('plot box'), 'plot_width') / @atlas_totalfeatures. For the height and y use/adjust this expression: map_get(item_variables('plot box'), 'plot_height'). For x comes the magic:

with_variable(
'days_total',
day(to_date(maximum("date"))-to_date(minimum("date")))+1,
-- number of days in timespan
-- +1 because we need the number of days in total
-- not the inbetween, day() to just get the number of days
with_variable(
'mm_per_day',
map_get(item_variables('plot box'), 'plot_width') / @days_total,
with_variable(
'days',
day(to_date(attribute(@atlas_feature, 'date'))-to_date(minimum("date"))),
-- number of days the current feature is from the first day
-- to_date because BUG attribute() returns datetime for date field
@mm_per_day * @days + map_get(item_variables('plot box'), 'plot_x')
)
)
)

This will move the box along the x axis accordingly.

Have fun!