Category Archives: commandline

Satellite composite of Earth 2020

A follow-up to Average Earth from Space 2018 with a how-to. For each day of 2020 I took one global true color image of the whole planet and merged them together by using the most typical color per pixel. You can see cloud patterns in astonishing detail, global wind, permafrost (careful, white can be ice and/or clouds here) and more. Scroll to the bottom for interactive full resolution viewers.

Basically we will want to overlay one satellite image per day into one image for the whole year. You need two things: The images and the GDAL suite of geospatial processing tools.

Imagery

You can get a daily satellite composite of (almost) the whole earth from NASA. For example of the Soumi NPP / VIIRS instrument. Check it out at WorldView.

You can download those images via Global Imagery Browse Services (GIBS).

As the API I used two years ago is gone, Joshua Stevens was so nice to share code he used previously. It was easy to adapt:

set -e
set -u

# run like: $ bash gibs_viirs.sh 2020-10-05
# you get: VIIRS_SNPP_CorrectedReflectance_TrueColor-2020-10-05.tif
# in ~15 minutes and at ~600 megabytes for 32768x16384 pixels

# based on https://gist.github.com/jscarto/6c0413f4820ed5141744e96e19f31205

# https://gibs.earthdata.nasa.gov/wmts/epsg4326/best/1.0.0/WMTSCapabilities.xml
# VIIRS_SNPP_CorrectedReflectance_TrueColor is not served as PNG by GIBS
# so this is using JPEG tiles as source

# -outsize 65536 32768 took ~50 minutes
# -outsize 32768 16384 took ~15 minutes
# you can run multiple instances at once without issues to reduce total time

# TODO probably should be using a less detailed tileset than 250m to put
# less stress on the server...!

layer=VIIRS_SNPP_CorrectedReflectance_TrueColor
caldate=$1  # 2020-09-09
tilelevel=8  # 8 is the highest for 250m, see Capa -> 163840 81920 would be the full outsize
# 2022 says: Dude, check what gdal says for "Input file size is x, y" and then compare it to the outsize. Use the tilelevel that gives 2x the outsize, that seems to be what's needed

# ready? let's go!
gdal_translate \
-outsize 32768 16384 \
-projwin -180 90 180 -90 \
-of GTIFF \
-co TILED=YES \
-co COMPRESS=DEFLATE \
-co PREDICTOR=2 \
-co NUM_THREADS=ALL_CPUS \
"<GDAL_WMS>
 <Service name=\"TMS\">
 <ServerUrl>https://gibs.earthdata.nasa.gov/wmts/epsg4326/best/"${layer}"/default/"${caldate}"/250m/\${z}/\${y}/\${x}.jpg</ServerUrl>
</Service>
<DataWindow>
 <UpperLeftX>-180.0</UpperLeftX><!-- makes sense -->
 <UpperLeftY>90</UpperLeftY><!-- makes sense -->
 <LowerRightX>396.0</LowerRightX><!-- wtf -->
 <LowerRightY>-198</LowerRightY><!-- wtf -->
 <TileLevel>"${tilelevel}"</TileLevel>
 <TileCountX>2</TileCountX>
 <TileCountY>1</TileCountY>
 <YOrigin>top</YOrigin>
</DataWindow>
<Projection>EPSG:4326</Projection>
<BlockSizeX>512</BlockSizeX><!-- correct for VIIRS_SNPP_CorrectedReflectance_TrueColor-->
<BlockSizeY>512</BlockSizeY><!-- correct for VIIRS_SNPP_CorrectedReflectance_TrueColor -->
<BandsCount>3</BandsCount>
</GDAL_WMS>" \
${layer}-${caldate}.tif

As this was no scientific project, please note that I have spent no time checking e. g.:

  • If one could reduce the (significantly) compression artifacts of imagery received through this (the imagery is only provided as JPEG using this particular API),
  • if the temporal queries are actually getting the correct dates,
  • if there might be more imagery available
  • or even if the geographic referencing is correct.

As it takes a long time to fetch an image this way, I decided to go for a resolution of 32768×16384 pixels instead of 65536×32768 because the latter took about 50 minutes per image. A day of 32768×16384 pixels took me about 15 minutes to download.

Overlaying the images

There are lots of options to overlay images. imagemagick/graphicsmagick might be the obvious choice but they are unfit for imagery of these dimensions (exhausting RAM). VIPS/nips2 is awesome but might require some getting used to and/or manual processing. GDAL is the hot shit and very RAM friendly if you are careful. So I used GDAL for this.

Make sure to store the images somewhere sensible for lots of I/O.

Got all the images you want to process? Cool, build a VRT for them:

gdalbuildvrt \
VIIRS_SNPP_CorrectedReflectance_TrueColor-2020.vrt \
VIIRS_SNPP_CorrectedReflectance_TrueColor-2020-*.tif

This takes some seconds.

We want to overlay the images with some fancy, highly complex mathematical formula (or not ;) ) and since GDAL’s VRT driver supports custom Python functions to manipulate pixel values, we can use numpy for that. Put this in a file called functions.py and remember the path to that file:

import numpy as np

def median(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
      raster_ysize, buf_radius, gt,  **kwargs):
    out_ar[:] = np.median(in_ar, axis = 0)

def mean(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
      raster_ysize, buf_radius, gt,  **kwargs):
    out_ar[:] = np.mean(in_ar, axis = 0)

def max(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
      raster_ysize, buf_radius, gt,  **kwargs):
    out_ar[:] = np.amax(in_ar, axis = 0)

def min(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,
      raster_ysize, buf_radius, gt,  **kwargs):
    out_ar[:] = np.amin(in_ar, axis = 0)

You can now use a median, mean, min or max function for aggregating the images per pixel. For that you have to modify the VRT to include the function you want it to use. I used sed for that:

sed -e 's#<VRTRasterBand#<VRTRasterBand subClass="VRTDerivedRasterBand"#' \
-e 's#</ColorInterp>#</ColorInterp>\n<PixelFunctionLanguage>Python</PixelFunctionLanguage>\n<PixelFunctionType>functions.median</PixelFunctionType>#' \
VIIRS_SNPP_CorrectedReflectance_TrueColor-2020.vrt \
> VIIRS_SNPP_CorrectedReflectance_TrueColor-2020_median.vrt

That’s it, we are ready to use GDAL to build an image that combines all the daily images into one median image. For this to work you have to set the PYTHONPATH environment variable to include the directory of the functions.py file. If it is in the same directory where you launch gdal, you can use $PWD, otherwise enter the full path to the directory. Adjust the rest of the options as you like, e. g. to choose a different output format. If you use COG, enabling ALL_CPUS is highly recommended or building overviews will take forever.

PYTHONPATH=$PWD gdal_translate \
--config CPL_DEBUG VRT --config GDAL_CACHEMAX 25% \
--config GDAL_VRT_ENABLE_PYTHON YES \
-of COG -co NUM_THREADS=ALL_CPUS \
-co COMPRESS=DEFLATE -co PREDICTOR=2 \
VIIRS_SNPP_CorrectedReflectance_TrueColor-2020_median.vrt \
VIIRS_SNPP_CorrectedReflectance_TrueColor-2020_median.vrt.tif

This will take many hours. 35 hours for me on a Ryzen 3600 with lots of RAM and the images on a cheap SSD. The resulting file is about the same size as the single images (makes sense, doesn’t it) at ~800 megabytes.

Alternative, faster approach

A small note while we are at it: GDAL calculates overviews from the source data. And since we are using a custom VRT function here, on a lot of raster images, that takes a long time. To save a lot of that time, you can build the file without overviews first, then calculate them in a second step. With this approach they will be calculated from the final raster instead of the initial input which, when ever there is non-trivial processing involved, is way quicker:

PYTHONPATH=$PWD gdal_translate \
--config CPL_DEBUG VRT --config GDAL_CACHEMAX 25% \
--config GDAL_VRT_ENABLE_PYTHON YES \
-of GTiff -co NUM_THREADS=ALL_CPUS \
VIIRS_SNPP_CorrectedReflectance_TrueColor-2020_median.vrt \
VIIRS_SNPP_CorrectedReflectance_TrueColor-2020_median.vrt.noovr.tif

Followed by the conversion to a COG (which automatically will build the overviews as mandatory for that awesome format):

gdal_translate \
--config GDAL_CACHEMAX 25% \
-of COG -co NUM_THREADS=ALL_CPUS \
-co COMPRESS=DEFLATE -co PREDICTOR=2 \
/tmp/VIIRS_SNPP_CorrectedReflectance_TrueColor-2020_median.vrt.noovr.tif \
VIIRS_SNPP_CorrectedReflectance_TrueColor-2020_median.vrt.cog.tif

This took “just” 10 hours for the initial raster and then an additional 11 seconds for the conversion COG and the building of overviews. And the resulting file is bit-by-bit identical to the one from the direct-to-COG approach. So one third of processing time for the same result. Nice!

Result

Check it out in full, zoomable resolution:

Or download the Cloud-Optimized GeoTIFF file for your own software:

Closing remarks

Please do not consider a true representation of the typical weather or cloud cover throughout the year. The satellite takes the day imagery at local noon if I recall correctly so the rest of the day is not part of this “analysis”. I did zero plausibility or consistency checks. The data was probably reprojected multiple times through out the full (sensor->composite) pipeline. The composite is based on color alone, anything bright will lead to a white-ish color, be it snow, ice, clouds, algae, sand, …

It’s just some neat imagery to love our planet.

Update 2020-01-05

Added compression to GeoTIFF creation where useful, not sure how I missed that here. Reduces filesizes to 1/2 or 1/3 even.

Your own little internet speed monitor

I wanted to monitor my ISP’s service over time and could not find any available simple tool for that. The usual system monitoring tools are usually displaying averages, not min/max values. So I used WD40 (speedtest-cli) and duct tape (cron) to make my own.

You need to have a cron daemon set up and speedtest-cli installed.

Then prepare an empty csv file with a header like this (don’t forget a trailing newline!) and store it in a path of your choice:

Server ID,Sponsor,Server Name,Timestamp,Distance,Ping,Download,Upload,Share,IP Address

Set up a cronjob at an interval of your choice (don’t be a dick) to run a speed test and log the results to the csv file:

@hourly speedtest --csv >> /home/user/path/to/speedtest.csv

If you have a fast connection you might spot slow test servers that would badly bias your results, so exclude them using the --exclude option if necessary.

That’s all, you get a nice log of internet ping, upload and download speeds, ready to be visualized in your software of choice (like the best spreadsheet software in existence). I will have to complain to my ISP for that drop since mid December for sure:

And now that I have written this, I realise that for plotting I could also just use a min/max function for a moving time window in Grafana I guess? The speedtests would still be triggered and provide nice bursts of usage. Anyone got pointers on how to do that?

Brother DCP-L2530DW printer/scanner on Archlinux

Find your Brother DCP-L2530DW printer’s IP (make sure it is static, e. g. by setting it up accordingly in your router). Adjust the IP in the lines below.

Scanning

Install brscan5 and xsane.
As root run: brsaneconfig5 -a name="DCP-L2530DW" model="DCP-L2530DW" ip=192.168.1.123

Printing

Install cups.
As root run: lpadmin -p DCP-L2530DW-IPPeverywhere -E -v "ipp://192.168.1.123/ipp/print" -m everywhere

And you are ready to go, enjoy!

Das eigene kleine Deutschlandradio Archiv

Mediatheken des Öffentlich-rechtlichen Rundfunks müssen wegen asozialen Arschlöchern ihre Inhalte depublizieren. Wegen anderer Arschlöcher sind die Inhalte nicht konsequent unter freien Lizenzen, aber das ist ein anderes Thema.

Ich hatte mir irgendwann mal angesehen, was es eigentlich für ein Aufwand wäre, die Inhalte verschiedener Mediatheken in ein privates Archiv zu spiegeln. Mit dem Deutschlandradio hatte ich angefangen und mit den üblichen Tools täglich die neuen Audiobeiträge in ein Google Drive geschoben. Dieses Setup läuft jetzt seit mehr als 2 Jahren ohne Probleme und vielleicht hat ja auch wer anders Spaß dran:

Also:

  • rclone einrichten oder mit eigener Infrastruktur arbeiten (dann die rclone-Zeile mit z.B. rsync ersetzen)
  • <20 GB Platz haben
  • Untenstehendes Skript als täglichen Cronjob einrichten (und sich den Output zu mailen lassen)
#!/bin/bash

# exit if anything fails
# not a good idea as downloads might 404 :D
set -e

cd /home/dradio/deutschlandradio

# get all available files
wget -nv -nc -x "http://srv.deutschlandradio.de/aodlistaudio.1706.de.rpc?drau:page="{0..100}"&drau:limit=1000"
grep -hEo 'http.*mp3' srv.deutschlandradio.de/* | sort | uniq > urls

# check which ones are new according to the list of done files
comm -13 urls_done urls > todo

numberofnewfiles=$(wc -l todo | awk '{print $1}')
echo "${numberofnewfiles} new files"

if (( numberofnewfiles < 1 )); then
        echo "exiting"
        exit
fi

# get the new ones
echo "getting new ones"
wget -i todo -nv -x -nc || echo "true so that set -e does not exit here :)"
echo "new ones downloaded"

# copy them to remote storage
rclone copy /home/dradio_scraper/deutschlandradio remote:deutschlandradio && echo "rclone done"

## clean up
# remove files
echo "cleaning up"
rm -r srv.deutschlandradio.de/
rm -rv ondemand-mp3.dradio.de/
rm urls

# update list of done files
cat urls_done todo | sort | uniq > /tmp/urls_done
mv /tmp/urls_done urls_done

# save todo of today
mv todo urls_$(date +%Y%m%d)

echo "done"

Pro Tag sind es so 2-3 Gigabyte neuer Beiträge.

In zwei Jahren sind rund 2,5 Terabyte zusammengekommen und ~300.000 Dateien, aber da sind eventuell auch die Seiten des Feeds mitgezählt worden und Beiträge, die schon älter waren.

Wer mehr will nimmt am besten direkt die Mediathekview-Datenbank als Grundlage.

Nächster Schritt wäre das eigentlich auch täglich nach archive.org zu schieben.

zstandard compression in GeoTIFF

I ran GDAL 2.4’s gdal_translate (GDAL 2.4.0dev-333b907 or GDAL 2.4.0dev-b19fd35e6f-dirty, I am not sure) on some GeoTIFFs to compare the new ZSTD compression support to DEFLATE in file sizes and time taken.

Hardware was a mostly idle Intel(R) Xeon(R) CPU E3-1245 V2 @ 3.40GHz with fairly old ST33000650NS (Seagate Constellation) harddisks and lots of RAM.

A small input file was DGM1_2x2KM_XYZ_HH_2016-01-04.tif with about 40,000 x 40,000 pixels at around 700 Megabytes.
A big input file was srtmgl1.003.tif with about 1,3000,000 x 400,000 pixels at 87 Gigabytes.
Both input files had been DEFLATE compressed at the default level 6 without using a predictor (that’s what the default DEFLATE level will make them smaller here).

gdal_translate -co NUM_THREADS=ALL_CPUS -co PREDICTOR=2 -co TILED=YES -co BIGTIFF=YES --config GDAL_CACHEMAX 6144 was used all the time.
For DEFLATE -co COMPRESS=DEFLATE -co ZLEVEL=${level} was used, for ZSTD -co COMPRESS=ZSTD -co ZSTD_LEVEL=${level}

Mind the axes, sometimes I used a logarithmic scale!

Small file

DEFLATE

ZSTD

Big file

DEFLATE

ZSTD

Findings

It has been some weeks since I really looked at the numbers, so I am making the following up spontaneously. Please correct me!
Those numbers in the findings below should be percentages (between the algorithms, to their default values, etc), but my curiosity was satisfied. At the bottom is the data, maybe you can take it to present a nicer evaluation? ;)

ZSTD is powerful and weird. Sometimes subsequent levels might lead to the same result, sometimes a higher level will be fast or bigger. On low levels it is just as fast as DEFLATE or faster with similar or smaller sizes.

A <700 Megabyte version of the small file was accomplished within a minute with DEFLATE (6) or half a minute with ZSTD (5). With ZSTD (17) it got down to <600 Megabyte in ~5 Minutes, while DEFLATE never got anywhere near that.
Similarly for the big file, ZSTD (17) takes it down to 60 Gigabytes but it took almost 14 hours. DEFLATE capped at 65 Gigabytes. The sweet spot for ZSTD was at 10 with 4 hours for 65 Gigabytes (DEFLATE took 11 hours for that).

In the end, it is hard to say what default level ZSTD should take. For the small file level 5 was amazing, being even smaller than and almost twice as fast as the default (9). But for the big file the gains are much more gradual, here level 3 or level 10 stand out. I/O might be to blame?

Yes, the machine was not stressed and I did reproduce those weird ones.

Raw numbers

Small file

Algorithm Level Time [s] Size [Bytes] Size [MB] Comment
ZSTD 1 18 825420196 787
ZSTD 2 19 783437560 747
ZSTD 3 21 769517199 734
ZSTD 4 25 768127094 733
ZSTD 5 31 714610868 682
ZSTD 6 34 720153450 687
ZSTD 7 40 729787784 696
ZSTD 8 42 729787784 696
ZSTD 9 51 719396825 686 default
ZSTD 10 63 719394955 686
ZSTD 11 80 719383624 686
ZSTD 12 84 712429763 679
ZSTD 13 133 708790567 676
ZSTD 14 158 707088444 674
ZSTD 15 265 706788234 674
ZSTD 16 199 632481860 603
ZSTD 17 287 621778612 593
ZSTD 18 362 614424373 586
ZSTD 19 549 617071281 588
ZSTD 20 834 617071281 588
ZSTD 21 1422 616979884 588
DEFLATE 1 25 852656871 813
DEFLATE 2 26 829210959 791
DEFLATE 3 32 784069125 748
DEFLATE 4 31 758474345 723
DEFLATE 5 39 752578464 718
DEFLATE 6 62 719159371 686 default
DEFLATE 7 87 710755144 678
DEFLATE 8 200 705440096 673
DEFLATE 9 262 703038321 670

Big file

Algorithm Level Time [m] Size [Bytes] Size [MB] Comment
ZSTD 1 70 76132312441 72605
ZSTD 2 58 75351154492 71860
ZSTD 3 63 73369706659 69971
ZSTD 4 75 73343346296 69946
ZSTD 5 73 72032185603 68695
ZSTD 6 91 72564406429 69203
ZSTD 7 100 71138034760 67843
ZSTD 8 142 71175109524 67878
ZSTD 9 175 71175109524 67878 default
ZSTD 10 235 69999288435 66757
ZSTD 11 406 69999282203 66757
ZSTD 12 410 69123601926 65921
ZSTD 13 484 69123601926 65921
ZSTD 14 502 68477183815 65305
ZSTD 15 557 67494752082 64368
ZSTD 16 700 67494752082 64368
ZSTD 17 820 64255634015 61279
ZSTD 18 869 63595433364 60649
ZSTD 19 1224 63210562485 60282
ZSTD 20 2996 63140602703 60216
ZSTD 21 lolno
DEFLATE 1 73 87035905568 83004
DEFLATE 2 76 85131650648 81188
DEFLATE 3 73 79499430225 75817
DEFLATE 4 77 75413492394 71920
DEFLATE 5 92 76248511117 72716
DEFLATE 6 129 73901542836 70478 default
DEFLATE 7 166 73120114047 69733
DEFLATE 8 407 70446588490 67183
DEFLATE 9 643 70012677124 66769

Lots of CityGML files to a single obj to draco with localising coordinates to 0,0,0

I used the CityGML LoD2 model of Hamburg here, http://daten-hamburg.de/geographie_geologie_geobasisdaten/3d_stadtmodell_lod2/LoD2-DE_HH_2017-12-31.zip from http://suche.transparenz.hamburg.de/dataset/3d-stadtmodell-lod2-de-hamburg3. (If you do this for Hamburg, you might want to not include Neuwerk’s tiles ;))

1. https://github.com/tudelft3d/CityGML2OBJs/

python2 CityGML2OBJs.py -i LoD2-DE_HH_2017-12-31.zip/ -o LoD2-DE_HH_2017-12-31.zip_obj/

converts each CityGML file to a obj (takes about half an hour on my system).

2. http://gfx.cs.princeton.edu/proj/trimesh2/

mesh_cat LoD2-DE_HH_2017-12-31.zip_obj/*.obj -o LoD2-DE_HH_2017-12-31.zip.obj

turns them into one single obj.

mesh_filter LoD2-DE_HH_2017-12-31.zip.obj -center -scale 0.001 -rot 90 -1 0 0 LoD2-DE_HH_2017-12-31.zip_localised.obj

translates “so center of mass is at (0,0,0)” and scales to something smaller and rotates so that z is up. trimesh2’s documentation on the rot parameter is insufficient, I think my solution means “rotate by 90 degrees -1 times around x, 0 times around y, 0 times around z” or something like that. Look into the source if that helps you.

3. https://google.github.io/draco/

draco_encoder -i LoD2-DE_HH_2017-12-31.zip_localised.obj -o LoD2-DE_HH_2017-12-31.zip_localised.obj.drc

makes it tiny (8 Megabyte).

I did not do any error checking, used a epsilon of 1 in CityGML2OBJs and just wanted to play around.

Specifying the read/open driver/format with GDAL/OGR

For the commandline utilities you can’t. One possible workaround is using https://trac.osgeo.org/gdal/wiki/ConfigOptions#GDAL_SKIP and https://trac.osgeo.org/gdal/wiki/ConfigOptions#OGR_SKIP to blacklist drivers until the one you want is its first choice. A feature request exists at https://trac.osgeo.org/gdal/ticket/5917 but it seems that the option is not exposed as commandline option (yet?).

PS: If what you are trying to do is reading a .txt (or whatever) file as .csv and you get angry at the CSV driver only being selected if the file has a .csv extension, use CSV:yourfilename.txt

PPS: This post was motivated by not finding above information when searching the web. Hopefully this will rank high enough for *me* to find it next time. ;)

z/x/y.ext tiles to mbtiles/gpkg

I just had some trouble converting a set of tiles I scraped into a nice package. I tried both mb-util and tiles2gpkg_parallel.py. Both seemed to work but QGIS did not display anything. The “solution” was to convert the tiles from 512×512 pixels to 256×256. WTF?

tiles2gpkg_parallel.py also crashed at joining its intermediate files if used in Python 3.6. It needs “-tileorigin ul” for OSM-like tile names.