Welcome to #GISmaster

“Social media” was a mistake, so I am bringing over some nice things I had posted on Twitter to this blog.

I love Taskmaster and … well:

  • Bring the most surprising sidecar file.
  • Recreate an OSGeo project’s logo using these materials, best recreation wins.
  • Trigger error 99999 as many times as you can. Most different reasons wins.
  • Bring the best value to find in a date column.
  • Georeference this monochrome scan of a hand-drawn 1910 map, most accurate wins.
  • Draw a world map. Most area conformal map wins.
  • Name as many QGIS releases as you can, most release names wins.
  • Bring the best typeface for a remote village.
  • Invent a new Shapefile sidecar file. Most useful sidecar file wins.
  • Make the most beautiful choropleth map. You have 5 minutes to name your data sources then 15 minutes to make your map.
  • Bring the most disturbing data preparation story.
  • Demonstrate a standard GIS operation as interpretative dance.
  • Get a social media leech to repost a map of yours without credit. Most retweets wins.
  • Bring the worst municipal geodata portal.
  • Create the coolest hex grid map from this dataset of beehives.
  • Spell/pronounce GDAL’s main developer’s name correctly.
  • Imitate Ian Turton when a demanding user calls him on his private number for GeoServer support.
  • Bring a sane GML file.
  • Write a standard-compliant CityGML parser.
  • Design the most beautiful north arrow.
  • Choose an angle. (…) (…) Design a tube map only using lines at the angle you chose earlier.

Kate contributed “Determine what a Shapefile projection should be without a .prj file or metadata” :D

Previously at https://twitter.com/cartocalypse/status/1530251866000465920

#30DayMapChallenge as #1Day30MapsChallenge (2021)

Not sure why I never posted this last year, but I did the #30DayMapChallenge in a single day, streamed live via a self-hosted Owncast instance. It was … insane and fun. This year I will do it again, on the 26th of November.

Here are most of the maps I made last year:

Some notes I kept, please bug me about recovering the others from my Twitter archive (I deleted old tweets a bit too early):

  • 1 Points: Pins via Geometry Generator in QGIS
  • 2 Lines: River elevation profiles of Elbe, Rhein, Ems, Weser, Donau and Main. DEM: © GeoBasis-DE / BKG (2021)
  • 13 NaEr I mean Natural Earth (Blame @tjukanov)
  • 18 Water (DGM-W 2010 Unter- und Außenelbe, Wasserstraßen- und Schifffahrtsverwaltung des Bundes, http://kuestendaten.de, 2010)
  • 20 Movement: Emojitions on a curvy trajectory. State changes depending on the curviness ahead. Background: (C) OpenStreetMap Contributors <3
  • 21 Elevation with qgis2threejs (It’s art, I swear!)
  • 22 Boundaries: Inspired by Command and Conquer Red Alert. Background by Spiney (CC-BY 3.0 / CC-BY-SA 3.0, https://opengameart.org/node/12098)
  • 24 Historical: Buildings in Hamburg that were built before the war (at least according to a not-so-great dataset). Data license: Datenlizenz Deutschland Namensnennung 2.0 (Freie und Hansestadt Hamburg, Landesbetrieb Geoinformation und Vermessung (LGV))
  • 27 Heatmap: Outdoor advertisements (or something like that) in Hamburg. Fuck everything about that! Data license: Datenlizenz Deutschland Namensnennung 2.0 (Freie und Hansestadt Hamburg, Behörde für Verkehr und Mobilitätswende (BVM))
  • 28 Earth not flat. Using my colleague’s Beeline plugin to create lines between the airports I have flown to and the Globe Builder plugin by @gispofinland to make a globe.

Looking at the number of “concurrent” readers on the Süddeutsche Zeitung articles

I scraped the number of live readers per article published by Süddeutsche Zeitung on their website for more than three years, never did anything too interesting with it, and just decided to stop. Basically they publish a list of stories and the estimated number of current concurrent readers per story. Meaning you get a timestamp -> story/URL -> number of current readers. Easy enough and interesting for sure.

Here is how it worked, some results and data for you to build upon. Loads of it is stupid and silly, this is just me dumping it publicly so I can purge it.

Database

For data storage I chose the dumbest and easiest approach because I did not care about efficiency. This was a bit troublesome later when the VPS ran out of space but … shrug … I cleaned up and resumed without changes. Usually it’s ok to be lazy. :)

So yeah, data storage: A SQLite database with two tables:

CREATE TABLE visitors_per_url (
    timestamp TEXT,    -- 2022-01-13 10:58:00
    visitors INTEGER,  -- 13
    url TEXT           -- /wissen/zufriedenheit-stadt-land-1.5504425
);

CREATE TABLE visitors_total (
    timestamp TEXT,    -- 2022-01-13 10:58:00
    visitors INTEGER   -- 13
);

Can you spot the horrible bloating issue? Yeah: the (same) URLs are stored again and again, once per row, and that gets big very quickly. Well, I was too lazy to write something smarter and more “relational”. Like this it is only marginally better than a CSV file (I used indexes on all the fields as I was playing around…). Hope you can relate. :o)
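For the record, a normalized schema would not have been much harder. Here is a sketch of what the TODO in the script below suggests (the urls table and url_id column are made up for this sketch):

CREATE TABLE urls (
    id INTEGER PRIMARY KEY,
    url TEXT UNIQUE     -- /wissen/zufriedenheit-stadt-land-1.5504425
);

CREATE TABLE visitors_per_url (
    timestamp TEXT,     -- 2022-01-13 10:58:00
    visitors INTEGER,   -- 13
    url_id INTEGER REFERENCES urls(id)
);

Inserting then means looking up (or creating) the URL id first, which is exactly the complication I was too lazy for.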

Scraping

#!/usr/bin/env python3

from datetime import datetime
from lxml import html
import requests
import sqlite3
import os

# # TODO
# - store URLs in a separate table and reference them by id, this will significantly reduce size of the db :o)
#     - more complicated insertion queries though so ¯\_(ツ)_/¯

# The site updates every 2 minutes, so a job should run every 2 minutes.

# # Create database if not exists
sql_initialise = """
CREATE TABLE visitors_per_url (timestamp TEXT, visitors INTEGER, url TEXT);
CREATE TABLE visitors_total (timestamp TEXT, visitors INTEGER);
CREATE INDEX idx_visitors_per_url_timestamp ON visitors_per_url(timestamp);
CREATE INDEX idx_visitors_per_url_url ON visitors_per_url(url);
CREATE INDEX idx_visitors_per_url_timestamp_url ON visitors_per_url(timestamp, url);
CREATE INDEX idx_visitors_total_timestamp ON visitors_total(timestamp);
CREATE INDEX idx_visitors_per_url_timestamp_date ON visitors_per_url(date(timestamp));
"""

if not os.path.isfile("sz.db"):
    conn = sqlite3.connect('sz.db')
    with conn:
        c = conn.cursor()
        c.executescript(sql_initialise)
    conn.close()

# # Current time
# we don't know how long fetching the page will take nor do we
# need any kind of super accurate timestamps in the first place
# so let's truncate to full minutes
# WARNING: this *floors*, you might get visitor counts for stories
# that were released almost a minute later! timetravel wooooo!
now = datetime.now()
now = now.replace(second=0, microsecond=0)
print(now)

# # Get the webpage with the numbers
page = requests.get('https://www.sueddeutsche.de/news/activevisits')
tree = html.fromstring(page.content)
entries = tree.xpath('//div[@class="entrylist__entry"]')

# # Extract visitor counts and insert them to the database
# Nothing smart, fixed paths and indexes. If it fails, we will know the code needs updating to a new structure.

total_count = entries[0].xpath('span[@class="entrylist__count"]')[0].text
print(total_count)

visitors_per_url = []
for entry in entries[1:]:
    count = entry.xpath('span[@class="entrylist__socialcount"]')[0].text
    url = entry.xpath('div[@class="entrylist__content"]/a[@class="entrylist__link"]')[0].attrib['href']
    url = url.replace("https://www.sueddeutsche.de", "")  # save some bytes...
    visitors_per_url.append((now, count, url))

conn = sqlite3.connect('sz.db')
with conn:
    c = conn.cursor()
    c.execute('INSERT INTO visitors_total VALUES (?,?)', (now, total_count))
    c.executemany('INSERT INTO visitors_per_url VALUES (?,?,?)', visitors_per_url)
conn.close()

This ran every 2 minutes with a cronjob.
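For completeness, the crontab entry looked something like this (path and log file are made up for this post):

*/2 * * * * cd /home/user/sz && ./scrape.py >> scrape.log 2>&1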

Plots

I plotted the data with Bokeh, I think because it was the easiest way to get a color category per URL (… looking at my plotting script now, ugh, I am not sure that was the reason).

#!/usr/bin/env python3

import os
import sqlite3
from shutil import copyfile
from datetime import datetime, date

from bokeh.plotting import figure, save, output_file
from bokeh.models import ColumnDataSource

# https://docs.python.org/3/library/sqlite3.html#sqlite3.Connection.row_factory
def dict_factory(cursor, row):
    d = {}
    for idx, col in enumerate(cursor.description):
        d[col[0]] = row[idx]
    return d

today = date.isoformat(datetime.now())

conn = sqlite3.connect('sz.db')
conn.row_factory = dict_factory
with conn:
    c = conn.cursor()
    c.execute(
        """
        SELECT * FROM visitors_per_url
        WHERE visitors > 100
        AND date(timestamp) = date('now');
        """
    )

    ## i am lazy so i group in sql, then parse from strings in python :o)
    #c.execute('SELECT url, group_concat(timestamp) AS timestamps, group_concat(visitors) AS visitors FROM visitors_per_url GROUP BY url;')

    visitors_per_url = c.fetchall()
conn.close()

# https://bokeh.pydata.org/en/latest/docs/user_guide/data.html so that the data is available for hover

data = {
    "timestamps": [datetime.strptime(e["timestamp"], '%Y-%m-%d %H:%M:%S') for e in visitors_per_url],
    "visitors": [e["visitors"] for e in visitors_per_url],
    "urls": [e["url"] for e in visitors_per_url],
    "colors": [f"#{str(hash(e['url']))[1:7]}" for e in visitors_per_url]  # lol!
}

source = ColumnDataSource(data=data)

# https://bokeh.pydata.org/en/latest/docs/gallery/color_scatter.html
# https://bokeh.pydata.org/en/latest/docs/gallery/elements.html for hover

p = figure(
    tools="hover,pan,wheel_zoom,box_zoom,reset",
    active_scroll="wheel_zoom",
    x_axis_type="datetime",
    sizing_mode='stretch_both',
    title=f"Leser pro Artikel auf sueddeutsche.de: {today}"
)

# radius (in data units) would have to be huge because of the unixtime x values, so size (screen pixels) is used instead
p.scatter(
    x="timestamps", y="visitors", source=source,
    size=5, fill_color="colors", fill_alpha=1, line_color=None,
    #legend="urls",
)

# click_policy does not work, hides everything
#p.legend.location = "top_left"
#p.legend.click_policy="hide"  # mute is broken too, nothing happens

p.hover.tooltips = [
    ("timestamp", "@timestamps"),
    ("visitors", "@visitors"),
    ("url", "@urls"),
]

output_file(f"public/plot_{today}.html", title=f"SZ-Leser {today}", mode='inline')
save(p)

os.remove("public/plot.html")  # will fail once :o)
copyfile(f"public/plot_{today}.html", "public/plot.html")

Results and findings

Nothing particularly noteworthy comes to mind. You can see perfectly normal days, you can see a pandemic wreaking havoc, you can see fascists being fascists. I found it interesting how clearly you can see when articles were pushed on social media or put on (or taken off) the frontpage.

Data

https://hannes.enjoys.it/stuff/sz-leser.db.noindexes.7z

https://hannes.enjoys.it/stuff/sz-leser.plots.7z

If there is anything broken or missing, that’s how it is. I did not do any double checks just now. :}
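If you want to poke at the dump anyway, here is a minimal sketch (assuming you extracted the database to sz.db; tables and columns as in the schema above):

#!/usr/bin/env python3

import sqlite3

conn = sqlite3.connect("sz.db")
c = conn.cursor()
# the ten biggest per-minute reader counts recorded for a single article
c.execute(
    """
    SELECT timestamp, visitors, url
    FROM visitors_per_url
    ORDER BY visitors DESC
    LIMIT 10;
    """
)
for timestamp, visitors, url in c.fetchall():
    print(timestamp, visitors, url)
conn.close()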

Transparenzportal Hamburg API: Downloading all the data

https://suche.transparenz.hamburg.de.EXAMPLE.COM/api/action/resource_search?query=url: (remove “.EXAMPLE.COM”) currently returns around 200 megabytes of JSON, which should contain all the resources, or at least those that actually reference a dataset.

To make this easier to handle in normal editors, json_pp helps:

cat resource_search\?query=url\: | json_pp > url\:.json_pp

And you get the URLs (as in my old post) with

grep '"url"' url\:.json_pp | grep -Eo 'http.*"' | sed 's#"$##' > urls

Or the total size via paste | bc:

$ grep '"file_size"' url\:.json_pp | grep -Po "\d+" | paste -s -d+ - | bc
3751796542539

Around 4 terabytes? We’ll see.
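If you actually want to fetch all of that, feeding the urls file from above to wget would do, e.g. (mind the roughly 4 terabytes and be nice to the server):

wget --continue --wait=1 --input-file=urls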

FOSSGIS-Jeopardy: The buzzer

First in a series of very terse posts explaining and documenting the technical setup behind our FOSSGIS-Jeopardy game played via video conferencing. Mostly meant as a public backup and because sharing is caring. Maybe it can inspire someone else to build silly stupid stuff for fun.

Here we are with the buzzer for the candidates to hit if they want to solve a task (aka ask the question to the presented thing). I wrote this for a funny digital xmas party with the lovely colleagues at Civity in 2020 and re-used it for FOSSGIS-Jeopardy 2021 and 2022.

It’s using WAMP (basically PubSub via websockets) between web browser pages via the awesome Crossbar + AutobahnJS combo, which I first used for the crazy https://findingplaces.hamburg/ project.

Yes, there is a random partyparrot on each run.

An admin page can be used to unlock a buzzer “button” (well, a parrot GIF in our case, because we need more silly fun in life) on the participants’ buzzer pages. Participants can then touch/click their screen to “buzz”. The incoming buzzes are displayed for everyone to see, together with the delay since the buzzer was unlocked.
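The messaging itself is tiny. Here is a stripped-down AutobahnJS sketch of the idea (topic names and payloads are made up for illustration, not the ones from my actual code; the realm matches the crossbar config below):

var connection = new autobahn.Connection({
    url: "wss://example.com/ws",  // the "ws" path from the crossbar config below
    realm: "YOURREALMHERE"
});

connection.onopen = function (session) {
    // everyone subscribes to buzzes and displays them somehow
    session.subscribe("com.example.buzzer.buzz", function (args) {
        console.log(args[0] + " buzzed");
    });

    // a buzzer page publishes a buzz on touch/click
    document.body.onclick = function () {
        session.publish("com.example.buzzer.buzz", ["some username"]);
    };
};

connection.open();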

This is obviously highly dependent on the participants’ latency, both on their device (touch/click to network message) as well as from their device to the message router. A tab in a desktop browser on LAN will win against a mobile device on 2G if both hit the buzzer at the same moment. ¯\_(ツ)_/¯

There are two or three components: the message router, the webpages and, optionally, a webserver (I used crossbar as the router, which can also host static webpages).

Ask me anything if you want to set this up following the amazing instructions and fail.

Install and set up crossbar

pip install crossbar

https://crossbar.io/docs/TLS-Certificates/#using-lets-encrypt-with-crossbar-io & https://certbot.eff.org/instructions?ws=other&os=ubuntufocal

Set up a realm and a static web directory in a config.json, e.g. something like this:

{
	"version": 2,
	"controller": {},
	"workers": [{

		"type": "router",
		"realms": [{
			"name": "YOURREALMHERE",
			"roles": [{
				"name": "public",
				"permissions": [{
					"uri": "com.example.your(sub)domainhere.*",
					"allow": {
						"call": false,
						"register": false,
						"publish": true,
						"subscribe": true
					}
				}]
			}]
		}],
		"transports": [{
			"type": "web",
			"endpoint": {
				"type": "tcp",
				"port": 443,
				"tls": {
					YOURTLSSETUPHERE
				}
			},
			"paths": {
				"/": {
					"type": "static",
					"directory": "web"
				},
				"ws": {
					"type": "websocket",
					"auth": {
						"anonymous": {
							"type": "static",
							"role": "public"
						}
					},
					"options": {
					        "max_frame_size": 20480,
					        "max_message_size": 20480,
					        "fail_by_drop": true
					}
				}
			}
		}]
	}]
}
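The YOURTLSSETUPHERE part points crossbar at your certificate files. From memory (double-check the crossbar TLS documentation linked above), with Let’s Encrypt it ends up roughly like this, paths being examples:

"tls": {
	"key": "/etc/letsencrypt/live/example.com/privkey.pem",
	"certificate": "/etc/letsencrypt/live/example.com/cert.pem",
	"chain_certificates": ["/etc/letsencrypt/live/example.com/chain.pem"]
}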

I don’t remember anything about the options part. It might not be necessary. I probably tried to stuff something more complex into the messages at some point.

Then just run crossbar as root because you used a single-purpose 2€ VPS for this and you will delete it after the event anyways. Yolo! You should definitely secure your system, crossbar config and certificate and everything else properly otherwise. Seriously!

crossbar start --config /root/config.json

Webpages for control and playing

Here you go:

This is really really horrible and messy code, with lots of left-over bits, bugs and random snippets. But it works so who cares! I could not care less about its bEAuTy. Sometimes a banana is the right hammer.

  • Screen stays on (support depends on the OS)!
  • Fullscreen mode can be triggered with a button!
  • There is a sound when the buzzer is buzzed by anyone!
  • Animations are CSS transforms on PNG sprites!
  • Random emojis are displayed if a user did not set a username!
  • Inconsistent indentation!
  • Probably some german words here and there!
  • w3schools was extensively used to create this!

Put them in a web/ directory according to the crossbar config (or host it with another webserver if you like).

Using the buzzer

Now you have three webpages available:

Waggawaggawaggawagga animated ducks in QGIS

I used ne_110m_admin_0_countries.

Rendering for the layer is set to refresh automatically every 0.1 seconds (Layer Properties → Rendering → Refresh layer at interval).
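The same can be done from the QGIS Python console if you prefer (a sketch, assuming the duck layer is the active layer; the interval is in milliseconds):

layer = iface.activeLayer()
layer.setAutoRefreshEnabled(True)
layer.setAutoRefreshInterval(100)  # 0.1 seconds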

Geometry Generator for a Point for the marker location via line_interpolate_point:

with_variable(
  'biggest_geom',
  geometry_n(order_parts($geometry, 'area($geometry)', ascending:=False), 1),
  line_interpolate_point(
    boundary(@biggest_geom), 
    perimeter(@biggest_geom)*(round(epoch(now())/100)%100/100)
  )
)

Raster Image Marker with https://opengameart.org/content/character-spritesheet-duck, vertical anchor at bottom, sprite choice between walking and running (doesn’t actually work) plus the frame via

with_variable(
  'biggest_geom',
  geometry_n(order_parts($geometry, 'area($geometry)', ascending:=False), 1),
  '/your/path/Duck/Sprites/Walking-Running/'
  || if(perimeter(@biggest_geom) < 10, 'Walking', 'Running')
  || ' 00'
  || to_string(round(epoch(now())/200)%2+1)
  || '.png'
)

Rotation did not work, I tried line_interpolate_angle:

with_variable(
  'biggest_geom',
  geometry_n(order_parts($geometry, 'area($geometry)', ascending:=False), 1),
  line_interpolate_angle(
    boundary(@biggest_geom), 
    perimeter(@biggest_geom)*(round(epoch(now())/100)%100/100)
  )
)

Steps via two more Geometry Generators, both for Lines using line_substring and some nice style (inspired by the wonderful built-in cat trail preset):

with_variable(
  'biggest_geom',
  geometry_n(order_parts($geometry, 'area($geometry)', ascending:=False), 1),
  line_substring(
    boundary(@biggest_geom),
    0,
    perimeter(@biggest_geom)*(round(epoch(now())/100)%100/100)
  )
)

Could be improved if (for example) the Raster Image Marker supported:

  • Choice of resampling algorithm
  • Flipping

And if someone figured out:

  • Why rotation does not work (no idea what’s wrong with my expression, it works with random values, so …)
  • Whatever is broken with the choice between ‘Walking’ and ‘Running’ in the file path expression