Author Archives: Hannes

Looking at the number of “concurrent” readers on the Süddeutsche Zeitung articles

For more than 3 years I scraped the number of live readers per article that Süddeutsche Zeitung publishes on their website, never did anything too interesting with it and just decided to stop. Basically they publish a list of stories along with an estimate of how many people are currently reading each one. Meaning you get timestamp -> story/URL -> number of current readers. Easy enough and interesting for sure.

Here is how it worked, some results, and the data for you to build upon. Loads of it is stupid and silly; this is just me dumping it publicly so I can purge it.

Database

For data storage I chose the dumbest and easiest approach because I did not care about efficiency. This was a bit troublesome later when the VPS ran out of space but … shrug … I cleaned up and resumed without changes. Usually it’s ok to be lazy. :)

So yeah, data storage: A SQLite database with two tables:

CREATE TABLE visitors_per_url (
    timestamp TEXT,    -- 2022-01-13 10:58:00
    visitors INTEGER,  -- 13
    url TEXT           -- /wissen/zufriedenheit-stadt-land-1.5504425
);

CREATE TABLE visitors_total (
    timestamp TEXT,    -- 2022-01-13 10:58:00
    visitors INTEGER   -- 13
);

Can you spot the horrible bloating issue? Yeah, storing the (same) URLs again and again for each row makes the database grow very quickly. Well, I was too lazy to write something smarter and more “relational”. Like this it is only marginally better than a CSV file (and I used indexes on all the fields as I was playing around…). Hope you can relate. :o)
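For reference, a more “relational” schema would have looked something like this (a sketch, not what I actually ran):

CREATE TABLE urls (
    id  INTEGER PRIMARY KEY,
    url TEXT UNIQUE
);

CREATE TABLE visitors_per_url (
    timestamp TEXT,
    visitors  INTEGER,
    url_id    INTEGER REFERENCES urls(id)
);

-- inserting a row then takes two steps, e.g.:
INSERT OR IGNORE INTO urls(url) VALUES ('/wissen/zufriedenheit-stadt-land-1.5504425');
INSERT INTO visitors_per_url
    SELECT '2022-01-13 10:58:00', 13, id FROM urls
    WHERE url = '/wissen/zufriedenheit-stadt-land-1.5504425';

Exactly the “more complicated insertion queries” the TODO in the script below shrugs about.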

Scraping

#!/usr/bin/env python3

from datetime import datetime
from lxml import html
import requests
import sqlite3
import os

# # TODO
# - store URLs in a separate table and reference them by id, this will significantly reduce size of the db :o)
#     - more complicated insertion queries though so ¯\_(ツ)_/¯

# The site updates every 2 minutes, so a job should run every 2 minutes.

# # Create database if not exists
sql_initialise = """
CREATE TABLE visitors_per_url (timestamp TEXT, visitors INTEGER, url TEXT);
CREATE TABLE visitors_total (timestamp TEXT, visitors INTEGER);
CREATE INDEX idx_visitors_per_url_timestamp ON visitors_per_url(timestamp);
CREATE INDEX idx_visitors_per_url_url ON visitors_per_url(url);
CREATE INDEX idx_visitors_per_url_timestamp_url ON visitors_per_url(timestamp, url);
CREATE INDEX idx_visitors_total_timestamp ON visitors_total(timestamp);
CREATE INDEX idx_visitors_per_url_timestamp_date ON visitors_per_url(date(timestamp));
"""

if not os.path.isfile("sz.db"):
    conn = sqlite3.connect('sz.db')
    with conn:
        c = conn.cursor()
        c.executescript(sql_initialise)
    conn.close()

# # Current time
# we don't know how long fetching the page will take nor do we
# need any kind of super accurate timestamps in the first place
# so let's truncate to full minutes
# WARNING: this *floors*, you might get visitor counts for stories
# that were released almost a minute later! timetravel wooooo!
now = datetime.now()
now = now.replace(second=0, microsecond=0)
print(now)

# # Get the webpage with the numbers
page = requests.get('https://www.sueddeutsche.de/news/activevisits')
tree = html.fromstring(page.content)
entries = tree.xpath('//div[@class="entrylist__entry"]')

# # Extract visitor counts and insert them to the database
# Nothing smart, fixed paths and indexes. If it fails, we will know the code needs updating to a new structure.

total_count = entries[0].xpath('span[@class="entrylist__count"]')[0].text
print(total_count)

visitors_per_url = []
for entry in entries[1:]:
    count = entry.xpath('span[@class="entrylist__socialcount"]')[0].text
    url = entry.xpath('div[@class="entrylist__content"]/a[@class="entrylist__link"]')[0].attrib['href']
    url = url.replace("https://www.sueddeutsche.de", "")  # save some bytes...
    visitors_per_url.append((now, count, url))

conn = sqlite3.connect('sz.db')
with conn:
    c = conn.cursor()
    c.execute('INSERT INTO visitors_total VALUES (?,?)', (now, total_count))
    c.executemany('INSERT INTO visitors_per_url VALUES (?,?,?)', visitors_per_url)
conn.close()

This ran every 2 minutes with a cronjob.
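The crontab entry would look something like this (path made up):

*/2 * * * * cd /path/to/scraper && python3 scrape.py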

Plots

I plotted the data with bokeh, I think because it was easiest to get a color category per URL (… looking at my plotting script, ugh, I am not sure that was the reason).

#!/usr/bin/env python3

import os
import sqlite3
from shutil import copyfile
from datetime import datetime, date

from bokeh.plotting import figure, save, output_file
from bokeh.models import ColumnDataSource

# https://docs.python.org/3/library/sqlite3.html#sqlite3.Connection.row_factory
def dict_factory(cursor, row):
    d = {}
    for idx, col in enumerate(cursor.description):
        d[col[0]] = row[idx]
    return d

today = date.isoformat(datetime.now())

conn = sqlite3.connect('sz.db')
conn.row_factory = dict_factory
with conn:
    c = conn.cursor()
    c.execute(
        """
        SELECT * FROM visitors_per_url
        WHERE visitors > 100
        AND date(timestamp) = date('now');
        """
    )

    ## i am lazy so i group in sql, then parse from strings in python :o)
    #c.execute('SELECT url, group_concat(timestamp) AS timestamps, group_concat(visitors) AS visitors FROM visitors_per_url GROUP BY url;')

    visitors_per_url = c.fetchall()
conn.close()

# https://bokeh.pydata.org/en/latest/docs/user_guide/data.html so that the data is available for hover

data = {
    "timestamps": [datetime.strptime(e["timestamp"], '%Y-%m-%d %H:%M:%S') for e in visitors_per_url],
    "visitors": [e["visitors"] for e in visitors_per_url],
    "urls": [e["url"] for e in visitors_per_url],
    "colors": [f"#{str(hash(e['url']))[1:7]}" for e in visitors_per_url]  # lol!
}

source = ColumnDataSource(data=data)

# https://bokeh.pydata.org/en/latest/docs/gallery/color_scatter.html
# https://bokeh.pydata.org/en/latest/docs/gallery/elements.html for hover

p = figure(
    tools="hover,pan,wheel_zoom,box_zoom,reset",
    active_scroll="wheel_zoom",
    x_axis_type="datetime",
    sizing_mode='stretch_both',
    title=f"Leser pro Artikel auf sueddeutsche.de: {today}"
)

# radius must be huge because of unixtime values maybe?!
p.scatter(
    x="timestamps", y="visitors", source=source,
    size=5, fill_color="colors", fill_alpha=1, line_color=None,
    #legend="urls",
)

# click_policy does not work, hides everything
#p.legend.location = "top_left"
#p.legend.click_policy="hide"  # mute is broken too, nothing happens

p.hover.tooltips = [
    ("timestamp", "@timestamps"),
    ("visitors", "@visitors"),
    ("url", "@urls"),
]

output_file(f"public/plot_{today}.html", title=f"SZ-Leser {today}", mode='inline')
save(p)

os.remove("public/plot.html")  # will fail once :o)
copyfile(f"public/plot_{today}.html", "public/plot.html")

Results and findings

Nothing particularly noteworthy comes to mind. You can see perfectly normal days, you can see a pandemic wreaking havoc, you can see fascists being fascists. I found it interesting how clearly you can see when articles were pushed on social media or put on (or taken off) the frontpage.

Data

https://hannes.enjoys.it/stuff/sz-leser.db.noindexes.7z

https://hannes.enjoys.it/stuff/sz-leser.plots.7z

If there is anything broken or missing, that’s how it is. I did not do any double checks just now. :}

Transparenzportal Hamburg API: Downloading all the data

https://suche.transparenz.hamburg.de.EXAMPLE.COM/api/action/resource_search?query=url: (remove “.EXAMPLE.COM”) currently returns around 200 megabytes of JSON, which should contain all resources, or at least those that actually reference a dataset.

To make it easier to handle in normal editors, json_pp helps:

cat resource_search\?query=url\: | json_pp > url\:.json_pp

And you get the URLs (as in my old post) with

grep '"url"' url\:.json_pp | grep -Eo 'http.*"' | sed 's#"$##' > urls

Or the total size via paste | bc:

$ grep '"file_size"' url\:.json_pp | grep -Po "\d+" | paste -s -d+ - | bc
3751796542539

About 4 terabytes? Let’s see.
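And if you actually want to download all of it, something like this would do (resumes partial files, waits a second between requests to be nice to the server):

wget --continue --wait=1 --input-file=urls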

FOSSGIS-Jeopardy: The buzzer

First in a series of very terse posts explaining and documenting the technical setup behind our remote (video conferencing) FOSSGIS-Jeopardy game. Mostly meant as a public backup and because sharing is caring. Maybe it can inspire someone else to build silly stupid stuff for fun.

Here we are with the buzzer for the candidates to hit if they want to solve a task (aka ask the question to the presented thing). I wrote this for a funny digital xmas party with the lovely colleagues at Civity in 2020 and re-used it for FOSSGIS-Jeopardy 2021 and 2022.

It’s using WAMP (basically PubSub via websockets) between webbrowser pages via the awesome Crossbar + AutobahnJS combo which I first used for the crazy https://findingplaces.hamburg/ project.

Yes, there is a random partyparrot on each run.

An admin page can be used to unlock a buzzer “button” (well, a parrot GIF in our case, because we need more silly fun in life) on the participants’ buzzer pages. Participants can then touch/click their screen to “buzz”. The incoming buzzes are displayed, visible to everyone, with their delay since the buzzer was unlocked.

This is obviously highly dependent on the participants’ latency, both on their device (touch/click to network message) and from their device to the message router. A tab in a desktop browser on LAN will win against a mobile device on 2G if they hit it at the same moment. ¯\_(ツ)_/¯
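The pub/sub core of this is only a handful of lines of browser JavaScript with AutobahnJS; a minimal sketch of the idea (realm, topic names and the username are placeholders, not my actual code):

var connection = new autobahn.Connection({
    url: "wss://example.com/ws",  // the websocket path from the crossbar config
    realm: "YOURREALMHERE"
});

connection.onopen = function (session) {
    var unlockedAt = null;

    // everyone learns when the admin page unlocks the buzzer
    session.subscribe("com.example.yourdomainhere.unlock", function () {
        unlockedAt = Date.now();
    });

    // participants publish a buzz when they touch/click
    document.body.onclick = function () {
        session.publish("com.example.yourdomainhere.buzz", ["myusername"]);
    };

    // everyone displays incoming buzzes with the delay since the unlock
    session.subscribe("com.example.yourdomainhere.buzz", function (args) {
        console.log(args[0] + " buzzed after " + (Date.now() - unlockedAt) + " ms");
    });
};

connection.open();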

There are two/three components: the message router, the webpages and optionally a webserver (I used crossbar as the router, which can also host static webpages).

Ask me anything if you want to set this up following the amazing instructions and fail.

Install and setup crossbar

pip install crossbar

https://crossbar.io/docs/TLS-Certificates/#using-lets-encrypt-with-crossbar-io & https://certbot.eff.org/instructions?ws=other&os=ubuntufocal

Set up a realm and static web directory in a config.json, e. g. something like this:

{
	"version": 2,
	"controller": {},
	"workers": [{

		"type": "router",
		"realms": [{
			"name": "YOURREALMHERE",
			"roles": [{
				"name": "public",
				"permissions": [{
					"uri": "com.example.your(sub)domainhere.*",
					"allow": {
						"call": false,
						"register": false,
						"publish": true,
						"subscribe": true
					}
				}]
			}]
		}],
		"transports": [{
			"type": "web",
			"endpoint": {
				"type": "tcp",
				"port": 443,
				"tls": {
					YOURTLSSETUPHERE
				}
			},
			"paths": {
				"/": {
					"type": "static",
					"directory": "web"
				},
				"ws": {
					"type": "websocket",
					"auth": {
						"anonymous": {
							"type": "static",
							"role": "public"
						}
					},
					"options": {
					        "max_frame_size": 20480,
					        "max_message_size": 20480,
					        "fail_by_drop": true
					}
				}
			}
		}]
	}]
}

I don’t remember anything about the options part. It might not be necessary. I probably tried to stuff something more complex into the messages at some point.

Then just run crossbar as root because you used a single-purpose 2€ VPS for this and you will delete it after the event anyways. Yolo! You should definitely secure your system, crossbar config and certificate and everything else properly otherwise. Seriously!

crossbar start --config /root/config.json

Webpages for control and playing

Here you go:

This is really really horrible and messy code, with lots of left-over bits, bugs and random snippets. But it works so who cares! I could not care less about its bEAuTy. Sometimes a banana is the right hammer.

  • Screen stays on (support depends on the OS)!
  • Fullscreen mode can be triggered with a button!
  • There is a sound when the buzzer is buzzed by anyone!
  • Animations are CSS transforms on PNG sprites!
  • Random emojis are displayed if a user did not set a username!
  • Inconsistent indentation!
  • Probably some german words here and there!
  • w3schools was extensively used to create this!

Put them in a web/ directory according to the crossbar config (or host it with another webserver if you like).

Using the buzzer

Now you have three webpages available:

Waggawaggawaggawagga animated ducks in QGIS

I used ne_110m_admin_0_countries.

Rendering for the layer is set to automatically refresh at an interval of 0.1 seconds.

A Geometry Generator (Point) sets the marker location via line_interpolate_point:

with_variable(
  'biggest_geom',
  geometry_n(order_parts($geometry, 'area($geometry)', ascending:=False), 1),
  line_interpolate_point(
    boundary(@biggest_geom), 
    perimeter(@biggest_geom)*(round(epoch(now())/100)%100/100)
  )
)

Raster Image Marker with https://opengameart.org/content/character-spritesheet-duck, vertical anchor at bottom, sprite choice between walking and running (doesn’t actually work) plus the frame via

with_variable(
  'biggest_geom',
  geometry_n(order_parts($geometry, 'area($geometry)', ascending:=False), 1),
  '/your/path/Duck/Sprites/Walking-Running/'
  || if(perimeter(@biggest_geom) < 10, 'Walking', 'Running')
  || ' 00'
  || to_string(round(epoch(now())/200)%2+1)
  || '.png'
)

Rotation did not work, I tried line_interpolate_angle:

with_variable(
  'biggest_geom',
  geometry_n(order_parts($geometry, 'area($geometry)', ascending:=False), 1),
  line_interpolate_angle(
    boundary(@biggest_geom), 
    perimeter(@biggest_geom)*(round(epoch(now())/100)%100/100)
  )
)

Steps via two more Geometry Generators, both for Lines using line_substring and some nice style (inspired by the wonderful built-in cat trail preset):

with_variable(
  'biggest_geom',
  geometry_n(order_parts($geometry, 'area($geometry)', ascending:=False), 1),
  line_substring(
    boundary(@biggest_geom),
    0,
    perimeter(@biggest_geom)*(round(epoch(now())/100)%100/100)
  )
)

Could be improved if (for example) the Raster Image Marker supported:

  • Choice of resampling algorithm
  • Flipping

And if I figured out:

  • Why rotation does not work; no idea what’s wrong with my expression, it works with random values, so …
  • Whatever is broken with the choice between ‘Walking’ and ‘Running’ in the file path expression

Von Wegen Lisbeth – B*tch as a slitscan

Detail of one of the resulting images

The song Von Wegen Lisbeth – Bitch has a crazy music video in which the camera seems to rotate around itself in a chaotic scene, while in the background someone walks from left to right.

I did something silly with it…: a transformation of the video in slitscan style. Started in 2019, and now finally rendered properly.

Recipe

  • Extract the video frames: 5621 frames at a resolution of 1920×1080 pixels
  • Cut each frame into individual 1×1080 pixel strips: 5621 copies of the leftmost strip, 5621 copies of the second-leftmost strip, and so on
  • Assemble one image per strip position: at the very left the leftmost strip of the first frame, directly next to it the leftmost strip of the second frame, then that of the third, etc.: 1920 images at a resolution of 5621×1080 pixels
(Figure: how one of the images is assembled from the individual pixel strips of the frames)
  • Combine the resulting images into a video: a video at 5621×1080 resolution with 1920 frames

I did it with ffmpeg, graphicsmagick and a Bash script. Super inefficient, but very effective ;)
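The pipeline roughly looks like this sketch (not my original script; filenames assumed, and I use ImageMagick’s convert here for the slicing step). Just as super inefficient:

mkdir -p frames columns

# 1. extract all frames of the video
ffmpeg -i video.mp4 frames/%05d.png

# 2. + 3. for each x position, cut the 1 pixel wide column at that
# position out of every frame and append the strips left to right
# (this loads every frame per iteration, so it is very slow and memory-hungry)
for x in $(seq 0 1919)
do
	convert frames/*.png -crop "1x1080+${x}+0" +repage +append \
		"columns/$(printf '%04d' "${x}").png"
done

# 4. assemble the resulting images into a video
# (pad to an even width so libx264 accepts it)
ffmpeg -framerate 24 -i columns/%04d.png -vf 'pad=ceil(iw/2)*2:ih' \
	-c:v libx264 -pix_fmt yuv420p slitscan.mp4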

Result

Here it is in small; the link below leads to the “full version”. Everything wobbles around a bit because the camera was not moved at a constant speed, or something like that.

A video this big cannot easily be hosted or properly viewed anywhere (right?). Conveniently there is the Gigapan Time Machine project, which Google also uses for their Timelapse feature. With it you can take the images and make them viewable like an animated web map, with zoom and pan and reasonably efficient data access. Unfortunately the tool is outdated, no longer in development, and makes it practically impossible to tweak the video parameters yourself.

Full resolution: https://hannes.enjoys.it/stuff/lisbeth-slitscan/view.html

Warning: potentially heavy data usage

PS: (I squashed the result vertically so that the proportions fit better. And it runs backwards, because that way the timing of most movements fits. Quite a brainfuck.)

Huh?!

Yeah, it is hard to understand what is happening here. A few fancy diagrams would help (which pixel strip goes where, and what was assembled how), but that is too time-consuming for me. So just a few food-for-thought pointers, maybe they help:

The original video has 5621 frames at 1920×1080.
This video has 1920 frames at 5621×1080.

The lady with the cloth at the beginning is visible until roughly frame 750 of the original video.
In the transformed video she is never farther than roughly 750 pixels from the left edge of the image.

As a 3D volume/point cloud

Because why not, I also turned the whole thing into something three-dimensional. Each frame of the video consists of 1920×1080 colored pixels; if you line up all the frames one behind the other you get a cuboid of 1920×1080×5621 pixels. And that is basically nothing but a very regular and “voluminous” point cloud, so you can abuse the corresponding tools. I put the data into the fantastic Potree:

No, I did not flip the vertical axis, my mouse cursor just always looks like that!!!!!1

That is 11 billion points and, at 40 gigabytes, a bit too big for me to want to host it. If you want to rebuild it:

for file in slices_reversed/*.png  # 0001.png, 0002.png, ...
do
	n=$(echo "${file}" | grep -Eo '[0-9]+' | sed 's#^0*##')  # frame number from filename
	# one CSV line per pixel: image x, negated frame number (to flip that axis), image y, r, g, b
	# TODO: flipping the y axis seems to have been buggy :(
	convert "${file}" sparse-color: | tr " " "\n" | \
		sed 's/srgb//g' | sed 's/[()]//g' | \
		awk -F ',' -v OFS=',' -v n="$n" '{ print $1,-n-1,$2,$3,$4,$5 }'
done > images.xyzrgb.csv

Then split the 250 GB CSV with split however you like, since txt2las somehow refused to handle more than 2^32 points at once for me.
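For example, with a (made up) chunk size that stays safely below that limit:

split -d -l 2000000000 images.xyzrgb.csv chunk_

And then convert the parts: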

txt2las -parse xyzRGB -i "${file}" -set_scale 1 1 1 -o "${file}.laz" && lasinfo -repair_bb "${file}.laz"

Then prepare the .laz files for viewing with PotreeConverter (careful: with the default Poisson sampler it always blows up at 76%, probably because of the crazily regular structure of the point cloud):

PotreeConverter ${chunk}.laz -o ${chunk}_potree --encoding BROTLI -p ${chunk} -m random

Then put the finished parts into a viewer, e.g. into the multiple_pointclouds.html example, and have fun.

Disclaimer

This is an art project and I hope that means it is covered by something like artistic freedom. If not: a short mail (hannes ät enjoys dotter it) and I will remove it from this site.

❤️ Von Wegen Lisbeth!

The effect of lossy LERC compression, visually “explained”

LERC is a kick-ass approach to 2D raster data compression, supported in GDAL since version 3.3. You can use it for lossless compression, but it is also able to throw away some bits of information for smaller data sizes. You tell it which level of Z error is acceptable for your values and it will use that freedom to change the values of neighboring cells to do its magic. Z here means the “data” axis: for a single-band 2D raster, X and Y are the coordinates, or rather the locations of the data values in the raster, and those are obviously not changed.

I thought it would be nice to show what it actually results in, you can read up on the details elsewhere if you want. Look at the images in full resolution please.

I used a global SRTM DEM with Z values in full meters (integers, no floating point values) and applied LERC to it in three ways: lossless, with a maximum Z error of 1 meter, and with a maximum Z error of 10 meters. Zstandard compression was always used.
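In GDAL terms these are just GeoTIFF creation options, something like this (a sketch, not my exact commands):

gdal_translate -co COMPRESS=LERC_ZSTD -co MAX_Z_ERROR=0 -co BIGTIFF=YES srtm.tif srtm_lossless.tif
gdal_translate -co COMPRESS=LERC_ZSTD -co MAX_Z_ERROR=10 -co BIGTIFF=YES srtm.tif srtm_maxzerror10.tif

MAX_Z_ERROR defaults to 0, i.e. lossless.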

The original GeoTIFF file was already very well compressed with Zstandard level 15 and a horizontal predictor; at ~1296000 x ~417600 pixels it has a size of 86 gigabytes including overviews.

  • Original (ZSTD level 15): 86 GB
  • LERC_ZSTD (lossless): 105 GB
  • LERC_ZSTD (maximum Z error of 1 m): 81 GB
  • LERC_ZSTD (maximum Z error of 10 m): 21 GB

Cool, so if we don’t care about an error of 10 meters, we can have a global DEM (well, as global as SRTM is with its 60° cut-off) at ~30 meters pixel resolution in 21 gigabytes. But what does that actually look like then and how will this error appear? Well, check it out:

Here are some samples visualised with a greyscale color ramp (locally adjusted, so the lowest value in the image is black, the highest value is white). They are shown at 1:1 resolution; one pixel in the image (if you look at it at 100%) is one cell of the DEM data. The left image is lossless, the middle one was allowed a Z error of 1 meter, the right one 10 meters.

Mountainous, here the values range from 0 meters to about 2000 meters:

You can hardly see a difference, at least visually.

“Mediumish”, values between ~100 and ~500 meters:

At the 10 meter error level you can see a significant terracing effect.

Plains, values all around 100 meters:

You can see some structures collapsing into flat areas in the 1 meter version, and oh wow, that 10 meter version looks like upscaled pixels.

Time to zoom in! I picked a less flat area again because it makes it easier to understand. Here the values are between ~100 and ~300 meters:

So what do we see here? Neighboring cells with the same values compress better, so LERC shifts the values around (within the allowed error), creating terraces of same-valued cells. If you look closely you can also see a pattern of squarish structures. Those are the blocks or windows in which LERC looks at the data and does its adjustments; in this case they were 8×8 pixels. Note: What LERC does exactly is a bit more complex than “try to make neighboring values the same”; it actually looks at the bits required to store the values within a block and optimizes that within the error tolerance.

And now you know what LERC can do if you give it an error level to play with.

For reference, here is that same-ish area with the error tolerance at 1 meter:

You have to zoom in quite a bit more to be able to see the effects here due to the nature of the data in this extent in combination with the particular error tolerance:

The larger the differences between your Z values are in relation to the error tolerance, the less pronounced this effect will be. If there are steps of 100 meters between neighboring pixels, an extra error of 10 meters won’t make much of a difference. But in flatter areas it will have significant “terracing” effects, as you could see above. This is similar to “banding” effects in images with little color variation, e.g. a blue sky or an artificial color gradient, viewed at a color bit depth and display resolution where your eyes can distinguish the individual steps.

So if you want to use LERC lossily, think hard about what is going to happen with your data later. What kind of analysis will be performed, how will it be “looked” at, what will be calculated? Do it smart and you can have a predictable/controllable lossy compression with seriously small file sizes; do it without thinking and your data will lead to misinterpretation and apocalypse.

How to build your own aerial image Twitter bot

https://gitlab.com/Hannes42/twitter-image-bot

About a year ago I built a small Twitter bot that posts an aerial image of Hamburg, Germany every day.

As I will shut it down (no reason, just decluttering) here’s how it works so you can run your own:

  1. Have aerial images with a permissive license
  2. Register a Twitter account and set it up for posting via API access
  3. Write code to pick an image and post it

Done!

Okok, I am kidding.

I used https://suche.transparenz.hamburg.de/dataset/digitale-orthophotos-belaubt-hamburg3 as the data source because the images come with a CC-BY-like license, allowing such a project without any legal complications. As a side benefit these images are already provided as tiles: there is not one big image of the whole region but 1 km² image tiles. Perfect for random selection, quick resizing and posting.

Registering a Twitter account and setting it up is a privacy nightmare and, being the good human being that you are, you ought to hate every part of it. I followed the Twython tutorial for OAuth1 in a live Python interpreter session, which was fairly painless. If you do not want to link a phone number to your Twitter account (see above for how you should feel about that), this approach worked for me in the past, but I bet they use regional profiling or worse, so your luck might differ.

Next you need some code to do the work for you automatically. Here is what I wrote with Pillow==7.0.0 for the image processing and twython==3.8.2 for posting:

import glob
import random
from io import BytesIO

from PIL import Image
from twython import Twython

## Initialise Twitter
# use a Python interpreter to follow the steps on
# https://twython.readthedocs.io/en/latest/usage/starting_out.html#oauth-1-user-authentication 
# don't rush, there are some intermediate keys iirc...
APP_KEY = '1234567890ABCDEFGHIJKLMNO'
APP_SECRET = '12345678901234567890123456F8U0CAKATAWAIATATAEARAAA'
OAUTH_TOKEN = '1234567890123456789123456F8U0CAKATAWAIATATAEARAAAA'
OAUTH_TOKEN_SECRET = '123456789012345678901234567890FFSFFSFFSFFSFFS'
twitter = Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)

## Prepare the image
# via https://twython.readthedocs.io/en/latest/usage/advanced_usage.html#posting-a-status-with-an-editing-image

# Pick the image
jpegs = glob.glob("DOP20_HH_sommerbefliegung_2019.zip/*.jpg")
todays_image = random.choice(jpegs)
print(f"Today we will post {todays_image}")
photo = Image.open(todays_image)

# Resize the image
basewidth = 1000
wpercent = (basewidth / float(photo.size[0]))
height = int((float(photo.size[1]) * float(wpercent)))
resized_photo = photo.resize((basewidth, height), Image.ANTIALIAS)

# "Save" the resulting image in temporary memory
stream = BytesIO()
resized_photo.save(stream, format='JPEG')
stream.seek(0)

## We've got what we need, let's tweet
license = "dl-de/by-2-0 (Freie und Hansestadt Hamburg, Landesbetrieb Geoinformation und Vermessung)"
tweet = f"Das ist #IrgendwoInHH, aber wo denn nur?\n\n#codeforhamburg\n\nBild: {license}"

response = twitter.upload_media(media=stream)
twitter.update_status(status=tweet, media_ids=[response['media_id']])

First we initialise that Twython thingie, then we pick a random image from a directory of .jpg files, then we resize it to a maximum width of 1000 pixels (you can simplify that if your images are square…) and finally we post it to Twitter.

Set up a cronjob or systemd timer or alarm clock to run the script as often as you like and that’s it.
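For example, a crontab entry like this would post once a day at 9 in the morning (paths made up):

0 9 * * * cd /path/to/twitter-image-bot && python3 bot.py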