I ran GDAL 2.4’s gdal_translate (GDAL 2.4.0dev-333b907 or GDAL 2.4.0dev-b19fd35e6f-dirty, I am not sure) on some GeoTIFFs to compare the new ZSTD compression support to DEFLATE in file sizes and time taken.
Hardware was a mostly idle Intel(R) Xeon(R) CPU E3-1245 V2 @ 3.40GHz with fairly old ST33000650NS (Seagate Constellation) hard disks and lots of RAM.
A small input file was DGM1_2x2KM_XYZ_HH_2016-01-04.tif with about 40,000 x 40,000 pixels at around 700 Megabytes.
A big input file was srtmgl1.003.tif with about 1,300,000 x 400,000 pixels at 87 Gigabytes.
Both input files had been DEFLATE compressed at the default level 6 without using a predictor (that is why even the default DEFLATE level makes them smaller here, since these runs use PREDICTOR=2).
gdal_translate -co NUM_THREADS=ALL_CPUS -co PREDICTOR=2 -co TILED=YES -co BIGTIFF=YES --config GDAL_CACHEMAX 6144
was used for every run.
For DEFLATE, -co COMPRESS=DEFLATE -co ZLEVEL=${level} was added.
For ZSTD, -co COMPRESS=ZSTD -co ZSTD_LEVEL=${level} was added.
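A run over all levels could be scripted roughly like this. It is only a sketch, not the script actually used for these measurements: the function names `build_command` and `timed_run` are made up for illustration, and the creation options are copied from the commands above.

```python
import subprocess
import time

# Options used for every run, copied from the post.
COMMON_ARGS = [
    "-co", "NUM_THREADS=ALL_CPUS",
    "-co", "PREDICTOR=2",
    "-co", "TILED=YES",
    "-co", "BIGTIFF=YES",
    "--config", "GDAL_CACHEMAX", "6144",
]

# Creation option that carries the compression level for each algorithm.
LEVEL_OPTION = {"DEFLATE": "ZLEVEL", "ZSTD": "ZSTD_LEVEL"}


def build_command(algorithm, level, src, dst):
    """Return the gdal_translate argument list for one algorithm and level."""
    return [
        "gdal_translate",
        *COMMON_ARGS,
        "-co", f"COMPRESS={algorithm}",
        "-co", f"{LEVEL_OPTION[algorithm]}={level}",
        src,
        dst,
    ]


def timed_run(algorithm, level, src, dst):
    """Run one conversion and return the wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(build_command(algorithm, level, src, dst), check=True)
    return time.perf_counter() - start
```

Calling something like `timed_run("ZSTD", 5, "input.tif", "zstd_5.tif")` for each level, and reading the output size with `os.path.getsize`, would give one row of the tables at the bottom.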
Mind the axes, sometimes I used a logarithmic scale!
[Plots not preserved: Small file (DEFLATE, ZSTD) and Big file (DEFLATE, ZSTD)]
Findings
It has been some weeks since I really looked at the numbers, so I am making the following up spontaneously. Please correct me!
Those numbers in the findings below should be percentages (between the algorithms, to their default values, etc), but my curiosity was satisfied. At the bottom is the data, maybe you can take it to present a nicer evaluation? ;)
ZSTD is powerful and weird. Sometimes subsequent levels lead to the same result, sometimes a higher level is faster or produces a bigger file. On low levels it is just as fast as DEFLATE or faster, with similar or smaller sizes.
A <700 Megabyte version of the small file was accomplished within a minute with DEFLATE (6) or half a minute with ZSTD (5). With ZSTD (17) it got down to <600 Megabyte in ~5 Minutes, while DEFLATE never got anywhere near that.
Similarly for the big file, ZSTD (17) took it down to 60 Gigabytes, but that took almost 14 hours. DEFLATE bottomed out at 65 Gigabytes. The sweet spot for ZSTD was at 10 with 4 hours for 65 Gigabytes (DEFLATE took 11 hours for that).
In the end, it is hard to say what default level ZSTD should take. For the small file level 5 was amazing, being even smaller than the default (9) while taking about 40% less time (31 s instead of 51 s). But for the big file the gains are much more gradual; here level 3 or level 10 stand out. I/O might be to blame?
Yes, the machine was not stressed and I did reproduce those weird ones.
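To turn some of these findings into percentages, something like the following could be used. It is a small sketch with a handful of small-file rows copied from the raw numbers in this post; the helper names `pct_change` and `compare` are made up for illustration.

```python
# (time in seconds, size in bytes) for selected small-file runs,
# copied from the raw numbers in this post.
SMALL = {
    ("ZSTD", 5): (31, 714610868),
    ("ZSTD", 9): (51, 719396825),      # ZSTD default
    ("ZSTD", 17): (287, 621778612),
    ("DEFLATE", 6): (62, 719159371),   # DEFLATE default
    ("DEFLATE", 9): (262, 703038321),
}


def pct_change(value, baseline):
    """Relative change of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def compare(run, baseline, data=SMALL):
    """Return (time change %, size change %) of run relative to baseline."""
    run_time, run_size = data[run]
    base_time, base_size = data[baseline]
    return pct_change(run_time, base_time), pct_change(run_size, base_size)


for run, baseline in [
    (("ZSTD", 5), ("ZSTD", 9)),
    (("ZSTD", 5), ("DEFLATE", 6)),
    (("ZSTD", 17), ("DEFLATE", 9)),
]:
    dt, ds = compare(run, baseline)
    print(f"{run} vs {baseline}: time {dt:+.1f} %, size {ds:+.1f} %")
```

Negative values mean less time or a smaller file than the baseline. For example, ZSTD (5) against DEFLATE (6) comes out at exactly -50 % time (31 s versus 62 s) with a slightly smaller file.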
Raw numbers
Small file
Algorithm | Level | Time [s] | Size [Bytes] | Size [MiB] | Comment |
---|---|---|---|---|---|
ZSTD | 1 | 18 | 825420196 | 787 | |
ZSTD | 2 | 19 | 783437560 | 747 | |
ZSTD | 3 | 21 | 769517199 | 734 | |
ZSTD | 4 | 25 | 768127094 | 733 | |
ZSTD | 5 | 31 | 714610868 | 682 | |
ZSTD | 6 | 34 | 720153450 | 687 | |
ZSTD | 7 | 40 | 729787784 | 696 | |
ZSTD | 8 | 42 | 729787784 | 696 | |
ZSTD | 9 | 51 | 719396825 | 686 | default |
ZSTD | 10 | 63 | 719394955 | 686 | |
ZSTD | 11 | 80 | 719383624 | 686 | |
ZSTD | 12 | 84 | 712429763 | 679 | |
ZSTD | 13 | 133 | 708790567 | 676 | |
ZSTD | 14 | 158 | 707088444 | 674 | |
ZSTD | 15 | 265 | 706788234 | 674 | |
ZSTD | 16 | 199 | 632481860 | 603 | |
ZSTD | 17 | 287 | 621778612 | 593 | |
ZSTD | 18 | 362 | 614424373 | 586 | |
ZSTD | 19 | 549 | 617071281 | 588 | |
ZSTD | 20 | 834 | 617071281 | 588 | |
ZSTD | 21 | 1422 | 616979884 | 588 | |
DEFLATE | 1 | 25 | 852656871 | 813 | |
DEFLATE | 2 | 26 | 829210959 | 791 | |
DEFLATE | 3 | 32 | 784069125 | 748 | |
DEFLATE | 4 | 31 | 758474345 | 723 | |
DEFLATE | 5 | 39 | 752578464 | 718 | |
DEFLATE | 6 | 62 | 719159371 | 686 | default |
DEFLATE | 7 | 87 | 710755144 | 678 | |
DEFLATE | 8 | 200 | 705440096 | 673 | |
DEFLATE | 9 | 262 | 703038321 | 670 | |
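One way to see which ZSTD levels are worth considering for the small file is to keep only those that are smaller than every faster level. The following is a sketch of that filter over the ZSTD rows above; the function name `pareto_levels` is made up for illustration.

```python
# (level, time in seconds, size in bytes) for ZSTD on the small file,
# copied from the table above.
ZSTD_SMALL = [
    (1, 18, 825420196), (2, 19, 783437560), (3, 21, 769517199),
    (4, 25, 768127094), (5, 31, 714610868), (6, 34, 720153450),
    (7, 40, 729787784), (8, 42, 729787784), (9, 51, 719396825),
    (10, 63, 719394955), (11, 80, 719383624), (12, 84, 712429763),
    (13, 133, 708790567), (14, 158, 707088444), (15, 265, 706788234),
    (16, 199, 632481860), (17, 287, 621778612), (18, 362, 614424373),
    (19, 549, 617071281), (20, 834, 617071281), (21, 1422, 616979884),
]


def pareto_levels(rows):
    """Return the levels whose output is smaller than that of every faster level."""
    best_size = float("inf")
    keep = []
    # Walk through the runs from fastest to slowest and keep a level
    # only if it beats the smallest size seen so far.
    for level, seconds, size in sorted(rows, key=lambda r: (r[1], r[2])):
        if size < best_size:
            keep.append(level)
            best_size = size
    return keep


print(pareto_levels(ZSTD_SMALL))
```

For these rows the result is levels 1 to 5, 12 to 14 and 16 to 18. Levels 6 to 11 (including the default 9) drop out because level 5 is both faster and smaller, and level 15 drops out because level 16 is.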
Big file
Algorithm | Level | Time [min] | Size [Bytes] | Size [MiB] | Comment |
---|---|---|---|---|---|
ZSTD | 1 | 70 | 76132312441 | 72605 | |
ZSTD | 2 | 58 | 75351154492 | 71860 | |
ZSTD | 3 | 63 | 73369706659 | 69971 | |
ZSTD | 4 | 75 | 73343346296 | 69946 | |
ZSTD | 5 | 73 | 72032185603 | 68695 | |
ZSTD | 6 | 91 | 72564406429 | 69203 | |
ZSTD | 7 | 100 | 71138034760 | 67843 | |
ZSTD | 8 | 142 | 71175109524 | 67878 | |
ZSTD | 9 | 175 | 71175109524 | 67878 | default |
ZSTD | 10 | 235 | 69999288435 | 66757 | |
ZSTD | 11 | 406 | 69999282203 | 66757 | |
ZSTD | 12 | 410 | 69123601926 | 65921 | |
ZSTD | 13 | 484 | 69123601926 | 65921 | |
ZSTD | 14 | 502 | 68477183815 | 65305 | |
ZSTD | 15 | 557 | 67494752082 | 64368 | |
ZSTD | 16 | 700 | 67494752082 | 64368 | |
ZSTD | 17 | 820 | 64255634015 | 61279 | |
ZSTD | 18 | 869 | 63595433364 | 60649 | |
ZSTD | 19 | 1224 | 63210562485 | 60282 | |
ZSTD | 20 | 2996 | 63140602703 | 60216 | |
ZSTD | 21 | lolno | | | |
DEFLATE | 1 | 73 | 87035905568 | 83004 | |
DEFLATE | 2 | 76 | 85131650648 | 81188 | |
DEFLATE | 3 | 73 | 79499430225 | 75817 | |
DEFLATE | 4 | 77 | 75413492394 | 71920 | |
DEFLATE | 5 | 92 | 76248511117 | 72716 | |
DEFLATE | 6 | 129 | 73901542836 | 70478 | default |
DEFLATE | 7 | 166 | 73120114047 | 69733 | |
DEFLATE | 8 | 407 | 70446588490 | 67183 | |
DEFLATE | 9 | 643 | 70012677124 | 66769 | |