Global Data Compression Competition 2020

participants from all over the world

universal and image compressors

About GDCC 2020

This competition evaluates algorithms and their implementations for universal lossless data compression rather than for specific data types. We test compressors under the following scenarios:

Test 1:
Qualitative-data compression

Text in a language quite different from English (e.g., filtered Chinese Wikipedia).

Test 2:
Quantitative-data compression

The test set for this year contains images, most of which are photographic.

Test 3:
Mixed-data compression

This year's focus is on slightly preprocessed executable files (we removed incompressible chunks).

Test 4:
Small-block-data compression

We use small blocks of textual and mixed data to evaluate how compressors behave when the data size is severely limited, such as in block-storage systems.
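To illustrate why small blocks are a distinct scenario, here is a minimal sketch that compresses the same data once as a whole and once as independent blocks. It uses Python's standard zlib module purely as a stand-in; the 64-byte block size, the sample data and the function names are assumptions made for illustration, not the contest's actual settings.

```python
import zlib


def whole_size(data: bytes, level: int = 6) -> int:
    """Size in bytes of the data compressed as a single stream."""
    return len(zlib.compress(data, level))


def blockwise_size(data: bytes, block_size: int, level: int = 6) -> int:
    """Total size in bytes when every block is compressed independently.

    Each block starts with no history, so redundancy that spans
    block boundaries cannot be exploited.
    """
    return sum(
        len(zlib.compress(data[i:i + block_size], level))
        for i in range(0, len(data), block_size)
    )


# Hypothetical, highly repetitive sample text.
sample = b"".join(
    f"record {i}: the quick brown fox\n".encode() for i in range(2000)
)

print("whole:    ", whole_size(sample))
print("blockwise:", blockwise_size(sample, block_size=64))
```

On this sample the blockwise total comes out larger than the single-stream size, because every block pays the per-stream overhead again and has to re-encode content the compressor has already seen in earlier blocks.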

Categories

We impose speed limits to separate each of these four tests into three subcategories: rapid compression, balanced compression and high compression ratio (HCR). All told, the result is 12 categories and leaderboards, each with its own prizes.

2020 prize winners

Qualitative data

  • Rapid: 1st place: Peter Thamm, pglz; 2nd place: Konstantinos Agiannis, agiannis_text; 3rd place: Frederic Langlet, k5
  • Balanced: 1st place: Peter Thamm, pgcm; 2nd place: Mathieu Chartier, MCM; 3rd place: Dmitry Shkarin, DURILCA'light
  • High compression ratio: 1st place: Dmitry Shkarin, DURILCA; 2nd place: Peter Thamm, sgcm; 3rd place: Mathieu Chartier, MCM

Quantitative data

  • Rapid: 1st place: Andreas Debski, Kvick; 2nd place: Peter Thamm, pglz; 3rd place: Konstantinos Agiannis, agiannis_image
  • Balanced: 1st place: Marcio Pais, LEA; 2nd place: Dmitry Shkarin, BMF; 3rd place: Andreas Debski, Kvick
  • High compression ratio: 1st place: Marcio Pais, EMMA; 2nd place: Dmitry Shkarin, BMF; 3rd place: Marcio Pais, LEA

Mixed data

  • Rapid: 1st place: Peter Thamm, pglz; 2nd place: Sebastian.LUPANE, LZNV; 3rd place: Frederic Langlet, k5
  • Balanced: 1st place: Peter Thamm, pgcm; 2nd place: Marcio Pais, KATY; 3rd place: Mathieu Chartier, MCM
  • High compression ratio: 1st place: Marcio Pais, LILY; 2nd place: Peter Thamm, sgcm; 3rd place: Dmitry Shkarin, DURILCA'base

Small-block data

  • Rapid: 1st place: Peter Thamm, pglz; 2nd place: Frederic Langlet, k5; 3rd place: Ilya Muravyov, ULZ
  • Balanced: 1st place: Peter Thamm, pgcm; 2nd place: Marcio Pais, LUNA; 3rd place: Dmitry Shkarin, PPMd
  • High compression ratio: 1st place: Dmitry Shkarin, PPMonstr; 2nd place: Marcio Pais, NINO; 3rd place: Peter Thamm, sgcm

Board of experts of GDCC 2020

Alexander Rhatushnyak

A PhD developing data-compression algorithms since the 1990s. Coauthor of a book and patents on data compression, co-creator of the JPEG-XL standard, and multiple-time winner of the Hutter Prize and Calgary Corpus Compression Challenge—the only ongoing competitions (before ours) in lossless data compression.

Eugene Shelwien

Developer of recompression algorithms for Deflate, JPEG, MP3, AAC, proprietary audio codecs and the .pa compression format. Administrator of Encode.su, the biggest international forum covering data-compression algorithms and software.

Dmitriy Vatolin

A PhD, video-codec developer and coauthor of a book on data compression. Supervisor of collaborative video- and image-processing research projects that include Broadcom, Huawei, Intel, RealNetworks, Samsung and other leading companies. Instructs courses on methods of 3D and 2D video and image processing and compression.

2020 leaderboards

General notes

  • The leaderboard tables below contain results for contest submissions and selected publicly available compressors. The names of submitted compressors appear in boldface.
  • See “Ranking” for rules governing how we order the results.
  • When possible, we set compressor options to use just one thread for publicly available compressors. Some programs, however, may (and did) use multiple threads. Because we declined to fine-tune presets to fit the speed limits as tightly as possible, the compressors are not aligned by speed. Therefore, these results SHOULD NOT be used to draw conclusions about publicly available compressors such as “compressor X is better than compressor Y.”
  • HCR stands for “High Compression Ratio”.

Ranking

For the “balanced” and “high compression ratio” categories we rank compressors according to the following metric:

c_full_size = compressed-data size + compressed-decompressor size

First place goes to the compressor with the smallest c_full_size.

We compress decompressors using bzip2 v.1.0.8 with the “-9” setting.
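The metric above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's standard bz2 module at compression level 9 in place of the bzip2 v.1.0.8 command-line tool, so the byte count may not match the organizers' figures exactly, and the function name and arguments are hypothetical.

```python
import bz2


def c_full_size(compressed_data_size: int, decompressor_bytes: bytes) -> int:
    """Return compressed-data size + compressed-decompressor size, in bytes.

    The decompressor binary is itself compressed with bzip2 at level 9
    (assumed here to correspond to the "-9" setting of the bzip2 tool).
    """
    compressed_decompressor = bz2.compress(decompressor_bytes, compresslevel=9)
    return compressed_data_size + len(compressed_decompressor)


# Hypothetical usage: a 1,000-byte archive and a stand-in "decompressor"
# consisting of 10,000 zero bytes.
print(c_full_size(1000, b"\x00" * 10000))
```

Counting the decompressor in the total means a submission cannot gain an advantage by embedding large dictionaries or pretrained models in its executable without paying for them.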

For the rapid categories we rank according to the function:

f = c_time + 2·d_time + 1/10⁶·c_full_size,

where c_time and d_time are, respectively, the compression and decompression times in seconds, and c_full_size is in bytes.

First place goes to the compressor with the smallest value for f.
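As a worked sketch of the rapid-category function, the snippet below computes f and sorts a few entries by it. The compressor names and measurements are invented for illustration and do not correspond to any contest result.

```python
def rapid_score(c_time: float, d_time: float, c_full_size: int) -> float:
    """f = c_time + 2*d_time + c_full_size / 10^6 (seconds, seconds, bytes)."""
    return c_time + 2.0 * d_time + c_full_size / 1e6


# Hypothetical measurements: (c_time in s, d_time in s, c_full_size in bytes).
entries = {
    "A": (10.0, 5.0, 3_000_000),   # f = 10 + 10 + 3  = 23
    "B": (4.0, 2.0, 20_000_000),   # f = 4  + 4  + 20 = 28
    "C": (1.0, 1.0, 15_000_000),   # f = 1  + 2  + 15 = 18
}

# Smaller f is better, so an ascending sort gives the ranking.
ranked = sorted(entries, key=lambda name: rapid_score(*entries[name]))
print(ranked)
```

Read this way, the function trades one second of compression time against 10⁶ bytes of c_full_size, and one second of decompression time against 2·10⁶ bytes.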

The compressors that fell just short of a given speed category appear at the bottom of the corresponding table. Submissions that failed to fully comply with the rules (in particular, the rule that every compressor must correctly decode the compressed files for all four tests) are also at the bottom.

Table additional notes

Most compressors in the table were tested on a machine running 64-bit Windows 10. The exceptions are agiannis_image, agiannis_text, archbox, BWIC, guess, k5, Orz, pgcm, pglz, sgcm and TBCM, which were tested on a machine running Ubuntu 18.04. The machine configuration is described in “Test Hardware.”

Notes on compressors

  • lzuf2 appears unranked in the table because it failed to correctly process all four test sets as the rules require
  • mcm 0.84 froze while decoding Test 3 data for both the -t11 and -x11 presets
  • nanozip 0.09 with the -cc -m26g -p1 -t1 -nm preset failed to correctly decode Test 3 data
  • Zstd was modified for Test 4 to comply with our API: it employed the functions ZSTD_createCCtx, ZSTD_compressCCtx, ZSTD_createDCtx and ZSTD_decompressDCtx from the zstd API; it was compiled using x86_64-w64-mingw32-gcc; and the ZSTD_compressCCtx function took the number from the preset column as an argument
  • zlib was modified for Test 4 to comply with our API: it employed the functions compress2 and uncompress from the zlib API, it was compiled using x86_64-w64-mingw32-gcc, and the compress2 function took the number from the preset column as an argument
  • lz4 was modified for Test 4 to comply with our API: it employed the functions LZ4_compress_HC and LZ4_decompress_safe from the lz4 API, it was compiled using x86_64-w64-mingw32-gcc, and the LZ4_compress_HC function took the number from the preset column as an argument
  • ZPAQ was modified for Test 4 to comply with our API: it employed the functions libzpaq::compress and libzpaq::decompress from the ZPAQ API, it was compiled using x86_64-w64-mingw32-g++, and the libzpaq::compress function took the number from the preset column as an argument

Charts for leaderboards

General notes

  • The line joining the markers for different compressors on the scatter plot shows the Pareto frontier. That is, for each such compressor, no other analyzed programs in that category achieve better results for both the selected time and compression parameters.
  • The names of submitted compressors appear in boldface.
  • The names of submitted compressors that failed to fully comply with the competition rules appear in strikethrough.
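The Pareto frontier described in the first note can be sketched as follows. The code applies the stated condition directly: a result stays on the frontier unless some other result is better on both axes. It assumes smaller values are better for both the time and the size parameter (an axis where larger is better would need its comparison flipped), and the `Result` type and sample values are hypothetical.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    name: str
    time_s: float   # selected time parameter, smaller is better
    size: float     # selected size parameter, smaller is better


def pareto_frontier(results: List[Result]) -> List[Result]:
    """Keep every result that no other result beats on both time and size."""
    frontier = [
        r for r in results
        if not any(o.time_s < r.time_s and o.size < r.size for o in results)
    ]
    # Sort by time so the points can be joined into a line on a scatter plot.
    return sorted(frontier, key=lambda r: r.time_s)


# Hypothetical results.
sample_results = [
    Result("a", 1.0, 100.0),
    Result("b", 2.0, 80.0),
    Result("c", 3.0, 90.0),   # beaten by "b" on both axes
    Result("d", 5.0, 60.0),
    Result("e", 6.0, 70.0),   # beaten by "d" on both axes
]

print([r.name for r in pareto_frontier(sample_results)])
```

The pairwise check is quadratic in the number of results, which is fine for leaderboard-sized inputs and keeps the code close to the wording of the definition.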

[Interactive scatter plot, one chart per category: Tests 1 to 4, each in Rapid, Balanced and HCR. Selectable time parameter: full time, compression time or decompression time. Selectable compression parameter: c_full_size; c_full_size in megabytes; compression ratio; compression ratio in bits per byte; compression degree.]

RANKING OF COMPRESSORS

Participating Compressors

Name Author Version Competition categories Description

Referenced Compressors

Name Version Competition categories