About GDCC 2020
This competition focuses on algorithms and their implementations for universal lossless data compression rather than for specific data types. We test compressors under the following scenarios:
Test 1:
Qualitative-data compression
Text in a language quite different from English (e.g., filtered Chinese Wikipedia).
Test 2:
Quantitative-data compression
The test set for this year contains images, most of which are photographic.
Test 3:
Mixed-data compression
This year's focus is on slightly preprocessed executable files (we removed incompressible chunks).
Test 4:
Small-block-data compression
We use small blocks of textual and mixed data to evaluate how compressors behave when the data size is severely limited, such as in block-storage systems.
Categories
We impose speed limits to separate each of these four tests into three subcategories: rapid compression, balanced compression and high compression ratio (HCR). All told, the result is 12 categories and leaderboards, each with its own prizes.
2020 prize winners
- Qualitative data
- Quantitative data
- Mixed data
- Small-block data
Board of experts of GDCC 2020
2020 leaderboards
- Rapid
- Balanced
- High compression ratio
General notes
- The leaderboard tables below contain results for contest submissions and selected publicly available compressors. The names of submitted compressors appear in boldface.
- See “Ranking” for rules governing how we order the results.
- When possible, we set compressor options to use just one thread for publicly available compressors. Some programs, however, may (and did) use multiple threads. Because we declined to fine-tune presets to fit the speed limits as tightly as possible, the compressors are not aligned by speed. Therefore, these results SHOULD NOT be used to draw conclusions about publicly available compressors such as “compressor X is better than compressor Y.”
- HCR stands for “High Compression Ratio”.
Ranking
For the “balanced” and “high compression ratio” categories we rank compressors according to the following metric:
c_full_size = compressed-data size + compressed-decompressor size
First place goes to the compressor with the smallest c_full_size.
We compress decompressors using bzip2 v.1.0.8 with the “-9” setting.
For the rapid categories we rank according to the function:
f = c_time + 2·d_time + c_full_size / 10⁶,
where c_time and d_time are, respectively, the compression and decompression times in seconds, and c_full_size is in bytes.
First place goes to the compressor with the smallest value for f.
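To make the two scoring rules concrete, here is a minimal C sketch of how the metrics combine. It is not the official evaluation code; the function names and the sample numbers in main are invented for illustration.

```c
#include <stdio.h>

/* Ranking metrics as defined above (sizes in bytes, times in seconds).
   Illustrative sketch only; not the official scoring code. */

/* Balanced/HCR metric: compressed data plus the bzip2 -9 compressed decompressor. */
static long long c_full_size(long long compressed_data_size,
                             long long compressed_decompressor_size)
{
    return compressed_data_size + compressed_decompressor_size;
}

/* Rapid metric: f = c_time + 2*d_time + c_full_size / 10^6. */
static double rapid_score(double c_time, double d_time, long long full_size)
{
    return c_time + 2.0 * d_time + (double)full_size / 1e6;
}

int main(void)
{
    /* Hypothetical measurements for one submission, not real contest data. */
    long long full = c_full_size(123456789LL, 45678LL);
    double f = rapid_score(95.2, 41.7, full);
    printf("c_full_size = %lld bytes, f = %.3f\n", full, f);
    return 0;
}
```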
Table Additional Notes
The compressors that fell just short of a given speed category appear at the bottom of the corresponding table. Submissions that failed to fully comply with the rules (in particular, the rule that every compressor must correctly decode the compressed files for all four tests) are also at the bottom.
Most compressors in the table were tested on a machine running 64-bit Windows 10; the exceptions (agiannis_image, agiannis_text, archbox, BWIC, guess, k5, Orz, pgcm, pglz, sgcm, TBCM) were tested under Ubuntu 18.04. The machine configuration is described under Test Hardware.
Notes on Compressors
- lzuf2 appears unranked in the table because it failed to correctly process all four test sets, as the rules require.
- mcm 0.84 froze while decoding Test 3 data for both the -t11 and -x11 presets.
- nanozip 0.09 with the -cc -m26g -p1 -t1 -nm preset failed to correctly decode Test 3 data.
- Zstd was modified for Test 4 to comply with our API: it employed the functions ZSTD_createCCtx, ZSTD_compressCCtx, ZSTD_createDCtx and ZSTD_decompressDCtx from the zstd API; it was compiled using x86_64-w64-mingw32-gcc; and the ZSTD_compressCCtx function took the number from the preset column as an argument (an illustrative wrapper sketch appears after this list).
- zlib was modified for Test 4 to comply with our API: it employed the functions compress2 and uncompress from the zlib API, it was compiled using x86_64-w64-mingw32-gcc, and the compress2 function took the number from the preset column as an argument.
- lz4 was modified for Test 4 to comply with our API: it employed the functions LZ4_compress_HC and LZ4_decompress_safe from the lz4 API, it was compiled using x86_64-w64-mingw32-gcc, and the LZ4_compress_HC function took the number from the preset column as an argument.
- ZPAQ was modified for Test 4 to comply with our API: it employed the functions libzpaq::compress and libzpaq::decompress from the ZPAQ API, it was compiled using x86_64-w64-mingw32-g++, and the libzpaq::compress function took the number from the preset column as an argument.
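Since the competition API itself is not reproduced on this page, the following C sketch only illustrates what a Test 4-style wrapper of the kind described above might look like for Zstd. The wrapper function names, signatures and error handling are assumptions made for this example; only the ZSTD_* calls are the actual zstd API functions named in the note.

```c
#include <stddef.h>
#include <zstd.h>

/* Illustrative small-block wrapper around the zstd one-shot API.
   The wrapper names and signatures are assumptions for this sketch;
   only the ZSTD_* calls come from the real zstd library. */

/* Compress one small block; returns the compressed size or 0 on error.
   `level` plays the role of the number taken from the preset column. */
size_t block_compress(const void *src, size_t src_size,
                      void *dst, size_t dst_capacity, int level)
{
    ZSTD_CCtx *cctx = ZSTD_createCCtx();
    if (cctx == NULL)
        return 0;
    size_t n = ZSTD_compressCCtx(cctx, dst, dst_capacity, src, src_size, level);
    ZSTD_freeCCtx(cctx);
    return ZSTD_isError(n) ? 0 : n;
}

/* Decompress one small block; returns the decompressed size or 0 on error. */
size_t block_decompress(const void *src, size_t src_size,
                        void *dst, size_t dst_capacity)
{
    ZSTD_DCtx *dctx = ZSTD_createDCtx();
    if (dctx == NULL)
        return 0;
    size_t n = ZSTD_decompressDCtx(dctx, dst, dst_capacity, src, src_size);
    ZSTD_freeDCtx(dctx);
    return ZSTD_isError(n) ? 0 : n;
}
```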
Charts for leaderboards
General notes
- The line joining the markers for different compressors on the scatter plot shows the Pareto frontier: for each such compressor, no other program analyzed in that category achieves a better result on both of the selected time and compression metrics. A small sketch for computing such a frontier appears after these notes.
- The names of submitted compressors appear in boldface.
- The names of submitted compressors that failed to fully comply with the competition rules appear in strikethrough.
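For readers who want to recompute such a frontier from raw results, here is a small C sketch. It assumes each compressor is summarized by a single (time, size) point where smaller is better on both axes; the structure, function names and the sample values in main are invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* One compressor's result on a chart: total time (seconds) and
   c_full_size (bytes). Names are invented for this sketch. */
typedef struct {
    const char *name;
    double time_s;
    double size_bytes;
} result_t;

/* b dominates a if b is at least as good on both axes and strictly
   better on at least one (smaller time, smaller size). */
static bool dominates(const result_t *b, const result_t *a)
{
    return b->time_s <= a->time_s && b->size_bytes <= a->size_bytes &&
           (b->time_s < a->time_s || b->size_bytes < a->size_bytes);
}

/* Print the names of the results that lie on the Pareto frontier. */
static void print_frontier(const result_t *r, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        bool on_frontier = true;
        for (size_t j = 0; j < n && on_frontier; j++)
            if (j != i && dominates(&r[j], &r[i]))
                on_frontier = false;
        if (on_frontier)
            printf("%s\n", r[i].name);
    }
}

int main(void)
{
    /* Hypothetical example values, not actual contest results. */
    const result_t results[] = {
        { "A", 120.0, 4.0e8 },
        { "B", 300.0, 3.5e8 },
        { "C", 150.0, 4.5e8 },  /* dominated by A */
    };
    print_frontier(results, sizeof results / sizeof results[0]);
    return 0;
}
```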
Charts are provided for each of the 12 test/category combinations (Tests 1–4 × Rapid, Balanced, HCR). The selectable chart metrics are: full time, compression time, decompression time, c_full_size (in bytes or megabytes), compression ratio (including bits per byte), and compression degree.
Ranking of compressors
Participating Compressors
Name | Author | Version | Competition categories | Description
---|---|---|---|---
Referenced Compressors
Name | Version | Competition categories
---|---|---