Understanding Deflate

(jjrscott.com)

39 points | by ingve 3 days ago ago

4 comments

$Dylan16807 29 minutes ago
> That said, even in this simple case, decoding by hand was a pain.
Well doing it by hand at this level is mostly decoding quasi-ASCII by hand, and the compression isn't making it significantly more painful.
If you reorder the static huffman table and byte align things, the encoded data could look more like:
```
  T O B E O R N O T T 0x85 0x09 T 0x88 0x0F 0xFF
```
So short lengths become 0x80 + length and short distances encode as themselves.
This version is pretty easy to decode by hand, and still 16 bytes. I'm arguably cheating by removing the 3 bit header, but even 17 bytes would be good.
Though the encoder deciding to repeat those Ts doesn't make much sense, shouldn't this be compressed to literal("TOBEORNOT") copy(6,9) copy(9,15)?
[-]
$lifthrasiir 40 minutes ago

Not illustrated here, but the vast majority of deflated data uses a dynamic Huffman code which compresses the Huffman tree itself with another (fixed) Huffman tree. This sort of nested compression is in fact moderately popular; JPEG XL for example uses the same technique.