I've been looking at the PNG files produced by the first pass of PNGOUT, and it looks like, when you deflate a file, you start out with something a lot like the Huffman tables used in the "fixed Huffman" deflate blocks.
Have you experimented much with starting with different Huffman tables?
I think I've thought of a way to improve compression on some files by using a different initial Huffman table:
Some PNGs compress better with Huffman-only ("/b-n") compression, because they only have short matches, and using length/distance codes can mess up the Literal frequencies.
For these files, if you work out the Literal frequencies in the file, and assign low frequencies to the Length codes, you could make up the initial Huffman table that way.
The file would compress almost as well as it did on Huffman-only, using a bit of extra space on the Huffman table and a few longer literal code lengths. But then deflate would be able to improve the compression by using length/distance codes as well. So the resulting file might be smaller than with Huffman-only deflate, maybe even after just one pass.
Awesoken at
you start out with something a lot like the Huffman tables used in the "fixed Huffman"
Correct. How did you figure that out?
Have you experimented much with starting with different Huffman tables?
I haven't done much with it. My current strategy works well for most files. To implement something like you suggest, I would either have to use trial & error - which would slow things down a lot, or add another cryptic command line option - which hardly anybody would use anyway.
... assign low frequencies to the Length codes, you could make up the initial Huffman table that way.
I completely re-generate the Huffman tables at the end of each pass. Changing the initial Huffman table usually just makes the file size fall into a different local minima - which acts like a normal distribution with a standard deviation of a few bytes. Of course, there will always be exceptions to the rule.
counting_pine at
Awesoken said
you start out with something a lot like the Huffman tables used in the "fixed Huffman"
Correct. How did you figure that out?
I wrote a very verbose decompressor in QBASIC. It was an extremely valuable debugging tool when I was trying to write a deflate program.
I would either have to use trial & error - which would slow things down a lot, or add another cryptic command line option - which hardly anybody would use anyway.
Seeing as you already have two different methods in PNGOUT, you could add a new switch, say /m, to choose between them. The new method would start out like the Literals-only method:
Pass 1: Start with flat Literals table
[Pass 2: Use Literals table created by the Literal frequencies in the file]
Pass 3: Use table created by Literal frequencies, assigning low frequencies to the Length codes
Pass 4: Generate tables from the last pass and continue as normal
...
Of course, it's your program, and you may not want to add any more features to it. But sometimes Huffman-only is the best way to compress some files, and I think this method could potentially be able to beat it. Who knows, maybe it would be better for other files too.
In any case, I thought I'd post this anyway, at least to see what you thought of the idea.