Adaptive Huffman coding - FGK

Main features

statistic compression method
one-pass compression method
adaptive compression method
symetric compression method
can compress better than static Huffman coding (due to the fact that coding tree needn't to be encoded)
has worse sensitivity to errors (one wrong character can destroy the whole message)

Time and space complexities

Time complexity of algorithm: O(n * log|Σ|)
Space complexity of algorithm: O(|Σ|)

In the worst case (coding tree needs to be changed after each single character in the input message on all levels) is estimated time complexity O(n*log|Σ|), where n is character count in input message and |Σ| is character count in the input alphabet. Then log |Σ| is roughly the depth of coding tree (though the tree needn't to be balanced, this could do as an estimate).

History

Algorithm is based on the classical Huffman coding method. The oldest adaptive algoritm was published by Faller (1973) and later Gallager (1978), independently. At 1985 Knuth made a little modification, and so the algorithm was called FGK.

Description

One definition is needed to fully explain the priciple of the algoritm:
Binary coding tree has a sibling property if each node (except the root) has a sibling and if the nodes can be listed in order of nonincreasing weight with each node adjacent to its sibling.
Gallager proved that a binary prefix code is a Huffman code if and only if the code tree has the sibling property. So, algorithm modifies coding tree each time a new symbol is encoded or decoded and whenever it detects violation of sibling property, the tree is transformed (in order to satisfy the sibling property again).
There are two main possibilities how to build the coding tree at the begining of coding:

The tree is initialized with all symbols of input alphabet - in which case the tree initially consists of all symbols with a chosen propability
The tree is initialized with ZERO node - tree initially consists of a single node ZERO. When the encoder encouters a symbol which has not been read yet, it writes code of node ZERO to the output followed by the read symbol. ZERO node is then split into another ZERO node and a node containing the new symbol.

Pseudo-code


 1:  begin
 2:     create node ZERO
 3:     readSymbol(X)
 4:     while (X!=EOF) do
 5:      begin
 6:        if (first_read_of(X)) then
 7:         begin
 8:           output(ZERO)
 9:           output(X)
10:           create new node U with next nodes ZERO and new node X
11:           update_tree(U);
12:         end
13:        else
14:         begin
15:           output(X)
16:           update_tree(X)
17:         end
18:        readSymbol(X)
19:      end
20:  end
21:          
22:  procedure update_tree(U)
23:  begin
24:     while (U!=root) do
25:      begin
26:        if (exists node U1 with same value and greater order) then 
27:           change U1 and U
28:        increment value of U
29:        U := parent(U)
30:      end
31:     increment value of U, update leaf codes
32:  end

References

Adaptive Huffman coding - links, source codes, implementation. Datacompression.info.
Adaptive Huffman coding. Codepedia, the developers encyclopedia.