# Adaptive Huffman coding - FGK

## Main features

- statistic compression method
- one-pass compression method
- adaptive compression method
- symetric compression method
- can compress better than static Huffman coding (due to the fact that coding tree needn't to be encoded)
- has worse sensitivity to errors (one wrong character can destroy the whole message)

### Time and space complexities

**Time complexity of algorithm**: O(n * log|Σ|)

**Space complexity of algorithm**: O(|Σ|)

In the worst case (coding tree needs to be changed after each single character in the input message on all levels)
is estimated time complexity *O(n*log|Σ|)*, where *n* is character count in input message and *|Σ|* is
character count in the input alphabet. Then *log |Σ|* is roughly the depth of coding tree (though the tree needn't to be balanced, this could do
as an estimate).

## History

Algorithm is based on the classical Huffman coding method. The oldest adaptive algoritm was published by Faller (1973) and later Gallager (1978), independently. At 1985 Knuth made a little modification, and so the algorithm was called FGK.

## Description

One definition is needed to fully explain the priciple of the algoritm:

Binary coding tree has a *sibling property* if each node (except the root) has
a sibling and if the nodes can be listed in order of nonincreasing weight with each node adjacent to its sibling.

Gallager proved that a binary prefix code is a Huffman code if and only if the code tree has the sibling property. So, algorithm
modifies coding tree each time a new symbol is encoded or decoded and whenever it detects violation of sibling property, the tree is
transformed (in order to satisfy the sibling property again).

There are two main possibilities how to build the coding tree at the begining of coding:

**The tree is initialized with all symbols of input alphabet**- in which case the tree initially consists of all symbols with a chosen propability**The tree is initialized with ZERO node**- tree initially consists of a single node ZERO. When the encoder encouters a symbol which has not been read yet, it writes code of node ZERO to the output followed by the read symbol. ZERO node is then split into another ZERO node and a node containing the new symbol.

## Pseudo-code

```
1:
```**begin**
2: create node *ZERO*
3: **readSymbol**(*X*)
4: **while** (*X*!=*EOF*) **do**
5: **begin**
6: **if** (first_read_of(*X*)) **then**
7: **begin**
8: **output**(*ZERO*)
9: **output**(*X*)
10: create new node *U* with next nodes *ZERO* and new node *X*
11: update_tree(*U*);
12: **end**
13: **else**
14: **begin**
15: **output**(*X*)
16: update_tree(*X*)
17: **end**
18: **readSymbol**(*X*)
19: **end**
20: **end**
21:
22: **procedure** update_tree(*U*)
23: **begin**
24: **while** (*U*!=*root*) **do**
25: **begin**
26: **if** (exists node *U1* with same value and greater order) **then**
27: change *U1* and *U*
28: increment value of *U*
29: *U* := parent(*U*)
30: **end**
31: increment value of *U*, update leaf codes
32: **end**

## References

- Adaptive Huffman coding - links, source codes, implementation. Datacompression.info.
- Adaptive Huffman coding. Codepedia, the developers encyclopedia.