# Shannon-Fano coding

## Main features

- statistic compression method
- two-pass compression method
- semi-adaptive compression method
- asymetric compression method
- the result compression is worse in comparison with Huffman coding
- optimality of output is not guaranteed

### Time and space complexities

**Time complexity of algorithm**: O(n + |Σ| * log|Σ|)

**Space complexity of algorithm**: O(|Σ|)

From description of the algorithm we can compute time and space complexity by dividing the problem into three phases. Time complexity of the first phase of computing counts of symbols appearance is linearly dependent on length of the input text. The next phase is ordering symbols, which is implementation dependent. The last phase is generating of the output. This also linearly depends on length of the input text. Space complexity grows linearly with the count of different symbols in the input string.

## History

This method dates from the year 1949. It was published by Claude Elwood Shannon (he is designated as the father of theory of information) with Warren Weaver and by Robert Mario Fano independently.

## Description

As it can be seen in pseudocode of this algorithm, there are two passes through an input data. Based on frequency of appearance of symbols in the input text, codes for each symbol are generated. Using a method from top to bottom, the prefix code of variable length is generated. First, symbols are ordered by counts of their appearance. Next, list of symbols is recursively divided into two parts of almost same frequency of symbols occurrence.

First, count of different symbols is written to the output. Next, the encoded tree is written to the output. This tree is constructed from parts of divided list of symbols. Left branches of the tree are marked by 1, right branches by 0. This numbers are created by walk through the tree from root and from left to right. Only necessary numbers are written. Next, Fibonacci code of count of leaves is determined and written to the output. Then the coding of the input begins.

## Pseudo-code

```
1:
```**begin**
2: count source units
3: sort source units to non-decreasing order
4: SF-Split*S*
5: **output**(count of symbols, encoded tree, symbols)
6: write output
7: **end**
8:
9: **procedure** SF-Split(*S*)
10: **begin**
11: **if** (|*S*|>1) **then**
12: **begin**
13: divide *S* to *S1* and *S2* with about same count of units
14: add 1 to codes in *S1*
15: add 0 to codes in *S2*
16: SF-Split(*S1*)
17: SF-Split(*S2*)
18: **end**
19: **end**

## References

- Shannon-Fano coding. Wikipedia, the free encyclopedia
- Claude Elwood Shannon. Wikipedia, the free encyclopedia.
- C. E. Shannon: A Mathematical Theory of Communication. The Bell System Technical Journal, Vol. 27, July, October, 1948.
- C. E. Shannon: Prediction and Entropy of Printed English. The Bell System Technical Journal, Vol. 30, 1951.
- C. E. Shannon: Communication Theory of Secrecy Systems. The Bell System Technical Journal, Vol. 28, 1949.