Shannon-Fano coding

Main features

statistic compression method
two-pass compression method
semi-adaptive compression method
asymetric compression method
the result compression is worse in comparison with Huffman coding
optimality of output is not guaranteed

Time and space complexities

Time complexity of algorithm: O(n + |Σ| * log|Σ|)
Space complexity of algorithm: O(|Σ|)

From description of the algorithm we can compute time and space complexity by dividing the problem into three phases. Time complexity of the first phase of computing counts of symbols appearance is linearly dependent on length of the input text. The next phase is ordering symbols, which is implementation dependent. The last phase is generating of the output. This also linearly depends on length of the input text. Space complexity grows linearly with the count of different symbols in the input string.

History

This method dates from the year 1949. It was published by Claude Elwood Shannon (he is designated as the father of theory of information) with Warren Weaver and by Robert Mario Fano independently.

Description

As it can be seen in pseudocode of this algorithm, there are two passes through an input data. Based on frequency of appearance of symbols in the input text, codes for each symbol are generated. Using a method from top to bottom, the prefix code of variable length is generated. First, symbols are ordered by counts of their appearance. Next, list of symbols is recursively divided into two parts of almost same frequency of symbols occurrence.

First, count of different symbols is written to the output. Next, the encoded tree is written to the output. This tree is constructed from parts of divided list of symbols. Left branches of the tree are marked by 1, right branches by 0. This numbers are created by walk through the tree from root and from left to right. Only necessary numbers are written. Next, Fibonacci code of count of leaves is determined and written to the output. Then the coding of the input begins.

Pseudo-code


 1:  begin
 2:     count source units
 3:     sort source units to non-decreasing order
 4:     SF-SplitS
 5:     output(count of symbols, encoded tree, symbols)
 6:     write output
 7:   end
 8:  
 9:  procedure SF-Split(S)
10:  begin
11:     if (|S|>1) then
12:      begin
13:        divide S to S1 and S2 with about same count of units
14:        add 1 to codes in S1
15:        add 0 to codes in S2
16:        SF-Split(S1)
17:        SF-Split(S2)
18:      end
19:  end

References

Shannon-Fano coding. Wikipedia, the free encyclopedia
Claude Elwood Shannon. Wikipedia, the free encyclopedia.
C. E. Shannon: A Mathematical Theory of Communication. The Bell System Technical Journal, Vol. 27, July, October, 1948.
C. E. Shannon: Prediction and Entropy of Printed English. The Bell System Technical Journal, Vol. 30, 1951.
C. E. Shannon: Communication Theory of Secrecy Systems. The Bell System Technical Journal, Vol. 28, 1949.