previous main page content next

PPMC

Main features

History

PPM was invented by John Cleary and Ian Witten back in 1984. The original paper talked about PPMA and PPMB. The suffix letter ('A' or 'B') denotes the escape code used in the algorithm, however the rest of the algorithm is roughly the same. In the book "Text compression" all of them were explained. PPMC was presented by Alistair Moffat. In the paper by Moffat also PPMC' was presented, which is PPMC but using lazy exclusion. There are other variants such as PPMD, PPMZ, PPM*. The difference between PPMC and PPMA or PPMB are only the probability computation for the escape code.

Description

PPM is an adaptive statistical data compression technique based on context modeling and prediction. The name stands for Prediction by Partial Matching. PPM models use a set of previous symbols in the uncompressed symbol stream to predict the next symbol in the stream.

Predictions are usually reduced to symbol rankings. The number of previous symbols, n, determines the order of the PPM model which is denoted as PPM(n). Unbounded variants where the context has no length limitations also exist and are denoted as PPM*. If no prediction can be made based on all n context symbols a prediction is attempted with just n-1 symbols. This process is repeated until a match is found or no more symbols remain in context. At that point a fixed prediction is made. This process is the inverse of that followed by DMC compression algorithms which build up from a zero-order model.

Exclusion: if we are using escape codes, when we are in a given context we exclude lower orders from probability computing, because we just take in account the probabilities of current one, that's PPMC using lazy exclusions. If we exclude symbols which have appeared in higher orders from the probability computation that's called full exclusion.

Note: changing the context length will take effect after resetting the applet.

Pseudo-code


 1:  begin
 2:     while (not last character) do
 3:      begin
 4:        readSymbol()
 5:        shorten context
 6:        while (context not found and context length not -1) do
 7:         begin
 8:           output(escape sequence)
 9:           shorten context
10:         end
11:        output(character)
12:        while (context length not -1) do
13:         begin
14:           increase count of character (create node if nonexistant)
15:           shorten context
16:         end
17:      end
18:  end    

References

  1. Arturo San Emeterio Campos: Implementing PPMC with hash tables.
  2. PPM compression algorithm. Wikipedia, the free encyclopedia.
  3. T. Bell, J. Cleary, and I. Witten: Data compression using adaptive coding and partial string matching. IEEE Transactions on Communications, Vol. 32 (4), p. 396-402 (University of Waikato, New Zealand)(1984).
  4. A. Moffat: Implementing the PPM data compression scheme. IEEE Transactions on Communications, Vol. 38 (11), p. 1917-1921. (1990)
  5. J. Cleary, W. Teahan, and I. Witten: Unbounded length contexts for PPM. In J.A. Storer and M. Cohn, editors, Proceedings DCC-95. IEEE Computer Society Press (1995).

Example