previous main page content next

LZW - decompression

Main features

History

LZW (Lempel-Ziv-Welch) is a universal lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as an improved implementation of the LZ78 algorithm published by Lempel and Ziv in 1978. The algorithm is designed to be fast to implement, but not necessarily optimal since it does not perform any analysis on the data.

Description

The compressor algorithm builds a string translation table from the text being compressed. The string translation table maps fixed-length codes (usually 12-bit) to strings. The string table is initialized with all single-character strings (256 entries in the case of 8-bit characters). As the compressor character-serially examines the text, it stores every unique two-character string into the table as a code/character concatenation, with the code mapping to the corresponding first character. As each two-character string is stored, the first character is outputted. Whenever a previously-encountered string is read from the input, the longest such previously-encountered string is determined, and then the code for this string concatenated with the extension character (the next character in the input) is stored in the table. The code for this longest previously-encountered string is outputted and the extension character is used as the beginning of the next string.

The decompressor algorithm only requires the compressed text as an input, since it can build an identical string table from the compressed text as it is recreating the original text. However, an abnormal case shows up whenever the sequence character/string/character/string/character (with the same character for each character and string for each string) is encountered in the input and character/string is already stored in the string table. When the decompressor reads the code for character/string/character in the input, it cannot resolve it because it has not yet stored this code in its table. This special case can be dealt with because the decompressor knows that the extension character is the previously-encountered character.

Pseudo-code


 1:  begin
 2:     tmp = ""
 3:     initialization of the dictionary
 4:     while ((code = read code) != EOF) do
 5:      begin
 6:        if ( is phrase with code (code) in dictionary ) then
 7:         begin
 8:           phrase := dictionary[code]
 9:           output(phrase)
10:           add to dictionary (tmp + first character of phrase)
11:         end
12:        else 
13:         begin
14:           if ( code > maximum index of dictionary + 1 ) then
15:              exit
16:           phrase := tmp + first char of tmp
17:           output(phrase)
18:           add to dictionary (phrase)
19:         end
20:        tmp := phrase
21:      end
22:  end

References

  1. M. Kopka: Dictionary compression (LZW). 1996.
  2. Miroslav Beneš:Data compression. 2005.
  3. LZW. Wikipedia, the free encyclopedia.
  4. Terry Welch: A Technique for High-Performance Data Compression. Computer, June 1984.
  5. Mark Nelson: LZW Data Compression. Dr. Dobb's Journal, October 1989.
  6. Stuart Caie: Sad day... GIF patent dead at 20.6.2003.
  7. Nathan Willis: Bringing back LZW compression. 2005.
  8. M. C. Battilana: The GIF Controversy: A Software Developer's Perspective. 2004.
  9. Interactive LZW compression.

Example