The Prague Stringology Conference 2005

Jussi Rautio

Context-dependent Stopper encoding

Abstract:
A character-based encoding method is presented for natural-language texts and genetic data. For medium-length and longer patterns, exact string matching in the encoded text is faster than in the original text. A compression ratio of about 50% is achieved as a by-product. The method encodes characters as variable-length codewords built from 2-bit base symbols. An advanced variant is context-dependent: it chooses the codeword using information from the previous character. The method outperforms previous comparable methods in compression ratio and is comparable to the best of them in search speed.
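
The abstract does not spell out how the codewords are built, so the following is only a minimal sketch of one way a variable-length code over 2-bit base symbols could work. It assumes that the four base symbols are split into two "stoppers", which end a codeword, and two "continuers", which never end one, and that shorter codewords go to more frequent characters. The split and the names STOPPERS, CONTINUERS, codewords, build_code, encode and decode are illustrative assumptions, not taken from the paper. The context-dependent variant is not covered.

```python
from collections import Counter
from itertools import count, product

# Assumed split of the four 2-bit base symbols (0..3); hypothetical choice.
STOPPERS = (0, 1)      # a stopper always ends a codeword
CONTINUERS = (2, 3)    # a continuer never ends a codeword


def codewords():
    """Yield codewords in order of increasing length.

    A codeword is zero or more continuers followed by exactly one stopper.
    """
    for n_cont in count(0):
        for prefix in product(CONTINUERS, repeat=n_cont):
            for stopper in STOPPERS:
                yield prefix + (stopper,)


def build_code(text):
    """Map each character to a codeword, most frequent characters first."""
    ranked = [ch for ch, _ in Counter(text).most_common()]
    return dict(zip(ranked, codewords()))


def encode(text, code):
    """Return the text as a flat list of 2-bit base symbols."""
    return [sym for ch in text for sym in code[ch]]


def decode(symbols, code):
    """Invert encode(): cut the symbol stream after every stopper."""
    inverse = {cw: ch for ch, cw in code.items()}
    out, current = [], []
    for sym in symbols:
        current.append(sym)
        if sym in STOPPERS:
            out.append(inverse[tuple(current)])
            current = []
    return "".join(out)


if __name__ == "__main__":
    text = "abracadabra"
    code = build_code(text)
    encoded = encode(text, code)
    assert decode(encoded, code) == text
    print(len(encoded) * 2, "bits encoded vs", len(text) * 8, "bits original")
```

Because every codeword ends at a stopper, codeword boundaries can be found from any position in the encoded stream, which is the kind of property that lets a pattern be encoded once and then searched for directly in the encoded text.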
