The Prague Stringology Conference 2002

Heikki Hyyrö

A Bit-Vector Algorithm for Computing Levenshtein and Damerau Edit Distances

Abstract:
The edit distance between strings A and B is defined as the minimum number of edit operations needed in converting A into B or vice versa. The Levenshtein edit distance allows three types of operations: an insertion, a deletion or a substitution of a character. The Damerau edit distance allows the previous three plus in addition a transposition between two adjacent characters. To our best knowledge the best current practical algorithms for computing these edit distances run in time O(dm) and O(s + ( m/w ) n), where d is the edit distance between the two strings, m and n are their lengths (m <= n), w is the computer word size and s is the size of the alphabet. In this paper we present an algorithm that runs in time O(s + ( d/w ) m). The structure of the algorithm is such, that in practice it is mostly suitable for testing whether the edit distance between two strings is within some pre-determined error threshold. We also present some initial test results with thresholded edit distance computation. In them our algorithm works faster than the original algorithm of Myers.

Download paper: Article in PostScript Article in PDF
 PostScript   PDF