Prague Stringology Conference 2010

Domenico Cantone, Simone Faro and Emanuele Giaquinta

Approximate String Matching Allowing for Inversions and Translocations

Abstract:
The approximate string matching problem consists in finding all locations at which a pattern P of length m matches a substring of a text T of length n, after a given finite number of edit operations. In this paper we investigate such problem when the string distance involves translocations of equal length adjacent factors and inversions of factors. In particular, we devise a O(nm max(α,β))-time and O(m2)-space algorithm, where α and β are respectively the maximum length of the factors involved in any translocation and inversion. Our algorithm is based on the dynamic-programming approach and makes use of the Directed Acyclic Word Graph of the pattern. Moreover we show that under the assumptions of equiprobability and independence of characters our algorithm has a O(n logσm) average time complexity. Finally, we briefly sketch in an appendix an efficient implementation, based on bit-parallelism.

Download article: Article in PostScript Article in PDF BibTeX Reference
 PostScript   PDF   BibTeX reference