Veli Makinen, Gonzalo Navarro and Esko Ukkonen
Matching Numeric Strings under Noise
Abstract: |
Numeric string is a sequence of symbols from an alphabet Σ subset of U where U$ is some numerical universe closed under addition and subtraction. Given two numeric strings A=a_{1} ... a_{m} and B=b_{1} ... b_{n} and a distance function d(A,B) that gives the score of the best (partial) matching of A and B, the transposition invariant distance is min_{t in U}{d(A+t,B)}, where A + t=(a_{1}+t)(a_{2}+t)...(a_{m}+t). The corresponding matching problem is to find occurrences j of A in B where d(A+t,B_{j'...j}) is smaller than some given threshold and B_{j'...j} is a substring of B. In this paper, we give efficient algorithms for matching numeric strings - with and without transposition invariance - under noise; we consider distance functions d(A,B) such that symbols a \in A and b\in B can be matched if |b-a|≤ δ, or the κ largest differences |b-a| can be discarded. |
Download paper: | ||
PostScript |