Prague Stringology Conference 2021

Cedric Chauve, Marni Mishna and France Paquet-Nadeau

Refined Upper Bounds on the Size of the Condensed Neighbourhood of Sequences

The d-neighbourhood of a sequence s is the set of sequences that are at distance at most d from s, for a given measure of distance and parameter d. The condensed d-neighbourhood is obtained by removing from the neighbourhood any sequence having a prefix also in the neighbourhood. Estimating the maximum size of the condensed neighbourhood over all DNA sequences of a given length k for the Levenshtein distance is a problem related, among others, to the analysis of the BLAST (Basic Local Alignment Search Tool, Altschul et al., 1990). In this work, we analyse recurrences for computing an upper bound to the maximum size of the condensed d-neighbourhood for sequences of length k and provide a simpler asymptotic expression that we conjecture results in a dramatically improved upper bound.

Download paper: Article in PostScript Article in PDF BibTeX Reference
 PostScript   PDF   BibTeX reference 
Download presentation: Presentation