Prague Stringology Conference 2023

Holly Koponen, Mhaskar Neerja and William F. Smyth

Improved Practical Algorithms to Compute Maximal Covers

Abstract:
A cover of a string x = x[1..n] is a repeating substring u of x such that every position in x lies within an occurrence of u. Since very few strings possess a cover, it becomes interesting to compute various kinds of cover generalizations. Here we describe algorithms to compute a maximal cover; that is, a repeating substring u of x that covers M = Mx positions, the maximum coverage attained by any repeating substring of x. In 2015, an O(n log n)-time algorithm to compute a maximal cover was proposed; but the algorithm was complex, making use of annotated suffix trees. In 2022, an O(n2)-time maximal cover algorithm was implemented and evaluated on protein sequences. In this paper, we propose two simple O(n2)-time algorithms for this problem that, nonetheless, as we show by experiment, execute in linear time in many cases that arise in practice, and are much faster than the algorithm recently implemented in 2022. On the other hand, when experiments are restricted to the highly repetitive Fibonacci strings, the behaviour of both algorithms is clearly quadratic.

Download paper: Article in PostScript Article in PDF BibTeX Reference
 PostScript   PDF   BibTeX reference