Prague Stringology Conference 2025

Hamed Hasibi, Mhaskar Neerja and William F. Smyth

Approximate Longest Common Substring of Multiple Strings: Experimental Evaluation

Abstract:
Determining an Approximate Longest Common Substring (ALCS) among a collection of strings S=s1, s2, ..., sm, m ≥ 2, is especially significant in computational biology – for instance, when tracking related mutations across several genomic sequences. To address this, the Restricted k-t Longest Common Substring (Rkt-LCS) problem was introduced by Hasibi et al. (2025), along with several efficient solutions under various distance metrics. In this paper, we describe and evaluate two parallel strategies for computing the Rkt-LCS of a set S of strings under the Hamming distance metric. The first strategy parallelizes the computation of the core data structure introduced in Hasibi et al. (2025), while the second makes use of a Graphics Processing Unit (GPU) that we demonstrate executes over a hundred times faster than its CPU counterpart.

Download paper: Article in PostScript Article in PDF BibTeX Reference
 PostScript   PDF   BibTeX reference 
Download presentation: Presentation