Prague Stringology Conference 2017

Martin Berglund and Brink van der Merwe

Regular Expressions with Backreferences Re-examined

Abstract:
Most modern regular expression matching libraries (one of the rare exceptions being Google's RE2) allow backreferences, operations which bind a substring to a variable allowing it to be matched again verbatim. However, different implementations not only vary in the syntax permitted when using backreferences, but both implementations and definitions in the literature offer up a number of different variants on how backreferences match. Our aim is to compare the various flavors by considering the formal languages that each can describe, resulting in the establishment of a hierarchy of language classes. Beyond the hierarchy itself, some complexity results are given, and as part of the effort on comparing language classes new pumping lemmas are established, and old ones extended to new classes.

Download paper: Article in PostScript Article in PDF BibTeX Reference
 PostScript   PDF   BibTeX reference