Dominik Köppl, Tomohiro I, Isamu Furuya, Yoshimasa Takabatake, Kensuke Sakai and Keisuke Goto
Re-Pair in Small Space
Abstract: |
Re-Pair is a grammar compression scheme with favorably good compression rates. The computation of Re-Pair comes with the cost of maintaining large frequency tables, which makes it hard to compute Re-Pair on large scale data sets. As a solution for this problem we present, given a text of length n whose characters are drawn from an integer alphabet with size σ = n^{O(1)}, an O(n^{2}) ᑎ O(n^{2} lg log_{τ}n lg lg lg n/log_{τ}n) time algorithm computing Re-Pair with max((n/c) lg n, n ⌈lg τ⌉) + O(lg n) bits of working space including the text space, where c ≥ 1 is a fix user-defined constant and τ is the sum of σ and the number of non-terminals. |
Download paper: | |||
PostScript | BibTeX reference |
Download presentation: |