DominikKoppl, Tomohiro I, Isamu Furuya, Yoshimasa Takabatake, Kensuke Sakai and Keisuke Goto
Re-Pair in Small Space
|Re-Pair is a grammar compression scheme with favorably good compression rates. The computation of Re-Pair comes with the cost of maintaining large frequency tables, which makes it hard to compute Re-Pair on large scale data sets. As a solution for this problem we present, given a text of length n whose characters are drawn from an integer alphabet with size σ = nO(1), an O(n2) ᑎ O(n2 lg logτn lg lg lg n/logτn) time algorithm computing Re-Pair with max((n/c) lg n, n ⌈lg τ⌉) + O(lg n) bits of working space including the text space, where c ≥ 1 is a fix user-defined constant and τ is the sum of σ and the number of non-terminals.