# DCA (Data Compression using Antidictionaries)

## Main features

- context compression method
- contextual method
- works with binary alphabet

## History

Method was developed in 1998 by M. Crochemore, F. Mignosi, A. Restivo, S. Salemi.

## Description

Algorithm uses *antidictionary* (AD) - a set of words (of certain maximal length),
which are not contained in input text. AD doesn't contain all such words, but just *minimal forbidden words - MFW*.
MFW is word u[1..n], u doesn't occur in input text, but both u[1..n-1] and u[2..n] do.

When processing i-th letter of input and there is a word v[1..n] in antidictionary, that v[1..n-1] is a suffix of previously processed input, we don't need to code i-th letter into output, because (when decoding) we can find that letter with a help of the suffix andd the antidictionary. Other input letters are normally copied to output.

### Applet description

When building a factor automaton, the node being just processed is blue-colored and newly inserted node is colored red. After finishing the automaton all nodes that have just one child are marked blue. Then we add new nodes (to pair with "lonely children") and mark them red.

During the process of encoding we can see all suffixes of processed input and each of these suffixes has additional "0" or "1" at the end (marked red).

## Pseudo-code

```
1:
```**function** Create AD (input word *w*, AD's word's max length *l*)
2: **begin**
3: create factor automaton matching all factors (of maximal length *l*) of input text
4: each node has value equal to corresponding factor
5: **for each** (node *U*) **do**
6: **begin**
7: **if** (*U* has just one child) **then**
8: **begin**
9: let *h* is value of edge leading from *U* (*h* ∈ {0,1})
10: create node *U'* and mark it as *forbidden word*
11: create edge from *U* to *U'* with value ¬*h*
12: create suffix-link - edge from *U* (with value *s*[1..n])
13: to *U''* with value *s*[2..n] (if *U''* doesn't exist,
14: try *s*[3..n] etc.)
15: **end**
16: **end**
17: **for each** (node *U* marked as *forbidden word*) **do**
18: **begin**
19: let *P* is direct ancestor of *U*
20: let *S* is node, where suffix-link from *P* leads
21: let *h* is value of edge from *P* to *U*
22: **if** ((exists node *T* linked by edge with value *h* from *S*)
23: **and** (*T* is not *forbidden word*)) **then**
24: **begin**
25: mark *U* as *minimal forbidden word - MFW*
26: **end**
27: **end**
28: return values of all nodes marked as *MFW* - antidictionary AD
29: **end**
30:
31: **function** Process Input (antidictionary *AD*, input word *w*)
32: **begin**
33: let *v* = ε {processed input}
34: let *g* = ε {output}
35: **for** (*i* := 1 **to** |*w*|) **do**
36: **begin**
37: let *M0* = set of all suffixes of *v* with additional 0 as last letter
38: let *M0* = set of all suffixes of *v* with additional 1 as last letter
39: **if** ((*M0* ∪ *M1*) ∩ *AD* ≠ Ø) **then**
40: *g* := *g*.*w*_{i}
41: *v* := *v*.*w*_{i}
42: **end**
43: **output**(*g*)
44: **end**

## References

- M. Crochemore, F. Mignosi, A. Restiva and S. Salemi: Text Compression Using Antidictionaries. 1998.