The Prague Stringology Conference 2003

Learning the Morphological Features of a Large Set of Words

Abstract:

Given K, a large set of words, this paper presents a new method for learning the morphological features of K. The method, LMF, has two components: preprocessing and processing. The first component makes use of two separate methods, namely refinement and time-space optimization. The former uses the closed world assumption of default logic to partition K into a set of hierarchical languages. The latter efficiently learns the morphological features of each language output by the former. Although finite-state transducers or the two-trie structure can be used to map a language onto a set of values, we use our own recently proposed competitor for such a mapping, which associates a finite-state automaton accepting the input language with a decision tree (dt) representing the output values. The advantages of this approach are that it leads to more compact representations than transducers, and that decision trees can easily be synthesized by machine learning techniques. In the processing phase, given an input string x, the hierarchy of languages establishes the preference order in which the automata are used: if x can be spelled out using the current automaton g_i, the output is returned by its counterpart decision tree dt_i; otherwise, the other alternatives are inspected in turn until an output is produced or failure is reported. LMF has learned good strategies for large sets of words whose processing is costly in both space and time, such as all the verbs in French, including all the conjugated forms of each verb.
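
The processing phase described above can be illustrated with a minimal sketch. It is not the LMF implementation: the names `make_acceptor`, `lookup`, and `LEVELS` are hypothetical, a plain predicate stands in for each automaton g_i, an ordinary function stands in for each decision tree dt_i, and the French forms and feature labels are toy data chosen for illustration only.

```python
from typing import Callable, Iterable, Optional, Sequence, Tuple

# Assumption: an "automaton" is modelled as any predicate over strings, and a
# "decision tree" as any function from an accepted string to a feature label.
Acceptor = Callable[[str], bool]
DecisionTree = Callable[[str], str]


def make_acceptor(words: Iterable[str]) -> Acceptor:
    """Toy stand-in for an automaton g_i accepting a finite set of words."""
    vocabulary = frozenset(words)
    return lambda x: x in vocabulary


def lookup(x: str, levels: Sequence[Tuple[Acceptor, DecisionTree]]) -> Optional[str]:
    """Try each (g_i, dt_i) pair in preference order; None signals failure."""
    for accepts, tree in levels:
        if accepts(x):
            return tree(x)
    return None


# Level 1: a small language of exceptions, consulted first.
exceptions = make_acceptor({"sommes", "sont"})


def dt_exceptions(x: str) -> str:
    return "être, present, 1pl" if x == "sommes" else "être, present, 3pl"


# Level 2: a more general language, consulted only if level 1 rejects x.
def regular(x: str) -> bool:
    return x.endswith("ons") or x.endswith("ez")


def dt_regular(x: str) -> str:
    return "present, 1pl" if x.endswith("ons") else "present, 2pl"


LEVELS = [(exceptions, dt_exceptions), (regular, dt_regular)]

if __name__ == "__main__":
    for word in ("sommes", "parlons", "parlez", "xyz"):
        print(word, "->", lookup(word, LEVELS))
```

Placing the exception language ahead of the general one mirrors the preference order that the hierarchical languages establish: a string is handled by the first automaton that accepts it, and a string rejected by every level yields failure.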
