Adding frequencies to the LGLex lexicon with IRASUBCAT

Abstract

We present a method for enlarging a lexicon with frequency information, which is useful for parsing and other NLP applications. As an example, we enlarge the verbal LGLex lexicon of French [8] using several corpora taken from Passage [5], the evaluation campaign for French parsers. To do so, we apply IRASubcat to the output of the frmg parser [7]. IRASubcat is a tool that automatically acquires subcategorization frames from corpora in any language and can also be used to complete an existing lexicon. We obtain the occurrence frequencies of each entry and each subcategorization frame for 14,068 distinct lemmas.

Sociedad Argentina de Informática e Investigación Operativ
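The counting step the abstract describes (frequencies per lemma and per subcategorization frame, plus completing an existing lexicon) can be sketched as below. This is a minimal illustration under stated assumptions, not IRASubcat's actual implementation or input format: it assumes the parser output has already been reduced to hypothetical (lemma, frame) pairs, that frames are plain strings such as "subj:obj", and that the lexicon is a simple mapping from lemma to a set of known frames.

```python
from collections import Counter, defaultdict


def frame_frequencies(observations):
    """Count how often each subcategorization frame occurs with each lemma.

    `observations` is an iterable of (lemma, frame) pairs. This is a
    hypothetical simplification of parser output; frames are plain strings.
    """
    counts = defaultdict(Counter)
    for lemma, frame in observations:
        counts[lemma][frame] += 1
    return counts


def relative_frequencies(counts):
    """Turn raw per-lemma counts into relative frequencies summing to 1."""
    rel = {}
    for lemma, frames in counts.items():
        total = sum(frames.values())
        rel[lemma] = {frame: n / total for frame, n in frames.items()}
    return rel


def new_frames(lexicon, counts):
    """Return frames observed in the corpus but absent from the lexicon.

    `lexicon` is a hypothetical dict mapping each lemma to a set of frames.
    Lemmas missing from the lexicon contribute all their observed frames.
    """
    missing = {}
    for lemma, frames in counts.items():
        known = lexicon.get(lemma, set())
        extra = set(frames) - known
        if extra:
            missing[lemma] = extra
    return missing


if __name__ == "__main__":
    # Toy observations; the frame labels are illustrative only.
    obs = [
        ("manger", "subj:obj"),
        ("manger", "subj"),
        ("manger", "subj:obj"),
        ("donner", "subj:obj:a-obj"),
    ]
    counts = frame_frequencies(obs)
    print(relative_frequencies(counts))
    print(new_frames({"manger": {"subj:obj"}}, counts))
```

Keeping raw counts separate from relative frequencies lets the same data serve both uses mentioned in the abstract: ranking frames for a parser and flagging frames that an existing lexicon entry lacks.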

Similar works