    The Tanl Lemmatizer Enriched with a Sequence of Cascading Filters

    We have extended an existing lemmatizer, which relies on a lexicon of about 1.2 million forms in which lemmas are indexed by rich PoS tags, with a sequence of cascading filters, each in charge of dealing with specific issues related to out-of-dictionary words. The last two filters are devoted to resolving semantic ambiguities between words of the same syntactic category by querying external resources: an enriched index built on the Italian Wikipedia and the Google index.
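
The cascade described in the abstract can be illustrated with a minimal Python sketch. It is not the Tanl implementation: the toy lexicon, the function names, and the clitic-stripping and fallback filters are all hypothetical, chosen only to show how a form might pass through successive filters until one of them proposes a lemma. The Wikipedia- and Google-based disambiguation filters are not modelled.

```python
from typing import Callable, Optional, Sequence

# Hypothetical toy lexicon: (form, PoS tag) -> lemma.
# The real lexicon holds about 1.2 million forms; these entries are made up.
LEXICON = {
    ("gatti", "NOUN"): "gatto",
    ("mangia", "VERB"): "mangiare",
}

# A filter takes a form and its PoS tag and returns a lemma, or None to pass.
Filter = Callable[[str, str], Optional[str]]


def lexicon_lookup(form: str, pos: str) -> Optional[str]:
    """First stage: plain lookup keyed on the lowercased form and PoS tag."""
    return LEXICON.get((form.lower(), pos))


def strip_clitic(form: str, pos: str) -> Optional[str]:
    """Hypothetical filter: retry the lookup without a trailing clitic."""
    if pos != "VERB":
        return None
    for clitic in ("lo", "la", "li", "le"):
        if form.lower().endswith(clitic):
            lemma = lexicon_lookup(form[: -len(clitic)], pos)
            if lemma is not None:
                return lemma
    return None


def lowercase_fallback(form: str, pos: str) -> Optional[str]:
    """Hypothetical last resort: use the lowercased form as its own lemma."""
    return form.lower()


DEFAULT_FILTERS: Sequence[Filter] = (lexicon_lookup, strip_clitic, lowercase_fallback)


def lemmatize(form: str, pos: str, filters: Sequence[Filter] = DEFAULT_FILTERS) -> str:
    """Run the filters in order and return the first lemma any of them proposes."""
    for candidate_filter in filters:
        lemma = candidate_filter(form, pos)
        if lemma is not None:
            return lemma
    return form  # no filter answered: leave the form unchanged


if __name__ == "__main__":
    for form, pos in [("gatti", "NOUN"), ("mangialo", "VERB"), ("Xyz", "NOUN")]:
        print(form, pos, lemmatize(form, pos))
```

Ordering the filters from most to least reliable means a later, more speculative filter is consulted only when every earlier one has declined to answer, which is the property a cascade is meant to provide.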