Different Issues In The Design Of A Lemmatizer/Tagger For Basque

Abstract

This paper presents relevant issues that have been considered in the design of a general-purpose lemmatizer/tagger for Basque (EUSLEM). The lemmatizer/tagger is conceived as a basic tool needed by other linguistic applications. It uses the previously developed and implemented lexical database and morphological analyzer. Because of the characteristics of the language, the tagset proposed here is structured in four levels, so that each level refines the previous one by adding more detailed information. We focus on the problems found in designing this tagset and on the strategies that will be used for morphological disambiguation.

1. Introduction

This paper describes the development of a general-purpose lemmatizer/tagger for Basque, which will lay the foundations for further applications in the automatic processing of Basque texts. The following basic tools will be used to carry out this project:

- The Lexical Database for Basque (LDB..
