3 research outputs found

    A set of open-source tools for Turkish natural language processing

    Abstract. This paper introduces a set of freely available, open-source tools for Turkish that are built around TRmorph, a morphological analyzer introduced earlier in Çöltekin (2010a). The article first provides an update on the analyzer, which includes a complete rewrite using a different finite-state description language and tool set, as well as major tagset changes to comply better with the state of the art in computational processing of Turkish and with the user requests received so far. Besides these major changes to the analyzer, this paper introduces tools for morphological segmentation, stemming and lemmatization, guessing unknown words, grapheme-to-phoneme conversion, hyphenation and a morphological disambiguation

    Baltic and Nordic Parts of the European Linguistic Infrastructure

    This paper describes the scientific, technical and legal work done to create a linguistic infrastructure for the Nordic and Baltic countries. It covers research on assessing language technology support for the languages of the Baltic and Nordic countries, on establishing a language resource sharing infrastructure, and on collecting and describing linguistic resources. We present improvements necessary to ensure usability and interoperability of language resources, discuss intellectual property rights (IPR) issues for complex resources, and describe the extension of the infrastructure through integration of language-resource-specific repositories. Work on treebanks, wordnets, terminology resources and finite-state technology is described in more detail. Finally, our approach to ensuring the sustainability of the infrastructure is discussed. Peer reviewed.

    Using HFST for Creating Computational Linguistic Applications

    Abstract. HFST – Helsinki Finite-State Technology (hfst.sf.net) is a framework for compiling and applying linguistic descriptions with finite-state methods. HFST currently collects some of the most important finite-state tools for creating morphologies and spellcheckers into one open-source platform and supports extending and improving the descriptions with weights to accommodate the modeling of statistical information. HFST offers a path from language descriptions to efficient language applications. In this article, we focus on aspects of HFST that are new to the end user, i.e. new tools, new features in existing tools, or new language applications, in addition to some revised algorithms that increase performance.