778 research outputs found

A framework for lexical representation

Authors: González José C.; Goñi José M.
Publication date: 01/01/1995

In this paper we present a unification-based lexical platform designed for highly inflected languages (such as the Romance languages). A formalism is proposed for encoding a lemma-based lexical source, well suited for linguistic generalizations. From this source, we automatically generate an allomorph-indexed dictionary, adequate for efficient processing. A set of software tools has been implemented around this formalism: access libraries, morphological processors, etc. Comment: 9 pages

arXiv.org e-Print Archive

CiteSeerX

Archivo Digital UPM

An extended spell checker for unknown words

Author: Indig Balázs
Publication venue: Pazmany Peter Katolikus Egyetem
Publication date: 01/01/2013

Repository of the Academy's Library

SMM: Detailed, Structured Morphological Analysis for Spanish

Authors: Mahlow Cerstin; Piotrowski Michael
Publication date: 13/05/2015

We present a morphological analyzer for Spanish called SMM. SMM is implemented in the grammar development framework Malaga, which is based on the formalism of Left-Associative Grammar. We briefly present the Malaga framework, describe the implementation decisions for some interesting morphological phenomena of Spanish, and report on the evaluation results from the analysis of corpora. SMM was originally designed only for analyzing word forms; in this article we outline two approaches for using SMM and the facilities provided by Malaga to also generate verbal paradigms. SMM can also be embedded into applications by making use of the Malaga programming interface; we briefly discuss some application scenarios.

Publikationsserver des Instituts für Deutsche Sprache

TectoMT – a deep-linguistic core of the combined Chimera MT system

Authors: Bojar Ondřej; Hajič Jan; Popel Martin; Rosa Rudolf; Sudarikov Roman
Publication date: 01/01/2016

Chimera is a machine translation system that combines the TectoMT deep-linguistic core with the phrase-based MT system Moses. For the English–Czech pair it also uses the Depfix post-correction system. All the components run on Unix/Linux platforms and are open source (available from the Perl repository CPAN and the LINDAT/CLARIN repository). The main website is https://ufal.mff.cuni.cz/tectomt. The development is currently supported by the QTLeap 7th FP project (http://qtleap.eu).

Biblio at Institute of Formal and Applied Linguistics

Grammar Enhanced Biliteracy: Naskapi Language Structures For Facilitating Reading In Naskapi

Author: Jancewicz William
Publication venue: UND Scholarly Commons
Publication date: 01/01/2013

The Naskapi language is the language of instruction in the early primary grades of the school in the Naskapi community. Only recently have Naskapi-speaking teachers received formal instruction in pedagogy, with a cohort of Naskapi teachers following courses for their Bachelor of Education degree towards careers teaching in the Naskapi language in their local school. These adults are highly motivated to become literate in their mother tongue in order to teach or prepare curriculum materials in the Naskapi language. This thesis explores how basic grammatical structures can be mastered and provides insight into the form that pedagogical grammatical instruction should take in order to equip these individuals to become adequately literate in their mother tongue.

UND Scholarly Commons (University of North Dakota)

Natural language understanding: a new challenge for grammar systems

Author: Martín-Vide Carlos
Publication date: 01/01/1996

University of Szeged

Towards an Optimal Solution to Lemmatization in Arabic

Author: Mourad Abbas
Publication date: 01/01/2018

Lemmatization (computing the canonical forms of words in running text) is an important component in any NLP system and a key preprocessing step for most applications that rely on natural language understanding. In the case of Arabic, lemmatization is a complex task because of the rich morphology, agglutinative aspects, and lexical ambiguity due to the absence of short vowels in writing. In this paper, we introduce a new lemmatizer tool that combines a machine-learning-based approach with a lemmatization dictionary, the latter providing increased accuracy, robustness, and flexibility to the former. Our evaluations yield a performance of over 98% for the entire lemmatization pipeline. The lemmatizer tools are freely downloadable for private and research purposes.

Open Access Repository

Inquiries into words, constraints and contexts: Festschrift in the honour of Kimmo Koskenniemi on his 60th birthday

Authors: Arppe Antti; Carlson Lauri; Linden Krister; Piitulainen Jussi Olavi; Suominen Mickael; Vainio Martti; Westerlund Hanna; Yli-Jyrä Anssi Mikael
Publication venue: CSLI publications
Publication date: 01/01/2005

Peer reviewed

Helsingin yliopiston digitaalinen arkisto