Search CORE

289 research outputs found

Morphological annotation of Korean with Directly Maintainable Resources

Author: Berlocher Ivan
Huh Hyun-Gue
Laporte Eric
Nam Jee-Sun
Publication venue
Publication date: 01/01/2006
Field of study

This article describes an exclusively resource-based method of morphological annotation of written Korean text. Korean is an agglutinative language. Our annotator is designed to process text before the operation of a syntactic parser. In its present state, it annotates one-stem words only. The output is a graph of morphemes annotated with accurate linguistic information. The granularity of the tagset is 3 to 5 times higher than usual tagsets. A comparison with a reference annotated corpus showed that it achieves 89% recall without any corpus training. The language resources used by the system are lexicons of stems, transducers of suffixes and transducers of generation of allomorphs. All can be easily updated, which allows users to control the evolution of the performances of the system. It has been claimed that morphological annotation of Korean text could only be performed by a morphological analysis module accessing a lexicon of morphemes. We show that it can also be performed directly with a lexicon of words and without applying morphological rules at annotation time, which speeds up annotation to 1,210 word/s. The lexicon of words is obtained from the maintainable language resources through a fully automated compilation process

arXiv.org e-Print Archive

CiteSeerX

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

AmAMorph: Finite State Morphological Analyzer for Amazighe

Author: Driss Aboutajdine
Fatima Zahra Nejme
Siham Boulaknadel
Publication venue: 'Faculty of Electrical Engineering and Computing, Univ. of Zagreb'
Publication date: 01/01/2016
Field of study

This paper presents AmAMorph, a morphological analyzer for Amazighe language using a system based on the NooJ linguistic development environment. The paper begins with the development of Amazighe lexicons with large coverage formalization. The built electronic lexicons, named ‘NAmLex’, ‘VAmLex’ and ‘PAmLex’ which stand for ‘Noun Amazighe Lexicon’, ‘Verb Amazighe Lexicon’ and ‘Particles Amazighe Lexicon’, link inflectional, morphological, and syntacticsemantic information to the list of lemmas. Automated inflectional and derivational routines are applied to each lemma producing over inflected forms. To our knowledge,AmAMorph is the first morphological analyzer for Amazighe. It identifies the component morphemes of the forms using large coverage morphological grammars. Along with the description of how the analyzer is implemented, this paper gives an evaluation of the analyzer

Directory of Open Access Journals

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

Dependency parsing with an extended finite-state approach

Author: Kaplan Ronald M
Kemal Oflazer
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2003
Field of study

This article presents a dependency parsing scheme using an extended finite-state approach. The parser augments input representation with "channels" so that links representing syntactic dependency relations among words can be accommodated and iterates on the input a number of times to arrive at a fixed point. Intermediate configurations violating various constraints of projective dependency representations such as no crossing links and no independent items except sentential head are filtered via finite-state filters. We have applied the parser to dependency parsing of Turkish

CiteSeerX

Crossref

Sabanci University Research Database

Inquiries into words, constraints and contexts : Festschrift in the honour of Kimmo Koskenniemi on his 60th birthday

Author: Arppe Antti
Carlson Lauri
Linden Krister
Piitulainen Jussi Olavi
Suominen Mickael
Vainio Martti
Westerlund Hanna
Yli-Jyrä Anssi Mikael
Publication venue: CSLI publications
Publication date: 01/01/2005
Field of study

Peer reviewe

Helsingin yliopiston digitaalinen arkisto

A Semi-automatic and Low Cost Approach to Build Scalable Lemma-based Lexical Resources for Arabic Verbs

Author: Abdelali Ahmed,
Doumi Noureddine
Lehireche Ahmed
Maurel Denis
Publication venue: IJIT
Publication date: 01/02/2016
Field of study

International audienceThis work presents a method that enables Arabic NLP community to build scalable lexical resources. The proposed method is low cost and efficient in time in addition to its scalability and extendibility. The latter is reflected in the ability for the method to be incremental in both aspects, processing resources and generating lexicons. Using a corpus; firstly, tokens are drawn from the corpus and lemmatized. Secondly, finite state transducers (FSTs) are generated semi-automatically. Finally, FSTsare used to produce all possible inflected verb forms with their full morphological features. Among the algorithm’s strength is its ability to generate transducers having 184 transitions, which is very cumbersome, if manually designed. The second strength is a new inflection scheme of Arabic verbs; this increases the efficiency of FST generation algorithm. The experimentation uses a representative corpus of Modern Standard Arabic. The number of semi-automatically generated transducers is 171. The resulting open lexical resources coverage is high. Our resources cover more than 70% Arabic verbs. The built resources contain 16,855 verb lemmas and 11,080,355 fully, partially and not vocalized verbal inflected forms. All these resources are being made public and currently used as an open package in the Unitex framework available under the LGPL license

Directory of Open Access Journals

HAL Descartes

HAL Université de Tours

Hal-Diderot

Supistavan äärellistilaisen dependenssijäsentimen suunnittelun tarkennusta

Author: Piitulainen Jussi
Voutilainen Atro
Yli-Jyrä Anssi Mikael
Publication venue: ACL Anthology
Publication date: 23/07/2012
Field of study

Proceeding volume: 2012This work complements a parallel paper of a new finite-state dependency parser architecture (Yli-Jyrä, 2012) by a proposal for a linguistically elaborated morphology-syntax interface and its finite-state implementation. The proposed interface extends Gaifman’s (1965) classical dependency rule formalism by separating lexical word forms and morphological categories from syntactic categories. The separation lets the linguist take advantage of the morphological features in order to reduce the number of dependency rules and to make them lexically selective. In addition, the relative functional specificity of parse trees gives rise to a measure of parse quality. By filtering worse parses out from the parse forest using finite-state techniques, the best parses are saved. Finally, we present a synthesis of strict grammar parsing and robust text parsing by connecting fragmental parses into trees with additional linear successor links.Peer reviewe

Helsingin yliopiston digitaalinen arkisto

Finite State Morphology and Sindhi Noun Inflections

Author: Bhatti Mohammad Iqbal
Rahman Mutee U
Publication venue: Institute of Digital Enhancement of Cognitive Processing, Waseda University
Publication date: 01/01/2011
Field of study