27 research outputs found

Unifying morphology resources with OntoLex-Morph: a case study in German

Author: Chiarcos Christian
Fäth Christian
Ionov Maxim
Publication date: 20/04/2023

The OntoLex vocabulary has become a widely used community standard for machine-readable lexical resources on the web. The primary motivation for using OntoLex rather than tool- or application-specific formalisms is to facilitate interoperability and information integration across different resources. One of its extensions currently under development is OntoLex-Morph, a module for representing morphology. In this paper, we show how OntoLex-Morph can be used to encode and integrate different types of morphological resources on a unified basis. Using German as the example, we demonstrate this for (a) a full-form dictionary with inflection information (Unimorph), (b) a dictionary of base forms and their derivations (UDer), (c) a dictionary of compounds (from GermaNet), and (d) the lexicon and inflection rules of a finite-state parser/generator (SMOR/Morphisto). These data are converted to OntoLex-Morph, their linguistic information is consolidated, and corresponding lexical entries are linked with each other. The main contribution of this paper is the discussion of the current state of OntoLex-Morph and its validation on different types of real-world resources for a single language. In the longer term, the successful application of OntoLex-Morph to such diverse data, along with the adjustments to the vocabulary observed in the process, will be a means to establish interoperability among morphological resources as well as between them and classical lexical data such as dictionaries, WordNets, or thesauri.

OPUS Augsburg

DeLex, a freely-avaible, large-scale and linguistically grounded morphological lexicon for German

Author: Sagot Benoît
Publication venue: HAL CCSD
Publication date: 26/05/2014

We introduce DeLex, a freely available, large-scale and linguistically grounded morphological lexicon for German developed within the Alexina framework. We extracted lexical information from the German Wiktionary and developed a morphological inflection grammar for German, based on a linguistically sound model of inflectional morphology. Although the development of DeLex involved some manual work, we show that it represents a good tradeoff between development cost, lexical coverage and resource accuracy.

INRIA a CCSD electronic archive server

Hal-Diderot

HFST—Framework for Compiling and Applying Morphologies

Author: A. Savary
A.V. Aho
C. Allauzen
H. Schmid
J.A. Brzozowski
K. Oflazer
K.R. Beesley
L. Karttunen
M. Huldén
M. Silfverberg
Publication venue: Springer
Publication date: 01/01/2011

HFST (Helsinki Finite-State Technology, hfst.sf.net) is a framework for compiling and applying linguistic descriptions with finite-state methods. HFST currently connects some of the most important finite-state tools for creating morphologies and spellers into one open-source platform and supports extending and improving the descriptions with weights to accommodate the modeling of statistical information. HFST offers a path from language descriptions to efficient language applications in key environments and operating systems. HFST also provides an opportunity to exchange transducers between different software providers in order to get the best out of each finite-state library.

Crossref

Helsingin yliopiston digitaalinen arkisto

Agreement Constraints for Statistical Machine Translation into German

Author: Koehn Philipp
Williams Philip
Publication date: 01/07/2011

Edinburgh Research Explorer

A case study in tagging case in German: an assessment of statistical approaches

Author: Clematide Simon
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

In this study, we assess the performance of purely statistical approaches using supervised machine learning for predicting case in German (nominative, accusative, dative, genitive, n/a). We experiment with two different treebanks containing morphological annotations: TIGER and TUEBA. An evaluation with 10-fold cross-validation serves as the basis for systematic comparisons of the optimal parametrizations of different approaches. We test taggers based on Hidden Markov Models (HMM), Decision Trees, and Conditional Random Fields (CRF). The CRF approach based on our hand-crafted feature model achieves an accuracy of about 94%. This outperforms all other approaches and results in an improvement of 11% compared to a baseline HMM trigram tagger and an improvement of 2% compared to a state-of-the-art tagger for rich morphological tagsets. Moreover, we investigate the effect of additional (morphological) categories (gender, number, person, part of speech) in the internal tagset used for the training. Rich internal tagsets improve results for all tested approaches.

Crossref

ZORA

SMM: Detailed, Structured Morphological Analysis for Spanish

Author: Mahlow Cerstin
Piotrowski Michael
Publication date: 13/05/2015

We present a morphological analyzer for Spanish called SMM. SMM is implemented in the grammar development framework Malaga, which is based on the formalism of Left-Associative Grammar. We briefly present the Malaga framework, describe the implementation decisions for some interesting morphological phenomena of Spanish, and report on the evaluation results from the analysis of corpora. SMM was originally designed only for analyzing word forms; in this article we outline two approaches for using SMM and the facilities provided by Malaga to also generate verbal paradigms. SMM can also be embedded into applications by making use of the Malaga programming interface; we briefly discuss some application scenarios.

Publikationsserver des Instituts für Deutsche Sprache

Proceedings of the 8th Workshop on Linked Data in Linguistics within the 13th Language Resources and Evaluation Conference (LREC2022), 20-25 June 2022, Marseille, France

Author: Chiarcos Christian
Declerck Thierry
Ionov Maxim
McCrae John Philip
Montiel Elena
Publication date: 20/04/2023

OPUS Augsburg

HFST—a System for Creating NLP Tools

Author: Axelson Erik
Drobac Senka
Hardwick Sam
Kuokkala Juha
Linden Krister
Niemi Jyrki
Pirinen Tommi
Silfverberg Miikka
Publication venue: Springer-Verlag
Publication date: 01/01/2013

The paper presents and evaluates various NLP tools that have been created using the open source library HFST (Helsinki Finite-State Technology) and outlines the minimal extensions that this has required to a pure finite-state system. In particular, the paper describes an implementation and application of p-match presented by Karttunen at SFCM 2011.

Crossref

Helsingin yliopiston digitaalinen arkisto