DepAnn - An Annotation Tool for Dependency Treebanks

Author: Kakkonen Tuomo
Publication date: 01/01/2005

DepAnn is an interactive annotation tool for dependency treebanks, providing both graphical and text-based annotation interfaces. The tool is aimed at the semi-automatic creation of treebanks. It aids the manual inspection and correction of automatically created parses, making the annotation process faster and less error-prone. A novel feature of the tool is that it enables the user to view outputs from several parsers as the basis for creating the final tree to be saved to the treebank. DepAnn uses TIGER-XML, an XML-based general encoding format, both for representing the parser outputs and for saving the annotated treebank. The tool includes an automatic consistency checker for sentence structures. In addition, the tool enables users to build structures manually, add comments on the annotations, modify the tagsets, and mark sentences for further revision.

arXiv.org e-Print Archive

CiteSeerX

Error-tolerant Finite State Recognition with Applications to Morphological Analysis and Spelling Correction

Author: Oflazer Kemal
Publication date: 21/07/1995

Error-tolerant recognition enables the recognition of strings that deviate mildly from any string in the regular set recognized by the underlying finite state recognizer. Such recognition has applications in error-tolerant morphological processing, spelling correction, and approximate string matching in information retrieval. After a description of the concepts and algorithms involved, we give examples from two applications: In the context of morphological analysis, error-tolerant recognition allows misspelled input word forms to be corrected, and morphologically analyzed concurrently. We present an application of this to error-tolerant analysis of agglutinative morphology of Turkish words. The algorithm can be applied to morphological analysis of any language whose morphology is fully captured by a single (and possibly very large) finite state transducer, regardless of the word formation processes and morphographemic phenomena involved. In the context of spelling correction, error-tolerant recognition can be used to enumerate correct candidate forms from a given misspelled string within a certain edit distance. Again, it can be applied to any language with a word list comprising all inflected forms, or whose morphology is fully described by a finite state transducer. We present experimental results for spelling correction for a number of languages. These results indicate that such recognition works very efficiently for candidate generation in spelling correction for many European languages such as English, Dutch, French, German, Italian (and others) with very large word lists of root and inflected forms (some containing well over 200,000 forms), generating all candidate solutions within 10 to 45 milliseconds (with edit distance 1) on a SparcStation 10/41. For spelling correction in Turkish, error-tolerant

Comment: Replaces 9504031. gzipped, uuencoded postscript file. To appear in Computational Linguistics Volume 22 No:1, 1996. Also available as ftp://ftp.cs.bilkent.edu.tr/pub/ko/clpaper9512.ps.
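
The candidate-generation idea described in the abstract above (enumerating correct forms within a given edit distance of a misspelled string) can be illustrated with a small sketch. This is not the algorithm from the paper: as a stand-in for a finite state recognizer it assumes a plain word list stored in a trie, and it uses ordinary Levenshtein distance (insertions, deletions, substitutions, no transpositions). The names build_trie and candidates are hypothetical.

```python
def build_trie(words):
    """Store a word list as a nested-dict trie (a stand-in for a recognizer)."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = word  # the key None marks the end of a complete word
    return root


def candidates(trie, query, max_dist):
    """Return sorted (word, distance) pairs within max_dist edits of query."""
    results = []
    first_row = list(range(len(query) + 1))

    def walk(node, ch, prev_row):
        # Extend the Levenshtein table by one row for the trie edge `ch`.
        row = [prev_row[0] + 1]
        for i in range(1, len(query) + 1):
            cost = 0 if query[i - 1] == ch else 1
            row.append(min(row[i - 1] + 1,          # insertion
                           prev_row[i] + 1,         # deletion
                           prev_row[i - 1] + cost)) # match or substitution
        if None in node and row[-1] <= max_dist:
            results.append((node[None], row[-1]))
        # Cut-off: abandon this branch once every cell exceeds the threshold.
        if min(row) <= max_dist:
            for nxt, child in node.items():
                if nxt is not None:
                    walk(child, nxt, row)

    for ch, child in trie.items():
        if ch is not None:
            walk(child, ch, first_row)
    return sorted(results)


if __name__ == "__main__":
    lexicon = build_trie(["cat", "cart", "car", "dog", "coat"])
    print(candidates(lexicon, "cat", 1))
```

The pruning step is what keeps the search cheap on large word lists: a branch of the trie is dropped as soon as no extension of it can come back under the distance threshold.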

arXiv.org e-Print Archive

CiteSeerX

Bilkent University Institutional Repository

Working together towards an ideal infrastructure for language learner corpora

Author: Boyd Adriane
Jansen Maarten
Lindström Tiedemann Therese
Mikelić Preradović Nives
Rosen Alexandr
Rosén Dan
Stemle Egon W.
Volodina Elena
Publication venue: Presses universitaires de Louvain
Publication date: 01/01/2019

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

miraQA: Initial experiments in Question Answering

Author: García Serrano Ana
González Cristóbal José Carlos
Goñi Menoyo José Miguel
Martínez Fernández José Luis
Martínez Fernández Paloma
Pablo Sánchez César de
Villena Román Julio
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2004

We present the miraQA system, which constitutes MIRACLE's first experience in Question Answering for monolingual Spanish and has been developed for QA@CLEF 2004. The architecture of the system is described and details of our approach to Statistical Answer Extraction based on Hidden Markov Models are presented. One run that uses last year's question set for training purposes has been submitted. The results are presented together with ideas for improvement.

Archivo Digital UPM

Confronting Focus Strategies in Finnish and in Italian: An Experimental Study on Object Focusing

Author: Elina Ylinärä
Giorgio Carella
Mara Frascarelli
Publication date: 01/01/2023

Focus is cross-linguistically associated with a number of different strategies, such as fronting, clefting, markers, and prosody. In some cases, the choice between one strategy or another is determined by language-specific rules, while in others, two or more strategies seem to be optional, and thus, somehow “unpredictable”. In this experimental study, we investigate the syntactic strategies employed in object focusing in Finnish and in Italian by examining the syntactic, semantic, and pragmatic features underlying the choice of a specific Focus strategy. In particular, the present experiment aims to investigate two strategies employed in both languages for object Focus realization, namely, Focus in situ and fronting, in order to verify whether the choice between them is influenced by a specific type of feature, a combination of Focus-related features, the verb category involved, or the interplay between these three factors. The incidence of alternative constructions, in particular clefting in Italian and the -hAn discourse marker in Finnish, is also taken into consideration, and relevant asymmetries are analyzed in a comprehensive, comparative account.

Archivio della Ricerca - Università di Roma 3

A collaborative infrastructure for handling syntactic annotations

Author: Villemonte de La Clergerie Éric
Publication venue: HAL CCSD
Publication date: 01/01/2008

We believe that collaborative annotation is needed to build and improve large syntactically annotated corpora such as treebanks or dependency banks. We present such a collaborative infrastructure, implemented as a web service in the context of the French project Passage, whose primary objective is the automatic syntactic annotation of a large corpus of over 100 million words.

INRIA a CCSD electronic archive server

Hal-Diderot

The architecture of grammar and the division of labour in exponence

Author: Bermúdez-Otero Ricardo
Publication venue: Oxford University Press (OUP)
Publication date: 27/09/2012

Crossref

The University of Manchester - Institutional Repository

Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan languages

Publication venue: Croatian Language Technologies Society, Faculty of Humanities and Social Science
Publication date: 01/01/2010

The Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan Languages contain 17 papers that were presented at the conference organised in Dubrovnik, Croatia, 4-6 October 2010.

Repozitorij Filozofskog fakulteta u Zagrebu at University of Zagreb