7,908 research outputs found

Syntactic annotation of non-canonical linguistic structures

Author: Doolittle Seanna
Hirschmann Hagen
Lüdeling Anke
Publication date: 27/10/2009

This paper deals with the syntactic annotation of corpora that contain both ‘canonical’ and ‘non-canonical’ sentences.

Hochschulschriftenserver - Universität Frankfurt am Main

ATLAS: A flexible and extensible architecture for linguistic annotation

Author: Bird Steven
Day David
Garofolo John
Henderson John
Laprun Christophe
Liberman Mark
Publication date: 01/01/2000

We describe a formal model for annotating linguistic artifacts, from which we derive an application programming interface (API) to a suite of tools for manipulating these annotations. The abstract logical model provides for a range of storage formats and promotes the reuse of tools that interact through this API. We focus first on “Annotation Graphs,” a graph model for annotations on linear signals (such as text and speech) indexed by intervals, for which efficient database storage and querying techniques are applicable. We note how a wide range of existing annotated corpora can be mapped to this annotation graph model. This model is then generalized to encompass a wider variety of linguistic “signals,” including both naturally occurring phenomena (as recorded in images, video, multi-modal interactions, etc.), as well as the derived resources that are increasingly important to the engineering of natural language processing systems (such as word lists, dictionaries, aligned bilingual corpora, etc.). We conclude with a review of the current efforts towards implementing key pieces of this architecture. Comment: 8 pages, 9 figures
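
The idea of annotations on a linear signal indexed by intervals can be illustrated with a minimal sketch. This is not the ATLAS API: the class names, the tier names ("word", "phrase"), and the simplification of storing offsets directly on each arc (rather than on shared graph nodes) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Arc:
    """One annotation: a labelled interval over a linear signal (hypothetical type)."""
    start: float  # offset into the signal where the annotation begins
    end: float    # offset where the annotation ends
    tier: str     # annotation type, e.g. "word" (illustrative tier name)
    label: str    # annotation content


@dataclass
class AnnotationGraph:
    """Simplified interval store; a real annotation graph shares nodes between arcs."""
    arcs: list = field(default_factory=list)

    def add(self, start, end, tier, label):
        if end < start:
            raise ValueError("interval must not end before it starts")
        self.arcs.append(Arc(start, end, tier, label))

    def overlapping(self, start, end, tier=None):
        """Return arcs whose interval overlaps [start, end), optionally on one tier."""
        return [
            a for a in self.arcs
            if a.start < end and start < a.end
            and (tier is None or a.tier == tier)
        ]


graph = AnnotationGraph()
graph.add(0.0, 0.4, "word", "the")
graph.add(0.4, 0.9, "word", "cat")
graph.add(0.0, 0.9, "phrase", "NP")

# Word-level annotations that overlap the interval from 0.3 to 0.5
words = [a.label for a in graph.overlapping(0.3, 0.5, tier="word")]
```

Because every annotation is just an interval with a type and a label, several annotation layers can sit over the same stretch of signal and be queried together or tier by tier.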

arXiv.org e-Print Archive

CiteSeerX

Atomic: an open-source software platform for multi-level corpus annotation

Author: Bierkandt Lennart
Druskat Stephan
Gast Volker
Rzymski Christoph
Zipser Florian
Publication date: 09/10/2014

This paper presents Atomic, an open-source platform-independent desktop application for multi-level corpus annotation. Atomic aims at providing the linguistic community with a user-friendly annotation tool and sustainable platform through its focus on extensibility, a generic data model, and compatibility with existing linguistic formats. It is implemented on top of the Eclipse Rich Client Platform, a pluggable Java-based framework for creating client applications. Atomic, as a set of plug-ins for this framework, integrates with the platform and allows other researchers to develop and integrate further extensions to the software as needed. The generic graph-based meta model Salt serves as Atomic’s domain model and allows for unlimited annotation levels and types. Salt is also used as an intermediate model in the Pepper framework for conversion of linguistic data, which is fully integrated into Atomic, making the latter compatible with a wide range of linguistic formats. Atomic provides tools for both less experienced and expert annotators: graphical, mouse-driven editors and a command-line data manipulation language for rapid annotation.

University of Hildesheim

FigShare

What linguists always wanted to know about German and did not know how to estimate

Author: Hinrichs Erhard
Kübler Sandra
Publication date: 01/01/2006

This paper profiles significant differences in syntactic distribution and differences in word class frequencies for two treebanks of spoken and written German: the TüBa-D/S, a treebank of transliterated spontaneous dialogues, and the TüBa-D/Z treebank of newspaper articles published in the German daily newspaper die tageszeitung (taz). The approach can be used more generally as a means of distinguishing and classifying language corpora of different genres.

Hochschulschriftenserver - Universität Frankfurt am Main

The Validation of Speech Corpora

Author: Baumann Angela
Draxler Christoph
Ellbogen Tania
Hoole Phil
Schiel Florian
Steffen Alexander
Publication date: 01/01/2012

CiteSeerX

Open Access LMU