Automatic interlinear glossing as two-level sequence classification

Author: Samardžić Tanja
Schikowski Robert
Stoll Sabine
Publication venue: s.n.
Publication date: 01/01/2010

We discuss the aspect of synchronisation in the language design and implementation of the asynchronous data flow language S-Net. Synchronisation is a crucial aspect of any coordination approach. S-Net provides a particularly simple construct, the synchrocell. As a primitive S-Net language construct, the synchrocell implements a one-off synchronisation of two data items of different type on a stream of such data items. We believe this semantics captures the essence of synchronisation, and no simpler design is possible. While the exact built-in behaviour as such is typically not what is required by S-Net application programmers, we show that, in conjunction with other language features, S-Net synchrocells meet typical demands for synchronisation in streaming networks quite well. Moreover, we argue that their simplistic design is, in fact, a necessary prerequisite to implement an even more interesting scenario: modelling state in streaming networks of stateless components. We finish with the outline of an efficient implementation by the S-Net runtime system.

Crossref

ZORA

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Working papers in corpus linguistics and digital technologies : analyses and methodology Vol. 5. - INEL Corpora General Transcription and Annotation Principles

Author: Arkhipov Alexandre
Publication date: 01/01/2020

University of Szeged

Handling word formation in comparative linguistics

Author: List J.
Schweikhard N.
Publication date: 01/06/2020

Word formation plays a central role in human language. Yet computational approaches to historical linguistics often pay little attention to it. This means that the detailed findings of classical historical linguistics are often used only in qualitative studies, not in quantitative ones. Based on human- and machine-readable formats suggested by the CLDF initiative, we propose a framework for the annotation of cross-linguistic etymological relations that allows for the differentiation between etymologies that involve only regular sound change and those that involve linear and non-linear processes of word formation. This paper introduces this approach by means of sample datasets and a small Python library to facilitate annotation.

MPG.PuRe

Proceedings of the 8th Workshop on Linked Data in Linguistics within the 13th Language Resources and Evaluation Conference (LREC2022), 20-25 June 2022, Marseille, France

Author: Chiarcos Christian
Declerck Thierry
Ionov Maxim
McCrae John Philip
Montiel Elena
Publication date: 20/04/2023

OPUS Augsburg

Automatic morphosyntactic analysis of Light Warlpiri corpus data

Author: Welsh Gina Maree
Publication date: 01/01/2020

Morphosyntactic analysis aligns a morphosyntactic tag (‘gloss’) with each word in a given text. Manual morphosyntactic glossing requires significant time and effort to implement on a larger scale, such as for a language corpus. Computational methods of automatic analysis can help automate this process. In this thesis, I applied a method of automatic morphosyntactic analysis to a set of Light Warlpiri corpus data (O’Shannessy, 2005). The method used the software tool Computerised Language Analysis (MacWhinney, 2000) to apply rules-based word analysis and syntactic disambiguation to the data. My thesis will describe how this method was adapted to the morphosyntactic properties of Light Warlpiri, as well as its performance on the corpus data. Overall, the method was successfully adapted to the Light Warlpiri data, with some recurring challenges noted. Finally, the thesis will discuss the variables within the workflow that affected the adaptation of the method, with emphasis on practical considerations.

The Australian National University

Proceedings of the EACL 2009 Workshop on Language Technologies for African Languages

Author: De Pauw Guy
de Schryver Gilles-Maurice
Levin Lori
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2009

Ghent University Academic Bibliography

The Human Language Project

Author: Abney Steven
Publication date: 08/02/2010

This is a "white paper" proposing the construction of a "universal corpus" containing digitizations of the world's languages. The proposed corpus is community-built and community-owned. http://deepblue.lib.umich.edu/bitstream/2027.42/64990/1/proposal.pd

Deep Blue Documents at the University of Michigan

Inducing the Cross-Disciplinary Usage of Morphological Language Data Through Semantic Modelling

Author: Klimek Bettina
Publication date: 01/01/2020

Despite the enormous technological advancements in the area of data creation and management, the vast majority of language data still exists as digital single-use artefacts that are inaccessible for further research efforts. At the same time, the advent of digitisation in science increased the possibilities for knowledge acquisition through the computational application of linguistic information for various disciplines. The purpose of this thesis, therefore, is to create the preconditions that enable the cross-disciplinary usage of morphological language data as a sub-area of linguistic data in order to induce a shared reusability for every research area that relies on such data. This involves the provision of morphological data on the Web under an open license and needs to take the prevalent diversity of data compilation into account. Various representation standards emerged across single disciplines, which led to heterogeneous data that differs with regard to complexity, scope and data formats. This situation requires a unifying foundation enabling direct reusability. As a solution to fill the gap of missing open data and to overcome the presence of isolated datasets, a semantic data modelling approach is applied. Being rooted in the Linked Open Data (LOD) paradigm, it pursues the creation of data as uniquely identifiable resources that are realised as URIs, accessible on the Web, available under an open license, interlinked with other resources, and adhere to Linked Data representation standards such as the RDF format. Each resource then contributes to the LOD cloud in which they are all interconnected. This unification results from ontologically shared bases that formally define the classification of resources and their relation to other resources in a semantically interoperable manner.
Subsequently, the possibility of creating semantically structured data has sparked the formation of the Linguistic Linked Open Data (LLOD) research community and LOD sub-cloud containing primarily language resources. Over the last decade, ontologies emerged mainly for the domain of lexical language data, which led to a significant increase in Linked Data-based linguistic datasets. However, an equivalent model for morphological data is still missing, leading to a lack of this type of language data within the LLOD cloud. This thesis presents six publications that are concerned with the peculiarities of morphological data and the exploration of their semantic representation as an enabler of cross-disciplinary reuse. The Multilingual Morpheme Ontology (MMoOn Core) as well as an architectural framework for morphemic dataset creation as RDF resources are proposed as the first comprehensive domain representation model adhering to the LOD paradigm. It will be shown that MMoOn Core permits the joint representation of heterogeneous data sources such as interlinear glossed texts, inflection tables, the outputs of morphological analysers, lists of morphemic glosses or word-formation rules, which are all equally labelled as “morphological data” across different research areas. Evidence for the applicability and adequacy of the semantic modelling entailed by the MMoOn Core ontology is provided by two datasets that were transformed from tabular data into RDF: the Hebrew Morpheme Inventory and the Xhosa RDF dataset. Both further demonstrate how their integration into the LLOD cloud, by interlinking them with external language resources, yields insights that could not be obtained from the initial source data. Altogether, the research conducted in this thesis establishes the foundation for an interoperable data exchange and the enrichment of morphological language data. It strives to achieve the broader goal of advancing language data-driven research by overcoming data barriers and discipline boundaries.

edoc

2nd Conference on Language, Data and Knowledge (LDK 2019), May 20–23, 2019, Leipzig, Germany

Author: Buitelaar Paul
Chiarcos Christian
de Melo Gerard
Dojchinovski Milan
Eskevich Maria
Fäth Christian
Klimek Bettina
McCrae John P.
Publication date: 27/04/2023

OPUS Augsburg

Proceedings

Author: Bick Eckhard
Hagen Kristin
Müürisep Kaili
Trosterud Trond
Publication date: 17/11/2011

Proceedings of the NODALIDA 2011 Workshop Constraint Grammar Applications. Editors: Eckhard Bick, Kristin Hagen, Kaili Müürisep, Trond Trosterud. NEALT Proceedings Series, Vol. 14 (2011), vi+69 pp. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/19231

DSpace at Tartu University Library