Search CORE

57 research outputs found

Proceedings

Author: Bick Eckhard
Hagen Kristin
Müürisep Kaili
Trosterud Trond
Publication date: 17/11/2011

Proceedings of the NODALIDA 2011 Workshop Constraint Grammar Applications. Editors: Eckhard Bick, Kristin Hagen, Kaili Müürisep, Trond Trosterud. NEALT Proceedings Series, Vol. 14 (2011), vi+69 pp. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/19231

DSpace at Tartu University Library

English vs. Esperanto: A comparative study of clausal word order in a Minimalist framework

Publication venue: 'University of Agder'
Publication date: 01/01/2022

Both English and Esperanto are international auxiliary languages, but English is deemed an SVO language with rigid word order, while Esperanto, although considered predominantly SVO, allows relatively free constituent order according to some scholars. The goal of this thesis is to determine whether this is the case and whether this difference in word order flexibility can be attributed to parametric differences between English and Esperanto. To answer this, the thesis seeks to uncover the underlying syntactic structure of Esperanto in transitive constructions and compare it to the syntactic structure of English. It studies the order of the subject, object, and verb in both main and embedded clauses to identify potential parametric differences, and analyses the patterns within the Minimalist framework and the Principles and Parameters model. To identify which transitive word order patterns are common in English and Esperanto, corpus studies were conducted for both languages, recording which patterns occur and how often. The English data were retrieved from the Georgetown University Multilayer corpus, while the Esperanto data were drawn from Arbobanko. In addition to the corpus study, a survey was conducted to test the acceptability of each word order in Esperanto. My data reflect less word order variety in Esperanto than a previous study by Gledhill (2000). They do, however, reflect greater word order variety in Esperanto than in English, as other scholars have stated. The word order differences between the two languages could not be attributed to significant parametric differences. Instead, a greater variation in non-obligatory constituent movements

Agder University Research Archive

Proceedings of the 8th Workshop on Linked Data in Linguistics within the 13th Language Resources and Evaluation Conference (LREC2022), 20-25 June 2022, Marseille, France

Author: Chiarcos Christian
Declerck Thierry
Ionov Maxim
McCrae John Philip
Montiel Elena
Publication date: 20/04/2023

OPUS Augsburg

Universal Discourse Representation Structure Parsing

Author: Bos Johan
Cohen S.B.
Lapata M.
Liu J.
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2021

We consider the task of cross-lingual semantic parsing in the style of Discourse Representation Theory (DRT), where knowledge from annotated corpora in a resource-rich language is transferred via bitext to guide learning in other languages. We introduce Universal Discourse Representation Theory (UDRT), a variant of DRT that explicitly anchors semantic representations to tokens in the linguistic input. We develop a semantic parsing framework based on the Transformer architecture and use it to obtain semantic resources in multiple languages following two learning schemes. The many-to-one approach translates non-English text to English and then runs a relatively accurate English parser on the translated text, while the one-to-many approach translates gold-standard English to non-English text and trains multiple parsers (one per language) on the translations. Experimental results on the Parallel Meaning Bank show that our proposal outperforms strong baselines by a wide margin and can be used to construct (silver-standard) meaning banks for 99 languages.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Language Processing and the Artificial Mind: Teaching Code Literacy in the Humanities

Author: Weltman Jerry Scott
Publication venue: LSU Digital Commons
Publication date: 01/01/2015

Humanities majors often find themselves in jobs where they either manage programmers or work with them in close collaboration. These interactions often pose difficulties because specialists in literature, history, philosophy, and so on are not usually code literate. They do not understand what tasks computers are best suited to, or how programmers solve problems. Learning code literacy would greatly benefit humanities majors, but the traditional computer science curriculum is heavily math-oriented, and students outside science and technology majors are often math-averse. Yet they are often interested in language, linguistics, and science fiction. This thesis is a case study exploring whether computational linguistics and artificial intelligence provide a suitable setting for teaching basic code literacy. I researched, designed, and taught a course called “Language Processing and the Artificial Mind.” Instead of math, it focuses on language processing, artificial intelligence, and the formidable challenges programmers face when trying to create machines that understand natural language. The thesis describes in detail the course material, how it was chosen, and the outcomes for student learning. Student performance on exams indicates that students learned code literacy basics and important linguistic issues in natural language processing. An exit survey indicates that students found the course valuable, though a minority reacted negatively to the programming material. Future studies should explore teaching code literacy with less programming and new ways to make coding more interesting to the target audience.

Louisiana State University

From a time standard for medical informatics to a controlled language for health

Author: Buekens
Ceusters
E Van Der Haring
F Steurs
J Rogers
P Zanstra
Rector
W Ceusters
Publication venue: 'Elsevier BV'

Crossref

Cross-Lingual Link Discovery for Under-Resourced Languages

Author: Ahmadi Sina
Apostol Elena-Simona
Bosque-Gil Julia
Chiarcos Christian
Dojchinovski Milan
Gkirtzou Katerina
Gracia Jorge
Gromann Dagmar
Liebeskind Chaya
Rosner Michael
Serasset Gilles
Truica Ciprian-Octavian
Valūnaitė-Oleškevičienė Giedrė
Publication date: 01/01/2022

CC BY-NC 4.0

In this paper, we provide an overview of current technologies for cross-lingual link discovery, and we discuss the challenges, experiences, and prospects of applying them to under-resourced languages. We first introduce the goals of cross-lingual linking and its associated technologies, in particular the role that the Linked Data paradigm (Bizer et al., 2011), applied to language data, can play in this context. We define under-resourced languages with a specific focus on languages actively used on the internet, i.e., languages with a digitally versatile speaker community but limited support in terms of language technology. We argue that for languages with considerable amounts of textual data and (at least) a bilingual word list, techniques for cross-lingual linking can be readily applied, and that these techniques enable downstream applications for under-resourced languages through the localisation and adaptation of existing technologies and resources.

Mykolas Romeris University Institutional Repository

TAL Bibliography (1951-2002). Parte I

Author: Enea Alessandro
Gazzetti S.
Orsolini Paola
Pardelli Gabriella
Sassi Manuela

No abstract available

PUblication MAnagement

Semantic chunking

Author: Muszynska Ewa
Publication venue: University of Cambridge
Publication date: 01/10/2020

Long sentences pose a challenge for natural language processing (NLP) applications. They are associated with a complex information structure, which increases the demand for processing resources. Although the issue arises in many areas of research, there is little uniformity in the solutions used by the research communities dedicated to individual NLP applications. Different aspects of the problem are addressed by different tasks, such as sentence simplification or shallow chunking. The main contribution of this thesis is the introduction of the task of semantic chunking as a general approach to reducing the cost of processing long sentences. The goal of semantic chunking is to find semantically self-contained fragments of a sentence representation that can be processed independently and recombined without loss of information. We anchor its principles in established concepts of semantic theory, in particular event and situation semantics. Most of the experiments in this thesis focus on semantic chunking defined on complex semantic representations in Dependency Minimal Recursion Semantics (DMRS), but we also demonstrate that the task can be performed on sentence strings. We present three chunking models: a) a rule-based proof-of-concept DMRS chunking system; b) a semi-supervised sequence-labelling neural model for surface semantic chunking; c) a system capable of finding semantic chunk boundaries based on the inherent structure of DMRS graphs, generalisable in the form of descriptive templates. We show how semantic chunking can be applied within a divide-and-conquer processing paradigm, using realization from DMRS as an example task. Applying semantic chunking yields noticeable efficiency gains without decreasing the quality of results.

Apollo (Cambridge)

NLP for Language Varieties of Italy: Challenges and the Path Forward

Author: Ramponi Alan
Publication date: 20/09/2022

Italy is characterized by a linguistic diversity landscape unique in Europe, which implicitly encodes the local knowledge, cultural traditions, artistic expression, and history of its speakers. However, over 30 language varieties in Italy are at risk of disappearing within a few generations. Language technology has a major role to play in preserving endangered languages, but it currently struggles with such varieties because they are under-resourced, are used mainly in spoken settings, and mostly lack a standardized orthography. In this paper, we introduce the linguistic context of Italy and discuss the challenges facing the development of NLP technologies for Italy's language varieties. We provide potential directions and advocate a paradigm shift from machine-centric to speaker-centric NLP. Finally, we propose building a local community towards responsible, participatory development of speech and language technologies for the languages and dialects of Italy.

Comment: 16 pages, 3 figures, 4 tables

arXiv.org e-Print Archive