57 research outputs found

    Proceedings

    Proceedings of the NODALIDA 2011 Workshop Constraint Grammar Applications. Editors: Eckhard Bick, Kristin Hagen, Kaili Müürisep, Trond Trosterud. NEALT Proceedings Series, Vol. 14 (2011), vi+69 pp. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/19231

    English vs. Esperanto: A comparative study of clausal word order in a Minimalist framework

    Both English and Esperanto are international auxiliary languages, but English is regarded as an SVO language with rigid word order, while Esperanto, although considered predominantly SVO, allows relatively free constituent order according to some scholars. The goal of this thesis is to determine whether this is the case and whether this difference in constituent order flexibility can be attributed to parametric differences between English and Esperanto. To answer this, the thesis seeks to uncover the underlying syntactic structure of Esperanto in transitive constructions and compare it to the syntactic structure of English. The thesis studies the order of the subject, object, and verb in both main and embedded clause types to identify potential parametric differences, and analyses the patterns through the Minimalist framework and the Principles and Parameters model. To identify which transitive word order patterns are common in English and Esperanto, corpus studies were conducted for both languages to establish which patterns are used and how often they occur. The English data were retrieved from the Georgetown University Multilayer corpus, while Arbobanko was used for the Esperanto data. In addition to the corpus study, a survey was conducted for the Esperanto data to test the acceptability of each word order. My data reflect less word order variety in Esperanto than was found in a previous study conducted by Gledhill (2000). My data do, however, reflect greater word order variety in Esperanto than in English, as stated by other scholars. These differences in word order patterns between the two languages could not, however, be accounted for by significant parametric differences. Instead, a greater variation in non-obligatory constituent movements

    Universal Discourse Representation Structure Parsing

    We consider the task of crosslingual semantic parsing in the style of Discourse Representation Theory (DRT) where knowledge from annotated corpora in a resource-rich language is transferred via bitext to guide learning in other languages. We introduce Universal Discourse Representation Theory (UDRT), a variant of DRT that explicitly anchors semantic representations to tokens in the linguistic input. We develop a semantic parsing framework based on the Transformer architecture and utilize it to obtain semantic resources in multiple languages following two learning schemes. The many-to-one approach translates non-English text to English, and then runs a relatively accurate English parser on the translated text, while the one-to-many approach translates gold standard English to non-English text and trains multiple parsers (one per language) on the translations. Experimental results on the Parallel Meaning Bank show that our proposal outperforms strong baselines by a wide margin and can be used to construct (silver-standard) meaning banks for 99 languages

    Language Processing and the Artificial Mind: Teaching Code Literacy in the Humanities

    Humanities majors often find themselves in jobs where they either manage programmers or work with them in close collaboration. These interactions often pose difficulties because specialists in literature, history, philosophy, and so on are not usually code literate. They do not understand what tasks computers are best suited to, or how programmers solve problems. Learning code literacy would be a great benefit to humanities majors, but the traditional computer science curriculum is heavily math-oriented, and students outside of science and technology majors are often math-averse. Yet they are often interested in language, linguistics, and science fiction. This thesis is a case study exploring whether computational linguistics and artificial intelligence provide a suitable setting for teaching basic code literacy. I researched, designed, and taught a course called “Language Processing and the Artificial Mind.” Instead of math, it focuses on language processing, artificial intelligence, and the formidable challenges that programmers face when trying to create machines that understand natural language. This thesis is a detailed description of the material, how the material was chosen, and the outcome for student learning. Student performance on exams indicates that students learned code literacy basics and important linguistic issues in natural language processing. An exit survey indicates that students found the course to be valuable, though a minority reacted negatively to the material on programming. Future studies should explore teaching code literacy with less programming and new ways to make coding more interesting to the target audience.

    Cross-Lingual Link Discovery for Under-Resourced Languages

    CC BY-NC 4.0. In this paper, we provide an overview of current technologies for cross-lingual link discovery, and we discuss challenges, experiences and prospects of their application to under-resourced languages. We first introduce the goals of cross-lingual linking and associated technologies, and in particular, the role that the Linked Data paradigm (Bizer et al., 2011) applied to language data can play in this context. We define under-resourced languages with a specific focus on languages actively used on the internet, i.e., languages with a digitally versatile speaker community, but limited support in terms of language technology. We argue that for languages for which considerable amounts of textual data and (at least) a bilingual word list are available, techniques for cross-lingual linking can be readily applied, and that these enable the implementation of downstream applications for under-resourced languages via the localisation and adaptation of existing technologies and resources.

    NLP for Language Varieties of Italy: Challenges and the Path Forward

    Italy is characterized by a one-of-a-kind linguistic diversity landscape in Europe, which implicitly encodes the local knowledge, cultural traditions, artistic expression, and history of its speakers. However, over 30 language varieties in Italy are at risk of disappearing within a few generations. Language technology has a major role in preserving endangered languages, but it currently struggles with such varieties as they are under-resourced and mostly lack standardized orthography, being mainly used in spoken settings. In this paper, we introduce the linguistic context of Italy and discuss challenges facing the development of NLP technologies for Italy's language varieties. We provide potential directions and advocate for a shift in the paradigm from machine-centric to speaker-centric NLP. Finally, we propose building a local community towards responsible, participatory development of speech and language technologies for languages and dialects of Italy. Comment: 16 pages, 3 figures, 4 tables