249 research outputs found

Annotation Automatique Des Connaissances Spatiales En Arabe

Author: Al-Hajj Moustafa
Hijazi Rita
Sabra Amani
Publication venue: European Scientific Institute, ESI
Publication date: 31/05/2018

In this paper, we introduce a rule-based approach to annotating Locative and Directional Expressions in Arabic natural-language text. The annotation is based on a constructed semantic map of the spatiality domain. The challenges are twofold: first, we need to study how locative and directional expressions are expressed linguistically in these texts; second, we need to automatically annotate the relevant textual segments accordingly. The research method used in this article is analytic-descriptive. We validate this approach on a specific novel rich in these expressions and show that it yields very promising results. We use NOOJ as a software tool to implement finite-state transducers that annotate linguistic elements as Locative and Directional Expressions. In conclusion, NOOJ allowed us to write linguistic rules for the automatic annotation of Locative and Directional Expressions in Arabic text.

Crossref

European Scientific Journal, ESJ

European Scientific Journal (European Scientific Institute)

Prosodic segmentation and cross-linguistic comparison in CorpAfroAs and CorTypo: Corpus-driven and corpus-based approaches

Author: Mettouchi Amina
Vanhove Martine
Publication venue: University of Hawai'i Press
Publication date: 01/01/2021

The paper addresses the issue of corpus design in relation to research questions for under-described languages. It shows how a corpus emerges from the methodology and habitus of its contributors, and how it is shaped by the technical tools used for data organization. It also underlines the ways in which a morphosyntactically annotated corpus, segmented into intonation units, is amenable to a wide array of searches, both corpus-based and corpus-driven, and both formal and functional. After a presentation of the annotation layout and the segmentation choices that characterize the two projects, CorpAfroAs and CorTypo, scientific results are illustrated for two languages, Kabyle and Beja, and more marginally for Zaar, Juba Arabic, and Modern Hebrew. They exemplify corpus-driven and corpus-based approaches to information structure and grammatical relations. Both types of approaches plead for an integrated view of prosody, closely interacting with syntax, semantics, phonology, information structure, and all levels of human communication and cognition. They also plead for a general endeavour to annotate as much as possible the large array of prosodic cues that are inseparable from speech processing and interaction dynamics.

HAL Descartes

ScholarSpace at University of Hawai'i at Manoa

Clitics as calcified processing strategies

Author: Bouzouita Miriam
Chatzikyriakidis Stergios
Publication venue: CSLI Publications
Publication date: 01/01/2009

Ghent University Academic Bibliography

Can human association norm evaluate latent semantic analysis?

Author: Gatkowska Izabela
Korzycki Michał
Lubaszewski Wiesław
Publication venue: [s.n.]
Publication date: 01/01/2013

This paper compares a word association norm created by a psycholinguistic experiment with association lists generated by algorithms operating on text corpora. We compare lists generated by the Church and Hanks algorithm with lists generated by the LSA algorithm. An argument is presented on how those automatically generated lists reflect real semantic relations.
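
For readers unfamiliar with the Church and Hanks algorithm named in the abstract above: it ranks word pairs by an association ratio, log2(P(x, y) / (P(x) P(y))), estimated from corpus counts. The following is only a minimal sketch of that idea, not the paper's implementation. The function names `association_ratios` and `top_associates`, the window size, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def association_ratios(sentences, window=5):
    """Score ordered word pairs (x, y) by log2(P(x, y) / (P(x) * P(y))).

    A pair is counted when y follows x within `window` tokens.
    Probabilities are simple relative frequencies over the token total
    (a simplification assumed for this sketch).
    """
    word_counts = Counter()
    pair_counts = Counter()
    total = 0
    for tokens in sentences:
        total += len(tokens)
        word_counts.update(tokens)
        for i, x in enumerate(tokens):
            for y in tokens[i + 1 : i + 1 + window]:
                pair_counts[(x, y)] += 1

    scores = {}
    for (x, y), f_xy in pair_counts.items():
        p_xy = f_xy / total
        p_x = word_counts[x] / total
        p_y = word_counts[y] / total
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


def top_associates(scores, cue, k=3):
    """Return up to k words most strongly associated with `cue`."""
    ranked = sorted(
        ((s, y) for (x, y), s in scores.items() if x == cue),
        reverse=True,
    )
    return [y for _, y in ranked[:k]]


# Hypothetical toy corpus, invented purely for illustration.
toy_corpus = [
    ["doctor", "nurse"],
    ["doctor", "nurse"],
    ["doctor", "car"],
    ["car", "road"],
]
toy_scores = association_ratios(toy_corpus)
toy_list = top_associates(toy_scores, "doctor", k=2)
```

A list such as `toy_list` is the kind of automatically generated association list that could then be compared, cue by cue, with the responses human participants give in an association experiment.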

Jagiellonian University Repository

Max-Planck-Institute for Psycholinguistics: Annual Report 2001

Author: Kelly A.
Melinger A.
Publication venue: MPI for Psycholinguistics
Publication date: 01/01/2001

MPG.PuRe

The literal/non-literal divide synchronically and diachronically: The lexical semantics of an English posture verb

Author: Fraser Katherine Elisabeth
Publication venue
Publication date: 12/12/2022

This thesis' main research goal is to provide an account of the English posture verb sit, from a synchronic and a diachronic perspective. My proposed account of sit comprises various components, including a characterisation of the different possible meanings of sit and a comparison with stand and lie. The two relevant meanings are a literal one and a non-literal one (The girl is sitting on the chair vs. The wine bottle is sitting on the chair; in the former the subject is described as being in a sitting position, while in the latter the subject is not in a sitting position). I analyse each meaning/use separately, noting which semantic patterns occur with one type only and which occur with both. I argue that the non-literal use is diachronically connected to the literal one, and I motivate this claim based on the shared components identified in the thesis and on data from corpus studies reported in the thesis. A consequence of acknowledging a divide between the literal and non-literal uses (a perspective not usually taken in theoretical linguistics) is that I am able to account for important semantic details which might otherwise be overlooked. The cognitive and typological literature includes accounts of posture verbs cross-linguistically, but in the theoretical literature these verbs have not received much attention. In this thesis, I review existing proposals and highlight the uncertainties surrounding the posture verbs. In order to fill these gaps in the literature and to better understand the phenomena, I analyse data from synchronic and diachronic corpus studies, and incorporate these insights into my account of sit.

Archivo Digital para la Docencia y la Investigación

Treebank-based acquisition of Chinese LFG resources for parsing and generation

Author: Guo Yuqing
Publication venue: Dublin City University. School of Computing
Publication date: 01/11/2009

This thesis describes a treebank-based approach to automatically acquiring robust, wide-coverage Lexical-Functional Grammar (LFG) resources for Chinese parsing and generation, which is part of a larger project on the rapid construction of deep, large-scale, constraint-based, multilingual grammatical resources. I present an application-oriented LFG analysis for Chinese core linguistic phenomena and (in cooperation with PARC) develop a gold-standard dependency-bank of Chinese f-structures for evaluation. Based on the Penn Chinese Treebank, I design and implement two architectures for inducing Chinese LFG resources, one annotation-based and the other dependency conversion-based. I then apply the f-structure acquisition algorithm together with external, state-of-the-art parsers to parsing new text into "proto" f-structures. In order to convert "proto" f-structures into "proper" f-structures or deep dependencies, I present a novel Non-Local Dependency (NLD) recovery algorithm using subcategorisation frames and f-structure paths linking antecedents and traces in NLDs extracted from the automatically-built LFG f-structure treebank. Based on the grammars extracted from the f-structure annotated treebank, I develop a PCFG-based chart generator and a new n-gram based pure dependency generator to realise Chinese sentences from LFG f-structures. The work reported in this thesis is the first effort to scale treebank-based, probabilistic Chinese LFG resources from proof-of-concept research to unrestricted, real text. Although this thesis concentrates on Chinese and LFG, many of the methodologies, e.g. the acquisition of predicate-argument structures, NLD resolution and the PCFG- and dependency n-gram-based generation models, are largely language and formalism independent and should generalise to diverse languages as well as to labelled bilexical dependency representations other than LFG.

Irish Universities

DCU Online Research Access Service

Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan languages

Author
Publication venue: Croatian Language Technologies Society, Faculty of Humanities and Social Science
Publication date: 01/01/2010

Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan Languages publishes 17 papers that were presented at the conference organised in Dubrovnik, Croatia, 4-6 October 2010.

Repozitorij Filozofskog fakulteta u Zagrebu at University of Zagreb

Cross-linguistic trade-offs and causal relationships between cues to grammatical subject and object, and the problem of efficiency-related explanations

Author: Levshina N.
Publication venue: Frontiers Media SA
Publication date: 01/07/2021

Cross-linguistic studies focus on inverse correlations (trade-offs) between linguistic variables that reflect different cues to linguistic meanings. For example, if a language has no case marking, it is likely to rely on word order as a cue for identification of grammatical roles. Such inverse correlations are interpreted as manifestations of language users’ tendency to use language efficiently. The present study argues that this interpretation is problematic. Linguistic variables, such as the presence of case, or flexibility of word order, are aggregate properties, which do not represent the use of linguistic cues in context directly. Still, such variables can be useful for circumscribing the potential role of communicative efficiency in language evolution, if we move from cross-linguistic trade-offs to multivariate causal networks. This idea is illustrated by a case study of linguistic variables related to four types of Subject and Object cues: case marking, rigid word order of Subject and Object, tight semantics and verb-medial order. The variables are obtained from online language corpora in thirty languages, annotated with the Universal Dependencies. The causal model suggests that the relationships between the variables can be explained predominantly by sociolinguistic factors, leaving little space for a potential impact of efficient linguistic behavior

Directory of Open Access Journals

PubMed Central

MPG.PuRe