249 research outputs found

    Annotation Automatique Des Connaissances Spatiales En Arabe (Automatic Annotation of Spatial Knowledge in Arabic)

    In this paper, we introduce a rule-based approach to annotating Locative and Directional Expressions in Arabic natural-language text. The annotation is based on a semantic map constructed for the spatiality domain. The challenges are twofold: first, we need to study how locative and directional expressions are expressed linguistically in these texts; second, we need to annotate the relevant textual segments automatically. The research method used in this article is analytic-descriptive. We validate the approach on a specific novel rich in these expressions and show that it gives very promising results. We use NOOJ as a software tool to implement finite-state transducers that annotate linguistic elements as Locative and Directional Expressions. In conclusion, NOOJ allowed us to write linguistic rules for the automatic annotation of Locative and Directional Expressions in Arabic text.
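The rule-based annotation described above can be approximated, very roughly, with a regular expression: the pattern below describes a regular language, the same class of patterns that finite-state transducers operate over. This is a minimal sketch under stated assumptions, not NOOJ grammar syntax and not the authors' rules. The trigger lexicon (`SPATIAL_TRIGGERS`), the two labels, and the single "trigger + one landmark word" rule are hypothetical simplifications. Like a transducer, `annotate` maps an input text to an output text in which each matched segment is wrapped in a tag.

```python
import re

# Hypothetical mini-lexicon of Arabic spatial triggers, for illustration only;
# it is not the lexicon or semantic map used in the paper.
SPATIAL_TRIGGERS = {
    "في": "LOCATIVE",      # in
    "على": "LOCATIVE",     # on
    "تحت": "LOCATIVE",     # under
    "فوق": "LOCATIVE",     # above
    "أمام": "LOCATIVE",    # in front of
    "خلف": "LOCATIVE",     # behind
    "إلى": "DIRECTIONAL",  # to, towards
    "نحو": "DIRECTIONAL",  # towards
    "من": "DIRECTIONAL",   # from
}

# A single rule: a whole-word trigger followed by exactly one landmark word.
# Longer triggers come first so the alternation prefers the longest match.
_alternation = "|".join(sorted(map(re.escape, SPATIAL_TRIGGERS), key=len, reverse=True))
RULE = re.compile(rf"\b({_alternation})\s+(\w+)")


def annotate(text: str) -> str:
    """Return `text` with each matched segment wrapped as <LABEL>...</LABEL>."""
    def tag(match: re.Match) -> str:
        label = SPATIAL_TRIGGERS[match.group(1)]
        return f"<{label}>{match.group(0)}</{label}>"

    return RULE.sub(tag, text)
```

For the sentence "جلس الرجل تحت الشجرة ثم مشى إلى البيت" ("the man sat under the tree, then walked to the house"), this tags "تحت الشجرة" as LOCATIVE and "إلى البيت" as DIRECTIONAL. Because triggers must be whole words, a word such as "منزلا" that merely begins with "من" is left alone; attached (clitic) prepositions and multi-word landmarks are outside what this one-token rule can cover.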

    Prosodic segmentation and cross-linguistic comparison in CorpAfroAs and CorTypo: Corpus-driven and corpus-based approaches

    The paper addresses the issue of corpus design in relation to research questions for under-described languages. It shows how a corpus emerges from the methodology and habitus of its contributors, and how it is shaped by the technical tools used for data organization. It also highlights the ways in which a morphosyntactically annotated corpus, segmented into intonation units, lends itself to a wide array of searches, both corpus-based and corpus-driven, and both formal and functional. After a presentation of the annotation layout and the segmentation choices that characterize the two projects, CorpAfroAs and CorTypo, scientific results are illustrated for two languages, Kabyle and Beja, and more marginally for Zaar, Juba Arabic, and Modern Hebrew. They exemplify corpus-driven and corpus-based approaches to information structure and grammatical relations. Both types of approaches argue for an integrated view of prosody, closely interacting with syntax, semantics, phonology, information structure, and all levels of human communication and cognition. They also argue for a general effort to annotate as much as possible of the large array of prosodic cues that are inseparable from speech processing and interaction dynamics.

    Can human association norm evaluate latent semantic analysis?

    This paper compares a word association norm created by a psycholinguistic experiment with association lists generated by algorithms operating on text corpora. We compare lists generated by the Church and Hanks algorithm with lists generated by the LSA algorithm. We then present an argument about how these automatically generated lists reflect real semantic relations.
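The Church and Hanks algorithm mentioned in this abstract scores word pairs with an association ratio based on pointwise mutual information, log2(P(x, y) / (P(x) P(y))), where P(x, y) is estimated from how often y follows x within a small window of words. The sketch below computes that score on a tokenized corpus and then measures what share of a stimulus word's top-k automatic associates also appear in a human association norm. It is a minimal sketch under assumptions: the default window of 5, the top-k cutoff, and the overlap measure (`overlap_with_norm`, a hypothetical name) are illustrative choices, since the abstract does not say how the lists were compared, and LSA is not shown.

```python
import math
from collections import Counter


def association_ratios(tokens, window=5):
    """PMI-style association ratio log2(P(x,y) / (P(x) P(y))), where the pair
    count f(x, y) counts y appearing within the next `window` tokens after x."""
    n = len(tokens)
    unigram = Counter(tokens)
    pair = Counter()
    for i, x in enumerate(tokens):
        for y in tokens[i + 1 : i + 1 + window]:
            pair[(x, y)] += 1
    return {
        (x, y): math.log2((f_xy / n) / ((unigram[x] / n) * (unigram[y] / n)))
        for (x, y), f_xy in pair.items()
    }


def top_associates(scores, stimulus, k=10):
    """The k highest-scoring words that follow `stimulus` (ties broken alphabetically)."""
    candidates = [(y, s) for (x, y), s in scores.items() if x == stimulus]
    candidates.sort(key=lambda ys: (-ys[1], ys[0]))
    return [y for y, _ in candidates[:k]]


def overlap_with_norm(auto_list, norm_list):
    """Share of automatically generated associates that human subjects also produced."""
    if not auto_list:
        return 0.0
    return len(set(auto_list) & set(norm_list)) / len(auto_list)
```

Raw PMI tends to overrate rare pairs, so in practice a minimum pair frequency is usually imposed before ranking associates.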

    Max-Planck-Institute for Psycholinguistics: Annual Report 2001


    The literal/non-literal divide synchronically and diachronically: The lexical semantics of an English posture verb

    The main research goal of this thesis is to provide an account of the English posture verb sit from a synchronic and a diachronic perspective. My proposed account of sit comprises several components, including a characterisation of the different possible meanings of sit and a comparison with stand and lie. The two relevant meanings are a literal one and a non-literal one (The girl is sitting on the chair vs. The wine bottle is sitting on the chair; in the former the subject is described as being in a sitting position, while in the latter it is not). I analyse each meaning/use separately, noting which semantic patterns occur with only one type and which occur with both. I argue that the non-literal use is diachronically connected to the literal one, and I motivate this claim with the shared components identified in the thesis and with data from the corpus studies it reports. A consequence of acknowledging a divide between the literal and non-literal uses (a perspective not usually taken in theoretical linguistics) is that I am able to account for important semantic details that might otherwise be overlooked. The cognitive and typological literature includes accounts of posture verbs cross-linguistically, but in the theoretical literature these verbs have not received much attention. In this thesis, I review existing proposals and highlight the uncertainties surrounding posture verbs. In order to fill these gaps in the literature and to better understand the phenomena, I analyse data from synchronic and diachronic corpus studies and incorporate these insights into my account of sit.

    Treebank-based acquisition of Chinese LFG resources for parsing and generation

    This thesis describes a treebank-based approach to automatically acquiring robust, wide-coverage Lexical-Functional Grammar (LFG) resources for Chinese parsing and generation, as part of a larger project on the rapid construction of deep, large-scale, constraint-based, multilingual grammatical resources. I present an application-oriented LFG analysis of core Chinese linguistic phenomena and (in cooperation with PARC) develop a gold-standard dependency bank of Chinese f-structures for evaluation. Based on the Penn Chinese Treebank, I design and implement two architectures for inducing Chinese LFG resources, one annotation-based and the other based on dependency conversion. I then apply the f-structure acquisition algorithm, together with external state-of-the-art parsers, to parse new text into "proto" f-structures. To convert "proto" f-structures into "proper" f-structures or deep dependencies, I present a novel Non-Local Dependency (NLD) recovery algorithm that uses subcategorisation frames and f-structure paths linking antecedents and traces in NLDs extracted from the automatically built LFG f-structure treebank. Based on the grammars extracted from the f-structure-annotated treebank, I develop a PCFG-based chart generator and a new n-gram-based pure dependency generator to realise Chinese sentences from LFG f-structures. The work reported in this thesis is the first effort to scale treebank-based, probabilistic Chinese LFG resources from proof-of-concept research to unrestricted, real text. Although this thesis concentrates on Chinese and LFG, many of the methodologies (e.g. the acquisition of predicate-argument structures, NLD resolution, and the PCFG- and dependency n-gram-based generation models) are largely language- and formalism-independent and should generalise to diverse languages as well as to labelled bilexical dependency representations other than LFG.
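The PCFG-based generator in this abstract relies on grammars read off a treebank. The textbook way to obtain PCFG rule probabilities from a treebank is relative-frequency (maximum-likelihood) estimation, P(A -> beta) = count(A -> beta) / count(A). The sketch below applies it to plain Penn-style bracketed trees. It is a simplified illustration under assumptions: it ignores the f-structure annotations, function tags, and empty categories the thesis works with, it assumes the outer unlabeled bracket found in Penn Chinese Treebank files has already been stripped, and the two example trees are invented.

```python
from collections import Counter, defaultdict


def parse_tree(text):
    """Parse a bracketed tree like "(IP (NP (NN x)) ...)" into (label, children);
    leaves (words) are plain strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return node()


def count_rules(tree, counts):
    """Add one count for the rule at this node, then recurse into subtrees."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if not isinstance(child, str):
            count_rules(child, counts)


def estimate_pcfg(bracketed_trees):
    """Relative-frequency estimate: P(A -> beta) = count(A -> beta) / count(A)."""
    counts = Counter()
    for text in bracketed_trees:
        count_rules(parse_tree(text), counts)
    lhs_totals = defaultdict(int)
    for (lhs, _), c in counts.items():
        lhs_totals[lhs] += c
    return {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}
```

With the two toy trees "(IP (NP (NR 张三)) (VP (VV 喜欢) (NP (NN 音乐))))" and "(IP (NP (PN 我)) (VP (VV 喜欢) (NP (NN 音乐))))", the rule NP -> NN gets probability 0.5, because two of the four NP nodes expand to a single NN.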

    Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan languages

    The Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan Languages publishes 17 papers presented at the conference held in Dubrovnik, Croatia, on 4-6 October 2010.

    Cross-linguistic trade-offs and causal relationships between cues to grammatical subject and object, and the problem of efficiency-related explanations

    Cross-linguistic studies focus on inverse correlations (trade-offs) between linguistic variables that reflect different cues to linguistic meanings. For example, if a language has no case marking, it is likely to rely on word order as a cue for identifying grammatical roles. Such inverse correlations are interpreted as manifestations of language users’ tendency to use language efficiently. The present study argues that this interpretation is problematic. Linguistic variables, such as the presence of case or the flexibility of word order, are aggregate properties that do not directly represent the use of linguistic cues in context. Still, such variables can be useful for circumscribing the potential role of communicative efficiency in language evolution if we move from cross-linguistic trade-offs to multivariate causal networks. This idea is illustrated by a case study of linguistic variables related to four types of Subject and Object cues: case marking, rigid word order of Subject and Object, tight semantics, and verb-medial order. The variables are obtained from online language corpora in thirty languages annotated with Universal Dependencies. The causal model suggests that the relationships between the variables can be explained predominantly by sociolinguistic factors, leaving little room for a potential impact of efficient linguistic behavior.
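Variables such as word-order rigidity are aggregates computed from the Universal Dependencies corpora mentioned in this abstract. One possible operationalization, assumed here and not necessarily the one used in the study, counts heads whose `nsubj` dependent precedes their `obj` dependent (SO) versus follows it (OS), and summarizes variability as the binary entropy of that split (0 bits for a single fixed order, 1 bit for both orders equally often). The sketch reads the standard tab-separated CoNLL-U format (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC); the function names are hypothetical.

```python
import math


def so_order_counts(conllu_text):
    """Count heads whose nsubj precedes their obj (SO) and those where it follows (OS)."""
    so = os_count = 0
    for sentence in conllu_text.strip().split("\n\n"):
        subj, obj = {}, {}  # head ID -> position of the dependent
        for line in sentence.splitlines():
            if not line or line.startswith("#"):
                continue  # skip comment lines such as "# text = ..."
            cols = line.split("\t")
            if not cols[0].isdigit():
                continue  # skip multiword-token ranges ("1-2") and empty nodes ("1.1")
            position, head = int(cols[0]), cols[6]
            relation = cols[7].split(":")[0]  # "nsubj:pass" -> "nsubj"
            if relation == "nsubj":
                subj[head] = position
            elif relation == "obj":
                obj[head] = position
        for head, s_pos in subj.items():
            if head in obj:
                if s_pos < obj[head]:
                    so += 1
                else:
                    os_count += 1
    return so, os_count


def order_entropy(so, os_count):
    """Binary entropy in bits: 0.0 = one order only, 1.0 = both orders equally often."""
    total = so + os_count
    entropy = 0.0
    for c in (so, os_count):
        if c:
            p = c / total
            entropy -= p * math.log2(p)
    return entropy
```

A number like this is exactly the kind of aggregate the abstract cautions about: it summarizes a whole corpus rather than the use of cues in individual contexts.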