423 research outputs found

    Particles, word order, and intonation

    Synopsis: This study explores information structure (IS) within the framework of corpus linguistics and functional linguistics. As a case study, it investigates IS phenomena in spoken Japanese: particles including so-called topic particles, case particles, and zero particles; word order; and intonation. The study discusses how these phenomena are related to human cognitive and communicative mechanisms.

    Co-reference annotation and resources: a multilingual corpus of typologically diverse languages

    This article introduces a dialogue corpus containing data from two typologically different languages, Japanese and Kilivila. The corpus is annotated in accordance with language-specific annotation schemes for co-referential and similar relations. The article describes the corpus data, the properties of language-specific co-reference in the two languages, and a methodology for its annotation. Examples from the corpus show how this methodology is used in the workflow of the annotation process.

    Information structure in spoken Japanese

    This study explores information structure (IS) within the framework of corpus linguistics and functional linguistics. As a case study, it investigates IS phenomena in spoken Japanese: particles including so-called topic particles, case particles, and zero particles; word order; and intonation. The study discusses how these phenomena are related to human cognitive and communicative mechanisms.

    Discourse markers in Slovenian and their applicability for developing speech-to-speech translation technologies

    Towards Entity Status

    Discourse entities are an important construct in computational linguistics. They introduce an additional level of representation between referring expressions and that which they refer to: the level of mental representation. In this thesis, I first explore some semiotic and communication-theoretic aspects of discourse entities. Then, I develop the concept of "entity status". Entity status is a meta-variable that collects two dimensions of information about a discourse entity: structural information about the role that an entity plays in a discourse, and management information about how the entity is created, accessed, and updated. Finally, the concept is applied to two case studies: the first focusses on the choice of referring expressions in radio news, while the second looks at the conditions under which a discourse entity can be mentioned as a pronoun.

    Anaphora resolution for Arabic machine translation: a case study of nafs

    PhD Thesis. In the age of the internet, email, and social media there is an increasing need for processing online information, for example, to support education and business. This has led to the rapid development of natural language processing technologies such as computational linguistics, information retrieval, and data mining. As a branch of computational linguistics, anaphora resolution has attracted much interest. This is reflected in the large number of papers on the topic published in journals such as Computational Linguistics. Mitkov (2002) and Ji et al. (2005) have argued that the overall quality of anaphora resolution systems remains low, despite practical advances in the area, and that major challenges include dealing with real-world knowledge and accurate parsing. This thesis investigates the following research question: can an algorithm be found for the resolution of the anaphor nafs in Arabic text which is accurate to at least 90%, scales linearly with text size, and requires a minimum of knowledge resources? A resolution algorithm intended to satisfy these criteria is proposed. Testing on a corpus of contemporary Arabic shows that it does indeed satisfy the criteria.
    Egyptian Government

    A Survey on Semantic Processing Techniques

    Semantic processing is a fundamental research domain in computational linguistics. In the era of powerful pre-trained language models and large language models, the advancement of research in this domain appears to be decelerating. However, the study of semantics is multi-dimensional in linguistics. The research depth and breadth of computational semantic processing can be largely improved with new technologies. In this survey, we analyze five semantic processing tasks: word sense disambiguation, anaphora resolution, named entity recognition, concept extraction, and subjectivity detection. We study relevant theoretical research in these fields, advanced methods, and downstream applications. We connect the surveyed tasks with downstream applications because this may inspire future scholars to fuse these low-level semantic processing tasks with high-level natural language processing tasks. The review of theoretical research may also inspire new tasks and technologies in the semantic processing domain. Finally, we compare the different semantic processing techniques and summarize their technical trends, application trends, and future directions.
    Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN 1566-2535. The equal contribution mark is missing from the published version due to the publication policies. Please contact Prof. Erik Cambria for details.

    Proceedings of the VIIth GSCP International Conference

    The 7th International Conference of the Gruppo di Studi sulla Comunicazione Parlata, dedicated to the memory of Claire Blanche-Benveniste, chose Speech and Corpora as its main theme. The wide international origin of the 235 authors, from 21 countries and 95 institutions, led to papers on many different languages. The 89 papers of this volume reflect the themes of the conference: spoken corpora compilation and annotation, with the connected technological fields; the relation between prosody and pragmatics; speech pathologies; and various papers on phonetics, speech and linguistic analysis, pragmatics, and sociolinguistics. Many papers are also dedicated to speech and second language studies. The online publication with FUP allows direct access to sound and video linked to papers (when downloaded).

    Demonstratives in discourse

    This volume explores the use of demonstratives in the structuring and management of discourse, and their role as engagement expressions, from a crosslinguistic perspective. It seeks to establish which types of discourse-related functions are commonly encoded by demonstratives, beyond the well-established reference-tracking and deictic uses, and also investigates which members of demonstrative paradigms typically take on certain functions. Moreover, it looks at the roles of non-deictic demonstratives, that is, members of the paradigm which are dedicated e.g. to contrastive, recognitional, or anaphoric functions and do not express deictic distinctions. Several of the studies also focus on manner demonstratives, which have been little studied from a crosslinguistic perspective. The volume thus broadens the scope of investigation of demonstratives to look at how their core functions interact with a wider range of discourse functions in a number of different languages. The volume covers languages from a range of geographical locations and language families, including Cushitic and Mande languages in Africa, Oceanic and Papuan languages in the Pacific region, Algonquian and Guaykuruan in the Americas, and Germanic, Slavic and Finno-Ugric languages in the Eurasian region. It also includes two papers taking a broader typological approach to specific discourse functions of demonstratives.