523 research outputs found

    Diacritic Restoration and the Development of a Part-of-Speech Tagset for the Māori Language

    This thesis investigates two fundamental problems in natural language processing: diacritic restoration and part-of-speech tagging. Over the past three decades, statistical approaches to both problems have attracted growing interest as a consequence of the increasing availability of manually annotated training data in major languages such as English and French. However, these approaches are not practical for most minority languages, where appropriate training data is either non-existent or not publicly available. Furthermore, before a part-of-speech tagging system can be developed, a suitable tagset is required for the language. In this thesis, we make the following contributions to bridge this gap. Firstly, we propose a method for diacritic restoration based on naive Bayes classifiers that act at word level. Classifications are based on a rich set of features, extracted automatically from training data in the form of diacritically marked text. This method requires no additional resources, which makes it language independent. The algorithm was evaluated on one language, namely Māori, and an accuracy exceeding 99% was observed. Secondly, we present our work on creating one of the necessary resources for the development of a part-of-speech tagging system in Māori, that of a suitable tagset. The tagset described was developed in accordance with the EAGLES guidelines for morphosyntactic annotation of corpora, and was the result of in-depth analysis of Māori grammar.
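
The word-level naive Bayes idea in this abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the class name `DiacriticRestorer`, the two context features (previous and next word), and the add-one smoothing are all assumptions chosen for brevity, whereas the thesis reports a richer feature set.

```python
import math
import unicodedata
from collections import Counter, defaultdict


def strip_diacritics(word):
    """Remove combining marks, so a macron-bearing vowel becomes a plain one."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


class DiacriticRestorer:
    """Hypothetical word-level naive Bayes restorer (illustrative only)."""

    def __init__(self):
        # stripped form -> Counter of diacritically marked forms seen in training
        self.candidate_counts = defaultdict(Counter)
        # marked form -> Counter of context features seen with that form
        self.context_counts = defaultdict(Counter)
        self.vocab = set()

    @staticmethod
    def features(tokens, i):
        # Assumed feature set: stripped previous and next word.
        prev_word = tokens[i - 1] if i > 0 else "<s>"
        next_word = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
        return ["prev=" + strip_diacritics(prev_word),
                "next=" + strip_diacritics(next_word)]

    def train(self, sentences):
        """Learn from tokenised, diacritically marked sentences."""
        for tokens in sentences:
            for i, word in enumerate(tokens):
                self.candidate_counts[strip_diacritics(word)][word] += 1
                for feat in self.features(tokens, i):
                    self.context_counts[word][feat] += 1
                    self.vocab.add(feat)

    def restore(self, tokens):
        """Pick, for each unmarked token, the most probable marked form."""
        restored = []
        for i, word in enumerate(tokens):
            candidates = self.candidate_counts.get(word)
            if not candidates:
                restored.append(word)  # unseen word: leave unchanged
                continue
            feats = self.features(tokens, i)
            total = sum(candidates.values())

            def score(form):
                # log prior + sum of add-one smoothed log likelihoods
                log_p = math.log(candidates[form] / total)
                denom = sum(self.context_counts[form].values()) + len(self.vocab)
                for feat in feats:
                    log_p += math.log((self.context_counts[form][feat] + 1) / denom)
                return log_p

            restored.append(max(candidates, key=score))
        return restored


# Toy training data, invented for illustration only.
model = DiacriticRestorer()
model.train([["te", "kākā", "nui"], ["te", "kākā", "nui"], ["he", "kaka", "hou"]])
print(model.restore(["te", "kaka", "nui"]))
print(model.restore(["he", "kaka", "hou"]))
```

Because the only training resource is marked text, the same sketch could in principle be trained on any language written with diacritics, which is the sense in which the abstract calls the method language independent.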

    An Empirical Analysis of the Role of Amplifiers, Downtoners, and Negations in Emotion Classification in Microblogs

    The effect of amplifiers, downtoners, and negations has been studied in general and particularly in the context of sentiment analysis. However, there is only limited work which aims at transferring the results and methods to discrete classes of emotions, e.g., joy, anger, fear, sadness, surprise, and disgust. For instance, it is not straightforward to interpret which emotion the phrase "not happy" expresses. With this paper, we aim at obtaining a better understanding of such modifiers in the context of emotion-bearing words and their impact on document-level emotion classification, namely, of microposts on Twitter. We select an appropriate scope detection method for modifiers of emotion words, incorporate it in a document-level emotion classification model as an additional bag of words, and show that this approach improves the performance of emotion classification. In addition, we build a term weighting approach based on the different modifiers into a lexical model for the analysis of the semantics of modifiers and their impact on emotion meaning. We show that amplifiers separate emotions expressed with an emotion-bearing word more clearly from other secondary connotations. Downtoners have the opposite effect. In addition, we discuss the meaning of negations of emotion-bearing words. For instance, we show empirically that "not happy" is closer to sadness than to anger and that fear-expressing words in the scope of downtoners often express surprise.
    Comment: Accepted for publication at The 5th IEEE International Conference on Data Science and Advanced Analytics (DSAA), https://dsaa2018.isi.it
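
The idea of feeding modifier scope into a classifier as an additional bag of words can be sketched as follows. This is an illustrative assumption, not the paper's method: the word lists in `MODIFIERS`, the function name `modifier_features`, and the fixed "next `window` tokens" scope rule are all hypothetical, and the paper selects its scope detection method empirically and restricts it to emotion words.

```python
from collections import Counter

# Hypothetical, deliberately tiny modifier lexicons.
MODIFIERS = {
    "NEG": {"not", "never", "no"},
    "AMP": {"very", "really", "extremely"},
    "DOWN": {"slightly", "somewhat", "barely"},
}


def modifier_features(tokens, window=2):
    """Return plain bag-of-words counts plus modifier-scoped counts.

    Scope rule (an assumption): a modifier affects the next `window` tokens.
    """
    feats = Counter()
    lowered = [tok.lower() for tok in tokens]
    for i, tok in enumerate(lowered):
        feats[tok] += 1  # ordinary bag-of-words feature
        for label, words in MODIFIERS.items():
            if tok in words:
                # additional bag of words: tokens inside the modifier's scope
                for scoped in lowered[i + 1:i + 1 + window]:
                    feats[f"{label}_{scoped}"] += 1
    return feats


print(modifier_features(["I", "am", "not", "happy"]))
```

A downstream classifier would then see `NEG_happy` as a feature distinct from `happy`, which lets it learn, for example, that the two point to different emotions.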

    Sentiment Analysis: State of the Art

    We present the state of the art in sentiment analysis, covering the purpose of sentiment analysis, the levels at which it operates, and the processes that can be used to measure polarity and classify labels. Brief details about some sentiment analysis resources are also included.

    Syntax of Hungarian

    Syntax of Hungarian aims to present a synthesis of the currently available syntactic knowledge of the Hungarian language, rooted in theory but providing highly detailed descriptions, and intended to be of use to researchers as well as advanced students of language and linguistics. As research in language leads to extensive changes in our understanding and representations of grammar, the Comprehensive Grammar Resources series intends to present the most current understanding of grammar and syntax as completely as possible, in a way that will both speak to modern linguists and serve as a resource for the non-specialist. This volume provides a comprehensive overview and description of coordinate structures, the syntactic and semantic types of conjunctions, and the types of ellipsis in sentences and short dialogues. It discusses multiple conjunctions, coordinated wh-constructions, sluicing, and sentence fragments.

    On the Impact of Emotions on Author Profiling

    This is the author’s version of a work that was accepted for publication in Information Processing and Management. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Information Processing and Management 52 (2016) 73–92. DOI 10.1016/j.ipm.2015.06.003.
    In this paper, we investigate the impact of emotions on author profiling, concretely identifying age and gender. Firstly, we propose the EmoGraph method for modelling the way people use the language to express themselves on the basis of an emotion-labelled graph. We apply this representation model for identifying gender and age in the Spanish partition of the PAN-AP-13 corpus, obtaining comparable results to the best performing systems of the PAN Lab of CLEF. © 2015 Elsevier B.V. All rights reserved.
    The work of the first author was partially funded by Autoritas Consulting SA and by Spanish Ministry of Economics under grant ECOPORTUNITY IPT-2012-1220-430000. The work of the second author was carried out in the framework of the WIQ-EI IRSES project (Grant No. 269180) within the FP 7 Marie Curie, the DIANA APPLICATIONS: Finding Hidden Knowledge in Texts: Applications (TIN2012-38603-C02-01) project and the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems. A special mention to Maria Dolores Rangel Pardo for her linguistic contribution to this investigation.
    Rangel-Pardo, FM.; Rosso, P. (2016). On the Impact of Emotions on Author Profiling. Information Processing and Management. 52(1):73-92. https://doi.org/10.1016/j.ipm.2015.06.003

    Language and Linguistics in a Complex World: Data, Interdisciplinarity, Transfer, and the Next Generation. ICAME41 Extended Book of Abstracts

    This is a collection of papers, work-in-progress reports, and other contributions that were part of the ICAME41 digital conference.