6,002 research outputs found

    Joint morphological-lexical language modeling for processing morphologically rich languages with application to dialectal Arabic

    Get PDF
    Language modeling for an inflected language such as Arabic poses new challenges for speech recognition and machine translation due to its rich morphology. Rich morphology results in large increases in out-of-vocabulary (OOV) rate and poor language model parameter estimation in the absence of large quantities of data. In this study, we present a joint morphological-lexical language model (JMLLM) that takes advantage of Arabic morphology. JMLLM combines morphological segments with the underlying lexical items and additional available information sources with regards to morphological segments and lexical items in a single joint model. Joint representation and modeling of morphological and lexical items reduces the OOV rate and provides smooth probability estimates while keeping the predictive power of whole words. Speech recognition and machine translation experiments in dialectal-Arabic show improvements over word and morpheme based trigram language models. We also show that as the tightness of integration between different information sources increases, both speech recognition and machine translation performances improve

    Identifying Emotions in Social Media: Comparison of Word-emotion lexica

    Get PDF
    In recent years, emotions expressed in social media messages have become a vivid research topic due to their influence on the spread of misinformation and online radicalization over online social networks. Thus, it is important to correctly identify emotions in order to make inferences from social media messages. In this paper, we report on the performance of three publicly available word-emotion lexicons (NRC, DepecheMood, EmoSenticNet) over a set of Facebook and Twitter messages. To this end, we designed and implemented an algorithm that applies natural language processing (NLP) techniques along with a number of heuristics that reflect the way humans naturally assess emotions in written texts. In order to evaluate the appropriateness of the obtained emotion scores, we conducted a questionnaire-based survey with human raters. Our results show that there are noticeable differences between the performance of the lexicons as well as with respect to emotion scores the human raters provided in our surve

    Noise or music? Investigating the usefulness of normalisation for robust sentiment analysis on social media data

    Get PDF
    In the past decade, sentiment analysis research has thrived, especially on social media. While this data genre is suitable to extract opinions and sentiment, it is known to be noisy. Complex normalisation methods have been developed to transform noisy text into its standard form, but their effect on tasks like sentiment analysis remains underinvestigated. Sentiment analysis approaches mostly include spell checking or rule-based normalisation as preprocess- ing and rarely investigate its impact on the task performance. We present an optimised sentiment classifier and investigate to what extent its performance can be enhanced by integrating SMT-based normalisation as preprocessing. Experiments on a test set comprising a variety of user-generated content genres revealed that normalisation improves sentiment classification performance on tweets and blog posts, showing the model’s ability to generalise to other data genres

    An Empirical Analysis of the Role of Amplifiers, Downtoners, and Negations in Emotion Classification in Microblogs

    Full text link
    The effect of amplifiers, downtoners, and negations has been studied in general and particularly in the context of sentiment analysis. However, there is only limited work which aims at transferring the results and methods to discrete classes of emotions, e. g., joy, anger, fear, sadness, surprise, and disgust. For instance, it is not straight-forward to interpret which emotion the phrase "not happy" expresses. With this paper, we aim at obtaining a better understanding of such modifiers in the context of emotion-bearing words and their impact on document-level emotion classification, namely, microposts on Twitter. We select an appropriate scope detection method for modifiers of emotion words, incorporate it in a document-level emotion classification model as additional bag of words and show that this approach improves the performance of emotion classification. In addition, we build a term weighting approach based on the different modifiers into a lexical model for the analysis of the semantics of modifiers and their impact on emotion meaning. We show that amplifiers separate emotions expressed with an emotion- bearing word more clearly from other secondary connotations. Downtoners have the opposite effect. In addition, we discuss the meaning of negations of emotion-bearing words. For instance we show empirically that "not happy" is closer to sadness than to anger and that fear-expressing words in the scope of downtoners often express surprise.Comment: Accepted for publication at The 5th IEEE International Conference on Data Science and Advanced Analytics (DSAA), https://dsaa2018.isi.it
    • …
    corecore