838 research outputs found

    Learning domain-specific sentiment lexicon with supervised sentiment-aware LDA

    Get PDF
    Frontiers in Artificial Intelligence and Applications, v. 263 entitled: ECAI 2014: 21st European Conference on Artificial Intelligence, 18-22 August 2014, Prague, Czech Republic - Including Prestigious Applications of Intelligent Systems (PAIS 2014)Analyzing and understanding people's sentiments towards different topics has become an interesting task due to the explosion of opinion-rich resources. In most sentiment analysis applications, sentiment lexicons play a crucial role, to be used as metadata of sentiment polarity. However, most previous works focus on discovering general-purpose sentiment lexicons. They cannot capture domain-specific sentiment words, or implicit and connotative sentiment words that are seemingly objective. In this paper, we propose a supervised sentiment-aware LDA model (ssLDA). The model uses a minimal set of domain-independent seed words and document labels to discover a domain-specific lexicon, learning a lexicon much richer and adaptive to the sentiment of specific document. Experiments on two publicly-available datasets (movie reviews and Obama-McCain debate dataset) show that our model is effective in constructing a comprehensive and high-quality domain-specific sentiment lexicon. Furthermore, the resulting lexicon significantly improves the performance of sentiment classification tasks. © 2014 The Authors and IOS Press.published_or_final_versio

    LCCT: a semisupervised model for sentiment classification

    Get PDF
    Conference Theme: Human Language TechnologiesAnalyzing public opinions towards products, services and social events is an important but challenging task. An accurate sentiment analyzer should take both lexicon-level information and corpus-level information into account. It also needs to exploit the domain-specific knowledge and utilize the common knowledge shared across domains. In addition, we want the algorithm being able to deal with missing labels and learning from incomplete sentiment lexicons. This paper presents a LCCT (Lexicon-based and Corpus-based, Co-Training) model for semi-supervised sentiment classification. The proposed method combines the idea of lexicon-based learning and corpus-based learning in a unified co-training framework. It is capable of incorporating both domain-specific and domain-independent knowledge. Extensive experiments show that it achieves very competitive classification accuracy, even with a small portion of labeled data. Comparing to state-of-the-art sentiment classification methods, the LCCT approach exhibits significantly better performances on a variety of datasets in both English and Chinese. © 2015 Association for Computational Linguisticspublished_or_final_versio

    Weakly-Supervised Joint Sentiment-Topic Detection from Text

    Get PDF
    publication-status: Acceptedtypes: ArticleCopyright © 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Sentiment analysis or opinion mining aims to use automated tools to detect subjective information such as opinions, attitudes, and feelings expressed in text. This paper proposes a novel probabilistic modeling framework called joint sentiment-topic (JST) model based on latent Dirichlet allocation (LDA), which detects sentiment and topic simultaneously from text. A reparameterized version of the JST model called Reverse-JST, by reversing the sequence of sentiment and topic generation in the modelling process, is also studied. Although JST is equivalent to Reverse-JST without hierarchical prior, extensive experiments show that when sentiment priors are added, JST performs consistently better than Reverse-JST. Besides, unlike supervised approaches to sentiment classification which often fail to produce satisfactory performance when shifting to other domains, the weakly-supervised nature of JST makes it highly portable to other domains. This is verified by the experimental results on datasets from five different domains where the JST model even outperforms existing semi-supervised approaches in some of the datasets despite using no labelled documents. Moreover, the topics and topic sentiment detected by JST are indeed coherent and informative. We hypothesize that the JST model can readily meet the demand of large-scale sentiment analysis from the web in an open-ended fashion

    Ordering-sensitive and Semantic-aware Topic Modeling

    Full text link
    Topic modeling of textual corpora is an important and challenging problem. In most previous work, the "bag-of-words" assumption is usually made which ignores the ordering of words. This assumption simplifies the computation, but it unrealistically loses the ordering information and the semantic of words in the context. In this paper, we present a Gaussian Mixture Neural Topic Model (GMNTM) which incorporates both the ordering of words and the semantic meaning of sentences into topic modeling. Specifically, we represent each topic as a cluster of multi-dimensional vectors and embed the corpus into a collection of vectors generated by the Gaussian mixture model. Each word is affected not only by its topic, but also by the embedding vector of its surrounding words and the context. The Gaussian mixture components and the topic of documents, sentences and words can be learnt jointly. Extensive experiments show that our model can learn better topics and more accurate word distributions for each topic. Quantitatively, comparing to state-of-the-art topic modeling approaches, GMNTM obtains significantly better performance in terms of perplexity, retrieval accuracy and classification accuracy.Comment: To appear in proceedings of AAAI 201

    Task-specific Word Identification from Short Texts Using a Convolutional Neural Network

    Full text link
    Task-specific word identification aims to choose the task-related words that best describe a short text. Existing approaches require well-defined seed words or lexical dictionaries (e.g., WordNet), which are often unavailable for many applications such as social discrimination detection and fake review detection. However, we often have a set of labeled short texts where each short text has a task-related class label, e.g., discriminatory or non-discriminatory, specified by users or learned by classification algorithms. In this paper, we focus on identifying task-specific words and phrases from short texts by exploiting their class labels rather than using seed words or lexical dictionaries. We consider the task-specific word and phrase identification as feature learning. We train a convolutional neural network over a set of labeled texts and use score vectors to localize the task-specific words and phrases. Experimental results on sentiment word identification show that our approach significantly outperforms existing methods. We further conduct two case studies to show the effectiveness of our approach. One case study on a crawled tweets dataset demonstrates that our approach can successfully capture the discrimination-related words/phrases. The other case study on fake review detection shows that our approach can identify the fake-review words/phrases.Comment: accepted by Intelligent Data Analysis, an International Journa

    Multilingual opinion mining

    Get PDF
    170 p.Cada día se genera gran cantidad de texto en diferentes medios online. Gran parte de ese texto contiene opiniones acerca de multitud de entidades, productos, servicios, etc. Dada la creciente necesidad de disponer de medios automatizados para analizar, procesar y explotar esa información, las técnicas de análisis de sentimiento han recibido gran cantidad de atención por parte de la industria y la comunidad científica durante la última década y media. No obstante, muchas de las técnicas empleadas suelen requerir de entrenamiento supervisado utilizando para ello ejemplos anotados manualmente, u otros recursos lingüísticos relacionados con un idioma o dominio de aplicación específicos. Esto limita la aplicación de este tipo de técnicas, ya que dicho recursos y ejemplos anotados no son sencillos de obtener. En esta tesis se explora una serie de métodos para realizar diversos análisis automáticos de texto en el marco del análisis de sentimiento, incluyendo la obtención automática de términos de un dominio, palabras que expresan opinión, polaridad del sentimiento de dichas palabras (positivas o negativas), etc. Finalmente se propone y se evalúa un método que combina representación continua de palabras (continuous word embeddings) y topic-modelling inspirado en la técnica de Latent Dirichlet Allocation (LDA), para obtener un sistema de análisis de sentimiento basado en aspectos (ABSA), que sólo necesita unas pocas palabras semilla para procesar textos de un idioma o dominio determinados. De este modo, la adaptación a otro idioma o dominio se reduce a la traducción de las palabras semilla correspondientes

    Domain-specific lexicon generation for emotion detection from text.

    Get PDF
    Emotions play a key role in effective and successful human communication. Text is popularly used on the internet and social media websites to express and share emotions, feelings and sentiments. However useful applications and services built to understand emotions from text are limited in effectiveness due to reliance on general purpose emotion lexicons that have static vocabulary and sentiment lexicons that can only interpret emotions coarsely. Thus emotion detection from text calls for methods and knowledge resources that can deal with challenges such as dynamic and informal vocabulary, domain-level variations in emotional expressions and other linguistic nuances. In this thesis we demonstrate how labelled (e.g. blogs, news headlines) and weakly-labelled (e.g. tweets) emotional documents can be harnessed to learn word-emotion lexicons that can account for dynamic and domain-specific emotional vocabulary. We model the characteristics of realworld emotional documents to propose a generative mixture model, which iteratively estimates the language models that best describe the emotional documents using expectation maximization (EM). The proposed mixture model has the ability to model both emotionally charged words and emotion-neutral words. We then generate a word-emotion lexicon using the mixture model to quantify word-emotion associations in the form of a probability vectors. Secondly we introduce novel feature extraction methods to utilize the emotion rich knowledge being captured by our word-emotion lexicon. The extracted features are used to classify text into emotion classes using machine learning. Further we also propose hybrid text representations for emotion classification that use the knowledge of lexicon based features in conjunction with other representations such as n-grams, part-of-speech and sentiment information. Thirdly we propose two different methods which jointly use an emotion-labelled corpus of tweets and emotion-sentiment mapping proposed in psychology to learn word-level numerical quantification of sentiment strengths over a positive to negative spectrum. Finally we evaluate all the proposed methods in this thesis through a variety of emotion detection and sentiment analysis tasks on benchmark data sets covering domains from blogs to news articles to tweets and incident reports

    Comprehensive Review of Opinion Summarization

    Get PDF
    The abundance of opinions on the web has kindled the study of opinion summarization over the last few years. People have introduced various techniques and paradigms to solving this special task. This survey attempts to systematically investigate the different techniques and approaches used in opinion summarization. We provide a multi-perspective classification of the approaches used and highlight some of the key weaknesses of these approaches. This survey also covers evaluation techniques and data sets used in studying the opinion summarization problem. Finally, we provide insights into some of the challenges that are left to be addressed as this will help set the trend for future research in this area.unpublishednot peer reviewe
    • …
    corecore