    A Machine Learning Approach For Opinion Holder Extraction In Arabic Language

    Opinion mining aims at extracting useful subjective information from reliable amounts of text. Opinion holder recognition is a task that has not yet been considered in the Arabic language. This task essentially requires a deep understanding of clause structures. Unfortunately, the lack of a robust, publicly available Arabic parser further complicates the research. This paper presents leading research on opinion holder extraction in Arabic news that is independent of any lexical parser. We investigate constructing a comprehensive feature set to compensate for the lack of structural parsing output. The proposed feature set is adapted from previous work on English and coupled with our proposed semantic field and named entity features. Our feature analysis is based on Conditional Random Fields (CRF) and semi-supervised pattern recognition techniques. Different research models are evaluated via cross-validation experiments, achieving an F-measure of 54.03. We publicly release our corpus and lexicon to the opinion mining community to encourage further research.

    Multilingual subjectivity analysis using machine translation

    Although research in other languages is increasing, much of the work in subjectivity analysis has been applied to English data, mainly due to the large body of electronic resources and tools that are available for this language. In this paper, we propose and evaluate methods that can be employed to transfer a repository of subjectivity resources across languages. Specifically, we attempt to leverage the resources available for English and, by employing machine translation, generate resources for subjectivity analysis in other languages. Through comparative evaluations on two different languages (Romanian and Spanish), we show that automatic translation is a viable alternative for the construction of resources and tools for subjectivity analysis in a new target language.

    Construction and Expansion of Dictionary of Idiomatic Emotional Expressions and Idiomatic Emotional Expression Corpus

    Objective: In the study of sentiment estimation from language, methods focusing on words, phrases, sentence patterns, and sentence-final expressions have been proposed. However, it is difficult to deal with a wide variety of emotional expressions by only assigning emotions to words and phrases. In particular, it is difficult to analyze metaphorical expressions and idiomatic expressions on a word-by-word basis, and it is impossible to register all expressions in a dictionary because new expressions can be created by flexibly replacing words. Moreover, it is difficult to determine the constraints on the words to be replaced, and not all expressions can be registered in the dictionary as sentence patterns. Methods: In this paper, we construct a dictionary of idiomatic sentiment expressions, which contains idioms expressing emotions. We also construct a pseudo-emotional corpus by collecting utterances containing emotional idioms from social media and automatically assigning the emotions expressed by the idioms. Results: This corpus includes expressions other than idioms, and can be an effective resource for estimating emotions in sentences that do not contain idioms. In this study, we create an emotion estimation model for utterances based on the constructed corpus, and conduct evaluation experiments to explore the problems of the idiomatic emotion corpus. In addition, using the constructed sentiment corpus, we investigate how to expand the dictionary of sentiment expressions in idiomatic phrases by using deep learning methods. Conclusion: Using the corpus of idiomatic sentiments constructed by the proposed method as training data, we constructed machine learning models with and without idioms. The results show that the F-values of all emotions with idioms exceed 0.8. On the other hand, when idioms were not included, the F-values tended to decrease overall. However, the F-values of emotions such as "shame" and "excitement" were around 0.7, indicating that the characteristics of emotional expressions other than idioms were expressed.

    Sentence-based sentiment analysis with domain adaptation capability

    Sentiment analysis aims to automatically estimate the sentiment in a given text as positive, objective or negative, possibly together with the strength of the sentiment. Polarity lexicons that indicate how positive or negative each term is are often used as the basis of many sentiment analysis approaches. Domain-specific polarity lexicons are expensive and time-consuming to build; hence, researchers often use a general purpose or domain-independent lexicon as the basis of their analysis. In this work, we address two sub-tasks in sentiment analysis. We introduce a simple method to adapt a general purpose polarity lexicon to a specific domain. Subsequently, we propose new features to be used in a term polarity based approach to sentiment analysis. We consider different aspects of sentences, such as length, purity, irrealis content, subjectivity, and position within the opinionated text. This analysis is used to find sentences that may convey better information about the overall review polarity. Therefore, unlike other work, ours also focuses on sentence-based sentiment analysis. Moreover, we worked on two distinct domains, hotel and Twitter, with three different systems, which are compared with the existing state-of-the-art approaches in the literature.

    Semi-Supervised Learning For Identifying Opinions In Web Content

    Thesis (Ph.D.) - Indiana University, Information Science, 2011. Opinions published on the World Wide Web (Web) offer opportunities for detecting personal attitudes regarding topics, products, and services. The opinion detection literature indicates that both a large body of opinions and a wide variety of opinion features are essential for capturing subtle opinion information. Although a large amount of opinion-labeled data is preferable for opinion detection systems, opinion-labeled data is often limited, especially at sub-document levels, and manual annotation is tedious, expensive and error-prone. This shortage of opinion-labeled data is less challenging in some domains (e.g., movie reviews) than in others (e.g., blog posts). While a simple method for improving accuracy in challenging domains is to borrow opinion-labeled data from a non-target data domain, this approach often fails because of the domain transfer problem: Opinion detection strategies designed for one data domain generally do not perform well in another domain. However, while it is difficult to obtain opinion-labeled data, unlabeled user-generated opinion data are readily available. Semi-supervised learning (SSL) requires only limited labeled data to automatically label unlabeled data and has achieved promising results in various natural language processing (NLP) tasks, including traditional topic classification; but SSL has been applied in only a few opinion detection studies. This study investigates application of four different SSL algorithms in three types of Web content: edited news articles, semi-structured movie reviews, and the informal and unstructured content of the blogosphere. SSL algorithms are also evaluated for their effectiveness in sparse data situations and domain adaptation. Research findings suggest that, when there is limited labeled data, SSL is a promising approach for opinion detection in Web content. Although the contributions of SSL varied across data domains, significant improvement was demonstrated for the most challenging data domain--the blogosphere--when a domain transfer-based SSL strategy was implemented.

    Latent Variable Models for Semantic Orientations of Phrases

    We propose models for semantic orientations of phrases as well as classification methods based on the models. Although each phrase consists of multiple words, the semantic orientation of the phrase is not a mere sum of the orientations of the component words. Some words can invert the orientation. In order to capture the property of such phrases, we introduce latent variables into the models. Through experiments, we show that the proposed latent variable models work well in the classification of semantic orientations of phrases and achieve nearly 82% classification accuracy.