Search CORE

10 research outputs found

Sentence-level sentiment tagging across different domains and genres

Author: Andreevskaia Alina
Publication date: 01/01/2009

The demand for information about sentiment expressed in texts has stimulated a growing interest in automatic sentiment analysis in Natural Language Processing (NLP). This dissertation is motivated by an unmet need for high-performance domain-independent sentiment taggers and by pressing theoretical questions in NLP, where the exploration of the limitations of specific approaches, as well as of the synergies between them, remains practically unaddressed. This study focuses on sentiment tagging at the sentence level and covers four genres: news, blogs, movie reviews, and product reviews. It draws comparisons between sentiment annotation at different linguistic levels (words, sentences, and texts) and highlights the key differences between supervised machine learning methods that rely on annotated corpora (the corpus-based approach, CBA) and lexicon-based approaches (LBA) to sentiment tagging. Exploring the performance of the supervised corpus-based approach to sentiment tagging, this study highlights the strong domain dependence of the CBA. I present the development of lexicon-based approaches built on general lexicons, such as WordNet, as a potential solution to the domain portability problem. A system for sentiment marker extraction from WordNet's relations and glosses is developed and used to acquire word lists for a lexicon-based system for sentiment annotation at the sentence and text levels. The study demonstrates that LBA's performance across domains is more stable than that of CBA. Finally, the study proposes an integration of LBA and CBA in an ensemble of classifiers using a precision-based voting technique that allows the ensemble system to incorporate the best features of both CBA and LBA. This combined approach outperforms both base learners and provides a promising solution to the domain-adaptation problem.
The study contributes to NLP (1) by developing algorithms for automatic acquisition of sentiment-laden words from dictionary definitions; (2) by conducting a systematic study of approaches to sentiment classification and of the factors affecting their performance; (3) by refining the lexicon-based approach through valence shifter handling and parse tree information; and (4) by developing a combined CBA/LBA approach that brings together the strengths of the two approaches and allows domain adaptation with limited amounts of labeled training data.
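The abstract does not spell out the precision-based voting scheme, so the following is only a minimal sketch of one plausible reading: each base classifier carries a per-label precision estimated on held-out data, and the ensemble trusts whichever classifier is most precise for the label it is currently predicting. The function name `precision_vote`, the classifier keys `cba` and `lba`, and all precision values are illustrative assumptions, not figures from the dissertation.

```python
def precision_vote(predictions, precisions):
    """Pick the label proposed by the classifier that is most precise for it.

    predictions: {classifier_name: predicted_label}
    precisions:  {classifier_name: {label: precision on held-out data}}
    Both structures are hypothetical; the source does not define them.
    """
    # Rank each (classifier, label) pair by that classifier's precision on that label.
    winner = max(
        predictions.items(),
        key=lambda item: precisions[item[0]].get(item[1], 0.0),
    )
    return winner[1]


# Made-up precision estimates for a corpus-based and a lexicon-based tagger.
precisions = {
    "cba": {"positive": 0.80, "negative": 0.60},
    "lba": {"positive": 0.70, "negative": 0.75},
}

# CBA says negative (precision 0.60), LBA says positive (precision 0.70).
vote_a = precision_vote({"cba": "negative", "lba": "positive"}, precisions)

# CBA says positive (precision 0.80), LBA says negative (precision 0.75).
vote_b = precision_vote({"cba": "positive", "lba": "negative"}, precisions)
```

In this sketch a disagreement is resolved per prediction rather than by a fixed ranking of the classifiers, which is one way an ensemble could draw on whichever base learner is more reliable for a given label.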

Concordia University Research Repository

All Blogs Are Not Made Equal: Exploring Genre Differences in Sentiment Tagging of Blogs

Author: Alina Andreevskaia

One of the essential characteristics of blogs is their subjectivity, which makes blogs a particularly interesting domain for research on automatic sentiment determination. In this paper, we explore the properties of the two most common subgenres of blogs (personal diaries and “notebooks”) and the effects that these properties have on the performance of an automatic sentiment annotation system, which we developed for binary (positive vs. negative) and ternary (positive vs. negative vs. neutral) classification of sentiment at the sentence level. We also investigate the differential effect of including negations and other valence shifters on the performance of our system on these two subgenres of blogs.
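To illustrate what handling negations as valence shifters can look like in a sentence-level tagger, here is a minimal sketch. It is not the authors' system: the tiny `LEXICON`, the `NEGATIONS` set, the two-token look-back window, and the function name `sentence_sentiment` are all assumptions made for the example.

```python
# Toy polarity lexicon and negation list, both hypothetical.
LEXICON = {"good": 1, "great": 1, "bad": -1, "boring": -1}
NEGATIONS = {"not", "never", "no"}


def sentence_sentiment(sentence, window=2):
    """Ternary sentence tag: 'positive', 'negative', or 'neutral'."""
    tokens = sentence.lower().split()
    total = 0
    for i, token in enumerate(tokens):
        polarity = LEXICON.get(token, 0)
        if polarity == 0:
            continue
        # Flip the polarity if a negation occurs in the preceding window.
        preceding = tokens[max(0, i - window):i]
        if any(t in NEGATIONS for t in preceding):
            polarity = -polarity
        total += polarity
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


plain = sentence_sentiment("the movie was good")
negated = sentence_sentiment("the movie was not good")
no_marker = sentence_sentiment("the movie was long")
```

Dropping the negation check turns the same function into a shifter-free baseline, which is the kind of comparison the abstract describes across the two blog subgenres.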

CiteSeerX

Knowledge acquisition for dynamic personalization in e-commerce

Author: Andreevskaia Alina
Publication date: 01/01/2003

Information technology is playing an increasingly important role in today's world, and commerce over the Internet is no exception. The focus in retail e-commerce is currently shifting toward catering to the needs of repeat customers by offering them more personalized services. One of the barriers to such an individualized approach to each customer is the difficulty of collecting information about individual users. This thesis addresses this knowledge acquisition problem. Based on a thorough analysis of different kinds of knowledge acquisition tools and techniques, we propose an architecture that allows a combination of different approaches to be used for acquiring knowledge about users in e-commerce. This architecture is designed to support dynamic adaptation of the user profile to changes in the user's interests as well as in the store. The architecture is based on two core concepts, namely dynamic personalization and software agent support. To reduce the time and effort that the user and the knowledge engineer put into knowledge acquisition, software agents in the proposed architecture assist in different aspects of the process, such as profile initialization, processing of results discovered by web mining, making changes to the user profile and tracking their effects, and handling trust-related issues and interaction with other agents and systems. A proof-of-concept prototype has been implemented to demonstrate the feasibility of the architecture.

Concordia University Research Repository

Mining WordNet for Fuzzy Sentiment: Sentiment Tag Extraction from WordNet Glosses

Authors: Alina Andreevskaia, Sabine Bergler

Many of the tasks required for semantic tagging of phrases and texts rely on a list of words annotated with some semantic features. We present a method for extracting sentiment-bearing adjectives from WordNet using the Sentiment Tag Extraction Program (STEP). We performed 58 STEP runs on unique non-intersecting seed lists drawn from a manually annotated list of positive and negative adjectives and evaluated the results against other manually annotated lists. The 58 runs were then collapsed into a single set of 7,813 unique words. For each word we computed a Net Overlap Score by subtracting the total number of runs assigning this word a negative sentiment from the total number of runs that consider it positive. We demonstrate that the Net Overlap Score can be used as a measure of the word's degree of membership in the fuzzy category of sentiment: the core adjectives, which had the highest Net Overlap Scores, were identified most accurately both by STEP and by human annotators, while the words on the periphery of the category had the lowest scores and were associated with low rates of inter-annotator agreement.
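The Net Overlap Score described above is a simple count difference, which the following sketch makes concrete. The representation of a run as a dictionary from word to tag, the function name `net_overlap_scores`, and the three toy runs are assumptions for illustration; they are not STEP's actual data structures or output.

```python
from collections import Counter


def net_overlap_scores(runs):
    """Score each word as (#runs tagging it positive) - (#runs tagging it negative).

    runs: iterable of dicts mapping word -> 'positive' or 'negative'
    (a hypothetical stand-in for the output of one extraction run).
    """
    scores = Counter()
    for run in runs:
        for word, tag in run.items():
            if tag == "positive":
                scores[word] += 1
            elif tag == "negative":
                scores[word] -= 1
    return dict(scores)


# Three toy runs; a word may be missing from a run or tagged inconsistently.
runs = [
    {"good": "positive", "bad": "negative", "odd": "negative"},
    {"good": "positive", "bad": "negative", "odd": "positive"},
    {"good": "positive", "odd": "negative"},
]
scores = net_overlap_scores(runs)
```

Here `good` is tagged positive in every run and gets the largest score, while `odd`, on which the runs disagree, ends up close to zero, mirroring the core versus periphery distinction in the abstract.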

CiteSeerX

Neural Co-training for Sentiment Classification with Product Attributes

Author: Andreevskaia Alina
Andrew
Blitzer John
Chorowski Jan
Denis Francois
Devitt Ann
Ganu Gayatree
Johnson Rie
Kanayama Hiroshi
Kim Soo-Min
Li Linghui
Li Shoushan
McDonald Ryan T.
Mikolov Tomas
Mullen Tony
Quoc
Socher Richard
Turney Peter D.
Xu Weidi
Yang Zichao
Zhu Xiaojin
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

A Random Walk–Based Model for Identifying Semantic Orientation

Author: Ahmed Hassan
Amjad Abu-Jbara
Andreevskaia Alina
Banea Carmen
Black W.
Blair-Goldensohn Sasha
Brody Samuel
Dragomir Radev
Elkateb S.
Esuli Andrea
Etzioni Oren
Hassan Ahmed
Jha S.
Kamps Jaap
Kok Stanley
Lewis D. D.
Mihalcea Rada
Narayan Dipak
Stone Philip
Szummer Martin
Tong Richard M.
Turney Peter D.
Velikovich Leonid
Vossen P.
Wanchen Lu
Wiebe Janyce
Zhu Xiaojin
Publication venue: 'MIT Press - Journals'

Crossref