    Cross-domain opinion word extraction model

    In this paper we consider a new approach to domain-specific opinion word extraction in Russian. We propose a set of statistical features and a combination of algorithms that can discriminate opinion words in a particular domain. The extraction model is trained on the movie domain and then applied to four other domains. We evaluate the quality of the obtained sentiment lexicons intrinsically. Finally, our method is adapted to the movie domain in English and demonstrates comparable results.

    Domain-Specific Sentiment Lexicon for Classification

    Nowadays people express their opinions about products, government policies, schemes, and programs on social media sites via the web or mobile devices. At present, the government in our country is changing policies in every sector, and people follow these policies closely and express their opinions by writing comments on social media, especially on Facebook news media pages. Therefore, our research group intends to perform sentiment analysis on news articles. Domain-specific sentiment lexicons play an important role in opinion mining systems. Due to ubiquitous domain diversity and the absence of domain-specific prior knowledge, constructing a domain-specific lexicon has become a challenging research topic in recent years. This paper describes lexicon construction for sentiment analysis. The work has two main steps: (1) pre-processing raw comments extracted from Facebook news media pages and (2) constructing a lexicon for the subsequent classification work. Word correlation and the chi-square statistic are applied to construct the lexicon. Experimental results on comment datasets demonstrate that the proposed approach is suitable for constructing a domain-specific lexicon.
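
    As a rough illustration of how a chi-square statistic can rank candidate lexicon words, the sketch below scores each word by its association with a class label using a 2x2 contingency table over documents. It is not the authors' implementation: the function name `chi_square_scores`, the two-class setup, and the toy comments are assumptions made for this example, and the word-correlation step mentioned in the abstract is not covered.

```python
from collections import Counter


def chi_square_scores(docs, labels, positive_label="pos"):
    """Score each word by the chi-square statistic of a 2x2 table.

    docs   -- list of token lists (assumed already pre-processed)
    labels -- parallel list of class labels (two classes assumed)
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive_label)
    n_neg = n - n_pos

    # Document frequency of each word within each class.
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        target = df_pos if y == positive_label else df_neg
        target.update(set(tokens))

    scores = {}
    for word in set(df_pos) | set(df_neg):
        a = df_pos[word]   # positive docs containing the word
        b = df_neg[word]   # negative docs containing the word
        c = n_pos - a      # positive docs without the word
        d = n_neg - b      # negative docs without the word
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[word] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


# Toy data, invented for illustration only.
docs = [["good", "movie"], ["good", "plot"], ["bad", "movie"], ["bad", "plot"]]
labels = ["pos", "pos", "neg", "neg"]
scores = chi_square_scores(docs, labels)
ranked = sorted(scores, key=lambda w: (-scores[w], w))
```

    Words whose presence is independent of the label (here "movie" and "plot") score zero, while label-specific words score highest; a score threshold would then decide which words enter the lexicon.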

    Task-specific Word Identification from Short Texts Using a Convolutional Neural Network

    Task-specific word identification aims to choose the task-related words that best describe a short text. Existing approaches require well-defined seed words or lexical dictionaries (e.g., WordNet), which are often unavailable for many applications such as social discrimination detection and fake review detection. However, we often have a set of labeled short texts where each short text has a task-related class label, e.g., discriminatory or non-discriminatory, specified by users or learned by classification algorithms. In this paper, we focus on identifying task-specific words and phrases from short texts by exploiting their class labels rather than using seed words or lexical dictionaries. We consider the task-specific word and phrase identification as feature learning. We train a convolutional neural network over a set of labeled texts and use score vectors to localize the task-specific words and phrases. Experimental results on sentiment word identification show that our approach significantly outperforms existing methods. We further conduct two case studies to show the effectiveness of our approach. One case study on a crawled tweets dataset demonstrates that our approach can successfully capture the discrimination-related words/phrases. The other case study on fake review detection shows that our approach can identify the fake-review words/phrases. Comment: accepted by Intelligent Data Analysis, an International Journal.

    Identifying Entity Aspects in Microblog Posts

    Online reputation management is about monitoring and handling the public image of entities (such as companies) on the Web. An important task in this area is identifying aspects of the entity of interest (such as products, services, competitors, key people, etc.) given a stream of microblog posts referring to the entity. In this paper we compare different IR techniques and opinion target identification methods for automatically identifying aspects and find that (i) simple statistical methods such as TF.IDF are a strong baseline for the task, significantly outperforming opinion-oriented methods, and (ii) only considering terms tagged as nouns improves the results for all the methods analyzed.
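
    To make the TF.IDF baseline concrete, here is a minimal sketch of one way such a ranking could work. The abstract does not give the exact formulation, so several things here are assumptions: all posts about an entity are pooled into one pseudo-document, IDF is computed across entities, tokens are assumed to be already restricted to nouns, and the function name `tfidf_aspects` and the toy posts are hypothetical.

```python
import math
from collections import Counter


def tfidf_aspects(posts_by_entity, entity, top_k=5):
    """Rank candidate aspect terms for one entity by a simple TF.IDF score.

    posts_by_entity -- dict mapping entity name to a list of token lists
    entity          -- the entity whose aspects are ranked
    """
    # Term frequency over all posts that mention the target entity.
    tf = Counter(tok for post in posts_by_entity[entity] for tok in post)
    n_entities = len(posts_by_entity)

    def entity_frequency(term):
        # Number of entities with at least one post containing the term.
        return sum(
            1
            for posts in posts_by_entity.values()
            if any(term in post for post in posts)
        )

    scores = {
        term: freq * math.log(n_entities / entity_frequency(term))
        for term, freq in tf.items()
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


# Toy data, invented for illustration only.
posts = {
    "acme": [["new", "phone", "battery"], ["phone", "battery", "life"]],
    "globex": [["new", "store", "opening"]],
}
top_terms = tfidf_aspects(posts, "acme", top_k=2)
```

    Terms that appear for every entity (such as "new" in the toy data) receive a zero score, so entity-specific terms rise to the top of the ranking.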

    Sentiment analysis of reviews to determine the reception of a product using machine learning and data mining techniques

    Reading multiple product reviews can be tedious, and concluding whether or not consumers liked a product is difficult, so a tool is needed that analyzes all the reviews of a product and determines their polarity. The aim is to speed up and improve decision-making about a product by interested parties, as well as the customer-company relationship, by evaluating the reviews under a single criterion. During the project, a strategy was designed and implemented using machine learning and data mining techniques to solve this problem. As a result, a model was built from a dataset; web scraping was then applied to the website of Amazon, a well-known e-commerce company, to extract the reviews of a given product; and the reviews were visualized with Python libraries before being processed for sentiment analysis. This made it possible to determine the polarity of a given product using machine learning and data mining techniques.

    Building a model for extracting opinion words in various domains

    In this paper we consider a new approach to domain-specific opinion word extraction in the Russian language. We propose a set of statistical features and a combination of algorithms that can extract opinion words in a particular domain. The extraction model was trained on the movie domain and then applied to four other domains. The quality of the obtained sentiment lexicons was evaluated intrinsically on the basis of expert annotation and remained at a high level when the model was transferred to various domains. Finally, our method was adapted to the movie domain in English, where it demonstrated good results.