562 research outputs found

    Combining granularity-based topic-dependent and topic-independent evidences for opinion detection

    Opinion mining, a sub-discipline of Information Retrieval (IR) and Computational Linguistics, refers to the computational techniques for extracting, classifying, understanding, and assessing the opinions expressed in various online sources such as news articles, social media comments, and other user-generated content. It is also known by many other names, such as opinion finding, opinion detection, sentiment analysis, sentiment classification, and polarity detection. In a more specific and simpler sense, opinion mining is the task of retrieving opinions on an issue expressed by the user in the form of a query. There are many problems and challenges associated with opinion mining. In this thesis, we focus on some of its major problems. One of the major challenges is to find opinions specifically about the given topic (query). A document can contain information about many topics at once, and it may contain opinionated text about each of them or about only a few. It therefore becomes very important to select the document segments relevant to the topic together with their corresponding opinions. We address this problem at two levels of granularity, sentences and passages. In our first, sentence-level approach, we use WordNet semantic relations to find the association between topic and opinion. In our second, passage-level approach, we use a more robust IR model, namely the language model, to address this problem.
The basic idea behind both contributions for opinion-topic association is that a document containing more opinionated, topic-relevant textual segments (sentences or passages) is more opinionated than a document with fewer such segments. Most machine-learning-based approaches to opinion mining are domain-dependent, i.e. their performance varies from one domain to another. A domain- or topic-independent approach, on the other hand, is more general and can maintain its effectiveness across domains. However, domain-independent approaches generally suffer from poor performance. Developing an approach that is both effective and general is a major challenge in opinion mining. Our contributions in this thesis include an approach that uses simple heuristic functions to find opinionated documents. Entity-based opinion mining is becoming very popular among researchers in the IR community. It aims to identify the entities relevant to a given topic and to extract the opinions associated with them from a set of text documents. However, identifying entities and determining their relevance is already a difficult task. We propose a system that takes into account both the information in the current news article and relevant earlier articles in order to detect the most important entities in the current news. In addition, we present our framework for opinion analysis and related tasks. This framework is based on content evidence and social evidence from the blogosphere for the tasks of opinion finding, prediction, and multidimensional opinion ranking. This early contribution lays the groundwork for our future work.
Our methods are evaluated on the TREC 2006 Blog collection and the TREC 2004 Novelty track collection. Most of the evaluations were carried out within the framework of the TREC Blog track.
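
The basic idea in this abstract, that a document with more opinionated and topic-relevant segments counts as more opinionated, can be illustrated with a minimal Python sketch. This is not the thesis's method: the lexicon `OPINION_WORDS`, the function names, and the use of plain word overlap for relevance (standing in for the WordNet relations and language models the abstract mentions) are all assumptions made for illustration.

```python
import re

# Hypothetical toy opinion lexicon (an assumption, not from the thesis).
OPINION_WORDS = {"good", "bad", "great", "terrible", "love", "hate"}


def tokens(text):
    """Return the set of lowercase word tokens in the text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def split_sentences(document):
    """Naive sentence splitter on '.', '!' and '?'."""
    return [s.strip() for s in re.split(r"[.!?]+", document) if s.strip()]


def opinion_score(document, query):
    """Count sentences that are both topic-relevant and opinionated.

    Relevance here is plain word overlap with the query, a simple
    stand-in for the semantic relations or language models a real
    system would use.
    """
    query_terms = tokens(query)
    score = 0
    for sentence in split_sentences(document):
        words = tokens(sentence)
        relevant = bool(words & query_terms)
        opinionated = bool(words & OPINION_WORDS)
        if relevant and opinionated:
            score += 1
    return score
```

Under these assumptions, documents can be ranked for a query by `opinion_score`, so that a document with more sentences that are both relevant and opinionated is placed higher.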

    Comprehensive Review of Opinion Summarization

    The abundance of opinions on the web has kindled the study of opinion summarization over the last few years. Researchers have introduced various techniques and paradigms for solving this task. This survey attempts to systematically investigate the different techniques and approaches used in opinion summarization. We provide a multi-perspective classification of the approaches used and highlight some of their key weaknesses. This survey also covers the evaluation techniques and data sets used in studying the opinion summarization problem. Finally, we provide insights into some of the challenges that remain to be addressed, as this will help set the direction for future research in this area. Unpublished; not peer reviewed.

    Hierarchical organization of consumer reviews for products and its applications

    Ph.D. (Doctor of Philosophy)

    Opinion mining: Reviewed from word to document level

    Opinion mining is one of the most challenging tasks in the field of information retrieval. The research community has published many articles on this topic, and interest has increased significantly over the past decade, especially after the launch of several online social networks. In this paper, we provide a detailed overview of related work on opinion mining. The following features distinguish our review from similar works: (1) it presents a different perspective on the field by discussing work at different granularity levels (word, sentence, and document levels), which is much needed; (2) it discusses related work in terms of the challenges of the field; (3) its document-level discussion gives an overview of the opinion mining task in the blogosphere, one of the most popular online social networks; and (4) it highlights the importance of online social networks for the opinion mining task and related sub-tasks.

    Automatic Extraction and Assessment of Entities from the Web

    The search for information about entities, such as people or movies, plays an increasingly important role on the Web. This information is still scattered across many Web pages, making it time consuming for a user to find all relevant information about an entity. This thesis describes techniques to extract entities and information about these entities from the Web, such as facts, opinions, questions and answers, interactive multimedia objects, and events. The findings of this thesis are that it is possible to create a large knowledge base automatically using a manually crafted ontology. The precision of the extracted information was found to be between 75% and 90% (facts and entities respectively) after using assessment algorithms. The algorithms from this thesis can be used to create such a knowledge base, which can be used in various research fields, such as question answering, named entity recognition, and information retrieval.

    Sentiment classification with case-base approach

    The growth of social networks, blogs, and user review sites has made the Internet a huge source of data, especially about how people think, feel, and act toward different issues. These days, people's opinions play an important role in politics, industry, education, etc. Governments, large and small industries, academic institutes, companies, and individuals are therefore looking for automatic techniques to extract the information they need from large amounts of data. Sentiment analysis is one answer to this need. It is an application of natural language processing and computational linguistics that uses advanced techniques such as machine learning and language models to capture positive, negative, or neutral evaluations, with or without their strength, from plain text. In this thesis, we study a case-based, cross-domain approach to sentiment analysis at the document level. Our case-based algorithm generates a binary classifier that uses a set of processed cases and five different sentiment lexicons to extract the polarity, along with the corresponding scores, from reviews. Since sentiment analysis is inherently a domain-dependent task, which makes it difficult and expensive, we use a cross-domain approach, training our classifier on six different domains instead of limiting it to one.
To improve the accuracy of the classifier, we add negation detection to our algorithm. Moreover, to improve the performance of our approach, some innovative modifications are applied. It is worth mentioning that our approach allows for further development by adding more sentiment lexicons and data sets in the future.
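
As a rough illustration of lexicon-based polarity scoring with negation detection, two of the ingredients this abstract names, here is a minimal Python sketch. It does not reproduce the thesis's case-based classifier: the toy `LEXICON`, the `NEGATORS` set, the one-word negation window, and both function names are assumptions chosen for brevity.

```python
import re

# Hypothetical toy lexicon with polarity scores (an assumption, not from the thesis).
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "awful": -2.0}
# Hypothetical negation cues; a real system would use a richer list and scope rules.
NEGATORS = {"not", "no", "never"}


def polarity(text):
    """Sum lexicon scores, flipping the sign of a word directly after a negator."""
    words = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, word in enumerate(words):
        score = LEXICON.get(word, 0.0)
        if i > 0 and words[i - 1] in NEGATORS:
            score = -score
        total += score
    return total


def classify(text):
    """Binary label: non-negative totals count as positive."""
    return "positive" if polarity(text) >= 0 else "negative"
```

In this sketch only the word immediately after a negator is flipped, so "not good" scores as negative while "not very good" does not; widening the negation window is one of the design choices a fuller implementation would have to make.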

    CREATE: Concept Representation and Extraction from Heterogeneous Evidence

    Traditional information retrieval methodology is guided by the document retrieval paradigm, in which relevant documents are returned in response to user queries. This paradigm faces a serious drawback if the desired result is not explicitly present in a single document. The problem becomes more obvious when a user tries to obtain complete information about a real-world entity, such as a person, company, or location. In such cases, various facts about the target entity or concept need to be gathered from multiple document sources. In this work, we present a method to extract information about a target entity based on the concept retrieval paradigm, which focuses on extracting and blending information related to a concept from multiple sources if necessary. The paradigm is built around a generic notion of concept, defined as any item that can be thought of as a topic of interest. Concepts may correspond to any real-world entity, such as a restaurant, person, city, or organization, or to any abstract item, such as a news topic, event, or theory. The Web is a heterogeneous collection of data in different forms, such as facts, news, and opinions. We propose different models for different forms of data, all of which work towards the same goal of concept-centric retrieval. We motivate our work with studies of current trends and demands in information seeking. The framework helps in understanding the intent of content, i.e. opinion versus fact. Our work has been conducted on free-text data in English. Nevertheless, our framework can be easily transferred to other languages.

    Subjectivity Analysis In Opinion Mining - A Systematic Literature Review

    Subjectivity analysis determines the existence of subjectivity in text using subjective clues. It is the first task in the opinion mining process. The difference between subjectivity analysis and polarity determination is that the latter processes subjective text to determine its orientation as positive or negative. Many techniques have been used to solve the problem of separating subjective and objective text. This paper uses a systematic literature review (SLR) to compile the studies undertaken in subjectivity analysis. An SLR is a literature review that collects and critically analyses multiple studies to answer research questions. Eight research questions were drawn up for this purpose. Information such as technique, corpus, subjective clue representation, and performance was extracted from 97 articles, known as primary studies. This information was analysed to identify the strengths and weaknesses of the techniques, the elements affecting performance, and the elements missing from subjectivity analysis. The SLR found that the majority of studies use a machine learning approach to identify and learn subjective text, because subjectivity analysis is viewed as a classification problem. This approach outperformed other approaches, though its performance is currently only at a satisfactory level. Therefore, more studies are needed to improve the performance of subjectivity analysis.

    Knowledge fixation and accretion: Longitudinal analysis of a social question-answering site

    © 2014, Emerald Group Publishing Limited. Purpose – The purpose of this paper is to investigate longitudinal features of an established social question-answering (Q&A) site to study how question-answer resources and other community features change over time. Design/methodology/approach – Statistical analysis and visualisation were performed on the full data dump from the Stack Overflow social Q&A site for programmers. Findings – The timing of answers is as strong a predictor of acceptance (a proxy for user satisfaction) as the structural features of provided answers sometimes associated with quality. While many question and answer exchanges are short-lived, there is a small yet interesting subset of questions where new answers receive community approval and may end up being ranked more highly than early answers. Research limitations/implications – As a large-scale, data-oriented research study, this work says little about user motivations to find and contribute new knowledge to old questions or about the impact of the resource on the consumer. This will require complementary studies using qualitative and evaluative methods. Practical implications – While content contribution to social question-answering is largely undertaken within a very short time frame, content consumption usually takes place over far longer periods. Methods and incentives by which content can be updated and maintained need to be considered. This work should be of interest to knowledge exchange community designers and managers. Originality/value – Few studies have looked at temporal patterns in social Q&A and how time and the moderation and voting systems employed may shape resource quality.