67 research outputs found

    Opinion mining: Reviewed from word to document level

    Opinion mining is one of the most challenging tasks in information retrieval. The research community has published many articles on this topic, and interest has increased significantly over the past decade, especially after the launch of several online social networks. In this paper, we provide a detailed overview of related work on opinion mining. The following features distinguish our review from similar works: (1) it discusses the work at different granularity levels (word, sentence, and document), a perspective that is uncommon and much needed; (2) it discusses the related work in terms of the challenges of the field; (3) its document-level discussion gives an overview of the opinion mining task in the blogosphere, one of the most popular online social networks; and (4) it highlights the importance of online social networks for opinion mining and related sub-tasks.

    New features for sentiment analysis: Do sentences matter?

    1st International Workshop on Sentiment Discovery from Affective Data 2012 (SDAD 2012), in conjunction with ECML-PKDD 2012; Bristol, United Kingdom; 28 September 2012. In this work, we propose and evaluate new features to be used in a word-polarity-based approach to sentiment classification. In particular, we analyze sentences as the first step before estimating the overall review polarity. We consider different aspects of sentences, such as length, purity, irrealis content, subjectivity, and position within the opinionated text. This analysis is then used to find sentences that may convey better information about the overall review polarity. The TripAdvisor dataset is used to evaluate the effect of sentence-level features on polarity classification. Our initial results indicate a small improvement in classification accuracy when using the newly proposed features. However, the benefit of these features is not limited to improving sentiment classification accuracy, since sentence-level features can be used for other important tasks such as review summarization. European Commission, FP7, under the UBIPOL (Ubiquitous Participation Platform for Policy Making) Project.
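
The sentence-level idea in the abstract above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy lexicon `LEXICON`, the definition of purity (absolute sum of word polarities divided by the sum of their absolute values), and the choice to weight each sentence by its purity. The abstract does not define these, so this is not the authors' method.

```python
import re

# Hypothetical toy polarity lexicon; the lexicon used in the paper is not given here.
LEXICON = {"great": 1.0, "clean": 0.5, "friendly": 0.8,
           "dirty": -0.8, "rude": -1.0, "noisy": -0.5}


def sentence_scores(sentence):
    """Return the polarity values of the lexicon words found in a sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [LEXICON[w] for w in words if w in LEXICON]


def purity(scores):
    """Assumed definition: |sum| / sum of absolute values (1.0 = all words agree in sign)."""
    total = sum(abs(s) for s in scores)
    return abs(sum(scores)) / total if total else 0.0


def review_polarity(review):
    """Purity-weighted average of per-sentence mean polarity; 0.0 if nothing matches."""
    sentences = [s for s in re.split(r"[.!?]+", review) if s.strip()]
    weighted, weight_sum = 0.0, 0.0
    for sentence in sentences:
        scores = sentence_scores(sentence)
        if not scores:
            continue  # sentence carries no lexicon words
        w = purity(scores)
        weighted += w * sum(scores) / len(scores)
        weight_sum += w
    return weighted / weight_sum if weight_sum else 0.0
```

A sentence with mixed polarity (for example "noisy but great") gets a low purity and therefore contributes less to the review score than a sentence whose polar words all agree.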

    Combining granularity-based topic-dependent and topic-independent evidences for opinion detection

    Opinion mining is a sub-discipline within Information Retrieval (IR) and Computational Linguistics. It refers to the computational techniques for extracting, classifying, understanding, and assessing the opinions expressed in various online sources such as news articles, social media comments, and other user-generated content. It is also known by many other terms, such as opinion finding, opinion detection, sentiment analysis, sentiment classification, and polarity detection. Defined in a more specific and simpler context, opinion mining is the task of retrieving opinions on an issue expressed by the user in the form of a query. There are many problems and challenges associated with the field of opinion mining. In this thesis, we focus on some major problems of opinion mining. One of the major challenges is finding opinions that specifically concern the given topic (query). A document can contain information on many topics at once, and it may contain opinionated text on each of these topics or on only a few. It therefore becomes very important to select the document segments relevant to the topic together with their corresponding opinions. We address this problem at two levels of granularity, sentences and passages. In our first approach, at the sentence level, we use WordNet semantic relations to find this association between topic and opinion. In our second approach, at the passage level, we use a more robust IR model, namely the language model, to address this problem.
The basic idea behind both contributions on opinion-topic association is that a document containing more textual segments (sentences or passages) that are both opinionated and relevant to the topic is more opinionated than a document with fewer such segments. Most machine-learning-based approaches to opinion mining are domain-dependent, i.e., their performance varies from one domain to another. A domain- or topic-independent approach, on the other hand, is more general and can maintain its effectiveness across domains. However, domain-independent approaches generally suffer from poor performance. Developing an approach that is both effective and general is a major challenge in opinion mining. Our contributions in this thesis include the development of an approach that uses simple heuristic functions to find opinionated documents. Entity-based opinion mining is becoming very popular among researchers in the IR community. It aims to identify the entities relevant to a given topic and to extract the opinions associated with them from a set of text documents. However, identifying entities and determining their relevance is already a difficult task. We propose a system that takes into account both the information in the current news article and relevant earlier articles in order to detect the most important entities in the current news. In addition, we present our framework for opinion analysis and related tasks. This framework is based on content evidence and social evidence from the blogosphere for the tasks of opinion finding, opinion prediction, and multidimensional ranking. This early contribution lays the groundwork for our future work.
The evaluation of our methods uses the TREC 2006 Blog collection and the TREC 2004 Novelty track collection. Most of the evaluations were carried out within the framework of the TREC Blog track.
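
The segment-counting idea in this abstract (a document with more segments that are both opinionated and on-topic is more opinionated) can be sketched as follows. This is only an illustration under stated assumptions: the thesis relies on WordNet relations and language models, whereas this sketch swaps in plain term overlap, and the word list `OPINION_WORDS` and the function names are hypothetical.

```python
import re

# Hypothetical opinion-word list, used only for illustration.
OPINION_WORDS = {"love", "hate", "terrible", "excellent", "awful", "good", "bad"}


def tokenize(text):
    """Lowercase a text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def count_opinionated_relevant_segments(document, query):
    """Count sentences that share a term with the query AND contain an opinion word."""
    query_terms = tokenize(query)
    sentences = [s for s in re.split(r"[.!?]+", document) if s.strip()]
    hits = 0
    for sentence in sentences:
        tokens = tokenize(sentence)
        if tokens & query_terms and tokens & OPINION_WORDS:
            hits += 1
    return hits
```

Ranking documents by this count is a crude stand-in for the topic-opinion association described above; a sentence that is opinionated but off-topic, or on-topic but neutral, does not contribute.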

    Sentence-based sentiment analysis with domain adaptation capability

    Sentiment analysis aims to automatically estimate the sentiment in a given text as positive, objective, or negative, possibly together with the strength of the sentiment. Polarity lexicons, which indicate how positive or negative each term is, are often used as the basis of sentiment analysis approaches. Domain-specific polarity lexicons are expensive and time-consuming to build; hence, researchers often use a general-purpose or domain-independent lexicon as the basis of their analysis. In this work, we address two sub-tasks in sentiment analysis. We introduce a simple method to adapt a general-purpose polarity lexicon to a specific domain. Subsequently, we propose new features to be used in a term-polarity-based approach to sentiment analysis. We consider different aspects of sentences, such as length, purity, irrealis content, subjectivity, and position within the opinionated text. This analysis is used to find sentences that may convey better information about the overall review polarity. Unlike other work, ours therefore also focuses on sentence-based sentiment analysis. Moreover, we work on two distinct domains, hotel and Twitter, with three different systems that are compared with existing state-of-the-art approaches in the literature.

    Investigating the Practices and Needs of Agricultural Researchers at the University of Nebraska-Lincoln

    University of Nebraska-Lincoln (UNL) Libraries was one of 19 libraries participating in a national study, initiated by Ithaka S+R, of the research practices and needs of agricultural researchers. Two UNL Libraries faculty members participated in this study by interviewing 11 UNL agricultural scholars during the summer of 2016. The ethnographic research approach revealed four core themes explored in this UNL-specific report: interdisciplinarity and collaborations; scientific communication practices; scientific research data; and challenges and opportunities. As illustrated by the sample of faculty comments presented here, the themes in some cases have direct implications for the UNL Libraries, while in other cases they point to concerns and opportunities for the university, the academy, and the nation.

    Political Advocacy on the Web: Issue Networks in Online Debate Over the USA Patriot Act

    This dissertation examines how people and organizations used the World Wide Web to discuss and debate a public policy in 2005, at a point in time when the Internet was viewed as a maturing medium for communication. Combining descriptive and quantitative frame analyses with an issue network analysis, the study evaluated the frames apparent in discourse concerning two key sections of the USA Patriot Act, while the issue network analysis probed hypertext linkages among Web pages where discussion was occurring. Sections 214 and 215 of the USA Patriot Act provided a contentious national issue with multiple stakeholders presumed to be attempting to frame issues connected to the two sections. The focus on two sections allowed frame and issue network contrasts to be made. The study sought evidence of an Internet effect to determine whether the Web, through the way people were using it, was having a polarizing, synthesizing, or fragmentizing effect on discussion and debate. Frame overlap and hypertext linkage patterns among actors in the issue networks indicated an overall tendency toward synthesis. The study also probed the degree to which there is a joining, or symbiosis, of Web content and structure, in part evidenced by whether patterns show like-minded groups coming together to form online communities through hypertext linkages. Evidence was found to support this conclusion among Web pages in several Internet domains, although questions remain about linking patterns among blogs due to limitations of the software used in the study. Organizational Web sites on average used a similar number of frames compared to other Web page types, including blogs. The organizational Web pages were found to be briefer in how they discussed issues, however.
The study contributes to theory by offering the first known empirical study of online community formation and issue advocacy on a matter of public policy and through its finding of a linkage between Web content and Web structure. Methodologically, the study presents a flexible mixed-methods model of descriptive and quantitative approaches that appears well suited for Internet studies. The dissertation’s use of fuzzy clustering and discriminant analysis offers important improvements over existing approaches in factor-based frame analysis and frame mapping techniques.

    Medios y periodistas en la era del gobierno abierto y la transparencia

    Departmental Section of Constitutional Law (Information Sciences), Faculty of Information Sciences

    Multilabel Classification through Structured Output Learning - Methods and Applications

    Multilabel classification is an important topic in machine learning that arises naturally from many real-world applications. For example, in document classification, a research article can be categorized as “science”, “drug discovery” and “genomics” at the same time. The goal of multilabel classification is to reliably predict multiple outputs for a given input. As multiple interdependent labels can be “on” and “off” simultaneously, the central problem in multilabel classification is how to best exploit the correlation between labels to make accurate predictions. Compared to previous flat multilabel classification approaches, which treat multiple labels as a flat vector, structured output learning relies on an output graph connecting multiple labels to model the correlation between labels in a comprehensive manner. The main question studied in this thesis is how to tackle multilabel classification through structured output learning. This thesis starts with an extensive review of the topic of classification learning, including both single-label and multilabel classification. The first problem we address is how to solve the multilabel classification problem when the output graph is observed a priori. We discuss several well-established structured output learning algorithms and study the network response prediction problem within the context of social network analysis. As the current structured output learning algorithms rely on the output graph to exploit the dependency between labels, the second problem we address is how to use structured output learning when the output graph is not known. Specifically, we examine the potential of learning on a set of random output graphs when the “real” one is hidden. This problem is relevant because in most multilabel classification problems there does not exist any output graph that reveals the dependency between labels. The third problem we address is how to analyze the proposed learning algorithms in a theoretical manner.
Specifically, we want to explain the intuition behind the proposed models and to study the generalization error. The main contributions of this thesis are several new learning algorithms that widen the applicability of structured output learning. For the problem with an observed output graph, the proposed algorithm “SPIN” is able to predict an optimal directed acyclic graph from an observed underlying network that best responds to an input. For general multilabel classification problems without any known output graph, we propose several learning algorithms that combine a set of structured output learners built on random output graphs. In addition, we develop a joint learning and inference framework based on max-margin learning over a random sample of spanning trees. The theoretical analysis also provides guarantees on the generalization error of the proposed methods.
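
As a rough illustration of what "a random sample of spanning trees" over the labels can look like, the sketch below draws one random spanning tree on the complete graph of labels by shuffling all label pairs and keeping the edges that join two components (a Kruskal-style pass with union-find). The function name is hypothetical, the sampling is not guaranteed to be uniform over spanning trees, and the thesis's actual sampling and learning procedures are not described in this abstract.

```python
import itertools
import random


def random_spanning_tree(labels, rng):
    """Return the edges of a random spanning tree over the given labels."""
    parent = {label: label for label in labels}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = list(itertools.combinations(labels, 2))
    rng.shuffle(edges)  # random edge order drives which tree is produced
    tree = []
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # edge joins two components, so it creates no cycle
            parent[root_u] = root_v
            tree.append((u, v))
    return tree


labels = ["science", "drug discovery", "genomics", "physics"]
trees = [random_spanning_tree(labels, random.Random(seed)) for seed in range(3)]
```

Each tree in `trees` could then serve as the output graph for one structured learner in an ensemble, which is the general shape of the approach the abstract describes.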