Towards a classifier for digital sensitivity review
The sensitivity review of government records is essential before they can be released to the official government archives, to prevent sensitive information (such as personal information, or that which is prejudicial to international relations) from being released. As records are typically reviewed and released after a period of decades, sensitivity review practices are still based on paper records. The transition to digital records brings new challenges, e.g. increased volume of digital records, making current practices impractical to use. In this paper, we describe our current work towards developing a sensitivity review classifier that can identify and prioritise potentially sensitive digital records for review. Using a test collection built from government records with real sensitivities identified by government assessors, we show that considering the entities present in each record can markedly improve upon a text classification baseline
Automatic domain ontology extraction for context-sensitive opinion mining
Automated analysis of the sentiments presented in online consumer feedback can facilitate both organizations’ business strategy development and individual consumers’ comparison shopping. Nevertheless, existing opinion mining methods either adopt a context-free sentiment classification approach or rely on a large number of manually annotated training examples to perform context-sensitive sentiment classification. Guided by the design science research methodology, we illustrate the design, development, and evaluation of a novel fuzzy domain ontology based context-sensitive opinion mining system. Our novel ontology extraction mechanism, underpinned by a variant of Kullback-Leibler divergence, can automatically acquire contextual sentiment knowledge across various product domains to improve the sentiment analysis processes. Evaluated on a benchmark dataset and real consumer reviews collected from Amazon.com, our system shows remarkable performance improvement over the context-free baseline
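As a rough illustration of the divergence mentioned in the abstract above, the following sketch computes the standard Kullback-Leibler divergence between two smoothed term distributions. The abstract refers only to "a variant" of this measure and does not specify it, so this is the textbook form and not the authors' method; the function names, the add-one smoothing, and the toy token lists are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_distribution(tokens, vocabulary, smoothing=1.0):
    """Smoothed relative frequency of each vocabulary term (hypothetical helper)."""
    counts = Counter(tokens)
    total = len(tokens) + smoothing * len(vocabulary)
    return {term: (counts[term] + smoothing) / total for term in vocabulary}


def kl_divergence(p, q):
    """Standard KL divergence D(p || q) in nats; q must be non-zero wherever p is."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p if p[t] > 0)


# Toy data (assumed): terms seen near a product feature in two review sets.
vocabulary = ["small", "fast", "loud"]
p = term_distribution(["small", "small", "fast"], vocabulary)
q = term_distribution(["small", "loud", "loud"], vocabulary)
divergence = kl_divergence(p, q)
```

A larger value of `divergence` indicates that the two contexts use the vocabulary more differently; smoothing keeps every probability non-zero so the logarithm is always defined.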
Language modeling approaches to blog post and feed finding
Ernsting, B.J.; Weerkamp, W.; de Rijke, M. In the opinion task we looked at the differences in performance between Indri and our mixture model, and at the influence of external expansion and document priors on opinion finding; results show that an out-of-the-box Indri implementation outperforms our mixture model, and that external expansion on a news corpus is very beneficial. Opinion finding can be improved using either lexicons or the number of comments as document priors. Our approach to the feed distillation task is based on aggregating post-level scores to obtain a feed-level ranking. We integrated time-based and persistence aspects into the retrieval model. After correcting bugs in our post-score aggregation module we found that time-based retrieval improves results only marginally, while persistence-based ranking results in substantial improvements under the right circumstances
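The idea of aggregating post-level scores into a feed-level ranking can be sketched as follows. The abstract does not say which aggregation function was used, so averaging (optionally over the top-k posts of each feed) is an assumption here, as are the function name `rank_feeds`, its parameters, and the toy scores.

```python
from collections import defaultdict


def rank_feeds(post_scores, post_to_feed, top_k=None):
    """Rank feeds by the mean retrieval score of their posts (illustrative only).

    post_scores:  dict mapping post id -> retrieval score for the query
    post_to_feed: dict mapping post id -> feed id
    top_k:        if given, average only the k best posts of each feed
    """
    per_feed = defaultdict(list)
    for post_id, score in post_scores.items():
        per_feed[post_to_feed[post_id]].append(score)

    feed_scores = {}
    for feed_id, scores in per_feed.items():
        scores.sort(reverse=True)
        kept = scores[:top_k] if top_k else scores
        feed_scores[feed_id] = sum(kept) / len(kept)

    # Highest-scoring feed first.
    return sorted(feed_scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (assumed): three posts belonging to two feeds.
post_scores = {"p1": 0.9, "p2": 0.1, "p3": 0.6}
post_to_feed = {"p1": "A", "p2": "A", "p3": "B"}
ranking_all = rank_feeds(post_scores, post_to_feed)
ranking_top1 = rank_feeds(post_scores, post_to_feed, top_k=1)
```

The choice of aggregation matters: averaging all posts rewards feeds that are consistently on topic, while averaging only the best posts rewards feeds with a few strong matches, which is why the two rankings above can differ.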
Exploiting multiple sources of evidence for opinion search in the web
In this thesis we study Opinion Mining and Sentiment Analysis and propose
a fine-grained analysis of the opinions conveyed in texts. Concretely, the aim of
this research is to gain an understanding on how to combine different types of
evidence to effectively determine on-topic opinions in texts. To meet this aim,
we consider content-match evidence, obtained at document and passage level,
as well as different structural aspects of the text.
Current Opinion Mining technology is not mature yet. As a matter of fact,
people often use regular search engines, which lack evolved opinion search
capabilities, to find opinions about their interests. This means that the effort of
detecting what are the key relevant opinions relies on the user. The lack of
widely accepted Opinion Mining technology is due to the limitations of current
models, which are simplistic and perform poorly. In this thesis we study
a specific set of factors that are indicative of subjectivity and relevance and we
try to understand how to effectively combine them to detect opinionated
documents, to extract relevant opinions and to estimate their polarity. We propose
innovative methods and models able to incorporate different types of evidence
and it is our intention to contribute in different areas, including those related
to i) search for opinionated documents, ii) detection of subjectivity at
document and passage level, and iii) estimation of polarity. An important concern
that guides this research is efficiency. Some types of evidence, such as discourse
structure, have only been tested with small collections from narrow domains
(e.g., movie reviews). We demonstrate here that evolved linguistic features
(based on discourse analysis) can potentially lead to a better understanding of
how subjectivity flows in texts. And we show that this type of features can be
efficiently injected into general-purpose opinion retrieval solutions that operate
at large scale
Combining granularity-based topic-dependent and topic-independent evidences for opinion detection
Opinion mining is a sub-discipline within Information Retrieval (IR) and Computational Linguistics. It refers to the computational techniques for extracting, classifying, understanding, and assessing the opinions expressed in various online sources like news articles, social media comments, and other user-generated content. It is also known by many other terms like opinion finding, opinion detection, sentiment analysis, sentiment classification, polarity detection, etc. Defined in a more specific and simpler context, opinion mining is the task of retrieving opinions on an issue as expressed by the user in the form of a query. There are many problems and challenges associated with the field of opinion mining. In this thesis, we focus on some of these problems. One of the major challenges of opinion mining is to find opinions specifically about the given topic (query). A document can contain information about many topics at once, and it may contain opinionated text about each of the topics or about only a few of them. It therefore becomes very important to select the document segments that are relevant to the topic, together with their corresponding opinions. We address this problem at two levels of granularity, sentences and passages. In our first, sentence-level approach, we use the semantic relations of WordNet to find this association between topic and opinion. In our second, passage-level approach, we use a more robust IR model, i.e. the language model, to address this problem.
The basic idea behind both contributions for opinion-topic association is that if a document contains more opinionated and topic-relevant textual segments (sentences or passages), it is more opinionated than a document with fewer opinionated and topic-relevant textual segments. Most machine-learning-based approaches to opinion mining are domain-dependent, i.e. their performance varies from one domain to another. On the other hand, a domain- or topic-independent approach is more general and can sustain its effectiveness across different domains. However, domain-independent approaches generally suffer from poor performance. It is a big challenge in the field of opinion mining to develop an approach that is both effective and general. Our contributions in this thesis include the development of an approach that uses simple heuristic functions to find opinionated documents. Entity-based opinion mining is becoming very popular among researchers in the IR community. It aims to identify the entities relevant to a given topic and to extract the opinions associated with them from a set of text documents. However, identifying entities and determining their relevance is already a difficult task. We propose a system that takes into account both the information of the current news article and relevant previous articles in order to detect the most important entities in the current news. In addition, we also present our framework for opinion analysis and related tasks. This framework is based on content evidence and social evidence from the blogosphere for the tasks of opinion finding, opinion prediction, and multidimensional review ranking. This early contribution lays the foundations for our future work.
The evaluation of our methods includes the use of the TREC 2006 Blog collection and the TREC Novelty track 2004 collection. Most of the evaluations were carried out within the framework of the TREC Blog track.
Direct Negative Opinions in Online Discussions
In this paper we investigate the impact of antagonism in online discussions. We define antagonism as a new class of textual opinions - direct sentiment towards the authors of previous comments. We detect the negative sentiment using aspect-based opinion mining techniques. We create a model of human behavior in online communities, based on the network topology and on the communication content. The model contains seven hypotheses, which validate two intuitions. The first intuition is that the content of the messages exchanged in an online community can separate good and insightful contributions from the rest. The second intuition is that there is a delay until the network stabilizes and until standard measures, such as betweenness centrality, can be used accurately. Taken together, these intuitions are a solid case for using the content of the communication along with network measures. We show that the sentiment within the messages, especially antagonism, can significantly alter the community perception. We use real-world data, taken from the Slashdot discussion forum, to validate our model. All the findings are accompanied by extremely significant t-test p-values
Identifying Influential Bloggers: Time Does Matter
Blogs have recently become one of the most favored services on the Web. Many
users maintain a blog and write posts to express their opinion, experience and
knowledge about a product, an event and every subject of general or specific
interest. More users visit blogs to read these posts and comment on them. This
"participatory journalism" of blogs has such an impact upon the masses that
Keller and Berry argued that through blogging "one American in ten tells the
other nine how to vote, where to eat and what to buy" \cite{keller1}.
Therefore, a significant issue is how to identify such influential bloggers.
This problem is very new and the relevant literature lacks sophisticated
solutions, but most importantly these solutions have not taken into account
temporal aspects for identifying influential bloggers, even though the time is
the most critical aspect of the Blogosphere. This article investigates the
issue of identifying influential bloggers by proposing two easily computed
blogger ranking methods, which incorporate temporal aspects of the blogging
activity. Each method is based on a specific metric to score the blogger's
posts. The first metric, termed MEIBI, takes into consideration the number of
the blog post's inlinks and its comments, along with the publication date of
the post. The second metric, MEIBIX, is used to score a blog post according to
the number and age of the blog post's inlinks and its comments. These methods
are evaluated against the state-of-the-art influential blogger identification
method utilizing data collected from a real-world community blog site. The
obtained results attest that the new methods are able to better identify
significant temporal patterns in the blogging behaviour
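To make the idea of a time-aware post score concrete, here is a minimal sketch that combines a post's inlink and comment counts with its age. It is not the MEIBI or MEIBIX definition, which the abstract above does not give; the formula, the function name `time_aware_post_score`, and the `decay` parameter are all hypothetical choices for illustration.

```python
from datetime import date


def time_aware_post_score(inlinks, comments, published, today, decay=1.0):
    """Illustrative score: engagement counts discounted by the post's age in days.

    This formula is an assumption for demonstration, not the published metric.
    """
    age_days = max((today - published).days, 0)
    engagement = inlinks + comments + 1  # +1 so a post with no activity still scores
    return engagement * (age_days + 1) ** (-decay)


# Toy data (assumed): two posts with identical engagement but different ages.
today = date(2009, 6, 1)
recent_score = time_aware_post_score(5, 10, date(2009, 5, 25), today)
old_score = time_aware_post_score(5, 10, date(2008, 6, 1), today)
```

With equal inlinks and comments, the more recent post receives the higher score, which is the kind of temporal preference the abstract argues a blogger ranking should capture. A larger `decay` makes the score forget old activity faster.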
Online Crowds Opinion-Mining it to Analyze Current Trend: A Review
As users' online presence has increased, the number of active users has grown enormously, and the volume of data created on online social networks is massive. Many are concentrating on Internet lingo. Notably, most of the data on social networking sites is public, which opens doors for companies, researchers and analysts to collect and analyze it. A huge volume of opinionated data is available on the web, and it has to be mined to obtain interesting results that could enhance the decision-making process. In order to analyze what people are currently thinking, the focus has shifted towards opinion mining. This study presents a systematic literature review that contains a comprehensive overview of the components of opinion mining, the subjectivity of data, the sources of opinion, the process, and how it lets one analyze the current tendency of the online crowd in a particular context. Different perspectives from different authors regarding this scenario are presented. Research challenges and different applications that were developed for opinion mining are also discussed
Opinion mining: Reviewed from word to document level
Opinion mining is one of the most challenging tasks in the field of information retrieval. The research community has been publishing articles on this topic for some time, but a significant increase in interest has been observed during the past decade, especially after the launch of several online social networks. In this paper, we provide a very detailed overview of the related work on opinion mining. The following features distinguish our review from other works of a similar kind: (1) it presents a different perspective on the opinion mining field by discussing the work at different granularity levels (word, sentence, and document levels), (2) it discusses the related work in terms of the challenges of the field of opinion mining, (3) its document-level discussion of the related work gives an overview of the opinion mining task in the blogosphere, one of the most popular online social networks, and (4) it highlights the importance of online social networks for the opinion mining task and other related sub-tasks
Aplicación del análisis de sentimientos a la evaluación de datos generados en medios sociales [Application of sentiment analysis to the evaluation of data generated in social media]
This document describes the research and development process carried out in the discipline of sentiment analysis. The main objective of this research was to evaluate the application of sentiment analysis technologies to the content generated by users of various social media, and to present proposals for how organizations and users can make use of the results of these technologies. The reliability of online sentiment analysis tools that use Twitter as their corpus source was studied. A heuristic proposal was presented that simplifies the sentiment analysis of Twitter messages by focusing on the opinions directly related to the opinion targets instead of determining the sentiment globally, and that generates additional information that could be useful for electronic word of mouth. Finally, a proposal for predicting quantitative hotel ratings from the reviews written by users of their services was developed and evaluated. The results of this research show that sentiment analysis, in its current state, can be useful for decision making by companies and individuals, and that it can nevertheless be improved to take advantage of the massive quantity of textual opinions produced by social media users