37 research outputs found

    Towards a classifier for digital sensitivity review

    The sensitivity review of government records is essential before they can be released to the official government archives, to prevent sensitive information (such as personal information, or that which is prejudicial to international relations) from being released. As records are typically reviewed and released after a period of decades, sensitivity review practices are still based on paper records. The transition to digital records brings new challenges, e.g. increased volume of digital records, making current practices impractical to use. In this paper, we describe our current work towards developing a sensitivity review classifier that can identify and prioritise potentially sensitive digital records for review. Using a test collection built from government records with real sensitivities identified by government assessors, we show that considering the entities present in each record can markedly improve upon a text classification baseline

    Automatic domain ontology extraction for context-sensitive opinion mining

    Automated analysis of the sentiments expressed in online consumer feedback can facilitate both organizations' business strategy development and individual consumers' comparison shopping. Nevertheless, existing opinion mining methods either adopt a context-free sentiment classification approach or rely on a large number of manually annotated training examples to perform context-sensitive sentiment classification. Guided by the design science research methodology, we illustrate the design, development, and evaluation of a novel fuzzy domain ontology based context-sensitive opinion mining system. Our novel ontology extraction mechanism, underpinned by a variant of Kullback-Leibler divergence, can automatically acquire contextual sentiment knowledge across various product domains to improve the sentiment analysis process. Evaluated on a benchmark dataset and real consumer reviews collected from Amazon.com, our system shows remarkable performance improvement over the context-free baseline
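
    The abstract does not spell out its variant of Kullback-Leibler divergence. As a rough illustration only, the sketch below computes the standard KL divergence between two term distributions, with additive smoothing so that unseen terms do not produce a division by zero. The function name `kl_divergence`, the `smoothing` parameter, and the toy review snippets are assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def kl_divergence(p_counts, q_counts, smoothing=1.0):
    """Standard KL(P || Q) over the joint vocabulary of two term-count maps.

    NOTE: this is plain KL divergence with additive smoothing, not the
    variant used in the paper (which the abstract does not define).
    """
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + smoothing * len(vocab)
    q_total = sum(q_counts.values()) + smoothing * len(vocab)
    kl = 0.0
    for term in vocab:
        p = (p_counts.get(term, 0) + smoothing) / p_total
        q = (q_counts.get(term, 0) + smoothing) / q_total
        kl += p * math.log(p / q)
    return kl


# Hypothetical toy data: term counts from reviews in two product domains.
camera_terms = Counter("small light small sharp".split())
hotel_terms = Counter("small noisy small dirty".split())
divergence = kl_divergence(camera_terms, hotel_terms)
```

    A larger value indicates that the two domains use terms more differently, which is the kind of signal a context-sensitive method could exploit.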

    Language modeling approaches to blog post and feed finding

    Ernsting, B.J.; Weerkamp, W.; de Rijke, M. In the opinion task we looked at the differences in performance between Indri and our mixture model, and at the influence of external expansion and document priors on opinion finding; results show that an out-of-the-box Indri implementation outperforms our mixture model, and that external expansion on a news corpus is very beneficial. Opinion finding can be improved using either lexicons or the number of comments as document priors. Our approach to the feed distillation task is based on aggregating post-level scores to obtain a feed-level ranking. We integrated time-based and persistence aspects into the retrieval model. After correcting bugs in our post-score aggregation module we found that time-based retrieval improves results only marginally, while persistence-based ranking results in substantial improvements under the right circumstances
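
    The idea of aggregating post-level scores into a feed-level ranking can be sketched as follows. This is a minimal illustration, not the authors' aggregation module: the function name `rank_feeds`, the `aggregate` options, and the example scores are all assumptions, and the time-based and persistence aspects mentioned in the abstract are left out.

```python
from collections import defaultdict


def rank_feeds(post_scores, post_to_feed, aggregate="mean"):
    """Combine per-post retrieval scores into one score per feed and rank feeds.

    post_scores:  dict mapping post id -> retrieval score for the query
    post_to_feed: dict mapping post id -> id of the feed (blog) it belongs to
    aggregate:    "mean" or "sum" (hypothetical options for this sketch)
    """
    per_feed = defaultdict(list)
    for post_id, score in post_scores.items():
        per_feed[post_to_feed[post_id]].append(score)

    if aggregate == "sum":
        feed_scores = {feed: sum(scores) for feed, scores in per_feed.items()}
    elif aggregate == "mean":
        feed_scores = {feed: sum(scores) / len(scores) for feed, scores in per_feed.items()}
    else:
        raise ValueError(f"unknown aggregate: {aggregate!r}")

    # Highest score first; ties are broken by feed id for a stable order.
    return sorted(feed_scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores for three posts spread over two feeds.
example_scores = {"p1": 0.75, "p2": 0.25, "p3": 0.625}
example_feeds = {"p1": "feedA", "p2": "feedA", "p3": "feedB"}
ranking = rank_feeds(example_scores, example_feeds, aggregate="mean")
```

    Summing rewards feeds with many matching posts, while averaging rewards feeds whose posts match consistently, so the choice of aggregate changes which feeds rank first.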

    Exploiting multiple sources of evidence for opinion search in the web

    In this thesis we study Opinion Mining and Sentiment Analysis and propose a fine-grained analysis of the opinions conveyed in texts. Concretely, the aim of this research is to gain an understanding on how to combine different types of evidence to effectively determine on-topic opinions in texts. To meet this aim, we consider content-match evidence, obtained at document and passage level, as well as different structural aspects of the text. Current Opinion Mining technology is not mature yet. As a matter of fact, people often use regular search engines, which lack evolved opinion search capabilities, to find opinions about their interests. This means that the effort of detecting what are the key relevant opinions relies on the user. The lack of widely accepted Opinion Mining technology is due to the limitations of current models, which are simplistic and perform poorly. In this thesis we study a specific set of factors that are indicative of subjectivity and relevance and we try to understand how to effectively combine them to detect opinionated documents, to extract relevant opinions and to estimate their polarity. We propose innovative methods and models able to incorporate different types of evidence and it is our intention to contribute in different areas, including those related to i) search for opinionated documents, ii) detection of subjectivity at document and passage level, and iii) estimation of polarity. An important concern that guides this research is efficiency. Some types of evidence, such as discourse structure, have only been tested with small collections from narrow domains (e.g., movie reviews). We demonstrate here that evolved linguistic features (based on discourse analysis) can potentially lead to a better understanding of how subjectivity flows in texts. And we show that this type of features can be efficiently injected into general-purpose opinion retrieval solutions that operate at large scale

    Combining granularity-based topic-dependent and topic-independent evidences for opinion detection

    Opinion mining is a sub-discipline within Information Retrieval (IR) and Computational Linguistics. It refers to the computational techniques for extracting, classifying, understanding, and assessing the opinions expressed in various online sources such as news articles, social media comments, and other user-generated content. It is also known by many other terms, such as opinion finding, opinion detection, sentiment analysis, sentiment classification, and polarity detection. Defined in a more specific and simpler context, opinion mining is the task of retrieving opinions on an issue as expressed by the user in the form of a query. There are many problems and challenges associated with the field of opinion mining. In this thesis, we focus on some of its major problems. One of the major challenges of opinion mining is to find opinions that specifically concern the given topic (query). A document can contain information about many topics at once, and it may contain opinionated text about each of those topics or about only a few of them. It therefore becomes very important to select the segments of the document that are relevant to the topic, together with their corresponding opinions. We address this problem at two levels of granularity: sentences and passages. In our first, sentence-level approach, we use the semantic relations of WordNet to find this association between topic and opinion. In our second, passage-level approach, we use a more robust IR model, namely the language model, to address the problem.
    The basic idea behind both contributions on opinion-topic association is that a document containing more textual segments (sentences or passages) that are both opinionated and relevant to the topic is more opinionated than a document with fewer such segments. Most machine-learning-based approaches to opinion mining are domain-dependent, i.e. their performance varies from one domain to another. A domain- or topic-independent approach, on the other hand, is more general and can maintain its effectiveness across different domains. However, domain-independent approaches generally suffer from poor performance. Developing an approach that is both more effective and more general is a major challenge in opinion mining. Our contributions in this thesis include the development of an approach that uses simple heuristic functions to find opinionated documents. Entity-based opinion mining is becoming very popular among researchers in the IR community. It aims to identify the entities relevant to a given topic and to extract the opinions associated with them from a set of textual documents. However, identifying entities and determining their relevance is already a difficult task. We propose a system that takes into account both the information in the current news article and that in relevant earlier articles in order to detect the most important entities in the current news. In addition, we present our framework for opinion analysis and related tasks. This framework is based on content evidence and social evidence from the blogosphere for the tasks of opinion finding, opinion prediction, and multidimensional review ranking. This early contribution lays the groundwork for our future work.
    The evaluation of our methods uses the TREC 2006 Blog collection and the TREC Novelty track 2004 collection. Most of the evaluations were carried out within the framework of the TREC Blog track.

    Direct Negative Opinions in Online Discussions

    In this paper we investigate the impact of antagonism in online discussions. We define antagonism as a new class of textual opinions - direct sentiment towards the authors of previous comments. We detect the negative sentiment using aspect-based opinion mining techniques. We create a model of human behavior in online communities, based on the network topology and on the communication content. The model contains seven hypotheses, which validate two intuitions. The first intuition is that the content of the messages exchanged in an online community can separate good and insightful contributions from the rest. The second intuition is that there is a delay until the network stabilizes and until standard measures, such as betweenness centrality, can be used accurately. Taken together, these intuitions are a solid case for using the content of the communication along with network measures. We show that the sentiment within the messages, especially antagonism, can significantly alter the community perception. We use real world data, taken from the Slashdot discussion forum to validate our model. All the findings are accompanied by extremely significant t-test p-values

    Identifying Influential Bloggers: Time Does Matter

    Blogs have recently become one of the most favored services on the Web. Many users maintain a blog and write posts to express their opinions, experience and knowledge about a product, an event, or any subject of general or specific interest. More users visit blogs to read these posts and comment on them. This "participatory journalism" of blogs has such an impact upon the masses that Keller and Berry argued that through blogging "one American in ten tells the other nine how to vote, where to eat and what to buy". Therefore, a significant issue is how to identify such influential bloggers. This problem is very new and the relevant literature lacks sophisticated solutions; most importantly, existing solutions have not taken temporal aspects into account when identifying influential bloggers, even though time is the most critical aspect of the Blogosphere. This article investigates the issue of identifying influential bloggers by proposing two easily computed blogger ranking methods, which incorporate temporal aspects of the blogging activity. Each method is based on a specific metric to score the blogger's posts. The first metric, termed MEIBI, takes into consideration the number of the blog post's inlinks and its comments, along with the publication date of the post. The second metric, MEIBIX, is used to score a blog post according to the number and age of the blog post's inlinks and its comments. These methods are evaluated against the state-of-the-art influential blogger identification method utilizing data collected from a real-world community blog site. The obtained results attest that the new methods are able to better identify significant temporal patterns in the blogging behaviour

    Online Crowds Opinion-Mining it to Analyze Current Trend: A Review

    As users' online presence has increased, the number of active users has grown enormously, and the volume of data created on online social networks is massive. Many are concentrating on Internet lingo. Notably, most of the data on social networking sites is public, which opens the door for companies, researchers, and analysts to collect and analyze it. A huge volume of opinionated data is available on the web, and it must be mined to obtain interesting results that could enhance the decision-making process. To analyze what people are currently thinking, the focus has shifted towards opinion mining. This study presents a systematic literature review that gives a comprehensive overview of the components of opinion mining, the subjectivity of data, the sources of opinion, the process, and how it lets one analyze the current tendency of the online crowd in a particular context. Different perspectives from different authors on this scenario are presented. Research challenges and different applications developed for opinion mining are also discussed

    Opinion mining: Reviewed from word to document level

    Opinion mining is one of the most challenging tasks in the field of information retrieval. The research community has published a number of articles on this topic, but a significant increase in interest has been observed during the past decade, especially after the launch of several online social networks. In this paper, we provide a very detailed overview of the related work on opinion mining. The following features distinguish our review from similar works: (1) it presents a different perspective on the opinion mining field by discussing the work at different granularity levels (word, sentence, and document levels), which is unusual and much needed; (2) it discusses the related work in terms of the challenges of the field of opinion mining; (3) its document-level discussion of the related work gives an overview of the opinion mining task in the blogosphere, one of the most popular online social networks; and (4) it highlights the importance of online social networks for the opinion mining task and other related sub-tasks

    Aplicación del análisis de sentimientos a la evaluación de datos generados en medios sociales (Applying sentiment analysis to the evaluation of data generated in social media)

    This document describes the research and development process carried out in the discipline of sentiment analysis. The main objective of this research was to evaluate the application of sentiment analysis technologies to the content generated by users of various social media, and to propose ways in which organizations and users can take advantage of the results of these technologies. The reliability of online sentiment analysis tools that use Twitter as their corpus source was studied; a heuristic proposal was presented that simplifies the sentiment analysis of Twitter messages by focusing on the opinions directly related to the opinion targets, rather than determining sentiment globally, and that generates additional information which could be useful for electronic word of mouth; finally, a proposal for predicting quantitative hotel ratings from the reviews written by users of their services was developed and evaluated. The results of this research show that sentiment analysis, in its current state, can be useful for decision making by companies and individuals, but that it can still be improved in order to exploit the massive number of textual opinions produced by social media users