25 research outputs found
Combining granularity-based topic-dependent and topic-independent evidences for opinion detection
Opinion mining, a sub-discipline within Information Retrieval (IR) and Computational Linguistics, refers to the computational techniques for extracting, classifying, understanding, and assessing the opinions expressed in various online sources such as news articles, social media comments, and other user-generated content. It is also known by many other terms, such as opinion finding, opinion detection, sentiment analysis, sentiment classification, polarity detection, etc. Defined in a more specific and simpler context, opinion mining is the task of retrieving opinions on an information need expressed by the user in the form of a query. There are many problems and challenges associated with opinion mining, and in this thesis we focus on some of them. One of the major challenges of opinion mining is finding opinions that specifically concern the given topic (query). A document may contain information on many topics at once, and it may contain opinionated text about each of those topics or about only a few of them. It therefore becomes very important to select the topic-relevant segments of the document together with their corresponding opinions. We address this problem at two levels of granularity: sentences and passages. In our first, sentence-level approach, we use semantic relations from WordNet to find this association between topic and opinion. In our second, passage-level approach, we use a more robust IR model, namely the language model, to focus on this problem.
The basic idea behind both contributions to topic-opinion association is that if a document contains more topic-relevant opinionated text segments (sentences or passages), it is more opinionated than a document with fewer such segments. Most machine-learning-based approaches to opinion mining are domain-dependent, i.e. their performance varies from one domain to another. On the other hand, a domain- or topic-independent approach is more generalizable and can maintain its effectiveness across different domains. However, domain-independent approaches generally suffer from poor performance. Developing an approach that is both effective and generalizable is a major challenge in the field of opinion mining. Our contributions in this thesis include the development of an approach that uses simple heuristic features to find opinionated documents. Entity-based opinion mining is becoming very popular among researchers in the IR community. It aims to identify the entities relevant to a given topic and to extract the opinions associated with them from a set of text documents. However, identifying entities and determining their relevance is already a difficult task. We propose a system that takes into account information from both the current news article and relevant earlier articles in order to detect the most important entities in the current news. In addition, we also present our framework for opinion analysis and related tasks. This framework relies on content evidence and social evidence from the blogosphere for the tasks of opinion finding, opinion prediction, and multidimensional review ranking. This early contribution lays the groundwork for our future work.
The evaluation of our methods includes the use of the TREC 2006 Blog collection and the TREC 2004 Novelty track collection. Most of the evaluations were carried out within the framework of the TREC Blog track.
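The sentence-level topic-opinion association idea described in this abstract can be illustrated with a minimal sketch: score a document by counting sentences that are both relevant to the query topic and opinionated. The tiny opinion lexicon and the term-overlap relevance test below are illustrative assumptions, not the thesis's actual WordNet-based method.

```python
# Hypothetical sketch: score a document by counting sentences that are
# both topic-relevant and opinionated. The opinion lexicon and the
# overlap-based relevance heuristic are illustrative assumptions only.
OPINION_WORDS = {"good", "bad", "great", "terrible", "love", "hate", "excellent"}

def sentence_relevant(sentence, query_terms):
    # Relevance heuristic: the sentence shares at least one term with the query.
    return bool(set(sentence.lower().split()) & query_terms)

def sentence_opinionated(sentence):
    # Opinion heuristic: the sentence contains at least one opinion word.
    return bool(set(sentence.lower().split()) & OPINION_WORDS)

def opinion_score(document, query):
    """Count sentences that are both topic-relevant and opinionated."""
    query_terms = set(query.lower().split())
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    return sum(
        1 for s in sentences
        if sentence_relevant(s, query_terms) and sentence_opinionated(s)
    )

doc = ("The new phone camera is great. The screen resolution is 1080p. "
       "I hate the phone battery life.")
print(opinion_score(doc, "phone"))  # 2 topic-relevant opinionated sentences
```

Under this scheme, a document with more topic-relevant opinionated sentences receives a higher opinion score, mirroring the ranking intuition stated above.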
Predictive Modeling for Navigating Social Media
Social media changes the way people use the Web. It has transformed ordinary Web users from information consumers into content contributors. One popular form of content contribution is social tagging, in which users assign tags to Web resources. Through the collective efforts of the social tagging community, a new information space has been created for information navigation. Navigation allows serendipitous discovery of information by examining the information objects linked to one another in the social tagging space. In this dissertation, we study prediction tasks that facilitate navigation in social tagging systems. For social tagging systems to meet the complex navigation needs of users, two issues are fundamental, namely link sparseness and object selection. Link sparseness is observed for many resources that are untagged or inadequately tagged, hindering navigation to those resources. Object selection arises when a large number of information objects are linked to the current object, requiring the more interesting or relevant ones to be selected in order to guide navigation effectively. This dissertation focuses on three dimensions, namely the semantic, social and temporal dimensions, to address link sparseness and object selection. To address link sparseness, we study the task of tag prediction. This task aims to enrich tags for the untagged or inadequately tagged resources, such that the predicted tags can serve as navigable links to these resources. For this task, we take a topic modeling approach to exploit the latent semantic relationships between resource content and tags. To address object selection, we study the tasks of personalized tag recommendation and trend discovery using social annotations. Personalized tag recommendation leverages the collective wisdom of the social tagging community to recommend tags that are semantically relevant to the target resource, while being tailored to the tagging preferences of individual users. 
For this task, we propose a probabilistic framework which leverages the implicit social links between like-minded users, i.e. who show similar tagging preferences, to recommend suitable tags. Social tags capture the interest of the users in the annotated resources at different times. These social annotations allow us to construct temporal profiles for the annotated resources. By analyzing these temporal profiles, we unveil the non-trivial temporal trends of the annotated resources, which provide novel metrics for selecting relevant and interesting resources for guiding navigation. For trend discovery using social annotations, we propose a trend discovery process which enables us to analyze trends for a multitude of semantics encapsulated in the temporal profiles of the annotated resources
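The temporal-profile idea described above can be sketched as follows: bucket each resource's tag assignments into time windows, then use window-over-window growth as a simple trend metric for selecting resources. The annotation format, window scheme, and growth metric here are illustrative assumptions, not the dissertation's actual trend discovery process.

```python
from collections import Counter, defaultdict

# Hypothetical sketch: build temporal profiles for annotated resources by
# counting tag assignments per time window, then score trends by growth
# relative to the previous window. All details are illustrative assumptions.
def temporal_profiles(annotations, window_size):
    """annotations: iterable of (resource, tag, timestamp) triples."""
    profiles = defaultdict(Counter)
    for resource, tag, ts in annotations:
        window = ts // window_size          # discretise time into windows
        profiles[resource][window] += 1     # count annotations per window
    return profiles

def trend_score(profile, current_window):
    """Growth of annotation activity versus the previous window."""
    prev = profile.get(current_window - 1, 0)
    now = profile.get(current_window, 0)
    return (now - prev) / (prev + 1)        # +1 smoothing for empty windows

annotations = [
    ("photo1", "sunset", 100), ("photo1", "beach", 105),
    ("photo1", "sunset", 210), ("photo1", "hdr", 215),
    ("photo1", "sunset", 220),
    ("photo2", "cat", 101),
]
profiles = temporal_profiles(annotations, window_size=100)
print(trend_score(profiles["photo1"], current_window=2))  # growth of 1/3
```

Resources whose profiles show rising activity would then be favoured when selecting objects to guide navigation.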
Making sense of strangers' expertise from digital artifacts
In organizations, individuals typically rely on their personal networks to obtain expertise when faced with ill-defined problems that require answers beyond the scope of their own knowledge. However, individuals cannot always get the needed expertise from their local colleagues. This issue is particularly acute for members of large, geographically dispersed organizations, since it is difficult to know "who knows what" among numerous colleagues. The proliferation of social computing technologies such as blogs, online forums, social tags and bookmarks, and social network connection information has expanded the reach and ease with which knowledge workers may become aware of others' expertise. While all these technologies facilitate access to a stranger who can potentially provide needed expertise or advice, there has been little theoretical work on how individuals actually go about this process. I refer to the process of gathering complex, changing and potentially equivocal information, and comprehending it by connecting nuggets of information from many sources to answer vague, non-procedural questions, as the process of "sensemaking". Through a study of 81 full-time IBM employees in 21 countries, I look at how existing models and theories of sensemaking and information search may be inadequate to describe the "people sensemaking" process individuals go through when considering contacting strangers for expertise. Using signaling theory as an interpretive framework, I describe how certain "signals" in various social software are hard to fake, and are thus more reliable indicators of expertise, approachability, and responsiveness. This research has the potential to inform models of sensemaking and information search when the search is for people, as opposed to documents
Corporate impression formation in online communities - determinants and consequences of online community corporate impressions
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.
The purpose of this study is to gain in-depth knowledge of how the members of
online communities form impressions of organisations that use online communities
in their communication activities. Online impression formation has its peculiarities
and in order to succeed companies need to better understand this phenomenon.
In order to appreciate and evaluate an interaction, those involved in it must know
their own identity. Hence, individuals as well as companies engage in identity
production by trying to project a favourable impression. The process of identity
production can take place in both the offline and the online world. This study focuses
on the online world, more specifically on online communities, by investigating how
online community members form impressions of companies that produce their
identities in online communities.
Technology has changed customer behaviours dramatically. People have embraced
the Internet to meet and interact with one another. This behaviour is in line with the
postmodern assumption that there is a movement towards re-socialisation. Online
communication platforms connect people globally and give them the possibility to
interact and form online social networks. These platforms are interactive, and thus
change the traditional way of communication. Companies therefore have to embrace
those interactive ways of communication. In the online world consumers are quick to
react to communication weaknesses. Inappropriate corporate communication
activities can affect the image they have formed of the company in question.
Big Data for Social Sciences: Measuring patterns of human behavior through large-scale mobile phone data
Through seven publications this dissertation shows how anonymized mobile
phone data can contribute to the social good and provide insights into human
behaviour on a large scale. The size of the datasets analysed ranges from 500
million to 300 billion phone records, covering millions of people. The key
contributions are two-fold:
1. Big Data for Social Good: Through prediction algorithms the results show
how mobile phone data can be useful to predict important socio-economic
indicators, such as income, illiteracy and poverty in developing countries.
Such knowledge can be used to identify where vulnerable groups in society are,
reduce economic shocks and is a critical component for monitoring poverty rates
over time. Further, the dissertation demonstrates how mobile phone data can be
used to better understand human behaviour during large shocks in society,
exemplified by an analysis of data from the terror attack in Norway and a
natural disaster on the south-coast in Bangladesh. This work leads to an
increased understanding of how information spreads, and how millions of people
move around. The intention is to identify displaced people faster, cheaper and
more accurately than existing survey-based methods.
2. Big Data for efficient marketing: Finally, the dissertation offers an
insight into how anonymised mobile phone data can be used to map out large
social networks, covering millions of people, to understand how products spread
inside these networks. Results show that by including social patterns and
machine learning techniques in a large-scale marketing experiment in Asia, the
adoption rate is increased by 13 times compared to the approach used by
experienced marketers. A data-driven and scientific approach to marketing,
through more tailored campaigns, contributes to less irrelevant offers for the
customers, and better cost efficiency for the companies.
Comment: 166 pages, PhD thesis
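Contribution 1 above, predicting socio-economic indicators from aggregated phone records, can be sketched as a two-step pipeline: derive per-region features from anonymised call records, then fit a regression against a known indicator so it can be estimated for unsurveyed regions. The record format, features, and one-dimensional least-squares model below are illustrative assumptions, not the dissertation's actual setup.

```python
from collections import defaultdict

# Hypothetical sketch: aggregate anonymised call records into per-region
# features, then fit a simple least-squares regression against a known
# socio-economic indicator. All data and modelling choices are illustrative.
def region_features(call_records):
    """call_records: iterable of (region, duration_sec, is_international)."""
    totals = defaultdict(lambda: [0, 0.0, 0])   # calls, total duration, intl
    for region, duration, intl in call_records:
        t = totals[region]
        t[0] += 1
        t[1] += duration
        t[2] += int(intl)
    # Feature vector per region: mean call duration, share of intl calls.
    return {r: (dur / n, intl / n) for r, (n, dur, intl) in totals.items()}

def fit_simple_model(features, indicator):
    """Least-squares fit of the indicator on the first feature (1-D for brevity)."""
    xs = [features[r][0] for r in indicator]
    ys = [indicator[r] for r in indicator]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

records = [("A", 60, False), ("A", 120, True), ("B", 300, False), ("B", 340, True)]
feats = region_features(records)
slope, intercept = fit_simple_model(feats, {"A": 0.8, "B": 0.3})
```

With the model fitted on surveyed regions, `slope * mean_duration + intercept` gives an estimate of the indicator for regions where only phone data is available.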
Unsupervised Graph-Based Similarity Learning Using Heterogeneous Features.
Relational data refers to data that contains explicit relations among objects. Nowadays, relational
data are universal and have a broad appeal in many different application domains. The
problem of estimating similarity between objects is a core requirement for many standard
Machine Learning (ML), Natural Language Processing (NLP) and Information Retrieval
(IR) problems such as clustering, classification, word sense disambiguation, etc. Traditional
machine learning approaches represent the data using simple, concise representations such
as feature vectors. While this works very well for homogeneous data, i.e., data with a single
feature type such as text, it does not fully exploit the availability of different feature types.
For example, scientific publications have text, citations, authorship information, and venue information.
Each of the features can be used for estimating similarity. Representing such
objects has been a key issue in efficient mining (Getoor and Taskar, 2007). In this thesis,
we propose natural representations for relational data using multiple, connected layers of
graphs, one for each feature type. We also propose novel algorithms for estimating similarity
using multiple heterogeneous features, and novel algorithms for tasks such as topic detection and music recommendation using the estimated similarity measure. We
demonstrate superior performance of the proposed algorithms (root mean squared error of
24.81 on the Yahoo! KDD Music recommendation data set and classification accuracy of
88% on the ACL Anthology Network data set) over many state-of-the-art algorithms,
such as Latent Semantic Analysis (LSA), Multiple Kernel Learning (MKL) and spectral
clustering, and baselines on large, standard data sets.
Ph.D. Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/89824/1/mpradeep_1.pd
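The multi-layer representation described in this abstract can be sketched as one similarity layer per feature type (e.g. text, citations, authors), combined by a weighted sum. The Jaccard measure and the layer weights below are illustrative assumptions, not the thesis's actual learned similarity.

```python
# Hypothetical sketch: estimate pairwise similarity separately on each
# feature type (one "layer" per type), then combine the layers with a
# weighted sum. The Jaccard measure and weights are illustrative only.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def combined_similarity(obj1, obj2, weights):
    """obj: dict mapping feature type -> set of feature values."""
    return sum(
        w * jaccard(obj1.get(layer, ()), obj2.get(layer, ()))
        for layer, w in weights.items()
    )

paper1 = {"text": {"graph", "similarity", "learning"},
          "citations": {"p10", "p11"},
          "authors": {"alice"}}
paper2 = {"text": {"graph", "kernel", "learning"},
          "citations": {"p11", "p12"},
          "authors": {"alice", "bob"}}
weights = {"text": 0.5, "citations": 0.3, "authors": 0.2}
print(round(combined_similarity(paper1, paper2, weights), 3))  # 0.45
```

Keeping the layers separate makes it easy to reweight or drop a feature type, which is the main practical appeal of a multi-layer graph representation over a single flattened feature vector.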
High-Performance Modelling and Simulation for Big Data Applications
This open access book was prepared as a Final Publication of the COST Action IC1406 "High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)" project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. As their level of abstraction rises to allow a better discernment of the domain at hand, their representation becomes increasingly demanding for computational and data resources. On the other hand, High Performance Computing typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. It is then arguably required to have a seamless interaction of High Performance Computing with Modelling and Simulation in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications