Discovering Organizational Correlations from Twitter
Organizational relationships are usually very complex in real life. It is
difficult or impossible to directly measure such correlations among different
organizations, because important information is usually not publicly available
(e.g., the correlations of terrorist organizations). Nowadays, an increasing
amount of organizational information can be posted online by individuals and
spread instantly through Twitter. Such information can be crucial for detecting
organizational correlations. In this paper, we study the problem of discovering
correlations among organizations from Twitter. Mining organizational
correlations is challenging for two reasons: a) Twitter data arrives as large
volumes of mixed information in which the most relevant information about
organizations is often buried, so evidence of organizational correlations is
scattered across multiple places and takes different forms; b) using this
information collectively and judiciously is difficult because the extracted
correlations come in multiple representations. To address these issues, we propose
multi-CG (multiple Correlation Graphs based model), an unsupervised framework
that can learn a consensus of correlations among organizations based on
multiple representations extracted from Twitter, which is more accurate and
robust than correlations based on a single representation. Empirical study
shows that the consensus graph extracted from Twitter can capture the
organizational correlations effectively.
Comment: 11 pages, 4 figures
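The idea of combining several correlation graphs into one consensus graph can be illustrated with a deliberately naive sketch. This is not the multi-CG algorithm, whose details the abstract does not give; it simply scales each graph's edge weights into [0, 1] and averages them. The graph names (`co_mention`, `shared_followers`), the organization labels, and all weights are hypothetical.

```python
def normalize(graph):
    """Scale the edge weights of one graph into [0, 1] by its largest weight."""
    top = max(graph.values(), default=0.0)
    if top == 0:
        return {edge: 0.0 for edge in graph}
    return {edge: weight / top for edge, weight in graph.items()}


def consensus(graphs):
    """Average normalized edge weights over all graphs.

    An edge missing from a graph counts as weight 0 in that graph.
    This is plain averaging, an assumption made for illustration only.
    """
    normalized = [normalize(g) for g in graphs]
    edges = set().union(*normalized)
    return {
        edge: sum(g.get(edge, 0.0) for g in normalized) / len(normalized)
        for edge in edges
    }


# Hypothetical representations: each maps an organization pair to a raw score.
co_mention = {("OrgA", "OrgB"): 4.0, ("OrgA", "OrgC"): 2.0}
shared_followers = {("OrgA", "OrgB"): 10.0, ("OrgB", "OrgC"): 5.0}

combined = consensus([co_mention, shared_followers])
```

An edge supported by both representations, such as the pair OrgA and OrgB here, keeps a high consensus weight, while an edge seen in only one representation is discounted.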
Relation Strength-Aware Clustering of Heterogeneous Information Networks with Incomplete Attributes
With the rapid development of online social media, online shopping sites and
cyber-physical systems, heterogeneous information networks have become
increasingly popular and content-rich over time. In many cases, such networks
contain multiple types of objects and links, as well as different kinds of
attributes. The clustering of these objects can provide useful insights in many
applications. However, the clustering of such networks can be challenging since
(a) the attribute values of objects are often incomplete, which implies that an
object may carry only partial attributes or even no attributes to correctly
label itself; and (b) the links of different types may carry different kinds of
semantic meanings, and it is a difficult task to determine the nature of their
relative importance in helping the clustering for a given purpose. In this
paper, we address these challenges by proposing a model-based clustering
algorithm. We design a probabilistic model which clusters the objects of
different types into a common hidden space, by using a user-specified set of
attributes, as well as the links from different relations. The strengths of
different types of links are automatically learned, and are determined by the
given purpose of clustering. An iterative algorithm is designed for solving the
clustering problem, in which the strengths of different types of links and the
quality of clustering results mutually enhance each other. Our experimental
results on real and synthetic data sets demonstrate the effectiveness and
efficiency of the algorithm.
Comment: VLDB201
Exploiting Social Networks for Recommendation in Online Image Sharing Systems
This thesis aims to demonstrate the distinct and so far little explored value, for image recommendation, of knowledge derived from social interaction data within large web-scale image sharing systems such as Flickr, Picasa Web, Facebook and others. I show how such systems can be significantly improved through personalisation that takes into account the social context of users: I model their interactions by mining data, then build and evaluate systems that incorporate this information. These improvements allow users to search and browse large online image collections more quickly and to find results that match their personal information needs more accurately than existing methods do.
Traditional information retrieval and recommendation datasets are contrived to provide stable baselines for researchers to compare against, but they rarely reflect the media systems users tend to encounter online. The online photo sharing site Flickr provides rich and varied data that researchers can use to analyse and understand users’ interactions with images and with each other. I analyse such data by modelling the connections between users as multigraphs and exploiting the resultant topologies to produce features that can be used to train recommender systems based on machine-learnt classifiers.
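A minimal sketch of the multigraph idea, under assumptions: users are nodes, each interaction is a typed edge, and a pair of users is described by per-type edge counts that could feed a classifier. The interaction types, user names, and feature layout are hypothetical and are not taken from the thesis.

```python
from collections import Counter, defaultdict

# Hypothetical interaction log: (source user, target user, interaction type).
interactions = [
    ("alice", "bob", "comment"),
    ("alice", "bob", "comment"),
    ("alice", "bob", "favourite"),
    ("bob", "carol", "contact"),
]


def build_multigraph(events):
    """Store parallel edges as per-type counts for each directed user pair."""
    graph = defaultdict(Counter)
    for src, dst, kind in events:
        graph[(src, dst)][kind] += 1
    return graph


def pair_features(graph, src, dst, kinds):
    """Return one count per interaction type, followed by the total edge count."""
    counts = graph.get((src, dst), Counter())
    return [counts[kind] for kind in kinds] + [sum(counts.values())]


kinds = ["contact", "comment", "favourite"]
graph = build_multigraph(interactions)
features = pair_features(graph, "alice", "bob", kinds)
```

Keeping parallel edges as typed counts, instead of collapsing them into a single weight, preserves the distinction between kinds of interaction, which is the point of using a multigraph.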
The core contributions of this work include insight into the nature of very large-scale online photo collections and the communities that form around them, as well as the dynamic nature of the interactions users have with their media. I do this through the rigorous evaluation of both a probabilistic tag recommendation system and a machine-learnt classifier trained to mimic user decisions regarding image preference. These implementations focus on treating the user as both a unique individual and as a member of potentially many explicit and implicit communities. I also explore the validity of the Flickr ‘Favourite’ feedback label as a proxy for user preference, which is particularly important when considering other analogous media systems to which my findings transfer. My conclusions highlight how vital both social context information and the understanding of user behaviour are for online image sharing systems.
In the field of information retrieval, the diverse nature of users is often forgotten in the hunt for increases in esoteric performance metrics. This thesis places users back at the centre of the problem of multimedia information retrieval and shows how their variety and uniqueness are valuable traits that can be exploited to augment and improve the experience of browsing and searching shared online image collections.