Exploiting Social Annotation for Automatic Resource Discovery
Information integration applications, such as mediators or mashups, that
require access to information resources currently rely on users manually
discovering and integrating them in the application. Manual resource discovery
is a slow process, requiring the user to sift through results obtained via
keyword-based search. Although search methods have advanced to include evidence
from a document's contents, its metadata, and the contents and link structure of
referring pages, they still do not adequately cover information sources,
often called "the hidden Web", that dynamically generate documents in
response to a query. The recently popular social bookmarking sites, which allow
users to annotate and share metadata about various information sources, provide
rich evidence for resource discovery. In this paper, we describe a
probabilistic model of the user annotation process in the social bookmarking
system del.icio.us. We then use the model to automatically find resources
relevant to a particular information domain. Our experimental results on data
obtained from del.icio.us show that this approach is a promising method for
helping automate the resource discovery task.
Comment: 6 pages, submitted to AAAI07 workshop on Information Integration on
the We
Growing a Tree in the Forest: Constructing Folksonomies by Integrating Structured Metadata
Many social Web sites allow users to annotate the content with descriptive
metadata, such as tags, and more recently to organize content hierarchically.
These types of structured metadata provide valuable evidence for learning how a
community organizes knowledge. For instance, we can aggregate many personal
hierarchies into a common taxonomy, also known as a folksonomy, that will aid
users in visualizing and browsing social content and help them
organize their own content. However, learning from social metadata presents
several challenges, since it is sparse, shallow, ambiguous, noisy, and
inconsistent. We describe an approach to folksonomy learning based on
relational clustering, which exploits structured metadata contained in personal
hierarchies. Our approach clusters similar hierarchies using their structure
and tag statistics, then incrementally weaves them into a deeper, bushier tree.
We study folksonomy learning using social metadata extracted from the
photo-sharing site Flickr, and demonstrate that the proposed approach addresses
the challenges. Moreover, compared to previous work, the approach produces
larger, more accurate folksonomies and scales better.
Comment: 10 pages, To appear in the Proceedings of ACM SIGKDD Conference on
Knowledge Discovery and Data Mining (KDD) 201
Modeling Social Annotation: a Bayesian Approach
Collaborative tagging systems, such as Delicious, CiteULike, and others,
allow users to annotate resources, e.g., Web pages or scientific papers, with
descriptive labels called tags. The social annotations contributed by thousands
of users can potentially be used to infer categorical knowledge, classify
documents, or recommend new relevant information. Traditional text inference
methods do not make the best use of social annotation, since they do not take into
account variations in individual users' perspectives and vocabulary. In
previous work, we introduced a simple probabilistic model that takes the interests
of individual annotators into account in order to find hidden topics of
annotated resources. Unfortunately, that approach had one major shortcoming:
the number of topics and interests must be specified a priori. To address this
drawback, we extend the model to a fully Bayesian framework, which offers a way
to automatically estimate these numbers. In particular, the model allows the
number of interests and topics to change as suggested by the structure of the
data. We evaluate the proposed model in detail on synthetic and real-world
data by comparing its performance to Latent Dirichlet Allocation on the topic
extraction task. For the latter evaluation, we apply the model to infer topics
of Web resources from social annotations obtained from Delicious in order to
discover new resources similar to a specified one. Our empirical results
demonstrate that the proposed model is a promising method for exploiting social
knowledge contained in user-generated annotations.
Comment: 29 Pages, Accepted for publication at ACM Transactions on Knowledge
Discovery from Data (TKDD) on March 2, 201
Analyzing microblogs with affinity propagation
Recently, there has been a great deal of interest in analyzing inherent structures in posts on microblogs such as Twitter. While many works utilize a well-known topic modeling technique, we instead propose to apply Affinity Propagation [4] (AP) to analyze such a corpus, and we hypothesize that AP may provide a different perspective from the traditional approach. Our preliminary analysis raises some interesting facts and issues, which suggest future research directions.
Improving the Navigability of Tagging Systems with Hierarchically Constructed Resource Lists and Tag Trails
Recent research has shown that the navigability of tagging systems leaves much to be desired. In general, it was observed that tagging systems are not navigable if the resource lists of the tagging system are limited to a certain factor k. Hence, in this paper a novel resource list generation approach is introduced that addresses this issue. The proposed approach is based on a hierarchical network model. The paper shows, through a number of experiments based on a tagging dataset from a large online encyclopedia system called Austria-Forum, that the new algorithm is able to create tag network structures that are efficiently navigable. In contrast to previous work, the method featured in this paper is completely generic, i.e., the introduced resource list generation approach could be used to improve the navigability of any tagging system. This work is relevant for researchers interested in the navigability of emergent hypertext structures and for engineers seeking to improve the navigability of tagging systems.