7,220 research outputs found
DocTag2Vec: An Embedding Based Multi-label Learning Approach for Document Tagging
Tagging news articles or blog posts with relevant tags from a collection of
predefined ones is coined as document tagging in this work. Accurate tagging of
articles can benefit several downstream applications such as recommendation and
search. In this work, we propose a novel yet simple approach called DocTag2Vec
to accomplish this task. We substantially extend Word2Vec and Doc2Vec---two
popular models for learning distributed representation of words and documents.
In DocTag2Vec, we simultaneously learn the representation of words, documents,
and tags in a joint vector space during training, and employ the simple
-nearest neighbor search to predict tags for unseen documents. In contrast
to previous multi-label learning methods, DocTag2Vec directly deals with raw
text instead of provided feature vector, and in addition, enjoys advantages
like the learning of tag representation, and the ability of handling newly
created tags. To demonstrate the effectiveness of our approach, we conduct
experiments on several datasets and show promising results against
state-of-the-art methods.Comment: 10 page
Evolutionary Subject Tagging in the Humanities; Supporting Discovery and Examination in Digital Cultural Landscapes
In this paper, the authors attempt to identify problematic issues for subject tagging in the humanities, particularly those associated with information objects in digital formats. In the third major section, the authors identify a number of assumptions that lie behind the current practice of subject classification that we think should be challenged. We move then to propose features of classification systems that could increase their effectiveness. These emerged as recurrent themes in many of the conversations with scholars, consultants, and colleagues. Finally, we suggest next steps that we believe will help scholars and librarians develop better subject classification systems to support research in the humanities.NEH Office of Digital Humanities: Digital Humanities Start-Up Grant (HD-51166-10
Bridging the gap between folksonomies and the semantic web: an experience report
Abstract. While folksonomies allow tagging of similar resources with a variety of tags, their content retrieval mechanisms are severely hampered by being agnostic to the relations that exist between these tags. To overcome this limitation, several methods have been proposed to find groups of implicitly inter-related tags. We believe that content retrieval can be further improved by making the relations between tags explicit. In this paper we propose the semantic enrichment of folksonomy tags with explicit relations by harvesting the Semantic Web, i.e., dynamically selecting and combining relevant bits of knowledge from online ontologies. Our experimental results show that, while semantic enrichment needs to be aware of the particular characteristics of folksonomies and the Semantic Web, it is beneficial for both.
Tag-Aware Recommender Systems: A State-of-the-art Survey
In the past decade, Social Tagging Systems have attracted increasing
attention from both physical and computer science communities. Besides the
underlying structure and dynamics of tagging systems, many efforts have been
addressed to unify tagging information to reveal user behaviors and
preferences, extract the latent semantic relations among items, make
recommendations, and so on. Specifically, this article summarizes recent
progress about tag-aware recommender systems, emphasizing on the contributions
from three mainstream perspectives and approaches: network-based methods,
tensor-based methods, and the topic-based methods. Finally, we outline some
other tag-related works and future challenges of tag-aware recommendation
algorithms.Comment: 19 pages, 3 figure
Discovering the Impact of Knowledge in Recommender Systems: A Comparative Study
Recommender systems engage user profiles and appropriate filtering techniques
to assist users in finding more relevant information over the large volume of
information. User profiles play an important role in the success of
recommendation process since they model and represent the actual user needs.
However, a comprehensive literature review of recommender systems has
demonstrated no concrete study on the role and impact of knowledge in user
profiling and filtering approache. In this paper, we review the most prominent
recommender systems in the literature and examine the impression of knowledge
extracted from different sources. We then come up with this finding that
semantic information from the user context has substantial impact on the
performance of knowledge based recommender systems. Finally, some new clues for
improvement the knowledge-based profiles have been proposed.Comment: 14 pages, 3 tables; International Journal of Computer Science &
Engineering Survey (IJCSES) Vol.2, No.3, August 201
Word Embeddings for Entity-annotated Texts
Learned vector representations of words are useful tools for many information
retrieval and natural language processing tasks due to their ability to capture
lexical semantics. However, while many such tasks involve or even rely on named
entities as central components, popular word embedding models have so far
failed to include entities as first-class citizens. While it seems intuitive
that annotating named entities in the training corpus should result in more
intelligent word features for downstream tasks, performance issues arise when
popular embedding approaches are naively applied to entity annotated corpora.
Not only are the resulting entity embeddings less useful than expected, but one
also finds that the performance of the non-entity word embeddings degrades in
comparison to those trained on the raw, unannotated corpus. In this paper, we
investigate approaches to jointly train word and entity embeddings on a large
corpus with automatically annotated and linked entities. We discuss two
distinct approaches to the generation of such embeddings, namely the training
of state-of-the-art embeddings on raw-text and annotated versions of the
corpus, as well as node embeddings of a co-occurrence graph representation of
the annotated corpus. We compare the performance of annotated embeddings and
classical word embeddings on a variety of word similarity, analogy, and
clustering evaluation tasks, and investigate their performance in
entity-specific tasks. Our findings show that it takes more than training
popular word embedding models on an annotated corpus to create entity
embeddings with acceptable performance on common test cases. Based on these
results, we discuss how and when node embeddings of the co-occurrence graph
representation of the text can restore the performance.Comment: This paper is accepted in 41st European Conference on Information
Retrieva
A Data-Driven Approach for Tag Refinement and Localization in Web Videos
Tagging of visual content is becoming more and more widespread as web-based
services and social networks have popularized tagging functionalities among
their users. These user-generated tags are used to ease browsing and
exploration of media collections, e.g. using tag clouds, or to retrieve
multimedia content. However, not all media are equally tagged by users. Using
the current systems is easy to tag a single photo, and even tagging a part of a
photo, like a face, has become common in sites like Flickr and Facebook. On the
other hand, tagging a video sequence is more complicated and time consuming, so
that users just tag the overall content of a video. In this paper we present a
method for automatic video annotation that increases the number of tags
originally provided by users, and localizes them temporally, associating tags
to keyframes. Our approach exploits collective knowledge embedded in
user-generated tags and web sources, and visual similarity of keyframes and
images uploaded to social sites like YouTube and Flickr, as well as web sources
like Google and Bing. Given a keyframe, our method is able to select on the fly
from these visual sources the training exemplars that should be the most
relevant for this test sample, and proceeds to transfer labels across similar
images. Compared to existing video tagging approaches that require training
classifiers for each tag, our system has few parameters, is easy to implement
and can deal with an open vocabulary scenario. We demonstrate the approach on
tag refinement and localization on DUT-WEBV, a large dataset of web videos, and
show state-of-the-art results.Comment: Preprint submitted to Computer Vision and Image Understanding (CVIU
- …