68 research outputs found

    DocTag2Vec: An Embedding Based Multi-label Learning Approach for Document Tagging

    Tagging news articles or blog posts with relevant tags from a collection of predefined ones is termed document tagging in this work. Accurate tagging of articles can benefit several downstream applications such as recommendation and search. In this work, we propose a novel yet simple approach called DocTag2Vec to accomplish this task. We substantially extend Word2Vec and Doc2Vec, two popular models for learning distributed representations of words and documents. In DocTag2Vec, we simultaneously learn the representations of words, documents, and tags in a joint vector space during training, and employ a simple k-nearest neighbor search to predict tags for unseen documents. In contrast to previous multi-label learning methods, DocTag2Vec deals directly with raw text instead of a provided feature vector, and in addition enjoys advantages such as learning tag representations and the ability to handle newly created tags. To demonstrate the effectiveness of our approach, we conduct experiments on several datasets and show promising results against state-of-the-art methods. Comment: 10 pages
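The prediction step described in this abstract, a k-nearest neighbor search over tags in the joint space, could be sketched roughly as follows. This is only an illustration, not the authors' implementation: the function names, the choice of cosine similarity, and the toy three-dimensional vectors are all assumptions, and the embeddings are taken as already learned.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_tags(doc_vec, tag_vecs, k=2):
    """Return the k tags whose embeddings are most similar to doc_vec.

    Hypothetical helper: assumes documents and tags already live in
    the same vector space.
    """
    ranked = sorted(
        tag_vecs,
        key=lambda tag: cosine(doc_vec, tag_vecs[tag]),
        reverse=True,
    )
    return ranked[:k]


# Toy, hand-written embeddings (assumed; real ones would be learned).
tag_vecs = {
    "sports": [0.9, 0.1, 0.0],
    "politics": [0.0, 1.0, 0.1],
    "finance": [0.1, 0.2, 0.9],
}
doc_vec = [0.8, 0.3, 0.1]  # embedding inferred for an unseen document

predicted = predict_tags(doc_vec, tag_vecs, k=2)
print(predicted)
```

Because tags are simply points in the same space, a newly created tag could in principle be supported by adding one more entry to `tag_vecs`, which is consistent with the advantage the abstract mentions.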

    Automatic tagging and geotagging in video collections and communities

    Automatically generated tags and geotags hold great promise to improve access to video collections and online communities. We overview three tasks offered in the MediaEval 2010 benchmarking initiative, describing for each its use scenario, definition, and the data set released. For each task, a reference algorithm that was used within MediaEval 2010 is presented, along with comments on lessons learned. The Tagging Task, Professional involves automatically matching episodes in a collection of Dutch television with subject labels drawn from the keyword thesaurus used by the archive staff. The Tagging Task, Wild Wild Web involves automatically predicting the tags that are assigned by users to their online videos. Finally, the Placing Task requires automatically assigning geo-coordinates to videos. The specification of each task admits the use of the full range of available information, including user-generated metadata, speech recognition transcripts, audio, and visual features.

    Automated Text Abstraction from Documents and Webpages Metadata using Probabilistic Clustering Algorithms

    Annotations are comments, notes, explanations, tags, or other types of external remarks. An annotation can be added to a text document, to a few portions of a document, or to a webpage. Annotation helps effective information retrieval. Webpage metadata is the data related to a website; it is machine-understandable information about web resources or other tags. Collaborative annotations are based on user-created tags used to annotate new objects. These tags are user-created labels for entities and allow users to organize and index the contents. Tagging is the act of adding keywords to objects. There is a significant amount of work involved in coming up with tags for text documents or other resources such as webpages, images, and videos. The Automated Annotation System (AAS) uses algorithms such as K-Means and Distributed Hash Table (DHT) to automatically create attributes or annotations from documents or from the metadata of webpages. The proposed annotation technique processes metadata and/or text to come up with annotations efficiently, rather than relying on manually understanding the metadata or analyzing the text.

    Tag-Aware Recommender Systems: A State-of-the-art Survey

    In the past decade, Social Tagging Systems have attracted increasing attention from both the physical and computer science communities. Besides the underlying structure and dynamics of tagging systems, many efforts have been devoted to unifying tagging information to reveal user behaviors and preferences, extract the latent semantic relations among items, make recommendations, and so on. Specifically, this article summarizes recent progress on tag-aware recommender systems, emphasizing the contributions from three mainstream perspectives and approaches: network-based methods, tensor-based methods, and topic-based methods. Finally, we outline some other tag-related works and future challenges of tag-aware recommendation algorithms. Comment: 19 pages, 3 figures

    Adaptive Technique for Document Annotation to Identify Attributes of Interest

    Many application domains generate and share information that describes their products and services. Such descriptions contain unstructured information, so it is difficult to find the useful metadata. Information extraction algorithms are very expensive or inaccurate when operating on such unstructured information. This paper proposes an adaptive technique for the document annotation process to retrieve the useful information. The approach is based on the Collaborative Adaptive Data Sharing (CADS) platform for document annotation. CADS uses the query workload to direct the annotation process. A key attribute of CADS is that it identifies important data attributes of the application. Further, it uses this information to direct data insertion and querying.

    Propagating fine-grained topic labels in news snippets

    We propose an unsupervised method for propagating automatically extracted fine-grained topic labels among news items to improve their topic description for a subsequent text classification procedure. This method compares vector representations of news items and assigns to each news item the label of its closest neighbour with a different topic label. Results obtained show that high precision can be achieved in propagating the top-ranked topic label, and that 2-gram and 3-gram feature representations optimize the precision.
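A minimal sketch of the propagation rule stated in this abstract (each item receives the label of its closest neighbour that carries a different topic label) might look like the following. It is an illustration under stated assumptions, not the authors' method: character 3-gram counts and cosine similarity are assumed choices, and the snippets, labels, and function names are invented for the example.

```python
import math
from collections import Counter


def ngrams(text, n=3):
    """Character n-gram counts (an assumed feature choice)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a if g in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def propagate(items):
    """For each (text, label) pair, return (label, propagated_label).

    The propagated label comes from the most similar item whose
    label differs from the item's own; None if no such item exists.
    """
    vecs = [ngrams(text) for text, _ in items]
    result = []
    for i, (_, label) in enumerate(items):
        candidates = [j for j, (_, other) in enumerate(items) if other != label]
        if not candidates:
            result.append((label, None))
            continue
        best = max(candidates, key=lambda j: cosine(vecs[i], vecs[j]))
        result.append((label, items[best][1]))
    return result


# Invented toy snippets and labels, for illustration only.
items = [
    ("central bank raises interest rates", "monetary policy"),
    ("bank raises interest rates again as inflation climbs", "inflation"),
    ("striker scores twice in cup final", "football"),
    ("cup final decided by late striker goal", "cup final"),
]
propagated = propagate(items)
for own, extra in propagated:
    print(own, "+", extra)
```

Each item ends up with its original label plus one propagated label, which is one way to read "improve their topic description" before classification.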