A Data-Driven Approach for Tag Refinement and Localization in Web Videos
Tagging of visual content is becoming more and more widespread as web-based
services and social networks have popularized tagging functionalities among
their users. These user-generated tags are used to ease browsing and
exploration of media collections, e.g. using tag clouds, or to retrieve
multimedia content. However, not all media are equally tagged by users. With
current systems it is easy to tag a single photo, and tagging a part of a
photo, such as a face, has become common on sites like Flickr and Facebook.
Tagging a video sequence, on the other hand, is more complicated and time
consuming, so users typically tag only the overall content of a video. In this
paper we present a
method for automatic video annotation that increases the number of tags
originally provided by users, and localizes them temporally, associating tags
to keyframes. Our approach exploits collective knowledge embedded in
user-generated tags and web sources, and visual similarity of keyframes and
images uploaded to social sites like YouTube and Flickr, as well as web sources
like Google and Bing. Given a keyframe, our method is able to select on the fly
from these visual sources the training exemplars that should be the most
relevant for this test sample, and proceeds to transfer labels across similar
images. Compared to existing video tagging approaches that require training
classifiers for each tag, our system has few parameters, is easy to implement
and can deal with an open vocabulary scenario. We demonstrate the approach on
tag refinement and localization on DUT-WEBV, a large dataset of web videos, and
show state-of-the-art results. Comment: Preprint submitted to Computer Vision
and Image Understanding (CVIU).
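The label-transfer step described above can be sketched as a nearest-neighbor vote: given a keyframe, rank the retrieved exemplar images by visual distance and transfer the tags agreed on by the closest ones. A minimal sketch, assuming hypothetical toy feature vectors and tags (the paper's actual features and retrieval sources are not reproduced here):

```python
from collections import Counter

def transfer_labels(query_feat, exemplars, k=3):
    """Rank exemplars by Euclidean distance to the query keyframe and
    vote tags from the k nearest ones (illustrative data only)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    ranked = sorted(exemplars, key=lambda e: dist(query_feat, e["feat"]))
    votes = Counter(tag for e in ranked[:k] for tag in e["tags"])
    # Keep only tags supported by at least half of the k neighbors.
    return [t for t, c in votes.most_common() if c >= k / 2]

# Toy exemplars standing in for images retrieved on the fly from the web.
exemplars = [
    {"feat": [0.9, 0.1], "tags": ["beach", "sea"]},
    {"feat": [0.8, 0.2], "tags": ["beach", "sunset"]},
    {"feat": [0.1, 0.9], "tags": ["city"]},
]
print(transfer_labels([0.85, 0.15], exemplars))  # → ['beach']
```

Because the exemplars are selected per test sample, no per-tag classifier is trained, which is what keeps the vocabulary open.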
Learning to Hash-tag Videos with Tag2Vec
User-given tags or labels are valuable resources for semantic understanding
of visual media such as images and videos. Recently, a new type of labeling
mechanism known as hash-tags has become increasingly popular on social media
sites. In this paper, we study the problem of generating relevant and useful
hash-tags for short video clips. Traditional data-driven approaches for tag
enrichment and recommendation use direct visual similarity for label transfer
and propagation. We attempt to learn a direct low-cost mapping from video to
hash-tags using a two step training process. We first employ a natural language
processing (NLP) technique, skip-gram models with neural network training to
learn a low-dimensional vector representation of hash-tags (Tag2Vec) using a
corpus of 10 million hash-tags. We then train an embedding function to map
video features to the low-dimensional Tag2vec space. We learn this embedding
for 29 categories of short video clips with hash-tags. A query video without
any tag-information can then be directly mapped to the vector space of tags
using the learned embedding and relevant tags can be found by performing a
simple nearest-neighbor retrieval in the Tag2Vec space. We validate the
relevance of the tags suggested by our system qualitatively and quantitatively
with a user study.
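The retrieval step in the abstract above reduces to a nearest-neighbor lookup in the learned tag space. A minimal sketch, using made-up hash-tag vectors in place of the skip-gram embeddings trained on the 10-million hash-tag corpus, and cosine similarity for ranking:

```python
import math

# Toy Tag2Vec space: hash-tag -> low-dimensional vector (illustrative values).
tag_vecs = {
    "#surfing": [0.9, 0.1, 0.0],
    "#beach":   [0.8, 0.3, 0.1],
    "#cooking": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def suggest_tags(video_embedding, k=2):
    """Map an (already embedded) query video to its nearest hash-tags."""
    ranked = sorted(tag_vecs,
                    key=lambda t: cosine(video_embedding, tag_vecs[t]),
                    reverse=True)
    return ranked[:k]

print(suggest_tags([0.85, 0.2, 0.05]))  # → ['#surfing', '#beach']
```

The embedding function that maps raw video features into this space is the learned component; once it exists, tagging an unseen video is a single retrieval.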
Incremental Tag Suggestion for Landmark Image Collections
In recent social media applications, descriptive information is collected through user tagging (e.g. tagging faces) and automatic environment sensing (e.g. GPS). Many applications recognize landmarks using information gathered from GPS data. However, GPS reflects the location of the camera, not the landmark. In this research, we propose an automatic landmark tagging scheme that uses secondary regions to distinguish between similar landmarks. We propose two algorithms: 1) landmark tagging by secondary objects and 2) automatic new landmark recognition. Images of 30 famous landmarks from various public databases were used in our experiment. Results show an increase in the number of tagged areas and improved landmark tagging accuracy.
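The secondary-object idea above can be sketched as disambiguation by context: when two landmarks look alike, the objects detected around them break the tie. A minimal sketch with entirely made-up landmark names and object lists:

```python
def identify_landmark(primary_match, secondary_objects, knowledge):
    """Pick the candidate landmark whose known surroundings best overlap
    with the secondary objects detected in the image (illustrative data)."""
    candidates = knowledge[primary_match]  # landmarks that look alike
    def score(lm):
        return len(set(secondary_objects) & set(candidates[lm]))
    return max(candidates, key=score)

# Two visually similar towers distinguished by what stands near them.
knowledge = {"tower": {"Tower A": ["river", "bridge"],
                       "Tower B": ["plaza", "fountain"]}}
print(identify_landmark("tower", ["bridge", "tree"], knowledge))  # → 'Tower A'
```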
Movie Tags Prediction and Segmentation Using Deep Learning
The sheer volume of movies generated these days requires automated analytics for efficient
classification, query-based search, and extraction of desired information. These tasks can only be efficiently
performed by a machine learning based algorithm. We address this issue in this paper by proposing a
deep learning based technique for predicting the relevant tags for a movie and segmenting the movie with
respect to the predicted tags. We construct a tag vocabulary and create the corresponding dataset in order to
train a deep learning model. Subsequently, we propose an efficient shot detection algorithm to find the key
frames in the movie. The extracted key frames are analyzed by the deep learning model to predict the top
three tags for each frame. The tags are then assigned weighted scores and are filtered to generate a compact
set of most relevant tags. This process also generates a corpus which is further used to segment a movie based
on a selected tag. We present a rigorous analysis of the segmentation quality with respect to the number of
tags selected for the segmentation. Our detailed experiments demonstrate that the proposed technique is not
only efficacious in predicting the most relevant tags for a movie, but also in segmenting the movie with
respect to the selected tags with high accuracy.
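The score-and-filter step described above can be sketched as rank-weighted aggregation: each keyframe contributes its top-three predicted tags with decreasing weight, and weakly supported tags are dropped. The weights and threshold below are illustrative assumptions, not the paper's values:

```python
def compact_tag_set(frame_predictions, weights=(3, 2, 1), min_score=3):
    """Aggregate per-keyframe top-3 tag predictions into a compact tag set.
    frame_predictions: list of [tag_rank1, tag_rank2, tag_rank3] per keyframe."""
    scores = {}
    for top3 in frame_predictions:
        for rank, tag in enumerate(top3):
            scores[tag] = scores.get(tag, 0) + weights[rank]
    # Filter out tags whose accumulated score falls below the threshold.
    return sorted(t for t, s in scores.items() if s >= min_score)

frames = [["action", "car", "night"],
          ["action", "night", "car"],
          ["dialogue", "action", "day"]]
print(compact_tag_set(frames))  # → ['action', 'car', 'dialogue', 'night']
```

The surviving tags, together with the keyframes that voted for them, form the corpus used to segment the movie by a selected tag.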
Social software for music
Integrated master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia, Universidade do Porto. 200
E-SCAPE New tools and new opportunities for the localization of Expo 2015 general interest services along the Canale Cavour, a backbone of the Milan-Turin urban region
Final publication of the Alta Scuola Politecnica project "E-SCAPE New tools and new opportunities for the localization of Expo 2015 general interest services along the Canale Cavour, a backbone of the Milan-Turin urban region"