A Data-Driven Approach for Tag Refinement and Localization in Web Videos
Tagging of visual content is becoming more and more widespread as web-based
services and social networks have popularized tagging functionalities among
their users. These user-generated tags are used to ease browsing and
exploration of media collections, e.g. using tag clouds, or to retrieve
multimedia content. However, not all media are equally tagged by users. With
current systems it is easy to tag a single photo, and even tagging part of a
photo, such as a face, has become common on sites like Flickr and Facebook.
Tagging a video sequence, on the other hand, is more complicated and
time-consuming, so users tend to tag only the overall content of a video. In
this paper we present a
method for automatic video annotation that increases the number of tags
originally provided by users, and localizes them temporally, associating tags
to keyframes. Our approach exploits collective knowledge embedded in
user-generated tags and web sources, and visual similarity of keyframes and
images uploaded to social sites like YouTube and Flickr, as well as web sources
like Google and Bing. Given a keyframe, our method is able to select on the fly
from these visual sources the training exemplars that should be the most
relevant for this test sample, and proceeds to transfer labels across similar
images. Compared to existing video tagging approaches that require training
classifiers for each tag, our system has few parameters, is easy to implement
and can deal with an open vocabulary scenario. We demonstrate the approach on
tag refinement and localization on DUT-WEBV, a large dataset of web videos, and
show state-of-the-art results. Comment: Preprint submitted to Computer Vision
and Image Understanding (CVIU).
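The core idea of transferring labels from on-the-fly selected exemplars can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes pre-computed feature vectors and uses simple similarity-weighted voting among a keyframe's nearest neighbours in a pool of tagged web images:

```python
import numpy as np

def transfer_tags(keyframe_feat, pool_feats, pool_tags, k=5):
    """Tag a keyframe by similarity-weighted voting among its k
    visually nearest neighbours in a pool of tagged web images.
    pool_tags is a list of tag sets, one per pool image."""
    # Cosine similarity between the keyframe and every pool image
    a = keyframe_feat / np.linalg.norm(keyframe_feat)
    b = pool_feats / np.linalg.norm(pool_feats, axis=1, keepdims=True)
    sims = b @ a
    # Select the k most similar images as on-the-fly training exemplars
    nn = np.argsort(-sims)[:k]
    # Accumulate similarity-weighted votes for each tag
    votes = {}
    for i in nn:
        for tag in pool_tags[i]:
            votes[tag] = votes.get(tag, 0.0) + float(sims[i])
    # Return candidate tags ranked by accumulated vote
    return sorted(votes, key=votes.get, reverse=True)
```

Because the exemplar set is retrieved per keyframe rather than fixed at training time, a scheme like this naturally handles an open tag vocabulary, which is the property the abstract highlights.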
Exquisitor: Breaking the Interaction Barrier for Exploration of 100 Million Images
In this demonstration, we present Exquisitor, a media explorer capable of learning user preferences in real-time during interactions with the 99.2 million images of YFCC100M. Exquisitor owes its efficiency to innovations in data representation, compression, and indexing. Exquisitor can complete each interaction round, including learning preferences and presenting the most relevant results, in less than 30 ms using only a single CPU core and modest RAM. In short, Exquisitor can bring large-scale interactive learning to standard desktops and laptops, and even high-end mobile devices.
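The interaction loop described above can be sketched in a toy form. Exquisitor's actual efficiency comes from its compressed representation and indexing; the sketch below assumes plain feature vectors and a simple Rocchio-style linear preference direction, purely to illustrate one round of relevance feedback:

```python
import numpy as np

def interaction_round(feats, pos_idx, neg_idx, n_results=5):
    """One round of relevance feedback: fit a linear preference
    direction from the user's positive/negative examples, then
    rank all items by their projection onto it."""
    w = feats[pos_idx].mean(axis=0) - feats[neg_idx].mean(axis=0)
    scores = feats @ w
    # Don't re-suggest items the user has already judged
    scores[pos_idx] = scores[neg_idx] = -np.inf
    # Indices of the most relevant unseen items
    return np.argsort(-scores)[:n_results]
```

In a real interactive system the scoring step is the bottleneck at 100 million items, which is why Exquisitor relies on compressed, indexed representations rather than a dense scan like this one.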
Effects of environmental colour on mood: a wearable life colour capture device
Colour is everywhere in our daily lives and affects things like our mood, yet we rarely take notice of it. One method of capturing and analysing the predominant colours we encounter is through visual lifelogging devices such as the SenseCam. However, an issue with these devices is the privacy concern of capturing image-level detail. In this work we therefore demonstrate a hardware prototype wearable camera that captures only one pixel, representing the dominant colour prevalent in front of the user, thus circumventing the privacy concerns raised in relation to lifelogging. To assess whether capturing only the dominant colour would be sufficient, we report on a simulation carried out on 1.2 million SenseCam images captured by a group of 20 individuals.
We compare the dominant colours that different groups of people are exposed to and show that useful inferences can be made from this data. We believe our prototype may be valuable in future experiments to capture colour correlates associated with an individual's mood.
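The simulation step, reducing a full SenseCam image to a single dominant colour, can be sketched as follows. The paper does not specify its exact reduction method; this assumes coarse quantization of RGB pixels and takes the most populated colour bin as the "one pixel" a wearable sensor might report:

```python
import numpy as np

def dominant_colour(image, levels=4):
    """Reduce an RGB image (H x W x 3, values 0-255) to its single
    dominant colour: coarsely quantize every pixel, then return the
    centre of the most frequent colour bin."""
    step = 256 // levels
    q = (image // step).reshape(-1, 3)
    bins, counts = np.unique(q, axis=0, return_counts=True)
    # Centre of the most populated bin, as a plain RGB tuple
    return tuple(int(c) * step + step // 2 for c in bins[counts.argmax()])
```

A reduction like this retains enough signal for group-level comparisons of colour exposure while discarding all image-level detail, which is exactly the privacy trade-off the abstract motivates.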
Exploiting multimedia in creating and analysing multimedia Web archives
The data contained on the web and the social web are inherently multimedia and consist of a mixture of textual, visual and audio modalities. Community memories embodied on the web and social web contain a rich mixture of data from these modalities. In many ways, the web is the greatest resource ever created by humankind. However, due to the dynamic and distributed nature of the web, its content changes, appears and disappears on a daily basis. Web archiving provides a way of capturing snapshots of (parts of) the web for preservation and future analysis. This paper provides an overview of techniques we have developed within the context of the EU-funded ARCOMEM (ARchiving COmmunity MEMories) project to allow multimedia web content to be leveraged during the archival process and for post-archival analysis. Through a set of use cases, we explore several practical applications of multimedia analytics within the realm of web archiving, web archive analysis and multimedia data on the web in general.
Compact Hash Codes for Efficient Visual Descriptors Retrieval in Large Scale Databases
In this paper we present an efficient method for visual descriptors retrieval
based on compact hash codes computed using a multiple k-means assignment. The
method has been applied to the problem of approximate nearest neighbor (ANN)
search of local and global visual content descriptors, and it has been tested
on different datasets: three large scale public datasets of up to one billion
descriptors (BIGANN) and, supported by recent progress in convolutional neural
networks (CNNs), also on the CIFAR-10 and MNIST datasets. Experimental results
show that, despite its simplicity, the proposed method obtains very high
performance, making it superior to more complex state-of-the-art methods.
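The multiple k-means assignment idea can be sketched minimally. This is an illustrative reading, not the paper's implementation: instead of quantizing a descriptor to its single nearest k-means centroid, the code records the indices of its m nearest centroids, and that index tuple serves as the hash bucket key for ANN search:

```python
import numpy as np
from collections import defaultdict

def multi_kmeans_code(desc, centroids, m=2):
    """Hash a descriptor to a compact code: the sorted indices of its
    m nearest k-means centroids (vs. the single nearest centroid used
    by plain vector quantization)."""
    d = np.linalg.norm(centroids - desc, axis=1)
    return tuple(sorted(np.argsort(d)[:m].tolist()))

def build_index(descs, centroids, m=2):
    """Group database descriptors into hash buckets keyed by their
    multi-assignment code; a query then probes only its own bucket."""
    index = defaultdict(list)
    for i, v in enumerate(descs):
        index[multi_kmeans_code(v, centroids, m)].append(i)
    return index
```

Assigning each descriptor to several centroids softens quantization boundaries, so two descriptors that straddle a cell border are more likely to share a bucket than under single-centroid assignment.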