Recurrent Neural Networks for Online Video Popularity Prediction
In this paper, we address the problem of popularity prediction of online
videos shared on social media. We show that this challenging task can be
approached using recently proposed deep neural network architectures. We cast
the popularity prediction problem as a classification task and we aim to solve
it using only visual cues extracted from videos. To that end, we propose a new
method based on a Long-term Recurrent Convolutional Network (LRCN) that
incorporates the sequentiality of the information in the model. Results
obtained on a dataset of over 37,000 videos published on Facebook show that
using our method leads to over 30% improvement in prediction performance over
the traditional shallow approaches and can provide valuable insights for
content creators.
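As a rough illustration of the LRCN idea described above, the sketch below pairs a per-frame CNN encoder with an LSTM and a classification head. It is a minimal sketch assuming PyTorch; the ResNet-18 backbone, hidden size, and class count are illustrative stand-ins, not the paper's exact configuration.

import torch
import torch.nn as nn
import torchvision.models as models

class LRCNClassifier(nn.Module):
    def __init__(self, hidden_size=256, num_classes=2):
        super().__init__()
        backbone = models.resnet18(weights=None)    # per-frame CNN encoder
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()                 # keep pooled 512-d features
        self.encoder = backbone
        self.lstm = nn.LSTM(feat_dim, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, frames):                      # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1))  # encode every frame
        _, (h, _) = self.lstm(feats.view(b, t, -1)) # model the frame sequence
        return self.head(h[-1])                     # popularity class logits

logits = LRCNClassifier()(torch.randn(2, 8, 3, 224, 224))   # (2, num_classes)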
What Makes a Place? Building Bespoke Place Dependent Object Detectors for Robotics
This paper is about enabling robots to improve their perceptual performance
through repeated use in their operating environment, creating local expert
detectors fitted to the places through which a robot moves. We leverage the
concept of 'experiences' in visual perception for robotics, accounting for bias
in the data a robot sees by fitting object detector models to a particular
place. The key question we seek to answer in this paper is simply: how do we
define a place? We build bespoke pedestrian detector models for autonomous
driving, highlighting the necessary trade-off between generalisation and model
capacity as we vary the extent of the place we fit to. We demonstrate a
sizeable performance gain over a current state-of-the-art detector when using
computationally lightweight bespoke place-fitted detector models.
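The place-fitting idea above implies a simple runtime component: given the robot's current position, select the detector fitted to the nearest place, falling back to a generic model outside any fitted extent. The sketch below is a hypothetical illustration of that lookup, not the paper's system; the coordinate keys, radius, and model names are assumptions.

import math

class PlaceDetectorBank:
    def __init__(self, fallback_detector, radius=50.0):
        self.models = []            # list of ((x, y), detector) pairs
        self.fallback = fallback_detector
        self.radius = radius        # extent of a "place", in metres

    def add_place(self, xy, detector):
        self.models.append((xy, detector))

    def select(self, xy):
        """Return the detector fitted nearest to xy, else the generic one."""
        best, best_d = self.fallback, self.radius
        for (px, py), det in self.models:
            d = math.hypot(xy[0] - px, xy[1] - py)
            if d < best_d:
                best, best_d = det, d
        return best

bank = PlaceDetectorBank(fallback_detector="generic_pedestrian_model")
bank.add_place((12.0, 3.5), "crossing_A_model")
print(bank.select((11.0, 4.0)))     # -> "crossing_A_model"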
Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval
This paper presents a new state-of-the-art for document image classification
and retrieval, using features learned by deep convolutional neural networks
(CNNs). In object and scene analysis, deep neural nets are capable of learning
a hierarchical chain of abstraction from pixel inputs to concise and
descriptive representations. The current work explores this capacity in the
realm of document analysis, and confirms that this representation strategy is
superior to a variety of popular hand-crafted alternatives. Experiments also
show that (i) features extracted from CNNs are robust to compression, (ii) CNNs
trained on non-document images transfer well to document analysis tasks, and
(iii) enforcing region-specific feature-learning is unnecessary given
sufficient training data. This work also makes available a new labelled subset
of the IIT-CDIP collection, containing 400,000 document images across 16
categories, useful for training new CNNs for document analysis.
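A minimal sketch of the transfer recipe the experiments support, namely reusing a CNN trained on non-document images as a fixed feature extractor for document classification: it assumes torchvision and scikit-learn, and the ResNet-18 backbone stands in for whichever network the paper actually evaluates.

import torch
import torchvision.models as models
from torchvision.models import ResNet18_Weights
from sklearn.linear_model import LogisticRegression

weights = ResNet18_Weights.DEFAULT          # pretrained on natural images
backbone = models.resnet18(weights=weights)
backbone.fc = torch.nn.Identity()           # expose pooled 512-d features
backbone.eval()
preprocess = weights.transforms()

def features(pil_images):
    batch = torch.stack([preprocess(im) for im in pil_images])
    with torch.no_grad():
        return backbone(batch).numpy()

# With a labelled corpus (e.g. the 400,000-image IIT-CDIP subset released
# with the paper), a linear classifier on these features would look like:
# clf = LogisticRegression(max_iter=1000).fit(features(train_imgs), labels)
# predictions = clf.predict(features(test_imgs))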
Orientation covariant aggregation of local descriptors with embeddings
Image search systems based on local descriptors typically achieve orientation
invariance by aligning the patches on their dominant orientations. Albeit
successful, this choice introduces too much invariance because it does not
guarantee that the patches are rotated consistently. This paper introduces an
aggregation strategy for local descriptors that achieves this covariance
property by jointly encoding the angle in the aggregation stage in a continuous
manner. It is combined with an efficient monomial embedding to provide a
codebook-free method to aggregate local descriptors into a single vector
representation. Our strategy is also compatible with, and employed alongside,
several popular encoding methods, in particular bag-of-words, VLAD and the
Fisher vector. Our geometry-aware aggregation strategy is effective for image search,
as shown by experiments performed on standard benchmarks for image and
particular object retrieval, namely Holidays and Oxford buildings.
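The continuous angle encoding can be sketched with Fourier-style features: each local descriptor is modulated by an embedding of its dominant orientation before sum-pooling, so consistently rotated patches reinforce each other. The NumPy sketch below uses a small Fourier embedding and a Kronecker product as stand-ins for the paper's monomial embedding; all dimensions are illustrative.

import numpy as np

def angle_embedding(theta, n_freq=3):
    """Continuous 2*n_freq-d embedding of an orientation angle."""
    k = np.arange(1, n_freq + 1)
    return np.concatenate([np.cos(k * theta), np.sin(k * theta)])

def aggregate(descriptors, angles, n_freq=3):
    """Sum of descriptor (x) angle-embedding Kronecker products."""
    out = np.zeros(descriptors.shape[1] * 2 * n_freq)
    for d, theta in zip(descriptors, angles):
        out += np.kron(d, angle_embedding(theta, n_freq))
    return out / (np.linalg.norm(out) + 1e-12)

descs = np.random.randn(100, 64)               # 100 local descriptors, 64-d
angles = np.random.uniform(0, 2 * np.pi, 100)  # their dominant orientations
v = aggregate(descs, angles)                   # single 384-d image vector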
Leveraging Image based Prior for Visual Place Recognition
In this study, we propose a novel scene descriptor for visual place
recognition. Unlike popular bag-of-words scene descriptors which rely on a
library of vector quantized visual features, our proposed descriptor is based
on a library of raw image data, such as publicly available photo collections
from Google StreetView and Flickr. The library images need not be associated
with spatial information regarding the viewpoint and orientation of the scene.
As a result, these images are cheaper to obtain than the database images and
are readily available. Our proposed descriptor directly mines the image
library to discover landmarks (i.e., image patches) that suitably match an
input query/database image. The discovered landmarks are then compactly
described by their pose and shape (i.e., library image ID, bounding boxes) and
used as a compact discriminative scene descriptor for the input image. We
evaluate the effectiveness of our scene description framework by comparing its
performance to that of previous approaches.
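The mining step can be caricatured as nearest-neighbour matching of query patches against library patch features, keeping only the matched library image IDs and bounding boxes as the descriptor. The sketch below is a toy stand-in for the paper's landmark-discovery pipeline; the feature dimension and the dot-product matching rule are assumptions.

import numpy as np

def describe(query_feats, library):
    """library: (image_id, bbox, feature) triples; returns matched landmarks."""
    landmarks = []
    for qf in query_feats:                     # one feature per query patch
        sims = [float(qf @ f) for (_, _, f) in library]
        image_id, bbox, _ = library[int(np.argmax(sims))]
        landmarks.append((image_id, bbox))     # pose and shape of the match
    return landmarks

rng = np.random.default_rng(0)
library = [(i, (0, 0, 32, 32), rng.standard_normal(128)) for i in range(50)]
query_feats = [rng.standard_normal(128) for _ in range(5)]
print(describe(query_feats, library))          # e.g. [(17, (0, 0, 32, 32)), ...]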