54 research outputs found
Deep Architectures and Ensembles for Semantic Video Classification
This work addresses the problem of accurate semantic labelling of short
videos. To this end, a multitude of different deep nets are evaluated, ranging
from traditional recurrent neural networks (LSTM, GRU) and temporal agnostic
networks (FV, VLAD, BoW) to fully connected neural networks with mid-stage AV
fusion, and others.
Additionally, we also propose a residual architecture-based DNN for video
classification, with state-of-the-art classification performance at
significantly reduced complexity. Furthermore, we propose four new approaches
to diversity-driven multi-net ensembling, one based on fast correlation measure
and three incorporating a DNN-based combiner. We show that significant
performance gains can be achieved by ensembling diverse nets and we investigate
factors contributing to high diversity. Based on the extensive YouTube-8M
dataset, we provide an in-depth evaluation and analysis of their behaviour. We
show that the performance of the ensemble is state-of-the-art, achieving the
highest accuracy on the YouTube-8M Kaggle test data. The performance of the
ensemble of classifiers was also evaluated on the HMDB51 and UCF101 datasets,
and we show that the resulting method achieves accuracy comparable to
state-of-the-art methods using similar input features.
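The abstract does not spell out the fast correlation measure, so the following is only a minimal sketch of one way diversity-driven ensembling could work: seed the ensemble with the most accurate net, then greedily add the net whose predictions correlate least with those already chosen, and average the selected scores. The function names, the greedy rule, the toy scores and the accuracies are all illustrative assumptions, not the authors' method.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length score lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def select_diverse(preds, accuracies, k):
    """Greedy selection (assumed rule): start from the most accurate model,
    then add the model whose highest correlation with the chosen set is lowest."""
    names = sorted(preds)
    chosen = [max(names, key=lambda m: accuracies[m])]
    while len(chosen) < min(k, len(names)):
        rest = [m for m in names if m not in chosen]
        nxt = min(rest, key=lambda m: max(pearson(preds[m], preds[c]) for c in chosen))
        chosen.append(nxt)
    return chosen

def average_ensemble(preds, chosen):
    """Fuse the chosen models by averaging their per-sample scores."""
    n = len(preds[chosen[0]])
    return [sum(preds[m][i] for m in chosen) / len(chosen) for i in range(n)]

# Hypothetical per-sample scores for one class from three nets.
preds = {
    "lstm": [0.9, 0.1, 0.8, 0.3],
    "gru": [0.88, 0.12, 0.79, 0.31],   # nearly redundant with the LSTM
    "vlad": [0.6, 0.4, 0.9, 0.1],      # less correlated, hence more diverse
}
accuracies = {"lstm": 0.82, "gru": 0.81, "vlad": 0.78}

chosen = select_diverse(preds, accuracies, k=2)
fused = average_ensemble(preds, chosen)
```

In this toy case the GRU is skipped despite its higher accuracy, because it adds almost no information beyond the LSTM.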
REMAP: Multi-layer entropy-guided pooling of dense CNN features for image retrieval
This paper addresses the problem of very large-scale image retrieval,
focusing on improving its accuracy and robustness. We target enhanced
robustness of search to factors such as variations in illumination, object
appearance and scale, partial occlusions, and cluttered backgrounds -
particularly important when search is performed across very large datasets with
significant variability. We propose a novel CNN-based global descriptor, called
REMAP, which learns and aggregates a hierarchy of deep features from multiple
CNN layers, and is trained end-to-end with a triplet loss. REMAP explicitly
learns discriminative features which are mutually-supportive and complementary
at various semantic levels of visual abstraction. These dense local features
are max-pooled spatially at each layer, within multi-scale overlapping regions,
before aggregation into a single image-level descriptor. To identify the
semantically useful regions and layers for retrieval, we propose to measure the
information gain of each region and layer using KL-divergence. Our system
effectively learns during training how useful various regions and layers are
and weights them accordingly. We show that such relative entropy-guided
aggregation outperforms classical CNN-based aggregation controlled by SGD. The
entire framework is trained in an end-to-end fashion, outperforming the latest
state-of-the-art results. On the image retrieval datasets Holidays, Oxford and
MPEG, the REMAP descriptor achieves mAP of 95.5%, 91.5%, and 80.1%
respectively, outperforming any results published to date. REMAP also formed
the core of the winning submission to the Google Landmark Retrieval Challenge
on Kaggle.
Comment: Submitted to IEEE Trans. Image Processing on 24 May 2018, published
22 May 201
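To illustrate how KL-divergence can score the usefulness of a region, here is a rough offline sketch. It assumes that the information gain of a region is measured as the relative entropy between the histogram of descriptor distances for matching image pairs and that for non-matching pairs. That reading, the bin settings and all names are assumptions made for illustration. REMAP itself learns its weights end-to-end during training, which this sketch does not attempt.

```python
import math

def histogram(values, bins, lo, hi, eps=1e-6):
    """Normalised histogram with a small floor so that no bin is exactly zero."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        counts[max(0, min(idx, bins - 1))] += 1
    probs = [c / len(values) + eps for c in counts]
    total = sum(probs)
    return [p / total for p in probs]

def kl_divergence(p, q):
    """Relative entropy D(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def region_weights(match_dists, nonmatch_dists, bins=4, lo=0.0, hi=2.0):
    """One weight per region, proportional to the KL-divergence between its
    matching and non-matching distance distributions."""
    gains = [
        kl_divergence(histogram(m, bins, lo, hi), histogram(n, bins, lo, hi))
        for m, n in zip(match_dists, nonmatch_dists)
    ]
    total = sum(gains)
    if total == 0:
        return [1.0 / len(gains)] * len(gains)
    return [g / total for g in gains]

# Hypothetical distances for two regions.
match_dists = [
    [0.1, 0.2, 0.3, 0.2],   # region A: matching pairs are close
    [0.8, 1.1, 0.9, 1.2],   # region B: matching pairs are not especially close
]
nonmatch_dists = [
    [1.5, 1.7, 1.6, 1.8],   # region A: non-matching pairs are far apart
    [0.9, 1.2, 1.0, 1.1],   # region B: looks much like its matching pairs
]
weights = region_weights(match_dists, nonmatch_dists)
```

Region A separates matching from non-matching pairs far better than region B, so it receives the larger weight.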
Can DMD obtain a scene background in color?
A background model describes a scene without any foreground objects and has a number of applications, ranging from video surveillance to computational photography. Recent studies have introduced the method of Dynamic Mode Decomposition (DMD) for robustly separating video frames into a background model and foreground components. Whereas that method operates on color images converted to grayscale, in this study we propose a technique to obtain the background model in the color domain. The effectiveness of our technique is demonstrated using a publicly available Scene Background Initialisation (SBI) dataset. Our results show, both qualitatively and quantitatively, that DMD can successfully obtain a colored background model.
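As a rough illustration of the idea, the sketch below runs a standard exact DMD on each color channel independently and keeps the mode whose eigenvalue is closest to 1, which corresponds to content that does not change over time. Treating the channels independently is an assumption made here for simplicity, and the paper's actual technique may differ. The function names and the synthetic video are hypothetical.

```python
import numpy as np

def dmd_background(frames, rank=None, tol=1e-8):
    """Estimate a static background from one channel, frames of shape (T, H, W)."""
    T = frames.shape[0]
    X = frames.reshape(T, -1).T.astype(float)        # pixels x time
    X1, X2 = X[:, :-1], X[:, 1:]                     # consecutive snapshot pairs
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    # Truncate to the numerical rank unless a rank is given explicitly.
    r = int(np.sum(s > tol * s[0])) if rank is None else rank
    U, s, Vh = U[:, :r], s[:r], Vh[:r]
    B = (X2 @ Vh.conj().T) / s                       # X2 V S^-1
    A_tilde = U.conj().T @ B                         # reduced one-step operator
    eigvals, W = np.linalg.eig(A_tilde)
    modes = (B @ W).astype(complex)                  # exact DMD modes
    amps = np.linalg.lstsq(modes, X[:, 0].astype(complex), rcond=None)[0]
    k = int(np.argmin(np.abs(eigvals - 1.0)))        # eigenvalue nearest 1
    background = (modes[:, k] * amps[k]).real
    return background.reshape(frames.shape[1:])

def dmd_background_color(video, rank=None):
    """Assumed per-channel scheme: video of shape (T, H, W, C) -> (H, W, C)."""
    channels = [dmd_background(video[..., c], rank) for c in range(video.shape[-1])]
    return np.stack(channels, axis=-1)

# Synthetic check: a fixed background plus a transient that halves every frame.
rng = np.random.default_rng(0)
true_bg = rng.random((4, 4, 3))
transient = rng.random((4, 4, 3))
video = np.stack([true_bg + 0.5 ** t * transient for t in range(6)])
estimate = dmd_background_color(video)
```

On this synthetic sequence the dynamics are exactly linear, so the estimate matches the true background to numerical precision. Real footage would need a suitable choice of `rank` and will not separate as cleanly.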
On aggregation of local binary descriptors
This paper addresses the problem of aggregating local binary descriptors for large-scale image retrieval in mobile scenarios. Binary descriptors are becoming increasingly popular, especially in mobile applications, as they deliver high matching speed, have a small memory footprint and are fast to extract. However, little research has been done on how to efficiently aggregate binary descriptors. Direct application of methods developed for conventional descriptors, such as SIFT, results in unsatisfactory performance. In this paper we introduce and evaluate several algorithms to compress high-dimensional binary local descriptors for efficient retrieval in large databases. In addition, we propose a robust global image representation, the Binary Robust Visual Descriptor (B-RVD), with rank-based multi-assignment of local descriptors and direction-based aggregation, achieved by the use of the L1-norm on residual vectors. The performance of the B-RVD is further improved by balancing the variances of residual vector directions in order to maximize the discriminatory power of the aggregated vectors. Evaluation on standard datasets with standard measures shows a significant improvement of around 4% in mean Average Precision over the state of the art.
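The aggregation step described above can be sketched as follows, under several assumptions that the abstract does not confirm: descriptors and centroids are 0/1 vectors compared by Hamming distance, each descriptor is assigned to its nearest centroids with placeholder rank weights, and the final vector is L2-normalised. The variance balancing and compression steps are omitted, and `brvd_aggregate` and its parameters are hypothetical names.

```python
import numpy as np

def brvd_aggregate(descs, centroids, rank_weights=(1.0, 0.5, 0.25)):
    """Aggregate binary local descriptors (N, D) against centroids (C, D).

    Each descriptor contributes its L1-normalised residual to its nearest
    centroids, scaled by a weight that depends on the neighbour rank.
    Returns a flattened vector of length C * D.
    """
    descs = np.asarray(descs, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    out = np.zeros(centroids.shape)
    # Hamming distance between every descriptor and every centroid.
    dist = np.abs(descs[:, None, :] - centroids[None, :, :]).sum(axis=2)
    nearest = np.argsort(dist, axis=1, kind="stable")[:, :len(rank_weights)]
    for i, x in enumerate(descs):
        for rank, c in enumerate(nearest[i]):
            residual = x - centroids[c]
            l1 = np.abs(residual).sum()
            if l1 > 0:
                # L1 normalisation keeps the direction and drops the magnitude.
                out[c] += rank_weights[rank] * residual / l1
    v = out.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v   # final L2 normalisation (assumed)

# Toy example: two 4-bit centroids and two descriptors.
centroids = [[0, 0, 0, 0], [1, 1, 1, 1]]
descs = [[0, 0, 0, 1], [1, 1, 1, 0]]
v = brvd_aggregate(descs, centroids, rank_weights=(1.0, 0.5))
```

Because every residual is L1-normalised before weighting, a descriptor far from a centroid contributes no more than a nearby one, so the aggregate reflects residual directions rather than magnitudes.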
Single-cell Subcellular Protein Localisation Using Novel Ensembles of Diverse Deep Architectures
Unravelling protein distributions within individual cells is key to
understanding their function and state and indispensable to developing new
treatments. Here we present the Hybrid subCellular Protein Localiser (HCPL),
which learns from weakly labelled data to robustly localise single-cell
subcellular protein patterns. It comprises innovative DNN architectures
exploiting wavelet filters and learnt parametric activations that successfully
tackle drastic cell variability. HCPL features correlation-based ensembling of
novel architectures that boosts performance and aids generalisation.
Large-scale data annotation is made feasible by our "AI-trains-AI" approach,
which determines the visual integrity of cells and emphasises reliable labels
for efficient training. In the Human Protein Atlas context, we demonstrate that
HCPL defines the state of the art in the single-cell classification of protein
localisation patterns. To better understand the inner workings of HCPL and
assess its biological relevance, we analyse the contributions of each system
component and dissect the emergent features from which the localisation
predictions are derived.
- …