ModDrop: adaptive multi-modal gesture recognition
We present a method for gesture detection and localisation based on
multi-scale and multi-modal deep learning. Each visual modality captures
spatial information at a particular spatial scale (such as motion of the upper
body or a hand), and the whole system operates at three temporal scales. Key to
our technique is a training strategy which exploits: i) careful initialization
of individual modalities; and ii) gradual fusion involving random dropping of
separate channels (dubbed ModDrop) for learning cross-modality correlations
while preserving uniqueness of each modality-specific representation. We
present experiments on the ChaLearn 2014 Looking at People Challenge gesture
recognition track, in which we placed first out of 17 teams. Fusing multiple
modalities at several spatial and temporal scales leads to a significant
increase in recognition rates, allowing the model to compensate for errors of
the individual classifiers as well as noise in the separate channels.
Furthermore, the proposed ModDrop training technique makes the classifier
robust to missing signals in one or several channels, so that it produces
meaningful predictions from any number of available modalities. In addition, we
demonstrate the applicability of the proposed fusion scheme to modalities of
arbitrary nature by experiments on the same dataset augmented with audio.
Comment: 14 pages, 7 figure
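The random dropping of whole channels described in the ModDrop abstract can be illustrated with a minimal sketch. The function name `moddrop`, the drop probability, the modality names, and the choice to replace a dropped channel with zeros are all assumptions made for illustration; the abstract does not specify these details.

```python
import random

def moddrop(modalities, drop_prob=0.5, rng=None):
    """Return a copy of the input in which each modality is independently
    replaced by zeros with probability drop_prob (hypothetical sketch)."""
    rng = rng or random.Random()
    dropped = {}
    for name, features in modalities.items():
        if rng.random() < drop_prob:
            # Drop the whole channel, not individual units.
            dropped[name] = [0.0] * len(features)
        else:
            dropped[name] = list(features)
    return dropped

# One training sample with three hypothetical modalities.
sample = {"depth": [0.2, 0.7], "rgb": [0.9, 0.1, 0.4], "audio": [0.5]}
augmented = moddrop(sample, drop_prob=0.5, rng=random.Random(0))
```

Applied to every training sample during the fusion stage, a step like this would expose the fused layers to inputs with missing channels, which is the situation the abstract says the classifier should tolerate at test time.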
Classification under Streaming Emerging New Classes: A Solution using Completely Random Trees
This paper investigates an important problem in stream mining, i.e.,
classification under streaming emerging new classes or SENC. The common
approach is to treat it as a classification problem and solve it using either a
supervised learner or a semi-supervised learner. We propose an alternative
approach by using unsupervised learning as the basis to solve this problem. The
SENC problem can be decomposed into three sub problems: detecting emerging new
classes, classifying for known classes, and updating models to enable
classification of instances of the new class and detection of more emerging new
classes. The proposed method employs completely random trees which have been
shown to work well in unsupervised learning and supervised learning
independently in the literature. This is the first time, as far as we know,
that completely random trees are used as a single common core to solve all
three sub problems: unsupervised learning, supervised learning and model update
in data streams. We show that the proposed unsupervised-learning-focused method
often achieves significantly better outcomes than existing
classification-focused methods.
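As a rough illustration of how completely random trees can flag instances that do not resemble the known classes, the sketch below grows trees with random features and random split values and uses the mean path length as a score. This is an isolation-style stand-in chosen for illustration, not the method of the paper; the names `build_tree`, `path_length`, and `mean_path_length`, the depth limit, and the toy data are assumptions.

```python
import random

def build_tree(points, rng, depth=0, max_depth=10):
    """Grow one completely random tree: random feature, random split value."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split on a constant feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, rng, depth + 1, max_depth),
            "right": build_tree(right, rng, depth + 1, max_depth)}

def path_length(tree, x):
    """Number of internal nodes traversed before x reaches a leaf."""
    depth = 0
    while "dim" in tree:
        tree = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
        depth += 1
    return depth

def mean_path_length(forest, x):
    return sum(path_length(t, x) for t in forest) / len(forest)

# Known-class data: an 8 x 8 grid of 2-D points (toy example).
known = [(float(x), float(y)) for x in range(8) for y in range(8)]
rng = random.Random(0)
forest = [build_tree(known, rng) for _ in range(100)]

known_score = mean_path_length(forest, (3.0, 3.0))      # inside the known region
novel_score = mean_path_length(forest, (100.0, 100.0))  # far from the known region
```

An instance far from the training data tends to reach a leaf after fewer splits, so a short mean path length can serve as a signal for a candidate new class. How the paper thresholds this signal, classifies known classes, and updates the trees is not described in the abstract.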
Multimodal Classification of Urban Micro-Events
In this paper we seek methods to effectively detect urban micro-events. Urban
micro-events are events which occur in cities, have limited geographical
coverage and typically affect only a small group of citizens. Because of their
scale these are difficult to identify in most data sources. However, by using
citizen sensing to gather data, detecting them becomes feasible. The data
gathered by citizen sensing is often multimodal and, as a consequence, the
information required to detect urban micro-events is distributed over multiple
modalities. This makes it essential to have a classifier capable of combining
them. In this paper we explore several methods of creating such a classifier,
including early, late, and hybrid fusion, as well as representation learning using
multimodal graphs. We evaluate performance on a real world dataset obtained
from a live citizen reporting system. We show that a multimodal approach yields
higher performance than unimodal alternatives. Furthermore, we demonstrate that
our hybrid combination of early and late fusion with multimodal embeddings
performs best in classification of urban micro-events.
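The difference between early and late fusion mentioned in the abstract can be shown in a few lines. This is a generic sketch, not the paper's classifier: the function names, the equal default weights, and the example feature and probability values are assumptions.

```python
def early_fusion(text_features, image_features):
    """Early fusion: concatenate modality features into one vector
    that a single classifier would consume."""
    return list(text_features) + list(image_features)

def late_fusion(per_modality_probs, weights=None):
    """Late fusion: combine class probabilities produced by separate
    per-modality classifiers using a weighted average."""
    n = len(per_modality_probs)
    weights = weights or [1.0 / n] * n
    n_classes = len(per_modality_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, per_modality_probs))
        for c in range(n_classes)
    ]

# Hypothetical inputs for a single report with text and image modalities.
joint_vector = early_fusion([1.0, 2.0], [3.0])
fused_probs = late_fusion([[0.8, 0.2], [0.4, 0.6]])
```

A hybrid scheme would combine both ideas, for example by treating the output of a classifier trained on `joint_vector` as one more entry in `per_modality_probs`; the abstract does not say how the authors' hybrid is constructed.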
Learning Deep Belief Networks from Non-Stationary Streams
Deep learning has proven to be beneficial for complex tasks such as classifying images. However, this approach has been mostly applied to static datasets. The analysis of non-stationary (e.g., concept drift) streams of data involves specific issues connected with the temporal and changing nature of the data. In this paper, we propose a proof-of-concept method, called Adaptive Deep Belief Networks, of how deep learning can be generalized to learn online from changing streams of data. We do so by exploiting the generative properties of the model to incrementally re-train the Deep Belief Network whenever new data are collected. This approach eliminates the need to store past observations and, therefore, requires only constant memory consumption. Hence, our approach can be valuable for life-long learning from non-stationary data streams. © 2012 Springer-Verlag
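The idea of re-training from generated samples instead of stored observations can be sketched with a toy model. The code below replaces the Deep Belief Network with a one-dimensional Gaussian purely for illustration; `ToyGenerativeModel`, `adapt`, the replay size, and the drifting stream are all assumptions and not part of the paper.

```python
import random
import statistics

class ToyGenerativeModel:
    """Stand-in for a generative model: a 1-D Gaussian (not a DBN)."""
    def __init__(self, mean=0.0, std=1.0):
        self.mean, self.std = mean, std

    def sample(self, n, rng):
        return [rng.gauss(self.mean, self.std) for _ in range(n)]

    def fit(self, data):
        self.mean = statistics.fmean(data)
        self.std = statistics.pstdev(data)

def adapt(model, new_batch, n_replay, rng):
    """Re-train on generated samples plus the new batch, so that no past
    observations need to be stored."""
    replay = model.sample(n_replay, rng)
    model.fit(replay + list(new_batch))
    return model

rng = random.Random(0)
model = ToyGenerativeModel(mean=0.0, std=1.0)
# The stream has drifted: new batches are centred on 5.0 instead of 0.0.
for _ in range(20):
    batch = [rng.gauss(5.0, 1.0) for _ in range(50)]
    adapt(model, batch, n_replay=50, rng=rng)
```

Only the model parameters persist between batches, which is why memory use stays constant in this sketch; the ratio of replayed to new samples controls how quickly the model follows the drift.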
Dynamic Adaptation on Non-Stationary Visual Domains
Domain adaptation aims to learn models on a supervised source domain that
perform well on an unsupervised target. Prior work has examined domain
adaptation in the context of stationary domain shifts, i.e. static data sets.
However, with large-scale or dynamic data sources, data from a defined domain
is not usually available all at once. For instance, in a streaming data
scenario, dataset statistics effectively become a function of time. We
introduce a framework for adaptation over non-stationary distribution shifts
applicable to large-scale and streaming data scenarios. The model is adapted
sequentially over incoming unsupervised streaming data batches. This enables
improvements over several batches without the need for any additionally
annotated data. To demonstrate the effectiveness of our proposed framework, we
modify associative domain adaptation to work well on source and target data
batches with unequal class distributions. We apply our method to several
adaptation benchmark datasets for classification and show improved classifier
accuracy not only for the currently adapted batch, but also when applied on
future stream batches. Furthermore, we show the applicability of our
associative learning modifications to semantic segmentation, where we achieve
competitive results.
Video Stream Retrieval of Unseen Queries using Semantic Memory
Retrieval of live, user-broadcast video streams is an under-addressed and
increasingly relevant challenge. The on-line nature of the problem requires
temporal evaluation and the unforeseeable scope of potential queries motivates
an approach which can accommodate arbitrary search queries. To account for the
breadth of possible queries, we adopt a no-example approach to query retrieval,
which uses a query's semantic relatedness to pre-trained concept classifiers.
To adapt to shifting video content, we propose memory pooling and memory
welling methods that favor recent information over long past content. We
identify two stream retrieval tasks, instantaneous retrieval at any particular
time and continuous retrieval over a prolonged duration, and propose means for
evaluating them. Three large scale video datasets are adapted to the challenge
of stream retrieval. We report results for our search methods on the new stream
retrieval tasks, as well as demonstrate their efficacy in a traditional,
non-streaming video task.
Comment: Presented at BMVC 2016, British Machine Vision Conference, 201
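One way to favor recent stream content over older content, in the spirit of the pooling described in the abstract, is to pool concept scores over a short sliding window or to apply an exponentially decayed average. Both are generic stand-ins; the abstract does not give the definitions of memory pooling or memory welling, and the names `WindowMaxPool` and `decayed_average`, the window size, and the decay value are assumptions.

```python
from collections import deque

class WindowMaxPool:
    """Max-pool per-concept scores over the most recent `window` frames."""
    def __init__(self, window):
        self.buffer = deque(maxlen=window)  # old frames fall out automatically

    def update(self, scores):
        self.buffer.append(list(scores))
        return [max(column) for column in zip(*self.buffer)]

def decayed_average(previous, scores, decay=0.9):
    """Exponentially decayed running average: older content fades gradually."""
    return [decay * p + (1.0 - decay) * s for p, s in zip(previous, scores)]

# Hypothetical scores for two concepts over three consecutive frames.
pool = WindowMaxPool(window=2)
first = pool.update([1.0, 0.0])
second = pool.update([0.0, 2.0])
third = pool.update([0.0, 0.0])  # the first frame has left the window
```

The pooled vector at each time step could then be compared with a query's concept representation to rank streams at that moment.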