Zero-Shot Deep Domain Adaptation
Domain adaptation is an important tool for transferring knowledge about a task
(e.g. classification) learned in a source domain to a second, or target, domain.
Current approaches assume that task-relevant target-domain data is available
during training. We demonstrate how to perform domain adaptation when no such
task-relevant target-domain data is available. To tackle this issue, we propose
zero-shot deep domain adaptation (ZDDA), which uses privileged information from
task-irrelevant dual-domain pairs. ZDDA learns a source-domain representation
which is not only tailored for the task of interest but also close to the
target-domain representation. Therefore, the solution to the task of interest
(e.g. a classifier for classification tasks), trained jointly with the
source-domain representation, is applicable to both the source and
target representations. Using the MNIST, Fashion-MNIST, NIST, EMNIST, and SUN
RGB-D datasets, we show that ZDDA can perform domain adaptation in
classification tasks without access to task-relevant target-domain training
data. We also extend ZDDA to perform sensor fusion in the SUN RGB-D scene
classification task by simulating task-relevant target-domain representations
with task-relevant source-domain data. To the best of our knowledge, ZDDA is
the first domain adaptation and sensor fusion method which requires no
task-relevant target-domain data. The underlying principle is not particular to
computer vision data, but should be extensible to other domains.
Comment: This paper is accepted to the European Conference on Computer Vision (ECCV), 2018
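To make the idea concrete, here is a minimal sketch of the kind of training ZDDA describes: a source encoder is optimized jointly for the task loss and for matching a fixed target-domain encoder on task-irrelevant dual-domain pairs. All network shapes, names, and the choice of MSE as the alignment loss are illustrative assumptions, not the authors' exact design.

```python
# Hypothetical sketch of ZDDA-style training (illustrative, not the paper's
# exact architecture): the source encoder learns a representation that both
# solves the task and stays close to a fixed target-domain representation.
import torch
import torch.nn as nn

src_encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU())
tgt_encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU())
classifier = nn.Linear(128, 10)  # trained jointly with the source representation

for p in tgt_encoder.parameters():  # target encoder kept fixed in this sketch
    p.requires_grad = False

params = list(src_encoder.parameters()) + list(classifier.parameters())
opt = torch.optim.Adam(params, lr=1e-3)
task_loss_fn, align_loss_fn = nn.CrossEntropyLoss(), nn.MSELoss()

def train_step(x_src, y_src, x_pair_src, x_pair_tgt):
    """x_src, y_src: task-relevant source data.
    x_pair_src, x_pair_tgt: task-IRRELEVANT dual-domain pairs
    (e.g. two modalities of the same task-irrelevant scenes)."""
    opt.zero_grad()
    task_loss = task_loss_fn(classifier(src_encoder(x_src)), y_src)
    align_loss = align_loss_fn(src_encoder(x_pair_src), tgt_encoder(x_pair_tgt))
    loss = task_loss + align_loss
    loss.backward()
    opt.step()
    return loss.item()

# Dummy call; at test time the jointly trained classifier can be applied
# to the target representation: classifier(tgt_encoder(x_tgt)).
train_step(torch.randn(8, 1, 28, 28), torch.randint(0, 10, (8,)),
           torch.randn(8, 1, 28, 28), torch.randn(8, 1, 28, 28))
```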
FusionVAE: A Deep Hierarchical Variational Autoencoder for RGB Image Fusion
Sensor fusion can significantly improve the performance of many computer
vision tasks. However, traditional fusion approaches either are not data-driven,
and thus can neither exploit prior knowledge nor find regularities in a given
dataset, or they are restricted to a single application. We overcome this shortcoming by
presenting a novel deep hierarchical variational autoencoder called FusionVAE
that can serve as a basis for many fusion tasks. Our approach is able to
generate diverse image samples that are conditioned on multiple noisy,
occluded, or only partially visible input images. We derive and optimize a
variational lower bound for the conditional log-likelihood of FusionVAE. In
order to assess the fusion capabilities of our model thoroughly, we created
three novel datasets for image fusion based on popular computer vision
datasets. In our experiments, we show that FusionVAE learns a representation of
aggregated information that is relevant to fusion tasks. The results
demonstrate that our approach outperforms traditional methods significantly.
Furthermore, we present the advantages and disadvantages of different design
choices.
Comment: Accepted at ECCV 2022
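As a pointer to what a "variational lower bound for the conditional log-likelihood" looks like in practice, below is a minimal single-latent-level sketch; FusionVAE itself uses a hierarchy of latents, and the Gaussian likelihood and all tensor names here are assumptions for illustration.

```python
# Sketch of a conditional ELBO of the kind FusionVAE optimizes, with a single
# latent level. Assumes a Gaussian posterior q(z|y,x), a conditional Gaussian
# prior p(z|x), and a Gaussian decoder for the fused output image y.
import torch
import torch.nn.functional as F

def conditional_elbo(y_recon, y_target, mu_q, logvar_q, mu_p, logvar_p):
    """Lower bound on log p(y | x): E_q[log p(y|z,x)] - KL(q(z|y,x) || p(z|x))."""
    # Gaussian reconstruction log-likelihood (up to an additive constant)
    recon = -F.mse_loss(y_recon, y_target, reduction="sum")
    # Closed-form KL divergence between two diagonal Gaussians
    kl = 0.5 * torch.sum(
        logvar_p - logvar_q
        + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
        - 1.0
    )
    return recon - kl  # maximize this bound (minimize its negative)
```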
Uncertainty in Ontologies: Dempster-Shafer Theory for Data Fusion Applications
Ontologies are of growing interest in Data Fusion applications: they are seen
as a semantic tool for describing and reasoning about sensor data, objects,
relations, and general domain theories. In addition, uncertainty is perhaps one
of the most important characteristics of the data and information handled by
Data Fusion. However, by their fundamental nature, ontologies describe only
asserted and veracious facts about the world. Various probabilistic, fuzzy, and
evidential approaches already exist to fill this gap; this paper recaps the
most popular tools. However, none of them exactly meets our purposes.
Therefore, we constructed a Dempster-Shafer ontology that can be imported into
any specific domain ontology and that enables us to instantiate it in an
uncertain manner. We also developed a Java application that enables reasoning
about these uncertain ontological instances.
Comment: Workshop on Theory of Belief Functions, Brest, France (2010)
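For readers unfamiliar with the evidential formalism, the core operation such a Dempster-Shafer ontology must support is Dempster's rule of combination. The sketch below (in Python rather than the paper's Java, with an invented two-class frame) shows the rule itself.

```python
# Dempster's rule of combination over a frame of discernment. A mass function
# maps subsets of the frame (here frozensets) to belief mass summing to 1.
from itertools import product

def combine(m1, m2):
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources are incompatible")
    # Normalize away the conflicting mass
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

# Illustrative example: two sensors reporting on the frame {car, truck}
m1 = {frozenset({"car"}): 0.7, frozenset({"car", "truck"}): 0.3}
m2 = {frozenset({"car"}): 0.6, frozenset({"truck"}): 0.4}
print(combine(m1, m2))  # {'car'}: ~0.83, {'truck'}: ~0.17
```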
Deep HMResNet Model for Human Activity-Aware Robotic Systems
Endowing robotic systems with cognitive capabilities for recognizing the
daily activities of humans is an important challenge that requires
sophisticated and novel approaches. Most of the proposed approaches explore
pattern recognition techniques which are generally based on hand-crafted
features or learned features. In this paper, a novel Hierarchical Multichannel
Deep Residual Network (HMResNet) model is proposed for robotic systems to
recognize daily human activities in ambient environments. The introduced
model comprises multilevel fusion layers. The proposed Multichannel 1D
Deep Residual Network model is, at the feature level, combined with a
Bottleneck MLP neural network to automatically extract robust features
regardless of the hardware configuration and, at the decision level, is fully
connected with an MLP neural network to recognize daily human activities.
Empirical experiments on real-world datasets and an online demonstration are
used to validate the proposed model. The results demonstrate that the proposed
model outperforms the baseline models in daily human activity recognition.
Comment: Presented at AI-HRI AAAI-FSS, 2018 (arXiv:1809.06606)
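The two fusion levels described above can be sketched as follows; branch depth, layer sizes, and sensor counts are illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical sketch of feature-level plus decision-level fusion: one 1D
# residual branch per sensor channel, a bottleneck layer fusing the features,
# and an MLP producing the final activity decision.
import torch
import torch.nn as nn

class ResBlock1D(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv1d(ch, ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(ch, ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(x + self.conv2(self.relu(self.conv1(x))))

class MultichannelHAR(nn.Module):
    def __init__(self, n_sensors=3, length=128, n_classes=6):
        super().__init__()
        # one residual branch per sensor channel (e.g. acc-x, acc-y, acc-z)
        self.branches = nn.ModuleList(
            [nn.Sequential(ResBlock1D(1), nn.Flatten()) for _ in range(n_sensors)]
        )
        self.bottleneck = nn.Linear(n_sensors * length, 64)  # feature-level fusion
        self.decision = nn.Sequential(nn.ReLU(), nn.Linear(64, n_classes))  # decision MLP

    def forward(self, x):  # x: (batch, n_sensors, length)
        feats = [branch(x[:, i:i + 1]) for i, branch in enumerate(self.branches)]
        return self.decision(self.bottleneck(torch.cat(feats, dim=1)))

model = MultichannelHAR()
print(model(torch.randn(4, 3, 128)).shape)  # torch.Size([4, 6])
```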
Multimodal Classification of Urban Micro-Events
In this paper we seek methods to effectively detect urban micro-events. Urban
micro-events are events which occur in cities, have limited geographical
coverage, and typically affect only a small group of citizens. Because of their
scale, they are difficult to identify in most data sources. However, by using
citizen sensing to gather data, detecting them becomes feasible. The data
gathered by citizen sensing is often multimodal and, as a consequence, the
information required to detect urban micro-events is distributed over multiple
modalities. This makes it essential to have a classifier capable of combining
them. In this paper we explore several methods of creating such a classifier,
including early, late, and hybrid fusion, as well as representation learning
using multimodal graphs. We evaluate performance on a real-world dataset obtained
from a live citizen reporting system. We show that a multimodal approach yields
higher performance than unimodal alternatives. Furthermore, we demonstrate that
our hybrid combination of early and late fusion with multimodal embeddings
performs best in the classification of urban micro-events.
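As a point of reference for the fusion strategies compared above, here is a minimal sketch of early versus late fusion on synthetic two-modality data. The feature dimensions, the logistic-regression classifier, and probability averaging are illustrative assumptions; the paper's hybrid fusion and multimodal-graph embeddings are not shown.

```python
# Early vs. late fusion on two synthetic modalities (stand-ins for the text
# and image parts of a citizen report).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_text = rng.normal(size=(200, 50))  # e.g. text features of a report
X_img = rng.normal(size=(200, 30))   # e.g. image features of the same report
y = rng.integers(0, 2, size=200)     # micro-event class label

# Early fusion: concatenate modality features, train a single classifier.
early = LogisticRegression(max_iter=1000).fit(np.hstack([X_text, X_img]), y)

# Late fusion: train one classifier per modality, then average probabilities.
clf_text = LogisticRegression(max_iter=1000).fit(X_text, y)
clf_img = LogisticRegression(max_iter=1000).fit(X_img, y)
late_proba = (clf_text.predict_proba(X_text) + clf_img.predict_proba(X_img)) / 2
late_pred = late_proba.argmax(axis=1)
```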
A sparsity-driven approach to multi-camera tracking in visual sensor networks
In this paper, a sparsity-driven approach is presented for multi-camera tracking in visual sensor networks (VSNs). VSNs consist of image sensors, embedded processors, and wireless transceivers, which are powered by batteries. Since energy and bandwidth resources are limited, setting up a tracking system in VSNs is a challenging problem. Motivated by the goal of tracking in a bandwidth-constrained environment, we present a sparsity-driven method to compress the features extracted by the camera nodes, which are then transmitted across the network for distributed inference. We have designed special overcomplete dictionaries that match the structure of the features, leading to very parsimonious yet accurate representations. We have tested our method in indoor and outdoor people-tracking scenarios. Our experimental results demonstrate how our approach leads to communication savings without significant loss in tracking performance.
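To illustrate the compression step, the sketch below encodes a feature vector with orthogonal matching pursuit against an overcomplete dictionary, so only a few (index, coefficient) pairs need to be transmitted. The random dictionary here is a placeholder for the specially designed, structure-matched dictionaries the paper describes.

```python
# Sparse coding of a feature vector for bandwidth-constrained transmission.
import numpy as np
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(0)
D = rng.normal(size=(64, 256))  # overcomplete dictionary: 64-dim features, 256 atoms
D /= np.linalg.norm(D, axis=0)  # unit-norm atoms

# A feature that happens to be 3-sparse in D (for illustration only).
feature = D[:, [3, 41, 170]] @ np.array([1.5, -0.8, 0.6])

# Camera node: compute a sparse code and transmit only its nonzero entries.
coef = orthogonal_mp(D, feature, n_nonzero_coefs=3)
idx = np.flatnonzero(coef)
print(len(idx), "coefficients transmitted instead of", feature.size, "values")

# Receiver: reconstruct the feature from the few (index, value) pairs.
recon = D[:, idx] @ coef[idx]
print("reconstruction error:", np.linalg.norm(feature - recon))
```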