Representation Learning on Large and Small Data
Deep learning owes its success to three key factors: scale of data, enhanced
models to learn representations from data, and scale of computation. This book
chapter presents the importance of the data-driven approach to learning good
representations from both big data and small data. For big data, it is widely
accepted in the research community that more data improves both representation
and classification. The questions are then
how to learn representations from big data, and how to perform representation
learning when data is scarce. We addressed the first question by presenting CNN
model enhancements in the aspects of representation, optimization, and
generalization. To address the small data challenge, we showed transfer
representation learning to be effective. Transfer representation learning
transfers the learned representation from a source domain where abundant
training data is available to a target domain where training data is scarce.
Transfer representation learning gave the OM and melanoma diagnosis modules of
our XPRIZE Tricorder device (a finisher among the competing teams) a
significant boost in diagnosis accuracy.
Comment: Book chapter
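The transfer recipe described above can be sketched as freezing a source-trained feature extractor and fitting only a small classifier head on the scarce target data. Below is a minimal numpy illustration: the "pre-trained" extractor is a stand-in (a fixed random projection), not the chapter's actual CNN, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a CNN pre-trained on a data-rich source domain:
# a frozen feature map (here, ReLU over a fixed random projection).
W_frozen = rng.normal(size=(64, 16))

def extract_features(x):
    """Map raw inputs to the transferred representation (weights frozen)."""
    return np.maximum(x @ W_frozen, 0.0)

# Scarce target-domain data: only 20 labelled examples.
X_target = rng.normal(size=(20, 64))
y_target = (X_target.sum(axis=1) > 0).astype(float)

# Only a small linear head is trained on the target task.
feats = extract_features(X_target)
w = np.zeros(16)
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(feats @ w)))          # sigmoid predictions
    w -= 0.1 * feats.T @ (p - y_target) / len(y_target)  # logistic gradient step

acc = np.mean((feats @ w > 0) == (y_target == 1))
```

The point of the sketch is the division of labour: the representation is transferred and held fixed, so the few target examples only have to fit a low-capacity head.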
Learning Compositional Representations for Few-Shot Recognition
One of the key limitations of modern deep learning approaches lies in the
amount of data required to train them. Humans, by contrast, can learn to
recognize novel categories from just a few examples. Instrumental to this rapid
learning ability is the compositional structure of concept representations in
the human brain --- something that deep learning models are lacking. In this
work, we make a step towards bridging this gap between human and machine
learning by introducing a simple regularization technique that allows the
learned representation to be decomposable into parts. Our method uses
category-level attribute annotations to disentangle the feature space of a
network into subspaces corresponding to the attributes. These attributes can be
either purely visual, like object parts, or more abstract, like openness and
symmetry. We demonstrate the value of compositional representations on three
datasets: CUB-200-2011, SUN397, and ImageNet, and show that they require fewer
examples to learn classifiers for novel categories.
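One way to read the regularization idea above is as a penalty that discourages activity in attribute subspaces whose attribute is absent from the image. The following numpy sketch of such a penalty is hypothetical, not the paper's exact loss; the subspace split and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# A 12-dim feature space split into 3 attribute subspaces of 4 dims each.
n_attrs, sub_dim = 3, 4

def compositional_penalty(features, attr_labels):
    """Penalise energy in subspaces whose attribute is absent (illustrative).

    features:    (batch, n_attrs * sub_dim) activations
    attr_labels: (batch, n_attrs) binary category-level annotations
    """
    f = features.reshape(-1, n_attrs, sub_dim)
    energy = np.sum(f ** 2, axis=2)            # per-subspace activation energy
    return float(np.sum(energy * (1 - attr_labels)))

feats = rng.normal(size=(5, 12))
attrs = rng.integers(0, 2, size=(5, 3))
loss = compositional_penalty(feats, attrs)
```

Minimising such a term pushes each subspace to fire only for its own attribute, which is one way a representation can be made decomposable into parts.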
Automatic detection of passable roads after floods in remotely sensed and social media data
This paper addresses the problem of floods classification and floods
aftermath detection utilizing both social media and satellite imagery.
Automatic detection of disasters such as floods is still a very challenging
task. The focus lies on identifying passable routes or roads during floods. Two
novel solutions are presented, which were developed for two corresponding tasks
at the MediaEval 2018 benchmarking challenge. The tasks are (i) identification
of images providing evidence for road passability and (ii) differentiation and
detection of passable and non-passable roads in images from two complementary
sources of information. For the first challenge, we mainly rely on object and
scene-level features extracted through multiple deep models pre-trained on the
ImageNet and Places datasets. The object and scene-level features are then
combined using early, late and double fusion techniques. To identify whether or
not it is possible for a vehicle to pass a road in satellite images, we rely on
Convolutional Neural Networks and a transfer learning-based classification
approach. The evaluation of the proposed methods is carried out on the
large-scale datasets provided for the benchmark competition. The results
demonstrate a significant improvement in performance over recent
state-of-the-art approaches.
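The early and late fusion of object- and scene-level features mentioned above can be illustrated in a few lines: early fusion concatenates features before a single classifier, while late fusion averages per-model class scores. All shapes, weights, and classifiers below are placeholder assumptions, not the challenge systems.

```python
import numpy as np

rng = np.random.default_rng(0)

obj_feat = rng.normal(size=(8, 128))    # e.g. features from an ImageNet-trained model
scene_feat = rng.normal(size=(8, 128))  # e.g. features from a Places-trained model

# Early fusion: concatenate the feature vectors, then classify jointly.
early = np.concatenate([obj_feat, scene_feat], axis=1)

# Late fusion: classify with each model separately, then average the scores.
def class_scores(f, w):
    z = f @ w
    e = np.exp(z - z.max(axis=1, keepdims=True))    # stable softmax
    return e / e.sum(axis=1, keepdims=True)

w_obj = rng.normal(size=(128, 2))    # hypothetical 2-class heads
w_scene = rng.normal(size=(128, 2))
late = 0.5 * (class_scores(obj_feat, w_obj) + class_scores(scene_feat, w_scene))
```

"Double fusion" in the abstract refers to combining both strategies; the sketch shows only the two building blocks.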
Multi-Representation Knowledge Distillation For Audio Classification
As an important component of multimedia analysis tasks, audio classification
aims to discriminate between different audio signal types and has received
intensive attention due to its wide applications. Generally speaking, the raw
signal can be transformed into various representations (such as Short Time
Fourier Transform and Mel Frequency Cepstral Coefficients), and information
implied in different representations can be complementary. Ensembling the
models trained on different representations can greatly boost the
classification performance; however, making inference using a large number of
models is cumbersome and computationally expensive. In this paper, we propose a
novel end-to-end collaborative learning framework for the audio classification
task. The framework takes multiple representations as the input to train the
models in parallel. The complementary information provided by different
representations is shared by knowledge distillation. Consequently, the
performance of each model can be significantly promoted without increasing the
computational overhead in the inference stage. Extensive experimental results
demonstrate that the proposed approach can improve the classification
performance and achieve state-of-the-art results on both acoustic scene
classification tasks and general audio tagging tasks.
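The knowledge-sharing step described above can be viewed as a distillation loss between the softened predictions of peer models trained on different representations. Below is a minimal numpy sketch of a symmetric, temperature-softened KL loss; the exact form and temperature are assumptions, not the paper's loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, T=1.0):
    e = np.exp(z / T - np.max(z / T, axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def distill_loss(logits_a, logits_b, T=2.0):
    """Symmetric KL between temperature-softened predictions of two peers."""
    p, q = softmax(logits_a, T), softmax(logits_b, T)
    kl = lambda a, b: np.sum(a * np.log(a / b), axis=1)
    return float(np.mean(kl(p, q) + kl(q, p)))

stft_logits = rng.normal(size=(4, 10))   # peer trained on an STFT input
mfcc_logits = rng.normal(size=(4, 10))   # peer trained on an MFCC input
loss = distill_loss(stft_logits, mfcc_logits)
identical = distill_loss(stft_logits, stft_logits)  # zero when peers agree
```

Because each model only mimics the others' soft outputs during training, inference can use a single model, which is where the saving in computational overhead comes from.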
ACGAN-based Data Augmentation Integrated with Long-term Scalogram for Acoustic Scene Classification
In acoustic scene classification (ASC), acoustic features play a crucial role
in the extraction of scene information, which can be stored over different time
scales. Moreover, the limited size of the dataset may lead to a biased model
with a poor performance for records from unseen cities and confusing scene
classes. In order to overcome this, we propose a long-term wavelet feature that
requires a lower storage capacity and can be classified faster and more
accurately compared with classic Mel filter bank coefficients (FBank). This
feature can be extracted with predefined wavelet scales similar to the FBank.
Furthermore, a novel data augmentation scheme based on generative adversarial
neural networks with auxiliary classifiers (ACGANs) is adopted to improve the
generalization of the ASC systems. The scheme, which contains ACGANs and a
sample filter, extends the database iteratively by splitting the dataset,
training the ACGANs and subsequently filtering samples. Experiments were
conducted on datasets from the Detection and Classification of Acoustic Scenes
and Events (DCASE) challenges. The results on the DCASE19 dataset demonstrate
the improved performance of the proposed techniques compared with the classic
FBank classifier. Moreover, the proposed fusion system achieved first place in
the DCASE19 competition and surpassed the top accuracies on the DCASE17
dataset.
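The iterative extension step above, generate samples and then keep only those the auxiliary classifier accepts, can be sketched as a confidence filter. The classifier, threshold, and data below are stand-ins for illustration, not the paper's trained ACGAN components.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_filter(gen_samples, gen_labels, classifier, threshold=0.9):
    """Keep generated samples the classifier assigns to the intended class
    with confidence above the threshold (illustrative filtering rule)."""
    probs = classifier(gen_samples)                       # (n, n_classes)
    conf = probs[np.arange(len(gen_labels)), gen_labels]
    keep = (probs.argmax(axis=1) == gen_labels) & (conf >= threshold)
    return gen_samples[keep], gen_labels[keep]

# Stand-in auxiliary classifier: softmax over a fixed random projection.
W = rng.normal(size=(16, 3))
def clf(x):
    z = x @ W
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

fake_x = rng.normal(size=(50, 16))          # "generated" samples
fake_y = rng.integers(0, 3, size=50)        # intended class of each sample
kept_x, kept_y = sample_filter(fake_x, fake_y, clf)
```

In the paper's scheme this filter sits inside a loop: split the dataset, train the ACGANs, filter, and append the surviving samples before the next iteration.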
A Review on Deep Learning Techniques Applied to Semantic Segmentation
Image semantic segmentation is of growing interest to computer vision and
machine learning researchers. Many emerging applications demand
accurate and efficient segmentation mechanisms: autonomous driving, indoor
navigation, and even virtual or augmented reality systems to name a few. This
demand coincides with the rise of deep learning approaches in almost every
field or application target related to computer vision, including semantic
segmentation or scene understanding. This paper provides a review on deep
learning methods for semantic segmentation applied to various application
areas. Firstly, we describe the terminology of this field as well as mandatory
background concepts. Next, the main datasets and challenges are presented to help
researchers decide which are the ones that best suit their needs and their
targets. Then, existing methods are reviewed, highlighting their contributions
and their significance in the field. Finally, quantitative results are given
for the described methods and the datasets in which they were evaluated,
following up with a discussion of the results. Lastly, we point out a set of
promising directions for future work and draw our own conclusions about the
state of the art of semantic segmentation using deep learning techniques.
Comment: Submitted to TPAMI on Apr. 22, 201
A Survey on Deep Learning Methods for Robot Vision
Deep learning has allowed a paradigm shift in pattern recognition, from using
hand-crafted features together with statistical classifiers to using
general-purpose learning procedures for learning data-driven representations,
features, and classifiers together. The application of this new paradigm has
been particularly successful in computer vision, in which the development of
deep learning methods for vision applications has become a hot research topic.
Given that deep learning has already attracted the attention of the robot
vision community, the main purpose of this survey is to address the use of deep
learning in robot vision. To achieve this, a comprehensive overview of deep
learning and its usage in computer vision is given, that includes a description
of the most frequently used neural models and their main application areas.
Then, the standard methodology and tools used for designing deep-learning based
vision systems are presented. Afterwards, a review of the principal work using
deep learning in robot vision is presented, as well as current and future
trends related to the use of deep learning in robotics. This survey is intended
to be a guide for the developers of robot vision systems.
Monocular Depth Estimation using Multi-Scale Continuous CRFs as Sequential Deep Networks
Depth cues have proved very useful in various computer vision and
robotic tasks. This paper addresses the problem of monocular depth estimation
from a single still image. Inspired by the effectiveness of recent works on
multi-scale convolutional neural networks (CNN), we propose a deep model which
fuses complementary information derived from multiple CNN side outputs.
Different from previous methods using concatenation or weighted average
schemes, the integration is obtained by means of continuous Conditional Random
Fields (CRFs). In particular, we propose two different variations, one based on
a cascade of multiple CRFs, the other on a unified graphical model. By
designing a novel CNN implementation of mean-field updates for continuous CRFs,
we show that both proposed models can be regarded as sequential deep networks
and that training can be performed end-to-end. Through an extensive
experimental evaluation, we demonstrate the effectiveness of the proposed
approach and establish new state of the art results for the monocular depth
estimation task on three publicly available datasets, i.e. NYUD-V2, Make3D and
KITTI.
Comment: arXiv admin note: substantial text overlap with arXiv:1704.0215
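The integration of CNN side outputs via mean-field updates can be caricatured as an iterative scheme in which each scale's estimate is repeatedly pulled toward the consensus of the others. The update rule and weights below are a simplified assumption for illustration, not the paper's continuous-CRF model.

```python
import numpy as np

rng = np.random.default_rng(0)

def meanfield_fuse(side_outputs, alpha=0.5, iters=10):
    """Fuse multi-scale depth maps with a mean-field-style fixed-point update:
    each estimate is a convex combination of its own side output and the
    current average of the other scales (illustrative, assumed weights)."""
    mu = np.array(side_outputs, dtype=float)
    for _ in range(iters):
        for i in range(len(mu)):
            others = np.mean(np.delete(mu, i, axis=0), axis=0)
            mu[i] = (side_outputs[i] + alpha * others) / (1 + alpha)
    return mu.mean(axis=0)

# Three hypothetical 4x4 depth maps from different CNN side outputs.
scales = [rng.normal(loc=2.0, size=(4, 4)) for _ in range(3)]
fused = meanfield_fuse(scales)
```

Because each update is differentiable, a stack of such iterations can be unrolled and trained end-to-end as a sequential network, which is the structural idea the abstract describes.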
Segmentation of Skin Lesions and their Attributes Using Multi-Scale Convolutional Neural Networks and Domain Specific Augmentations
Computer-aided diagnosis systems for the classification of different types of
skin lesions have been an active field of research in recent decades. It has
been shown that introducing lesion and attribute masks into the lesion
classification pipeline can greatly improve performance. In this paper, we
propose a framework by incorporating transfer learning for segmenting lesions
and their attributes, based on convolutional neural networks. The proposed
framework is based on the encoder-decoder architecture which utilizes a variety
of pre-trained networks in the encoding path and generates the prediction map
by combining multi-scale information in the decoding path in a pyramid pooling
manner. To address the lack of training data and increase the proposed model's
generalization, an extensive set of novel domain-specific augmentation routines
has been applied to simulate the real variations in dermoscopy images.
Finally, by performing broad experiments on three different data sets obtained
from the International Skin Imaging Collaboration archive (the ISIC2016,
ISIC2017, and ISIC2018 challenge data sets), we show that the proposed method
outperforms other state-of-the-art approaches on the ISIC2016 and ISIC2017
segmentation tasks and achieved first rank on the leaderboard of the ISIC2018
attribute detection task.
Comment: 18 pages
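Augmentation pipelines of the kind described above apply label-preserving transforms jointly to the image and its mask, so the segmentation target stays aligned. The routines below (flip, rotation, brightness jitter) are a generic illustrative subset, not the paper's specific dermoscopy routines.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, mask):
    """Apply a random, label-preserving transform to an image and its mask
    together (illustrative subset of a domain-specific augmentation set)."""
    if rng.random() < 0.5:                      # horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    k = int(rng.integers(0, 4))                 # rotation by a multiple of 90 degrees
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    image = np.clip(image * rng.uniform(0.8, 1.2), 0.0, 1.0)  # brightness jitter
    return image, mask

img = rng.random((32, 32, 3))                   # toy dermoscopy-like image
msk = (rng.random((32, 32)) > 0.5).astype(np.uint8)  # toy attribute mask
aug_img, aug_msk = augment(img, msk)
```

The key design constraint is that every geometric transform hits the image and the mask identically, while photometric transforms touch only the image.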
Measuring the tendency of CNNs to Learn Surface Statistical Regularities
Deep CNNs are known to exhibit the following peculiarity: on the one hand
they generalize extremely well to a test set, while on the other hand they are
extremely sensitive to so-called adversarial perturbations. The extreme
sensitivity of high performance CNNs to adversarial examples casts serious
doubt that these networks are learning high level abstractions in the dataset.
We are concerned with the following question: How can a deep CNN that does not
learn any high level semantics of the dataset manage to generalize so well? The
goal of this article is to measure the tendency of CNNs to learn surface
statistical regularities of the dataset. To this end, we use Fourier filtering
to construct datasets which share the exact same high level abstractions but
exhibit qualitatively different surface statistical regularities. For the SVHN
and CIFAR-10 datasets, we present two Fourier filtered variants: a low
frequency variant and a randomly filtered variant. Each of the Fourier
filtering schemes is tuned to preserve the recognizability of the objects. Our
main finding is that CNNs exhibit a tendency to latch onto the Fourier image
statistics of the training dataset, sometimes exhibiting up to a 28%
generalization gap across the various test sets. Moreover, we observe that
significantly increasing the depth of a network has a very marginal impact on
closing the aforementioned generalization gap. Thus we provide quantitative
evidence supporting the hypothesis that deep CNNs tend to learn surface
statistical regularities in the dataset rather than higher-level abstract
concepts.
Comment: Submitted
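A low-frequency variant of the kind described above can be constructed by masking Fourier coefficients outside a radius and inverting the transform; this preserves coarse structure while discarding surface statistics carried in high frequencies. The radius below is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def low_frequency_variant(image, radius):
    """Zero out 2-D Fourier coefficients beyond a radius from the DC term,
    keeping only the low-frequency content of the image."""
    F = np.fft.fftshift(np.fft.fft2(image))     # centre the spectrum
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    F[dist > radius] = 0                        # hard low-pass mask
    return np.real(np.fft.ifft2(np.fft.ifftshift(F)))

img = rng.random((32, 32))
low = low_frequency_variant(img, radius=8)
```

Comparing a model's accuracy on such filtered variants against the unfiltered test set is how the generalization gap the abstract reports can be measured.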