AutoDispNet: Improving Disparity Estimation With AutoML
Much research work in computer vision is being spent on optimizing existing
network architectures to obtain a few more percentage points on benchmarks.
Recent AutoML approaches promise to relieve us from this effort. However, they
are mainly designed for comparatively small-scale classification tasks. In this
work, we show how to use and extend existing AutoML techniques to efficiently
optimize large-scale U-Net-like encoder-decoder architectures. In particular,
we leverage gradient-based neural architecture search and Bayesian optimization
for hyperparameter search. The resulting optimization does not require a
large-scale compute cluster. We show results on disparity estimation that
clearly outperform the manually optimized baseline and reach state-of-the-art
performance.
Comment: In Proceedings of the 2019 IEEE International Conference on Computer Vision (ICCV).
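The abstract gives no code, but a minimal sketch of the kind of gradient-based NAS primitive it refers to (a DARTS-style mixed operation with softmax-relaxed architecture weights) might look as follows. The candidate operations and tensor shapes are illustrative assumptions, not AutoDispNet's actual search space.

```python
# Minimal sketch of a DARTS-style mixed operation: architecture weights
# `alpha` are relaxed through a softmax and learned jointly with the
# network weights by ordinary gradient descent.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Candidate operations for one edge of the search cell (illustrative).
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.Identity(),
        ])
        # One architecture parameter per candidate op.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        # Output is the softmax-weighted sum of all candidate ops.
        return sum(w * op(x) for w, op in zip(weights, self.ops))

x = torch.randn(2, 16, 32, 32)
print(MixedOp(16)(x).shape)  # torch.Size([2, 16, 32, 32])
```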
Unsupervised Feature Learning for Environmental Sound Classification Using Weighted Cycle-Consistent Generative Adversarial Network
In this paper we propose a novel environmental sound classification approach
incorporating unsupervised feature learning from a codebook built via the
spherical K-Means++ algorithm, and a new architecture for high-level data augmentation.
The audio signal is transformed into a 2D representation using a discrete
wavelet transform (DWT). The DWT spectrograms are then augmented by a novel
architecture for cycle-consistent generative adversarial network. This
high-level augmentation bootstraps generated spectrograms in both intra- and
inter-class manners by translating structural features from sample to sample. A
codebook is built by coding the DWT spectrograms with the speeded-up robust
feature detector (SURF) and the K-Means++ algorithm. The Random Forest is our
final learning algorithm which learns the environmental sound classification
task from the clustered codewords in the codebook. Experimental results in four
benchmarking environmental sound datasets (ESC-10, ESC-50, UrbanSound8k, and
DCASE-2017) have shown that the proposed classification approach outperforms
the state-of-the-art classifiers in this scope, including advanced and dense
convolutional neural networks such as AlexNet and GoogLeNet, improving the
classification rate by between 3.51% and 14.34%, depending on the dataset.
Comment: Paper accepted for publication in Elsevier Applied Soft Computing.
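A hedged sketch of the codebook-plus-classifier stage described above, using scikit-learn. The SURF descriptors extracted from the DWT spectrograms are assumed to be precomputed (random arrays stand in for them here), and plain k-means with k-means++ initialization stands in for the spherical variant.

```python
# Sketch of the codebook + Random Forest stage: local descriptors are
# clustered into codewords, each clip becomes a bag-of-words histogram,
# and a Random Forest learns the classification task from the histograms.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_clips, n_desc, desc_dim, n_words = 40, 50, 64, 32

# descriptors[i]: local descriptors extracted from clip i's spectrogram.
descriptors = [rng.normal(size=(n_desc, desc_dim)) for _ in range(n_clips)]
labels = rng.integers(0, 4, size=n_clips)

# Build the codebook with k-means++ initialization.
codebook = KMeans(n_clusters=n_words, init="k-means++", n_init=5,
                  random_state=0).fit(np.vstack(descriptors))

def encode(desc):
    # Histogram of codeword assignments, normalized per clip.
    words = codebook.predict(desc)
    return np.bincount(words, minlength=n_words) / len(words)

X = np.array([encode(d) for d in descriptors])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.score(X, labels))
```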
The Hybrid Bootstrap: A Drop-in Replacement for Dropout
Regularization is an important component of predictive model building. The
hybrid bootstrap is a regularization technique that functions similarly to
dropout except that features are resampled from other training points rather
than replaced with zeros. We show that the hybrid bootstrap offers superior
performance to dropout. We also present a sampling based technique to simplify
hyperparameter choice. Next, we provide an alternative sampling technique for
convolutional neural networks. Finally, we demonstrate the efficacy of the
hybrid bootstrap on non-image tasks using tree-based models.
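A minimal sketch of the core idea, assuming tabular data in NumPy: where dropout would zero a feature, the hybrid bootstrap substitutes the same feature taken from another randomly chosen training point.

```python
# Hybrid-bootstrap style corruption: resample a fraction of features
# from other training rows instead of replacing them with zeros.
import numpy as np

def hybrid_bootstrap(batch, train_X, rate, rng):
    """Replace a fraction `rate` of features with values from random donors."""
    mask = rng.random(batch.shape) < rate
    donors = train_X[rng.integers(0, len(train_X), size=len(batch))]
    return np.where(mask, donors, batch)

rng = np.random.default_rng(0)
train_X = rng.normal(size=(100, 8))
batch = train_X[:4]
print(hybrid_bootstrap(batch, train_X, rate=0.5, rng=rng))
```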
Genetic Architect: Discovering Genomic Structure with Learned Neural Architectures
Each human genome is a 3 billion base pair set of encoding instructions.
Decoding the genome using deep learning fundamentally differs from most tasks,
as we do not know the full structure of the data and therefore cannot design
architectures to suit it. As such, architectures that fit the structure of
genomics should be learned, not prescribed. Here, we develop a novel search
algorithm, applicable across domains, that discovers an optimal architecture
which simultaneously learns general genomic patterns and identifies the most
important sequence motifs in predicting functional genomic outcomes. The
architectures we find using this algorithm succeed at using only RNA expression
data to predict gene regulatory structure, learn human-interpretable
visualizations of key sequence motifs, and surpass state-of-the-art results on
benchmark genomics challenges.
Comment: 10 pages, 4 figures.
Connectivity Learning in Multi-Branch Networks
While much of the work in the design of convolutional networks over the last
five years has revolved around the empirical investigation of the importance of
depth, filter sizes, and number of feature channels, recent studies have shown
that branching, i.e., splitting the computation along parallel but distinct
threads and then aggregating their outputs, represents a new promising
dimension for significant improvements in performance. To combat the complexity
of design choices in multi-branch architectures, prior work has adopted simple
strategies, such as a fixed branching factor, the same input being fed to all
parallel branches, and an additive combination of the outputs produced by all
branches at aggregation points.
In this work we remove these predefined choices and propose an algorithm to
learn the connections between branches in the network. Instead of being chosen
a priori by the human designer, the multi-branch connectivity is learned
simultaneously with the weights of the network by optimizing a single loss
function defined with respect to the end task. We demonstrate our approach on
the problem of multi-class image classification using three different datasets
where it yields consistently higher accuracy compared to the state-of-the-art
"ResNeXt" multi-branch network given the same learning capacity
Vector Field Neural Networks
This work begins by establishing a mathematical formalization between
different geometrical interpretations of Neural Networks, providing a first
contribution. From this starting point, a new interpretation is explored, using
the idea of implicit vector fields moving data as particles in a flow. A new
architecture, Vector Field Neural Networks (VFNN), is proposed based on this
interpretation, with the vector field becoming explicit. A specific
implementation of the VFNN using Euler's method to solve ordinary differential
equations (ODEs) and Gaussian vector fields is tested. The first experiments
present visual results highlighting the important features of the new
architecture and provide another contribution: a geometrically interpretable
regularization of model parameters. Then, the new architecture is evaluated for
different hyperparameters and inputs, with the objective of evaluating the
influence on model performance, computational time, and complexity. The VFNN
model is compared against the well-known baseline models Naive Bayes,
feed-forward neural networks, and Support Vector Machines (SVM), showing
comparable or better results on different datasets. Finally, the conclusion
raises many new questions and ideas for improving the model that could further
increase its performance.
Comment: 121 pages, 141 figures. Master's dissertation presented at
Universidade Federal do Rio de Janeiro, Brazil. TL;DR: construction and
motivation of Vector Field Neural Networks, with evidence of learning in simple
situations.
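A small sketch of the vector-field view, assuming a Gaussian radial-basis parameterization of the field and explicit Euler steps; the parameter names and field definition are illustrative rather than the dissertation's exact formulation.

```python
# Data points are moved by an explicit, learnable vector field V via
# Euler integration: x <- x + h * V(x). Here V is a sum of directions
# weighted by Gaussian kernels centered at learnable locations.
import torch
import torch.nn as nn

class GaussianVectorFieldLayer(nn.Module):
    def __init__(self, dim, n_kernels, step=0.1, n_steps=5):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_kernels, dim))
        self.directions = nn.Parameter(torch.randn(n_kernels, dim) * 0.1)
        self.log_sigma = nn.Parameter(torch.zeros(n_kernels))
        self.step, self.n_steps = step, n_steps

    def field(self, x):  # x: [batch, dim]
        d2 = torch.cdist(x, self.centers) ** 2               # [batch, K]
        w = torch.exp(-d2 / (2 * self.log_sigma.exp() ** 2))
        return w @ self.directions                            # [batch, dim]

    def forward(self, x):
        for _ in range(self.n_steps):  # explicit Euler integration
            x = x + self.step * self.field(x)
        return x

x = torch.randn(8, 2)
print(GaussianVectorFieldLayer(dim=2, n_kernels=4)(x).shape)
```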
Network of Experts for Large-Scale Image Categorization
We present a tree-structured network architecture for large scale image
classification. The trunk of the network contains convolutional layers
optimized over all classes. At a given depth, the trunk splits into separate
branches, each dedicated to discriminate a different subset of classes. Each
branch acts as an expert classifying a set of categories that are difficult to
tell apart, while the trunk provides common knowledge to all experts in the
form of shared features. The training of our "network of experts" is completely
end-to-end: the partition of categories into disjoint subsets is learned
simultaneously with the parameters of the network trunk and the experts are
trained jointly by minimizing a single learning objective over all classes. The
proposed structure can be built from any existing convolutional neural network
(CNN). We demonstrate its generality by adapting 4 popular CNNs for image
categorization into the form of networks of experts. Our experiments on
CIFAR100 and ImageNet show that in every case our method yields a substantial
improvement in accuracy over the base CNN, and gives the best result achieved
so far on CIFAR100. Finally, the improvement in accuracy comes at little
additional cost: compared to the base network, the training time is only
moderately increased and the number of parameters is comparable or in some
cases even lower.
Comment: ECCV 2016.
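A toy PyTorch sketch of the trunk-plus-experts layout: a shared trunk feeds one expert head per class subset, and the subset scores are scattered back into a full logit vector. The fixed class partition and the tiny trunk are placeholders; in the paper the partition is learned jointly with the network.

```python
# Tree-structured classifier: shared trunk + one expert per disjoint
# subset of classes, trained end-to-end with a single objective.
import torch
import torch.nn as nn

class NetworkOfExperts(nn.Module):
    def __init__(self, class_subsets, feat_dim=64):
        super().__init__()
        self.class_subsets = class_subsets            # list of class-id lists
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU())
        self.experts = nn.ModuleList(
            [nn.Linear(feat_dim, len(s)) for s in class_subsets])
        self.n_classes = sum(len(s) for s in class_subsets)

    def forward(self, x):
        shared = self.trunk(x)                        # common knowledge
        logits = x.new_zeros(x.size(0), self.n_classes)
        for subset, expert in zip(self.class_subsets, self.experts):
            logits[:, subset] = expert(shared)        # expert scores its subset
        return logits

model = NetworkOfExperts([[0, 1, 2], [3, 4], [5, 6, 7]])
print(model(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 8])
```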
Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results
The recently proposed Temporal Ensembling has achieved state-of-the-art
results in several semi-supervised learning benchmarks. It maintains an
exponential moving average of label predictions on each training example, and
penalizes predictions that are inconsistent with this target. However, because
the targets change only once per epoch, Temporal Ensembling becomes unwieldy
when learning from large datasets. To overcome this problem, we propose Mean
Teacher, a method that averages model weights instead of label predictions. As
an additional benefit, Mean Teacher improves test accuracy and enables training
with fewer labels than Temporal Ensembling. Without changing the network
architecture, Mean Teacher achieves an error rate of 4.35% on SVHN with 250
labels, outperforming Temporal Ensembling trained with 1000 labels. We also
show that a good network architecture is crucial to performance. Combining Mean
Teacher and Residual Networks, we improve the state of the art on CIFAR-10 with
4000 labels from 10.55% to 6.28%, and on ImageNet 2012 with 10% of the labels
from 35.24% to 9.11%.
Comment: In this version: corrected hyperparameters of the 4000-label CIFAR-10
ResNet experiment and changed Antti's contact info. Advances in Neural
Information Processing Systems 30 (NIPS 2017) pre-proceedings.
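A condensed sketch of the Mean Teacher recipe: the teacher's weights are an exponential moving average (EMA) of the student's weights, and a consistency term pulls the student's predictions toward the teacher's on unlabeled inputs. The architectures, the consistency weight, and the MSE-on-softmax choice below are placeholder assumptions, not the paper's exact configuration.

```python
# Mean Teacher in miniature: supervised loss on labeled data,
# consistency loss against an EMA teacher on unlabeled data,
# EMA update of the teacher after every optimizer step.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)

opt = torch.optim.SGD(student.parameters(), lr=0.1)

def ema_update(teacher, student, decay=0.99):
    with torch.no_grad():
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.mul_(decay).add_(s, alpha=1 - decay)

x_lab, y_lab = torch.randn(8, 10), torch.randint(0, 3, (8,))
x_unlab = torch.randn(16, 10)

for step in range(3):
    sup = F.cross_entropy(student(x_lab), y_lab)
    # Consistency: student predictions should match the teacher's.
    cons = F.mse_loss(F.softmax(student(x_unlab), dim=1),
                      F.softmax(teacher(x_unlab), dim=1))
    loss = sup + 1.0 * cons
    opt.zero_grad(); loss.backward(); opt.step()
    ema_update(teacher, student)
```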
Large Margin Deep Networks for Classification
We present a formulation of deep learning that aims at producing a large
margin classifier. The notion of margin, minimum distance to a decision
boundary, has served as the foundation of several theoretically profound and
empirically successful results for both classification and regression tasks.
However, most large margin algorithms are applicable only to shallow models
with a preset feature representation, and conventional margin methods for
neural networks only enforce margin at the output layer. Such methods are
therefore not well suited for deep networks.
In this work, we propose a novel loss function to impose a margin on any
chosen set of layers of a deep network (including input and hidden layers). Our
formulation allows choosing any norm on the metric measuring the margin. We
demonstrate that the decision boundary obtained by our loss has nice properties
compared to standard classification loss functions. Specifically, we show
improved empirical results on the MNIST, CIFAR-10 and ImageNet datasets on
multiple tasks: generalization from small training sets, corrupted labels, and
robustness against adversarial perturbations. The resulting loss is general and
complementary to existing data augmentation (such as random/adversarial input
transform) and regularization techniques (such as weight decay, dropout, and
batch norm).
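A hedged sketch in the spirit of the loss described above, applied only at the input layer with the Euclidean norm: the score gap to the strongest competing class is divided by the gradient norm of that gap with respect to the input, approximating the distance to the decision boundary, and a hinge enforces a minimum margin. The full method generalizes this to any chosen layer and norm.

```python
# First-order estimate of the input-space margin, enforced with a hinge.
import torch
import torch.nn as nn
import torch.nn.functional as F

def large_margin_loss(model, x, y, gamma=1.0, eps=1e-8):
    x = x.clone().requires_grad_(True)
    logits = model(x)
    true = logits.gather(1, y.unsqueeze(1)).squeeze(1)
    # Strongest competing (non-true) class per sample.
    masked = logits.scatter(1, y.unsqueeze(1), float("-inf"))
    other = masked.max(dim=1).values
    gap = true - other
    # Distance to the boundary ~ score gap / gradient norm of the gap.
    grad = torch.autograd.grad(gap.sum(), x, create_graph=True)[0]
    dist = gap / (grad.flatten(1).norm(dim=1) + eps)
    return F.relu(gamma - dist).mean()

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x, y = torch.randn(4, 1, 28, 28), torch.randint(0, 10, (4,))
print(large_margin_loss(model, x, y))
```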
Deep Ensembles on a Fixed Memory Budget: One Wide Network or Several Thinner Ones?
One of the generally accepted views of modern deep learning is that
increasing the number of parameters usually leads to better quality. The two
easiest ways to increase the number of parameters are to increase the size of
the network, e.g. its width, or to train a deep ensemble; both approaches improve
the performance in practice. In this work, we consider a fixed memory budget
setting and investigate which is more effective: training a single wide
network, or performing a memory split, i.e. training an ensemble of several
thinner networks with the same total number of parameters. We find that, for
large enough budgets, the number of networks in the optimal memory split is
usually larger than one. Interestingly, this
effect holds for the commonly used sizes of the standard architectures. For
example, a single WideResNet-28-10 achieves significantly worse test accuracy on
CIFAR-100 than an ensemble of sixteen thinner WideResNets: 80.6% versus 82.52%,
respectively. We call the described effect the Memory Split Advantage and show
that it holds for a variety of datasets and model architectures.
Comment: Under review by the International Conference on Machine Learning
(ICML 2020).
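A back-of-the-envelope illustration of the trade-off: convolutional parameter counts grow roughly quadratically with the width multiplier, so under a fixed budget an N-member ensemble uses members whose width is scaled by about 1/sqrt(N). The 36.5M figure below is only an approximate stand-in for one WideResNet-28-10, not the paper's exact accounting.

```python
# How wide can each ensemble member be if the total parameter budget is
# fixed and conv parameters scale ~ width^2?
import math

budget = 36.5e6  # roughly the size of one WideResNet-28-10 (approximate)
for n_members in (1, 2, 4, 8, 16):
    per_member = budget / n_members
    width_scale = math.sqrt(per_member / budget)   # params ~ width^2
    print(f"{n_members:2d} members: {per_member/1e6:5.2f}M params each, "
          f"width x{width_scale:.2f} of the single wide net")
```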