4,644 research outputs found
Evolving Deep Neural Networks by Multi-objective Particle Swarm Optimization for Image Classification
In recent years, convolutional neural networks (CNNs) have become deeper in
order to achieve better classification accuracy in image classification.
However, it is difficult to deploy the state-of-the-art deep CNNs for
industrial use due to the difficulty of manually fine-tuning the
hyperparameters and the trade-off between classification accuracy and
computational cost. This paper proposes a novel multi-objective optimization
method for evolving state-of-the-art deep CNNs in real-life applications, which
automatically evolves non-dominated solutions along the Pareto front. Three
major contributions are made: firstly, a new encoding strategy is designed to
encode one of the best state-of-the-art CNNs; secondly, with the classification
accuracy and the number of floating-point operations as the two objectives, a
multi-objective particle swarm optimization method is developed to evolve the
non-dominated solutions; lastly, a new infrastructure is designed to
boost the experiments by concurrently running the experiments on multiple GPUs
across multiple machines, and a Python library is developed and released to
manage the infrastructure. The experimental results demonstrate that the
non-dominated solutions found by the proposed algorithm form a clear Pareto
front, and that the proposed infrastructure is able to reduce the running time
almost linearly.
Comment: conditionally accepted by gecco201
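The multi-objective selection described above hinges on keeping only the non-dominated (error, FLOPs) trade-offs. A minimal sketch of that filter, assuming lower is better in both objectives; the paper's encoding strategy and swarm-update rules are not reproduced here:

```python
def pareto_front(points):
    """Keep only the non-dominated (error, flops) pairs, where lower is
    better in both objectives."""
    return [p for p in points
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p
                       for q in points)]
```

Applied to candidate networks, points dominated in both accuracy and cost are discarded, and the survivors form the Pareto front reported in the experiments.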
Nonlinear Acceleration of CNNs
The Regularized Nonlinear Acceleration (RNA) algorithm is an acceleration
method capable of improving the rate of convergence of many optimization
schemes such as gradient descent, SAGA, or SVRG. Until now, its analysis has
been limited to convex problems, but empirical observations show that RNA may
extend to wider settings. In this paper, we further investigate the benefits
of RNA when applied to neural networks, in particular for the task of image
recognition on CIFAR10 and ImageNet. With very few modifications to existing
frameworks, RNA slightly improves the optimization process of CNNs after
training.
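RNA itself is a sequence-extrapolation scheme: given the last few iterates of an optimizer, it solves a small regularized least-squares problem for mixing weights constrained to sum to one, then returns the weighted combination of iterates. A minimal numpy sketch of that standard formulation (the `lam` regularizer is an illustrative default; this is not the paper's CNN integration):

```python
import numpy as np

def rna(xs, lam=1e-8):
    """Regularized Nonlinear Acceleration (sketch): extrapolate from the
    iterates xs[0..k] of an optimizer by finding mixing weights that sum to
    one and minimize the norm of the combined residuals."""
    X = np.stack(xs)                       # (k+1, d) iterates
    R = X[1:] - X[:-1]                     # (k, d) residual differences
    K = R @ R.T + lam * np.eye(len(R))     # regularized Gram matrix
    z = np.linalg.solve(K, np.ones(len(R)))
    c = z / z.sum()                        # weights constrained to sum to 1
    return c @ X[1:]                       # extrapolated point
```

On a linear problem (e.g. gradient descent on a quadratic) the extrapolation is near-exact once the number of residuals reaches the dimension, which is why RNA can jump far ahead of the raw iterate sequence.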
CNNs are Globally Optimal Given Multi-Layer Support
Stochastic Gradient Descent (SGD) is the central workhorse for training
modern CNNs. Although it gives impressive empirical performance, it can be slow
to converge. In this paper we explore a novel strategy for training a CNN using an
alternation strategy that offers substantial speedups during training. We make
the following contributions: (i) replace the ReLU non-linearity within a CNN
with positive hard-thresholding, (ii) reinterpret this non-linearity as a
binary state vector making the entire CNN linear if the multi-layer support is
known, and (iii) demonstrate that under certain conditions a global optimum of
the CNN can be found through local descent. We then employ a novel alternation
strategy (between weights and support) for CNN training that leads to
substantially faster convergence, desirable theoretical properties, and
state-of-the-art results across large-scale datasets (e.g. ImageNet) as well as
other standard benchmarks.
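The key trick behind (i)–(ii) is that once the binary support (which units pass their activation) is fixed, the network becomes linear in its weights. A toy numpy illustration with a hypothetical 4×3 layer; with `tau = 0` the positive hard-threshold coincides with ReLU (the paper uses a positive threshold):

```python
import numpy as np

def hard_threshold(x, tau=0.0):
    """Positive hard-thresholding: pass x unchanged where x > tau, else zero."""
    return x * (x > tau)

# Once the binary support is fixed, the layer output is linear in the
# weights W (a toy 4x3 layer; with tau = 0 this coincides with ReLU):
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
x = rng.standard_normal(3)
support = (W @ x > 0.0)                   # binary state vector
assert np.allclose(hard_threshold(W @ x), support * (W @ x))
```

With the support treated as known, every layer is an elementwise mask times a linear map, which is what makes the alternation between weight updates and support updates tractable.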
Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes
There is a previously identified equivalence between wide fully connected
neural networks (FCNs) and Gaussian processes (GPs). This equivalence enables,
for instance, test set predictions that would have resulted from a fully
Bayesian, infinitely wide trained FCN to be computed without ever instantiating
the FCN, but by instead evaluating the corresponding GP. In this work, we
derive an analogous equivalence for multi-layer convolutional neural networks
(CNNs), both with and without pooling layers, and achieve state-of-the-art
results on CIFAR10 for GPs without trainable kernels. We also introduce a Monte
Carlo method to estimate the GP corresponding to a given neural network
architecture, even in cases where the analytic form has too many terms to be
computationally feasible.
Surprisingly, in the absence of pooling layers, the GPs corresponding to CNNs
with and without weight sharing are identical. As a consequence, translation
equivariance, beneficial in finite channel CNNs trained with stochastic
gradient descent (SGD), is guaranteed to play no role in the Bayesian treatment
of the infinite channel limit - a qualitative difference between the two
regimes that is not present in the FCN case. We confirm experimentally that
while in some scenarios the performance of SGD-trained finite CNNs approaches
that of the corresponding GPs as the channel count increases, with careful
tuning SGD-trained CNNs can significantly outperform their corresponding GPs,
suggesting advantages from SGD training compared to fully Bayesian parameter
estimation.
Comment: Published as a conference paper at ICLR 201
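The Monte Carlo estimator mentioned above can be illustrated on a one-hidden-layer ReLU network: sample random parameters from the prior, evaluate the network on both inputs, and average the product of outputs. A sketch assuming fan-in-scaled Gaussian priors (the width and sample count are arbitrary choices; the paper's deep convolutional case is not reproduced):

```python
import numpy as np

def mc_nngp_kernel(x1, x2, width=2048, n_samples=200, seed=0):
    """Monte Carlo estimate of the GP kernel induced by a one-hidden-layer
    ReLU network: K(x1, x2) ~ E[f(x1) f(x2)] over random parameters drawn
    from fan-in-scaled Gaussian priors."""
    rng = np.random.default_rng(seed)
    d = len(x1)
    total = 0.0
    for _ in range(n_samples):
        W = rng.standard_normal((width, d)) / np.sqrt(d)   # hidden weights
        v = rng.standard_normal(width) / np.sqrt(width)    # readout weights
        f1 = v @ np.maximum(W @ x1, 0.0)
        f2 = v @ np.maximum(W @ x2, 0.0)
        total += f1 * f2
    return total / n_samples
```

Under this prior scaling the estimate concentrates around the analytic arc-cosine kernel value (0.5 on the diagonal when ||x||² equals the input dimension), which is exactly the kind of check that makes the estimator useful when the analytic form is intractable.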
Good Initializations of Variational Bayes for Deep Models
Stochastic variational inference is an established way to carry out
approximate Bayesian inference for deep models. While there have been effective
proposals for good initializations for loss minimization in deep learning, far
less attention has been devoted to the issue of initialization of stochastic
variational inference. We address this by proposing a novel layer-wise
initialization strategy based on Bayesian linear models. The proposed method is
extensively validated on regression and classification tasks, including
Bayesian DeepNets and ConvNets, showing faster and better convergence compared
to alternatives inspired by the literature on initializations for loss
minimization.
Comment: 8 pages of main paper (+3 for references and +6 of supplementary material)
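The layer-wise idea can be sketched as follows: treat a layer's inputs as fixed features, compute the Gaussian posterior over that layer's weights in closed form, and use its mean and diagonal covariance as the initial variational parameters. A minimal version with an assumed prior precision `alpha` and noise variance `sigma2` (hypothetical defaults, not the paper's settings):

```python
import numpy as np

def bayes_linear_init(phi, y, alpha=1.0, sigma2=0.1):
    """Initialize a layer's variational parameters from the exact posterior of
    a Bayesian linear model: weights ~ N(0, 1/alpha), noise variance sigma2."""
    d = phi.shape[1]
    prec = phi.T @ phi / sigma2 + alpha * np.eye(d)   # posterior precision
    cov = np.linalg.inv(prec)
    mean = cov @ phi.T @ y / sigma2                   # posterior mean
    return mean, np.diag(cov)                         # mean + diagonal variances
```

Because the posterior is exact for a linear model, the resulting mean/variance pair is already a sensible Gaussian variational distribution, which is why such an initialization can converge faster than generic loss-minimization initializers.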
3D G-CNNs for Pulmonary Nodule Detection
Convolutional Neural Networks (CNNs) require a large amount of annotated data
to learn from, which is often difficult to obtain in the medical domain. In
this paper we show that the sample complexity of CNNs can be significantly
improved by using 3D roto-translation group convolutions (G-Convs) instead of
the more conventional translational convolutions. These 3D G-CNNs were applied
to the problem of false positive reduction for pulmonary nodule detection, and
proved to be substantially more effective in terms of performance, sensitivity
to malignant nodules, and speed of convergence compared to a strong and
comparable baseline architecture with regular convolutions, data augmentation
and a similar number of parameters. For every dataset size tested, the G-CNN
achieved a FROC score close to that of a CNN trained on ten times more data.
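A 2-D analogue conveys the core of a roto-translation group convolution: correlate the input with every rotated copy of the filter, so rotating the input permutes and rotates the output channels rather than producing unseen feature values. A numpy sketch for the four planar rotations (the paper works in 3-D; `conv2d_valid` is a naive helper written only for illustration):

```python
import numpy as np

def conv2d_valid(img, k):
    """Naive 'valid' 2-D cross-correlation (helper for the sketch)."""
    h, w = k.shape
    H, W = img.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + h, j:j + w] * k)
    return out

def p4_conv(img, k):
    """Lifting group convolution over the four planar rotations: one output
    channel per rotated filter copy, so a rotation of the input permutes and
    rotates the channels instead of producing unseen feature values."""
    return np.stack([conv2d_valid(img, np.rot90(k, r)) for r in range(4)])
```

This built-in equivariance is what improves sample complexity: the network never has to learn rotated copies of the same nodule pattern from data.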
A Two-Streamed Network for Estimating Fine-Scaled Depth Maps from Single RGB Images
Estimating depth from a single RGB image is an ill-posed and inherently
ambiguous problem. State-of-the-art deep learning methods can now estimate
accurate 2D depth maps, but when the maps are projected into 3D, they lack
local detail and are often highly distorted. We propose a fast-to-train
two-streamed CNN that predicts depth and depth gradients, which are then fused
together into an accurate and detailed depth map. We also define a novel set
loss over multiple images; by regularizing the estimation between a common set
of images, the network is less prone to over-fitting and achieves better
accuracy than competing methods. Experiments on the NYU Depth v2 dataset show
that our depth predictions are competitive with the state of the art and lead
to faithful 3D projections.
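The fusion step, combining a depth prediction with predicted depth gradients, amounts to a small least-squares problem. A 1-D numpy sketch minimizing ||d − d_pred||² + lam·||diff(d) − g_pred||² by gradient descent (the weight `lam` and descent schedule are illustrative assumptions, not the paper's fusion network):

```python
import numpy as np

def fuse_depth(d_pred, g_pred, lam=1.0, lr=0.05, steps=2000):
    """Fuse a coarse depth prediction with predicted depth gradients by
    minimizing ||d - d_pred||^2 + lam * ||diff(d) - g_pred||^2 (1-D sketch)."""
    d = d_pred.astype(float).copy()
    for _ in range(steps):
        r = np.diff(d) - g_pred          # residual on the predicted gradients
        g = 2.0 * (d - d_pred)           # gradient of the data-fidelity term
        g[:-1] -= 2.0 * lam * r          # d/dd_i of (d_{i+1} - d_i - g_i)^2
        g[1:] += 2.0 * lam * r
        d -= lr * g
    return d
```

The gradient term acts as a detail prior: noisy depth values are pulled toward the structure implied by the predicted gradients, which is the intuition behind recovering local detail in the fused map.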
A Survey of the Recent Architectures of Deep Convolutional Neural Networks
A deep Convolutional Neural Network (CNN) is a special type of neural network,
which has shown exemplary performance on several competitions related to
Computer Vision and Image Processing. Some of the exciting application areas of
CNN include Image Classification and Segmentation, Object Detection, Video
Processing, Natural Language Processing, and Speech Recognition. The powerful
learning ability of deep CNN is primarily due to the use of multiple feature
extraction stages that can automatically learn representations from the data.
The availability of large amounts of data and improvements in hardware
technology have accelerated the research in CNNs, and recently interesting deep
CNN architectures have been reported. Several inspiring ideas to bring
advancements in CNNs have been explored, such as the use of different
activation and loss functions, parameter optimization, regularization, and
architectural innovations. However, the significant improvement in the
representational capacity of the deep CNN is achieved through architectural
innovations. Notably, the ideas of exploiting spatial and channel information,
depth and width of architecture, and multi-path information processing have
gained substantial attention. Similarly, the idea of using a block of layers as
a structural unit is also gaining popularity. This survey thus focuses on the
intrinsic taxonomy present in the recently reported deep CNN architectures and,
consequently, classifies the recent innovations in CNN architectures into seven
different categories. These seven categories are based on spatial exploitation,
depth, multi-path, width, feature-map exploitation, channel boosting, and
attention. Additionally, the elementary understanding of CNN components,
current challenges, and applications of CNNs are also provided.
Comment: Number of Pages: 70, Number of Figures: 11, Number of Tables: 11.
Artif Intell Rev (2020
Poseidon: A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines
Deep learning (DL) has achieved notable successes in many machine learning
tasks. A number of frameworks have been developed to expedite the process of
designing and training deep neural networks (DNNs), such as Caffe, Torch and
Theano. Currently they can harness multiple GPUs on a single machine, but are
unable to use GPUs that are distributed across multiple machines; as even
average-sized DNNs can take days to train on a single GPU with 100s of GBs to
TBs of data, distributed GPUs present a prime opportunity for scaling up DL.
However, the limited bandwidth available on commodity Ethernet networks
presents a bottleneck to distributed GPU training, and prevents its trivial
realization.
To investigate how to adapt existing frameworks to efficiently support
distributed GPUs, we propose Poseidon, a scalable system architecture for
distributed inter-machine communication in existing DL frameworks. We integrate
Poseidon with Caffe and evaluate its performance at training DNNs for object
recognition. Poseidon features three key contributions that accelerate DNN
training on clusters: (1) a three-level hybrid architecture that allows
Poseidon to support both CPU-only and GPU-equipped clusters, (2) a distributed
wait-free backpropagation (DWBP) algorithm to improve GPU utilization and to
balance communication, and (3) a structure-aware communication protocol (SACP)
to minimize communication overheads. We empirically show that Poseidon
converges to the same objectives as a single machine, and achieves
state-of-the-art training speedup across multiple models and well-established datasets using a
commodity GPU cluster of 8 nodes (e.g. 4.5x speedup on AlexNet, 4x on
GoogLeNet, 4x on CIFAR-10). On the much larger ImageNet22K dataset, Poseidon
with 8 nodes achieves better speedup and competitive accuracy to recent
CPU-based distributed systems such as Adam and Le et al., which use 10s to
1000s of nodes.
Comment: 14 pages, 8 figures, 6 table
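The DWBP idea, pushing each layer's gradient onto the network as soon as the backward pass produces it, can be sketched with a background sender thread. Everything here (the `send` callback, the string stand-ins for gradient tensors) is hypothetical scaffolding, not Poseidon's implementation:

```python
import queue
import threading

def backprop_and_sync(layers, send):
    """Distributed wait-free backprop (sketch): hand each layer's gradient to a
    background sender as soon as the backward pass computes it, so network
    communication overlaps with the remaining gradient computation."""
    q = queue.Queue()

    def sender():
        while True:
            grad = q.get()
            if grad is None:           # sentinel: backward pass finished
                break
            send(grad)                 # stand-in for a push to the parameter server

    t = threading.Thread(target=sender)
    t.start()
    for layer in reversed(layers):     # backward pass visits top layers first
        grad = f"grad[{layer}]"        # stand-in for the real gradient tensor
        q.put(grad)                    # enqueue immediately; do not wait for I/O
    q.put(None)
    t.join()
```

Because the top layers' gradients are available first, their transfer can start while the earlier layers are still being differentiated, which is the source of the improved GPU utilization the abstract describes.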
Convolutional Neural Networks for Medical Image Analysis: Full Training or Fine Tuning?
Training a deep convolutional neural network (CNN) from scratch is difficult
because it requires a large amount of labeled training data and a great deal of
expertise to ensure proper convergence. A promising alternative is to fine-tune
a CNN that has been pre-trained using, for instance, a large set of labeled
natural images. However, the substantial differences between natural and
medical images may advise against such knowledge transfer. In this paper, we
seek to answer the following central question in the context of medical image
analysis: \emph{Can the use of pre-trained deep CNNs with sufficient
fine-tuning eliminate the need for training a deep CNN from scratch?} To
address this question, we considered 4 distinct medical imaging applications in
3 specialties (radiology, cardiology, and gastroenterology) involving
classification, detection, and segmentation from 3 different imaging
modalities, and investigated how the performance of deep CNNs trained from
scratch compared with the pre-trained CNNs fine-tuned in a layer-wise manner.
Our experiments consistently demonstrated that (1) the use of a pre-trained CNN
with adequate fine-tuning outperformed or, in the worst case, performed as well
as a CNN trained from scratch; (2) fine-tuned CNNs were more robust to the size
of training sets than CNNs trained from scratch; (3) neither shallow tuning nor
deep tuning was consistently the optimal choice across applications; and (4) our
layer-wise fine-tuning scheme could offer a practical way to reach the best
performance for the application at hand based on the amount of available data.
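The layer-wise scheme the study compares can be sketched as a schedule: freeze the early, generic layers at their pre-trained weights, fine-tune only the top `level` layers, and sweep `level` against a validation score. Both helpers below are illustrative (the function names and scoring callback are assumptions, not the paper's code):

```python
def layerwise_schedule(layers, level):
    """Layer-wise fine-tuning sketch: keep the early, generic layers frozen at
    their pre-trained weights and fine-tune only the top `level` layers."""
    cut = len(layers) - level
    return layers[:cut], layers[cut:]       # (frozen, tuned)

def pick_tuning_depth(layers, val_score):
    """Sweep every tuning depth and keep the one that validates best."""
    return max(range(len(layers) + 1),
               key=lambda k: val_score(layerwise_schedule(layers, k)[1]))
```

Sweeping the depth against held-out performance is the practical recipe finding (4) points at: the right amount of tuning depends on how much labeled medical data is available.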