Accelerating Neural Architecture Search using Performance Prediction
Methods for neural network hyperparameter optimization and meta-modeling are
computationally expensive due to the need to train a large number of model
configurations. In this paper, we show that standard frequentist regression
models can predict the final performance of partially trained model
configurations using features based on network architectures, hyperparameters,
and time-series validation performance data. We empirically show that our
performance prediction models are much more effective than prominent Bayesian
counterparts, are simpler to implement, and are faster to train. Our models can
predict final performance in both visual classification and language modeling
domains, are effective for predicting performance of drastically varying model
architectures, and can even generalize between model classes. Using these
prediction models, we also propose an early stopping method for hyperparameter
optimization and meta-modeling that achieves a speedup of up to 6x in both
settings. Finally, we empirically
show that our early stopping method can be seamlessly incorporated into both
reinforcement learning-based architecture selection algorithms and bandit-based
search methods. Through extensive experimentation, we empirically show that our
performance prediction models and early stopping algorithm are state-of-the-art
in terms of prediction accuracy and speedup achieved while still identifying
the optimal model configurations.
Comment: Submitted to International Conference on Learning Representations (2018).
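A minimal sketch of the idea (not the paper's exact feature set or model): fit a standard frequentist regression model on features extracted from partial learning curves and hyperparameters, then stop runs whose predicted final accuracy is unlikely to beat the incumbent. All names, the toy data, and the stopping margin below are illustrative, assuming scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def curve_features(val_acc, hp):
    """Features: recent curve values, summary statistics, and hyperparameters."""
    v = np.asarray(val_acc, dtype=float)
    return np.concatenate([v[-5:], [v.max(), v.mean(), v[-1] - v[0]], hp])

# Synthetic "fully trained" configurations standing in for real runs.
history = []
for _ in range(50):
    hp = rng.uniform(0, 1, size=2)                       # toy hyperparameters
    curve = np.sort(rng.uniform(0.4, 0.6 + 0.3 * hp[0], size=30))
    history.append({"curve": curve, "hp": hp, "final_acc": curve[-1]})

# Fit the performance predictor on partial curves (first 10 epochs here).
X = np.stack([curve_features(c["curve"][:10], c["hp"]) for c in history])
y = np.array([c["final_acc"] for c in history])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

def should_stop(partial_curve, hp, best_so_far, margin=0.01):
    """Terminate a run whose predicted final accuracy trails the incumbent."""
    pred = model.predict(curve_features(partial_curve, hp)[None, :])[0]
    return pred < best_so_far - margin
```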
High-dimensional dynamics of generalization error in neural networks
We perform an average case analysis of the generalization dynamics of large
neural networks trained using gradient descent. We study the
practically-relevant "high-dimensional" regime where the number of free
parameters in the network is on the order of or even larger than the number of
examples in the dataset. Using random matrix theory and exact solutions in
linear models, we derive the generalization error and training error dynamics
of learning and analyze how they depend on the dimensionality of the data and
the signal-to-noise ratio of the learning problem. We find that the dynamics of
gradient descent learning naturally protect against overtraining and
overfitting in large networks. Overtraining is worst at intermediate network
sizes, when the effective number of free parameters equals the number of
samples, and thus can be reduced by making a network smaller or larger.
Additionally, in the high-dimensional regime, low generalization error requires
starting with small initial weights. We then turn to non-linear neural
networks, and show that making networks very large does not harm their
generalization performance. On the contrary, it can in fact reduce
overtraining, even without early stopping or regularization of any sort. We
identify two novel phenomena underlying this behavior in overcomplete models:
first, there is a frozen subspace of the weights in which no learning occurs
under gradient descent; and second, the statistical properties of the
high-dimensional regime yield better-conditioned input correlations which
protect against overtraining. We demonstrate that naive application of
worst-case theories such as Rademacher complexity is inaccurate in predicting
the generalization performance of deep neural networks, and derive an
alternative bound which incorporates the frozen subspace and conditioning
effects and qualitatively matches the behavior observed in simulation.
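The frozen-subspace phenomenon is easy to see numerically in a toy overcomplete linear model: full-batch gradient descent only ever moves the weights within the row space of the data, so the orthogonal component stays at its initial value. A hedged sketch, not the paper's derivation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 100                        # n examples, p >> n parameters
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

w = 0.01 * rng.standard_normal(p)     # small initial weights
w0 = w.copy()
lr = 0.01
for _ in range(5000):
    w -= lr * X.T @ (X @ w - y) / n   # full-batch gradient descent

# Project the weight change onto the row space of X; the remainder is frozen.
Vh = np.linalg.svd(X, full_matrices=False)[2]   # n orthonormal rows
delta = w - w0
in_span = Vh.T @ (Vh @ delta)
print(np.linalg.norm(delta - in_span))          # ~0: orthogonal part never moved
```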
NeuNetS: An Automated Synthesis Engine for Neural Network Design
The application of neural networks to a wide variety of practical tasks is
transforming the way AI is used in practice. Pre-trained neural network
models available through APIs, and the ability to custom-train pre-built neural
network architectures on customer data, have made the consumption of AI by
developers much simpler and resulted in broad adoption of these complex AI
models. While pre-built network models exist for certain scenarios, meeting the
constraints unique to each application requires AI teams to develop custom
neural network architectures that balance accuracy against memory footprint
under the tight constraints of their use cases. However, only a small proportion of data science
teams have the skills and experience needed to create a neural network from
scratch, and the demand far exceeds the supply. In this paper, we present
NeuNetS: an automated neural network synthesis engine for custom neural
network design, available as part of IBM's AI OpenScale product.
NeuNetS is available for both Text and Image domains and can build neural
networks for specific tasks in a fraction of the time it takes today with human
effort, and with accuracy similar to that of human-designed AI models.
Comment: 14 pages, 12 figures. arXiv admin note: text overlap with arXiv:1806.0025
Neural Network Multitask Learning for Traffic Flow Forecasting
Traditional neural network approaches for traffic flow forecasting are
usually single task learning (STL) models, which do not take advantage of the
information provided by related tasks. In contrast to STL, multitask learning
(MTL) has the potential to improve generalization by transferring information
in training signals of extra tasks. In this paper, MTL based neural networks
are used for traffic flow forecasting. For neural network MTL, a
backpropagation (BP) network is constructed by incorporating traffic flows at
several contiguous time instants into an output layer. Nodes in the output
layer can be seen as outputs of different but closely related STL tasks.
Comprehensive experiments on urban vehicular traffic flow data and comparisons
with STL show that MTL in BP neural networks is a promising and effective
approach for traffic flow forecasting.
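A minimal sketch of the multitask setup described above, assuming a generic multi-output MLP (scikit-learn's MLPRegressor here) and synthetic data in place of the paper's urban traffic measurements: the shared hidden layer carries the common representation, and each output node forecasts the flow at one of several contiguous future time instants, i.e. one closely related STL task per node.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
t = np.arange(2000)
flow = np.sin(2 * np.pi * t / 96) + 0.1 * rng.standard_normal(t.size)

lags, horizons = 8, 3                 # 8 past values in, 3 future values out
X = np.stack([flow[i:i + lags] for i in range(t.size - lags - horizons)])
Y = np.stack([flow[i + lags:i + lags + horizons]
              for i in range(t.size - lags - horizons)])

# Multi-output MLP = shared hidden layer + one output node per MTL task.
net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                   random_state=0).fit(X[:1500], Y[:1500])
print(net.score(X[1500:], Y[1500:]))  # R^2 on held-out data
```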
Fast Hyperparameter Optimization of Deep Neural Networks via Ensembling Multiple Surrogates
The performance of deep neural networks crucially depends on good
hyperparameter configurations. Bayesian optimization is a powerful framework
for optimizing the hyperparameters of DNNs. Such methods need sufficient
evaluation data to approximate and minimize the validation error function of
hyperparameters. However, the expensive evaluation cost of DNNs leads to very
few evaluation data within a limited time, which greatly reduces the efficiency
of Bayesian optimization. Moreover, previous work focuses on using only the
complete evaluation data for Bayesian optimization and ignores the
intermediate evaluation data generated by early stopping methods. To alleviate
the insufficient evaluation data problem, we propose a fast hyperparameter
optimization method, HOIST, that utilizes both the complete and intermediate
evaluation data to accelerate the hyperparameter optimization of DNNs.
Specifically, we train multiple basic surrogates to gather information from the
mixed evaluation data, and then combine all basic surrogates using weighted
bagging to provide an accurate ensemble surrogate. Our empirical studies show
that HOIST outperforms the state-of-the-art approaches on a wide range of DNNs,
including feedforward neural networks, convolutional neural networks,
recurrent neural networks, and variational autoencoders.
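A hedged sketch of the ensemble-surrogate idea: one surrogate per fidelity level (intermediate results from early-stopped runs plus complete runs), combined with weights that favor surrogates that rank the fully evaluated configurations well. The rank-correlation weighting below is illustrative; HOIST's exact weighted-bagging scheme differs in detail.

```python
import numpy as np
from scipy.stats import kendalltau
from sklearn.ensemble import RandomForestRegressor

def fit_ensemble(data_per_fidelity, X_full, y_full):
    """data_per_fidelity: list of (X, y) pairs, one per early-stopping stage."""
    surrogates, weights = [], []
    for X, y in data_per_fidelity:
        s = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
        # Weight each surrogate by rank agreement with the complete evaluations.
        tau, _ = kendalltau(s.predict(X_full), y_full)
        surrogates.append(s)
        weights.append(max(tau, 0.0))
    weights = np.array(weights) / (np.sum(weights) + 1e-12)
    return surrogates, weights

def predict(surrogates, weights, X):
    return sum(w * s.predict(X) for s, w in zip(surrogates, weights))

# Tiny synthetic demo: a noisy intermediate stage plus the complete stage.
rng = np.random.default_rng(0)
X_full = rng.uniform(size=(30, 3))
y_full = X_full @ np.array([1.0, -2.0, 0.5])
stages = [(X_full, y_full + 0.3 * rng.standard_normal(30)),  # intermediate
          (X_full, y_full)]                                   # complete
surrogates, weights = fit_ensemble(stages, X_full, y_full)
print(predict(surrogates, weights, X_full[:3]))
```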
3D Deep Learning for Biological Function Prediction from Physical Fields
Predicting the biological function of molecules, be it proteins or drug-like
compounds, from their atomic structure is an important and long-standing
problem. Function is dictated by structure, since molecules interact with each
other through spatial interactions, in terms of both steric complementarity
and intermolecular forces. Thus, the electron density
field and electrostatic potential field of a molecule contain the "raw
fingerprint" of how this molecule can fit to binding partners. In this paper,
we show that deep learning can predict biological function of molecules
directly from their raw 3D approximated electron density and electrostatic
potential fields. Protein function based on Enzyme Commission (EC) numbers is predicted from the
approximated electron density field. In another experiment, the activity of
small molecules is predicted with quality comparable to state-of-the-art
descriptor-based methods. We propose several alternative computational models
for the GPU with different memory and runtime requirements for different sizes
of molecules and of databases. We also propose application-specific
multi-channel data representations. With future improvements of training
datasets and neural network settings in combination with complementary
information sources (sequence, genomic context, expression level), deep
learning can be expected to show its generalization power and revolutionize the
field of molecular function prediction.
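A schematic of the kind of model involved, assuming PyTorch: a small 3D CNN over two-channel voxel grids (approximated electron density and electrostatic potential). Grid size, layer widths, and the six-class output are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class FieldCNN(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):              # x: (batch, 2, D, H, W) voxel grids
        return self.classifier(self.features(x).flatten(1))

net = FieldCNN(n_classes=6)            # e.g. six top-level EC classes
logits = net(torch.randn(4, 2, 32, 32, 32))
print(logits.shape)                    # torch.Size([4, 6])
```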
Checkpoint Ensembles: Ensemble Methods from a Single Training Process
We present checkpoint ensembles, a method that learns ensemble models within
a single training process. Although checkpoint ensembles can be applied to any
parametric iterative learning technique, here we focus on neural networks.
Neural networks' composable and simple neurons make it possible to capture many
individual and interaction effects among features. However, small sample sizes
and sampling noise may result in patterns in the training data that are not
representative of the true relationship between the features and the outcome.
As a solution, regularization during training is often used (e.g. dropout).
However, regularization is no panacea -- it does not perfectly address
overfitting. Even with methods like dropout, two methodologies are commonly
used in practice. The first is to use a validation set, independent of the
training set, to decide when to stop training. The second is to use
ensemble methods to further reduce overfitting and take advantage of local
optima (i.e., averaging over the predictions of several models). In this paper,
we explore checkpoint ensembles -- a simple technique that combines these two
ideas in one training process. Checkpoint ensembles improve performance by
averaging the predictions from "checkpoints" of the best models within a
single training process. On three real-world data sets -- text, image, and
electronic health record data -- and with three prediction models (a vanilla
neural network, a convolutional neural network, and a long short-term memory
network), we show that checkpoint ensembles outperform existing methods: a method
that selects a model by minimum validation score, and two methods that average
models by weights. Our results also show that checkpoint ensembles capture a
portion of the performance gains that traditional ensembles provide.
Comment: 7 pages, 4 figures, under review at AAAI
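A minimal sketch of checkpoint ensembling: during one training run, keep the k checkpoints with the best validation scores, then average their predictions at test time. The heap bookkeeping and the PyTorch-style load_state_dict call are illustrative choices, not the paper's code.

```python
import heapq

class CheckpointEnsemble:
    def __init__(self, k=5):
        self.k = k
        self.best = []                  # min-heap of (val_score, step, params)

    def update(self, val_score, step, params):
        """Call at each validation checkpoint with a snapshot of the weights."""
        item = (val_score, step, params)
        if len(self.best) < self.k:
            heapq.heappush(self.best, item)
        else:
            heapq.heappushpop(self.best, item)   # drop the worst kept model

    def predict(self, model, x):
        """Average predictions over the retained checkpoints."""
        preds = []
        for _, _, params in self.best:
            model.load_state_dict(params)        # assumes a PyTorch-style model
            preds.append(model(x))
        return sum(preds) / len(preds)
```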
Deep learning electromagnetic inversion with convolutional neural networks
Geophysical inversion attempts to estimate the distribution of physical
properties in the Earth's interior from observations collected at or above the
surface. Inverse problems are commonly posed as least-squares optimization
problems in high-dimensional parameter spaces. Existing approaches are largely
based on deterministic gradient-based methods, which are limited by
nonlinearity and nonuniqueness of the inverse problem. Probabilistic inversion
methods, despite their great potential for uncertainty quantification, remain
computationally formidable. In this paper, I explore the potential
of deep learning methods for electromagnetic inversion. This approach does not
require calculation of the gradient and provides results instantaneously. Deep
neural networks based on fully convolutional architecture are trained on large
synthetic datasets obtained by full 3-D simulations. The performance of the
method is demonstrated on models of strong practical relevance representing an
onshore controlled-source electromagnetic (CSEM) CO2 monitoring scenario. The
pre-trained networks can reliably estimate the position and lateral dimensions
of the anomalies, as well as their resistivity properties. Several fully
convolutional network architectures are compared in terms of their accuracy,
generalization, and cost of training. Examples with different survey geometry
and noise levels confirm the feasibility of the deep learning inversion,
opening the possibility to estimate the subsurface resistivity distribution in
real time.
Comment: 27 pages, 14 figures
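A schematic of the approach, assuming PyTorch: a fully convolutional encoder-decoder maps gridded EM observations to a resistivity image, so inversion at inference time is a single forward pass with no forward-solver gradients. Channel counts and grid sizes are placeholders, not the paper's setup.

```python
import torch
import torch.nn as nn

inversion_net = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),          # encode
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(), # decode
    nn.Conv2d(32, 1, 3, padding=1),    # resistivity map (e.g. log scale)
)

obs = torch.randn(8, 1, 64, 64)        # synthetic gridded survey data
print(inversion_net(obs).shape)        # torch.Size([8, 1, 64, 64])
```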
Which Tasks Should Be Learned Together in Multi-task Learning?
Many computer vision applications require solving multiple tasks in
real-time. A neural network can be trained to solve multiple tasks
simultaneously using multi-task learning. This can save computation at
inference time as only a single network needs to be evaluated. Unfortunately,
this often leads to inferior overall performance as task objectives can
compete, which consequently poses the question: which tasks should and should
not be learned together in one network when employing multi-task learning? We
study task cooperation and competition in several different learning settings
and propose a framework for assigning tasks to a few neural networks such that
cooperating tasks are computed by the same neural network, while competing
tasks are computed by different networks. Our framework offers a time-accuracy
trade-off and can produce better accuracy using less inference time than not
only a single large multi-task neural network but also many single-task
networks.
Comment: Presented at ICML 2020. See the project website at http://taskgrouping.stanford.edu
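A toy sketch of the grouping search: enumerate the set partitions of the tasks, score each grouping from precomputed per-group accuracy estimates, and keep the best grouping that fits an inference-time budget. The accuracy table below is made up; in the paper such estimates come from training candidate multi-task networks.

```python
def partitions(tasks):
    """Yield all set partitions of a list of tasks."""
    if not tasks:
        yield []
        return
    first, rest = tasks[0], tasks[1:]
    for sub in partitions(rest):
        for i in range(len(sub)):      # put `first` into an existing group
            yield sub[:i] + [[first] + sub[i]] + sub[i + 1:]
        yield [[first]] + sub          # or give `first` its own network

def best_grouping(tasks, group_acc, cost, budget):
    """group_acc(group) -> summed task accuracy; cost(group) -> runtime."""
    feasible = [p for p in partitions(tasks)
                if sum(cost(g) for g in p) <= budget]
    return max(feasible, key=lambda p: sum(group_acc(g) for g in p))

tasks = ["depth", "normals", "segmentation"]
acc = {("depth",): 0.70, ("normals",): 0.72, ("segmentation",): 0.68,
       ("depth", "normals"): 1.50, ("depth", "segmentation"): 1.30,
       ("normals", "segmentation"): 1.35,
       ("depth", "normals", "segmentation"): 2.00}
print(best_grouping(tasks, lambda g: acc[tuple(sorted(g))],
                    lambda g: 1.0, budget=2.0))  # at most two networks
```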
Multimodal and Multiscale Deep Neural Networks for the Early Diagnosis of Alzheimer's Disease using structural MR and FDG-PET images
Alzheimer's Disease (AD) is a progressive neurodegenerative disease. Amnestic
mild cognitive impairment (MCI) is a common first symptom before the conversion
to clinical impairment where the individual becomes unable to perform
activities of daily living independently. Although there is currently no
treatment available, the earlier a conclusive diagnosis is made, the earlier
the potential for interventions to delay or perhaps even prevent progression to
full-blown AD. Neuroimaging scans acquired from MRI and metabolism images
obtained by FDG-PET provide an in-vivo view into the structure and function
(glucose metabolism) of the living brain. It is hypothesized that combining
different image modalities could better characterize changes in the human
brain and result in a more accurate early diagnosis of AD. In this paper, we
propose a novel framework to discriminate normal control (NC) subjects from
subjects with AD pathology (AD patients, and MCI subjects who later convert
to AD). Our novel approach, utilizing a multimodal and multiscale deep neural
network, delivers 85.68% accuracy in predicting subjects within 3 years of
conversion. Cross-validation experiments show that it has better
discrimination ability than results reported in the existing published literature.
Comment: 12 pages, 4 figures, Alzheimer's disease, deep learning, multimodal, early diagnosis, multiscale
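A schematic of the multimodal architecture, assuming PyTorch: separate 3D convolutional encoders for the structural MRI and FDG-PET volumes, fused by concatenation before a shared classifier head. Layer sizes are placeholders; the paper's multiscale design is more elaborate.

```python
import torch
import torch.nn as nn

def encoder():
    """One small 3D convolutional branch per imaging modality."""
    return nn.Sequential(
        nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
        nn.Conv3d(8, 16, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool3d(1), nn.Flatten(),
    )

class MultimodalNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.mri, self.pet = encoder(), encoder()
        self.head = nn.Linear(32, 2)     # NC vs. AD-pathology

    def forward(self, mri, pet):
        # Late fusion: concatenate the two modality embeddings.
        return self.head(torch.cat([self.mri(mri), self.pet(pet)], dim=1))

net = MultimodalNet()
out = net(torch.randn(2, 1, 32, 32, 32), torch.randn(2, 1, 32, 32, 32))
print(out.shape)                         # torch.Size([2, 2])
```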