
    Accelerating Neural Architecture Search using Performance Prediction

    Methods for neural network hyperparameter optimization and meta-modeling are computationally expensive due to the need to train a large number of model configurations. In this paper, we show that standard frequentist regression models can predict the final performance of partially trained model configurations using features based on network architectures, hyperparameters, and time-series validation performance data. We empirically show that our performance prediction models are much more effective than prominent Bayesian counterparts, are simpler to implement, and are faster to train. Our models can predict final performance in both visual classification and language modeling domains, are effective for predicting the performance of drastically varying model architectures, and can even generalize between model classes. Using these prediction models, we also propose an early stopping method for hyperparameter optimization and meta-modeling, which obtains a speedup of up to 6x in both settings. Finally, we empirically show that our early stopping method can be seamlessly incorporated into both reinforcement learning-based architecture selection algorithms and bandit-based search methods. Through extensive experimentation, we show that our performance prediction models and early stopping algorithm are state-of-the-art in terms of prediction accuracy and speedup achieved while still identifying the optimal model configurations. Comment: Submitted to the International Conference on Learning Representations, 2018.
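
    As a rough illustration of the idea (not the authors' exact models or features), the sketch below fits a standard frequentist regressor on hyperparameters plus an observed partial validation curve, and uses its prediction of final accuracy as an early-stopping rule. All data, feature dimensions, and the margin parameter are placeholders.

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        rng = np.random.default_rng(0)

        # Stand-in training data from previously completed runs: each row holds a few
        # hyperparameter values plus the first epochs of the validation curve, and the
        # target is the final validation accuracy.
        X = rng.random((200, 13))
        y = 0.5 * X[:, -1] + 0.4 + 0.05 * rng.standard_normal(200)

        model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

        def should_stop(features, best_so_far, margin=0.0):
            # Early stopping: halt a run whose predicted final accuracy is not
            # expected to beat the best fully trained configuration seen so far.
            predicted_final = model.predict(np.asarray(features).reshape(1, -1))[0]
            return predicted_final + margin < best_so_far

        print(should_stop(rng.random(13), best_so_far=0.9))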

    High-dimensional dynamics of generalization error in neural networks

    We perform an average-case analysis of the generalization dynamics of large neural networks trained using gradient descent. We study the practically relevant "high-dimensional" regime where the number of free parameters in the network is on the order of, or even larger than, the number of examples in the dataset. Using random matrix theory and exact solutions in linear models, we derive the generalization error and training error dynamics of learning and analyze how they depend on the dimensionality of the data and the signal-to-noise ratio of the learning problem. We find that the dynamics of gradient descent learning naturally protect against overtraining and overfitting in large networks. Overtraining is worst at intermediate network sizes, when the effective number of free parameters equals the number of samples, and thus can be reduced by making a network smaller or larger. Additionally, in the high-dimensional regime, low generalization error requires starting with small initial weights. We then turn to non-linear neural networks, and show that making networks very large does not harm their generalization performance. On the contrary, it can in fact reduce overtraining, even without early stopping or regularization of any sort. We identify two novel phenomena underlying this behavior in overcomplete models: first, there is a frozen subspace of the weights in which no learning occurs under gradient descent; and second, the statistical properties of the high-dimensional regime yield better-conditioned input correlations which protect against overtraining. We demonstrate that naive application of worst-case theories such as Rademacher complexity is inaccurate in predicting the generalization performance of deep neural networks, and derive an alternative bound which incorporates the frozen subspace and conditioning effects and qualitatively matches the behavior observed in simulation.
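
    The toy numpy experiment below is in the spirit of the paper's linear analysis: gradient descent on a linear regression problem where the number of parameters is comparable to the number of examples, tracking training and test error over time from small initial weights. The dimensions, noise level, and learning rate are illustrative choices, not values from the paper.

        import numpy as np

        rng = np.random.default_rng(1)
        n_train, n_test, d = 100, 1000, 100      # parameters comparable to examples
        w_true = rng.standard_normal(d) / np.sqrt(d)
        noise = 0.5

        X = rng.standard_normal((n_train, d))
        y = X @ w_true + noise * rng.standard_normal(n_train)
        X_test = rng.standard_normal((n_test, d))
        y_test = X_test @ w_true + noise * rng.standard_normal(n_test)

        w = np.zeros(d)                          # small initial weights
        lr = 1e-3
        for step in range(5001):
            w -= lr * X.T @ (X @ w - y) / n_train
            if step % 1000 == 0:
                train_err = np.mean((X @ w - y) ** 2)
                test_err = np.mean((X_test @ w - y_test) ** 2)
                print(f"step {step:5d}  train {train_err:.3f}  test {test_err:.3f}")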

    NeuNetS: An Automated Synthesis Engine for Neural Network Design

    The application of neural networks to a vast variety of practical problems is transforming the way AI is applied in practice. Pre-trained neural network models available through APIs, and the capability to custom-train pre-built neural network architectures with customer data, have made the consumption of AI by developers much simpler and resulted in broad adoption of these complex AI models. While pre-built network models exist for certain scenarios, to meet the constraints unique to each application, AI teams need to develop custom neural network architectures that balance accuracy against memory footprint within the tight constraints of their use cases. However, only a small proportion of data science teams have the skills and experience needed to create a neural network from scratch, and the demand far exceeds the supply. In this paper, we present NeuNetS, an automated neural network synthesis engine for custom neural network design that is available as part of IBM's AI OpenScale product. NeuNetS is available for both text and image domains and can build neural networks for specific tasks in a fraction of the time it takes today with human effort, and with accuracy similar to that of human-designed AI models. Comment: 14 pages, 12 figures. arXiv admin note: text overlap with arXiv:1806.0025

    Neural Network Multitask Learning for Traffic Flow Forecasting

    Traditional neural network approaches for traffic flow forecasting are usually single-task learning (STL) models, which do not take advantage of the information provided by related tasks. In contrast to STL, multitask learning (MTL) has the potential to improve generalization by transferring information in the training signals of extra tasks. In this paper, MTL-based neural networks are used for traffic flow forecasting. For neural network MTL, a backpropagation (BP) network is constructed by incorporating traffic flows at several contiguous time instants into an output layer. Nodes in the output layer can be seen as outputs of different but closely related STL tasks. Comprehensive experiments on urban vehicular traffic flow data and comparisons with STL show that MTL in BP neural networks is a promising and effective approach for traffic flow forecasting.
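
    A hedged sketch of the construction described above: a small feed-forward network whose output layer predicts traffic flow at several contiguous time instants at once, so each output node plays the role of one closely related STL task trained jointly. Layer sizes, the sigmoid activation, and the forecasting horizon are illustrative choices, not the paper's configuration.

        import torch
        import torch.nn as nn

        n_lags, n_tasks, hidden = 12, 3, 64   # past observations in, future instants out

        model = nn.Sequential(
            nn.Linear(n_lags, hidden),
            nn.Sigmoid(),                     # classic BP-style hidden layer
            nn.Linear(hidden, n_tasks),       # one output node per related task
        )

        x = torch.randn(32, n_lags)           # batch of lagged flow measurements
        target = torch.randn(32, n_tasks)     # flows at the next n_tasks instants

        loss = nn.functional.mse_loss(model(x), target)  # one shared loss over all tasks
        loss.backward()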

    Fast Hyperparameter Optimization of Deep Neural Networks via Ensembling Multiple Surrogates

    The performance of deep neural networks crucially depends on good hyperparameter configurations. Bayesian optimization is a powerful framework for optimizing the hyperparameters of DNNs. These methods need sufficient evaluation data to approximate and minimize the validation error function of hyperparameters. However, the expensive evaluation cost of DNNs leads to very little evaluation data within a limited time, which greatly reduces the efficiency of Bayesian optimization. Moreover, previous research has focused on using the complete evaluation data to conduct Bayesian optimization, ignoring the intermediate evaluation data generated by early stopping methods. To alleviate the insufficient evaluation data problem, we propose a fast hyperparameter optimization method, HOIST, that utilizes both the complete and intermediate evaluation data to accelerate the hyperparameter optimization of DNNs. Specifically, we train multiple basic surrogates to gather information from the mixed evaluation data, and then combine all basic surrogates using weighted bagging to provide an accurate ensemble surrogate. Our empirical studies show that HOIST outperforms the state-of-the-art approaches on a wide range of DNNs, including feed-forward neural networks, convolutional neural networks, recurrent neural networks, and variational autoencoders.
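
    A minimal sketch of the ensemble-surrogate idea (not the HOIST implementation): one basic surrogate is fit on fully evaluated configurations and another on intermediate results from early-stopped runs, and their predictions are combined with weights derived from a small held-out set. The inverse-error weighting is a simple stand-in for the paper's weighted bagging, and all data here are synthetic placeholders.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor

        rng = np.random.default_rng(2)
        X_full, y_full = rng.random((20, 4)), rng.random(20)    # complete evaluations
        X_part, y_part = rng.random((60, 4)), rng.random(60)    # intermediate evaluations
        X_val, y_val = rng.random((10, 4)), rng.random(10)      # held-out configurations

        surrogates = [GaussianProcessRegressor().fit(X_full, y_full),
                      GaussianProcessRegressor().fit(X_part, y_part)]

        # Weight each basic surrogate by its inverse validation error.
        errors = np.array([np.mean((s.predict(X_val) - y_val) ** 2) for s in surrogates])
        weights = 1.0 / (errors + 1e-8)
        weights /= weights.sum()

        def ensemble_predict(X_new):
            preds = np.stack([s.predict(X_new) for s in surrogates])
            return weights @ preds            # weighted combination of basic surrogates

        print(ensemble_predict(rng.random((3, 4))))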

    3D Deep Learning for Biological Function Prediction from Physical Fields

    Predicting the biological function of molecules, be they proteins or drug-like compounds, from their atomic structure is an important and long-standing problem. Function is dictated by structure, since it is through spatial interactions that molecules interact with each other, both in terms of steric complementarity and intermolecular forces. Thus, the electron density field and electrostatic potential field of a molecule contain the "raw fingerprint" of how this molecule can fit to binding partners. In this paper, we show that deep learning can predict the biological function of molecules directly from their raw 3D approximated electron density and electrostatic potential fields. Protein function based on EC numbers is predicted from the approximated electron density field. In another experiment, the activity of small molecules is predicted with quality comparable to that of state-of-the-art descriptor-based methods. We propose several alternative computational models for the GPU with different memory and runtime requirements for different sizes of molecules and of databases. We also propose application-specific multi-channel data representations. With future improvements of training datasets and neural network settings, in combination with complementary information sources (sequence, genomic context, expression level), deep learning can be expected to show its generalization power and revolutionize the field of molecular function prediction.
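
    An illustrative sketch, assuming the fields are voxelized onto a regular grid with one channel for the approximated electron density and one for the electrostatic potential: a small 3D convolutional classifier of the general kind described above. Grid size, channel count, and the number of classes are placeholder values.

        import torch
        import torch.nn as nn

        n_channels, grid, n_classes = 2, 32, 10   # density + potential channels

        model = nn.Sequential(
            nn.Conv3d(n_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),              # pool to one feature vector per molecule
            nn.Flatten(),
            nn.Linear(32, n_classes),             # e.g. scores over EC-number classes
        )

        fields = torch.randn(4, n_channels, grid, grid, grid)   # batch of voxelized molecules
        print(model(fields).shape)                # torch.Size([4, 10])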

    Checkpoint Ensembles: Ensemble Methods from a Single Training Process

    We present the checkpoint ensembles method, which can learn ensemble models from a single training process. Although checkpoint ensembles can be applied to any parametric iterative learning technique, here we focus on neural networks. Neural networks' composable and simple neurons make it possible to capture many individual and interaction effects among features. However, small sample sizes and sampling noise may result in patterns in the training data that are not representative of the true relationship between the features and the outcome. As a solution, regularization during training is often used (e.g. dropout). However, regularization is no panacea -- it does not perfectly address overfitting. Even with methods like dropout, two methodologies are commonly used in practice. The first is to use a validation set independent of the training set to decide when to stop training. The second is to use ensemble methods to further reduce overfitting and take advantage of local optima (i.e. averaging over the predictions of several models). In this paper, we explore checkpoint ensembles -- a simple technique that combines these two ideas in one training process. Checkpoint ensembles improve performance by averaging the predictions from "checkpoints" of the best models within a single training process. We use three real-world data sets -- text, image, and electronic health record data -- and three prediction models: a vanilla neural network, a convolutional neural network, and a long short-term memory network to show that checkpoint ensembles outperform existing methods: a method that selects a model by minimum validation score, and two methods that average models by weights. Our results also show that checkpoint ensembles capture a portion of the performance gains that traditional ensembles provide. Comment: 7 pages, 4 figures, under review AAA
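
    A minimal sketch of the checkpoint-ensemble mechanics under stated assumptions: during one training run, keep the k checkpoints with the best validation scores, then average their predictions at test time. The surrounding training loop, model, and data are left out as placeholders.

        import copy
        import heapq
        import itertools
        import torch

        _counter = itertools.count()   # tie-breaker so state dicts are never compared

        def update_checkpoints(best, model, val_score, k=5):
            # Keep the k checkpoints with the highest validation scores in a min-heap.
            heapq.heappush(best, (val_score, next(_counter), copy.deepcopy(model.state_dict())))
            if len(best) > k:
                heapq.heappop(best)    # discard the worst stored checkpoint
            return best

        def checkpoint_ensemble_predict(model, best, x):
            # Average the predictions of all stored checkpoints.
            preds = []
            for _, _, state in best:
                model.load_state_dict(state)
                with torch.no_grad():
                    preds.append(model(x))
            return torch.stack(preds).mean(dim=0)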

    Deep learning electromagnetic inversion with convolutional neural networks

    Geophysical inversion attempts to estimate the distribution of physical properties in the Earth's interior from observations collected at or above the surface. Inverse problems are commonly posed as least-squares optimization problems in high-dimensional parameter spaces. Existing approaches are largely based on deterministic gradient-based methods, which are limited by the nonlinearity and nonuniqueness of the inverse problem. Probabilistic inversion methods, despite their great potential for uncertainty quantification, still remain a formidable computational task. In this paper, I explore the potential of deep learning methods for electromagnetic inversion. This approach does not require calculation of the gradient and provides results instantaneously. Deep neural networks based on fully convolutional architectures are trained on large synthetic datasets obtained by full 3-D simulations. The performance of the method is demonstrated on models of strong practical relevance representing an onshore controlled-source electromagnetic CO2 monitoring scenario. The pre-trained networks can reliably estimate the position and lateral dimensions of the anomalies, as well as their resistivity properties. Several fully convolutional network architectures are compared in terms of their accuracy, generalization, and cost of training. Examples with different survey geometries and noise levels confirm the feasibility of deep learning inversion, opening the possibility of estimating the subsurface resistivity distribution in real time. Comment: 27 pages, 14 figures
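
    A hedged sketch of the general approach, not the architectures studied in the paper: a small fully convolutional network that maps a grid of simulated electromagnetic responses to a grid of subsurface resistivity values, trained with a pixel-wise loss on synthetic pairs. Input and output shapes are illustrative.

        import torch
        import torch.nn as nn

        model = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),   # predicted resistivity map
        )

        data = torch.randn(8, 1, 64, 64)          # synthetic EM responses (e.g. receivers x frequencies)
        resistivity_true = torch.randn(8, 1, 64, 64)
        loss = nn.functional.mse_loss(model(data), resistivity_true)
        loss.backward()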

    Which Tasks Should Be Learned Together in Multi-task Learning?

    Many computer vision applications require solving multiple tasks in real time. A neural network can be trained to solve multiple tasks simultaneously using multi-task learning. This can save computation at inference time, as only a single network needs to be evaluated. Unfortunately, this often leads to inferior overall performance as task objectives can compete, which consequently poses the question: which tasks should and should not be learned together in one network when employing multi-task learning? We study task cooperation and competition in several different learning settings and propose a framework for assigning tasks to a few neural networks such that cooperating tasks are computed by the same neural network, while competing tasks are computed by different networks. Our framework offers a time-accuracy trade-off and can produce better accuracy using less inference time than not only a single large multi-task neural network but also many single-task networks. Comment: Presented at ICML 2020. See the project website at http://taskgrouping.stanford.edu
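
    An illustrative toy version of the task-grouping decision, not the authors' framework: given pairwise task "affinities" estimated from pilot runs, enumerate assignments of tasks to a fixed number of networks and keep the grouping with the highest total within-group affinity under a crude per-network capacity limit standing in for the inference-time budget. Task names and affinity values are made up for the example.

        import itertools
        import numpy as np

        tasks = ["depth", "normals", "segmentation", "edges"]
        affinity = np.array([[0.0, 0.8, 0.2, 0.1],
                             [0.8, 0.0, 0.3, 0.2],
                             [0.2, 0.3, 0.0, 0.7],
                             [0.1, 0.2, 0.7, 0.0]])

        def grouping_score(groups):
            # Total affinity of task pairs that share a network.
            return sum(affinity[i, j] for g in groups
                       for i, j in itertools.combinations(g, 2))

        best = None
        for labels in itertools.product(range(2), repeat=len(tasks)):  # assign to 2 networks
            groups = [[i for i, lab in enumerate(labels) if lab == k] for k in range(2)]
            if max(len(g) for g in groups) > 2:   # stand-in for an inference budget
                continue
            score = grouping_score(groups)
            if best is None or score > best[0]:
                best = (score, groups)

        print(best)   # (1.5, [[0, 1], [2, 3]]): depth with normals, segmentation with edges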

    Multimodal and Multiscale Deep Neural Networks for the Early Diagnosis of Alzheimer's Disease using structural MR and FDG-PET images

    Alzheimer's Disease (AD) is a progressive neurodegenerative disease. Amnestic mild cognitive impairment (MCI) is a common first symptom before the conversion to clinical impairment, where the individual becomes unable to perform activities of daily living independently. Although there is currently no treatment available, the earlier a conclusive diagnosis is made, the earlier the potential for interventions to delay or perhaps even prevent progression to full-blown AD. Neuroimaging scans acquired by MRI and metabolism images obtained by FDG-PET provide an in-vivo view of the structure and function (glucose metabolism) of the living brain. It is hypothesized that combining different image modalities could better characterize changes in the human brain and result in a more accurate early diagnosis of AD. In this paper, we propose a novel framework to discriminate normal control (NC) subjects from subjects with AD pathology (AD, and MCI subjects who convert to AD in the future). Our novel approach, utilizing a multimodal and multiscale deep neural network, was found to deliver an 85.68% accuracy in the prediction of subjects within 3 years of conversion. Cross-validation experiments show that it has better discrimination ability than results in the existing published literature. Comment: 12 pages, 4 figures, Alzheimer's disease, deep learning, multimodal, early diagnosis, multiscale
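
    A hedged sketch of the multimodal fusion idea rather than the paper's architecture: two branches encode structural-MRI-derived and FDG-PET-derived features separately, and their representations are concatenated before a classifier separating NC from AD-pathology subjects. Feature dimensions and layer sizes are placeholders; the paper works from imaging data at multiple scales.

        import torch
        import torch.nn as nn

        class MultimodalNet(nn.Module):
            def __init__(self, mri_dim=200, pet_dim=200, hidden=64):
                super().__init__()
                self.mri_branch = nn.Sequential(nn.Linear(mri_dim, hidden), nn.ReLU())
                self.pet_branch = nn.Sequential(nn.Linear(pet_dim, hidden), nn.ReLU())
                self.classifier = nn.Linear(2 * hidden, 2)   # NC vs. AD pathology

            def forward(self, mri, pet):
                fused = torch.cat([self.mri_branch(mri), self.pet_branch(pet)], dim=1)
                return self.classifier(fused)

        model = MultimodalNet()
        logits = model(torch.randn(16, 200), torch.randn(16, 200))
        print(logits.shape)   # torch.Size([16, 2])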