
    Practical recommendations for gradient-based training of deep architectures

    Learning algorithms for artificial neural networks, and for deep learning in particular, may seem to involve many bells and whistles, known as hyper-parameters. This chapter is meant as a practical guide with recommendations for some of the most commonly used hyper-parameters, in particular in the context of learning algorithms based on back-propagated gradients and gradient-based optimization. It also discusses how to deal with the fact that more interesting results can be obtained when many hyper-parameters are allowed to vary. Overall, it describes elements of the practice used to successfully and efficiently train and debug large-scale and often deep multi-layer neural networks. It closes with open questions about the training difficulties observed with deeper architectures.
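
    For illustration, a minimal sketch (not drawn from the chapter itself) of where such hyper-parameters enter a back-propagated gradient training loop, here in PyTorch; the values shown are placeholders, not the chapter's recommendations:

```python
# Hypothetical sketch: common hyper-parameters in a gradient-based loop.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

learning_rate = 0.01   # initial step size (placeholder value)
momentum = 0.9         # momentum coefficient (placeholder value)
weight_decay = 1e-4    # L2 regularization strength (placeholder value)
batch_size = 32        # mini-batch size (placeholder value)
num_epochs = 10        # passes over the training set (placeholder value)

# Synthetic stand-in data: 1000 examples, 20 features, 3 classes.
X = torch.randn(1000, 20)
y = torch.randint(0, 3, (1000,))
loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
opt = torch.optim.SGD(model.parameters(), lr=learning_rate,
                      momentum=momentum, weight_decay=weight_decay)
# The learning-rate schedule adds further hyper-parameters (step, factor).
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=5, gamma=0.5)

loss_fn = nn.CrossEntropyLoss()
for epoch in range(num_epochs):
    for xb, yb in loader:
        opt.zero_grad()
        loss_fn(model(xb), yb).backward()  # back-propagated gradient
        opt.step()                          # gradient-based update
    sched.step()
```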

    Representation Learning for Visual Data

    This thesis by articles contributes to the field of deep learning, and more specifically the subfield of deep generative modeling, through work on restricted Boltzmann machines, generative adversarial networks and style transfer networks. The first article examines the idea of tackling the problem of estimating the negative phase gradients in Boltzmann machines by sampling from a physical implementation of the model. We provide an empirical evaluation of the impact of various constraints associated with physical implementations of restricted Boltzmann machines (RBMs), namely noisy parameters, finite parameter amplitude and restricted connectivity patterns, on their performance as measured by negative log-likelihood through software simulation. The second article tackles the inference problem in generative adversarial networks (GANs). It proposes a simple and straightforward extension to the GAN framework, named adversarially learned inference (ALI), which allows inference to be learned jointly with generation in a fully adversarial framework. We show that the learned representation is useful for auxiliary tasks such as semi-supervised learning by obtaining a performance competitive with the then state of the art on the SVHN and CIFAR10 semi-supervised learning tasks. Finally, the third article proposes a simple and scalable technique to train a single feedforward style transfer network to model multiple styles. It introduces a conditioning mechanism named conditional instance normalization, which allows the network to capture multiple styles in parallel by learning a different set of instance normalization parameters for each style. This mechanism is shown to be very efficient and effective in practice, and has inspired multiple efforts to adapt the idea to problems outside of the artistic style transfer domain.
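
    A minimal sketch of the conditional instance normalization mechanism described above, assuming a PyTorch setting; the article's full style transfer network differs, and all names here are illustrative:

```python
# Sketch of conditional instance normalization: one shared network,
# with a separate (gamma, beta) pair learned per style.
import torch
from torch import nn
import torch.nn.functional as F

class ConditionalInstanceNorm2d(nn.Module):
    def __init__(self, num_channels: int, num_styles: int):
        super().__init__()
        # One scale/shift vector per style; all other weights are shared.
        self.gamma = nn.Parameter(torch.ones(num_styles, num_channels))
        self.beta = nn.Parameter(torch.zeros(num_styles, num_channels))

    def forward(self, x: torch.Tensor, style_id: int) -> torch.Tensor:
        # Normalize each feature map of each sample (instance norm),
        # then apply the style-specific affine transform.
        x = F.instance_norm(x)
        g = self.gamma[style_id].view(1, -1, 1, 1)
        b = self.beta[style_id].view(1, -1, 1, 1)
        return g * x + b

# Usage: the same layer renders different styles by switching style_id.
cin = ConditionalInstanceNorm2d(num_channels=64, num_styles=10)
feats = torch.randn(4, 64, 32, 32)
out_style_3 = cin(feats, style_id=3)
```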

    Deep Learning Applied to PMU Data in Power Systems

    With the advent of Wide Area Measurement Systems (WAMS) and the consequent proliferation of digital measurement devices such as PMUs, control centers are being flooded with growing amounts of data. Operators therefore need efficient techniques to digest the incoming data and enhance grid operations through knowledge extraction. Driven by the volumes of data involved, innovative methods from the field of Artificial Intelligence are emerging that harness information without requiring complex analytical models. In fact, learning to recognize patterns seems to be the answer to the challenges of processing the huge volumes of raw data involved in PMU-based WAMS. Hence, deep learning frameworks are applied to extract features from electrical frequency records collected by the Brazilian Medfasee BT Project. More specifically, the work proposes a classifier of dynamic events, such as generation loss and load shedding, based on frequency changes.
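
    As a purely hypothetical illustration of the kind of classifier described, the following sketches a small 1D convolutional network mapping a window of frequency samples to an event class; the architecture, window length and event labels are assumptions, not details taken from the work:

```python
# Hypothetical sketch: classifying grid dynamic events from windows of
# PMU frequency measurements (all shapes and labels are assumed).
import torch
from torch import nn

EVENT_CLASSES = ["generation_loss", "load_shedding", "normal"]  # assumed

class FrequencyEventClassifier(nn.Module):
    def __init__(self, num_classes: int = len(EVENT_CLASSES)):
        super().__init__()
        # 1D convolutions extract local features from the frequency trace.
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(32, num_classes)

    def forward(self, freq: torch.Tensor) -> torch.Tensor:
        # freq: (batch, window_len) raw frequency samples, e.g. in Hz.
        h = self.features(freq.unsqueeze(1)).squeeze(-1)
        return self.head(h)

model = FrequencyEventClassifier()
logits = model(torch.randn(8, 600))  # 8 windows of 600 samples each
```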

    Concurrent Probabilistic Simulation of High Temperature Composite Structural Response

    A computational structural/material analysis and design tool that meets industry's future demand for expedience and reduced cost is presented. This software, GENOA, is dedicated to parallel, high-speed analysis for the probabilistic evaluation of the high-temperature composite response of aerospace systems. The development is based on detailed integration and modification of diverse specialized analysis techniques and mathematical models, combining their latest capabilities into a commercially viable software package. The technique is specifically designed to exploit the availability of processors to perform computationally intense probabilistic analysis, assessing uncertainties in structural reliability analysis and composite micromechanics. The primary objectives achieved during development were: (1) utilization of the power of parallel processing and static/dynamic load balancing optimization to make the complex simulation of the structure, material and processing of high-temperature composites affordable; (2) computational integration and synchronization of probabilistic mathematics, structural/material mechanics and parallel computing; (3) implementation of an innovative multi-level domain decomposition technique to identify the inherent parallelism and increase convergence rates through high- and low-level processor assignment; (4) creation of a framework for a portable parallel architecture for machine-independent Multiple Instruction Multiple Data (MIMD), Single Instruction Multiple Data (SIMD), hybrid and distributed workstation-type computers; and (5) market evaluation. The results of the Phase 2 effort provide a good basis for continuation and warrant a Phase 3 government and industry partnership.

    Discovering anomalies in big data: a review focused on the application of metaheuristics and machine learning techniques

    With the increase in available data from computer systems and in threats to their security, interest in anomaly detection has grown in recent years. The need to diagnose faults and cyberattacks has also focused scientific research on the automated classification of outliers in big data, as manual labeling is impractical at such volumes. The results of data analysis can be used to generate alarms that anticipate anomalies and thus prevent system failures and attacks. Anomaly detection therefore serves both to reduce maintenance costs and to support report-based decision making. During the last decade, the approaches proposed in the literature for classifying unknown anomalies in log analysis, process analysis and time series have been based mainly on machine learning and deep learning techniques. In this study, we provide an overview of current state-of-the-art methodologies, highlighting their advantages and disadvantages as well as the new challenges. In particular, we show that there is no absolute best method: for any given dataset, a different method may achieve the best result. Finally, we describe how the use of metaheuristics within machine learning algorithms makes it possible to build more robust and efficient tools.
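
    As one concrete example of the machine learning techniques such reviews cover (not a method proposed by this study), scikit-learn's IsolationForest can flag outliers in unlabeled data; the data and contamination rate here are illustrative:

```python
# Minimal unsupervised outlier classification with an isolation forest.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(1000, 4))    # bulk of the data
outliers = rng.uniform(-8, 8, size=(20, 4))  # injected anomalies
X = np.vstack([normal, outliers])

# contamination is itself a tuning choice; metaheuristics can search
# such hyper-parameters, as the review discusses.
clf = IsolationForest(contamination=0.02, random_state=0).fit(X)
labels = clf.predict(X)  # +1 = inlier, -1 = outlier
print(f"flagged {np.sum(labels == -1)} points as anomalous")
```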

    Efficient Deep Reinforcement Learning via Planning, Generalization, and Improved Exploration

    Reinforcement learning (RL) is a general-purpose machine learning framework, which considers an agent that makes sequential decisions in an environment to maximize its reward. Deep reinforcement learning (DRL) approaches use deep neural networks as non-linear function approximators that parameterize policies or value functions directly from raw observations in RL. Although DRL approaches have been shown to be successful on many challenging RL benchmarks, much of the prior work has mainly focused on learning a single task in a model-free setting, which is often sample-inefficient. Humans, on the other hand, can acquire knowledge by learning a model of the world in an unsupervised fashion, use that knowledge to plan ahead for decision making, transfer it between many tasks, and generalize to previously unseen circumstances. Developing such abilities is among the fundamental challenges for building RL agents that can learn as efficiently as humans. As a step toward developing these capabilities, this thesis develops new DRL techniques to address three important challenges in RL: 1) planning via prediction, 2) rapidly generalizing to new environments and tasks, and 3) efficient exploration in complex environments. The first part of the thesis discusses how to learn a dynamics model of the environment using deep neural networks and how to use such a model for planning in complex domains where observations are high-dimensional. Specifically, we present neural network architectures for action-conditional video prediction and demonstrate improved exploration in RL. In addition, we present a neural network architecture that performs lookahead planning by predicting the future only in terms of rewards and values, without predicting observations, and discuss why this approach is beneficial compared to conventional model-based planning approaches. The second part of the thesis considers generalization to unseen environments and tasks. We first introduce a set of cognitive tasks in a 3D environment and present memory-based DRL architectures that generalize to previously unseen 3D environments better than existing baselines. In addition, we introduce a new multi-task RL problem in which the agent must learn to execute different tasks depending on given instructions and generalize to new instructions in a zero-shot fashion; we present a new hierarchical DRL architecture that learns to generalize over previously unseen task descriptions with minimal prior knowledge. The third part of the thesis discusses how exploiting past experiences can indirectly drive deep exploration and improve sample efficiency. In particular, we propose a new off-policy learning algorithm, called self-imitation learning, which learns a policy that reproduces past good experiences. We show empirically that self-imitation learning indirectly encourages the agent to explore reasonably good state spaces and thus significantly improves sample efficiency on RL domains where exploration is challenging. Overall, the main contributions of this thesis are to explore several fundamental challenges in RL in the context of DRL and to develop new DRL architectures and algorithms to address them, allowing us to understand how deep learning can improve sample efficiency and thus come closer to human-like learning abilities.
    PhD thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/145829/1/junhyuk_1.pd
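
    A minimal sketch of the self-imitation learning objective as described above: the agent imitates a stored action only when its observed return exceeded the current value estimate. Tensor shapes and the value-loss coefficient are assumptions, not details from the thesis:

```python
# Sketch of a self-imitation loss: update toward past actions only
# where the observed return R beats the value estimate V(s).
import torch

def self_imitation_loss(log_probs, values, returns, value_coef=0.01):
    """log_probs: log pi(a|s) of stored actions; values: V(s); returns: R.
    value_coef is an assumed weighting, not a value from the thesis."""
    # (R - V)_+ : positive only for experiences better than expected.
    advantage = torch.clamp(returns - values, min=0.0)
    policy_loss = -(log_probs * advantage.detach()).mean()
    value_loss = 0.5 * (advantage ** 2).mean()
    return policy_loss + value_coef * value_loss

# Example with dummy batch data (batch of 32 stored transitions):
lp = torch.randn(32, requires_grad=True)
loss = self_imitation_loss(lp, values=torch.randn(32), returns=torch.randn(32))
loss.backward()
```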

    Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks

    The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components. Similarly to their biological counterparts, sparse networks generalize just as well as, and sometimes even better than, the original dense networks. Sparsity promises to reduce the memory footprint of regular networks to fit mobile devices, as well as to shorten training time for ever-growing networks. In this paper, we survey prior work on sparsity in deep learning and provide an extensive tutorial on sparsification for both inference and training. We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice. Our work distills ideas from more than 300 research papers and provides guidance to practitioners who wish to utilize sparsity today, as well as to researchers whose goal is to push the frontier forward. We include the necessary background on mathematical methods in sparsification, describe phenomena such as early structure adaptation and the intricate relations between sparsity and the training process, and show techniques for achieving acceleration on real hardware. We also define a metric of pruned parameter efficiency that could serve as a baseline for comparing different sparse networks. We close by speculating on how sparsity can improve future workloads and outline major open problems in the field.
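
    As a minimal sketch of one family of techniques such a survey covers (one-shot global magnitude pruning, not the survey's own method), the smallest-magnitude weights can be zeroed to reach a target sparsity; the model and sparsity level here are illustrative:

```python
# Sketch of one-shot global magnitude pruning in PyTorch.
import torch
from torch import nn

def magnitude_prune(model: nn.Module, sparsity: float = 0.9) -> None:
    """Zero out the smallest `sparsity` fraction of weights, in place."""
    # Pool all weight-matrix magnitudes to pick one global threshold.
    weights = torch.cat([p.detach().abs().flatten()
                         for p in model.parameters() if p.dim() > 1])
    threshold = torch.quantile(weights, sparsity)
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() > 1:  # prune weight matrices, keep biases dense
                p.mul_((p.abs() > threshold).float())

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
magnitude_prune(model, sparsity=0.9)
zeros = sum((p == 0).sum().item() for p in model.parameters() if p.dim() > 1)
total = sum(p.numel() for p in model.parameters() if p.dim() > 1)
print(f"achieved sparsity: {zeros / total:.2%}")
```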