On the role of synaptic stochasticity in training low-precision neural networks
Stochasticity and limited precision of synaptic weights in neural network
models are key aspects of both biological and hardware modeling of learning
processes. Here we show that a neural network model with stochastic binary
weights naturally gives prominence to exponentially rare dense regions of
solutions with a number of desirable properties such as robustness and good
generalization performance, while typical solutions are isolated and hard to
find. Binary solutions of the standard perceptron problem are obtained from a
simple gradient descent procedure on a set of real values parametrizing a
probability distribution over the binary synapses. Both analytical and
numerical results are presented. An algorithmic extension aimed at training
discrete deep neural networks is also investigated.
Comment: 7 pages + 14 pages of supplementary material
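As a rough illustration of the kind of procedure described above, the sketch below trains a perceptron on random patterns by gradient descent on real parameters theta_i that define p(w_i = +1) = sigmoid(theta_i). The smooth signal-to-noise surrogate loss, the sizes and the learning rate are illustrative assumptions, not the authors' exact formulation.

```python
# Illustrative sketch (not the authors' exact procedure): train a perceptron
# with stochastic binary weights w_i in {-1, +1} by gradient descent on real
# parameters theta_i, with p(w_i = +1) = sigmoid(theta_i).  The loss is a
# smooth signal-to-noise surrogate of the expected error under the product
# distribution over the binary synapses.
import numpy as np

rng = np.random.default_rng(0)

N, P = 201, 100                              # synapses, random patterns
X = rng.choice([-1.0, 1.0], size=(P, N))     # random binary inputs
y = rng.choice([-1.0, 1.0], size=P)          # random labels (standard perceptron problem)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.zeros(N)                          # real parameters of the weight distribution
lr = 0.5

for epoch in range(2000):
    s = sigmoid(theta)
    m = 2.0 * s - 1.0                        # mean of each binary synapse, in (-1, 1)
    var = (1.0 - m ** 2).sum() / N + 1e-12   # variance contributed by weight stochasticity
    margin = y * (X @ m) / np.sqrt(N * var)  # per-pattern signal-to-noise ratio
    margin = np.clip(margin, -30.0, 30.0)    # avoid overflow in the exponentials
    # gradient of mean(sigmoid(-margin)) w.r.t. theta, ignoring the dependence of var on m
    g_margin = -sigmoid(-margin) * (1.0 - sigmoid(-margin)) / P
    g_m = (X.T @ (g_margin * y)) / np.sqrt(N * var)
    theta -= lr * g_m * 2.0 * s * (1.0 - s)

w = np.where(theta >= 0, 1.0, -1.0)          # clip the distribution to a concrete binary solution
print("binary training error:", np.mean(np.sign(X @ w) != y))
```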
POWERPLAY: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem
Most of computer science focuses on automatically solving given computational
problems. I focus on automatically inventing or discovering problems in a way
inspired by the playful behavior of animals and humans, to train a more and
more general problem solver from scratch in an unsupervised fashion. Consider
the infinite set of all computable descriptions of tasks with possibly
computable solutions. The novel algorithmic framework POWERPLAY (2011)
continually searches the space of possible pairs of new tasks and modifications
of the current problem solver, until it finds a more powerful problem solver
that provably solves all previously learned tasks plus the new one, while the
unmodified predecessor does not. Wow-effects are achieved by continually making
previously learned skills more efficient such that they require less time and
space. New skills may (partially) re-use previously learned skills. POWERPLAY's
search orders candidate pairs of tasks and solver modifications by their
conditional computational (time & space) complexity, given the stored
experience so far. The new task and its corresponding task-solving skill are
those first found and validated. The computational costs of validating new
tasks need not grow with task repertoire size. POWERPLAY's ongoing search for
novelty keeps breaking the generalization abilities of its present solver. This
is related to Goedel's sequence of increasingly powerful formal theories based
on adding formerly unprovable statements to the axioms without affecting
previously provable theorems. The continually increasing repertoire of problem
solving procedures can be exploited by a parallel search for solutions to
additional externally posed tasks. POWERPLAY may be viewed as a greedy but
practical implementation of basic principles of creativity. A first
experimental analysis can be found in separate papers [53,54].
Comment: 21 pages, additional connections to previous work, references to first experiments with POWERPLAY
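The acceptance criterion of the search can be pictured with the schematic sketch below. The Solver and Task representations and the propose_candidates generator are placeholders standing in for POWERPLAY's program-search machinery, which orders candidates by conditional time and space complexity; note also that the naive re-validation over the whole repertoire shown here is exactly the cost the full framework is designed to avoid.

```python
# Schematic sketch of the POWERPLAY acceptance loop.  The task/solver
# representations and the candidate generator are placeholder stubs; the real
# framework searches a space of programs ordered by conditional time & space
# complexity, which is abstracted away here.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class Solver:
    params: dict = field(default_factory=dict)   # stands in for the solver's program/weights

Task = Callable[[Solver], bool]                  # a task is a test the candidate solver must pass

def powerplay_step(solver: Solver,
                   repertoire: List[Task],
                   propose_candidates: Callable[[Solver, List[Task]],
                                                List[Tuple[Task, Solver]]]) -> Solver:
    """Accept the cheapest (new task, modified solver) pair such that the modified
    solver handles the new task plus every previously learned task, while the
    unmodified predecessor does not handle the new task."""
    # Candidates are assumed to arrive ordered by conditional computational cost.
    for new_task, candidate in propose_candidates(solver, repertoire):
        if new_task(solver):
            continue                      # not novel: the current solver already solves it
        if not new_task(candidate):
            continue                      # the modification does not solve the new task
        if all(task(candidate) for task in repertoire):
            repertoire.append(new_task)   # validated: keep the new skill and improved solver
            return candidate
    return solver                         # no acceptable pair in this batch of candidates
```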
Shaping the learning landscape in neural networks around wide flat minima
Learning in Deep Neural Networks (DNN) takes place by minimizing a non-convex
high-dimensional loss function, typically by a stochastic gradient descent
(SGD) strategy. The learning process is observed to find good minimizers without getting stuck in local critical points, and such minimizers are often satisfactory at avoiding overfitting. How these two
features can be kept under control in nonlinear devices composed of millions of
tunable connections is a profound and far reaching open question. In this paper
we study basic non-convex one- and two-layer neural network models which learn
random patterns, and derive a number of basic geometrical and algorithmic
features which suggest some answers. We first show that the error loss function
presents few extremely wide flat minima (WFM) which coexist with narrower
minima and critical points. We then show that the minimizers of the
cross-entropy loss function overlap with the WFM of the error loss. We also
show examples of learning devices for which WFM do not exist. From the
algorithmic perspective we derive entropy driven greedy and message passing
algorithms which focus their search on wide flat regions of minimizers. In the
case of SGD and cross-entropy loss, we show that a slow reduction of the norm
of the weights along the learning process also leads to WFM. We corroborate the
results by a numerical study of the correlations between the volumes of the
minimizers, their Hessian and their generalization performance on real data.
Comment: 37 pages (16 main text), 10 figures (7 main text)
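As a toy illustration of the norm-reduction protocol mentioned above, the sketch below runs SGD on the cross-entropy (logistic) loss for a single-layer network learning random binary patterns while slowly annealing the weight norm after each epoch. The sizes, schedule and learning rate are illustrative assumptions, not the paper's actual setup.

```python
# Toy illustration (not the paper's exact protocol): SGD on the cross-entropy
# loss for a single-layer network learning random binary patterns, with the
# weight norm slowly reduced after each epoch, as in the abstract's WFM recipe.
import numpy as np

rng = np.random.default_rng(1)
N, P = 500, 300                                  # weights, random patterns
X = rng.choice([-1.0, 1.0], size=(P, N))
y = rng.choice([-1.0, 1.0], size=P)

w = rng.normal(size=N)
w *= np.sqrt(N) / np.linalg.norm(w)              # start on the sphere of radius sqrt(N)

lr, batch, epochs = 0.1, 32, 200
norm_schedule = np.linspace(np.sqrt(N), 0.3 * np.sqrt(N), epochs)   # slow norm reduction

for epoch in range(epochs):
    perm = rng.permutation(P)
    for start in range(0, P, batch):
        idx = perm[start:start + batch]
        margins = y[idx] * (X[idx] @ w) / np.sqrt(N)
        s = 1.0 / (1.0 + np.exp(np.clip(margins, -30.0, 30.0)))     # sigmoid(-margin)
        grad = -(X[idx] * (y[idx] * s)[:, None]).mean(axis=0) / np.sqrt(N)
        w -= lr * grad                           # SGD step on the logistic loss
    w *= norm_schedule[epoch] / np.linalg.norm(w)    # slowly shrink the weight norm

print("training error:", np.mean(np.sign(X @ w) != y),
      "| final norm:", round(np.linalg.norm(w), 2))
```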
Design for operability: A review of approaches and solution strategies
In the last decades the chemical engineering research community has largely addressed the design-for-operability problem. Such interest responds to the fact that the operability quality of a process is determined by its design, making evident the convenience of considering operability issues in early design stages rather than later, when the impact of modifications is less effective and more expensive. The necessity of integrating design and operability is dictated by the increasing complexity of processes as a result of progressively stringent economic, quality, safety and environmental constraints. Although the design-for-operability problem concerns practically every technical discipline, it has achieved a particular identity within the chemical engineering field due to the economic magnitude of the processes involved. The work on design and analysis for operability in chemical engineering is vast, and a complete review in terms of papers is beyond the scope of this contribution. Instead, two major approaches will be addressed, and those papers that in our view had the most significance for the development of the field will be described in some detail.