
    On the role of synaptic stochasticity in training low-precision neural networks

    Stochasticity and limited precision of synaptic weights in neural network models are key aspects of both biological and hardware modeling of learning processes. Here we show that a neural network model with stochastic binary weights naturally gives prominence to exponentially rare dense regions of solutions with a number of desirable properties such as robustness and good generalization performance, while typical solutions are isolated and hard to find. Binary solutions of the standard perceptron problem are obtained from a simple gradient descent procedure on a set of real values parametrizing a probability distribution over the binary synapses. Both analytical and numerical results are presented. An algorithmic extension aimed at training discrete deep neural networks is also investigated. Comment: 7 pages + 14 pages of supplementary material.
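    The sketch below is illustrative only (the pattern dimensions, learning rate and hinge-like loss are assumptions, not the authors' exact procedure). It shows the core idea: each binary synapse w_i in {-1, +1} is given a distribution parametrized by a real field theta_i, plain gradient descent is run on theta, and the final binary solution is read off as the sign of the resulting mean.

```python
# Minimal sketch (not the authors' exact procedure): train binary perceptron
# weights by gradient descent on real-valued fields `theta` that parametrize
# independent distributions over each synapse w_i in {-1, +1}.
import numpy as np

rng = np.random.default_rng(0)
N, P = 201, 100                              # input size, number of random patterns
X = rng.choice([-1.0, 1.0], size=(P, N))     # random binary patterns
y = rng.choice([-1.0, 1.0], size=P)          # random binary labels

theta = 0.01 * rng.standard_normal(N)        # real parameters of the distribution
lr = 0.1

for epoch in range(500):
    m = np.tanh(theta)                       # mean of each binary synapse, in (-1, 1)
    pre = X @ m / np.sqrt(N)                 # expected pre-activation under the distribution
    margin = y * pre
    viol = margin < 0.5                      # patterns with insufficient margin
    # hinge-like loss on the expected output; gradient flows to theta through m
    grad_m = -(y[viol, None] * X[viol]).sum(axis=0) / np.sqrt(N)
    grad_theta = grad_m * (1.0 - m ** 2)     # d tanh(theta) / d theta
    theta -= lr * grad_theta

w_binary = np.sign(np.tanh(theta))           # binarize the learned means
train_acc = np.mean(np.sign(X @ w_binary) == y)
print(f"training accuracy of the binarized weights: {train_acc:.2f}")
```

    The tanh parametrization keeps each synapse's expectation inside (-1, 1), so binarizing by the sign of the mean is a natural readout of the learned distribution.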

    POWERPLAY: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem

    Most of computer science focuses on automatically solving given computational problems. I focus on automatically inventing or discovering problems in a way inspired by the playful behavior of animals and humans, to train a more and more general problem solver from scratch in an unsupervised fashion. Consider the infinite set of all computable descriptions of tasks with possibly computable solutions. The novel algorithmic framework POWERPLAY (2011) continually searches the space of possible pairs of new tasks and modifications of the current problem solver, until it finds a more powerful problem solver that provably solves all previously learned tasks plus the new one, while the unmodified predecessor does not. Wow-effects are achieved by continually making previously learned skills more efficient such that they require less time and space. New skills may (partially) re-use previously learned skills. POWERPLAY's search orders candidate pairs of tasks and solver modifications by their conditional computational (time & space) complexity, given the stored experience so far. The new task and its corresponding task-solving skill are those first found and validated. The computational costs of validating new tasks need not grow with task repertoire size. POWERPLAY's ongoing search for novelty keeps breaking the generalization abilities of its present solver. This is related to Goedel's sequence of increasingly powerful formal theories based on adding formerly unprovable statements to the axioms without affecting previously provable theorems. The continually increasing repertoire of problem solving procedures can be exploited by a parallel search for solutions to additional externally posed tasks. POWERPLAY may be viewed as a greedy but practical implementation of basic principles of creativity. A first experimental analysis can be found in separate papers [53,54]. Comment: 21 pages, additional connections to previous work, references to first experiments with POWERPLAY.
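    A heavily simplified, runnable toy of the POWERPLAY loop is sketched below. The triangular-number task family, the lookup-table solver and the stopping rule are hypothetical stand-ins, not Schmidhuber's implementation; the sketch only illustrates the invariant that a (new task, solver modification) pair is accepted when the modified solver solves all previously learned tasks plus the new one, while the unmodified solver does not.

```python
# Toy sketch of the POWERPLAY loop (not the original implementation): search,
# in order of increasing complexity, for the simplest still-unsolved task,
# propose a solver modification, and accept it only if no earlier skill breaks.
from itertools import count

def make_task(n):
    """Hypothetical task family: task n asks for the n-th triangular number."""
    return n, n * (n + 1) // 2           # (task description, required answer)

def solves(solver, task):
    desc, required = task
    return solver.get(desc) == required  # the toy solver is just a lookup table

solver = {}        # current problem solver (toy representation)
repertoire = []    # previously learned tasks, never to be broken

for step in count():
    # find the simplest task the current solver cannot yet handle
    for n in count():
        task = make_task(n)
        if not solves(solver, task):
            break
    # candidate modification: extend the solver to cover the new task
    candidate = dict(solver)
    candidate[task[0]] = task[1]
    # validate: the candidate must still solve every previously learned task
    if all(solves(candidate, t) for t in repertoire) and solves(candidate, task):
        solver, repertoire = candidate, repertoire + [task]
    if step >= 4:  # stop the demo after a few acquired skills
        break

print("learned tasks:", [t[0] for t in repertoire])
```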

    Shaping the learning landscape in neural networks around wide flat minima

    Learning in Deep Neural Networks (DNN) takes place by minimizing a non-convex high-dimensional loss function, typically by a stochastic gradient descent (SGD) strategy. The learning process is observed to find good minimizers without getting stuck in local critical points, and such minimizers are often satisfactory at avoiding overfitting. How these two features can be kept under control in nonlinear devices composed of millions of tunable connections is a profound and far-reaching open question. In this paper we study basic non-convex one- and two-layer neural network models which learn random patterns, and derive a number of basic geometrical and algorithmic features which suggest some answers. We first show that the error loss function presents few extremely wide flat minima (WFM) which coexist with narrower minima and critical points. We then show that the minimizers of the cross-entropy loss function overlap with the WFM of the error loss. We also show examples of learning devices for which WFM do not exist. From the algorithmic perspective we derive entropy-driven greedy and message-passing algorithms which focus their search on wide flat regions of minimizers. In the case of SGD and cross-entropy loss, we show that a slow reduction of the norm of the weights along the learning process also leads to WFM. We corroborate the results by a numerical study of the correlations between the volumes of the minimizers, their Hessian and their generalization performance on real data. Comment: 37 pages (16 main text), 10 figures (7 main text).
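    The following sketch assumes a small one-hidden-layer network on random binary patterns; the shrink factor and hyperparameters are illustrative choices, not the paper's protocol. It shows the kind of dynamics referred to above: plain SGD on the cross-entropy loss combined with a slow, explicit reduction of the weight norm along the trajectory.

```python
# Illustrative sketch (not the paper's exact setup): SGD on the cross-entropy
# loss for a small two-layer network learning random binary patterns, with a
# slow shrinking of the weight norm during training.
import numpy as np

rng = np.random.default_rng(1)
N, K, P = 50, 9, 100                      # inputs, hidden units, random patterns
X = rng.choice([-1.0, 1.0], size=(P, N))
y = rng.choice([0.0, 1.0], size=P)        # random binary labels

W1 = rng.standard_normal((N, K)) / np.sqrt(N)
w2 = rng.standard_normal(K) / np.sqrt(K)
lr, batch = 0.05, 20

for step in range(2000):
    idx = rng.choice(P, size=batch, replace=False)
    xb, yb = X[idx], y[idx]
    h = np.tanh(xb @ W1)                  # hidden layer
    logits = h @ w2
    p = 1.0 / (1.0 + np.exp(-logits))     # sigmoid output
    # gradients of the cross-entropy loss
    dlogits = (p - yb) / batch
    dw2 = h.T @ dlogits
    dh = np.outer(dlogits, w2) * (1.0 - h ** 2)
    dW1 = xb.T @ dh
    W1 -= lr * dW1
    w2 -= lr * dw2
    # slow, explicit reduction of the weight norm along the trajectory
    shrink = 1.0 - 1e-4
    W1 *= shrink
    w2 *= shrink

p_all = 1.0 / (1.0 + np.exp(-(np.tanh(X @ W1) @ w2)))
print("training accuracy:", np.mean((p_all > 0.5) == (y > 0.5)))
```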

    Design for operability: A review of approaches and solution strategies (Diseño para operabilidad: Una revisión de enfoques y estrategias de solución)

    In the last decades the chemical engineering research community has extensively addressed the design-for-operability problem. This interest responds to the fact that the operability quality of a process is determined by its design, making it clearly preferable to consider operability issues in early design stages rather than later, when the impact of modifications is less effective and more expensive. The need to integrate design and operability is dictated by the increasing complexity of processes resulting from progressively stringent economic, quality, safety and environmental constraints. Although the design-for-operability problem concerns practically every technical discipline, it has acquired a particular identity within chemical engineering due to the economic magnitude of the processes involved. The literature on design and analysis for operability in chemical engineering is vast, and a complete paper-by-paper review is beyond the scope of this contribution. Instead, two major approaches are addressed, and the papers that, in our view, have been most significant to the development of the field are described in some detail. (Authors: Anibal Manuel Blanco and Jose Alberto Bandoni, Planta Piloto de Ingeniería Química (CONICET, Universidad Nacional del Sur), Bahía Blanca, Argentina.)