633 research outputs found

    Polyploidism in Deep Neural Networks: m-Parent Evolutionary Synthesis of Deep Neural Networks in Varying Population Sizes

    Evolutionary deep intelligence was recently proposed to organically produce highly efficient deep neural network architectures over successive generations. Thus far, current evolutionary synthesis processes are based on asexual reproduction, i.e., offspring neural networks are synthesized stochastically from a single parent network. In this study, we investigate the effects of m-parent sexual evolutionary synthesis (m = 1, 2, 3, 5) in combination with varying population sizes of three, five, and eight synthesized networks per generation. Experimental results were obtained using a 10% subset of the MNIST handwritten digits dataset, and show that increasing the number of parent networks results in improved architectural efficiency of the synthesized networks (approximately 150x synaptic efficiency and approximately 42–49x cluster efficiency) while resulting in only a 2–3% drop in testing accuracy.
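The idea of synthesizing an offspring network stochastically from m parents can be sketched as follows. This is only a toy illustration, not the authors' synthesis procedure: networks are reduced to binary synapse masks, and the keep-probability rule (an environmental factor times the fraction of parents that have the synapse) is an assumption made for this example. The names `synthesize_offspring`, `run`, and `env_factor` are hypothetical, and parents are drawn without replacement, so this sketch needs m to be no larger than the population size.

```python
import random


def synthesize_offspring(parents, env_factor, rng):
    """Sample an offspring synapse mask from m parent masks.

    ASSUMPTION (not from the abstract): synapse i survives with probability
    env_factor * (fraction of parents in which synapse i is present).
    """
    m = len(parents)
    offspring = []
    for synapses in zip(*parents):
        p_keep = env_factor * sum(synapses) / m
        offspring.append(1 if rng.random() < p_keep else 0)
    return offspring


def run(n_synapses=1000, m=2, pop_size=3, generations=10, env_factor=0.9, seed=0):
    """Evolve a population of masks, starting from fully connected ancestors."""
    rng = random.Random(seed)
    population = [[1] * n_synapses for _ in range(pop_size)]
    for _ in range(generations):
        # Each offspring draws its own m parents from the previous generation.
        population = [
            synthesize_offspring(rng.sample(population, m), env_factor, rng)
            for _ in range(pop_size)
        ]
    return population


final_population = run()
for net in final_population:
    print(f"{sum(net)} of {len(net)} synapses kept")
```

With `env_factor` below 1, the number of surviving synapses tends to shrink from one generation to the next, which is the toy analogue of increasing architectural efficiency. The sketch says nothing about testing accuracy, since no training is involved.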

    Highly Efficient Deep Intelligence via Multi-Parent Evolutionary Synthesis of Deep Neural Networks

    Machine learning, and deep neural networks in particular, is a rapidly growing field whose methods are currently being employed in domains such as science, business, and government. However, the significant success of neural networks has largely been due to increasingly large model sizes and enormous amounts of required training data. As a result, powerful neural networks are accompanied by growing storage and memory requirements, making these powerful models infeasible for practical scenarios that use small embedded devices without access to cloud computing. As such, methods for significantly reducing the memory and computational requirements of high-performing deep neural networks via sparsification and/or compression have been developed. More recently, the concept of evolutionary deep intelligence was proposed; it takes inspiration from nature and allows highly efficient deep neural networks to be synthesized organically over successive generations. However, current work in evolutionary deep intelligence has been limited to the use of asexual evolutionary synthesis, where a newly synthesized offspring network is solely dependent on a single parent network from the preceding generation. In this thesis, we introduce a general framework for synthesizing efficient neural network architectures via multi-parent evolutionary synthesis. Generalized from the asexual evolutionary synthesis approach, the framework allows for a newly synthesized network to be dependent on a subset of all previously synthesized networks. By imposing constraints on this general framework, the cases of asexual evolutionary synthesis, 2-parent sexual evolutionary synthesis, and m-parent evolutionary synthesis can all be realized. We explore the computational construct used to mimic heredity, and generalize it beyond the asexual evolutionary synthesis used in current evolutionary deep intelligence works.
The efficacy of incorporating multiple parent networks during evolutionary synthesis was examined first in the context of 2-parent sexual evolutionary synthesis, then generalized to m-parent evolutionary synthesis in the context of varying generational population sizes. Both experiments show that the use of multiple parent networks during evolutionary synthesis allows for increased network diversity as well as steeper trends in increasing network efficiency over generations. We also introduce the concept of gene tagging within the evolutionary deep intelligence framework as a means to enforce a like-with-like mating policy during the multi-parent evolutionary synthesis process, and evaluate the effect of architectural alignment during multi-parent evolutionary synthesis. We present an experiment exploring the quantification of network architectural similarity in populations of networks. In addition, we investigate the computational construct used to mimic natural selection. The impact of various environmental resource models used to mimic the constraint of available computational and storage resources on network synthesis over successive generations is explored, and results clearly demonstrate the trade-off between computation time and optimal model performance. The results of m-parent evolutionary synthesis are promising, and indicate the potential benefits of incorporating multiple parent networks during evolutionary synthesis for highly efficient evolutionary deep intelligence. Future work includes studying the effects of inheriting weight values (as opposed to random initialization) on total training time and further investigation of potential structural similarity metrics, with the goal of developing a deeper understanding of the underlying effects of network architecture on performance.

    Gradient-free activation maximization for identifying effective stimuli

    A fundamental question for understanding brain function is what types of stimuli drive neurons to fire. In visual neuroscience, this question has also been posed as characterizing the receptive field of a neuron. The search for effective stimuli has traditionally been based on a combination of insights from previous studies, intuition, and luck. Recently, the same question has emerged in the study of units in convolutional neural networks (ConvNets), and together with this question a family of solutions was developed that is generally referred to as "feature visualization by activation maximization." We sought to bring tools and techniques developed for studying ConvNets to the study of biological neural networks. However, one key difference that impedes direct translation of tools is that gradients can be obtained from ConvNets using backpropagation, but such gradients are not available from the brain. To circumvent this problem, we developed a method for gradient-free activation maximization by combining a generative neural network with a genetic algorithm. We termed this method XDream (EXtending DeepDream with real-time evolution for activation maximization), and we have shown that this method can reliably create strong stimuli for neurons in the macaque visual cortex (Ponce et al., 2019). In this paper, we describe extensive experiments characterizing the XDream method by using ConvNet units as in silico models of neurons. We show that XDream is applicable across network layers, architectures, and training sets; examine design choices in the algorithm; and provide practical guides for choosing hyperparameters in the algorithm. XDream is an efficient algorithm for uncovering neuronal tuning preferences in black-box networks using a vast and diverse stimulus space. Comment: 16 pages, 8 figures, 3 tables
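The combination of a generator with a genetic algorithm described above can be illustrated with a small, self-contained sketch. It is not the XDream implementation: the `generator` here is a fixed `tanh` mapping standing in for a generative neural network, `unit_response` is a toy black-box score with a hidden preferred stimulus, and the population size, mutation scale, and selection rule are all assumptions chosen for the example. The optimizer only ever calls `unit_response`; it never uses a gradient.

```python
import math
import random

rng = random.Random(0)
CODE_DIM = 16
# Hidden preferred stimulus of the toy unit; the optimizer never reads it directly.
PREFERRED = [rng.uniform(-0.9, 0.9) for _ in range(CODE_DIM)]


def generator(code):
    """Stand-in for a generative network: maps a latent code to a 'stimulus'."""
    return [math.tanh(c) for c in code]


def unit_response(stimulus):
    """Black-box score: higher when the stimulus is closer to PREFERRED."""
    return -sum((s - p) ** 2 for s, p in zip(stimulus, PREFERRED))


def score(code):
    return unit_response(generator(code))


def evolve(pop_size=20, n_keep=5, generations=100, mutation_sd=0.1):
    """Gradient-free search over latent codes with a simple genetic algorithm."""
    population = [[rng.gauss(0, 1) for _ in range(CODE_DIM)] for _ in range(pop_size)]
    history = []  # best response seen in each generation
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        history.append(score(ranked[0]))
        parents = ranked[:n_keep]  # selection: keep the top-scoring codes
        children = []
        while len(children) < pop_size - n_keep:
            a, b = rng.sample(parents, 2)
            # Uniform crossover followed by Gaussian mutation.
            child = [rng.choice(pair) + rng.gauss(0, mutation_sd) for pair in zip(a, b)]
            children.append(child)
        population = parents + children  # parents survive unchanged (elitism)
    return max(population, key=score), history


best_code, history = evolve()
print(f"first-generation best response: {history[0]:.3f}")
print(f"last-generation best response:  {history[-1]:.3f}")
```

Because the top-scoring codes are carried over unchanged, the best response in this sketch can never decrease from one generation to the next.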

    Brain-Inspired Computational Intelligence via Predictive Coding

    Artificial intelligence (AI) is rapidly becoming one of the key technologies of this century. The majority of results in AI thus far have been achieved using deep neural networks trained with the error backpropagation learning algorithm. However, the ubiquitous adoption of this approach has highlighted some important limitations such as substantial computational cost, difficulty in quantifying uncertainty, lack of robustness, unreliability, and biological implausibility. It is possible that addressing these limitations may require schemes that are inspired and guided by neuroscience theories. One such theory, called predictive coding (PC), has shown promising performance in machine intelligence tasks, exhibiting exciting properties that make it potentially valuable for the machine learning community: PC can model information processing in different brain areas, can be used in cognitive control and robotics, and has a solid mathematical grounding in variational inference, offering a powerful inversion scheme for a specific class of continuous-state generative models. With the hope of foregrounding research in this direction, we survey the literature that has contributed to this perspective, highlighting the many ways that PC might play a role in the future of machine learning and computational intelligence at large. Comment: 37 Pages, 9 Figures

    Exploring New Forms of Random Projections for Prediction and Dimensionality Reduction in Big-Data Regimes

    The story of this work is dimensionality reduction. Dimensionality reduction is a method that takes as input a point-set P of n points in R^d, where d is typically large, and attempts to find a lower-dimensional representation of that dataset in order to ease the burden of processing for downstream algorithms. In today’s landscape of machine learning, researchers and practitioners work with datasets that have a very large number of samples and/or include high-dimensional samples. Therefore, dimensionality reduction is applied as a pre-processing technique primarily to overcome the curse of dimensionality. Generally, dimensionality reduction reduces the time and storage space required for processing the point-set, removes multi-collinearity and redundancies in the dataset where different features may depend on one another, and may enable simple visualizations of the dataset in 2-D and 3-D, making the relationships in the data easy for humans to comprehend. Dimensionality reduction methods come in many shapes and sizes. Methods such as Principal Component Analysis (PCA), Multi-dimensional Scaling, IsoMaps, and Locally Linear Embeddings are amongst the most commonly used methods of this family of algorithms. However, the choice of dimensionality reduction method proves critical in many applications as there is no one-size-fits-all solution, and special care must be taken for different datasets and tasks. Furthermore, the aforementioned popular methods are data-dependent, and commonly rely on computing either the Kernel / Gram matrix or the covariance matrix of the dataset. These matrices scale with increasing number of samples and increasing number of data dimensions, respectively, and are consequently poor choices in today’s landscape of big-data applications.
Therefore, it is pertinent to develop new dimensionality reduction methods that can be efficiently applied to large and high-dimensional datasets, by either reducing the dependency on the data, or side-stepping it altogether. Furthermore, such new dimensionality reduction methods should be able to perform on par with, or better than, traditional methods such as PCA. To achieve this goal, we turn to a simple and powerful method called random projections. Random projections are a simple, efficient, and data-independent method for stably embedding a point-set P of n points in R^d to R^k, where d is typically large and k is on the order of log n. Random projections have a long history of successful use in the dimensionality reduction literature. In this work, we are inspired to build on the ideas of random projection theory, extend the framework, and build a powerful new setup of random projections for large high-dimensional datasets, with comparable performance to state-of-the-art data-dependent and nonlinear methods. Furthermore, we study the use of random projections in domains other than dimensionality reduction, including prediction, and show the competitive performance of such methods for processing small dataset regimes.
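The basic random projection described above can be sketched in a few lines. This is a minimal illustration of the classical Gaussian construction, not the new setup proposed in the thesis: the matrix has independent N(0, 1/k) entries and is drawn without looking at the data. The sizes n, d, and k and the function names are assumptions chosen for the example, and k is fixed by hand here, not derived from a log n bound.

```python
import math
import random


def random_projection_matrix(d, k, rng):
    """Return a k x d matrix with i.i.d. N(0, 1/k) entries (data-independent)."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, point):
    """Map a d-dimensional point to k dimensions via a matrix-vector product."""
    return [sum(w * x for w, x in zip(row, point)) for row in matrix]


def distance(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


rng = random.Random(0)
n, d, k = 20, 1000, 200  # illustrative sizes only
points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
R = random_projection_matrix(d, k, rng)
embedded = [project(R, p) for p in points]

# Ratio of embedded to original distance for every pair of points;
# values near 1 mean pairwise distances are approximately preserved.
ratios = [
    distance(embedded[i], embedded[j]) / distance(points[i], points[j])
    for i in range(n)
    for j in range(i + 1, n)
]
print(f"distance ratio range: {min(ratios):.3f} to {max(ratios):.3f}")
```

The projection matrix is generated before any data point is seen, which is what makes the method data-independent, in contrast to PCA, which must first compute the covariance matrix of the dataset.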