
    On information captured by neural networks: connections with memorization and generalization

    Despite the popularity and success of deep learning, there is limited understanding of when, how, and why neural networks generalize to unseen examples. Since learning can be seen as extracting information from data, we formally study the information captured by neural networks during training. Specifically, we start by viewing learning in the presence of noisy labels from an information-theoretic perspective and derive a learning algorithm that limits label noise information in weights. We then define a notion of unique information that an individual sample provides to the training of a deep network, shedding some light on the behavior of neural networks on examples that are atypical, ambiguous, or belong to underrepresented subpopulations. We relate example informativeness to generalization by deriving nonvacuous generalization gap bounds. Finally, by studying knowledge distillation, we highlight the important role of data and label complexity in generalization. Overall, our findings contribute to a deeper understanding of the mechanisms underlying neural network generalization. (PhD thesis)
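    The notion of per-example unique information suggests a simple empirical probe: compare a model's predictions when trained with and without a given example. Below is a minimal leave-one-out sketch of that idea; it is an illustrative proxy, not the thesis's information-theoretic measure, and the toy dataset and divergence are assumptions.

```python
# Leave-one-out probe of an example's influence on a model's predictions,
# a crude stand-in for the "unique information" an example provides.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
probe = X[:20]  # fixed points on which predictive distributions are compared

def predict_probs(X_train, y_train):
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return model.predict_proba(probe)

p_full = predict_probs(X, y)
for i in range(3):  # influence of a few individual training examples
    mask = np.arange(len(X)) != i
    p_loo = predict_probs(X[mask], y[mask])
    # symmetrised KL divergence between the two predictive distributions
    kl = 0.5 * np.sum(p_full * np.log(p_full / p_loo)
                      + p_loo * np.log(p_loo / p_full))
    print(f"example {i}: influence ~ {kl:.4f}")
```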

    Modular lifelong machine learning

    Deep learning has drastically improved the state of the art in many important fields, including computer vision and natural language processing (LeCun et al., 2015). However, it is expensive to train a deep neural network on a machine learning problem, and the overall training cost further increases when one wants to solve additional problems. Lifelong machine learning (LML) develops algorithms that aim to efficiently learn to solve a sequence of problems which become available one at a time. New problems are solved with fewer resources by transferring previously learned knowledge. At the same time, an LML algorithm needs to retain good performance on all encountered problems, thus avoiding catastrophic forgetting. Current approaches do not possess all the desired properties of an LML algorithm. First, they primarily focus on preventing catastrophic forgetting (Diaz-Rodriguez et al., 2018; Delange et al., 2021) and, as a result, neglect some knowledge transfer properties. Furthermore, they assume that all problems in a sequence share the same input space. Finally, scaling these methods to a large sequence of problems remains a challenge. Modular approaches to deep learning decompose a deep neural network into sub-networks, referred to as modules. Each module can then be trained to perform an atomic transformation, specialised in processing a distinct subset of inputs. This modular approach to storing knowledge makes it easy to reuse only the subset of modules which are useful for the task at hand. This thesis introduces a line of research which demonstrates the merits of a modular approach to lifelong machine learning and its ability to address the aforementioned shortcomings of other methods. Compared to previous work, we show that a modular approach can be used to achieve more LML properties than previously demonstrated. Furthermore, we develop tools which allow modular LML algorithms to scale in order to retain said properties on longer sequences of problems. First, we introduce HOUDINI, a neurosymbolic framework for modular LML. HOUDINI represents modular deep neural networks as functional programs and accumulates a library of pre-trained modules over a sequence of problems. Given a new problem, we use program synthesis to select a suitable neural architecture, as well as a high-performing combination of pre-trained and new modules. We show that our approach has most of the properties desired of an LML algorithm. Notably, it can perform forward transfer, avoid negative transfer, and prevent catastrophic forgetting, even across problems with disparate input domains and problems which require different neural architectures. Second, we produce a modular LML algorithm which retains the properties of HOUDINI but can also scale to longer sequences of problems. To this end, we fix the choice of a neural architecture and introduce a probabilistic search framework, PICLE, for searching through different module combinations. To apply PICLE, we introduce two probabilistic models over neural modules which allow us to efficiently identify promising module combinations. Third, we phrase the search over module combinations in modular LML as black-box optimisation, which allows one to make use of methods from the setting of hyperparameter optimisation (HPO). We then develop a new HPO method which marries a multi-fidelity approach with model-based optimisation. We demonstrate that this leads to improved anytime performance in the HPO setting and discuss how this can in turn be used to augment modular LML methods. Overall, this thesis identifies a number of important LML properties, which have not all been attained in past methods, and presents an LML algorithm which can achieve all of them, apart from backward transfer.
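    The core mechanic shared by HOUDINI and PICLE is a search over compositions of library modules for each new problem. The toy sketch below illustrates that loop with plain functions standing in for neural modules and brute-force enumeration standing in for HOUDINI's program synthesis and PICLE's probabilistic search; the module names and toy task are illustrative assumptions.

```python
# Toy modular-reuse loop: search over compositions of library "modules"
# for the one that best fits a new task, then reuse it.
import itertools

# Plain functions stand in for pre-trained neural sub-networks.
library = {
    "scale":  lambda x: [2.0 * v for v in x],
    "shift":  lambda x: [v + 1.0 for v in x],
    "square": lambda x: [v * v for v in x],
}

def compose(names):
    def path(x):
        for name in names:
            x = library[name](x)
        return x
    return path

def loss(path, data):
    # Mean squared error of the composed path on the task's data.
    return sum((path(x)[0] - y) ** 2 for x, y in data) / len(data)

# New task: learn x -> (2x)^2, solvable by chaining "scale" then "square".
task = [([v], (2.0 * v) ** 2) for v in (1.0, 2.0, 3.0)]

candidates = (p for k in (1, 2) for p in itertools.permutations(library, k))
best = min(candidates, key=lambda p: loss(compose(p), task))
print("best module path:", best)  # expected: ('scale', 'square')
```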

    Beam scanning by liquid-crystal biasing in a modified SIW structure

    A fixed-frequency beam-scanning 1D antenna based on Liquid Crystals (LCs) is designed for application in 2D scanning with lateral alignment. The 2D array environment imposes full decoupling of adjacent 1D antennas, which often conflicts with the LC requirement of DC biasing: the proposed design accommodates both. The LC medium is placed inside a Substrate Integrated Waveguide (SIW) modified to work as a Groove Gap Waveguide, with radiating slots etched on the upper broad wall, so that it radiates as a Leaky-Wave Antenna (LWA). This allows effective application of the DC bias voltage needed for tuning the LCs. At the same time, the RF field remains laterally confined, making it possible to lay several antennas in parallel and achieve 2D beam scanning. The design is validated by simulation employing the actual properties of a commercial LC medium.
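    The scanning mechanism can be summarised with the standard leaky-wave relation sin(θ) ≈ β/k0: biasing the LC changes its permittivity, which changes the phase constant β and steers the beam at a fixed frequency. The back-of-envelope sketch below assumes a simple dielectric-filled waveguide model; the frequency, width, and permittivity range are illustrative assumptions, not the paper's design values.

```python
# Fixed-frequency beam-steering estimate for a dielectric-filled waveguide
# LWA: the beam points near sin(theta) = beta/k0, and tuning the LC's
# permittivity eps_r moves beta. All numbers below are assumptions.
import numpy as np

c = 299_792_458.0   # speed of light, m/s
f = 28e9            # operating frequency, Hz (assumed)
a = 3.45e-3         # waveguide width, m (assumed)
k0 = 2 * np.pi * f / c

for eps_r in np.linspace(2.5, 3.3, 5):  # assumed LC tuning range
    beta = np.sqrt(eps_r * k0**2 - (np.pi / a) ** 2)  # TE10 phase constant
    theta = np.degrees(np.arcsin(beta / k0))          # angle from broadside
    print(f"eps_r = {eps_r:.2f} -> beam at {theta:4.1f} deg")
```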

    Reinforcement learning in large state action spaces

    Reinforcement learning (RL) is a promising framework for training intelligent agents which learn to optimize long-term utility by directly interacting with the environment. Creating RL methods which scale to large state-action spaces is a critical problem for ensuring real-world deployment of RL systems. However, several challenges limit the applicability of RL to large-scale settings. These include difficulties with exploration, low sample efficiency, computational intractability, task constraints like decentralization, and a lack of guarantees about important properties like performance, generalization, and robustness in potentially unseen scenarios. This thesis is motivated towards bridging the aforementioned gap. We propose several principled algorithms and frameworks for studying and addressing the above challenges in RL. The proposed methods cover a wide range of RL settings: single-agent and multi-agent systems (MAS) with all the variations in the latter, prediction and control, model-based and model-free methods, and value-based and policy-based methods. In this work we present the first results on several different problems, e.g. tensorization of the Bellman equation, which allows exponential sample efficiency gains (Chapter 4); provable suboptimality arising from structural constraints in MAS (Chapter 3); combinatorial generalization results in cooperative MAS (Chapter 5); generalization results on observation shifts (Chapter 7); and learning deterministic policies in a probabilistic RL framework (Chapter 6). Our algorithms exhibit provably enhanced performance and sample efficiency along with better scalability. Additionally, we shed light on generalization aspects of the agents under different frameworks. These properties have been driven by the use of several advanced tools (e.g. statistical machine learning, state abstraction, variational inference, and tensor theory). In summary, the contributions in this thesis significantly advance progress towards making RL agents ready for large-scale, real-world applications.
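    For reference, the Bellman optimality equation underlying this setting is V(s) = max_a [R(s,a) + γ Σ_{s'} P(s'|s,a) V(s')]. The sketch below solves it by standard tabular value iteration on a small random MDP; this is generic background, not the thesis's tensorised formulation, and the MDP is made up for illustration.

```python
# Tabular value iteration for the Bellman optimality equation
#   V(s) = max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
# on a small randomly generated MDP (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 6, 3, 0.95
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] is a distribution over s'
R = rng.random((S, A))                      # rewards R(s, a)

V = np.zeros(S)
for _ in range(1000):
    Q = R + gamma * P @ V        # one Bellman backup gives Q[s, a]
    V_new = Q.max(axis=1)        # greedy improvement
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new
print("optimal state values:", np.round(V, 3))
```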

    Unstable Periodic Orbits: a language to interpret the complexity of chaotic systems

    Unstable periodic orbits (UPOs), exact periodic solutions of the evolution equation, offer a very powerful framework for studying chaotic dynamical systems, as they allow one to dissect their dynamical structure. UPOs can be considered the skeleton of chaotic dynamics, its essential building blocks. In fact, it is possible to prove that in a chaotic system UPOs are dense in the attractor, meaning that it is always possible to find a UPO arbitrarily near any chaotic trajectory. We can thus think of the chaotic trajectory as being approximated by different UPOs as it evolves in time, jumping from one UPO to another as a result of their instability. In this thesis we provide a contribution towards the use of UPOs as a tool to understand and distill the dynamical structure of chaotic dynamical systems. We focus on two models characterised by different properties: the Lorenz-63 and Lorenz-96 models. The process of approximating a chaotic trajectory in terms of UPOs plays a central role in our investigation, as we use this tool to explore the properties of the attractor of the system under the lens of its UPOs. In the first part of the thesis we consider the Lorenz-63 model with the classical parameter values. We investigate how a chaotic trajectory can be approximated using a complete set of UPOs up to period 14 of the symbolic dynamics. At each instant in time, we rank the UPOs according to their proximity to the position of the orbit in phase space. We study this process from two different perspectives. First, we find that longer-period UPOs overwhelmingly provide the best local approximation to the trajectory. Second, we construct a finite-state Markov chain by studying the scattering of the trajectory between the neighbourhoods of the various UPOs, taking each UPO and its neighbourhood as a possible state of the system. Through the analysis of the subdominant eigenvectors of the corresponding stochastic matrix we provide a different interpretation of the mixing processes occurring in the system by taking advantage of the concept of quasi-invariant sets. In the second part of the thesis we provide an extensive numerical investigation of the variability of the dynamical properties across the attractor of the much-studied Lorenz-96 dynamical system. By combining the Lyapunov analysis of the tangent space with the study of the shadowing of the chaotic trajectory performed by a very large set of unstable periodic orbits, we show that the observed variability in the number of unstable dimensions, which signals a serious breakdown of hyperbolicity, is associated with the presence of a substantial number of finite-time Lyapunov exponents that fluctuate about zero even when very long averaging times are considered.
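    The Markov-chain construction lends itself to a compact numerical sketch: assign each point of a chaotic trajectory to its nearest reference state, count transitions, and examine the spectrum of the resulting stochastic matrix. In the sketch below, k-means centroids of a Lorenz-63 trajectory stand in for the UPO neighbourhoods (actual UPOs require Newton-type searches), so the states and parameters are illustrative assumptions.

```python
# Build a finite-state Markov chain from a chaotic Lorenz-63 trajectory by
# assigning each point to its nearest reference state and counting
# transitions; centroids stand in for UPO neighbourhoods here.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.cluster.vq import kmeans2

def lorenz63(t, u, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = u
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

sol = solve_ivp(lorenz63, (0, 500), [1.0, 1.0, 1.0],
                t_eval=np.arange(100, 500, 0.05))  # discard the transient
points = sol.y.T

centroids, labels = kmeans2(points, 10, minit="points", seed=0)  # 10 states
T = np.zeros((10, 10))
for a, b in zip(labels[:-1], labels[1:]):            # count state transitions
    T[a, b] += 1
T = T / np.maximum(T.sum(axis=1, keepdims=True), 1)  # row-stochastic matrix

eigvals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
print("leading eigenvalues:", np.round(eigvals[:4], 3))
# subdominant eigenvectors of T reveal slowly-mixing, quasi-invariant sets
```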

    Extending the reach of uncertainty quantification in nuclear theory

    The theory of the strong interaction—quantum chromodynamics (QCD)—is unsuited to practical calculations of nuclear observables, and approximate models for nuclear interaction potentials are required. In contrast to phenomenological models, chiral effective field theories (χEFTs) of QCD grant a handle on the theoretical uncertainty arising from the truncation of the chiral expansion. Uncertainties in χEFT are preferably quantified using Bayesian inference, but quantifying reliable posterior predictive distributions for nuclear observables presents several challenges. First, χEFT is parametrized by unknown low-energy constants (LECs) whose values must be inferred from low-energy data of nuclear structure and reaction observables. There are 31 LECs at fourth order in Weinberg power counting, leading to a high-dimensional inference problem which I approach by developing an advanced sampling protocol using Hamiltonian Monte Carlo (HMC). This allows me to quantify LEC posteriors up to and including fourth chiral order. Second, the χEFT truncation error is correlated across independent variables such as scattering energies and angles; I model correlations using a Gaussian process. Third, the computational cost of computing few- and many-nucleon observables typically precludes their direct use in Bayesian parameter estimation, as each observable must be computed in excess of 100,000 times during HMC sampling. The one exception is nucleon-nucleon scattering observables, but even these incur a substantial computational cost in the present applications. I sidestep such issues using eigenvector-continuation emulators, which accurately mimic exact calculations while dramatically reducing the computational cost. Equipped with Bayesian posteriors for the LECs, and a model for the truncation error, I explore the predictive ability of χEFT, presenting the results as the probability distributions they always were.
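    Of the tools mentioned, eigenvector continuation is perhaps the easiest to illustrate compactly: exact eigenvectors computed at a few training values of a coupling span a small subspace in which a generalized eigenvalue problem emulates the exact calculation at new couplings. The sketch below uses random symmetric matrices as a stand-in for a nuclear Hamiltonian with a single LEC-like coupling; the entire setup is an illustrative assumption.

```python
# Eigenvector-continuation emulator for H(c) = H0 + c*V: exact ground states
# at a few training couplings form a subspace basis; a small generalized
# eigenvalue problem then emulates the ground-state energy at new couplings.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
n = 200
H0 = rng.standard_normal((n, n)); H0 = (H0 + H0.T) / 2
V = rng.standard_normal((n, n)); V = (V + V.T) / 2

def ground_state(c):
    w, U = eigh(H0 + c * V)      # exact (expensive) diagonalization
    return w[0], U[:, 0]

# Training: exact eigenvectors at a few couplings span the subspace X.
X = np.column_stack([ground_state(c)[1] for c in (-1.0, 0.0, 1.0)])

def emulate(c):
    H = X.T @ (H0 + c * V) @ X   # 3x3 projected Hamiltonian
    N = X.T @ X                  # overlap (norm) matrix
    return eigh(H, N)[0][0]      # generalized eigenproblem H y = E N y

for c in (0.5, 2.0):
    print(f"c={c}: exact {ground_state(c)[0]:.4f}, emulated {emulate(c):.4f}")
```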

    Intelligent computing : the latest advances, challenges and future

    Computing is a critical driving force in the development of human civilization. In recent years, we have witnessed the emergence of intelligent computing, a new computing paradigm that is reshaping traditional computing and promoting the digital revolution in the era of big data, artificial intelligence, and the Internet of Things, with new computing theories, architectures, methods, systems, and applications. Intelligent computing has greatly broadened the scope of computing, extending it from traditional computing on data to increasingly diverse computing paradigms such as perceptual intelligence, cognitive intelligence, autonomous intelligence, and human-computer fusion intelligence. Intelligence and computing have followed different paths of evolution and development for a long time but have become increasingly intertwined in recent years: intelligent computing is not only intelligence-oriented but also intelligence-driven. Such cross-fertilization has prompted the emergence and rapid advancement of intelligent computing.

    Modelling, Monitoring, Control and Optimization for Complex Industrial Processes

    This reprint includes 22 research papers and an editorial, collected from the Special Issue "Modelling, Monitoring, Control and Optimization for Complex Industrial Processes", highlighting recent research advances and emerging research directions in complex industrial processes. This reprint aims to promote the research field and benefit readers from both academic communities and industrial sectors.