525 research outputs found

    Automating Vehicles by Deep Reinforcement Learning using Task Separation with Hill Climbing

    Full text link
    Within the context of autonomous driving, a model-based reinforcement learning algorithm is proposed for the design of neural network-parameterized controllers. Classical model-based control methods, which include sampling- and lattice-based algorithms and model predictive control, suffer from a trade-off between model complexity and the computational burden of solving expensive optimization or search problems online at every short sampling time. To circumvent this trade-off, a two-step procedure is motivated: a controller is first learned offline from an arbitrarily complex mathematical system model, and is then evaluated online by fast feedforward passes. The contribution of this paper is a simple, gradient-free, model-based algorithm for deep reinforcement learning using task separation with hill climbing (TSHC). In particular, the paper advocates (i) simultaneous training on separate deterministic tasks to encode many motion primitives in a neural network, and (ii) maximally sparse rewards combined with virtual velocity constraints (VVCs) in setpoint proximity.
    Comment: 10 pages, 6 figures, 1 table
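    To make the training scheme concrete, here is a minimal sketch of gradient-free hill climbing over the weights of a small feedforward controller, trained jointly on several deterministic tasks with a maximally sparse reward. The `simulate` rollout function, the tiny policy architecture, and the acceptance rule are illustrative assumptions, not the paper's exact TSHC procedure.

```python
# Minimal sketch of gradient-free hill climbing over controller weights,
# in the spirit of TSHC; `simulate`, the policy size, and the sparse
# reward are illustrative assumptions, not the paper's exact setup.
import numpy as np

def sparse_reward(final_state, goal, tol=0.1):
    """Maximally sparse reward: success counts only in setpoint proximity."""
    return 1.0 if np.linalg.norm(final_state - goal) < tol else 0.0

def policy(theta, state, hidden=8):
    """Tiny feedforward controller parameterized by a flat weight vector."""
    n = state.size
    W1 = theta[:n * hidden].reshape(hidden, n)
    W2 = theta[n * hidden:].reshape(1, hidden)
    return np.tanh(W2 @ np.tanh(W1 @ state))

def hill_climb(simulate, tasks, theta, sigma=0.05, iters=500, seed=0):
    """Keep a random weight perturbation only if the summed task reward
    does not decrease; no gradients are ever computed."""
    rng = np.random.default_rng(seed)
    best = sum(simulate(policy, theta, t) for t in tasks)
    for _ in range(iters):
        cand = theta + sigma * rng.standard_normal(theta.shape)
        score = sum(simulate(policy, cand, t) for t in tasks)
        if score >= best:  # greedy, gradient-free acceptance
            theta, best = cand, score
    return theta
```

    Here `simulate(policy, theta, task)` would roll out the deterministic system model for one task and return its sparse reward; training on all tasks simultaneously is what encodes several motion primitives in one network.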

    Contrasting Views of Complexity and Their Implications For Network-Centric Infrastructures

    Get PDF
    There exists a widely recognized need to better understand and manage complex “systems of systems,” ranging from biology, ecology, and medicine to network-centric technologies. This is motivating the search for universal laws of highly evolved systems and driving demand for new mathematics and methods that are consistent, integrative, and predictive. However, the theoretical frameworks available today are not merely fragmented but sometimes contradictory and incompatible. We argue that complexity arises in highly evolved biological and technological systems primarily to provide mechanisms that create robustness. Yet this complexity can itself be a source of new fragility, leading to “robust yet fragile” tradeoffs in system design. We focus on the role of robustness and architecture in networked infrastructures, and we highlight recent advances in the theory of distributed control driven by network technologies. This view of complexity in highly organized technological and biological systems is fundamentally different from the dominant perspective in the mainstream sciences, which downplays function, constraints, and tradeoffs, and tends to minimize the role of organization and design.

    Recurrent neural networks: methods and applications to non-linear predictions

    Get PDF
    This thesis deals with recurrent neural networks, a class of artificial neural networks that can learn a generative model of input sequences. The input is mapped, through a feedback loop and a non-linear activation function, into a hidden state, which is then projected into the output space, yielding either a probability distribution or the input for the next time-step. The work consists of two main parts: a theoretical study that aids understanding of the recurrent neural network framework, which is not yet deeply investigated, and applications to non-linear prediction problems, since recurrent neural networks are powerful models suited to several practical tasks in different fields.

    In the theoretical part, we analyse the weaknesses of state-of-the-art models and address them in order to improve the performance of recurrent neural networks. First, we contribute to the understanding of their dynamical properties, highlighting the close relation between the definition of stable limit cycles and the echo state property of an echo state network. We provide sufficient conditions for the convergence of the hidden state to a trajectory that is uniquely determined by the input signal, independently of the initial state. This may help extend the memory of the network and increase its design options. Moreover, we develop a novel approach to the main problem in training recurrent neural networks, the so-called vanishing gradient problem. Our method allows a very simple recurrent neural network to be trained without the gradient vanishing, even after many time-steps. Exploiting the singular value decomposition of the vanishing factors in the gradient together with random matrix theory, we find that the singular values must be confined to a narrow interval, and we derive conditions on their root mean square value. We also improve the efficiency of training by defining a new method to speed up this process: thanks to a least-squares regularization, we can initialize the parameters of the network closer to the minimum, so that fewer epochs of classical training algorithms are needed. It is also possible to train the network entirely with this initialization method, running more iterations of it without losing performance with respect to classical training algorithms, or to use it as a real-time learning algorithm, adjusting the parameters to new data through a single iteration.

    In the last part of this thesis, we apply recurrent neural networks to non-linear prediction problems. We consider the prediction of numerical sequences, estimating the next input by choosing it from a probability distribution. We study an automatic text generation problem, where the next character must be predicted in order to compose words and sentences, and the path prediction of walking mobile users in the central area of a city, expressed as a sequence of crossroads. We then analyse the prediction of video frames, which opens a wide range of applications related to the prediction of movements. We study the collision problem of bouncing balls, taking into account only the sequence of video frames without any knowledge of the physical characteristics of the problem, and the distribution of mobile users over days in a city and in a whole region. Finally, we address the state-of-the-art problem of missing data imputation, analysing the incomplete spectrograms of audio signals. We restore audio signals with missing time-frequency data, demonstrating via numerical experiments that a performance improvement can be achieved using recurrent neural networks.
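    The vanishing gradient analysis above hinges on the per-step Jacobian factors of the recurrence. A small sketch below builds the hidden-state update the abstract describes and inspects the singular values of these factors; the dimensions, weight scales, and random data are illustrative assumptions, not the thesis' experimental setup.

```python
# Sketch of the recurrence h_t = tanh(W h_{t-1} + U x_t) and a check on
# the singular values of the per-step Jacobian factors whose product
# drives vanishing gradients; all sizes and data are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_h, T = 3, 16, 50
W = rng.normal(scale=1.0 / np.sqrt(n_h), size=(n_h, n_h))  # recurrent weights
U = rng.normal(scale=0.5, size=(n_h, n_in))                # input weights

h = np.zeros(n_h)
jacobian_svals = []
for t in range(T):
    x = rng.normal(size=n_in)
    h = np.tanh(W @ h + U @ x)
    # Jacobian of h_t w.r.t. h_{t-1}: diag(1 - h_t^2) @ W
    J = np.diag(1.0 - h ** 2) @ W
    jacobian_svals.append(np.linalg.svd(J, compute_uv=False))

# If the root mean square singular value sits well below 1, gradient
# contributions from distant time-steps shrink geometrically.
rms = np.sqrt(np.mean(np.concatenate(jacobian_svals) ** 2))
print(f"RMS singular value of per-step Jacobians: {rms:.3f}")
```

    Backpropagated gradients contain products of these Jacobians, so confining their singular values to a narrow interval around 1, as the thesis' conditions require, is what keeps long-range gradient signal alive.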

    Latent data augmentation and modular structure for improved generalization

    Full text link
    This thesis explores the nature of generalization in deep learning and several settings in which it fails. In particular, deep neural networks can struggle to generalize in settings with limited data, insufficient supervision, challenging long-range dependencies, or complex structure and subsystems. The thesis examines the nature of these challenges and presents several algorithms that seek to address them. In the first article, we show how training with interpolated hidden states can improve generalization and calibration in deep learning. We also introduce a theory showing how our algorithm, which we call Manifold Mixup, leads to a flattening of the per-class hidden representations, which can be seen as a compression of the information in the hidden states. The second article is related to the first and shows how interpolated examples can be used for semi-supervised learning. In addition to interpolating the input examples, the model’s interpolated predictions are used as targets for these examples. This improves results on standard benchmarks as well as classic 2D toy problems for semi-supervised learning. The third article studies how a recurrent neural network can be divided into multiple modules with different parameters and well-separated hidden states, together with a competition mechanism restricting updates of the hidden states to the subset of modules most relevant at a given time-step. This improves systematic generalization when the pattern distribution changes between the training and evaluation phases, and it also improves generalization in reinforcement learning. In the fourth article, we show that attention can be used to control the flow of information between successive layers in deep networks. This allows each layer to process only the subset of the previously computed layers’ outputs that is most relevant, improving generalization on relational reasoning tasks as well as standard benchmark classification tasks.
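    The hidden-state interpolation of the first article is compact enough to sketch. Below is a minimal Manifold Mixup forward pass in Python/PyTorch: hidden states of two batch permutations are mixed with a Beta-distributed coefficient and the one-hot targets are mixed the same way. The layer choice, architecture, and alpha value are common defaults, assumed here rather than taken from the thesis.

```python
# Minimal sketch of Manifold Mixup: interpolate hidden states of two
# mini-batch permutations and mix the targets identically. Architecture
# and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixupMLP(nn.Module):
    def __init__(self, d_in=20, d_h=64, n_classes=5):
        super().__init__()
        self.f1 = nn.Linear(d_in, d_h)        # layers below the mixing point
        self.f2 = nn.Linear(d_h, n_classes)   # layers above it

    def forward(self, x, y_onehot=None, alpha=2.0):
        h = torch.relu(self.f1(x))
        if y_onehot is None:                  # plain inference path
            return self.f2(h), None
        lam = torch.distributions.Beta(alpha, alpha).sample().item()
        perm = torch.randperm(x.size(0))
        h_mix = lam * h + (1 - lam) * h[perm]           # interpolated hidden states
        y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]  # interpolated targets
        return self.f2(h_mix), y_mix

model = MixupMLP()
x = torch.randn(32, 20)
y = F.one_hot(torch.randint(0, 5, (32,)), 5).float()
logits, y_mix = model(x, y)
loss = torch.sum(-y_mix * F.log_softmax(logits, dim=1), dim=1).mean()
loss.backward()
```

    Training against the mixed targets encourages linear behaviour between hidden representations of different classes, which is the flattening effect the article's theory describes.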

    Stochastic Motion Planning as Gaussian Variational Inference: Theory and Algorithms

    Full text link
    We consider the motion planning problem under uncertainty and address it using probabilistic inference. A collision-free motion plan with linear stochastic dynamics is modeled by a posterior distribution. Gaussian variational inference optimizes over path distributions to infer this posterior within the family of Gaussian distributions. We propose the Gaussian Variational Inference Motion Planner (GVI-MP) algorithm to solve this Gaussian inference, where a natural gradient paradigm is used to iteratively update the Gaussian distribution and the factorized structure of the joint distribution is leveraged. We show that the direct optimization over the state distributions in GVI-MP is equivalent to solving a stochastic control problem that has a closed-form solution. Starting from this observation, we propose our second algorithm, the Proximal Gradient Covariance Steering Motion Planner (PGCS-MP), to solve the same inference problem in its stochastic control form with terminal constraints. We use a proximal gradient paradigm to solve the linear stochastic control problem with a nonlinear collision cost, where the nonlinear cost is iteratively approximated by quadratic functions and a closed-form solution is obtained by solving a linear covariance steering problem at each iteration. We evaluate the effectiveness and performance of the proposed approaches through extensive experiments on various robot models. The code for this paper can be found at https://github.com/hzyu17/VIMP.
    Comment: 19 pages
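    To illustrate the core idea of fitting a Gaussian to a planning posterior exp(-V), here is a deliberately simplified toy: a Gaussian over a single 2D waypoint is fitted by reparameterization-gradient variational inference against a cost combining goal attraction and a soft obstacle. This is not the paper's natural-gradient GVI-MP over full trajectories; the cost terms, constants, and update rule are illustrative assumptions.

```python
# Toy sketch of Gaussian variational inference for planning: fit
# q = N(mu, diag(sig^2)) over one 2D waypoint to the posterior exp(-V),
# minimizing E_q[V] - entropy(q) by stochastic gradient descent.
# A simplified stand-in for GVI-MP; all constants are illustrative.
import numpy as np

goal = np.array([2.0, 0.0])
obstacle, radius, weight = np.array([1.0, 0.1]), 0.4, 8.0

def grad_V(x):
    """Gradient of V(x) = 0.5||x-goal||^2 + weight*exp(-||x-obs||^2/(2 r^2))."""
    d = x - obstacle
    bump = weight * np.exp(-(d @ d) / (2 * radius**2))
    return (x - goal) - bump * d / radius**2

rng = np.random.default_rng(0)
mu, sig = np.zeros(2), np.ones(2)      # mean and per-axis std of q
for _ in range(2000):
    eps = rng.standard_normal((64, 2))               # reparameterization noise
    g = np.array([grad_V(mu + sig * e) for e in eps])
    grad_mu = g.mean(axis=0)                         # d E_q[V] / d mu
    grad_sig = (g * eps).mean(axis=0) - 1.0 / sig    # includes -entropy term
    mu -= 0.05 * grad_mu
    sig = np.maximum(sig - 0.05 * grad_sig, 1e-3)

print("posterior mean waypoint:", mu, "std:", sig)
```

    The fitted mean is pushed toward the goal while bending away from the obstacle, and the covariance shrinks where the cost is sharply curved; GVI-MP applies this principle jointly over an entire discretized trajectory with natural-gradient updates.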
    • 

    corecore