320 research outputs found

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 251, ITCS 2023, Complete Volum

    From Human Behavior to Machine Behavior

    Get PDF
    A core pursuit of artificial intelligence is the comprehension of human behavior. Imbuing intelligent agents with a good human behavior model can help them understand how to behave intelligently and interactively in complex situations. Due to the increase in data availability and computational resources, the development of machine learning algorithms for duplicating human cognitive abilities has made rapid progress. To solve difficult scenarios, learning-based methods must search for solutions in a predefined but large space. Along with implementing a smart exploration strategy, the right representation for a task can help narrow the search process during learning. This dissertation tackles three important aspects of machine intelligence: 1) prediction, 2) exploration, and 3) representation. More specifically we develop new algorithms for 1) predicting the future maneuvers or outcomes in pilot training and computer architecture applications; 2) exploration strategies for reinforcement learning in game environments and 3) scene representations for autonomous driving agents capable of handling large numbers of dynamic entities. This dissertation makes the following research contributions in the area of representation learning. First, we introduce a new time series representation for flight trajectories in intelligent pilot training simulations. Second, we demonstrate a method, Temporally Aware Embedding (TAE) for learning an embedding that leverages temporal information extracted from data retrieval series. Third, the dissertation introduces GRAD (Graph Representation for Autonomous Driving) that incorporates the future location of neighboring vehicles into the decision-making process. We demonstrate the usage of our models for pilot training, cache usage prediction, and autonomous driving; however, believe that our new time series representations can be applied to many other types of modeling problems

    Model learning for trajectory tracking of robot manipulators

    Get PDF
    Abstract Model based controllers have drastically improved robot performance, increasing task accuracy while reducing control effort. Nevertheless, all this was realized with a very strong assumption: the exact knowledge of the physical properties of both the robot and the environment that surrounds it. This assertion is often misleading: in fact modern robots are modeled in a very approximate way and, more important, the environment is almost never static and completely known. Also for systems very simple, such as robot manipulators, these assumptions are still too strong and must be relaxed. Many methods were developed which, exploiting previous experiences, are able to refine the nominal model: from classic identification techniques to more modern machine learning based approaches. Indeed, the topic of this thesis is the investigation of these data driven techniques in the context of robot control for trajectory tracking. In the first two chapters, preliminary knowledge is provided on both model based controllers, used in robotics to assure precise trajectory tracking, and model learning techniques. In the following three chapters, are presented the novelties introduced by the author in this context with respect to the state of the art: three works with the same premise (an inaccurate system modeling), an identical goal (accurate trajectory tracking control) but with small differences according to the specific platform of application (fully actuated, underactuated, redundant robots). In all the considered architectures, an online learning scheme has been introduced to correct the nominal feedback linearization control law. Indeed, the method has been primarily introduced in the literature to cope with fully actuated systems, showing its efficacy in the accurate tracking of joint space trajectories also with an inaccurate dynamic model. The main novelty of the technique was the use of only kinematics information, instead of torque measurements (in general very noisy), to online retrieve and compensate the dynamic mismatches. After that the method has been extended to underactuated robots. This new architecture was composed by an online learning correction of the controller, acting on the actuated part of the system (the nominal partial feedback linearization), and an offline planning phase, required to realize a dynamically feasible trajectory also for the zero dynamics of the system. The scheme was iterative: after each trial, according to the collected information, both the phases were improved and then repeated until the task achievement. Also in this case the method showed its capability, both in numerical simulations and on real experiments on a robotics platform. Eventually the method has been applied to redundant systems: differently from before, in this context the task consisted in the accurate tracking of a Cartesian end effector trajectory. In principle very similar to the fully actuated case, the presence of redundancy slowed down drastically the learning machinery convergence, worsening the performance. In order to cope with this, a redundancy resolution was proposed that, exploiting an approximation of the learning algorithm (Gaussian process regression), allowed to locally maximize the information and so select the most convenient self motion for the system; moreover, all of this was realized with just the resolution of a quadratic programming problem. Also in this case the method showed its performance, realizing an accurate online tracking while reducing both the control effort and the joints velocity, obtaining so a natural behaviour. The thesis concludes with summary considerations on the proposed approach and with possible future directions of research

    Modular lifelong machine learning

    Get PDF
    Deep learning has drastically improved the state-of-the-art in many important fields, including computer vision and natural language processing (LeCun et al., 2015). However, it is expensive to train a deep neural network on a machine learning problem. The overall training cost further increases when one wants to solve additional problems. Lifelong machine learning (LML) develops algorithms that aim to efficiently learn to solve a sequence of problems, which become available one at a time. New problems are solved with less resources by transferring previously learned knowledge. At the same time, an LML algorithm needs to retain good performance on all encountered problems, thus avoiding catastrophic forgetting. Current approaches do not possess all the desired properties of an LML algorithm. First, they primarily focus on preventing catastrophic forgetting (Diaz-Rodriguez et al., 2018; Delange et al., 2021). As a result, they neglect some knowledge transfer properties. Furthermore, they assume that all problems in a sequence share the same input space. Finally, scaling these methods to a large sequence of problems remains a challenge. Modular approaches to deep learning decompose a deep neural network into sub-networks, referred to as modules. Each module can then be trained to perform an atomic transformation, specialised in processing a distinct subset of inputs. This modular approach to storing knowledge makes it easy to only reuse the subset of modules which are useful for the task at hand. This thesis introduces a line of research which demonstrates the merits of a modular approach to lifelong machine learning, and its ability to address the aforementioned shortcomings of other methods. Compared to previous work, we show that a modular approach can be used to achieve more LML properties than previously demonstrated. Furthermore, we develop tools which allow modular LML algorithms to scale in order to retain said properties on longer sequences of problems. First, we introduce HOUDINI, a neurosymbolic framework for modular LML. HOUDINI represents modular deep neural networks as functional programs and accumulates a library of pre-trained modules over a sequence of problems. Given a new problem, we use program synthesis to select a suitable neural architecture, as well as a high-performing combination of pre-trained and new modules. We show that our approach has most of the properties desired from an LML algorithm. Notably, it can perform forward transfer, avoid negative transfer and prevent catastrophic forgetting, even across problems with disparate input domains and problems which require different neural architectures. Second, we produce a modular LML algorithm which retains the properties of HOUDINI but can also scale to longer sequences of problems. To this end, we fix the choice of a neural architecture and introduce a probabilistic search framework, PICLE, for searching through different module combinations. To apply PICLE, we introduce two probabilistic models over neural modules which allows us to efficiently identify promising module combinations. Third, we phrase the search over module combinations in modular LML as black-box optimisation, which allows one to make use of methods from the setting of hyperparameter optimisation (HPO). We then develop a new HPO method which marries a multi-fidelity approach with model-based optimisation. We demonstrate that this leads to improvement in anytime performance in the HPO setting and discuss how this can in turn be used to augment modular LML methods. Overall, this thesis identifies a number of important LML properties, which have not all been attained in past methods, and presents an LML algorithm which can achieve all of them, apart from backward transfer

    Reinforcement learning in large state action spaces

    Get PDF
    Reinforcement learning (RL) is a promising framework for training intelligent agents which learn to optimize long term utility by directly interacting with the environment. Creating RL methods which scale to large state-action spaces is a critical problem towards ensuring real world deployment of RL systems. However, several challenges limit the applicability of RL to large scale settings. These include difficulties with exploration, low sample efficiency, computational intractability, task constraints like decentralization and lack of guarantees about important properties like performance, generalization and robustness in potentially unseen scenarios. This thesis is motivated towards bridging the aforementioned gap. We propose several principled algorithms and frameworks for studying and addressing the above challenges RL. The proposed methods cover a wide range of RL settings (single and multi-agent systems (MAS) with all the variations in the latter, prediction and control, model-based and model-free methods, value-based and policy-based methods). In this work we propose the first results on several different problems: e.g. tensorization of the Bellman equation which allows exponential sample efficiency gains (Chapter 4), provable suboptimality arising from structural constraints in MAS(Chapter 3), combinatorial generalization results in cooperative MAS(Chapter 5), generalization results on observation shifts(Chapter 7), learning deterministic policies in a probabilistic RL framework(Chapter 6). Our algorithms exhibit provably enhanced performance and sample efficiency along with better scalability. Additionally, we also shed light on generalization aspects of the agents under different frameworks. These properties have been been driven by the use of several advanced tools (e.g. statistical machine learning, state abstraction, variational inference, tensor theory). In summary, the contributions in this thesis significantly advance progress towards making RL agents ready for large scale, real world applications

    Physics-informed radial basis network (PIRBN): A local approximation neural network for solving nonlinear PDEs

    Full text link
    Our recent intensive study has found that physics-informed neural networks (PINN) tend to be local approximators after training. This observation leads to this novel physics-informed radial basis network (PIRBN), which can maintain the local property throughout the entire training process. Compared to deep neural networks, a PIRBN comprises of only one hidden layer and a radial basis "activation" function. Under appropriate conditions, we demonstrated that the training of PIRBNs using gradient descendent methods can converge to Gaussian processes. Besides, we studied the training dynamics of PIRBN via the neural tangent kernel (NTK) theory. In addition, comprehensive investigations regarding the initialisation strategies of PIRBN were conducted. Based on numerical examples, PIRBN has been demonstrated to be more effective and efficient than PINN in solving PDEs with high-frequency features and ill-posed computational domains. Moreover, the existing PINN numerical techniques, such as adaptive learning, decomposition and different types of loss functions, are applicable to PIRBN. The programs that can regenerate all numerical results can be found at https://github.com/JinshuaiBai/PIRBN.Comment: 48 pages, 26 figure

    Security and Privacy for Modern Wireless Communication Systems

    Get PDF
    The aim of this reprint focuses on the latest protocol research, software/hardware development and implementation, and system architecture design in addressing emerging security and privacy issues for modern wireless communication networks. Relevant topics include, but are not limited to, the following: deep-learning-based security and privacy design; covert communications; information-theoretical foundations for advanced security and privacy techniques; lightweight cryptography for power constrained networks; physical layer key generation; prototypes and testbeds for security and privacy solutions; encryption and decryption algorithm for low-latency constrained networks; security protocols for modern wireless communication networks; network intrusion detection; physical layer design with security consideration; anonymity in data transmission; vulnerabilities in security and privacy in modern wireless communication networks; challenges of security and privacy in node–edge–cloud computation; security and privacy design for low-power wide-area IoT networks; security and privacy design for vehicle networks; security and privacy design for underwater communications networks

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 261, ICALP 2023, Complete Volum

    Stability of Q-Learning Through Design and Optimism

    Full text link
    Q-learning has become an important part of the reinforcement learning toolkit since its introduction in the dissertation of Chris Watkins in the 1980s. The purpose of this paper is in part a tutorial on stochastic approximation and Q-learning, providing details regarding the INFORMS APS inaugural Applied Probability Trust Plenary Lecture, presented in Nancy France, June 2023. The paper also presents new approaches to ensure stability and potentially accelerated convergence for these algorithms, and stochastic approximation in other settings. Two contributions are entirely new: 1. Stability of Q-learning with linear function approximation has been an open topic for research for over three decades. It is shown that with appropriate optimistic training in the form of a modified Gibbs policy, there exists a solution to the projected Bellman equation, and the algorithm is stable (in terms of bounded parameter estimates). Convergence remains one of many open topics for research. 2. The new Zap Zero algorithm is designed to approximate the Newton-Raphson flow without matrix inversion. It is stable and convergent under mild assumptions on the mean flow vector field for the algorithm, and compatible statistical assumption on an underlying Markov chain. The algorithm is a general approach to stochastic approximation which in particular applies to Q-learning with "oblivious" training even with non-linear function approximation.Comment: Companion paper to the INFORMS APS inaugural Applied Probability Trust Plenary Lecture, presented in Nancy France, June 2023. Slides available online, Online, DOI 10.13140/RG.2.2.24897.3312
    • …
    corecore