2,233 research outputs found

    Efficient Deep Reinforcement Learning via Planning, Generalization, and Improved Exploration

    Full text link
    Reinforcement learning (RL) is a general-purpose machine learning framework, which considers an agent that makes sequential decisions in an environment to maximize its reward. Deep reinforcement learning (DRL) approaches use deep neural networks as non-linear function approximators that parameterize policies or value functions directly from raw observations in RL. Although DRL approaches have been shown to be successful on many challenging RL benchmarks, much of the prior work has mainly focused on learning a single task in a model-free setting, which is often sample-inefficient. On the other hand, humans have abilities to acquire knowledge by learning a model of the world in an unsupervised fashion, use such knowledge to plan ahead for decision making, transfer knowledge between many tasks, and generalize to previously unseen circumstances from the pre-learned knowledge. Developing such abilities are some of the fundamental challenges for building RL agents that can learn as efficiently as humans. As a step towards developing the aforementioned capabilities in RL, this thesis develops new DRL techniques to address three important challenges in RL: 1) planning via prediction, 2) rapidly generalizing to new environments and tasks, and 3) efficient exploration in complex environments. The first part of the thesis discusses how to learn a dynamics model of the environment using deep neural networks and how to use such a model for planning in complex domains where observations are high-dimensional. Specifically, we present neural network architectures for action-conditional video prediction and demonstrate improved exploration in RL. In addition, we present a neural network architecture that performs lookahead planning by predicting the future only in terms of rewards and values without predicting observations. We then discuss why this approach is beneficial compared to conventional model-based planning approaches. The second part of the thesis considers generalization to unseen environments and tasks. We first introduce a set of cognitive tasks in a 3D environment and present memory-based DRL architectures that generalize better to previously unseen 3D environments compared to existing baselines. In addition, we introduce a new multi-task RL problem where the agent should learn to execute different tasks depending on given instructions and generalize to new instructions in a zero-shot fashion. We present a new hierarchical DRL architecture that learns to generalize over previously unseen task descriptions with minimal prior knowledge. The third part of the thesis discusses how exploiting past experiences can indirectly drive deep exploration and improve sample-efficiency. In particular, we propose a new off-policy learning algorithm, called self-imitation learning, which learns a policy to reproduce past good experiences. We empirically show that self-imitation learning indirectly encourages the agent to explore reasonably good state spaces and thus significantly improves sample-efficiency on RL domains where exploration is challenging. Overall, the main contribution of this thesis are to explore several fundamental challenges in RL in the context of DRL and develop new DRL architectures and algorithms to address such challenges. This allows us to understand how deep learning can be used to improve sample efficiency, and thus come closer to human-like learning abilities.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/145829/1/junhyuk_1.pd

    Optimal control and approximations

    Get PDF

    Research reports: 1990 NASA/ASEE Summer Faculty Fellowship Program

    Get PDF
    Reports on the research projects performed under the NASA/ASEE Summer Faculty Fellowship Program are presented. The program was conducted by The University of Alabama and MSFC during the period from June 4, 1990 through August 10, 1990. Some of the topics covered include: (1) Space Shuttles; (2) Space Station Freedom; (3) information systems; (4) materials and processes; (4) Space Shuttle main engine; (5) aerospace sciences; (6) mathematical models; (7) mission operations; (8) systems analysis and integration; (9) systems control; (10) structures and dynamics; (11) aerospace safety; and (12) remote sensin

    Active SLAM: A Review On Last Decade

    Full text link
    This article presents a comprehensive review of the Active Simultaneous Localization and Mapping (A-SLAM) research conducted over the past decade. It explores the formulation, applications, and methodologies employed in A-SLAM, particularly in trajectory generation and control-action selection, drawing on concepts from Information Theory (IT) and the Theory of Optimal Experimental Design (TOED). This review includes both qualitative and quantitative analyses of various approaches, deployment scenarios, configurations, path-planning methods, and utility functions within A-SLAM research. Furthermore, this article introduces a novel analysis of Active Collaborative SLAM (AC-SLAM), focusing on collaborative aspects within SLAM systems. It includes a thorough examination of collaborative parameters and approaches, supported by both qualitative and statistical assessments. This study also identifies limitations in the existing literature and suggests potential avenues for future research. This survey serves as a valuable resource for researchers seeking insights into A-SLAM methods and techniques, offering a current overview of A-SLAM formulation.Comment: 34 pages, 8 figures, 6 table

    Adaptive Sampling with Mobile Sensor Networks

    Get PDF
    Mobile sensor networks have unique advantages compared with wireless sensor networks. The mobility enables mobile sensors to flexibly reconfigure themselves to meet sensing requirements. In this dissertation, an adaptive sampling method for mobile sensor networks is presented. Based on the consideration of sensing resource constraints, computing abilities, and onboard energy limitations, the adaptive sampling method follows a down sampling scheme, which could reduce the total number of measurements, and lower sampling cost. Compressive sensing is a recently developed down sampling method, using a small number of randomly distributed measurements for signal reconstruction. However, original signals cannot be reconstructed using condensed measurements, as addressed by Shannon Sampling Theory. Measurements have to be processed under a sparse domain, and convex optimization methods should be applied to reconstruct original signals. Restricted isometry property would guarantee signals can be recovered with little information loss. While compressive sensing could effectively lower sampling cost, signal reconstruction is still a great research challenge. Compressive sensing always collects random measurements, whose information amount cannot be determined in prior. If each measurement is optimized as the most informative measurement, the reconstruction performance can perform much better. Based on the above consideration, this dissertation is focusing on an adaptive sampling approach, which could find the most informative measurements in unknown environments and reconstruct original signals. With mobile sensors, measurements are collect sequentially, giving the chance to uniquely optimize each of them. When mobile sensors are about to collect a new measurement from the surrounding environments, existing information is shared among networked sensors so that each sensor would have a global view of the entire environment. Shared information is analyzed under Haar Wavelet domain, under which most nature signals appear sparse, to infer a model of the environments. The most informative measurements can be determined by optimizing model parameters. As a result, all the measurements collected by the mobile sensor network are the most informative measurements given existing information, and a perfect reconstruction would be expected. To present the adaptive sampling method, a series of research issues will be addressed, including measurement evaluation and collection, mobile network establishment, data fusion, sensor motion, signal reconstruction, etc. Two dimensional scalar field will be reconstructed using the method proposed. Both single mobile sensors and mobile sensor networks will be deployed in the environment, and reconstruction performance of both will be compared.In addition, a particular mobile sensor, a quadrotor UAV is developed, so that the adaptive sampling method can be used in three dimensional scenarios
    • …
    corecore