
    Clipped-Objective Policy Gradients for Pessimistic Policy Optimization

    To facilitate efficient learning, policy gradient approaches to deep reinforcement learning (RL) are typically paired with variance reduction measures and strategies for making large but safe policy changes based on a batch of experiences. Natural policy gradient methods, including Trust Region Policy Optimization (TRPO), seek to produce monotonic improvement through bounded changes in policy outputs. Proximal Policy Optimization (PPO) is a commonly used, first-order algorithm that instead uses loss clipping to take multiple safe optimization steps per batch of data, replacing the bound on the single step of TRPO with regularization on multiple steps. In this work, we find that the performance of PPO, when applied to continuous action spaces, may be consistently improved through a simple change in objective. In place of the importance sampling objective of PPO, we recommend a basic policy gradient, clipped in an equivalent fashion. While both objectives produce biased gradient estimates with respect to the RL objective, they also both display significantly reduced variance compared to the unbiased off-policy policy gradient. Additionally, we show that (1) the clipped-objective policy gradient (COPG) objective is on average "pessimistic" compared to the PPO objective and (2) this pessimism promotes enhanced exploration. As a result, we empirically observe that COPG produces improved learning compared to PPO in single-task, constrained, and multi-task learning, without adding significant computational cost or complexity. Compared to TRPO, the COPG approach is seen to offer comparable or superior performance, while retaining the simplicity of a first-order method. Comment: 12 pages, 8 figures
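    The contrast the abstract draws can be made concrete in a few lines of PyTorch. The sketch below shows the standard PPO clipped surrogate alongside one plausible reading of the clipped-objective policy gradient, in which the new policy's log-probability (rather than the probability ratio) is clipped around the old policy's log-probability; the paper's exact COPG formulation may differ in detail, so treat this as an illustration of the idea rather than the authors' definitive objective.

```python
import math
import torch

def ppo_objective(logp_new, logp_old, adv, eps=0.2):
    """Standard PPO clipped surrogate (to be maximized)."""
    ratio = torch.exp(logp_new - logp_old)
    clipped_ratio = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    return torch.min(ratio * adv, clipped_ratio * adv).mean()

def copg_objective(logp_new, logp_old, adv, eps=0.2):
    """Illustrative clipped-objective policy gradient: a plain log-probability
    policy gradient, clipped around the old policy analogously to PPO's ratio
    clipping. Taking the elementwise min keeps the more pessimistic term."""
    lo = logp_old + math.log(1.0 - eps)
    hi = logp_old + math.log(1.0 + eps)
    clipped_logp = torch.minimum(torch.maximum(logp_new, lo), hi)
    return torch.min(logp_new * adv, clipped_logp * adv).mean()
```

    In both cases the surrogate is maximized with respect to the new policy's parameters, and gradients stop flowing through a term once its clip activates, which is the mechanism behind the pessimism discussed in the abstract.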

    A data-driven neuromuscular model of walking and its application to prosthesis control

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Physics, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (pages 119-123). In this thesis we present a data-driven neuromuscular model of human walking and its application to prosthesis control. The model is novel in that it leverages tendon elasticity to more accurately predict the metabolic consumption of walking than conventional models. Paired with a reflex-based neural drive, the model has been applied in the control of a robotic ankle-foot prosthesis, producing speed-adaptive behavior. Current neuromuscular models significantly overestimate the metabolic demands of walking. We believe this is because they do not adequately consider the role of elasticity; specifically, the parameters that govern the force-length relations of tendons in these models are typically taken from published values determined from cadaver studies. To investigate this issue we first collected kinematic, kinetic, electromyographic (EMG), and metabolic data from five subjects walking at six different speeds. The kinematic and kinetic data were used to estimate muscle lengths, muscle moment arms, and joint moments, while the EMG data were used to estimate muscle activations. For each subject we performed a kinematically clamped optimization, varying the parameters that govern the force-length curve of each tendon while simultaneously seeking to minimize metabolic cost and maximize agreement with the observed joint moments. We found a family of parameter sets that excel at both objectives, providing agreement with both the collected kinetic and metabolic data. This identification allows us to accurately predict the metabolic cost of walking as well as the force and state of individual muscles, lending insight into the roles and control objectives of different muscles throughout the gait cycle. This optimized muscle-tendon morphology was then applied with an optimized linear reflex architecture in the control of a powered ankle-foot prosthesis. Specifically, the model was fed the robot's angle and state and used to command output torque. Clinical trials were conducted that demonstrated speed-adaptive behavior; commanded net work was seen to increase with walking speed. This result supports both the efficacy of the modeling approach and its potential utility in controlling life-like prosthetic limbs. by Jared Markowitz. Ph.D.
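    As a rough illustration of the kinematically clamped optimization described above, the sketch below scalarizes the two competing objectives (predicted metabolic cost and agreement with observed joint moments) into a single weighted loss over tendon force-length parameters. The prediction functions here are toy placeholders, not the thesis's muscle-tendon model; only the structure of the trade-off is intended to match.

```python
import numpy as np
from scipy.optimize import minimize

# Toy placeholders for the thesis's muscle-tendon pipeline; in the actual work
# these predictions come from measured kinematics, kinetics, and EMG.
def predict_metabolic_cost(tendon_params):
    return float(np.sum(tendon_params ** 2))            # placeholder

def predict_joint_moments(tendon_params, phase):
    return tendon_params[0] * np.sin(phase)             # placeholder

def clamped_objective(tendon_params, phase, observed_moments, w=0.5):
    """Kinematically clamped, scalarized objective: trade predicted metabolic
    cost against disagreement with the observed joint moments."""
    cost = predict_metabolic_cost(tendon_params)
    err = np.mean((predict_joint_moments(tendon_params, phase) - observed_moments) ** 2)
    return w * cost + (1.0 - w) * err

phase = np.linspace(0, 2 * np.pi, 100)
observed = 0.8 * np.sin(phase)                           # synthetic "measured" moments
res = minimize(clamped_objective, x0=np.array([0.1, 0.1]),
               args=(phase, observed, 0.5), method="Nelder-Mead")
# Sweeping the weight w over (0, 1) traces out a family of parameter sets,
# echoing the family that balances both objectives in the thesis.
```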

    Handling Cost and Constraints with Off-Policy Deep Reinforcement Learning

    By reusing data throughout training, off-policy deep reinforcement learning algorithms offer improved sample efficiency relative to on-policy approaches. For continuous action spaces, the most popular methods for off-policy learning include policy improvement steps where a learned state-action (Q) value function is maximized over selected batches of data. These updates are often paired with regularization to combat associated overestimation of Q values. With an eye toward safety, we revisit this strategy in environments with "mixed-sign" reward functions; that is, with reward functions that include independent positive (incentive) and negative (cost) terms. This setting is common in real-world applications, and may be addressed with or without constraints on the cost terms. We find the combination of function approximation and a term that maximizes Q in the policy update to be problematic in such environments, because systematic errors in value estimation impact the contributions from the competing terms asymmetrically. This results in overemphasis of either incentives or costs and may severely limit learning. We explore two remedies to this issue. First, consistent with prior work, we find that periodic resetting of Q and policy networks can be used to reduce value estimation error and improve learning in this setting. Second, we formulate novel off-policy actor-critic methods for both unconstrained and constrained learning that do not explicitly maximize Q in the policy update. We find that this second approach, when applied to continuous action spaces with mixed-sign rewards, consistently and significantly outperforms state-of-the-art methods augmented by resetting. We further find that our approach produces agents that are both competitive with popular methods overall and more reliably competent on frequently-studied control problems that do not have mixed-sign rewards. Comment: 22 pages, 16 figures
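    Of the two remedies described, periodic resetting is the simpler to sketch. Below is a minimal, hedged illustration in PyTorch: at a fixed cadence the Q and policy networks are re-initialized (discarding accumulated value-estimation error) while the replay buffer is left intact. The reset interval and the choice to reset every layer are assumptions for illustration, not the paper's exact schedule.

```python
import torch.nn as nn

def maybe_reset(q_net: nn.Module, policy_net: nn.Module, step: int,
                reset_every: int = 200_000) -> None:
    """Periodically re-initialize the Q and policy networks in an off-policy
    training loop. The replay buffer is untouched, so learning restarts from
    fresh networks while reusing all previously collected experience."""
    if step > 0 and step % reset_every == 0:
        for net in (q_net, policy_net):
            for module in net.modules():
                if hasattr(module, "reset_parameters"):
                    module.reset_parameters()
```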

    Learning a Group-Aware Policy for Robot Navigation

    Human-aware robot navigation promises a range of applications in which mobile robots bring versatile assistance to people in common human environments. While prior research has mostly focused on modeling pedestrians as independent, intentional individuals, people move in groups; consequently, it is imperative for mobile robots to respect human groups when navigating around people. This paper explores learning group-aware navigation policies based on dynamic group formation using deep reinforcement learning. Through simulation experiments, we show that group-aware policies, compared to baseline policies that neglect human groups, achieve greater robot navigation performance (e.g., fewer collisions), minimize violation of social norms and discomfort, and reduce the robot's movement impact on pedestrians. Our results contribute to the development of social navigation and the integration of mobile robots into human environments. Comment: 8 pages, 4 figures
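    Group-aware behavior of this kind is typically shaped through the reward. The snippet below is a hypothetical reward function, not the paper's: it rewards progress toward the goal and penalizes collisions, intrusions into individual personal space, and cutting through the space a group occupies, mirroring the evaluation criteria listed in the abstract. All thresholds and weights are illustrative assumptions.

```python
def group_aware_reward(progress, collided, min_dist_to_person, min_dist_to_group,
                       personal_space=0.45, group_space=0.8):
    """Hypothetical reward shaping for group-aware navigation (illustrative)."""
    r = 1.0 * progress                    # reward progress toward the goal
    if collided:
        r -= 10.0                         # heavy penalty for collisions
    if min_dist_to_person < personal_space:
        r -= 2.0 * (personal_space - min_dist_to_person)   # personal-space intrusion
    if min_dist_to_group < group_space:
        r -= 1.0 * (group_space - min_dist_to_group)       # cutting through a group
    return r
```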

    Observation of a kilogram-scale oscillator near its quantum ground state

    We introduce a novel cooling technique capable of approaching the quantum ground state of a kilogram-scale system—an interferometric gravitational wave detector. The detectors of the Laser Interferometer Gravitational-wave Observatory (LIGO) operate within a factor of 10 of the standard quantum limit (SQL), providing a displacement sensitivity of 10⁻¹⁸ m in a 100 Hz band centered on 150 Hz. With a new feedback strategy, we dynamically shift the resonant frequency of a 2.7 kg pendulum mode to lie within this optimal band, where its effective temperature falls as low as 1.4 μK, and its occupation number reaches about 200 quanta. This work shows how the exquisite sensitivity necessary to detect gravitational waves can be made available to probe the validity of quantum mechanics on an enormous mass scale. Alfred P. Sloan Foundation; United States. National Aeronautics and Space Administration; David & Lucile Packard Foundation; Research Corporation; National Science Foundation (U.S.)
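    As a quick consistency check on the quoted numbers (treating the shifted pendulum mode as sitting near the 150 Hz band center), the mean thermal occupation implied by the effective temperature is

    \bar{n} \approx \frac{k_B T_{\mathrm{eff}}}{\hbar\omega} = \frac{(1.38\times10^{-23}\ \mathrm{J/K})(1.4\times10^{-6}\ \mathrm{K})}{(1.05\times10^{-34}\ \mathrm{J\,s})\cdot 2\pi\,(150\ \mathrm{Hz})} \approx 1.9\times10^{2},

    in line with the stated occupation of about 200 quanta.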

    A Platform-Independent Method for Detecting Errors in Metagenomic Sequencing Data: DRISEE

    We provide a novel method, DRISEE (duplicate read inferred sequencing error estimation), to assess sequencing quality (alternatively referred to as “noise” or “error”) within and/or between sequencing samples. DRISEE provides positional error estimates that can be used to inform read trimming within a sample. It also provides global (whole sample) error estimates that can be used to identify samples with high or varying levels of sequencing error that may confound downstream analyses, particularly in the case of studies that utilize data from multiple sequencing samples. For shotgun metagenomic data, we believe that DRISEE provides estimates of sequencing error that are more accurate and less constrained by technical limitations than existing methods that rely on reference genomes or the use of quality scores (e.g., Phred). Here, DRISEE is applied to (non-amplicon) data sets from both the 454 and Illumina platforms. The DRISEE error estimate is obtained by analyzing sets of artifactual duplicate reads (ADRs), a known by-product of both sequencing platforms. We present DRISEE as an open-source, platform-independent method to assess sequencing error in shotgun metagenomic data, and utilize it to discover previously uncharacterized error in de novo sequence data from the 454 and Illumina sequencing platforms.
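    A minimal sketch of the duplicate-read idea, in Python: bin reads that share an identical prefix (treated as artifactual duplicates of a single template), take the per-position consensus within each bin, and tally disagreements with that consensus by position. The prefix length and the simple majority-consensus rule are assumptions for illustration; DRISEE's actual procedure differs in detail.

```python
from collections import Counter, defaultdict

def duplicate_read_error_profile(reads, prefix_len=50):
    """Illustrative per-position error estimate from artifactual duplicate reads."""
    bins = defaultdict(list)
    for r in reads:
        if len(r) >= prefix_len:
            bins[r[:prefix_len]].append(r)   # reads sharing a prefix form one bin

    mismatches, coverage = Counter(), Counter()
    for dupes in bins.values():
        if len(dupes) < 2:
            continue                          # need duplicates to compare against
        length = min(len(r) for r in dupes)
        for pos in range(length):
            column = [r[pos] for r in dupes]
            consensus = Counter(column).most_common(1)[0][0]
            coverage[pos] += len(column)
            mismatches[pos] += sum(base != consensus for base in column)

    return {pos: mismatches[pos] / coverage[pos] for pos in coverage}
```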

    Estimating the Impact of Plasma HIV-1 RNA Reductions on Heterosexual HIV-1 Transmission Risk

    Background: The risk of sexual transmission of HIV-1 is strongly associated with the level of HIV-1 RNA in plasma, making reduction in HIV-1 plasma levels an important target for HIV-1 prevention interventions. A quantitative understanding of the relationship of plasma HIV-1 RNA and HIV-1 transmission risk could help predict the impact of candidate HIV-1 prevention interventions that operate by reducing plasma HIV-1 levels, such as antiretroviral therapy (ART), therapeutic vaccines, and other non-ART interventions. Methodology/Principal Findings: We use prospective data collected from 2004 to 2008 in East and Southern African HIV-1 serodiscordant couples to model the relationship of plasma HIV-1 RNA levels and heterosexual transmission risk, with confirmation of HIV-1 transmission events by HIV-1 sequencing. The model is based on follow-up of 3381 HIV-1 serodiscordant couples over 5017 person-years, encompassing 108 genetically-linked HIV-1 transmission events. HIV-1 transmission risk was 2.27 per 100 person-years, with a log-linear relationship to log10 plasma HIV-1 RNA. The model predicts that a decrease in average plasma HIV-1 RNA of 0.74 log10 copies/mL (95% CI 0.60 to 0.97) reduces heterosexual transmission risk by 50%, regardless of the average starting plasma HIV-1 level in the population and independent of other HIV-1-related population characteristics. In a simulated population with a similar plasma HIV-1 RNA distribution, the model estimates that 90% of overall HIV-1 infections averted by a 0.74 log10 copies/mL reduction in plasma HIV-1 RNA could be achieved by targeting this reduction to the 58% of the cohort with plasma HIV-1 levels ≥4 log10 copies/mL. Conclusions/Significance: This log-linear model of plasma HIV-1 levels and risk of sexual HIV-1 transmission may help estimate the impact on HIV-1 transmission and infections averted from candidate interventions that reduce plasma HIV-1 RNA levels.
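    Under the stated log-linear model, transmission risk scales with plasma viral load V as risk(V) ∝ R^{log10 V}, where R is the relative risk per log10 copies/mL. A reduction of Δ log10 copies/mL therefore multiplies risk by R^{-Δ}; setting R^{-0.74} = 0.5 gives R = 2^{1/0.74} ≈ 2.5 per log10 copies/mL. This is only a back-of-the-envelope reading of the quoted 50% figure, and the paper's fitted relative risk may differ.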

    Differential cross section measurements for the production of a W boson in association with jets in proton–proton collisions at √s = 7 TeV

    Measurements are reported of differential cross sections for the production of a W boson, which decays into a muon and a neutrino, in association with jets, as a function of several variables, including the transverse momenta (pT) and pseudorapidities of the four leading jets, the scalar sum of jet transverse momenta (HT), and the difference in azimuthal angle between the directions of each jet and the muon. The data sample of pp collisions at a centre-of-mass energy of 7 TeV was collected with the CMS detector at the LHC and corresponds to an integrated luminosity of 5.0 fb⁻¹. The measured cross sections are compared to predictions from Monte Carlo generators, MadGraph + pythia and sherpa, and to next-to-leading-order calculations from BlackHat + sherpa. The differential cross sections are found to be in agreement with the predictions, apart from the pT distributions of the leading jets at high pT values, the HT distributions at high HT and low jet multiplicity, and the distribution of the difference in azimuthal angle between the leading jet and the muon at low values. United States. Dept. of Energy; National Science Foundation (U.S.); Alfred P. Sloan Foundation