8,472 research outputs found

    Q-Prop: Sample-efficient policy gradient with an off-policy critic

    Model-free deep reinforcement learning (RL) methods have been successful in a wide variety of simulated domains. However, a major obstacle facing deep RL in the real world is its high sample complexity. Batch policy gradient methods offer stable learning, but at the cost of high variance, which often requires large batches. TD-style methods, such as off-policy actor-critic and Q-learning, are more sample-efficient but biased, and often require costly hyperparameter sweeps to stabilize. In this work, we aim to develop methods that combine the stability of policy gradients with the efficiency of off-policy RL. We present Q-Prop, a policy gradient method that uses a Taylor expansion of the off-policy critic as a control variate. Q-Prop is both sample efficient and stable, and effectively combines the benefits of on-policy and off-policy methods. We analyze the connection between Q-Prop and existing model-free algorithms, and use control variate theory to derive two variants of Q-Prop with conservative and aggressive adaptation. We show that conservative Q-Prop provides substantial gains in sample efficiency over trust region policy optimization (TRPO) with generalized advantage estimation (GAE), and improves stability over deep deterministic policy gradient (DDPG), the state-of-the-art on-policy and off-policy methods, on OpenAI Gym's MuJoCo continuous control environments.

    Who gets credit for AI-generated art?

    The recent sale of an artificial intelligence (AI)-generated portrait for $432,000 at Christie's art auction has raised questions about how credit and responsibility should be allocated to individuals involved and how the anthropomorphic perception of the AI system contributed to the artwork's success. Here, we identify natural heterogeneity in the extent to which different people perceive AI as anthropomorphic. We find that differences in the perception of AI anthropomorphicity are associated with different allocations of responsibility to the AI system and credit to different stakeholders involved in art production. We then show that perceptions of AI anthropomorphicity can be manipulated by changing the language used to talk about AI (as a tool versus an agent), with consequences for artists and AI practitioners. Our findings shed light on what is at stake when we anthropomorphize AI systems and offer an empirical lens to reason about how to allocate credit and responsibility to human stakeholders.

    Interpolated policy gradient: Merging on-policy and off-policy gradient estimation for deep reinforcement learning

    Off-policy model-free deep reinforcement learning methods using previously collected data can improve sample efficiency over on-policy policy gradient techniques. On the other hand, on-policy algorithms are often more stable and easier to use. This paper examines, both theoretically and empirically, approaches to merging on- and off-policy updates for deep reinforcement learning. Theoretical results show that off-policy updates with a value function estimator can be interpolated with on-policy policy gradient updates whilst still satisfying performance bounds. Our analysis uses control variate methods to produce a family of policy gradient algorithms, with several recently proposed algorithms being special cases of this family. We then provide an empirical comparison of these techniques with the remaining algorithmic details fixed, and show how different mixings of off-policy gradient estimates with on-policy samples contribute to improvements in empirical performance. The final algorithm provides a generalization and unification of existing deep policy gradient techniques, has theoretical guarantees on the bias introduced by off-policy updates, and improves on the state-of-the-art model-free deep RL methods on a number of OpenAI Gym continuous control benchmarks.

    Construction of surrogate CHEmical MEchanisms (SCHEMEs) for atmospheric photochemical systems

    During the past several years it has become apparent that homogeneous gas-phase reactions of pollutants in the troposphere, i.e., the formation of photochemical smog and the oxidation of SO₂, occur to a great extent by elementary reactions involving chain-carrying free radicals (HO, HO₂, RO, RO₂, RCOO₂) whose concentrations are governed by the concentrations of trace molecular constituents including NO, NO₂, CO, O₃, and organics, as well as sunlight. For the purpose of modeling chemical transformations in the ambient atmosphere, which requires incorporating a reaction mechanism within an atmospheric transport model, it is necessary to develop a mechanism that includes a minimum number of chemical species, since the computational time and cost involved in solving the set of partial differential equations describing the diffusion-advection-reaction problem increase dramatically with the number of species modeled. Although photochemical mechanisms employing fewer than 15 species have been developed previously for use within urban airshed models, those reduced, or surrogate, mechanisms do not include sulfur chemistry and do not appear applicable to the more widely varying conditions possible as gases become chemically depleted while being transported away from emission sources. Therefore, in order to meet the time and cost constraints of an atmospheric transport model, we have constructed a 12-species Surrogate CHEmical MEchanism (SCHEME) incorporating reactions for the homogeneous gas-phase oxidation of SO₂. A preliminary but much more detailed and comprehensive ATmospheric Model for Sulfur (ATMOS) has been used to generate SCHEME and test its applicability to a broad range of chemical conditions.

    Density-Polarization Functional Theory of the response of a periodic insulating solid to an electric field.

    The response of an infinite, periodic, insulating solid to an infinitesimally small electric field is investigated in the framework of Density Functional Theory. We find that the applied perturbing potential is not a unique functional of the periodic density change: it depends also on the change in the macroscopic polarization. Moreover, the dependence of the exchange-correlation energy on polarization induces an exchange-correlation electric field. These effects are exhibited for a model semiconductor. We also show that the scissor-operator technique is an approximate way of bypassing this polarization dependence. Comment: 11 pages, 1 Fig

    Zero-range process with open boundaries

    We calculate the exact stationary distribution of the one-dimensional zero-range process with open boundaries for arbitrary bulk and boundary hopping rates. When such a distribution exists, the steady state has no correlations between sites and is uniquely characterized by a space-dependent fugacity which is a function of the boundary rates and the hopping asymmetry. For strong boundary drive the system has no stationary distribution. In systems which on a ring geometry allow for a condensation transition, a condensate develops at one or both boundary sites. On all other sites the particle distribution approaches a product measure with the finite critical density ρ_c. In systems which do not support condensation on a ring, strong boundary drive leads to a condensate at the boundary. However, in this case the local particle density in the interior exhibits a complex algebraic growth in time. We calculate the bulk and boundary growth exponents as a function of the system parameters.

    Second harmonic generation in SiC polytypes

    LMTO calculations are presented for the frequency-dependent second harmonic generation (SHG) in the polytypes 2H, 4H, 6H, 15R and 3C of SiC. All independent tensor components are calculated. The spectral features and the ratios of the 333 to 311 tensorial components are studied as a function of the degree of hexagonality. The relationship to the linear optical response and the underlying band structure is investigated. SHG is suggested to be a sensitive tool for investigating the near-band-edge interband excitations. Comment: 12 pages, 10 figure

    The TWA 3 Young Triple System: Orbits, Disks, Evolution

    We have characterized the spectroscopic orbit of the TWA 3A binary and provide preliminary families of probable solutions for the TWA 3A visual orbit as well as for the wide TWA 3A-B orbit. TWA 3 is a hierarchical triple located at 34 pc in the ~10 Myr old TW Hya association. The wide component separation is 1."55; the close pair was first identified as a possible binary almost 20 years ago. We initially identified the 35-day period orbital solution using high-resolution infrared spectroscopy which angularly resolved the A and B components. We then refined the preliminary orbit by combining the infrared data with a re-analysis of our high-resolution optical spectroscopy. The orbital period from the combined spectroscopic solution is ~35 days, the eccentricity is ~0.63, and the mass ratio is ~0.84; although this high mass ratio would suggest that optical spectroscopy alone should be sufficient to identify the orbital solution, the presence of the tertiary B component likely introduced confusion in the blended optical spectra. Using millimeter imaging from the literature, we also estimate the inclinations of the stellar orbital planes with respect to the TWA 3A circumbinary disk inclination and find that all three planes are likely misaligned by at least ~30 degrees. The TWA 3A spectroscopic binary components have spectral types of M4.0 and M4.5; TWA 3B is an M3. We speculate that the system formed as a triple, is bound, and that its properties were shaped by dynamical interactions between the inclined orbits and disk. Comment: Accepted to Ap

    Power Law Distribution of Wealth in a Money-Based Model

    A money-based model for the power law distribution (PLD) of wealth in an economically interacting population is introduced. The basic feature of our model is that it concentrates on capital movements and avoids the complexity of the micro-behaviors of individuals. It is proposed as an extension of the Eguíluz and Zimmermann (EZ) model for crowding and information transmission in financial markets. We emphasize, however, that in the EZ model the PLD without exponential correction is obtained only for a particular parameter value, while our model gives it within a wide range. The Zipf exponent depends on the parameters in a nontrivial way and is exactly calculated in this paper. Comment: 5 pages and 4 figure

    The Effective Particle-Hole Interaction and the Optical Response of Simple Metal Clusters

    Following Sham and Rice [L. J. Sham, T. M. Rice, Phys. Rev. 144 (1966) 708] the correlated motion of particle-hole pairs is studied, starting from the general two-particle Green's function. In this way we derive a matrix equation for the eigenvalues and wave functions of the general type of collective excitation of an N-particle system. The interplay between excitons and plasmons is fully described by this new set of equations. As a by-product we obtain, at least a posteriori, a justification for the use of the TDLDA for simple-metal clusters. Comment: RevTeX, 15 pages, 5 figures in uufiles format, 1 figure available from [email protected]