Q-Prop: Sample-efficient policy gradient with an off-policy critic

Author: Ghahramani Z
Gu S
Levine S
Lillicrap T
Turner RE
Publication venue: 5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings
Publication date: 01/01/2017

Model-free deep reinforcement learning (RL) methods have been successful in a wide variety of simulated domains. However, a major obstacle facing deep RL in the real world is the high sample complexity of these methods. Batch policy gradient methods offer stable learning, but at the cost of high variance, which often requires large batches. TD-style methods, such as off-policy actor-critic and Q-learning, are more sample-efficient but biased, and often require costly hyperparameter sweeps to stabilize. In this work, we aim to develop methods that combine the stability of policy gradients with the efficiency of off-policy RL. We present Q-Prop, a policy gradient method that uses a Taylor expansion of the off-policy critic as a control variate. Q-Prop is both sample efficient and stable, and effectively combines the benefits of on-policy and off-policy methods. We analyze the connection between Q-Prop and existing model-free algorithms, and use control variate theory to derive two variants of Q-Prop with conservative and aggressive adaptation. We show that conservative Q-Prop provides substantial gains in sample efficiency over trust region policy optimization (TRPO) with generalized advantage estimation (GAE), and improves stability over deep deterministic policy gradient (DDPG), the state-of-the-art on-policy and off-policy methods, respectively, on OpenAI Gym's MuJoCo continuous control environments.
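
The control variate idea mentioned in the abstract above can be illustrated with a toy Monte Carlo problem. The sketch below is not the Q-Prop estimator: it only assumes a scalar integrand exp(x) with x drawn from a standard normal, and uses the first-order Taylor expansion 1 + x (whose expectation is known exactly) as the control variate. The function name and the choice of integrand are illustrative assumptions, not taken from the paper.

```python
import math
import random
import statistics


def estimate_with_control_variate(n_samples=20000, seed=0):
    """Estimate E[exp(X)] for X ~ N(0, 1), with and without a control variate.

    Toy illustration only; the integrand and sample size are assumptions.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

    # Plain Monte Carlo samples of the integrand.
    f = [math.exp(x) for x in xs]

    # Control variate: first-order Taylor expansion of exp around 0.
    # Its expectation under N(0, 1) is exactly 1.
    g = [1.0 + x for x in xs]
    g_mean_exact = 1.0

    # Subtract the centred control variate; the mean is unchanged in
    # expectation, but correlated noise is cancelled.
    corrected = [fi - (gi - g_mean_exact) for fi, gi in zip(f, g)]

    return {
        "plain_mean": statistics.fmean(f),
        "cv_mean": statistics.fmean(corrected),
        "plain_var": statistics.variance(f),
        "cv_var": statistics.variance(corrected),
    }


if __name__ == "__main__":
    for name, value in estimate_with_control_variate().items():
        print(f"{name}: {value:.4f}")
```

Both estimators target the same mean, exp(0.5), but the corrected samples have lower variance because the linear term is strongly correlated with the integrand.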

arXiv.org e-Print Archive

Apollo (Cambridge)

MPG.PuRe

Who gets credit for AI-generated art?

Author: Epstein Z.
Levine S.
Rahwan I.
Rand D.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020

The recent sale of an artificial intelligence (AI)-generated portrait for $432,000 at Christie's art auction has raised questions about how credit and responsibility should be allocated to individuals involved and how the anthropomorphic perception of the AI system contributed to the artwork's success. Here, we identify natural heterogeneity in the extent to which different people perceive AI as anthropomorphic. We find that differences in the perception of AI anthropomorphicity are associated with different allocations of responsibility to the AI system and credit to different stakeholders involved in art production. We then show that perceptions of AI anthropomorphicity can be manipulated by changing the language used to talk about AI (as a tool versus an agent), with consequences for artists and AI practitioners. Our findings shed light on what is at stake when we anthropomorphize AI systems and offer an empirical lens to reason about how to allocate credit and responsibility to human stakeholders.

DSpace@MIT

MPG.PuRe

Interpolated policy gradient: Merging on-policy and off-policy gradient estimation for deep reinforcement learning

Author: Ghahramani Z
Gu S
Levine S
Lillicrap T
Schölkopf B
Turner RE
Publication venue: Advances in Neural Information Processing Systems
Publication date: 01/01/2017

Off-policy model-free deep reinforcement learning methods using previously collected data can improve sample efficiency over on-policy policy gradient techniques. On the other hand, on-policy algorithms are often more stable and easier to use. This paper examines, both theoretically and empirically, approaches to merging on- and off-policy updates for deep reinforcement learning. Theoretical results show that off-policy updates with a value function estimator can be interpolated with on-policy policy gradient updates whilst still satisfying performance bounds. Our analysis uses control variate methods to produce a family of policy gradient algorithms, with several recently proposed algorithms being special cases of this family. We then provide an empirical comparison of these techniques with the remaining algorithmic details fixed, and show how different mixes of off-policy gradient estimates with on-policy samples contribute to improvements in empirical performance. The final algorithm provides a generalization and unification of existing deep policy gradient techniques, has theoretical guarantees on the bias introduced by off-policy updates, and improves on the state-of-the-art model-free deep RL methods on a number of OpenAI Gym continuous control benchmarks.
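
The interpolation described in the abstract above can be pictured as a convex combination of two gradient estimates. The sketch below is only a schematic reading of that idea: the function name, the mixing weight `nu`, and the representation of gradients as flat lists of floats are assumptions for illustration, and the snippet does not reproduce the paper's estimators or its control variate correction.

```python
def interpolate_gradients(on_policy_grad, off_policy_grad, nu):
    """Mix two gradient estimates with weight nu in [0, 1].

    nu = 0 gives the pure on-policy estimate and nu = 1 the pure
    off-policy (critic-based) estimate. All names are hypothetical.
    """
    if not 0.0 <= nu <= 1.0:
        raise ValueError("nu must lie in [0, 1]")
    if len(on_policy_grad) != len(off_policy_grad):
        raise ValueError("gradient estimates must have the same length")
    # Element-wise convex combination of the two estimates.
    return [
        (1.0 - nu) * g_on + nu * g_off
        for g_on, g_off in zip(on_policy_grad, off_policy_grad)
    ]


if __name__ == "__main__":
    print(interpolate_gradients([1.0, 2.0], [3.0, 4.0], 0.5))
```

Under this reading, the weight trades the low bias of on-policy samples against the lower variance of a critic-based estimate.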

arXiv.org e-Print Archive

Apollo (Cambridge)

MPG.PuRe

Construction of surrogate CHEmical MEchanisms (SCHEMEs) for atmospheric photochemical systems

Author: Levine S. Z.
Schwartz S. E.
Publication venue: Brookhaven National Laboratory
Publication date: 01/01/1978

During the past several years it has become apparent that homogeneous gas-phase reactions of pollutants in the troposphere, i.e., the formation of photochemical smog and the oxidation of SO₂, occur to a great extent by elementary reactions involving chain-carrying free radicals (HO, HO₂, RO, RO₂, RCOO₂) whose concentrations are governed by the concentrations of trace molecular constituents including NO, NO₂, CO, O₃, and organics, as well as sunlight. For the purpose of modeling chemical transformations in the ambient atmosphere, which requires incorporating a reaction mechanism within an atmospheric transport model, it is necessary to develop a mechanism that includes a minimum number of chemical species, since the computational time and cost involved in solving the set of partial differential equations describing the diffusion-advection-reaction problem increase dramatically with the number of species modeled. Although photochemical mechanisms employing fewer than 15 species have been developed previously for use within urban airshed models, those reduced, or surrogate, mechanisms do not include sulfur chemistry and do not appear applicable to the more widely varying conditions possible as gases become chemically depleted while being transported away from emission sources. Therefore, in order to meet the time and cost constraints of an atmospheric transport model, we have constructed a 12-species Surrogate CHEmical MEchanism (SCHEME) incorporating reactions for the homogeneous gas-phase oxidation of SO₂. A preliminary but much more detailed and comprehensive ATmospheric Model for Sulfur (ATMOS) has been used to generate SCHEME and test its applicability to a broad range of chemical conditions.

Crossref

UNT Digital Library

Density-Polarization Functional Theory of the response of a periodic insulating solid to an electric field.

Author: A. Dal Corso
E. A. Hylleraas
L. D. Landau
L. J. Sham
P. Hohenberg
Ph. Ghosez
R. D. King-Smith
R. Martin
R. O. Jones
R. Resta
R. W. Godby
R. W. Nunes
S. Baroni
W. E. Pickett
W. Kohn
X. Gonze
Z. H. Levine
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/1995

The response of an infinite, periodic, insulating solid to an infinitesimally small electric field is investigated in the framework of Density Functional Theory. We find that the applied perturbing potential is not a unique functional of the periodic density change: it depends also on the change in the macroscopic polarization. Moreover, the dependence of the exchange-correlation energy on polarization induces an exchange-correlation electric field. These effects are exhibited for a model semiconductor. We also show that the scissor-operator technique is an approximate way of bypassing this polarization dependence. Comment: 11 pages, 1 Fig

arXiv.org e-Print Archive

CiteSeerX

Crossref

Open Repository and Bibliography - Liège

DIAL UCLouvain

White Rose Research Online

Zero-range process with open boundaries

Author: A. De Masi
A.B. Kolomeisky
A.M. Povolotsky
B. Derrida
C. Godrèche
C. Kipnis
D. Mukamel
E. Levine
E.D. Andjel
G. Bianconi
G. M. Schütz
G. Schönherr
G.M. Shim
H. Fröhlich
H. Spohn
I. Jeon
J. Krug
L. Bertini
M. Alimohammadi
M.R. Evans
O.J. O’Loan
S. Grosskinsky
S. Katz
S.N. Dorogovtsev
S.N. Majumdar
T. Antal
V.B. Priezzhev
Y. Kafri
Z. Burda
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004

We calculate the exact stationary distribution of the one-dimensional zero-range process with open boundaries for arbitrary bulk and boundary hopping rates. When such a distribution exists, the steady state has no correlations between sites and is uniquely characterized by a space-dependent fugacity which is a function of the boundary rates and the hopping asymmetry. For strong boundary drive the system has no stationary distribution. In systems which on a ring geometry allow for a condensation transition, a condensate develops at one or both boundary sites. On all other sites the particle distribution approaches a product measure with the finite critical density ρ_c. In systems which do not support condensation on a ring, strong boundary drive leads to a condensate at the boundary. However, in this case the local particle density in the interior exhibits a complex algebraic growth in time. We calculate the bulk and boundary growth exponents as a function of the system parameters.
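
As a rough illustration of what an uncorrelated steady state characterized by a site fugacity looks like, the sketch below uses the standard zero-range single-site weight: the probability of n particles at a site is proportional to z^n divided by the product of the hopping rates u(1)...u(n). The function names, the truncation at `n_max`, and the constant-rate example are assumptions for illustration; the paper's specific boundary-dependent fugacity profile is not reproduced here.

```python
def site_distribution(fugacity, hop_rate, n_max=200):
    """Single-site occupation probabilities p(0..n_max) for fugacity z.

    Uses p(n) proportional to z**n / (u(1) * ... * u(n)), truncated at
    n_max (an assumption that requires the weights to decay).
    """
    weights = [1.0]
    for n in range(1, n_max + 1):
        # Each extra particle multiplies the weight by z / u(n).
        weights.append(weights[-1] * fugacity / hop_rate(n))
    total = sum(weights)
    return [w / total for w in weights]


def mean_occupation(dist):
    """Mean particle number at the site."""
    return sum(n * p for n, p in enumerate(dist))


if __name__ == "__main__":
    # Constant hopping rate u(n) = 1 gives a geometric distribution.
    dist = site_distribution(0.5, lambda n: 1.0)
    print(dist[0], mean_occupation(dist))
```

With a constant rate the distribution is geometric, so the mean occupation is z / (1 - z); a space-dependent fugacity would simply assign a different z to each site.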

arXiv.org e-Print Archive

CiteSeerX

Crossref

Juelich Shared Electronic Resources

Second harmonic generation in SiC polytypes

LMTO calculations are presented for the frequency-dependent second harmonic generation (SHG) in the polytypes 2H, 4H, 6H, 15R and 3C of SiC. All independent tensor components are calculated. The spectral features and the ratios of the 333 to 311 tensorial components are studied as a function of the degree of hexagonality. The relationship to the linear optical response and the underlying band structure is investigated. SHG is suggested to be a sensitive tool for investigating the near-band-edge interband excitations. Comment: 12 pages, 10 figures

arXiv.org e-Print Archive

Crossref

The TWA 3 Young Triple System: Orbits, Disks, Evolution

Author: Avilez I.
Bailey Vanessa
Bonanos Alceste Z.
Bosh A. S.
Close Laird
Guenther E. W.
Hinz Phil
Kellogg Kendra
Levine S. E.
Males Jared R.
Morzinski Katie M.
Neuhäuser R.
Prato L.
Ruíz-Rodríguez D.
Schaefer G. H.
Torres Guillermo
Wasserman L. H.
Publication venue: 'American Astronomical Society'
Publication date: 03/07/2017

We have characterized the spectroscopic orbit of the TWA 3A binary and provide preliminary families of probable solutions for the TWA 3A visual orbit as well as for the wide TWA 3A-B orbit. TWA 3 is a hierarchical triple located at 34 pc in the ~10 Myr old TW Hya association. The wide component separation is 1.55 arcsec; the close pair was first identified as a possible binary almost 20 years ago. We initially identified the 35-day period orbital solution using high-resolution infrared spectroscopy which angularly resolved the A and B components. We then refined the preliminary orbit by combining the infrared data with a re-analysis of our high-resolution optical spectroscopy. The orbital period from the combined spectroscopic solution is ~35 days, the eccentricity is ~0.63, and the mass ratio is ~0.84; although this high mass ratio would suggest that optical spectroscopy alone should be sufficient to identify the orbital solution, the presence of the tertiary B component likely introduced confusion in the blended optical spectra. Using millimeter imaging from the literature, we also estimate the inclinations of the stellar orbital planes with respect to the TWA 3A circumbinary disk inclination and find that all three planes are likely misaligned by at least ~30 degrees. The TWA 3A spectroscopic binary components have spectral types of M4.0 and M4.5; TWA 3B is an M3. We speculate that the system formed as a triple, is bound, and that its properties were shaped by dynamical interactions between the inclined orbits and disk. Comment: Accepted to Ap

arXiv.org e-Print Archive

DSpace@MIT

Crossref

The University of Arizona

Power Law Distribution of Wealth in a Money-Based Model

Author: A. B. Atkinson
B. B. Mandelbrot
D. G. Champernowne
E. Levine
E. W. Montroll
G. K. Zipf
H. A. Simon
P. W. Anderson
S. Solomon
U. G. Yule
V. Pareto
Y. B. Xie
Z. A. Melzak
Publication venue: 'American Physical Society (APS)'
Publication date: 13/05/2004

A money-based model for the power law distribution (PLD) of wealth in an economically interacting population is introduced. The basic feature of our model is that it concentrates on capital movements and avoids the complexity of the micro-level behavior of individuals. It is proposed as an extension of the Eguíluz and Zimmermann (EZ) model for crowding and information transmission in financial markets. We emphasize, however, that in the EZ model the PLD without exponential correction is obtained only for a particular parameter value, while our model yields it over a wide range. The Zipf exponent depends on the parameters in a nontrivial way and is calculated exactly in this paper. Comment: 5 pages and 4 figures

arXiv.org e-Print Archive

Crossref

The Effective Particle-Hole Interaction and the Optical Response of Simple Metal Clusters

Author: A. A. Quong
A. Zangwill
C. Horie
C. Yannouleas
G. D. Mahan
I. Egri
J. M. Pacheco
L. Hedin
L. J. Sham
M. Madjet
M. S. Hybertsen
P. Nozières
S. Saito
W. Ekardt
Z. Levine
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/1995

Following Sham and Rice [L. J. Sham, T. M. Rice, Phys. Rev. 144 (1966) 708], the correlated motion of particle-hole pairs is studied, starting from the general two-particle Green's function. In this way we derive a matrix equation for the eigenvalues and wave functions of the general type of collective excitation of an N-particle system. The interplay between excitons and plasmons is fully described by this new set of equations. As a by-product we obtain, at least a posteriori, a justification for the use of the TDLDA for simple-metal clusters. Comment: RevTeX, 15 pages, 5 figures in uufiles format, 1 figure available from [email protected]

arXiv.org e-Print Archive

Crossref

MPG.PuRe