Q-Prop: Sample-efficient policy gradient with an off-policy critic
Model-free deep reinforcement learning (RL) methods have been successful in a
wide variety of simulated domains. However, a major obstacle facing deep RL in
the real world is its high sample complexity. Batch policy gradient methods
offer stable learning, but at the cost of high variance, which often requires
large batches. TD-style methods, such as off-policy actor-critic and
Q-learning, are more sample-efficient but biased, and often require costly
hyperparameter sweeps to stabilize. In this work, we aim to develop methods
that combine the stability of policy gradients with the efficiency of
off-policy RL. We present Q-Prop, a policy gradient method that uses a Taylor
expansion of the off-policy critic as a control variate. Q-Prop is both sample
efficient and stable, and effectively combines the benefits of on-policy and
off-policy methods. We analyze the connection between Q-Prop and existing
model-free algorithms, and use control variate theory to derive two variants of
Q-Prop with conservative and aggressive adaptation. We show that conservative
Q-Prop provides substantial gains in sample efficiency over trust region policy
optimization (TRPO) with generalized advantage estimation (GAE), and improves
stability over deep deterministic policy gradient (DDPG), the state-of-the-art
on-policy and off-policy methods, respectively, on OpenAI Gym's MuJoCo
continuous control environments.
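The control-variate idea behind the abstract above can be illustrated with a toy Monte Carlo problem. The sketch below is not the Q-Prop algorithm: it only shows how subtracting a first-order Taylor surrogate with a known expectation lowers the variance of an estimate without changing its mean. The function name `taylor_control_variate_demo`, the choice of `exp` as the integrand, and the Gaussian sampling distribution are all assumptions made for illustration.

```python
import math
import random
import statistics


def taylor_control_variate_demo(n_samples=20000, sigma=0.3, seed=0):
    """Estimate E[exp(X)], X ~ N(0, sigma^2), with and without a control variate.

    The control variate is the first-order Taylor expansion of exp around 0,
    g(x) = 1 + x, whose expectation under N(0, sigma^2) is exactly 1.
    """
    rng = random.Random(seed)
    f0, df0 = 1.0, 1.0  # exp(0) and its derivative at 0
    plain, corrected = [], []
    for _ in range(n_samples):
        x = rng.gauss(0.0, sigma)
        fx = math.exp(x)
        g = f0 + df0 * x  # Taylor surrogate; E[g] = f0
        plain.append(fx)
        # Subtract the surrogate and add back its known mean:
        # the expectation is unchanged, the variance shrinks.
        corrected.append(fx - g + f0)
    return (
        statistics.mean(plain),
        statistics.variance(plain),
        statistics.mean(corrected),
        statistics.variance(corrected),
    )


if __name__ == "__main__":
    m_plain, v_plain, m_cv, v_cv = taylor_control_variate_demo()
    print(f"plain:     mean={m_plain:.4f}  var={v_plain:.5f}")
    print(f"corrected: mean={m_cv:.4f}  var={v_cv:.5f}")
```

Both estimators target the same value, exp(sigma^2 / 2); the corrected one should show a much smaller sample variance because the linear surrogate is strongly correlated with the integrand near the expansion point.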
Who gets credit for AI-generated art?
The recent sale of an artificial intelligence (AI)-generated portrait for $432,000 at Christie's art auction has raised questions about how credit and responsibility should be allocated to individuals involved and how the anthropomorphic perception of the AI system contributed to the artwork's success. Here, we identify natural heterogeneity in the extent to which different people perceive AI as anthropomorphic. We find that differences in the perception of AI anthropomorphicity are associated with different allocations of responsibility to the AI system and credit to different stakeholders involved in art production. We then show that perceptions of AI anthropomorphicity can be manipulated by changing the language used to talk about AI (as a tool versus an agent), with consequences for artists and AI practitioners. Our findings shed light on what is at stake when we anthropomorphize AI systems and offer an empirical lens to reason about how to allocate credit and responsibility to human stakeholders.
Interpolated policy gradient: Merging on-policy and off-policy gradient estimation for deep reinforcement learning
Off-policy model-free deep reinforcement learning methods using previously
collected data can improve sample efficiency over on-policy policy gradient
techniques. On the other hand, on-policy algorithms are often more stable and
easier to use. This paper examines, both theoretically and empirically,
approaches to merging on- and off-policy updates for deep reinforcement
learning. Theoretical results show that off-policy updates with a value
function estimator can be interpolated with on-policy policy gradient updates
whilst still satisfying performance bounds. Our analysis uses control variate
methods to produce a family of policy gradient algorithms, with several
recently proposed algorithms being special cases of this family. We then
provide an empirical comparison of these techniques with the remaining
algorithmic details fixed, and show how different mixtures of off-policy gradient
estimates with on-policy samples contribute to improvements in empirical
performance. The final algorithm provides a generalization and unification of
existing deep policy gradient techniques, has theoretical guarantees on the
bias introduced by off-policy updates, and improves on the state-of-the-art
model-free deep RL methods on a number of OpenAI Gym continuous control
benchmarks.
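The trade-off that motivates interpolating on- and off-policy updates can be sketched with scalar arithmetic. The function below is a toy illustration, not the paper's estimator: it assumes an unbiased on-policy estimate and a biased off-policy estimate that are statistically independent, and the names `nu`, `var_on`, `var_off`, and `bias_off` are hypothetical.

```python
def interpolated_estimator_stats(nu, var_on, var_off, bias_off):
    """Bias, variance and mean squared error of (1 - nu) * g_on + nu * g_off.

    Assumes g_on is unbiased with variance var_on, g_off has bias bias_off
    and variance var_off, and the two estimates are independent.
    """
    if not 0.0 <= nu <= 1.0:
        raise ValueError("nu must lie in [0, 1]")
    bias = nu * bias_off
    variance = (1.0 - nu) ** 2 * var_on + nu ** 2 * var_off
    return bias, variance, bias ** 2 + variance


if __name__ == "__main__":
    # Sweep the mixing weight for a noisy unbiased estimate (variance 4.0)
    # and a quieter biased one (variance 1.0, bias 0.5).
    for nu in (0.0, 0.25, 0.5, 0.75, 1.0):
        bias, var, mse = interpolated_estimator_stats(nu, 4.0, 1.0, 0.5)
        print(f"nu={nu:.2f}  bias={bias:.4f}  var={var:.4f}  mse={mse:.4f}")
```

Under these assumptions the bias grows linearly with the mixing weight while the variance is quadratic in it, so an intermediate weight can give a lower mean squared error than either pure estimate.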
Construction of surrogate CHEmical MEchanisms (SCHEMEs) for atmospheric photochemical systems
During the past several years it has become apparent that homogeneous gas-phase reactions of pollutants in the troposphere, i.e., the formation of photochemical smog and the oxidation of SO2, occur to a great extent by elementary reactions involving chain-carrying free radicals (HO, HO2, RO, RO2, RCOO2) whose concentrations are governed by the concentrations of trace molecular constituents including NO, NO2, CO, O3, and organics, as well as sunlight. For the purpose of modeling chemical transformations in the ambient atmosphere, which requires incorporating a reaction mechanism within an atmospheric transport model, it is necessary to develop a mechanism that includes a minimum number of chemical species, since the computational time and cost involved in solving the set of partial differential equations describing the diffusion-advection-reaction problem increase dramatically with the number of species modeled. Although photochemical mechanisms employing fewer than 15 species have been developed previously for use within urban airshed models, those reduced, or surrogate, mechanisms do not include sulfur chemistry and do not appear applicable to the more widely varying conditions possible as gases become chemically depleted while being transported away from emission sources. Therefore, in order to meet the time and cost constraints of an atmospheric transport model, we have constructed a 12-species Surrogate CHEmical MEchanism (SCHEME) incorporating reactions for the homogeneous gas-phase oxidation of SO2. A preliminary but much more detailed and comprehensive ATmospheric Model for Sulfur (ATMOS) has been used to generate SCHEME and test its applicability to a broad range of chemical conditions.
Density-Polarization Functional Theory of the response of a periodic insulating solid to an electric field.
The response of an infinite, periodic, insulating solid to an
infinitesimally small electric field is investigated in the framework of
Density Functional Theory. We find that the applied perturbing potential is not
a unique functional of the periodic density change: it depends also on the
change in the macroscopic polarization. Moreover, the dependence of the
exchange-correlation energy on polarization induces an exchange-correlation
electric field. These effects are exhibited for a model semiconductor. We also
show that the scissor-operator technique is an approximate way of bypassing
this polarization dependence. Comment: 11 pages, 1 figure
Zero-range process with open boundaries
We calculate the exact stationary distribution of the one-dimensional
zero-range process with open boundaries for arbitrary bulk and boundary hopping
rates. When such a distribution exists, the steady state has no correlations
between sites and is uniquely characterized by a space-dependent fugacity which
is a function of the boundary rates and the hopping asymmetry. For strong
boundary drive the system has no stationary distribution. In systems which on a
ring geometry allow for a condensation transition, a condensate develops at one
or both boundary sites. On all other sites the particle distribution approaches
a product measure with the finite critical density \rho_c. In systems which do
not support condensation on a ring, strong boundary drive leads to a condensate
at the boundary. However, in this case the local particle density in the
interior exhibits a complex algebraic growth in time. We calculate the bulk and
boundary growth exponents as a function of the system parameters.
Second harmonic generation in SiC polytypes
LMTO calculations are presented for the frequency dependent second harmonic
generation (SHG) in the polytypes 2H, 4H, 6H, 15R and 3C of SiC. All
independent tensor components are calculated. The spectral features and the
ratios of the 333 to 311 tensorial components are studied as a function of the
degree of hexagonality. The relationship to the linear optical response and the
underlying band structure are investigated. SHG is suggested to be a sensitive
tool for investigating the near band edge interband excitations. Comment: 12 pages, 10 figures
The TWA 3 Young Triple System: Orbits, Disks, Evolution
We have characterized the spectroscopic orbit of the TWA 3A binary and
provide preliminary families of probable solutions for the TWA 3A visual orbit
as well as for the wide TWA 3A--B orbit. TWA 3 is a hierarchical triple located
at 34 pc in the 10 Myr old TW Hya association. The wide component
separation is 1."55; the close pair was first identified as a possible binary
almost 20 years ago. We initially identified the 35-day period orbital solution
using high-resolution infrared spectroscopy which angularly resolved the A and
B components. We then refined the preliminary orbit by combining the infrared
data with a re-analysis of our high-resolution optical spectroscopy. The
orbital period from the combined spectroscopic solution is 35 days, the
eccentricity is 0.63, and the mass ratio is 0.84; although this
high mass ratio would suggest that optical spectroscopy alone should be
sufficient to identify the orbital solution, the presence of the tertiary B
component likely introduced confusion in the blended optical spectra. Using
millimeter imaging from the literature, we also estimate the inclinations of
the stellar orbital planes with respect to the TWA 3A circumbinary disk
inclination and find that all three planes are likely misaligned by at least
30 degrees. The TWA 3A spectroscopic binary components have spectral
types of M4.0 and M4.5; TWA 3B is an M3. We speculate that the system formed as
a triple, is bound, and that its properties were shaped by dynamical
interactions between the inclined orbits and disk. Comment: Accepted to Ap
Power Law Distribution of Wealth in a Money-Based Model
A money-based model for the power law distribution (PLD) of wealth in an
economically interacting population is introduced. The basic feature of our
model is that it concentrates on capital movements and avoids the complexity of
the micro-behaviors of individuals. It is proposed as an extension of the Eguíluz
and Zimmermann (EZ) model for crowding and information transmission in
financial markets. Still, we must emphasize that in the EZ model the PLD without
exponential correction is obtained only for a particular parameter value, while our
model yields it over a wide range. The Zipf exponent depends on the
parameters in a nontrivial way and is exactly calculated in this paper. Comment: 5 pages and 4 figures
The Effective Particle-Hole Interaction and the Optical Response of Simple Metal Clusters
Following Sham and Rice [L. J. Sham, T. M. Rice, Phys. Rev. 144 (1966) 708]
the correlated motion of particle-hole pairs is studied, starting from the
general two-particle Green's function. In this way we derive a matrix equation
for the eigenvalues and wave functions of the general type of
collective excitation of an N-particle system. The interplay between excitons
and plasmons is fully described by this new set of equations. As a by-product
we obtain, at least a posteriori, a justification for the use of the TDLDA
for simple-metal clusters. Comment: RevTeX, 15 pages, 5 figures in uufiles format, 1 figure available from
[email protected]