AdCraft: An Advanced Reinforcement Learning Benchmark Environment for Search Engine Marketing Optimization
We introduce AdCraft, a novel benchmark environment for the Reinforcement
Learning (RL) community distinguished by its stochastic and non-stationary
properties. The environment simulates bidding and budgeting dynamics within
Search Engine Marketing (SEM), a digital marketing technique utilizing paid
advertising to enhance the visibility of websites on search engine results
pages (SERPs). The performance of SEM advertisement campaigns depends on
several factors, including keyword selection, ad design, bid management, budget
adjustments, and performance monitoring. Deep RL has recently emerged as a
potential strategy to optimize campaign profitability within the complex and
dynamic landscape of SEM, but it requires substantial data, which may be costly
or infeasible to acquire in practice. Our customizable environment enables
practitioners to assess and enhance the robustness of RL algorithms pertinent
to SEM bid and budget management without such costs. Through a series of
experiments within the environment, we demonstrate the challenges imposed by
sparsity and non-stationarity on agent convergence and performance. We hope
these challenges further encourage discourse and development around effective
strategies for managing real-world uncertainties.
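The abstract does not describe AdCraft's actual interface. As a hedged illustration of the two properties it emphasizes, the toy sketch below shows how stochastic auctions, a budget cap, drifting click-through rates (non-stationarity), and rare conversions (sparsity) combine into the per-step profit an RL agent would receive. Everything here is an assumption for illustration: the class `ToySEMEnv`, its parameters, and the second-price-style payment rule are hypothetical and are not the AdCraft API.

```python
import random
from dataclasses import dataclass


@dataclass
class StepResult:
    """Outcome of one simulated bidding period."""
    clicks: int
    cost: float
    revenue: float
    reward: float  # profit for the period: revenue - cost


class ToySEMEnv:
    """Hypothetical toy bidding environment; this is NOT the AdCraft API.

    Each step the agent submits one bid per keyword. Click-through rates
    drift as a random walk (non-stationarity) and conversions are rare, so
    most steps earn no revenue (sparse reward).
    """

    def __init__(self, n_keywords=3, budget=100.0, drift=0.01, seed=0):
        self.rng = random.Random(seed)
        self.n_keywords = n_keywords
        self.budget = budget          # total spend cap over the whole episode
        self.drift = drift            # std. dev. of the per-step CTR random walk
        self.ctr = [self.rng.uniform(0.01, 0.05) for _ in range(n_keywords)]
        self.conv_rate = 0.02         # probability that a click converts
        self.value_per_conv = 50.0    # revenue per conversion
        self.impressions = 100        # impressions per keyword when the auction is won
        self.spent = 0.0

    def step(self, bids):
        if len(bids) != self.n_keywords:
            raise ValueError("expected one bid per keyword")
        clicks, cost, revenue = 0, 0.0, 0.0
        for k, bid in enumerate(bids):
            competitor = self.rng.uniform(0.5, 2.0)  # stochastic competing bid
            if bid <= competitor:
                continue  # auction lost: no impressions for this keyword
            for _ in range(self.impressions):
                if self.rng.random() >= self.ctr[k]:
                    continue  # impression not clicked
                price = competitor  # second-price style: pay the runner-up bid
                if self.spent + cost + price > self.budget:
                    break  # budget exhausted: stop serving this keyword
                clicks += 1
                cost += price
                if self.rng.random() < self.conv_rate:
                    revenue += self.value_per_conv
        self.spent += cost
        # Non-stationarity: each CTR takes a Gaussian random-walk step, clipped.
        self.ctr = [min(0.2, max(0.001, c + self.rng.gauss(0.0, self.drift)))
                    for c in self.ctr]
        return StepResult(clicks, cost, revenue, revenue - cost)


if __name__ == "__main__":
    env = ToySEMEnv(seed=42)
    total = 0.0
    for _ in range(20):
        total += env.step([1.5, 1.0, 0.8]).reward
    print(f"profit after 20 steps: {total:.2f} (spent {env.spent:.2f})")
```

Because the drift changes the click-through rates every step, a policy tuned on early steps can become stale later, and because conversions are rare, the reward signal is mostly zero or negative; these are the same two difficulties the abstract reports for agent convergence.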
Finite-Time Thermodynamics
The theory of finite-time processes describes how processes of any nature can be optimized when their rate must be non-negligible, i.e., when they must be completed in a finite time. What the theory makes explicit is “the cost of haste”. Intuitively, it is quite obvious that you drive your car differently when you want to reach your destination as quickly as possible than when you are running out of gas. Finite-time thermodynamics quantifies such opposing requirements and can provide the optimal control that achieves the best compromise. The theory was initially developed for heat engines (steam, Otto, Stirling, among others) and for refrigerators, but it has since been extended to essentially all areas of dynamic systems, from the most abstract to the most practical. The present collection shows some fascinating current examples.
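A standard textbook illustration of “the cost of haste” for heat engines is the Curzon-Ahlborn efficiency at maximum power, 1 - sqrt(T_c/T_h), which falls well below the Carnot bound 1 - T_c/T_h reached only by an infinitely slow, reversible engine. The sketch below compares the two; it is a well-known result from the literature used here for illustration, not a result taken from this collection, and the function names are hypothetical.

```python
import math


def _check(t_cold, t_hot):
    if not 0 < t_cold < t_hot:
        raise ValueError("require 0 < t_cold < t_hot (kelvin)")


def carnot_efficiency(t_cold, t_hot):
    """Upper bound, attained only by a reversible (infinitely slow) engine."""
    _check(t_cold, t_hot)
    return 1.0 - t_cold / t_hot


def curzon_ahlborn_efficiency(t_cold, t_hot):
    """Efficiency at maximum power of an endoreversible engine with
    Newtonian heat transfer (the Curzon-Ahlborn result)."""
    _check(t_cold, t_hot)
    return 1.0 - math.sqrt(t_cold / t_hot)


if __name__ == "__main__":
    t_c, t_h = 300.0, 600.0
    print(f"Carnot:         {carnot_efficiency(t_c, t_h):.3f}")
    print(f"Curzon-Ahlborn: {curzon_ahlborn_efficiency(t_c, t_h):.3f}")
```

For baths at 300 K and 600 K the Carnot bound is 0.5, while running the engine at maximum power yields only about 0.29: demanding a finite rate costs efficiency.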
Thermodynamics of quantum systems under dynamical control
In this review the debated rapport between thermodynamics and quantum
mechanics is addressed in the framework of the theory of
periodically-driven/controlled quantum-thermodynamic machines. The basic model
studied here is that of a two-level system (TLS), whose energy is periodically
modulated while the system is coupled to thermal baths. When the modulation
interval is short compared to the bath memory time, the system-bath
correlations are affected, thereby causing cooling or heating of the TLS,
depending on the interval. In steady state, a periodically-modulated TLS
coupled to two distinct baths constitutes the simplest quantum heat machine
(QHM) that may operate as either an engine or a refrigerator, depending on the
modulation rate. We find their efficiency and power-output bounds and the
conditions for attaining these bounds. An extension of this model to multilevel
systems shows that the QHM power output can be boosted by the multilevel
degeneracy.
These results are used to scrutinize basic thermodynamic principles: (i)
Externally-driven/modulated QHMs may attain the Carnot efficiency bound, but
when the driving is done by a quantum device ("piston"), the efficiency
strongly depends on its initial quantum state. Such dependence has been unknown
thus far. (ii) The refrigeration rate effected by QHMs does not vanish as the
temperature approaches absolute zero for certain quantized baths, e.g.,
magnons, thus challenging Nernst's unattainability principle. (iii)
System-bath correlations allow more work extraction under periodic control than
that expected from the Szilard-Landauer principle, provided the period is in
the non-Markovian domain. Thus, dynamically-controlled QHMs may benefit from
hitherto unexploited thermodynamic resources.
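The abstract notes that the same periodically modulated TLS operates as an engine or a refrigerator depending on the modulation rate, and compares its performance with efficiency bounds. Which regime applies can be read off from the signs of the steady-state heat currents. The sketch below is generic two-bath bookkeeping, not the review's TLS model: the function names and the sign convention (currents counted positive when flowing into the working medium) are assumptions, and the currents themselves would have to come from a model of the driven system.

```python
def classify_machine(j_hot, j_cold):
    """Classify a steady-state two-bath machine from its heat currents.

    Assumed sign convention: j_hot and j_cold are heat currents flowing INTO
    the working medium from the hot and cold bath. In steady state the first
    law gives the power delivered to the external drive as P = j_hot + j_cold.
    """
    power = j_hot + j_cold
    if power > 0 and j_hot > 0:
        return "engine", power / j_hot           # efficiency eta = P / J_h
    if power < 0 and j_cold > 0:
        return "refrigerator", j_cold / -power   # coefficient of performance J_c / |P|
    return "other", None                         # e.g. heater regime: work dumped into both baths


def carnot_bounds(t_cold, t_hot):
    """Carnot efficiency (engine) and Carnot COP (refrigerator)."""
    if not 0 < t_cold < t_hot:
        raise ValueError("require 0 < t_cold < t_hot")
    return 1.0 - t_cold / t_hot, t_cold / (t_hot - t_cold)


if __name__ == "__main__":
    mode, figure = classify_machine(10.0, -6.0)
    eta_c, cop_c = carnot_bounds(300.0, 600.0)
    print(mode, figure, "Carnot bound:", eta_c)
```

Keeping the classification separate from the bounds makes it easy to check any modelled operating point, engine or refrigerator, against the corresponding Carnot limit.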
Radio observations of active galactic nuclei with mm-VLBI
Over the past few decades, our knowledge of jets produced by active galactic
nuclei (AGN) has greatly progressed thanks to the development of
very-long-baseline interferometry (VLBI). Nevertheless, the crucial mechanisms
involved in the formation of the plasma flow, as well as those driving its
exceptional radiative output up to TeV energies, remain to be clarified. Most
likely, these physical processes take place at short separations from the
supermassive black hole, on scales which are inaccessible to VLBI observations
at centimeter wavelengths. Due to their high synchrotron opacity, the dense and
highly magnetized regions in the vicinity of the central engine can only be
penetrated when observing at shorter wavelengths, in the millimeter and
sub-millimeter regimes. Although this was recognized in the early days of
VLBI, sensitive VLBI imaging at high frequencies has become possible only in
recent years. Ongoing technical development and wide-band observing now
provide adequate imaging fidelity to carry out more detailed analyses.
In this article we give an overview of some open questions concerning the physics of AGN
jets, and we discuss the impact of mm-VLBI studies. Among the rich set of
results produced so far in this frequency regime, we particularly focus on
studies performed at 43 GHz (7 mm) and at 86 GHz (3 mm). Some of the first
findings at 230 GHz (1 mm) obtained with the Event Horizon Telescope are also
presented.
Comment: Published in The Astronomy & Astrophysics Review. Open access:
https://link.springer.com/article/10.1007/s00159-017-0105-
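Why shorter wavelengths reach smaller scales can be seen from the diffraction limit of an interferometer, theta ≈ lambda / B, where B is the longest baseline. The sketch below evaluates this approximation for the three frequencies mentioned above, assuming a baseline equal to Earth's diameter (about 12,742 km). This is an order-of-magnitude estimate only; the effective resolution of a real array also depends on its uv-coverage and weighting, and the constant and function names are illustrative.

```python
import math

C = 299_792_458.0                 # speed of light, m/s
EARTH_DIAMETER_M = 1.2742e7       # assumed longest ground baseline, m
RAD_TO_MUAS = 180.0 / math.pi * 3600.0 * 1e6  # radians -> microarcseconds


def diffraction_resolution_muas(freq_hz, baseline_m=EARTH_DIAMETER_M):
    """Approximate angular resolution theta ~ lambda / B, in microarcseconds."""
    wavelength_m = C / freq_hz
    return wavelength_m / baseline_m * RAD_TO_MUAS


if __name__ == "__main__":
    for ghz in (43, 86, 230):
        print(f"{ghz:>3} GHz: ~{diffraction_resolution_muas(ghz * 1e9):.0f} uas")
```

Under this assumption the resolution improves from roughly a hundred microarcseconds at 43 GHz to a few tens of microarcseconds at 230 GHz, scaling inversely with frequency.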
Lipschitzness Is All You Need To Tame Off-policy Generative Adversarial Imitation Learning
Despite the recent success of reinforcement learning in various domains,
these approaches remain, for the most part, discouragingly sensitive to
hyper-parameters and often depend on essential engineering feats for their
success. We consider the case of off-policy generative
adversarial imitation learning, and perform an in-depth review, qualitative and
quantitative, of the method. We show that forcing the learned reward function
to be locally Lipschitz-continuous is a sine qua non condition for the method to
perform well. We then study the effects of this necessary condition and provide
several theoretical results involving the local Lipschitzness of the
state-value function. We complement these guarantees with empirical evidence
attesting to the strong positive effect that the consistent satisfaction of the
Lipschitzness constraint on the reward has on imitation performance. Finally,
we tackle a generic pessimistic reward preconditioning add-on spawning a large
class of reward shaping methods, which makes the base method it is plugged into
provably more robust, as shown in several additional theoretical guarantees. We
then discuss these through a fine-grained lens and share our insights.
Crucially, the guarantees derived and reported in this work are valid for any
reward satisfying the Lipschitzness condition; nothing is specific to
imitation. As such, these may be of independent interest.
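The abstract does not say how the Lipschitzness constraint is enforced. One common way to encourage it in adversarial imitation is a gradient penalty that punishes large input gradients of the learned reward (or discriminator). The framework-free sketch below estimates gradients with central finite differences instead of autodiff; the function names, the one-sided penalty form, and the target constant `k` are assumptions for illustration and not necessarily the formulation used in this work.

```python
import math


def grad_norm(reward, x, eps=1e-5):
    """Central finite-difference estimate of ||grad reward(x)||_2."""
    squared = 0.0
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        g = (reward(up) - reward(down)) / (2.0 * eps)
        squared += g * g
    return math.sqrt(squared)


def lipschitz_penalty(reward, samples, k=1.0):
    """One-sided gradient penalty: mean over samples of max(0, ||grad r|| - k)^2.

    Driving this term to zero pushes the reward toward being locally
    k-Lipschitz around the sampled inputs; gradients below k are not penalized.
    """
    terms = [max(0.0, grad_norm(reward, x) - k) ** 2 for x in samples]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    steep = lambda x: 3.0 * x[0] + 4.0 * x[1]  # gradient norm 5 everywhere
    points = [[0.0, 0.0], [0.5, -0.5], [-1.0, 2.0]]
    print(lipschitz_penalty(steep, points, k=1.0))
```

In a training loop this penalty would be added, with some weight, to the reward learner's loss, evaluated on states (or state-action pairs) drawn from the expert and agent data.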
- …