Evolutionary Algorithms for Reinforcement Learning
There are two distinct approaches to solving reinforcement learning problems,
namely, searching in value function space and searching in policy space.
Temporal difference methods and evolutionary algorithms are well-known examples
of these approaches. Kaelbling, Littman and Moore recently provided an
informative survey of temporal difference methods. This article focuses on the
application of evolutionary algorithms to the reinforcement learning problem,
emphasizing alternative policy representations, credit assignment methods, and
problem-specific genetic operators. Strengths and weaknesses of the
evolutionary approach to reinforcement learning are presented, along with a
survey of representative applications.
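As an illustration of the policy-space search contrasted with value-function methods above, the following minimal sketch evolves a population of policy parameter vectors using episodic fitness evaluation, truncation selection, and Gaussian mutation. The function names, operators, and hyperparameters are illustrative only, not those of any specific system surveyed in the article.

    import numpy as np

    def evolve_policy(rollout_return, dim, pop_size=50, generations=100,
                      elite_frac=0.2, sigma=0.1, seed=0):
        """Direct policy search: rollout_return maps a policy weight vector to
        an episodic return, the only feedback used for credit assignment."""
        rng = np.random.default_rng(seed)
        pop = rng.normal(size=(pop_size, dim))             # random initial policies
        n_elite = max(1, int(elite_frac * pop_size))
        for _ in range(generations):
            fitness = np.array([rollout_return(w) for w in pop])
            elite = pop[np.argsort(fitness)[-n_elite:]]    # truncation selection
            parents = elite[rng.integers(n_elite, size=pop_size - n_elite)]
            children = parents + sigma * rng.normal(size=parents.shape)  # mutation
            pop = np.vstack([elite, children])             # elitist replacement
        return pop[np.argmax([rollout_return(w) for w in pop])]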
Sheffield University CLEF 2000 submission - bilingual track: German to English
We investigated dictionary based cross language information
retrieval using lexical triangulation. Lexical triangulation combines the results
of different transitive translations. Transitive translation uses a pivot language
to translate between two languages when no direct translation resource is
available. We took German queries and translated them via Spanish or Dutch
into English. We compared the results of retrieval experiments using these
queries, with other versions created by combining the transitive translations or
created by direct translation. Direct dictionary translation of a query introduces
considerable ambiguity that damages retrieval: in this research, average
precision was 79% below the monolingual baseline. Transitive translation
introduces still more ambiguity, giving results more than 88% below direct
translation. We have shown that lexical triangulation between two transitive
translations can eliminate much of the additional ambiguity introduced by
transitive translation.
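A minimal sketch of the triangulation idea, assuming bilingual dictionaries are available as plain Python dicts mapping a word to a list of candidate translations; the intersection of the two transitive routes discards pivot-induced noise and falls back to the union when the routes share no candidates.

    def transitive_translate(query_terms, src_to_pivot, pivot_to_en):
        """Translate each source-language term into English through a pivot
        language; returns one set of English candidates per query term."""
        routes = []
        for term in query_terms:
            candidates = set()
            for pivot_word in src_to_pivot.get(term, []):
                candidates.update(pivot_to_en.get(pivot_word, []))
            routes.append(candidates)
        return routes

    def triangulate(route_a, route_b):
        """Keep only candidates produced by both transitive routes (lexical
        triangulation); fall back to the union if the intersection is empty."""
        return [(a & b) or (a | b) for a, b in zip(route_a, route_b)]

    # e.g. German -> Spanish -> English and German -> Dutch -> English:
    # via_es = transitive_translate(query, de_es, es_en)
    # via_nl = transitive_translate(query, de_nl, nl_en)
    # query_en = triangulate(via_es, via_nl)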
Inheritance-Based Diversity Measures for Explicit Convergence Control in Evolutionary Algorithms
Diversity is an important factor in evolutionary algorithms to prevent
premature convergence towards a single local optimum. In order to maintain
diversity throughout the process of evolution, various means exist in
literature. We analyze approaches to diversity that (a) have an explicit and
quantifiable influence on fitness at the individual level and (b) require no
(or very little) additional domain knowledge such as domain-specific distance
functions. We also introduce the concept of genealogical diversity in a broader
study. We show that employing these approaches can help evolutionary algorithms
for global optimization in many cases.
Comment: GECCO '18: Genetic and Evolutionary Computation Conference, 2018,
Kyoto, Japan.
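As a rough illustration only (the paper's own definitions may differ), one way to give diversity an explicit, quantifiable influence on fitness at the individual level without any domain-specific distance function is to measure distance purely from inheritance records, e.g. the overlap of recorded ancestor-ID sets, and add it as a fitness bonus:

    def genealogical_distance(ancestors_a, ancestors_b):
        """Distance from inheritance information alone: 1 minus the Jaccard
        overlap of two individuals' ancestor-ID sets (illustrative definition,
        not necessarily the one used in the paper)."""
        union = ancestors_a | ancestors_b
        if not union:
            return 0.0
        return 1.0 - len(ancestors_a & ancestors_b) / len(union)

    def diversity_adjusted_fitness(raw_fitness, ancestors, population_ancestors,
                                   weight=0.1):
        """Explicit convergence control: reward individuals whose genealogy
        differs from the rest of the population."""
        avg_dist = sum(genealogical_distance(ancestors, other)
                       for other in population_ancestors) / len(population_ancestors)
        return raw_fitness + weight * avg_dist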
Genetic algorithms with elitism-based immigrants for changing optimization problems
Copyright @ Springer-Verlag Berlin Heidelberg 2007.
Addressing dynamic optimization problems has been a challenging task for the genetic algorithm community. Over the years, several approaches have been developed to enhance the performance of genetic algorithms in dynamic environments. One major approach is to maintain the diversity of the population, e.g., via random immigrants. This paper proposes an elitism-based immigrants scheme for genetic algorithms in dynamic environments. In the scheme, the elite from the previous generation is used as the base to create immigrants via mutation, which replace the worst individuals in the current population. This way, the introduced immigrants are more adapted to the changing environment. This paper also proposes a hybrid scheme that combines the elitism-based immigrants scheme with the traditional random immigrants scheme to deal with significant changes. The experimental results show that the proposed elitism-based and hybrid immigrants schemes efficiently improve the performance of genetic algorithms in dynamic environments.
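A minimal sketch of the elitism-based immigrants idea described above, for a bit-string GA; the replacement ratio and mutation probability are illustrative, fitness is assumed higher-is-better, and for brevity the elite is taken from the passed-in population rather than from the previous generation as in the paper.

    import random

    def bitflip(individual, p_mut):
        """Mutate a bit string by independent bit flips."""
        return [1 - g if random.random() < p_mut else g for g in individual]

    def inject_elite_immigrants(population, fitness, ratio=0.2, p_mut=0.01):
        """Replace the worst `ratio` of the population with mutated copies of
        the elite individual, so the immigrants stay adapted to the (possibly
        changed) environment rather than being purely random."""
        order = sorted(range(len(population)), key=lambda i: fitness[i])
        elite = population[order[-1]]                      # best individual
        for idx in order[:int(ratio * len(population))]:   # worst individuals
            population[idx] = bitflip(elite, p_mut)
        return population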
Multiple cyclotron line-forming regions in GX 301-2
We present two observations of the high-mass X-ray binary GX 301-2 with
NuSTAR, taken at different orbital phases and different luminosities. We find
that the continuum is well described by typical phenomenological models, like a
very strongly absorbed NPEX model. However, for a statistically acceptable
description of the hard X-ray spectrum we require two cyclotron resonant
scattering features (CRSF), one at ~35 keV and the other at ~50 keV. Even
though both features strongly overlap, the good resolution and sensitivity of
NuSTAR allows us to disentangle them at >=99.9% significance. This is the first
time that two CRSFs are seen in GX 301-2. We find that the CRSFs are very
likely independently formed, as their energies are not harmonically related
and, if it were a single line, the deviation from a Gaussian shape would be
very large. We compare our results to archival Suzaku data and find that our
model also provides a good fit to those data. We study the behavior of the
continuum as well as the CRSF parameters as a function of pulse phase in seven
phase bins. We find that the energy of the 35 keV CRSF varies smoothly as a
function of phase, between 30 and 38 keV. To explain this variation, we apply a
simple model of the accretion column, taking the altitude of the line-forming
region, the velocity of the in-falling material, and the resulting relativistic
effects into account. We find that in this model the observed energy variation
can be explained simply by a variation of the projected velocity and beaming
factor of the line-forming region towards us.
Comment: 18 pages, 10 figures, accepted for publication in A&A.
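For orientation, a simple accretion-column picture of the kind invoked above (a generic sketch, not necessarily the authors' exact model) ties the observed line energy to the dipole field at the line-forming altitude r, the gravitational redshift there, and a Doppler factor set by the projected infall velocity:

    E_{\rm cyc}(r) \simeq 11.6\,\mathrm{keV}\;\frac{B(r)}{10^{12}\,\mathrm{G}},
    \qquad B(r) = B_0\,\Bigl(\frac{R_*}{r}\Bigr)^{3},
    \qquad E_{\rm obs} \simeq \frac{E_{\rm cyc}(r)}{1+z(r)}\;
           \frac{1}{\gamma\,\bigl(1-\beta\cos\theta\bigr)},
    \qquad 1+z(r) = \Bigl(1-\frac{2GM}{r c^{2}}\Bigr)^{-1/2},

where beta = v/c is the infall speed and theta the angle between the flow and the line of sight; as the pulsar rotates, the projected velocity and the beaming factor change with pulse phase, which can shift E_obs over a range comparable to the observed 30-38 keV.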
Replay-Guided Adversarial Environment Design
Deep reinforcement learning (RL) agents may successfully generalize to new settings if trained on an appropriately diverse set of environment and task configurations. Unsupervised Environment Design (UED) is a promising self-supervised RL paradigm, wherein the free parameters of an underspecified environment are automatically adapted during training to the agent's capabilities, leading to the emergence of diverse training environments. Here, we cast Prioritized Level Replay (PLR), an empirically successful but theoretically unmotivated method that selectively samples randomly-generated training levels, as UED. We argue that by curating completely random levels, PLR, too, can generate novel and complex levels for effective training. This insight reveals a natural class of UED methods we call Dual Curriculum Design (DCD). Crucially, DCD includes both PLR and a popular UED algorithm, PAIRED, as special cases and inherits similar theoretical guarantees. This connection allows us to develop novel theory for PLR, providing a version with a robustness guarantee at Nash equilibria. Furthermore, our theory suggests a highly counterintuitive improvement to PLR: by stopping the agent from updating its policy on uncurated levels (training on less data), we can improve the convergence to Nash equilibria. Indeed, our experiments confirm that our new method, PLR⊥, obtains better results on a suite of out-of-distribution, zero-shot transfer tasks, in addition to demonstrating that PLR⊥ improves the performance of PAIRED, from which it inherited its theoretical framework.
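A minimal sketch of the stop-gradient idea described above, with hypothetical agent/buffer interfaces: policy updates happen only on levels replayed from the curated buffer, while rollouts on freshly generated random levels are used solely to score them for possible inclusion.

    import random

    def plr_robust_loop(agent, level_generator, buffer, steps, replay_prob=0.5):
        """Hypothetical training loop: the 'robust' level-replay variant never
        updates the policy on uncurated (freshly generated) levels."""
        for _ in range(steps):
            if len(buffer) > 0 and random.random() < replay_prob:
                level = buffer.sample_by_score()       # curated, high-regret level
                trajectory = agent.rollout(level)
                agent.update(trajectory)               # policy update allowed
            else:
                level = level_generator.sample()       # uncurated random level
                trajectory = agent.rollout(level)
                buffer.maybe_add(level, score=estimate_regret(trajectory))
                # deliberately no agent.update(...) here: training on less data

    def estimate_regret(trajectory):
        """Placeholder regret score, e.g. a positive-advantage statistic over
        the rollout (attribute names are illustrative)."""
        adv = trajectory.advantages
        return sum(max(a, 0.0) for a in adv) / max(len(adv), 1)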
Evidence for a Variable Ultrafast Outflow in the Newly Discovered Ultraluminous Pulsar NGC 300 ULX-1
Ultraluminous pulsars are definite proof that persistent super-Eddington
accretion occurs in nature. They support the scenario according to which most
Ultraluminous X-ray Sources (ULXs) are super-Eddington accretors of stellar
mass rather than sub-Eddington intermediate mass black holes. An important
prediction of theories of supercritical accretion is the existence of powerful
outflows of moderately ionized gas at mildly relativistic speeds. In practice,
the spectral resolution of X-ray gratings such as RGS onboard XMM-Newton is
required to resolve their observational signatures in ULXs. Using RGS, outflows
have been discovered in the spectra of 3 ULXs (none of which are currently
known to be pulsars). Most recently, the fourth ultraluminous pulsar was
discovered in NGC 300. Here we report detection of an ultrafast outflow (UFO)
in the X-ray spectrum of the object, with a significance of more than
3{\sigma}, during one of the two simultaneous observations of the source by
XMM-Newton and NuSTAR in December 2016. The outflow has a projected velocity of
65000 km/s (0.22c) and a high ionisation parameter, with a log value of 3.9.
This is the first direct evidence for a UFO in a neutron star ULX, and also the
first time that evidence for such an outflow in a ULX spectrum is seen in both
soft and hard X-ray data simultaneously. We find no evidence of the UFO during
the other observation of the object, which could be explained either by the
clumpy nature of the absorber or by a slight change in our viewing angle of the
accretion flow.
Comment: 10 pages, 4 figures. Accepted to MNRAS.
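For reference, assuming the quoted quantity is the standard photoionisation parameter, the numbers above correspond to

    \xi = \frac{L_{\rm ion}}{n\,r^{2}} \approx 10^{3.9}\ \mathrm{erg\,cm\,s^{-1}},
    \qquad v_{\rm out} \approx 0.22\,c
    \approx 0.22 \times 3.0\times10^{5}\ \mathrm{km\,s^{-1}}
    \approx 6.6\times10^{4}\ \mathrm{km\,s^{-1}},

i.e. consistent with the ~65000 km/s projected velocity quoted above; here L_ion is the ionising luminosity, n the gas density, and r the distance of the absorber from the source.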
Improving Policy Learning via Language Dynamics Distillation
Recent work has shown that augmenting environments with language descriptions improves policy learning. However, for environments with complex language abstractions, learning how to ground language to observations is difficult due to sparse, delayed rewards. We propose Language Dynamics Distillation (LDD), which pretrains a model to predict environment dynamics given demonstrations with language descriptions, and then fine-tunes these language-aware pretrained representations via reinforcement learning (RL). In this way, the model is trained to both maximize expected reward and retain knowledge about how language relates to environment dynamics. On SILG, a benchmark of five tasks with language descriptions that evaluate distinct generalization challenges on unseen environments (NetHack, ALFWorld, RTFM, Messenger, and Touchdown), LDD outperforms tabula-rasa RL, VAE pretraining, and methods that learn from unlabeled demonstrations in inverse RL and reward shaping with pretrained experts. In our analyses, we show that language descriptions in demonstrations improve sample efficiency and generalization across environments, and that dynamics modeling with expert demonstrations is more effective than with non-experts.
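A minimal sketch of the two-stage recipe above, with hypothetical PyTorch-style interfaces: a dynamics model is first pretrained to predict the next observation from (observation, language description, action) triples taken from demonstrations, and its encoder is then reused to initialise the policy that is fine-tuned with RL.

    def pretrain_language_dynamics(model, demonstrations, optimizer, loss_fn):
        """Stage 1: supervised dynamics prediction on demonstrations that carry
        language descriptions (hypothetical interfaces, PyTorch-style)."""
        for obs, text, action, next_obs in demonstrations:
            pred = model(obs, text, action)            # predicted next observation
            loss = loss_fn(pred, next_obs)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        return model

    # Stage 2 (sketch): initialise the RL policy's encoder from the pretrained
    # dynamics model, then fine-tune with the usual RL objective.
    # policy.encoder.load_state_dict(model.encoder.state_dict())
    # run_rl_finetuning(policy, env)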
Hierarchical Kickstarting for Skill Transfer in Reinforcement Learning
Practising and honing skills forms a fundamental component of how humans learn, yet artificial agents are rarely specifically trained to perform them. Instead, they are usually trained end-to-end, with the hope being that useful skills will be implicitly learned in order to maximise the discounted return of some extrinsic reward function. In this paper, we investigate how skills can be incorporated into the training of reinforcement learning (RL) agents in complex environments with large state-action spaces and sparse rewards. To this end, we created SkillHack, a benchmark of tasks and associated skills based on the game of NetHack. We evaluate a number of baselines on this benchmark, as well as our own novel skill-based method, Hierarchical Kickstarting (HKS), which is shown to outperform all other evaluated methods. Our experiments show that learning with prior knowledge of useful skills can significantly improve the performance of agents on complex problems. We ultimately argue that utilising predefined skills provides a useful inductive bias for RL problems, especially those with large state-action spaces and sparse rewards.
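A rough sketch of the kickstarting-style auxiliary loss underlying such methods (plain kickstarting with a single skill teacher; the hierarchical variant above would additionally select which skill teacher to imitate), with a weight that is typically annealed towards zero over training. Shapes and names are illustrative.

    import numpy as np

    def softmax(logits):
        z = np.exp(logits - logits.max(axis=-1, keepdims=True))
        return z / z.sum(axis=-1, keepdims=True)

    def kickstarted_loss(rl_loss, student_logits, teacher_logits, weight):
        """RL objective plus a distillation term (cross-entropy of the student
        policy against the frozen skill teacher's action distribution)."""
        teacher_probs = softmax(teacher_logits)
        log_student = np.log(softmax(student_logits) + 1e-12)
        distill = -(teacher_probs * log_student).sum(axis=-1).mean()
        return rl_loss + weight * distill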
Canalization and Symmetry in Boolean Models for Genetic Regulatory Networks
Canalization of genetic regulatory networks has been argued to be favored by
evolutionary processes due to the stability that it can confer to phenotype
expression. We explore whether a significant amount of canalization and partial
canalization can arise in purely random networks in the absence of evolutionary
pressures. We use a mapping of the Boolean functions in the Kauffman N-K model
for genetic regulatory networks onto a k-dimensional Ising hypercube to show
that the functions can be divided into different classes strictly due to
geometrical constraints. The classes can be counted and their properties
determined using results from group theory and isomer chemistry. We demonstrate
that partially canalized functions completely dominate all possible Boolean
functions, particularly for higher k. This indicates that partial canalization
is extremely common, even in randomly chosen networks, and has implications for
how much information can be obtained in experiments on native state genetic
regulatory networks.
Comment: 14 pages, 4 figures; version to appear in J. Phys.
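As a concrete illustration of (full) canalization, the brute-force check below tests whether some input, when fixed to 0 or 1, forces the output of a k-input Boolean function; the partial-canalization classes discussed above refine this notion and are obtained in the paper from the hypercube symmetry argument rather than by enumeration.

    from itertools import product

    def is_canalizing(truth_table, k):
        """True if the k-input Boolean function (a dict from input tuples to
        0/1) has a canalizing input: fixing it to some value fixes the output."""
        inputs = list(product((0, 1), repeat=k))
        for i in range(k):
            for v in (0, 1):
                if len({truth_table[x] for x in inputs if x[i] == v}) == 1:
                    return True
        return False

    def canalizing_fraction(k):
        """Fraction of all 2**(2**k) Boolean functions that are canalizing
        (feasible only for small k; e.g. k = 2 gives 14/16)."""
        inputs = list(product((0, 1), repeat=k))
        total = hits = 0
        for bits in product((0, 1), repeat=len(inputs)):
            table = dict(zip(inputs, bits))
            total += 1
            hits += is_canalizing(table, k)
        return hits / total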