Katakomba: Tools and Benchmarks for Data-Driven NetHack
NetHack is known as the frontier of reinforcement learning research where
learning-based methods still need to catch up to rule-based solutions. One of
the promising directions for a breakthrough is using pre-collected datasets
similar to recent developments in robotics, recommender systems, and more under
the umbrella of offline reinforcement learning (ORL). Recently, a large-scale
NetHack dataset was released; while it was a necessary step forward, it has yet
to gain wide adoption in the ORL community. In this work, we argue that there
are three major obstacles for adoption: resource-wise, implementation-wise, and
benchmark-wise. To address them, we develop an open-source library that
provides workflow fundamentals familiar to the ORL community: pre-defined
D4RL-style tasks, uncluttered baseline implementations, and reliable evaluation
tools with accompanying configs and logs synced to the cloud.
Comment: Neural Information Processing Systems (NeurIPS 2023) Track on
Datasets and Benchmarks. Source code at
https://github.com/corl-team/katakomb
Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size
Training large neural networks is known to be time-consuming, with the
learning duration taking days or even weeks. To address this problem,
large-batch optimization was introduced. This approach demonstrated that
scaling mini-batch sizes with appropriate learning rate adjustments can speed
up the training process by orders of magnitude. While long training time was
not typically a major issue for model-free deep offline RL algorithms, recently
introduced Q-ensemble methods achieving state-of-the-art performance made this
issue more relevant, notably extending the training duration. In this work, we
demonstrate how this class of methods can benefit from large-batch
optimization, which is commonly overlooked by the deep offline RL community. We
show that scaling the mini-batch size and naively adjusting the learning rate
allows for (1) a reduced size of the Q-ensemble, (2) stronger penalization of
out-of-distribution actions, and (3) improved convergence time, effectively
shortening training duration by 3-4x on average.
Comment: Accepted at 3rd Offline Reinforcement Learning Workshop at Neural
Information Processing Systems, 202
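The two ideas in the abstract above, adjusting the learning rate when the mini-batch grows and penalizing out-of-distribution actions with a Q-ensemble, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the "linear" and "sqrt" scaling rules, and the min-over-ensemble target are assumptions, chosen because they are common heuristics; the abstract does not say which rule or target the paper uses.

```python
import math


def scale_learning_rate(base_lr, base_batch_size, new_batch_size, rule="sqrt"):
    """Rescale a learning rate for a new batch size (hypothetical helper).

    Both rules are common heuristics, not taken from the abstract:
    "linear" multiplies by the batch-size ratio, "sqrt" by its square root.
    """
    if base_lr <= 0 or base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("learning rate and batch sizes must be positive")
    ratio = new_batch_size / base_batch_size
    if rule == "linear":
        return base_lr * ratio
    if rule == "sqrt":
        return base_lr * math.sqrt(ratio)
    raise ValueError(f"unknown scaling rule: {rule!r}")


def pessimistic_q_target(reward, discount, next_q_values, done):
    """Bellman target using the minimum over an ensemble of Q estimates.

    Taking the minimum is one assumed way to be pessimistic about actions
    on which the ensemble members disagree.
    """
    if not next_q_values:
        raise ValueError("the ensemble must contain at least one Q estimate")
    bootstrap = 0.0 if done else min(next_q_values)
    return reward + discount * bootstrap


if __name__ == "__main__":
    # Growing the batch from 256 to 1024 samples under both rules.
    print(scale_learning_rate(3e-4, 256, 1024, rule="linear"))
    print(scale_learning_rate(3e-4, 256, 1024, rule="sqrt"))
    # Target built from three hypothetical ensemble members.
    print(pessimistic_q_target(1.0, 0.99, [10.0, 8.0, 12.0], done=False))
```

In this sketch, a larger ensemble can only lower the minimum and therefore the target, so ensemble size acts as one knob for the strength of the penalty. The abstract reports that large batches allow a smaller ensemble.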
Activity and stability of PtCo/C electrocatalysts for alcohol oxidation
This study considers the liquid-phase synthesis of PtCo/C catalysts based on CoOx/C composite carriers with different mass fractions of metals and Pt:Co ratios. The purpose of the article is to study the activity of PtCo/C electrocatalysts of various compositions in the oxidation reactions of methanol and ethanol and to compare their characteristics with their commercial PtRu/C and Pt/C analogues.
PtCo/C catalysts were synthesised with Pt:Co ratios of 1:1 and 3:1. The specific active surface of the obtained PtCo/C materials was determined, and their activity in the oxidation reactions of methanol and ethanol and their resistance to poisoning by intermediate products of alcohol oxidation were studied. The structural and electrochemical characteristics of the obtained PtCo/C catalysts were studied by X-ray diffraction, cyclic voltammetry, and chronoamperometry. It was found that PtCo/C materials with a mass fraction of platinum close to 20% are the most active and stable as compared to their commercial PtRu/C and Pt/C analogues.
The presented results show that PtCo/C catalysts are a promising material for direct alcohol fuel cells.
Migration of the Individuals
The individuals are modeled by the elements of variable domains. A primitive frame to detect the migration of individuals from domain to domain is proposed. The supporting computational model is based on a separation of individuals into actual, possible, and virtual ones. As shown, this leads to the adoption of a stage-by-stage cognition model with a pair of evolvents to capture the dynamics of the domains: the two-dimensional model. The first evolvent reflects the generation of the individuals in a domain, and the beginning and cancellation of their existence in that domain. The second evolvent reflects the shifts in properties of the individuals. This unified data model is expected to have applications to a wide range of models in computer science and information technologies.
Wave kinetics of random fibre lasers
Traditional wave kinetics describes the slow evolution of systems with many degrees of freedom to equilibrium via numerous weak non-linear interactions, and it fails for a very important class of dissipative (active) optical systems with cyclic gain and losses, such as lasers with non-linear intracavity dynamics. Here we introduce a conceptually new class of cyclic wave systems, characterized by non-uniform double-scale dynamics with strong periodic changes of the energy spectrum and slow evolution from cycle to cycle to a statistically steady state. Taking a practically important example, the random fibre laser, we show that a model describing such a system is close to the integrable non-linear Schrödinger equation and needs a new formalism of wave kinetics, developed here. We derive a non-linear kinetic theory of the laser spectrum, generalizing the seminal linear model of Schawlow and Townes. Experimental results agree with our theory. The work has implications for describing the kinetics of cyclical systems beyond photonics.