
    Katakomba: Tools and Benchmarks for Data-Driven NetHack

    NetHack is known as a frontier of reinforcement learning research, where learning-based methods still lag behind rule-based solutions. One promising direction for a breakthrough is the use of pre-collected datasets, following recent developments in robotics, recommender systems, and other areas under the umbrella of offline reinforcement learning (ORL). A large-scale NetHack dataset was recently released; while it was a necessary step forward, it has yet to gain wide adoption in the ORL community. In this work, we argue that there are three major obstacles to adoption: resource-wise, implementation-wise, and benchmark-wise. To address them, we develop an open-source library that provides workflow fundamentals familiar to the ORL community: pre-defined D4RL-style tasks, uncluttered baseline implementations, and reliable evaluation tools with accompanying configs and logs synced to the cloud. Comment: Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks. Source code at https://github.com/corl-team/katakomb

    Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size

    Training large neural networks is known to be time-consuming, often taking days or even weeks. To address this problem, large-batch optimization was introduced. This approach demonstrated that scaling mini-batch sizes with appropriate learning rate adjustments can speed up training by orders of magnitude. While long training time has not typically been a major issue for model-free deep offline RL algorithms, recently introduced Q-ensemble methods, which achieve state-of-the-art performance, have made it more relevant by notably extending training duration. In this work, we demonstrate how this class of methods can benefit from large-batch optimization, which is commonly overlooked by the deep offline RL community. We show that scaling the mini-batch size and naively adjusting the learning rate allows for (1) a reduced size of the Q-ensemble, (2) stronger penalization of out-of-distribution actions, and (3) improved convergence time, effectively shortening training duration by 3-4x on average. Comment: Accepted at 3rd Offline Reinforcement Learning Workshop at Neural Information Processing Systems, 202

    Activity and stability of PtCo/C electrocatalysts for alcohol oxidation

    This study considers the liquid-phase synthesis of PtCo/C catalysts based on CoOx/C composite carriers with different mass fractions of metals and Pt:Co ratios. The purpose of the article is to study the activity of PtCo/C electrocatalysts of various compositions in the oxidation of methanol and ethanol and to compare their characteristics with those of commercial PtRu/C and Pt/C analogues. PtCo/C catalysts were synthesised with Pt:Co ratios of 1:1 and 3:1. The specific active surface area of the obtained PtCo/C materials was determined, and their activity in methanol and ethanol oxidation and their resistance to poisoning by intermediate products of alcohol oxidation were studied. The structural and electrochemical characteristics of the obtained PtCo/C catalysts were studied by X-ray diffraction, cyclic voltammetry, and chronoamperometry. PtCo/C materials with a platinum mass fraction close to 20% were found to be the most active and stable, exceeding their commercial PtRu/C and Pt/C analogues. The presented results show that PtCo/C catalysts are a promising material for direct alcohol fuel cells.

    Migration of the Individuals

    Individuals are modeled as elements of variable domains. A primitive frame is proposed for detecting the migration of individuals from domain to domain. The supporting computational model is based on separating individuals into actual, possible, and virtual ones. As shown, this leads to the adoption of a stage-by-stage cognition model with a pair of evolvents that capture the dynamics of the domains: the two-dimensional model. The first evolvent reflects the generation of individuals in a domain, that is, the beginning and the cancellation of their existence there. The second evolvent reflects changes in the properties of individuals. This unified data model is expected to apply to a wide range of models in computer science and information technology.

    Wave kinetics of random fibre lasers

    Traditional wave kinetics describes the slow evolution of systems with many degrees of freedom to equilibrium via numerous weak non-linear interactions, and it fails for a very important class of dissipative (active) optical systems with cyclic gain and losses, such as lasers with non-linear intracavity dynamics. Here we introduce a conceptually new class of cyclic wave systems, characterized by non-uniform double-scale dynamics with strong periodic changes of the energy spectrum and slow evolution from cycle to cycle to a statistically steady state. Taking a practically important example, the random fibre laser, we show that a model describing such a system is close to the integrable non-linear Schrödinger equation and needs a new formalism of wave kinetics, which we develop here. We derive a non-linear kinetic theory of the laser spectrum, generalizing the seminal linear model of Schawlow and Townes. Experimental results agree with our theory. The work has implications for describing the kinetics of cyclical systems beyond photonics.