1,071 research outputs found

    Katakomba: Tools and Benchmarks for Data-Driven NetHack

    NetHack is known as the frontier of reinforcement learning research where learning-based methods still need to catch up to rule-based solutions. One of the promising directions for a breakthrough is using pre-collected datasets, similar to recent developments in robotics, recommender systems, and more under the umbrella of offline reinforcement learning (ORL). Recently, a large-scale NetHack dataset was released; while it was a necessary step forward, it has yet to gain wide adoption in the ORL community. In this work, we argue that there are three major obstacles to adoption: resource-wise, implementation-wise, and benchmark-wise. To address them, we develop an open-source library that provides workflow fundamentals familiar to the ORL community: pre-defined D4RL-style tasks, uncluttered baseline implementations, and reliable evaluation tools with accompanying configs and logs synced to the cloud. Comment: Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks. Source code at https://github.com/corl-team/katakomb

    CORL: Research-oriented Deep Offline Reinforcement Learning Library

    CORL is an open-source library that provides thoroughly benchmarked single-file implementations of both deep offline and offline-to-online reinforcement learning algorithms. It emphasizes a simple development experience with a straightforward codebase and a modern analysis tracking tool. In CORL, we isolate method implementations into separate single files, making performance-relevant details easier to recognize. Additionally, an experiment tracking feature is available to help log metrics, hyperparameters, dependencies, and more to the cloud. Finally, we have ensured the reliability of the implementations by benchmarking on commonly employed D4RL datasets, providing a transparent source of results that can be reused for robust evaluation tools such as performance profiles, probability of improvement, or expected online performance. Comment: Conference on Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks. Source code at https://github.com/corl-team/COR
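
    The abstract above names performance profiles as one of the evaluation tools that benchmark results can feed. As a rough illustration only, and not CORL's own code, a performance profile can be sketched as the fraction of runs whose normalized score lies strictly above each threshold. The function name, the strict inequality, and the sample scores below are all assumptions made for this sketch.

```python
from typing import List, Sequence


def performance_profile(scores: Sequence[float], thresholds: Sequence[float]) -> List[float]:
    """Fraction of runs scoring strictly above each threshold.

    Hypothetical helper for illustration; it is not part of CORL.
    `scores` holds one normalized score per run (or per run-task pair).
    """
    if len(scores) == 0:
        raise ValueError("scores must not be empty")
    n = len(scores)
    # For every threshold tau, count the runs whose score exceeds it.
    return [sum(1 for s in scores if s > tau) / n for tau in thresholds]


if __name__ == "__main__":
    # Made-up normalized scores, used only to exercise the function.
    example_scores = [0.2, 0.5, 0.9, 1.1]
    print(performance_profile(example_scores, [0.0, 0.5, 1.0]))
```

    Plotting these fractions against the thresholds gives a curve that is non-increasing by construction; a curve that stays higher for longer indicates a method whose runs more often reach higher scores.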

    Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size

    Training large neural networks is known to be time-consuming, with the learning duration taking days or even weeks. To address this problem, large-batch optimization was introduced. This approach demonstrated that scaling mini-batch sizes with appropriate learning rate adjustments can speed up the training process by orders of magnitude. While long training time was not typically a major issue for model-free deep offline RL algorithms, recently introduced Q-ensemble methods achieving state-of-the-art performance made this issue more relevant, notably extending the training duration. In this work, we demonstrate how this class of methods can benefit from large-batch optimization, which is commonly overlooked by the deep offline RL community. We show that scaling the mini-batch size and naively adjusting the learning rate allows for (1) a reduced size of the Q-ensemble, (2) stronger penalization of out-of-distribution actions, and (3) improved convergence time, effectively shortening training duration by 3-4x on average. Comment: Accepted at 3rd Offline Reinforcement Learning Workshop at Neural Information Processing Systems, 202
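
    The abstract above does not say which learning rate adjustment is used when the mini-batch grows. As a hedged illustration, the sketch below implements the two heuristics most often seen in large-batch training, linear and square-root scaling. The function name, the default rule, and the example numbers are assumptions for this sketch and are not taken from the paper.

```python
import math


def scale_learning_rate(base_lr: float, base_batch: int, new_batch: int,
                        rule: str = "sqrt") -> float:
    """Adjust a learning rate when the mini-batch size changes.

    Hypothetical helper: the rule the paper actually uses is not stated
    in the abstract, so both common heuristics are offered here.
    """
    if base_batch <= 0 or new_batch <= 0:
        raise ValueError("batch sizes must be positive")
    ratio = new_batch / base_batch
    if rule == "linear":
        # Linear scaling: the learning rate grows in proportion to the batch.
        return base_lr * ratio
    if rule == "sqrt":
        # Square-root scaling: a more conservative adjustment.
        return base_lr * math.sqrt(ratio)
    raise ValueError(f"unknown rule: {rule!r}")


if __name__ == "__main__":
    # Illustrative values only: growing the batch from 256 to 1024 samples.
    print(scale_learning_rate(3e-4, 256, 1024, rule="sqrt"))
    print(scale_learning_rate(3e-4, 256, 1024, rule="linear"))
```

    Either rule is only a starting point; the adjusted value would normally still be checked against training stability on the task at hand.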

    Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows

    Offline reinforcement learning aims to train a policy on a pre-recorded and fixed dataset without any additional environment interactions. There are two major challenges in this setting: (1) extrapolation error caused by approximating the value of state-action pairs not well-covered by the training data and (2) distributional shift between behavior and inference policies. One way to tackle these problems is to induce conservatism, i.e., keeping the learned policies closer to the behavioral ones. To achieve this, we build upon recent works on learning policies in latent action spaces and use a special form of Normalizing Flows for constructing a generative model, which we use as a conservative action encoder. This Normalizing Flows action encoder is pre-trained in a supervised manner on the offline dataset, and then an additional policy model, a controller in the latent space, is trained via reinforcement learning. This approach avoids querying actions outside of the training dataset and therefore does not require additional regularization for out-of-dataset actions. We evaluate our method on various locomotion and navigation tasks, demonstrating that our approach outperforms recently proposed algorithms with generative action models on a large portion of datasets. Comment: Accepted at 3rd Offline Reinforcement Learning Workshop at Neural Information Processing Systems, 202

    In Vitro and in Silico Liver Models: Current Trends, Challenges and Opportunities

    The most common drug development failures originate from either bioavailability problems or unexpected toxic effects. The culprit is often the liver, which is responsible for the biotransformation of a majority of xenobiotics. The liver may be modeled using liver-on-a-chip devices, which may include established cell lines, primary human cells, and stem cell-derived hepatocyte-like cells. The choice of biological material, along with its processing and maintenance, greatly influences both the device performance and the resultant toxicity predictions. Impediments to the development of liver-on-a-chip technology include problems with the standardization of cells, limitations imposed by culturing, and the necessity to develop more complicated fluidic contours. Fortunately, recent breakthroughs in the development of cell-based reporters, including ones with fluorescent labels, permit monitoring of the behavior of the cells embedded in liver-on-a-chip devices. Finally, a set of computational approaches has been developed to model both particular toxic responses and the homeostasis of the human liver as a whole; these approaches pave the way to enhancing the in silico stage of assessment of potential toxicity.

    Analytical modeling of the characteristic cup deep-drawing test

    Cup deep drawing holds a special place among the characteristic forming tests. The test makes it possible to study material work hardening, the effect of friction, fracture, and wrinkling, and to construct forming limit curves. Developing an analytical model is worthwhile because it can quickly provide information on the strain and stress fields during drawing and show the influence of the parameters. We first carry out a comparative analysis of analytical models from the literature. We then propose an approach based, among other things, on the assumption that the clamping stresses induced in the cup flange by the blank holder are homogeneous. Note that most other works assume that the sheet thickness is invariant in this zone. The results of the different approaches are compared with experiments, which makes it possible to discuss the validity of the various assumptions.

    Activity and stability of PtCo/C electrocatalysts for alcohol oxidation

    This study considers the liquid-phase synthesis of PtCo/C catalysts based on CoOx/C composite carriers with different mass fractions of metals and Pt:Co ratios. The purpose of the article is to study the activity of PtCo/C electrocatalysts of various compositions in the oxidation reactions of methanol and ethanol and to compare their characteristics with their commercial PtRu/C and Pt/C analogues. PtCo/C catalysts were synthesised with Pt:Co ratios of 1:1 and 3:1. The specific active surface of the obtained PtCo/C materials was determined, their activity in the oxidation reactions of methanol and ethanol and their resistance to poisoning by intermediate products of alcohol oxidation were studied. The structural and electrochemical characteristics of the obtained PtCo/C catalysts were studied by X-ray diffraction, cyclic voltammetry, and chronoamperometry. It was found that PtCo/C materials with a mass fraction of platinum close to 20% are the most active and stable as compared to their commercial PtRu/C and Pt/C analogues. The presented results show that PtCo/C catalysts are a promising material for direct alcohol fuel cells.

    Establishment of a local population, population dynamics and current abundance of the Steller sea lion (Eumetopias jubatus) in the Commander Islands

    The time course of the establishment of a local population of Steller sea lions in the Commander Islands, its population dynamics, and its current abundance were studied using literature published since the 1930s and the author's observations conducted during the 2008-2011 breeding seasons. The local population of Steller sea lions began to form in the early 1960s, when mature females first began to populate the islands, and the population was fully established in the early 1990s. The whole process of development of the Commander Islands Steller sea lion sub-population took about three decades. The abundance of adult and juvenile sea lions fluctuated widely in 1991-2011 without any statistically significant trend, but the number of pups had a pronounced negative slope, mostly due to three sharp declines in pup production in 2000, 2009, and 2011. A total of about 700 animals of age 1+ inhabit the islands during the breeding season, and about 200 pups are currently born each year. This total number of Steller sea lions is close to the mean value for the period after the 1990s. Nevertheless, occasional sharp declines in pup production cause some concern, since they could lead to the extinction of the Steller sea lion sub-population in this area, as occurred in the middle of the 19th century.

    Migration of the Individuals

    The individuals are modeled by the elements of variable domains. A primitive frame to detect the migration of individuals from domain to domain is proposed. The supporting computational model is based on a separation of individuals into actual, possible, and virtual ones. As is shown, this leads to the adoption of a stage-by-stage cognition model with a pair of evolvents to capture the dynamics of the domains, a two-dimensional model. The first evolvent reflects the generation of the individuals in a domain, and the beginning and the cancellation of their existence in a domain. The second evolvent reflects the shifts in the properties of the individuals. This unified data model is expected to have applications to a wide range of models in computer science and information technology.