Katakomba: Tools and Benchmarks for Data-Driven NetHack
NetHack is known as the frontier of reinforcement learning research where
learning-based methods still need to catch up to rule-based solutions. One of
the promising directions for a breakthrough is using pre-collected datasets
similar to recent developments in robotics, recommender systems, and more under
the umbrella of offline reinforcement learning (ORL). Recently, a large-scale
NetHack dataset was released; while it was a necessary step forward, it has yet
to gain wide adoption in the ORL community. In this work, we argue that there
are three major obstacles for adoption: resource-wise, implementation-wise, and
benchmark-wise. To address them, we develop an open-source library that
provides workflow fundamentals familiar to the ORL community: pre-defined
D4RL-style tasks, uncluttered baseline implementations, and reliable evaluation
tools with accompanying configs and logs synced to the cloud.
Comment: Neural Information Processing Systems (NeurIPS 2023) Track on
Datasets and Benchmarks. Source code at
https://github.com/corl-team/katakomb
CORL: Research-oriented Deep Offline Reinforcement Learning Library
CORL is an open-source library that provides thoroughly benchmarked
single-file implementations of both deep offline and offline-to-online
reinforcement learning algorithms. It emphasizes a simple development experience
with a straightforward codebase and a modern analysis tracking tool. In CORL,
we isolate each method's implementation into a separate single file, making
performance-relevant details easier to recognize. Additionally, an experiment
tracking feature is available to help log metrics, hyperparameters,
dependencies, and more to the cloud. Finally, we have ensured the reliability
of the implementations by benchmarking commonly employed D4RL datasets,
providing a transparent source of results that can be reused for robust
evaluation tools such as performance profiles, probability of improvement, or
expected online performance.
Comment: Conference on Neural Information Processing Systems (NeurIPS 2023)
Track on Datasets and Benchmarks. Source code at
https://github.com/corl-team/COR
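Two of the evaluation tools named in the CORL abstract, performance profiles and probability of improvement, can be illustrated with a minimal sketch. This is not CORL's own code: the function names are hypothetical, and the scores in the usage note are made-up normalized returns, not benchmark results.

```python
from typing import List, Sequence


def performance_profile(scores: Sequence[float], thresholds: Sequence[float]) -> List[float]:
    """For each threshold tau, return the fraction of runs with score > tau."""
    if not scores:
        raise ValueError("scores must be non-empty")
    n = len(scores)
    return [sum(s > tau for s in scores) / n for tau in thresholds]


def probability_of_improvement(scores_x: Sequence[float], scores_y: Sequence[float]) -> float:
    """Estimate P(X > Y) over all pairs of runs, counting a tie as one half."""
    if not scores_x or not scores_y:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for x in scores_x:
        for y in scores_y:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(scores_x) * len(scores_y))
```

For example, with the illustrative scores `[0.2, 0.5, 0.9, 1.1]` and thresholds `[0.0, 0.5, 1.0]`, the profile gives the share of runs that clear each threshold, which makes it easy to compare algorithms across the whole score range rather than by a single mean.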
Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size
Training large neural networks is known to be time-consuming, with the
learning duration taking days or even weeks. To address this problem,
large-batch optimization was introduced. This approach demonstrated that
scaling mini-batch sizes with appropriate learning rate adjustments can speed
up the training process by orders of magnitude. While long training time was
not typically a major issue for model-free deep offline RL algorithms, recently
introduced Q-ensemble methods achieving state-of-the-art performance made this
issue more relevant, notably extending the training duration. In this work, we
demonstrate how this class of methods can benefit from large-batch
optimization, which is commonly overlooked by the deep offline RL community. We
show that scaling the mini-batch size and naively adjusting the learning rate
allows for (1) a reduced size of the Q-ensemble, (2) stronger penalization of
out-of-distribution actions, and (3) improved convergence time, effectively
shortening training duration by 3-4x on average.
Comment: Accepted at 3rd Offline Reinforcement Learning Workshop at Neural
Information Processing Systems, 202
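The abstract above says the learning rate is adjusted when the mini-batch grows, but does not state the rule. The sketch below shows two common heuristics, square-root and linear scaling. Which one the paper uses is an assumption left open here, and the function name and default values are hypothetical.

```python
import math


def scale_learning_rate(base_lr: float, base_batch_size: int,
                        new_batch_size: int, rule: str = "sqrt") -> float:
    """Rescale a learning rate for a new mini-batch size.

    rule="sqrt":   lr grows with the square root of the batch-size ratio.
    rule="linear": lr grows proportionally to the batch-size ratio.
    Both are generic large-batch heuristics, not a rule taken from the abstract.
    """
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    ratio = new_batch_size / base_batch_size
    if rule == "sqrt":
        return base_lr * math.sqrt(ratio)
    if rule == "linear":
        return base_lr * ratio
    raise ValueError(f"unknown rule: {rule!r}")
```

Going from a batch of 256 to 1024 is a ratio of 4, so the square-root rule doubles the learning rate and the linear rule quadruples it.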
Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows
Offline reinforcement learning aims to train a policy on a pre-recorded and
fixed dataset without any additional environment interactions. There are two
major challenges in this setting: (1) extrapolation error caused by
approximating the value of state-action pairs not well-covered by the training
data and (2) distributional shift between behavior and inference policies. One
way to tackle these problems is to induce conservatism - i.e., keeping the
learned policies closer to the behavioral ones. To achieve this, we build upon
recent works on learning policies in latent action spaces and use a special
form of Normalizing Flows for constructing a generative model, which we use as
a conservative action encoder. This Normalizing Flows action encoder is
pre-trained in a supervised manner on the offline dataset, and then an
additional policy model - controller in the latent space - is trained via
reinforcement learning. This approach avoids querying actions outside of the
training dataset and therefore does not require additional regularization for
out-of-dataset actions. We evaluate our method on various locomotion and
navigation tasks, demonstrating that our approach outperforms recently proposed
algorithms with generative action models on a large portion of datasets.
Comment: Accepted at 3rd Offline Reinforcement Learning Workshop at Neural
Information Processing Systems, 202
In Vitro and in Silico Liver Models: Current Trends, Challenges and Opportunities
The most common drug development failures originate from either bioavailability problems or unexpected toxic effects. The culprit is often the liver, which is responsible for the biotransformation of a majority of xenobiotics. The liver may be modeled using liver-on-a-chip devices, which may include established cell lines, primary human cells, and stem cell-derived hepatocyte-like cells. The choice of biological material, along with its processing and maintenance, greatly influences both the device performance and the resultant toxicity predictions. Impediments to the development of liver-on-a-chip technology include problems with the standardization of cells, limitations imposed by culturing, and the necessity to develop more complicated fluidic contours. Fortunately, recent breakthroughs in the development of cell-based reporters, including ones with fluorescent labels, permit monitoring of the behavior of the cells embedded in liver-on-a-chip devices. Finally, a set of computational approaches has been developed to model both particular toxic responses and the homeostasis of the human liver as a whole; these approaches pave the way to enhancing the in silico stage of assessment for potential toxicity.
Modélisation analytique de l'essai caractéristique d'emboutissage du godet (Analytical modeling of the characteristic cup deep-drawing test)
Cup deep drawing holds a special place among characteristic forming tests. This test makes it possible to study material hardening, the effect of friction, fracture, and wrinkling, and to construct forming limit curves. Developing an analytical model is worthwhile because it can quickly provide information on the strain and stress fields during drawing and show the influence of the parameters. First, we conduct a comparative analysis of analytical models from the literature. Next, we propose an approach based, among other things, on the assumption that the clamping stresses induced in the cup flange by the blank holder are homogeneous. Recall that most other studies assume that the sheet thickness is invariant in this zone. The results of the different approaches are compared with experiments, which allows the validity of the different assumptions to be discussed.
Activity and stability of PtCo/C electrocatalysts for alcohol oxidation
This study considers the liquid-phase synthesis of PtCo/C catalysts based on CoOx/C composite carriers with different mass fractions of metals and Pt:Co ratios. The purpose of the article is to study the activity of PtCo/C electrocatalysts of various compositions in the oxidation reactions of methanol and ethanol and to compare their characteristics with their commercial PtRu/C and Pt/C analogues.
PtCo/C catalysts were synthesised with Pt:Co ratios of 1:1 and 3:1. The specific active surface of the obtained PtCo/C materials was determined, their activity in the oxidation reactions of methanol and ethanol and their resistance to poisoning by intermediate products of alcohol oxidation were studied. The structural and electrochemical characteristics of the obtained PtCo/C catalysts were studied by X-ray diffraction, cyclic voltammetry, and chronoamperometry. It was found that PtCo/C materials with a mass fraction of platinum close to 20% are the most active and stable as compared to their commercial PtRu/C and Pt/C analogues.
The presented results show that PtCo/C catalysts are a promising material for direct alcohol fuel cells.
Establishing of local population, population dynamics and current abundance of Steller sea lion (Eumetopias jubatus) in the Commander Islands
The time course of the establishment of a local population of Steller sea lions in the Commander Islands, its population dynamics, and current abundance were studied using literature published since the 1930s and the author's observations conducted during the 2008-2011 breeding seasons. The local population began to form in the early 1960s, when mature females first began to populate the islands, and was fully established in the early 1990s. The whole process of development of the Commander Islands Steller sea lion sub-population took about three decades. The abundance of adult and juvenile sea lions fluctuated widely in 1991-2011 without any statistically significant trend, but pup numbers showed a pronounced negative trend, mostly due to three sharp declines in pup production in 2000, 2009, and 2011. At present, a total of about 700 animals aged 1+ inhabit the islands during the breeding season, and about 200 pups are born annually. This total number of Steller sea lions is close to the mean value for the period after the 1990s. Nevertheless, occasional sharp declines in pup production cause some concern, insofar as they could lead to the extinction of the Steller sea lion sub-population in this area, as occurred in the middle of the 19th century.
Migration of the Individuals
Individuals are modeled as elements of variable domains. A primitive frame for detecting the migration of individuals from domain to domain is proposed. The supporting computational model is based on a separation of individuals into actual, possible, and virtual ones. As shown, this leads to the adoption of a stage-by-stage cognition model with a pair of evolvents to capture the dynamics of the domains: the two-dimensional model. The first evolvent reflects the generation of individuals in a domain, that is, the beginning and the cancellation of their existence in a domain. The second evolvent reflects shifts in the properties of individuals. As expected, this unified data model will have applications to a wide range of models in computer science and information technologies.