Waypoint Transformer: Reinforcement Learning via Supervised Learning with Intermediate Targets
Despite the recent advancements in offline reinforcement learning via
supervised learning (RvS) and the success of the decision transformer (DT)
architecture in various domains, DTs have fallen short in several challenging
benchmarks. The root cause of this underperformance lies in their inability to
seamlessly connect segments of suboptimal trajectories. To overcome this
limitation, we present a novel approach to enhance RvS methods by integrating
intermediate targets. We introduce the Waypoint Transformer (WT), an
architecture that builds upon the DT framework and is conditioned on
automatically generated waypoints. The results show a significant increase in
the final return compared to existing RvS methods, with performance on par with
or greater than existing state-of-the-art temporal difference learning-based
methods. Additionally, the performance and stability improvements are largest
in the most challenging environments and data configurations, including AntMaze
Large Play/Diverse and Kitchen Mixed/Partial.
Comment: Accepted to the Conference on Neural Information Processing Systems
2023 (NeurIPS 2023).
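For readers new to RvS, here is a minimal sketch of the two kinds of conditioning signal discussed above. A DT conditions each timestep on the return-to-go. The abstract does not say how WT generates its waypoints, so as a purely hypothetical stand-in the sketch labels each timestep in hindsight with the state k steps later in the same trajectory. The names `returns_to_go`, `hindsight_waypoints` and the horizon `k` are illustrative assumptions, not the paper's code:

```python
from typing import List, Sequence, TypeVar

S = TypeVar("S")


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """DT-style conditioning signal: R_t = r_t + r_{t+1} + ... + r_{T-1}."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):  # accumulate from the end backwards
        running += rewards[t]
        rtg[t] = running
    return rtg


def hindsight_waypoints(states: Sequence[S], k: int) -> List[S]:
    """Hypothetical intermediate target: the state k steps ahead in the same
    trajectory, clipped to the final state near the end of the episode."""
    if k < 1:
        raise ValueError("k must be a positive horizon")
    last = len(states) - 1
    return [states[min(t + k, last)] for t in range(len(states))]


# Toy trajectory (made-up numbers) showing what each timestep would be conditioned on.
rewards = [0.0, 0.0, 1.0, 0.0, 2.0]
states = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2)]
for t, (s, g, r) in enumerate(zip(states, hindsight_waypoints(states, k=2), returns_to_go(rewards))):
    print(f"t={t} state={s} waypoint={g} return-to-go={r}")
```

A waypoint a few steps ahead gives the policy a nearer, more specific target than a single scalar return, which is one intuition for why intermediate targets could help connect segments of suboptimal trajectories.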
LLF-Bench: Benchmark for Interactive Learning from Language Feedback
We introduce a new benchmark, LLF-Bench (Learning from Language Feedback
Benchmark; pronounced as "elf-bench"), to evaluate the ability of AI agents to
interactively learn from natural language feedback and instructions. Learning
from language feedback (LLF) is essential for people, largely because the rich
information this feedback provides can help a learner avoid much of the trial and
error and thereby speed up the learning process. Large Language Models (LLMs)
have recently enabled AI agents to comprehend natural language -- and hence AI
agents can potentially benefit from language feedback during learning like
humans do. But existing interactive benchmarks do not assess this crucial
capability: they either use numeric reward feedback or require no learning at
all (only planning or information retrieval). LLF-Bench is designed to fill
this gap. LLF-Bench is a diverse collection of sequential decision-making
tasks that includes user recommendation, poem writing, navigation, and robot
control. The objective of an agent is to interactively solve these tasks based
on their natural-language instructions and the feedback received after taking
actions. Crucially, to ensure that the agent actually "learns" from the
feedback, LLF-Bench implements several randomization techniques (such as
paraphrasing and environment randomization) so that the task is unfamiliar to
the agent and the agent must be robust to various verbalizations.
In addition, LLF-Bench provides a unified OpenAI Gym interface for all its
tasks and allows users to easily configure the information the feedback
conveys (among suggestion, explanation, and instantaneous performance) to study
how agents respond to different types of feedback. Together, these features
make LLF-Bench a unique research platform for developing and testing LLF
agents.
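To make the interaction pattern concrete, here is a toy sketch loosely modeled on a Gym-style reset/step loop, in which the only feedback is text and the caller chooses which feedback types (performance, suggestion, explanation) are emitted, with a randomly paraphrased instruction at reset. `ToyGuessEnv`, `binary_search_agent` and the guess-the-number task are hypothetical and do not reproduce LLF-Bench's actual API, return signatures, or tasks:

```python
import random
from typing import Optional, Sequence, Tuple

FEEDBACK_TYPES = ("performance", "suggestion", "explanation")


class ToyGuessEnv:
    """Hypothetical guess-the-number task whose only feedback is natural language."""

    def __init__(self, feedback_types: Sequence[str] = ("performance",),
                 low: int = 1, high: int = 10, seed: Optional[int] = None):
        unknown = set(feedback_types) - set(FEEDBACK_TYPES)
        if unknown:
            raise ValueError(f"unknown feedback types: {sorted(unknown)}")
        self.feedback_types = tuple(feedback_types)
        self.low, self.high = low, high
        self.rng = random.Random(seed)
        self.target = low

    def reset(self) -> str:
        """Draw a new hidden target and return a randomly paraphrased instruction."""
        self.target = self.rng.randint(self.low, self.high)
        templates = ("Guess an integer between {lo} and {hi}.",
                     "Pick a whole number from {lo} to {hi}.")
        return self.rng.choice(templates).format(lo=self.low, hi=self.high)

    def step(self, action: int) -> Tuple[str, bool]:
        """Return (text feedback, done); which sentences appear depends on feedback_types."""
        done = action == self.target
        parts = []
        if "performance" in self.feedback_types:
            parts.append("Correct." if done else "Incorrect.")
        if "explanation" in self.feedback_types and not done:
            parts.append(f"{action} is {'too low' if action < self.target else 'too high'}.")
        if "suggestion" in self.feedback_types and not done:
            parts.append(f"Try a {'higher' if action < self.target else 'lower'} number.")
        return " ".join(parts), done


def binary_search_agent(env: ToyGuessEnv, max_steps: int = 20) -> int:
    """Narrow the range using only the text feedback; return the number of steps used."""
    env.reset()
    lo, hi = env.low, env.high
    for step in range(1, max_steps + 1):
        guess = (lo + hi) // 2
        feedback, done = env.step(guess)
        if done:
            return step
        if "higher" in feedback or "too low" in feedback:
            lo = guess + 1
        elif "lower" in feedback or "too high" in feedback:
            hi = guess - 1
        else:
            raise ValueError("feedback carries no direction; enable suggestion or explanation")
    raise RuntimeError("did not find the target within max_steps")


print("solved in", binary_search_agent(ToyGuessEnv(("suggestion",), seed=0)), "steps")
```

With only `("performance",)` feedback this agent raises unless its first guess happens to be correct, which illustrates why varying the feedback type is useful when studying how an agent learns from language.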
N′-(5-Bromo-2-methoxybenzylidene)-2-hydroxybenzohydrazide
The title Schiff base compound, C15H13BrN2O3, is derived from the condensation of 5-bromo-2-methoxybenzaldehyde with 2-hydroxybenzohydrazide in ethanol solution. The dihedral angle between the two aromatic rings is 6.9 (9)°. The methoxy group is coplanar with the attached ring [C—O—C—C = 3.1 (12)°]. An intramolecular N—H⋯O hydrogen bond is observed. In the crystal structure, the molecules are linked into chains along the [001] direction by intermolecular O—H⋯N, O—H⋯O and C—H⋯O hydrogen bonds.
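The bracketed torsion value for the methoxy group is a signed angle defined by four consecutive atoms; a value near 0° is what places the methyl carbon essentially in the ring plane. A minimal sketch of the standard atan2 torsion formula follows, assuming orthogonal Cartesian coordinates in Å (fractional coordinates from a CIF would first have to be converted using the cell parameters). The coordinates below are invented test geometry, not data from this structure:

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_deg(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees, in [-180, 180].

    Positive when, viewed along p1->p2, the p1-p0 bond must turn clockwise
    to eclipse the p2-p3 bond (the IUPAC sign convention).
    """
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)  # normals of the two half-planes
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Invented geometry: the last atom is rotated 3.1 degrees about the central bond.
phi = math.radians(3.1)
print(round(torsion_deg((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
                        (math.cos(phi), math.sin(phi), 1.0)), 1))  # prints 3.1
```

Using atan2 rather than acos of the normals' dot product keeps the sign and stays numerically stable near 0° and 180°.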
N′-(2-Hydroxybenzylidene)-2-methoxybenzohydrazide monohydrate
In the title compound, C15H14N2O3·H2O, the Schiff base molecule is approximately planar, with a dihedral angle between the two aromatic rings of 10.2 (3)°. The molecular structure is stabilized by O—H⋯N and N—H⋯O hydrogen bonds. In the crystal structure, the Schiff base and water molecules are linked together by intermolecular O—H⋯O hydrogen bonds, forming chains parallel to the a axis.
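The preceding abstracts both report the dihedral angle between the two aromatic rings, i.e. the angle between the two ring planes. A lightweight sketch of that quantity follows, assuming Cartesian ring-atom coordinates in Å listed in ring order. It uses Newell's method for each ring normal, a simpler stand-in for the least-squares mean planes that crystallographic software fits, so results on real, slightly puckered rings may differ marginally. The function names and the idealized hexagons are hypothetical:

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def newell_normal(ring: Sequence[Vec]) -> Vec:
    """Unit normal of a (near-)planar polygon via Newell's method; atoms must be in ring order."""
    nx = ny = nz = 0.0
    for i, (x1, y1, z1) in enumerate(ring):
        x2, y2, z2 = ring[(i + 1) % len(ring)]
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0.0:
        raise ValueError("degenerate ring: atoms are collinear or coincident")
    return (nx / length, ny / length, nz / length)


def ring_dihedral_deg(ring_a: Sequence[Vec], ring_b: Sequence[Vec]) -> float:
    """Angle between the two ring planes, folded into [0, 90] degrees."""
    na, nb = newell_normal(ring_a), newell_normal(ring_b)
    cosine = abs(sum(u * v for u, v in zip(na, nb)))  # abs: normal direction is arbitrary
    return math.degrees(math.acos(min(1.0, cosine)))


# Idealized benzene-like hexagon, then a copy tilted 10.2 degrees about x and shifted along x.
tilt = math.radians(10.2)
flat = [(1.39 * math.cos(k * math.pi / 3), 1.39 * math.sin(k * math.pi / 3), 0.0) for k in range(6)]
tilted = [(x + 4.0, y * math.cos(tilt), y * math.sin(tilt)) for x, y, _ in flat]
print(round(ring_dihedral_deg(flat, tilted), 1))  # prints 10.2
```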
Trichlorido{2-[2-(η5-cyclopentadienyl)-2-methylpropyl]-1-trimethylsilyl-1H-imidazole-κN3}titanium(IV) tetrahydrofuran hemisolvate
The title compound, [Ti(C15H23N2Si)Cl3]·0.5C4H8O, has been prepared from {2-[2-(η5-cyclopentadienyl)-2-methylpropyl]-1H-imidazolyl-κN1}bis(N,N-diethylamido-κN)titanium(IV), (C12H14N2)Ti(NEt2)2, by reaction with excess Me3SiCl in tetrahydrofuran (THF) at 353 K. The crystal structure contains THF as an adduct solvent, disordered around a center of inversion. The presence of THF and the adduct ratio have been independently supported by 1H NMR spectroscopy. The coordination polyhedron of the Ti atom is distorted square-pyramidal, assuming the cyclopentadienyl (Cp) ring occupies one coordination site. The Ti, Si and CH2 group C atoms deviate only slightly from the imidazole ring plane [by 0.021 (4), 0.133 (4) and 0.094 (4) Å, respectively]. Comparison of the principal geometric parameters with those of the few known structurally characterized analogues reveals small differences in bond lengths and angles at the Ti atom. The title complex is only stable in THF-d8 in the presence of excess Me3SiCl; otherwise, it exists in equilibrium with equimolar amounts of dichlorido{2-[2-(η5-cyclopentadienyl)-2-methylpropyl]-1H-imidazolyl-κN3}titanium(IV) and chlorotrimethylsilane.
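The deviations quoted for the Ti, Si and CH2 carbon atoms are signed distances from the mean plane through the imidazole ring atoms, with the substituent atoms themselves excluded from the fit. A sketch of that calculation in plain Python follows, assuming Cartesian coordinates in Å and an unweighted least-squares plane (centroid plus the smallest-variance eigenvector of the scatter matrix), with a small cyclic Jacobi eigen-solver standing in for a linear-algebra library. The function names and test geometry are illustrative, not the refinement program's own routine:

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def _jacobi_eigen(a: List[List[float]], sweeps: int = 10):
    """Eigen-decompose a symmetric 3x3 matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors), with eigenvectors stored as columns.
    """
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for _ in range(sweeps):
        for p in range(3):
            for q in range(p + 1, 3):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(3):  # A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(3):  # A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(3):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(3)], v


def mean_plane(points: Sequence[Vec]) -> Tuple[Vec, Vec]:
    """Unweighted least-squares plane: (centroid, unit normal)."""
    if len(points) < 3:
        raise ValueError("need at least three atoms to define a plane")
    n = len(points)
    cen = tuple(sum(p[i] for p in points) / n for i in range(3))
    scatter = [[sum((p[i] - cen[i]) * (p[j] - cen[j]) for p in points) for j in range(3)]
               for i in range(3)]
    values, vectors = _jacobi_eigen(scatter)
    k = min(range(3), key=lambda i: values[i])  # smallest spread = plane normal
    return cen, (vectors[0][k], vectors[1][k], vectors[2][k])


def deviation(point: Vec, plane: Tuple[Vec, Vec]) -> float:
    """Signed distance of an atom from the plane (sign depends on the normal's direction)."""
    cen, nrm = plane
    return sum((point[i] - cen[i]) * nrm[i] for i in range(3))


# Invented test geometry: a regular ring tilted 30 degrees, plus an atom 0.5 Å off the plane.
tilt = math.radians(30.0)
ring = [(1.38 * math.cos(k * math.pi / 3),
         1.38 * math.sin(k * math.pi / 3) * math.cos(tilt),
         1.38 * math.sin(k * math.pi / 3) * math.sin(tilt)) for k in range(6)]
plane = mean_plane(ring)
atom = (0.0, -0.5 * math.sin(tilt), 0.5 * math.cos(tilt))
print(round(abs(deviation(atom, plane)), 3))  # prints 0.5
```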
4-Chloro-N′-(2-hydroxybenzylidene)benzohydrazide monohydrate
The asymmetric unit of the title compound, C14H11ClN2O2·H2O, contains a Schiff base molecule and a water molecule of crystallization. The dihedral angle between the two aromatic rings is 27.3 (4)°. In the crystal structure, molecules are linked into a two-dimensional network parallel to the bc plane by intermolecular O—H⋯O and N—H⋯O hydrogen bonds involving the water molecules.