    Waypoint Transformer: Reinforcement Learning via Supervised Learning with Intermediate Targets

    Despite the recent advancements in offline reinforcement learning via supervised learning (RvS) and the success of the decision transformer (DT) architecture in various domains, DTs have fallen short in several challenging benchmarks. The root cause of this underperformance lies in their inability to seamlessly connect segments of suboptimal trajectories. To overcome this limitation, we present a novel approach to enhance RvS methods by integrating intermediate targets. We introduce the Waypoint Transformer (WT), an architecture that builds upon the DT framework and is conditioned on automatically generated waypoints. The results show a significant increase in the final return compared to existing RvS methods, with performance on par with or greater than existing state-of-the-art temporal difference learning-based methods. Additionally, the performance and stability improvements are largest in the most challenging environments and data configurations, including AntMaze Large Play/Diverse and Kitchen Mixed/Partial. Comment: Accepted to the Conference on Neural Information Processing Systems 2023 (NeurIPS 2023).

    LLF-Bench: Benchmark for Interactive Learning from Language Feedback

    We introduce a new benchmark, LLF-Bench (Learning from Language Feedback Benchmark; pronounced as "elf-bench"), to evaluate the ability of AI agents to interactively learn from natural language feedback and instructions. Learning from language feedback (LLF) is essential for people, largely because the rich information this feedback provides can help a learner avoid much of trial and error and thereby speed up the learning process. Large Language Models (LLMs) have recently enabled AI agents to comprehend natural language, and hence AI agents can potentially benefit from language feedback during learning as humans do. But existing interactive benchmarks do not assess this crucial capability: they either use numeric reward feedback or require no learning at all (only planning or information retrieval). LLF-Bench is designed to fill this omission. LLF-Bench is a diverse collection of sequential decision-making tasks that includes user recommendation, poem writing, navigation, and robot control. The objective of an agent is to interactively solve these tasks based on their natural-language instructions and the feedback received after taking actions. Crucially, to ensure that the agent actually "learns" from the feedback, LLF-Bench implements several randomization techniques (such as paraphrasing and environment randomization) so that the task isn't familiar to the agent and the agent is robust to various verbalizations. In addition, LLF-Bench provides a unified OpenAI Gym interface for all its tasks and allows users to easily configure the information the feedback conveys (among suggestion, explanation, and instantaneous performance) to study how agents respond to different types of feedback. Together, these features make LLF-Bench a unique research platform for developing and testing LLF agents.
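The interaction pattern described in this abstract (a natural-language instruction, text feedback after each action, paraphrased wording, configurable feedback types) can be illustrated with a toy sketch. Everything below is hypothetical: `ToyFeedbackEnv`, `run_episode`, the number-guessing task, and the feedback strings are invented for illustration and are not LLF-Bench's actual classes, tasks, or API; only the Gym-style `reset`/`step` shape is borrowed.

```python
import random


class ToyFeedbackEnv:
    """Hypothetical number-guessing task with a Gym-style reset/step interface."""

    # Paraphrased suggestions, so an agent cannot rely on one fixed wording.
    HIGHER = ["Try a higher number.", "Your guess is too small; go higher.",
              "Aim higher next time."]
    LOWER = ["Try a lower number.", "Your guess is too large; go lower.",
             "Aim lower next time."]
    INSTRUCTION = "Guess the hidden integer between 0 and 100."

    def __init__(self, feedback_types=("suggestion", "performance"), seed=0):
        self.feedback_types = set(feedback_types)  # which feedback to reveal
        self.rng = random.Random(seed)
        self.target = None

    def reset(self):
        # Environment randomization: a new hidden target every episode.
        self.target = self.rng.randint(0, 100)
        return {"instruction": self.INSTRUCTION, "feedback": ""}

    def step(self, action):
        done = action == self.target
        parts = []
        if "performance" in self.feedback_types:
            parts.append("Correct." if done else "That guess is wrong.")
        if "suggestion" in self.feedback_types and not done:
            pool = self.HIGHER if action < self.target else self.LOWER
            parts.append(self.rng.choice(pool))
        obs = {"instruction": self.INSTRUCTION, "feedback": " ".join(parts)}
        reward = 1.0 if done else 0.0
        return obs, reward, done, {}


def run_episode(env, max_steps=20):
    """Bisection agent that uses only the text feedback, never the reward."""
    env.reset()
    lo, hi = 0, 100
    for step in range(1, max_steps + 1):
        guess = (lo + hi) // 2
        obs, _reward, done, _info = env.step(guess)
        if done:
            return step, True
        text = obs["feedback"].lower()
        if "higher" in text:
            lo = guess + 1
        elif "lower" in text:
            hi = guess - 1
    return max_steps, False
```

With suggestions enabled, the bisection agent needs at most seven guesses for the 101 possible targets; constructing the environment with `feedback_types=("performance",)` removes the directional hint, which is the kind of comparison the configurable feedback is meant to support.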

    N′-(5-Bromo-2-methoxybenzylidene)-2-hydroxybenzohydrazide

    The title Schiff base compound, C15H13BrN2O3, is derived from the condensation of 5-bromo-2-methoxybenzaldehyde with 2-hydroxybenzohydrazide in an ethanol solution. The dihedral angle between the two aromatic rings is 6.9 (9)°. The methoxy group is coplanar with the attached ring [C—O—C—C = 3.1 (12)°]. An intramolecular N—H⋯O hydrogen bond is observed. In the crystal structure, the molecules are linked into chains along the [001] direction by intermolecular O—H⋯N, O—H⋯O and C—H⋯O hydrogen bonds.
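The dihedral angle between two aromatic rings is the angle between their mean planes. A minimal sketch of that calculation follows, assuming Cartesian coordinates for each ring's atoms listed in ring order. The helper names and the idealized hexagon coordinates are hypothetical and are not taken from the reported structure; the plane normal is estimated with Newell's method rather than the least-squares fit a crystallographic package would normally use.

```python
import math


def ring_normal(points):
    """Unit normal of a ring's mean plane (Newell's method, atoms in ring order)."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    rel = [(p[0] - cx, p[1] - cy, p[2] - cz) for p in points]
    nx = ny = nz = 0.0
    for i in range(n):
        x1, y1, z1 = rel[i]
        x2, y2, z2 = rel[(i + 1) % n]
        # Sum the cross products of consecutive centroid-relative vectors.
        nx += y1 * z2 - z1 * y2
        ny += z1 * x2 - x1 * z2
        nz += x1 * y2 - y1 * x2
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)


def dihedral_angle_deg(ring_a, ring_b):
    """Angle between two ring planes, folded into the range 0 to 90 degrees."""
    na = ring_normal(ring_a)
    nb = ring_normal(ring_b)
    dot = abs(sum(a * b for a, b in zip(na, nb)))
    return math.degrees(math.acos(min(1.0, dot)))


def hexagon(tilt_deg, radius=1.39):
    """Idealized planar six-membered ring, tilted about the x axis (illustrative only)."""
    t = math.radians(tilt_deg)
    ring = []
    for k in range(6):
        a = math.radians(60 * k)
        x, y = radius * math.cos(a), radius * math.sin(a)
        ring.append((x, y * math.cos(t), y * math.sin(t)))
    return ring


angle = dihedral_angle_deg(hexagon(0.0), hexagon(30.0))
```

Taking the absolute value of the dot product makes the result independent of the order in which each ring's atoms are traversed, so two nearly coplanar rings give a small angle rather than one close to 180°.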

    N′-(2-Hydroxybenzylidene)-2-methoxybenzohydrazide monohydrate

    In the title compound, C15H14N2O3·H2O, the Schiff base molecule is approximately planar, with a dihedral angle between the two aromatic rings of 10.2 (3)°. The molecular structure is stabilized by O—H⋯N and N—H⋯O hydrogen bonds. In the crystal structure, the Schiff base and water molecules are linked together by intermolecular O—H⋯O hydrogen bonds, forming chains parallel to the a axis.

    Trichlorido{2-[2-(η5-cyclopentadienyl)-2-methylpropyl]-1-trimethylsilyl-1H-imidazole-κN3}titanium(IV) tetrahydrofuran hemisolvate

    The title compound, [Ti(C15H23N2Si)Cl3]·0.5C4H8O, has been prepared from {2-[2-(η5-cyclopentadienyl)-2-methylpropyl]-1H-imidazolyl-κN1}bis(N,N-diethylamido-κN)titanium(IV), (C12H14N2)Ti(NEt2)2, by reaction with an excess of Me3SiCl in tetrahydrofuran (THF) at 353 K. The crystal structure contains THF as adduct solvent, disordered around a center of inversion. The presence of THF and the adduct ratio have been independently supported by 1H NMR spectroscopy. The coordination polyhedron of the Ti atom is distorted square-pyramidal, assuming the cyclopentadienyl (Cp) ring occupies one coordination site. The Ti, Si and CH2 group C atoms deviate only slightly from the imidazole ring plane [by 0.021 (4), 0.133 (4) and 0.094 (4) Å, respectively]. Comparison of the principal geometric parameters with those of the few known structurally characterized analogues reveals small differences in bond lengths and angles at the Ti atom. The title complex is only stable in THF-d8 in the presence of excess Me3SiCl; otherwise it exists in an equilibrium with equimolar amounts of dichlorido{2-[2-(η5-cyclopentadienyl)-2-methylpropyl]-1H-imidazolyl-κN3}titanium(IV) and chlorotrimethylsilane.

    4-Chloro-N′-(2-hydroxybenzylidene)benzohydrazide monohydrate

    The asymmetric unit of the title compound, C14H11ClN2O2·H2O, contains a Schiff base molecule and a water molecule of crystallization. The dihedral angle between the two aromatic rings is 27.3 (4)°. In the crystal structure, molecules are linked into a two-dimensional network parallel to the bc plane by intermolecular O—H⋯O and N—H⋯O hydrogen bonds involving the water molecules.