
    Manipulate by Seeing: Creating Manipulation Controllers from Pre-Trained Representations

    The field of visual representation learning has seen explosive growth in recent years, but its benefits in robotics have so far been surprisingly limited. Prior work uses generic visual representations as a basis to learn (task-specific) robot action policies (e.g., via behavior cloning). While the visual representations do accelerate learning, they are primarily used to encode visual observations. Thus, action information has to be derived purely from robot data, which is expensive to collect! In this work, we present a scalable alternative in which the visual representations can help directly infer robot actions. We observe that vision encoders express relationships between image observations as distances (e.g., via embedding dot product) that can be used to efficiently plan robot behavior. We operationalize this insight and develop a simple algorithm for acquiring a distance function and dynamics predictor by fine-tuning a pre-trained representation on human-collected video sequences. The final method substantially outperforms traditional robot learning baselines (e.g., 70% success vs. 50% for behavior cloning on pick-place) on a suite of diverse real-world manipulation tasks. It can also generalize to novel objects, without using any robot demonstrations at training time. For visualizations of the learned policies, please see: https://agi-labs.github.io/manipulate-by-seeing/. Comment: Oral Presentation at the International Conference on Computer Vision (ICCV), 2023
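    As a rough illustration of the planning idea described above (inferring actions by comparing embedding distances to a goal image), the following minimal Python sketch assumes hypothetical encoder, dynamics, and distance modules; it is not the authors' released code.

        import torch

        def plan_action(encoder, dynamics, distance, obs, goal_obs, candidate_actions):
            """Greedy one-step planner: pick the candidate action whose predicted next
            embedding is closest, under the learned distance, to the goal embedding."""
            with torch.no_grad():
                z = encoder(obs.unsqueeze(0))            # current observation embedding
                z_goal = encoder(goal_obs.unsqueeze(0))  # goal observation embedding
                # Predict the next embedding for every candidate action.
                z_next = dynamics(z.expand(len(candidate_actions), -1), candidate_actions)
                # Score candidates by the learned distance to the goal embedding.
                scores = distance(z_next, z_goal.expand_as(z_next))
            return candidate_actions[scores.argmin()]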

    Effects of appearance and gender on pre-touch proxemics in virtual reality

    Virtual reality (VR) environments are increasingly popular for various applications, and the appearance of virtual characters is a critical factor that influences user behaviors. In this study, we aimed to investigate the impact of avatar and agent appearance on pre-touch proxemics in VR. To achieve this goal, we designed experiments utilizing three user avatars (man/woman/robot) and three virtual agents (man/woman/robot). Specifically, we measured the pre-touch reaction distances to the face and body, i.e., the distances at which a person starts to feel uncomfortable before being touched. We examined how these distances varied based on the appearances of avatars and agents and on user gender. Our results revealed that the appearance of avatars and agents significantly impacted pre-touch reaction distances. Specifically, participants using a female avatar tended to maintain larger distances before allowing their face and body to be touched, and people also preferred greater distances before being touched by a robot agent. Interestingly, we observed no effect of user gender on pre-touch reaction distances. These findings have implications for the design and implementation of VR systems, as they suggest that avatar and agent appearances play a significant role in shaping users’ perceptions of pre-touch proxemics. Our study highlights the importance of considering these factors when creating immersive and socially acceptable VR experiences.

    Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting

    Most offline reinforcement learning (RL) algorithms return a target policy maximizing a trade-off between (1) the expected performance gain over the behavior policy that collected the dataset, and (2) the risk stemming from the out-of-distribution-ness of the induced state-action occupancy. It follows that the performance of the target policy is strongly related to the performance of the behavior policy and, thus, to the trajectory return distribution of the dataset. We show that in mixed datasets consisting of mostly low-return trajectories and only a small fraction of high-return trajectories, state-of-the-art offline RL algorithms are overly restrained by the low-return trajectories and fail to fully exploit the high-performing ones. To overcome this issue, we show that, in deterministic MDPs with stochastic initial states, the dataset sampling can be re-weighted to induce an artificial dataset whose behavior policy has a higher return. This re-weighted sampling strategy can be combined with any offline RL algorithm. We further show that the opportunity for performance improvement over the behavior policy correlates with the positive-sided variance of the trajectory returns in the dataset. We empirically show that while CQL, IQL, and TD3+BC achieve only a part of this potential policy improvement, the same algorithms combined with our re-weighted sampling strategy fully exploit the dataset. Furthermore, we empirically demonstrate that, despite its theoretical limitations, the approach may still be effective in stochastic environments. The code is available at https://github.com/Improbable-AI/harness-offline-rl
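    A minimal sketch of the return-based re-weighting idea, assuming a simple softmax weighting over per-trajectory returns; the paper's exact weighting scheme may differ, and all names below are illustrative.

        import numpy as np

        def trajectory_sampling_probs(trajectory_returns, temperature=1.0):
            """Up-weight high-return trajectories: convert per-trajectory returns into
            a sampling distribution via a temperature-controlled softmax."""
            returns = np.asarray(trajectory_returns, dtype=np.float64)
            # Standardize returns so the temperature is scale-free.
            z = (returns - returns.mean()) / (returns.std() + 1e-8)
            weights = np.exp(z / temperature)
            return weights / weights.sum()

        # Example: mostly low-return trajectories plus a few high-return ones.
        probs = trajectory_sampling_probs([1.0, 2.0, 1.5, 95.0, 90.0])
        sampled_trajectories = np.random.choice(len(probs), size=256, p=probs)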

    Blending the Material and Digital World for Hybrid Interfaces

    The development of digital technologies in the 21st century is progressing continuously, and new device classes such as tablets, smartphones, and smartwatches are finding their way into our everyday lives. However, this development also poses problems, as the prevailing touch and gestural interfaces often lack tangibility, take little account of haptic qualities, and therefore demand the full attention of their users. Compared to traditional tools and analog interfaces, the human ability to experience and manipulate material in its natural environment and context remains unexploited. To combine the best of both worlds, a key question is how the material and digital worlds can be blended to design and realize novel hybrid interfaces in a meaningful way. Research on Tangible User Interfaces (TUIs) investigates the coupling between physical objects and virtual data. In contrast, hybrid interfaces, which specifically aim to digitally enrich analog artifacts of everyday work, have not yet been sufficiently researched or systematically discussed. Therefore, this doctoral thesis rethinks how user interfaces can provide useful digital functionality while maintaining their physical properties and familiar patterns of use in the real world. The development of such hybrid interfaces raises overarching research questions about their design: Which kinds of physical interfaces are worth exploring? What type of digital enhancement will improve existing interfaces? How can hybrid interfaces retain their physical properties while enabling new digital functions? What are suitable methods to explore different designs? And how can technology-enthusiast users be supported in prototyping? For a systematic investigation, the thesis builds on a design-oriented, exploratory, and iterative development process using digital fabrication methods and novel materials. As its main contribution, it presents four research projects that apply and discuss different visual and interactive augmentation principles in real-world applications. The applications range from digitally enhanced paper and interactive cords, through visual watch-strap extensions, to novel prototyping tools for smart garments. While almost all of them integrate visual feedback and haptic input, none are built on rigid, rectangular pixel screens or use standard input modalities, as they all aim to reveal new design approaches. The dissertation shows how valuable it can be to rethink familiar, analog applications while thoughtfully extending them digitally. Finally, the thesis' extensive engineering of versatile research platforms is accompanied by overarching conceptual work, user evaluations, technical experiments, and literature reviews.

    Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization

    Most offline reinforcement learning (RL) methods suffer from a trade-off between improving the policy to surpass the behavior policy and constraining the policy to limit its deviation from the behavior policy, since computing Q-values using out-of-distribution (OOD) actions suffers from errors due to distributional shift. The recently proposed In-sample Learning paradigm (i.e., IQL), which improves the policy by quantile regression using only data samples, shows great promise because it learns an optimal policy without querying the value function at any unseen actions. However, it remains unclear how this type of method handles distributional shift when learning the value function. In this work, we make the key finding that the in-sample learning paradigm arises under an Implicit Value Regularization (IVR) framework. This gives a deeper understanding of why in-sample learning works: it applies implicit value regularization to the policy. Based on the IVR framework, we further propose two practical algorithms, Sparse Q-learning (SQL) and Exponential Q-learning (EQL), which adopt the same value regularization used in existing works, but in a completely in-sample manner. Compared with IQL, we find that our algorithms introduce sparsity in learning the value function, making them more robust in noisy data regimes. We also verify the effectiveness of SQL and EQL on D4RL benchmark datasets and show the benefits of in-sample learning by comparing them with CQL in small data regimes. Comment: ICLR 2023 notable top 5%
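    For context on the in-sample learning paradigm referenced above, the following minimal sketch shows the IQL-style expectile value objective, which queries Q only at dataset actions; SQL and EQL replace this with sparse and exponential value regularizers, so this is background rather than the paper's algorithm.

        import torch

        def expectile_value_loss(q_values, v_values, tau=0.7):
            """In-sample value learning: regress V toward Q evaluated only at dataset
            actions, using an asymmetric (expectile) loss so that no out-of-distribution
            action is ever queried."""
            diff = q_values - v_values
            weight = torch.where(diff > 0, tau, 1 - tau)
            return (weight * diff ** 2).mean()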

    Do You Mind? User Perceptions of Machine Consciousness

    The prospect of machine consciousness cultivates controversy across media, academia, and industry. Assessing whether non-experts perceive technologies as conscious, and exploring the consequences of this perception, are as yet unaddressed challenges in Human-Computer Interaction (HCI). To address them, we surveyed 100 people, exploring their conceptualisations of consciousness and whether and how they perceive consciousness in currently available interactive technologies. We show that many people already perceive a degree of consciousness in GPT-3, a voice chatbot, and a robot vacuum cleaner. Within participant responses, we identified dynamic tensions between denial and speculation, thinking and feeling, interaction and experience, control and independence, and rigidity and spontaneity. These tensions can inform future research into perceptions of machine consciousness and the challenges it represents for HCI. With both empirical and theoretical contributions, this paper emphasises the importance of HCI in an era of machine consciousness, whether real, perceived, or denied.

    Explainable Reinforcement Learning via a Causal World Model

    Generating explanations for reinforcement learning (RL) is challenging because actions may have long-term effects. In this paper, we develop a novel framework for explainable RL by learning a causal world model without prior knowledge of the causal structure of the environment. The model captures the influence of actions, allowing us to interpret the long-term effects of actions through causal chains, which show how actions influence environmental variables and ultimately lead to rewards. Unlike most explanatory models, which suffer from low accuracy, our model remains accurate while improving explainability, making it applicable in model-based learning. As a result, we demonstrate that our causal model can serve as a bridge between explainability and learning. Comment: Accepted by IJCAI 2023
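    As a purely illustrative sketch of the causal-chain idea: given a learned boolean adjacency matrix over {action, state variables, reward}, an explanation can be read off by following the edges reachable from the action node. The paper's graph learning and explanation format differ in detail.

        import numpy as np

        def trace_causal_chain(adjacency, action_idx):
            """Breadth-first walk over a learned causal graph, collecting the edges
            reachable from the action node; each edge reads 'this variable influences
            that variable'."""
            frontier, edges, visited = [action_idx], [], {action_idx}
            while frontier:
                node = frontier.pop(0)
                for child in np.nonzero(adjacency[node])[0]:
                    edges.append((node, int(child)))
                    if int(child) not in visited:
                        visited.add(int(child))
                        frontier.append(int(child))
            return edges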

    Replicable Reinforcement Learning

    The replicability crisis in the social, behavioral, and data sciences has led to the formulation of algorithmic frameworks for replicability -- i.e., a requirement that an algorithm produce identical outputs (with high probability) when run on two different samples from the same underlying distribution. While the area is still in its infancy, provably replicable algorithms have been developed for many fundamental tasks in machine learning and statistics, including statistical query learning, the heavy hitters problem, and distribution testing. In this work, we initiate the study of replicable reinforcement learning, providing a provably replicable algorithm for parallel value iteration and a provably replicable version of R-max in the episodic setting. These are the first formal replicability results for control problems, which present different challenges for replication than batch learning settings.
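    To make the replicability requirement concrete, here is a small hypothetical sketch (not from the paper): an empirical check that an algorithm returns identical outputs on two independent samples when its internal randomness is shared, together with a toy randomized-rounding statistic of the kind used in replicable statistical-query learning.

        import random

        def replicability_check(algorithm, sample_dist, n_samples, trials=100, seed0=0):
            """Estimate the probability that `algorithm` produces identical outputs on
            two independent samples when its internal randomness (seed) is shared."""
            agree = 0
            for t in range(trials):
                shared_seed = seed0 + t  # both runs see the same internal coins
                s1 = [sample_dist() for _ in range(n_samples)]
                s2 = [sample_dist() for _ in range(n_samples)]
                if algorithm(s1, shared_seed) == algorithm(s2, shared_seed):
                    agree += 1
            return agree / trials

        def replicable_mean(sample, seed, width=0.5):
            """Toy replicable statistic: snap the empirical mean to a grid whose random
            offset comes from the shared seed, so nearby means computed on different
            samples usually round to the same value."""
            alpha = random.Random(seed).uniform(0.0, width)  # shared random grid offset
            m = sum(sample) / len(sample)
            return alpha + width * round((m - alpha) / width)

        # With 10,000 draws from Uniform[0, 1], the two runs almost always agree.
        print(replicability_check(replicable_mean, random.random, n_samples=10_000))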

    Cherry-Picking with Reinforcement Learning: Robust Dynamic Grasping in Unstable Conditions

    Grasping small objects surrounded by unstable or non-rigid material plays a crucial role in applications such as surgery, harvesting, construction, disaster recovery, and assisted feeding. This task is especially difficult when fine manipulation is required in the presence of sensor noise and perception errors, which inevitably trigger dynamic motion that is challenging to model precisely. Rather than building accurate models of contacts and dynamics, data-driven methods like reinforcement learning (RL) can optimize task performance via trial and error. Applying RL methods to real robots, however, has been hindered by factors such as prohibitively high sample complexity and the cost of training infrastructure that provides resets on hardware. This work presents CherryBot, an RL system that uses chopsticks for fine manipulation and surpasses human reactiveness on some dynamic grasping tasks. By integrating imprecise simulators, suboptimal demonstrations, and external state estimation, we study how to make a real-world robot learning system sample-efficient and general while reducing the human effort required for supervision. Our system shows continual improvement through 30 minutes of real-world interaction: through reactive retry, it achieves an almost 100% success rate on the demanding task of using chopsticks to grasp small objects swinging in the air. We demonstrate the reactiveness, robustness, and generalizability of CherryBot to varying object shapes and dynamics (e.g., external disturbances like wind and human perturbations). Videos are available at https://goodcherrybot.github.io/

    A Diffusion-based Method for Multi-turn Compositional Image Generation

    Multi-turn compositional image generation (M-CIG) is a challenging task that aims to iteratively manipulate a reference image given a modification text. While most existing methods for M-CIG are based on generative adversarial networks (GANs), recent advances in image generation have demonstrated the superiority of diffusion models over GANs. In this paper, we propose a diffusion-based method for M-CIG named conditional denoising diffusion with image compositional matching (CDD-ICM). We leverage CLIP as the backbone of the image and text encoders, and incorporate a gated fusion mechanism, originally proposed for question answering, to compositionally fuse the reference image and the modification text at each turn of M-CIG. We introduce a conditioning scheme to generate the target image based on the fusion results. To prioritize the semantic quality of the generated target image, we learn an auxiliary image compositional matching (ICM) objective alongside the conditional denoising diffusion (CDD) objective in a multi-task learning framework. We additionally perform ICM guidance and classifier-free guidance to improve performance. Experimental results show that CDD-ICM achieves state-of-the-art results on two benchmark datasets for M-CIG, i.e., CoDraw and i-CLEVR.
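    A minimal sketch of a gated fusion module of the kind described above, combining a CLIP image embedding with a CLIP text embedding into a single conditioning vector; the layer sizes and exact gating form are assumptions rather than the paper's implementation.

        import torch
        import torch.nn as nn

        class GatedFusion(nn.Module):
            """Fuse a reference-image embedding with a modification-text embedding:
            a learned gate decides, per dimension, how much of the text-conditioned
            update to mix into the image features."""
            def __init__(self, dim):
                super().__init__()
                self.gate = nn.Linear(2 * dim, dim)
                self.update = nn.Linear(2 * dim, dim)

            def forward(self, img_emb, txt_emb):
                h = torch.cat([img_emb, txt_emb], dim=-1)
                g = torch.sigmoid(self.gate(h))   # per-dimension mixing weights
                u = torch.tanh(self.update(h))    # text-conditioned update
                return g * u + (1 - g) * img_emb  # fused conditioning vector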