28 research outputs found

    Object Permanence Emerges in a Random Walk along Memory

    Full text link
    This paper proposes a self-supervised objective for learning representations that localize objects under occlusion - a property known as object permanence. A central question is the choice of learning signal in cases of total occlusion. Rather than directly supervising the locations of invisible objects, we propose a self-supervised objective that requires neither human annotation nor assumptions about object dynamics. We show that object permanence can emerge by optimizing for temporal coherence of memory: we fit a Markov walk along a space-time graph of memories, where the states in each time step are non-Markovian features from a sequence encoder. This leads to a memory representation that stores occluded objects and predicts their motion, to better localize them. The resulting model outperforms existing approaches on several datasets of increasing complexity and realism, despite requiring minimal supervision and assumptions, and is hence broadly applicable.
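
    A minimal sketch of the temporal-coherence objective described above, in the spirit of a contrastive random walk over per-frame features; the function name, shapes, and hyperparameters are illustrative assumptions, not the paper's code:

```python
import torch
import torch.nn.functional as F

def memory_walk_loss(features, temperature=0.07):
    """Cycle-consistency loss for a Markov walk along a space-time graph.

    features: (T, N, D) tensor - T time steps, N memory states per step
    (e.g. produced by a sequence encoder), D-dimensional embeddings.
    Illustrative sketch, not the authors' implementation.
    """
    feats = F.normalize(features, dim=-1)
    T, N, _ = feats.shape

    # Row-stochastic transition matrices from softmaxed affinities:
    # walk forward t -> t+1, then backward t+1 -> t. After the round
    # trip every state should return to itself (cycle consistency),
    # which pushes the memory to keep tracking entities while occluded.
    walk = torch.eye(N, device=feats.device)
    for t in range(T - 1):
        walk = walk @ F.softmax(feats[t] @ feats[t + 1].T / temperature, dim=-1)
    for t in range(T - 2, -1, -1):
        walk = walk @ F.softmax(feats[t + 1] @ feats[t].T / temperature, dim=-1)

    target = torch.arange(N, device=feats.device)
    return F.nll_loss(torch.log(walk + 1e-8), target)
```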

    Towards Practical Multi-Object Manipulation using Relational Reinforcement Learning

    Full text link
    Learning robotic manipulation tasks using reinforcement learning with sparse rewards is currently impractical due to the outrageous data requirements. Many practical tasks require manipulation of multiple objects, and the complexity of such tasks increases with the number of objects. Learning from a curriculum of increasingly complex tasks appears to be a natural solution but, unfortunately, does not work for many scenarios. We hypothesize that the inability of the state-of-the-art algorithms to effectively utilize a task curriculum stems from the absence of inductive biases for transferring knowledge from simpler to complex tasks. We show that graph-based relational architectures overcome this limitation and enable learning of complex tasks when provided with a simple curriculum of tasks with increasing numbers of objects. We demonstrate the utility of our framework on a simulated block stacking task. Starting from scratch, our agent learns to stack six blocks into a tower. Despite using step-wise sparse rewards, our method is orders of magnitude more data-efficient and outperforms the existing state-of-the-art method that utilizes human demonstrations. Furthermore, the learned policy exhibits zero-shot generalization, successfully stacking blocks into taller towers and previously unseen configurations such as pyramids, without any further training.
    Comment: 10 pages, 4 figures and 1 table in main article, 3 figures and 3 tables in appendix. Supplementary website and videos at https://richardrl.github.io/relational-rl
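
    A hedged sketch of the relational inductive bias this abstract describes: self-attention over per-object features makes the policy indifferent to the number of objects, so a curriculum can add blocks without changing any weights. The class name, dimensions, and action head below are illustrative assumptions, not the authors' architecture:

```python
import torch
import torch.nn as nn

class RelationalPolicy(nn.Module):
    """Attention-based policy over a variable number of objects (sketch)."""

    def __init__(self, obj_dim, hidden=128, heads=4, act_dim=4):
        super().__init__()
        self.embed = nn.Linear(obj_dim, hidden)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, act_dim))

    def forward(self, objects):
        # objects: (batch, n_objects, obj_dim); n_objects may vary across
        # curriculum stages, since attention is size-agnostic.
        h = self.embed(objects)
        h, _ = self.attn(h, h, h)          # message passing between objects
        return self.head(h.mean(dim=1))    # pool to one action vector

# The same weights accept 3 blocks or 6 blocks, which is what lets a
# curriculum (and zero-shot generalization to taller towers) work:
policy = RelationalPolicy(obj_dim=10)
print(policy(torch.randn(1, 3, 10)).shape)  # torch.Size([1, 4])
print(policy(torch.randn(1, 6, 10)).shape)  # torch.Size([1, 4])
```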

    DORSal: Diffusion for Object-centric Representations of Scenes et al.

    Full text link
    Recent progress in 3D scene understanding enables scalable learning of representations across large datasets of diverse scenes. As a consequence, generalization to unseen scenes and objects, rendering novel views from just a single or a handful of input images, and controllable scene generation that supports editing are now possible. However, training jointly on a large number of scenes typically compromises rendering quality when compared to single-scene optimized models such as NeRFs. In this paper, we leverage recent progress in diffusion models to equip 3D scene representation learning models with the ability to render high-fidelity novel views, while retaining benefits such as object-level scene editing to a large degree. In particular, we propose DORSal, which adapts a video diffusion architecture for 3D scene generation conditioned on frozen object-centric slot-based representations of scenes. On both complex synthetic multi-object scenes and on the real-world large-scale Street View dataset, we show that DORSal enables scalable neural rendering of 3D scenes with object-level editing and improves upon existing approaches.
    Comment: Project page: https://www.sjoerdvansteenkiste.com/dorsa
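
    A toy sketch of the core conditioning idea, assuming a token-based denoiser that cross-attends to frozen slot vectors; DORSal's actual model is a full video diffusion architecture, and the names below are invented for illustration:

```python
import torch
import torch.nn as nn

class SlotConditionedDenoiser(nn.Module):
    """One denoising block conditioned on frozen object slots (sketch)."""

    def __init__(self, dim=128, slot_dim=64, heads=4):
        super().__init__()
        self.slot_proj = nn.Linear(slot_dim, dim)
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.out = nn.Linear(dim, dim)

    def forward(self, noisy_tokens, slots, t_emb):
        # noisy_tokens: (B, n_tokens, dim) image tokens at noise level t.
        # slots: (B, n_slots, slot_dim) from a frozen object-centric scene
        # encoder; editing or dropping a slot edits that object in the render.
        kv = self.slot_proj(slots)
        h, _ = self.cross(noisy_tokens + t_emb, kv, kv)
        return self.out(h)  # predicted noise for this denoising step

denoiser = SlotConditionedDenoiser()
eps = denoiser(torch.randn(2, 16, 128),   # noisy image tokens
               torch.randn(2, 8, 64),     # 8 frozen slots per scene
               torch.randn(2, 1, 128))    # timestep embedding
print(eps.shape)  # torch.Size([2, 16, 128])
```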

    Properties and structural behavior of concrete containing fine sand contaminated with light crude oil

    Get PDF
    Mixing crude oil contaminated sand with cement and using this mix as an alternative construction material is considered an innovative and cost-effective approach to reduce its negative environmental impact. In this study, the compressive and splitting tensile strengths of concrete with different levels of light crude oil contamination (0, 1, 2, 6, 10 and 20%) were evaluated. Microstructure observation was also conducted to better understand how the oil contamination affects the concrete properties. The bond strength of steel reinforcement was assessed, and a comparative evaluation of the flexural behaviour of steel reinforced beams using concrete with 0% and 6% oil contamination was carried out. Results showed that concrete with light crude oil contamination can retain most of its compressive and splitting tensile strength at a contamination level of up to 6%. A good bond between the steel reinforcement and concrete can be achieved up to this level of oil contamination. The concrete beam with 6% oil contamination exhibited only a 20% reduction in moment capacity compared to a beam using uncontaminated concrete. Simplified empirical equations were also proposed to reliably predict the mechanical properties of concrete containing oil contaminated sand.

    To respond or not to respond - a personal perspective of intestinal tolerance

    Get PDF
    For many years, the intestine was one of the poor relations of the immunology world, being a realm inhabited mostly by specialists and those interested in unusual phenomena. However, this has changed dramatically in recent years with the realization of how important the microbiota is in shaping immune function throughout the body, and almost every major immunology institution now includes the intestine as an area of interest. One of the most important aspects of the intestinal immune system is how it discriminates carefully between harmless and harmful antigens, in particular, its ability to generate active tolerance to materials such as commensal bacteria and food proteins. This phenomenon has been recognized for more than 100 years, and it is essential for preventing inflammatory disease in the intestine, but its basis remains enigmatic. Here, I discuss the progress that has been made in understanding oral tolerance during my 40 years in the field and highlight the topics that will be the focus of future research.

    Towards First-Person Context Awareness: Discovery of User Routine from Egocentric Video using Topic Models and Structure from Motion

    Full text link
    One of the ultimate goals of our pursuit of AI is to create intelligent machines that help us live our lives. In order to help us, these agents must gather a sense of our context. Already, personal computing technologies like Google Now use egocentric (first-person) data - email, calendar, and other personal routine information - as actionable context. Recently, wearables have brought us the opportunity to easily capture many types of egocentric data, including visual data. It is easy to imagine the potential impact of a context-aware intelligent assistant - "aware" of not only textual data but immediate visual information - for applications from assisted daily living to annotated augmented reality and self-organized life-logs. We imagine a future world in which wearable computing is ubiquitous and, as a result, lifelogs and similar visual data are abundant. The problem of understanding user routine from "big" egocentric data naturally presents itself as an important machine learning problem. Our key observation is that egocentric data is "overfit" to the person wearing the device. Because human behavior tends to be periodic (hence the notion of "routine"), lifelog data must be a series of manifestations of periodic scenes. Using techniques inspired by work in scene understanding, ubiquitous computing, and 3D scene modeling, we propose two complementary approaches for discovering routine structure in egocentric image data. We take a scene understanding approach, interpreting routine as periodic visits to meaningful scenes. For a robust representation of routine visual scenes, we propose a formulation of routine visual context as probabilistic combinations of scene features discovered from a visual lifelog corpus using topic modeling. Concurrently, we discover the 3D spatial structure of routine scenes by incrementally building SfM models from images of the same spatial context. For proof of concept, we implement our framework using Google Glass and an infrastructure that we call SUNglass.
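
    A small sketch of the topic-modeling half of this pipeline, assuming images have already been quantized into bag-of-visual-words histograms; the data below is random and purely illustrative:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Each lifelog image as a bag-of-visual-words histogram (synthetic here,
# e.g. from quantized local descriptors in practice).
rng = np.random.default_rng(0)
histograms = rng.integers(0, 20, size=(500, 1000))  # 500 images x 1000 visual words

# LDA expresses every image as a probabilistic mixture of latent
# "routine scene" topics, matching the formulation described above.
lda = LatentDirichletAllocation(n_components=12, random_state=0)
scene_mixtures = lda.fit_transform(histograms)  # (500, 12) topic weights

# With images ordered in time, routine shows up as periodic revisits of
# the same dominant scene topic.
dominant_scene = scene_mixtures.argmax(axis=1)
print(dominant_scene[:10])
```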