Object Permanence Emerges in a Random Walk along Memory
This paper proposes a self-supervised objective for learning representations
that localize objects under occlusion - a property known as object permanence.
A central question is the choice of learning signal in cases of total
occlusion. Rather than directly supervising the locations of invisible objects,
we propose a self-supervised objective that requires neither human annotation,
nor assumptions about object dynamics. We show that object permanence can
emerge by optimizing for temporal coherence of memory: we fit a Markov walk
along a space-time graph of memories, where the states in each time step are
non-Markovian features from a sequence encoder. This leads to a memory
representation that stores occluded objects and predicts their motion, to
better localize them. The resulting model outperforms existing approaches on
several datasets of increasing complexity and realism, despite requiring
minimal supervision and assumptions, and hence being broadly applicable.
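The temporal-coherence objective can be sketched as a cycle-consistent random walk over per-frame memory nodes. A minimal numpy sketch under assumed shapes follows; `transition` and `palindrome_walk` are illustrative names, not the paper's API:

```python
import numpy as np

def transition(a, b, temp=0.07):
    """Softmax-normalized affinity from memory nodes a (N, D) to b (M, D)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = (a @ b.T) / temp
    sim -= sim.max(axis=1, keepdims=True)       # numerical stability
    p = np.exp(sim)
    return p / p.sum(axis=1, keepdims=True)

def palindrome_walk(feats):
    """Compose transition matrices forward through time and back again.
    Training would maximize the diagonal of the returned round-trip
    matrix (each node should walk back to itself), a learning signal
    that requires no annotation of occluded object locations."""
    steps = list(feats) + list(feats[-2::-1])   # t0 .. tT .. t0
    P = np.eye(feats[0].shape[0])
    for a, b in zip(steps[:-1], steps[1:]):
        P = P @ transition(a, b)
    return P                                    # (N, N), row-stochastic
```

Because each transition is row-stochastic, the round-trip matrix is too, so its diagonal directly measures how often a memory node re-identifies itself after the walk.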
Towards Practical Multi-Object Manipulation using Relational Reinforcement Learning
Learning robotic manipulation tasks using reinforcement learning with sparse
rewards is currently impractical due to the outrageous data requirements. Many
practical tasks require manipulation of multiple objects, and the complexity of
such tasks increases with the number of objects. Learning from a curriculum of
increasingly complex tasks appears to be a natural solution, but unfortunately,
does not work for many scenarios. We hypothesize that the inability of the
state-of-the-art algorithms to effectively utilize a task curriculum stems from
the absence of inductive biases for transferring knowledge from simpler to
complex tasks. We show that graph-based relational architectures overcome this
limitation and enable learning of complex tasks when provided with a simple
curriculum of tasks with increasing numbers of objects. We demonstrate the
utility of our framework on a simulated block stacking task. Starting from
scratch, our agent learns to stack six blocks into a tower. Despite using
step-wise sparse rewards, our method is orders of magnitude more data-efficient
and outperforms the existing state-of-the-art method that utilizes human
demonstrations. Furthermore, the learned policy exhibits zero-shot
generalization, successfully stacking blocks into taller towers and previously
unseen configurations such as pyramids, without any further training.
Comment: 10 pages, 4 figures and 1 table in main article, 3 figures and 3
tables in appendix. Supplementary website and videos at
https://richardrl.github.io/relational-rl
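The relational inductive bias can be illustrated as a self-attention layer whose parameters are shared across objects, so the same weights apply regardless of how many objects the scene contains. This is a hedged numpy sketch; the names and dimensions are ours, not the paper's:

```python
import numpy as np

def relational_encoder(objects, Wq, Wk, Wv):
    """Single self-attention layer over per-object state vectors (N, D).
    The weights are independent of N, which is the property that lets
    a policy trained on few objects transfer to more of them."""
    q, k, v = objects @ Wq, objects @ Wk, objects @ Wv
    scores = (q @ k.T) / np.sqrt(k.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ v                               # (N, D) relational features

rng = np.random.default_rng(0)
D = 16
Wq, Wk, Wv = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))
# The same weights handle 3 objects ...
out3 = relational_encoder(rng.standard_normal((3, D)), Wq, Wk, Wv)
# ... or 6, with no change to parameter shapes -- only the curriculum grows.
out6 = relational_encoder(rng.standard_normal((6, D)), Wq, Wk, Wv)
```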
DORSal: Diffusion for Object-centric Representations of Scenes
Recent progress in 3D scene understanding enables scalable learning of
representations across large datasets of diverse scenes. As a consequence,
generalization to unseen scenes and objects, rendering novel views from just a
single or a handful of input images, and controllable scene generation that
supports editing, is now possible. However, training jointly on a large number
of scenes typically compromises rendering quality when compared to single-scene
optimized models such as NeRFs. In this paper, we leverage recent progress in
diffusion models to equip 3D scene representation learning models with the
ability to render high-fidelity novel views, while retaining benefits such as
object-level scene editing to a large degree. In particular, we propose DORSal,
which adapts a video diffusion architecture for 3D scene generation conditioned
on frozen object-centric slot-based representations of scenes. On both complex
synthetic multi-object scenes and on the real-world large-scale Street View
dataset, we show that DORSal enables scalable neural rendering of 3D scenes
with object-level editing and improves upon existing approaches.
Comment: Project page: https://www.sjoerdvansteenkiste.com/dorsa
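The conditioning mechanism can be sketched as cross-attention from noisy image tokens to the frozen object slots inside each denoising step. This is an illustrative numpy sketch only; the actual DORSal denoiser is a video diffusion architecture, and these names and shapes are our assumptions:

```python
import numpy as np

def slot_cross_attention(tokens, slots, Wq, Wk, Wv):
    """Noisy image tokens (P, D) attend to frozen object slots (K, D).
    Layers like this inject the object-centric scene representation
    into the denoiser; because slots are per-object, replacing or
    removing one slot edits one object in the rendered scene."""
    q, k, v = tokens @ Wq, slots @ Wk, slots @ Wv
    s = (q @ k.T) / np.sqrt(k.shape[1])
    s -= s.max(axis=1, keepdims=True)
    a = np.exp(s)
    a /= a.sum(axis=1, keepdims=True)
    return tokens + a @ v            # residual update of the tokens

rng = np.random.default_rng(0)
D = 32
Wq, Wk, Wv = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))
tokens = rng.standard_normal((64, D))   # e.g. an 8x8 grid of latent patches
slots = rng.standard_normal((5, D))     # frozen slots for 5 objects
out = slot_cross_attention(tokens, slots, Wq, Wk, Wv)
```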
Properties and structural behavior of concrete containing fine sand contaminated with light crude oil
Mixing crude-oil-contaminated sand with cement and using this mix as an alternative construction material is considered an innovative and cost-effective approach to reducing the negative environmental impact of the contaminated sand. In this study, the compressive and splitting tensile strengths of concrete with different levels of light crude oil contamination (0, 1, 2, 6, 10 and 20%) were evaluated. Microstructure observation was also conducted to better understand how the oil contamination affects the concrete properties. The bond strength of steel reinforcement was measured, and a comparative evaluation of the flexural behaviour of steel-reinforced beams using concrete with 0% and 6% oil contamination was carried out. Results showed that concrete with light crude oil contamination can retain most of its compressive and splitting tensile strength at a contamination level of up to 6%. A good bond between the steel reinforcement and concrete can be achieved up to this level of oil contamination. The concrete beam with 6% oil contamination exhibited only a 20% reduction in moment capacity compared to a beam using uncontaminated concrete. Simplified empirical equations were also proposed to reliably predict the mechanical properties of concrete containing oil-contaminated sand.
To respond or not to respond - a personal perspective of intestinal tolerance
For many years, the intestine was one of the poor relations of the immunology world, being a realm inhabited mostly by specialists and those interested in unusual phenomena. However, this has changed dramatically in recent years with the realization of how important the microbiota is in shaping immune function throughout the body, and almost every major immunology institution now includes the intestine as an area of interest. One of the most important aspects of the intestinal immune system is how it discriminates carefully between harmless and harmful antigens, in particular, its ability to generate active tolerance to materials such as commensal bacteria and food proteins. This phenomenon has been recognized for more than 100 years, and it is essential for preventing inflammatory disease in the intestine, but its basis remains enigmatic. Here, I discuss the progress that has been made in understanding oral tolerance during my 40 years in the field and highlight the topics that will be the focus of future research.
Towards First-Person Context Awareness: Discovery of User Routine from Egocentric Video using Topic Models and Structure from Motion
One of the ultimate goals of our pursuit of AI is to create intelligent machines that help us live
our lives. In order to help us, these agents must gather a sense of our context. Already, personal
computing technologies like Google Now use ego-centric (first-person) data - email, calendar,
and other personal routine information - as actionable context. Recently, wearables have brought
us the opportunity to easily capture many types of ego-centric data - including visual data. It
is easy to imagine the potential impact of a context-aware intelligent assistant - "aware" of not
only textual data but immediate visual information - for applications from assisted daily living
to annotated augmented reality and self-organized life-logs. We imagine a future world when
wearable computing is ubiquitous, and as a result, lifelogs and similar visual data are abundant.
The problem of understanding user routine from "big" egocentric data naturally presents itself as
an important machine learning problem. Our key observation is that egocentric data is "overfit" to
the person wearing the camera. Because human behavior tends to be periodic (hence the notion of
"routine"), lifelog data must then be a series of manifestations of periodic scenes. Using techniques inspired
by work in scene understanding, ubiquitous computing, and 3D scene modeling, we propose two
complementary approaches for discovering routine structure in ego-centric image data. We take a scene understanding approach, interpreting routine as periodic visits in meaningful scenes. For a
robust representation of routine visual scenes, we propose a formulation of routine visual context
as probabilistic combinations of scene features discovered from a visual lifelog corpus using topic
modeling. Concurrently, we discover the 3D spatial structure of routine scenes by incrementally
building SFM models from images of the same spatial context. For proof of concept, we implement
our framework using Google Glass and an infrastructure that we call SUNglass.
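The periodicity assumption above can be made concrete with a toy sketch: given a sequence of per-image scene labels (e.g. topic assignments over a lifelog corpus), a simple recurrence score over time lags recovers the dominant routine period. The code and the `routine_period` name are illustrative, not part of the SUNglass framework:

```python
import numpy as np

def routine_period(scene_ids, max_lag=None):
    """Estimate the dominant period of a lifelog scene sequence by
    scoring, for each lag, how often the same scene label recurs."""
    n = len(scene_ids)
    max_lag = max_lag or n // 2
    best_lag, best_score = 0, -1.0
    for lag in range(1, max_lag + 1):
        score = np.mean(scene_ids[lag:] == scene_ids[:-lag])
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Toy lifelog: four scenes repeating (e.g. home, commute, desk, gym)
seq = np.array([0, 1, 2, 3] * 10)
print(routine_period(seq))   # -> 4
```

Real lifelog labels are noisy, so a robust system would score soft topic mixtures rather than hard labels, but the underlying signal is the same recurrence structure.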