Language-driven Object Fusion into Neural Radiance Fields with Pose-Conditioned Dataset Updates
The neural radiance field is an emerging rendering method that generates
high-quality, multi-view-consistent images from a neural scene representation
and volume rendering. Although neural radiance field-based techniques are
robust for scene reconstruction, their ability to add or remove objects remains
limited. This paper proposes a new language-driven approach for object
manipulation with neural radiance fields through dataset updates. Specifically,
to insert a new foreground object represented by a set of multi-view images
into a background radiance field, we use a text-to-image diffusion model to
learn and generate combined images that fuse the object of interest into the
given background across views. These combined images are then used for refining
the background radiance field so that we can render view-consistent images
containing both the object and the background. To ensure view consistency, we
propose a dataset update strategy that prioritizes radiance field training
with camera views close to the already-trained views prior to propagating the
training to the remaining views. We show that, under the same dataset update
strategy, we can easily adapt our method for object insertion using data from
text-to-3D models as well as object removal. Experimental results show that our
method generates photorealistic images of the edited scenes, and outperforms
state-of-the-art methods in 3D reconstruction and neural radiance field
blending.
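As a rough illustration of that strategy, the sketch below greedily schedules untrained views by the distance from their camera positions to the already-trained set, so updates propagate outward from the seed views. The function names and the greedy nearest-neighbor heuristic are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def propagation_order(camera_positions, seed_indices):
    """Greedily order the untrained views so that each newly scheduled view
    is the one whose camera is closest to the already-trained set, letting
    dataset updates propagate outward from the seed views."""
    trained = list(seed_indices)
    remaining = [i for i in range(len(camera_positions)) if i not in trained]
    order = []
    while remaining:
        # distance from each remaining view to its nearest trained view
        dists = [
            min(np.linalg.norm(camera_positions[i] - camera_positions[j])
                for j in trained)
            for i in remaining
        ]
        nxt = remaining[int(np.argmin(dists))]
        order.append(nxt)
        trained.append(nxt)
        remaining.remove(nxt)
    return order

# toy example: six cameras on a ring, seeded with view 0;
# neighbours of the seed are scheduled before far-side views
angles = np.linspace(0.0, 2.0 * np.pi, 6, endpoint=False)
cams = np.stack([np.cos(angles), np.sin(angles), np.zeros(6)], axis=1)
print(propagation_order(cams, seed_indices=[0]))
```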
Learning Object-Centric Representations of Multi-Object Scenes from Multiple Views
Learning object-centric representations of multi-object scenes is a promising
approach towards machine intelligence, facilitating high-level reasoning and
control from visual sensory data. However, current approaches for unsupervised
object-centric scene representation are incapable of aggregating information
from multiple observations of a scene. As a result, these "single-view" methods
form their representations of a 3D scene based only on a single 2D observation
(view). Naturally, this leads to several inaccuracies, with these methods
falling victim to single-view spatial ambiguities. To address this, we propose
the Multi-View and Multi-Object Network (MulMON) -- a method for learning
accurate, object-centric representations of multi-object scenes by leveraging
multiple views. In order to sidestep the main technical difficulty of the
multi-object-multi-view scenario -- maintaining object correspondences across
views -- MulMON iteratively updates the latent object representations for a
scene over multiple views. To ensure that these iterative updates do indeed
aggregate spatial information to form a complete 3D scene understanding, MulMON
is asked to predict the appearance of the scene from novel viewpoints during
training. Through experiments, we show that MulMON better resolves spatial
ambiguities than single-view methods -- learning more accurate and disentangled
object representations -- and also achieves new functionality in predicting
object segmentations for novel viewpoints.
Comment: Accepted at NeurIPS 2020 (Spotlight)
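A numerical caricature can fix the idea of iterative, correspondence-preserving latent updates. In the sketch below, each view's noisy object observations are matched to the nearest current latent and pulled toward it; the matching rule, the update rate, and all names are assumptions, and the actual model performs learned neural inference rather than this averaging.

```python
import numpy as np

def update_latents(latents, view_obs, lr=0.5):
    """One refinement step: match each observation in the current view to the
    nearest existing latent (a stand-in for maintaining object correspondences
    across views), then move that latent toward the observation."""
    latents = latents.copy()
    for obs in view_obs:
        k = int(np.argmin(np.linalg.norm(latents - obs, axis=1)))
        latents[k] += lr * (obs - latents[k])
    return latents

# toy scene: two objects seen from three views with view-dependent noise
rng = np.random.default_rng(0)
true_objects = np.array([[0.0, 0.0], [3.0, 1.0]])
views = [true_objects + 0.2 * rng.standard_normal((2, 2)) for _ in range(3)]

latents = rng.standard_normal((2, 2))  # rough initial object latents
for view in views:
    latents = update_latents(latents, view)
print(latents)  # drifts toward the two true object locations
```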
Update propagation in Chimera, an active DOOD language
Propagating updates is an important task to be performed within many database
services such as integrity checking, maintenance of materialized views, and condition
monitoring. This paper is concerned with the propagation of updates in an active DOOD
language. The approach proposed is to make use of Chimera triggers for computing
induced updates. It will be shown how a subset of Chimera's deductive rules can be
compiled to update propagation triggers. In its expressiveness the rule set considered
corresponds to that of Datalog with sets and negation. Using triggers for implementing
update propagation has the advantage that no special component has to be implemented,
as a trigger mechanism has to exist anyway. In this paper we will not propose new
techniques for computing induced updates but will transfer the techniques -- well-known
for the relational model -- to the object-oriented case.
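To make the trigger-compilation idea concrete, here is a minimal Python sketch (not Chimera syntax) of the delta rule a compiled trigger would evaluate for the classic derived relation grandparent(X, Z) :- parent(X, Y), parent(Y, Z): on an insertion into parent, the new tuple is joined against the stored relation in both body positions, and the induced tuples are added to the materialized result. The relation names and example facts are illustrative assumptions.

```python
# Base relation and materialized derived relation.
parent = {("ann", "bob"), ("bob", "cal")}
grandparent = {("ann", "cal")}

def on_insert_parent(new_fact):
    """Trigger body: join the inserted tuple against the stored relation
    in both rule positions and add the induced grandparent tuples."""
    x, y = new_fact
    parent.add(new_fact)
    # new tuple as the first body literal: parent(x, y), parent(y, z)
    induced = {(x, z) for (y2, z) in parent if y2 == y}
    # new tuple as the second body literal: parent(w, x), parent(x, y)
    induced |= {(w, y) for (w, x2) in parent if x2 == x}
    grandparent.update(induced)
    return induced

print(on_insert_parent(("cal", "dee")))  # induces ('bob', 'dee')
```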
Consistent Unanticipated Adaptation for Context-Dependent Applications
Unanticipated adaptation allows context-dependent applications to overcome the limitation of foreseen adaptation by incorporating previously unknown behavior. Introducing this concept in language-based approaches leads to inconsistencies, as an object can have different views in different contexts. Existing language-based approaches do not address unanticipated adaptation and its associated run-time inconsistencies. We propose an architecture for unanticipated adaptation at run time based on dynamic instance binding, crafted in a loosely coupled manner to asynchronously replace the adaptable entities that allow for behavioral changes of objects. To solve inconsistencies, we introduce the notion of transactions at the object level. Transactions guard the changing objects during their execution, ensuring consistent views. This allows for disruption-free, safe updates of adaptable entities by means of consistent unanticipated adaptation.
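A minimal sketch of this mechanism, assuming a single behavior slot per object and using a re-entrant lock to play the role of the object-level transaction (the class and method names are assumptions, not the paper's API):

```python
import threading

class Adaptable:
    """Object whose behavior is looked up through a dynamic instance binding;
    a lock stands in for the object-level transaction, so an in-flight call
    always sees one consistent behavior, even during an unanticipated update."""
    def __init__(self, behavior):
        self._behavior = behavior
        self._txn = threading.RLock()

    def rebind(self, new_behavior):
        with self._txn:               # wait for running calls to finish
            self._behavior = new_behavior

    def __call__(self, *args):
        with self._txn:               # the whole invocation is one transaction
            return self._behavior(*args)

greet = Adaptable(lambda name: f"Hello, {name}")
print(greet("Ada"))
greet.rebind(lambda name: f"Bonjour, {name}")   # previously unknown behavior
print(greet("Ada"))
```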
Look Further to Recognize Better: Learning Shared Topics and Category-Specific Dictionaries for Open-Ended 3D Object Recognition
Service robots are expected to operate effectively in human-centric environments for long periods of time. In such realistic scenarios, fine-grained object categorization is as important as basic-level object categorization. We tackle this problem by proposing an open-ended object recognition approach which concurrently learns both the object categories and the local features for encoding objects. In this work, each object is represented using a set of general latent visual topics and category-specific dictionaries. The general topics encode the common patterns shared across all categories, while the category-specific dictionaries describe the content of each category in detail. The proposed approach discovers both the general and the category-specific representations in an unsupervised fashion and updates them incrementally using new object views. Experimental results show that our approach yields significant improvements over previous state-of-the-art approaches in terms of scalability and object classification performance. Moreover, our approach demonstrates the capability of learning from very few training examples in a real-world setting. Regarding computation time, the best result was obtained with a Bag-of-Words method, closely followed by a variant of the Latent Dirichlet Allocation approach.
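At a purely illustrative level, recognition with shared topics plus category-specific dictionaries could be scored as in the sketch below, where the random topic basis, the prototype dictionaries, and the nearest-prototype reconstruction score are all simplifying assumptions standing in for the paper's learned topics and dictionaries.

```python
import numpy as np

def category_score(features, shared_topics, category_dict):
    """Score one object view: project local features onto the shared topic
    basis, then measure how well the category-specific dictionary (a set of
    prototype topic vectors) reconstructs them via nearest-prototype error."""
    topic_repr = features @ shared_topics.T            # encode with shared topics
    dists = np.linalg.norm(
        topic_repr[:, None, :] - category_dict[None, :, :], axis=2)
    return -dists.min(axis=1).mean()                   # higher is better

def classify(features, shared_topics, dictionaries):
    scores = {c: category_score(features, shared_topics, d)
              for c, d in dictionaries.items()}
    return max(scores, key=scores.get)

# toy setup: 5 shared topics over 20-dim features, two category dictionaries
rng = np.random.default_rng(1)
topics = rng.standard_normal((5, 20))
dicts = {"mug": rng.standard_normal((3, 5)), "plate": rng.standard_normal((3, 5))}
view = rng.standard_normal((10, 20))                   # 10 local features
print(classify(view, topics, dicts))
```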
Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks
It is common to implicitly assume access to intelligently captured inputs
(e.g., photos from a human photographer), yet autonomously capturing good
observations is itself a major challenge. We address the problem of learning to
look around: if a visual agent has the ability to voluntarily acquire new views
to observe its environment, how can it learn efficient exploratory behaviors to
acquire informative observations? We propose a reinforcement learning solution,
where the agent is rewarded for actions that reduce its uncertainty about the
unobserved portions of its environment. Based on this principle, we develop a
recurrent neural network-based approach to perform active completion of
panoramic natural scenes and 3D object shapes. Crucially, the learned policies
are not tied to any recognition task nor to the particular semantic content
seen during training. As a result, 1) the learned "look around" behavior is
relevant even for new tasks in unseen environments, and 2) training data
acquisition involves no manual labeling. Through tests in diverse settings, we
demonstrate that our approach learns useful generic policies that transfer to
new unseen tasks and environments. Completion episodes are shown at
https://goo.gl/BgWX3W
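The stated reward principle, scoring actions by how much they reduce the agent's uncertainty about the unobserved portions of its environment, can be written down directly. The sketch below uses per-cell Bernoulli entropy over a toy occupancy-style belief; the belief representation and the function names are assumptions, not the paper's formulation.

```python
import numpy as np

def uncertainty(belief):
    """Total per-cell Bernoulli entropy of the agent's scene belief."""
    p = np.clip(belief, 1e-6, 1 - 1e-6)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p)).sum()

def reward(belief_before, belief_after):
    """Reward an action by how much it reduced the agent's uncertainty
    about the unobserved portions of its environment."""
    return uncertainty(belief_before) - uncertainty(belief_after)

# toy example: a new view collapses two cells to near-certainty
before = np.full(8, 0.5)            # agent knows nothing about 8 scene cells
after = before.copy()
after[[0, 1]] = [0.99, 0.01]        # the glimpse reveals two cells
print(reward(before, after))        # positive: the action was informative
```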