Object-Oriented Dynamics Learning through Multi-Level Abstraction
Object-based approaches for learning action-conditioned dynamics have
demonstrated promise for generalization and interpretability. However, existing
approaches suffer from structural limitations and optimization difficulties for
common environments with multiple dynamic objects. In this paper, we present a
novel self-supervised learning framework, called Multi-level Abstraction
Object-oriented Predictor (MAOP), which employs a three-level learning
architecture that enables efficient object-based dynamics learning from raw
visual observations. We also design a spatial-temporal relational reasoning
mechanism for MAOP to support instance-level dynamics learning and handle
partial observability. Our results show that MAOP significantly outperforms
previous methods in terms of sample efficiency and generalization to novel
environments when learning environment models. We also demonstrate that learned
dynamics models enable efficient planning in unseen environments, comparable to
true environment models. In addition, MAOP learns semantically and visually
interpretable disentangled representations.
Comment: Accepted to the Thirty-Fourth AAAI Conference on Artificial
Intelligence (AAAI), 2020
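As a rough, self-contained illustration of object-based, action-conditioned dynamics prediction (the general setting this abstract describes, not the authors' MAOP architecture), the sketch below extracts soft object masks, reduces them to positions, and predicts action-conditioned motion. All module names, shapes, and the single-level design are assumptions.

```python
# Illustrative object-based, action-conditioned dynamics predictor.
# NOT the MAOP architecture from the paper; modules and shapes are assumptions.
import torch
import torch.nn as nn

class ObjectDynamicsSketch(nn.Module):
    def __init__(self, num_objects=4, action_dim=4, hidden=64):
        super().__init__()
        # Per-object mask extractor: one soft attention map per object slot.
        self.mask_net = nn.Conv2d(3, num_objects, kernel_size=5, padding=2)
        # Action-conditioned dynamics: predicts a 2D motion delta per object.
        self.dynamics = nn.Sequential(
            nn.Linear(num_objects * 2 + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_objects * 2),
        )

    def forward(self, frame, action):
        # frame: (B, 3, H, W), action: (B, action_dim)
        masks = torch.softmax(self.mask_net(frame), dim=1)        # (B, K, H, W)
        B, K, H, W = masks.shape
        ys = torch.linspace(0, 1, H, device=frame.device).view(1, 1, H, 1)
        xs = torch.linspace(0, 1, W, device=frame.device).view(1, 1, 1, W)
        # Soft object positions as mask centroids.
        denom = masks.sum(dim=(2, 3)).clamp(min=1e-6)
        cy = (masks * ys).sum(dim=(2, 3)) / denom
        cx = (masks * xs).sum(dim=(2, 3)) / denom
        pos = torch.stack([cx, cy], dim=-1).flatten(1)            # (B, 2K)
        delta = self.dynamics(torch.cat([pos, action], dim=-1))   # (B, 2K)
        return (pos + delta).view(B, K, 2)                        # next positions

# Usage:
# next_pos = ObjectDynamicsSketch()(torch.rand(1, 3, 64, 64), torch.rand(1, 4))
```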
HoME: a Household Multimodal Environment
We introduce HoME: a Household Multimodal Environment for artificial agents
to learn from vision, audio, semantics, physics, and interaction with objects
and other agents, all within a realistic context. HoME integrates over 45,000
diverse 3D house layouts based on the SUNCG dataset, a scale which may
facilitate learning, generalization, and transfer. HoME is an open-source,
OpenAI Gym-compatible platform extensible to tasks in reinforcement learning,
language grounding, sound-based navigation, robotics, multi-agent learning, and
more. We hope HoME better enables artificial agents to learn as humans do: in
an interactive, multimodal, and richly contextualized setting.
Comment: Presented at NIPS 2017's Visually-Grounded Interaction and Language
Workshop
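Because HoME is described as an OpenAI Gym-compatible platform, interaction would presumably follow the standard Gym loop sketched below; the environment id is a placeholder, not the actual HoME registration name, and the observation contents are not specified here.

```python
# Hypothetical interaction loop with a Gym-compatible household environment.
# "HoME-Nav-v0" is a placeholder id; the real registration names and
# observation format come from the HoME package, which must be installed.
import gym

env = gym.make("HoME-Nav-v0")           # placeholder id, assumed registered
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()  # random policy for illustration
    obs, reward, done, info = env.step(action)
    total_reward += reward
env.close()
print("episode return:", total_reward)
```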
Edge Guided GANs with Multi-Scale Contrastive Learning for Semantic Image Synthesis
We propose a novel ECGAN for the challenging semantic image synthesis task.
Although the community has made considerable progress in recent years, the
quality of synthesized images remains far from satisfactory due to three
largely unresolved challenges. 1) The semantic labels do not provide
detailed structural information, making it challenging to synthesize local
details and structures; 2) The widely adopted CNN operations such as
convolution, down-sampling, and normalization usually cause spatial resolution
loss and thus cannot fully preserve the original semantic information, leading
to semantically inconsistent results (e.g., missing small objects); 3) Existing
semantic image synthesis methods focus on modeling 'local' semantic information
from a single input semantic layout. However, they ignore 'global' semantic
information of multiple input semantic layouts, i.e., semantic cross-relations
between pixels across different input layouts. To tackle 1), we propose to use
edge maps as an intermediate representation, which further guide image
generation via a proposed attention-guided edge transfer module. To
tackle 2), we design an effective module to selectively highlight
class-dependent feature maps according to the original semantic layout to
preserve the semantic information. To tackle 3), inspired by current methods in
contrastive learning, we propose a novel contrastive learning method, which
aims to enforce pixel embeddings belonging to the same semantic class to
generate more similar image content than those from different classes. We
further propose a novel multi-scale contrastive learning method that pushes
same-class features from different scales closer together, enabling the model
to capture more semantic relations by explicitly exploring the structures of
labeled pixels across multiple input semantic layouts and scales.
Comment: Accepted to TPAMI; an extended version of a paper published in
ICLR 2023. arXiv admin note: substantial text overlap with arXiv:2003.1389
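A minimal sketch of the class-wise pixel contrastive idea described above (pull same-class pixel embeddings together, push different-class ones apart). This InfoNCE-style loss and the tensor shapes are assumptions, not the paper's exact formulation; applying it to embeddings sampled at several scales would correspond to the multi-scale variant.

```python
# Illustrative class-wise pixel contrastive loss (not ECGAN's exact loss).
# embeddings: (N, D) pixel embeddings sampled from one or more layouts/scales
# labels:     (N,)   semantic class id of each sampled pixel
import torch
import torch.nn.functional as F

def pixel_contrastive_loss(embeddings, labels, temperature=0.1):
    z = F.normalize(embeddings, dim=1)                    # unit-length embeddings
    sim = z @ z.t() / temperature                         # (N, N) similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    # Log-probability of each pair under a softmax over all non-self pairs.
    logits = sim.masked_fill(self_mask, float("-inf"))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(self_mask, 0.0)       # avoid -inf * 0 below
    # Average over positive (same-class) pairs for anchors that have any.
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0
    mean_pos = (log_prob * pos_mask).sum(dim=1)[valid] / pos_counts[valid]
    return -mean_pos.mean()

# Usage:
# loss = pixel_contrastive_loss(torch.randn(256, 64), torch.randint(0, 10, (256,)))
```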
MySemCloud: Semantic-aware Word Cloud Editing
Word clouds are a popular text visualization technique that summarize an
input text by displaying its most important words in a compact image.
Traditional layout methods do not take proximity effects between words into
account; this has been improved in semantic word clouds, where relative word
placement is controlled by edges in a word similarity graph. We introduce
MySemCloud, a new human-in-the-loop tool to visualize and edit semantic word
clouds. MySemCloud lets users perform computer-assisted local moves of words,
which improve or at least retain the semantic quality. To achieve this, we
construct a word similarity graph on which a system of forces is applied to
generate a compact initial layout with good semantic quality. The force system
also allows us to maintain these attributes after each user interaction, as
well as preserve the user's mental map. The tool provides algorithmic support
for the editing operations to help the user enhance the semantic quality of the
visualization, while adjusting it to their personal preference. We show that
MySemCloud achieves high user satisfaction and lets users create layouts of
higher quality than state-of-the-art semantic word cloud generation tools.
Comment: Appeared at PacificVis 202
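For intuition, a force-directed seed layout over a word similarity graph (one ingredient the abstract describes) can be sketched with networkx. The similarity values are made up, and the generic spring-layout solver stands in for the tool's actual force system.

```python
# Illustrative semantic word-cloud seed layout: a force-directed embedding of a
# word similarity graph. Not MySemCloud's force system; similarities are
# example values chosen for demonstration only.
import networkx as nx

similarities = {                       # pairwise word similarities (assumed)
    ("cat", "dog"): 0.8,
    ("cat", "kitten"): 0.9,
    ("dog", "puppy"): 0.9,
    ("car", "truck"): 0.85,
    ("cat", "car"): 0.1,
}

G = nx.Graph()
for (u, v), sim in similarities.items():
    G.add_edge(u, v, weight=sim)       # heavier edge = stronger attraction

# Spring layout pulls strongly connected (semantically similar) words together.
positions = nx.spring_layout(G, weight="weight", seed=42)
for word, (x, y) in positions.items():
    print(f"{word:>7}: ({x:+.2f}, {y:+.2f})")
```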