7,867 research outputs found
Neural Motifs: Scene Graph Parsing with Global Context
We investigate the problem of producing structured graph representations of
visual scenes. Our work analyzes the role of motifs: regularly appearing
substructures in scene graphs. We present new quantitative insights on such
repeated structures in the Visual Genome dataset. Our analysis shows that
object labels are highly predictive of relation labels but not vice-versa. We
also find that there are recurring patterns even in larger subgraphs: more than
50% of graphs contain motifs involving at least two relations. Our analysis
motivates a new baseline: given object detections, predict the most frequent
relation between object pairs with the given labels, as seen in the training
set. This baseline improves on the previous state-of-the-art by an average of
3.6% relative improvement across evaluation settings. We then introduce Stacked
Motif Networks, a new architecture designed to capture higher order motifs in
scene graphs that further improves over our strong baseline by an average 7.1%
relative gain. Our code is available at github.com/rowanz/neural-motifs.Comment: CVPR 2018 camera read
Robot Navigation in Unseen Spaces using an Abstract Map
Human navigation in built environments depends on symbolic spatial
information which has unrealised potential to enhance robot navigation
capabilities. Information sources such as labels, signs, maps, planners, spoken
directions, and navigational gestures communicate a wealth of spatial
information to the navigators of built environments; a wealth of information
that robots typically ignore. We present a robot navigation system that uses
the same symbolic spatial information employed by humans to purposefully
navigate in unseen built environments with a level of performance comparable to
humans. The navigation system uses a novel data structure called the abstract
map to imagine malleable spatial models for unseen spaces from spatial symbols.
Sensorimotor perceptions from a robot are then employed to provide purposeful
navigation to symbolic goal locations in the unseen environment. We show how a
dynamic system can be used to create malleable spatial models for the abstract
map, and provide an open source implementation to encourage future work in the
area of symbolic navigation. Symbolic navigation performance of humans and a
robot is evaluated in a real-world built environment. The paper concludes with
a qualitative analysis of human navigation strategies, providing further
insights into how the symbolic navigation capabilities of robots in unseen
built environments can be improved in the future.Comment: 15 pages, published in IEEE Transactions on Cognitive and
Developmental Systems (http://doi.org/10.1109/TCDS.2020.2993855), see
https://btalb.github.io/abstract_map/ for access to softwar
Agents for educational games and simulations
This book consists mainly of revised papers that were presented at the Agents for Educational Games and Simulation (AEGS) workshop held on May 2, 2011, as part of the Autonomous Agents and MultiAgent Systems (AAMAS) conference in Taipei, Taiwan. The 12 full papers presented were carefully reviewed and selected from various submissions. The papers are organized topical sections on middleware applications, dialogues and learning, adaption and convergence, and agent applications
Predicting Future Instance Segmentation by Forecasting Convolutional Features
Anticipating future events is an important prerequisite towards intelligent
behavior. Video forecasting has been studied as a proxy task towards this goal.
Recent work has shown that to predict semantic segmentation of future frames,
forecasting at the semantic level is more effective than forecasting RGB frames
and then segmenting these. In this paper we consider the more challenging
problem of future instance segmentation, which additionally segments out
individual objects. To deal with a varying number of output labels per image,
we develop a predictive model in the space of fixed-sized convolutional
features of the Mask R-CNN instance segmentation model. We apply the "detection
head'" of Mask R-CNN on the predicted features to produce the instance
segmentation of future frames. Experiments show that this approach
significantly improves over strong baselines based on optical flow and
repurposed instance segmentation architectures
Semantic Similarity of Spatial Scenes
The formalization of similarity in spatial information systems can unleash their functionality and contribute technology not only useful, but also desirable by broad groups of users. As a paradigm for information retrieval, similarity supersedes tedious querying techniques and unveils novel ways for user-system interaction by naturally supporting modalities such as speech and sketching. As a tool within the scope of a broader objective, it can facilitate such diverse tasks as data integration, landmark determination, and prediction making. This potential motivated the development of several similarity models within the geospatial and computer science communities. Despite the merit of these studies, their cognitive plausibility can be limited due to neglect of well-established psychological principles about properties and behaviors of similarity. Moreover, such approaches are typically guided by experience, intuition, and observation, thereby often relying on more narrow perspectives or restrictive assumptions that produce inflexible and incompatible measures. This thesis consolidates such fragmentary efforts and integrates them along with novel formalisms into a scalable, comprehensive, and cognitively-sensitive framework for similarity queries in spatial information systems. Three conceptually different similarity queries at the levels of attributes, objects, and scenes are distinguished. An analysis of the relationship between similarity and change provides a unifying basis for the approach and a theoretical foundation for measures satisfying important similarity properties such as asymmetry and context dependence. The classification of attributes into categories with common structural and cognitive characteristics drives the implementation of a small core of generic functions, able to perform any type of attribute value assessment. Appropriate techniques combine such atomic assessments to compute similarities at the object level and to handle more complex inquiries with multiple constraints. These techniques, along with a solid graph-theoretical methodology adapted to the particularities of the geospatial domain, provide the foundation for reasoning about scene similarity queries. Provisions are made so that all methods comply with major psychological findings about people’s perceptions of similarity. An experimental evaluation supplies the main result of this thesis, which separates psychological findings with a major impact on the results from those that can be safely incorporated into the framework through computationally simpler alternatives
It's all Relative: Monocular 3D Human Pose Estimation from Weakly Supervised Data
We address the problem of 3D human pose estimation from 2D input images using
only weakly supervised training data. Despite showing considerable success for
2D pose estimation, the application of supervised machine learning to 3D pose
estimation in real world images is currently hampered by the lack of varied
training images with corresponding 3D poses. Most existing 3D pose estimation
algorithms train on data that has either been collected in carefully controlled
studio settings or has been generated synthetically. Instead, we take a
different approach, and propose a 3D human pose estimation algorithm that only
requires relative estimates of depth at training time. Such training signal,
although noisy, can be easily collected from crowd annotators, and is of
sufficient quality for enabling successful training and evaluation of 3D pose
algorithms. Our results are competitive with fully supervised regression based
approaches on the Human3.6M dataset, despite using significantly weaker
training data. Our proposed algorithm opens the door to using existing
widespread 2D datasets for 3D pose estimation by allowing fine-tuning with
noisy relative constraints, resulting in more accurate 3D poses.Comment: BMVC 2018. Project page available at
http://www.vision.caltech.edu/~mronchi/projects/RelativePos
Description Logic for Scene Understanding at the Example of Urban Road Intersections
Understanding a natural scene on the basis of external sensors is a task yet to be solved by computer algorithms. The present thesis investigates the suitability of a particular family of explicit, formal representation and reasoning formalisms for this task, which are subsumed under the term Description Logic
- …