217,010 research outputs found
A Simple Framework for Multi-mode Spatial-Temporal Data Modeling
Spatial-temporal data modeling aims to mine the underlying spatial
relationships and temporal dependencies of objects in a system. However, most
existing methods focus on the modeling of spatial-temporal data in a single
mode, lacking an understanding of multiple modes. Although a few methods have
recently been proposed to learn multi-mode relationships, they are built on
complicated components with high model complexity. In this paper, we
propose a simple framework for multi-mode spatial-temporal data modeling to
bring both effectiveness and efficiency together. Specifically, we design a
general cross-mode spatial relationships learning component to adaptively
establish connections between multiple modes and propagate information along
the learned connections. Moreover, we employ multi-layer perceptrons to capture
the temporal dependencies and channel correlations, which are conceptually and
technically succinct. Experiments on three real-world datasets show that our
model can consistently outperform the baselines with lower space and time
complexity, opening up a promising direction for modeling spatial-temporal
data. The generalizability of the cross-mode spatial relationships learning
module is also validated.
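The two ingredients the abstract names, an adaptively learned cross-mode adjacency and MLP-based temporal/channel mixing, can be sketched in a few lines of NumPy. Everything below (the shapes, the embedding-similarity adjacency, the plain weight matrices) is a hypothetical illustration, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, T, C = 3, 5, 8, 4                   # modes, nodes per mode, time steps, channels
x = rng.normal(size=(M * N, T, C))        # multi-mode spatial-temporal input

# Adaptive cross-mode adjacency: one learnable embedding per node across
# all modes; connections emerge from embedding similarity, so links between
# modes need not be specified by hand.
E = rng.normal(size=(M * N, 10))          # hypothetical node embeddings
A = np.exp(E @ E.T)
A = A / A.sum(axis=1, keepdims=True)      # row-normalised soft adjacency

# Propagate information along the learned connections (graph convolution).
h = np.einsum('ij,jtc->itc', A, x)

# Succinct temporal and channel mixing with plain MLP weight matrices.
W_t = rng.normal(size=(T, T)) / np.sqrt(T)    # mixes the time axis
W_c = rng.normal(size=(C, C)) / np.sqrt(C)    # mixes the channel axis
out = np.maximum(0.0, np.einsum('itc,ts->isc', h, W_t)) @ W_c

print(out.shape)   # (15, 8, 4)
```

The point of the sketch is the division of labour: one shared adjacency handles all cross-mode spatial structure, while two small weight matrices handle time and channels.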
Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA
Text-based visual question answering (TextVQA) faces the significant
challenge of avoiding redundant relational inference. To be specific, a large
number of detected objects and optical character recognition (OCR) tokens
result in rich visual relationships. Existing works take all visual
relationships into account for answer prediction. However, there are three
observations: (1) a single subject in the images can be easily detected as
multiple objects with distinct bounding boxes (considered repetitive objects).
The associations between these repetitive objects are superfluous for answer
reasoning; (2) two spatially distant OCR tokens detected in the image
frequently have weak semantic dependencies for answer reasoning; and (3) the
co-existence of nearby objects and tokens may be indicative of important visual
cues for predicting answers. Rather than utilizing all of them for answer
prediction, we make an effort to identify the most important connections or
eliminate redundant ones. We propose a sparse spatial graph network (SSGN) that
introduces a spatially aware relation pruning technique to this task. As
spatial factors for relation measurement, we employ spatial distance, geometric
dimension, overlap area, and DIoU for spatially aware pruning. We consider
three visual relationships for graph learning: object-object, OCR-OCR tokens,
and object-OCR token relationships. SSGN is a progressive graph learning
architecture that verifies the pivotal relations in the correlated object-token
sparse graph, and then in the respective object-based sparse graph and
token-based sparse graph. Experiment results on TextVQA and ST-VQA datasets
demonstrate that SSGN achieves promising performance, and visualization
results further demonstrate the interpretability of our method. Comment: Accepted by TIP 202
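DIoU (Distance-IoU) is a standard box-overlap measure: IoU penalised by the squared distance between box centres, normalised by the diagonal of the smallest enclosing box. A minimal sketch of how such a score could separate repetitive objects from genuinely distant ones (SSGN's pruning thresholds and graph construction are not reproduced here):

```python
def diou(box_a, box_b):
    """Distance-IoU between two (x1, y1, x2, y2) boxes: IoU minus the
    squared centre distance over the squared enclosing-box diagonal."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # intersection and union areas
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # squared distance between box centres
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
         + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # squared diagonal of the smallest enclosing box
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
       + (max(ay2, by2) - min(ay1, by1)) ** 2
    return iou - rho2 / c2

# Near-duplicate detections of one subject score high (prunable edge).
print(diou((0, 0, 10, 10), (1, 1, 11, 11)))
# Spatially distant boxes score low, so the edge between them is dropped.
print(diou((0, 0, 10, 10), (40, 40, 50, 50)))
```

Because the centre-distance penalty keeps discriminating even when IoU is zero, DIoU gives a usable relation measure between non-overlapping objects and OCR tokens as well.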
Selection of macroreference frames in spatial memory
Spatial memories are often hierarchically organized with different regions of space represented in unique clusters within the hierarchy. Each cluster is thought to be organized around its own microreference frame selected during learning, whereas relationships between clusters are organized by a macroreference frame. Two experiments were conducted to better understand important characteristics of macroreference frames. Participants learned overlapping spatial layouts of objects within a room-sized environment before performing a perspective-taking task from memory. Of critical importance were between-layout judgments thought to reflect the macroreference frame. The results indicate that (1) macroreference frames characterize overlapping spatial layouts, (2) macroreference frames are used even when microreference frames are aligned with one another, and (3) macroreference frame selection depends on an interaction between the global macroaxis (defined by characteristics of the layout of all learned objects), the relational macroaxis (defined by characteristics of the two layouts being related on a perspective-taking trial), and the learning view. These results refine the current understanding of macroreference frames and document their broad role in spatial memory.
System alignment supports cross-domain learning and zero-shot generalisation
Recent findings suggest conceptual relationships hold across modalities. For instance, if two concepts occur in similar linguistic contexts, they also likely occur in similar visual contexts. These similarity structures may provide a valuable signal for alignment when learning to map between domains, such as when learning the names of objects. To assess this possibility, we conducted a paired-associate learning experiment in which participants mapped objects that varied on two visual features to locations that varied along two spatial dimensions. We manipulated whether the featural and spatial systems were aligned or misaligned. Although system alignment was not required to complete this supervised learning task, we found that participants learned more efficiently when systems aligned and that aligned systems facilitated zero-shot generalisation. We fit a variety of models to individuals' responses and found that models which included an offline unsupervised alignment mechanism best accounted for human performance. Our results provide empirical evidence that people align entire representation systems to accelerate learning, even when learning seemingly arbitrary associations between two domains.
Spatial Object Recommendation with Hints: When Spatial Granularity Matters
Existing spatial object recommendation algorithms generally treat objects
identically when ranking them. However, spatial objects often cover different
levels of spatial granularity and thereby are heterogeneous. For example, one
user may prefer to be recommended a region (say Manhattan), while another user
might prefer a venue (say a restaurant). Even for the same user, preferences
can change at different stages of data exploration. In this paper, we study how
to support top-k spatial object recommendations at varying levels of spatial
granularity, enabling spatial objects at any granularity, such as a city,
suburb, or building, to serve as a Point of Interest (POI). To solve this problem, we
propose the use of a POI tree, which captures spatial containment relationships
between POIs. We design a novel multi-task learning model called MPR (short for
Multi-level POI Recommendation), where each task aims to return the top-k POIs
at a certain spatial granularity level. Each task consists of two subtasks: (i)
attribute-based representation learning; (ii) interaction-based representation
learning. The first subtask learns the feature representations for both users
and POIs, capturing attributes directly from their profiles. The second subtask
incorporates user-POI interactions into the model. Additionally, MPR can
provide insights into why certain recommendations are being made to a user
based on three types of hints: user-aspect, POI-aspect, and interaction-aspect.
We empirically validate our approach using two real-life datasets, and show
promising performance improvements over several state-of-the-art methods.
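The POI tree the paper proposes captures spatial containment between POIs at different granularity levels. A toy sketch with made-up POIs and hypothetical per-user preference scores shows how top-k retrieval at a chosen level could work (MPR's learned attribute and interaction representations are not modelled here):

```python
from dataclasses import dataclass, field

@dataclass
class POINode:
    """One spatial object in the POI tree; children are the POIs it
    spatially contains at the next-finer granularity level."""
    name: str
    level: str                      # e.g. 'city', 'suburb', 'venue'
    score: float = 0.0              # hypothetical per-user preference score
    children: list = field(default_factory=list)

    def at_level(self, level):
        """Collect every node (including self) at the given level."""
        found = [self] if self.level == level else []
        for child in self.children:
            found += child.at_level(level)
        return found

# A tiny POI tree: city -> suburbs -> venues (all scores invented).
nyc = POINode('New York', 'city', 0.7, [
    POINode('Manhattan', 'suburb', 0.9, [
        POINode("Katz's Delicatessen", 'venue', 0.8),
        POINode('The Modern', 'venue', 0.6),
    ]),
    POINode('Brooklyn', 'suburb', 0.5, [
        POINode("Juliana's Pizza", 'venue', 0.9),
    ]),
])

def top_k(root, level, k):
    """Rank all POIs at one granularity level by preference score."""
    return sorted(root.at_level(level), key=lambda p: -p.score)[:k]

print([p.name for p in top_k(nyc, 'venue', 2)])
# ["Juliana's Pizza", "Katz's Delicatessen"]
```

In MPR each granularity level becomes its own ranking task; the shared tree is what lets a recommendation shift from, say, suburb-level to venue-level as exploration deepens.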
Space to play: games and activities for spatial concepts in primary school children
Spatial concepts are amongst the most important mathematical concepts that young children develop. Ideas of shape, size, and position are part of the young child's mathematical world from the very beginning. The spatial environment of the child is always changing. Objects move around in his environment, just as he moves around and observes them from different positions. The relationships between objects and their relative shapes, sizes and positions are constantly changing.
In our teaching of the Space strand of the primary mathematics syllabus, we are trying to develop in children the understanding and skills associated with spatial relationships that are important and appropriate to them.
The games and activities presented here are for the enjoyment and stimulation of the children in your class and focus on two important spatial concepts. The first of these is the idea of a boundary, and the second is the idea of scale.
We believe that the children in your class will enjoy learning about these ideas through playing the games and doing the activities we have written.
Transductive Learning for Spatial Data Classification
Learning classifiers of spatial data presents several issues, such as the heterogeneity of spatial objects, the implicit definition of spatial relationships among objects, spatial autocorrelation, and the abundance of unlabelled data which potentially conveys a large amount of information. The first three issues are due to the inherent structure of spatial units of analysis, which can be easily accommodated if a (multi-)relational data mining approach is considered. The fourth issue demands the adoption of a transductive setting, which aims to make predictions for a given set of unlabelled data. Transduction is also motivated by the contiguity of the concept of positive autocorrelation, which typically affects spatial phenomena, with the smoothness assumption that characterizes the transductive setting. In this work, we investigate a relational approach to spatial classification in a transductive setting. Computational solutions to the main difficulties met in this approach are presented. In particular, a relational upgrade of the naïve Bayes classifier is proposed as the discriminative model, an iterative algorithm is designed for the transductive classification of unlabelled data, and a distance measure between relational descriptions of spatial objects is defined in order to determine the k-nearest neighbors of each example in the dataset. Computational solutions have been tested on two real-world spatial datasets. The transformation of spatial data into a multi-relational representation and experimental results are reported and discussed.
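The transductive loop can be illustrated with a deliberately simplified stand-in: plain Euclidean kNN in place of the paper's relational distance and naïve Bayes model. Unlabelled points are initialised from their nearest labelled example and then repeatedly re-labelled by majority vote over all points, so predictions propagate between unlabelled neighbours, which is the smoothness assumption at work. All data here is toy data:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy spatial objects: 2-D coordinates stand in for the relational
# descriptions; a real system would compare full relational features.
labelled = np.array([[0, 0], [0, 1], [5, 5], [5, 6]], dtype=float)
labels = np.array([0, 0, 1, 1])
unlabelled = rng.normal(size=(6, 2)) + np.array([[2.5, 2.5]])

def knn_transduce(labelled, labels, unlabelled, k=3, iters=5):
    """Iterative transductive classification: every unlabelled example is
    re-labelled by majority vote of its k nearest neighbours among ALL
    examples, labelled and unlabelled alike."""
    pts = np.vstack([labelled, unlabelled])
    n_lab = len(labelled)
    d = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # never vote for yourself
    # initialise each unlabelled point from its nearest labelled example
    init = labels[np.argmin(d[n_lab:, :n_lab], axis=1)]
    pred = np.concatenate([labels, init])
    for _ in range(iters):
        for i in range(n_lab, len(pts)):
            nn = np.argsort(d[i])[:k]
            pred[i] = np.bincount(pred[nn]).argmax()
    return pred[n_lab:]                          # labelled points stay fixed

print(knn_transduce(labelled, labels, unlabelled))
```

The fixed unlabelled set is what makes this transductive rather than inductive: the algorithm only ever answers for the points it was given.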
Learning Detection with Diverse Proposals
To predict a set of diverse and informative proposals with enriched
representations, this paper introduces a differentiable Determinantal Point
Process (DPP) layer that is able to augment the object detection architectures.
Most modern object detection architectures, such as Faster R-CNN, learn to
localize objects by minimizing deviations from the ground-truth but ignore
correlation between multiple proposals and object categories. Non-Maximum
Suppression (NMS) as a widely used proposal pruning scheme ignores label- and
instance-level relations between object candidates resulting in multi-labeled
detections. In the multi-class case, NMS selects boxes with the largest
prediction scores, ignoring the semantic relations between categories of
potential selections. In contrast, our trainable DPP layer, allowing for Learning
Detection with Diverse Proposals (LDDP), considers both label-level contextual
information and spatial layout relationships between proposals without
increasing the number of parameters of the network, and thus improves location
and category specifications of final detected bounding boxes substantially
during both training and inference schemes. Furthermore, we show that LDDP
keeps its superiority over Faster R-CNN even if the number of proposals
generated by LDDP is only ~30% as many as those for Faster R-CNN. Comment: Accepted to CVPR 201
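The diversity mechanism behind a DPP can be sketched at selection time as greedy MAP inference under a kernel L = diag(q) S diag(q), where q encodes prediction quality and S encodes spatial similarity (IoU here): adding a box similar to one already chosen barely increases det(L), so overlapping duplicates lose out. This toy sketch is not the trainable LDDP layer, only the selection principle it builds on:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def greedy_dpp_select(boxes, scores, k=2):
    """Greedily pick the subset whose kernel submatrix has the largest
    log-determinant, balancing quality against spatial overlap."""
    n = len(boxes)
    S = np.array([[iou(boxes[i], boxes[j]) if i != j else 1.0
                   for j in range(n)] for i in range(n)])
    q = np.exp(scores)                  # quality from detector scores
    L = S * np.outer(q, q)              # L = diag(q) S diag(q)
    chosen = []
    for _ in range(k):
        best, best_gain = None, -np.inf
        for i in range(n):
            if i in chosen:
                continue
            idx = chosen + [i]
            gain = np.linalg.slogdet(L[np.ix_(idx, idx)])[1]
            if gain > best_gain:
                best, best_gain = i, gain
        chosen.append(best)
    return chosen

# Two near-duplicate boxes plus one distant box (all values invented).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = np.array([0.9, 0.8, 0.5])
print(greedy_dpp_select(boxes, scores, k=2))   # [0, 2]
```

Unlike NMS, nothing here is a hard threshold: the second duplicate is rejected because its similarity shrinks the determinant, not because its IoU crossed a cutoff.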
Spatial-Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation
Video scene graph generation (VidSGG) aims to identify objects in visual
scenes and infer their relationships for a given video. It requires not only a
comprehensive understanding of each object scattered across the whole scene
also a deep dive into their temporal motions and interactions. Inherently,
object pairs and their relationships enjoy spatial co-occurrence correlations
within each image and temporal consistency/transition correlations across
different images, which can serve as prior knowledge to facilitate VidSGG model
learning and inference. In this work, we propose a spatial-temporal
knowledge-embedded transformer (STKET) that incorporates the prior
spatial-temporal knowledge into the multi-head cross-attention mechanism to
learn more representative relationship representations. Specifically, we first
learn spatial co-occurrence and temporal transition correlations in a
statistical manner. Then, we design spatial and temporal knowledge-embedded
layers that introduce the multi-head cross-attention mechanism to fully explore
the interaction between visual representation and the knowledge to generate
spatial- and temporal-embedded representations, respectively. Finally, we
aggregate these representations for each subject-object pair to predict the
final semantic labels and their relationships. Extensive experiments show that
STKET outperforms current competing algorithms by a large margin, e.g.,
improving the mR@50 by 8.1%, 4.7%, and 2.1% on different settings over current
algorithms. Comment: Technical Report
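The two knowledge-embedding steps the abstract outlines, priors learned in a statistical manner and cross-attention between visual and knowledge representations, can be caricatured in NumPy. All shapes, the co-occurrence counts, and the single-head attention without learned projections are hypothetical simplifications, not STKET itself:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_pairs, n_pred = 16, 4, 10   # feature dim, object pairs, predicate classes

# Step 1: statistical prior knowledge -- a made-up co-occurrence count
# matrix over predicate classes, normalised into a distribution and
# projected into embedding space.
counts = rng.integers(1, 50, size=(n_pred, n_pred)).astype(float)
prior = counts / counts.sum(axis=1, keepdims=True)
W_k = rng.normal(size=(n_pred, d)) / np.sqrt(n_pred)
knowledge = prior @ W_k               # (n_pred, d) knowledge embeddings

# Step 2: cross-attention -- visual pair representations act as queries,
# knowledge embeddings supply the keys and values.
visual = rng.normal(size=(n_pairs, d))

def cross_attention(q, kv, dim):
    logits = q @ kv.T / np.sqrt(dim)
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)   # softmax over knowledge slots
    return w @ kv

embedded = cross_attention(visual, knowledge, d)   # knowledge-embedded reps
print(embedded.shape)   # (4, 16)
```

The same pattern applied with temporal transition statistics instead of co-occurrence counts would yield the temporal-embedded representations; STKET then aggregates both per subject-object pair before predicting relationship labels.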