STW-MD: A Novel Spatio-Temporal Weighting and Multi-Step Decision Tree Method for Considering Spatial Heterogeneity in Brain Gene Expression Data
Motivation: Gene expression during brain development or abnormal development
is a biological process that is highly dynamic in both space and time. Due to
the lack of comprehensive integration of spatial and temporal dimensions of
brain gene expression data, previous studies have mainly focused on individual
brain regions or a certain developmental stage. Our motivation is to address
this gap by incorporating spatio-temporal information to gain a more complete
understanding of the mechanisms underlying brain development or disorders
associated with abnormal brain development, such as Alzheimer's disease (AD),
and to identify potential determinants of response.
Results: In this study, we propose a novel two-step framework based on
spatial-temporal information weighting and multi-step decision trees. This
framework can effectively exploit the spatial similarity and temporal
dependence between different stages and different brain regions, and facilitate
differential gene analysis in brain regions with high heterogeneity. We focus
on two datasets: the AD dataset, which includes gene expression data from
early, middle, and late stages, and the brain development dataset, spanning
fetal development to adulthood. Our findings highlight the advantages of the
proposed framework in discovering gene classes and elucidating their impact on
brain development and AD progression across diverse brain regions and stages.
These findings align with existing studies and provide insights into the
processes of normal and abnormal brain development.
Availability: The code of STW-MD is available at
https://github.com/tsnm1/STW-MD.
Comment: 11 pages, 6 figures
Interaction-Aware Prompting for Zero-Shot Spatio-Temporal Action Detection
The goal of spatial-temporal action detection is to determine the time and
place where each person's action occurs in a video and classify the
corresponding action category. Most of the existing methods adopt
fully-supervised learning, which requires a large amount of training data,
making it very difficult to achieve zero-shot learning. In this paper, we
propose to utilize a pre-trained visual-language model to extract the
representative image and text features, and model the relationship between
these features through different interaction modules to obtain the interaction
feature. In addition, we use this feature to prompt each label to obtain more
appropriate text features. Finally, we calculate the similarity between the
interaction feature and the text feature for each label to determine the action
category. Our experiments on J-HMDB and UCF101-24 datasets demonstrate that the
proposed interaction module and prompting make the visual-language features
better aligned, thus achieving excellent accuracy for zero-shot spatio-temporal
action detection. The code will be released upon acceptance.
Comment: the first Zero-Shot Spatio-Temporal Action Detection work
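The final classification step described in this abstract (comparing the interaction feature with a text feature per label) can be sketched as follows. This is a minimal illustration only: the feature vectors are made-up toy values, the function names `cosine` and `classify` are hypothetical, and cosine similarity is an assumed choice of similarity measure that the abstract does not specify.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def classify(interaction_feature, label_text_features):
    """Score every label and return the best label plus all scores."""
    scores = {
        label: cosine(interaction_feature, feature)
        for label, feature in label_text_features.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

# Toy stand-ins for prompted text features and one interaction feature.
text_features = {
    "run": [0.9, 0.1, 0.0],
    "jump": [0.1, 0.9, 0.1],
    "wave": [0.0, 0.2, 0.9],
}
interaction = [0.2, 0.8, 0.1]

label, scores = classify(interaction, text_features)
print(label, scores)
```

In a zero-shot setting, new action categories would only require new text features, since no classifier weights are tied to a fixed label set.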
SLIM: Scalable Linkage of Mobility Data
We present a scalable solution to link entities across mobility datasets using their spatio-temporal information. This is a fundamental problem in many applications such as linking user identities for security, understanding privacy limitations of location-based services, or producing a unified dataset from multiple sources for urban planning. Such integrated datasets are also essential for service providers to optimise their services and improve business intelligence. In this paper, we first propose a mobility-based representation and similarity computation for entities. An efficient matching process is then developed to identify the final linked pairs, with an automated mechanism to decide when to stop the linkage. We scale the process with a locality-sensitive hashing (LSH) based approach that significantly reduces candidate pairs for matching. To realize the effectiveness and efficiency of our techniques in practice, we introduce an algorithm called SLIM. In the experimental evaluation, SLIM outperforms the two existing state-of-the-art approaches in terms of precision and recall. Moreover, the LSH-based approach brings two to four orders of magnitude speedup.
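To illustrate how an LSH step can reduce the number of candidate pairs before matching, here is a minimal sketch using MinHash with banding over sets of visited (cell, time-window) tokens. It is not SLIM's actual representation or hashing scheme, which the abstract does not detail. The grid size, time window, band layout, and all function names are assumptions chosen for illustration, and each entity is assumed to have at least one record.

```python
import random
import zlib
from collections import defaultdict

PRIME = 2_147_483_647  # modulus for the random linear hash functions

def tokens(records, cell_size=1.0, window=3600):
    """Map (x, y, t) records to a set of coarse (cell_x, cell_y, time_window) tokens."""
    return {(int(x // cell_size), int(y // cell_size), int(t // window))
            for x, y, t in records}

def make_hashers(num_hashes, seed=0):
    """Seeded random (a, b) coefficients so signatures are reproducible."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]

def minhash(token_set, hashers):
    """MinHash signature: the minimum of each hash function over the token set."""
    base = [zlib.crc32(repr(tok).encode()) for tok in token_set]
    return [min((a * h + b) % PRIME for h in base) for a, b in hashers]

def candidate_pairs(dataset_a, dataset_b, bands=8, rows=2):
    """Return cross-dataset pairs that share at least one band bucket."""
    hashers = make_hashers(bands * rows)
    buckets = defaultdict(lambda: (set(), set()))
    for side, dataset in enumerate((dataset_a, dataset_b)):
        for entity, records in dataset.items():
            sig = minhash(tokens(records), hashers)
            for band in range(bands):
                key = (band, tuple(sig[band * rows:(band + 1) * rows]))
                buckets[key][side].add(entity)
    return {(a, b) for ids_a, ids_b in buckets.values()
            for a in ids_a for b in ids_b}

# Toy datasets: a1 and b1 visit the same cells in the same time windows.
data_a = {"a1": [(0.2, 0.3, 10), (5.1, 5.2, 4000)], "a2": [(40.0, 41.0, 100)]}
data_b = {"b1": [(0.7, 0.9, 20), (5.8, 5.5, 4100)], "b2": [(90.0, 3.0, 50000)]}

pairs = candidate_pairs(data_a, data_b)
print(pairs)
```

Only the pairs returned here would go on to the more expensive similarity computation; entities whose token sets overlap little are unlikely to share a bucket and are skipped.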
Automatic Action Annotation in Weakly Labeled Videos
Manual spatio-temporal annotation of human action in videos is laborious,
requires several annotators and contains human biases. In this paper, we
present a weakly supervised approach to automatically obtain spatio-temporal
annotations of an actor in action videos. We first obtain a large number of
action proposals in each video. To capture the few most representative action
proposals in each video and avoid processing thousands of them, we rank them
using optical flow and saliency in a 3D-MRF based framework and select a few
proposals using a MAP-based proposal subset selection method. We demonstrate that
this ranking preserves the high quality action proposals. Several such
proposals are generated for each video of the same action. Our next challenge
is to iteratively select one proposal from each video so that all proposals are
globally consistent. We formulate this as a Generalized Maximum Clique Graph
problem using shape, global, and fine-grained similarity of proposals across the
videos. The output of our method is the most action-representative proposal
from each video. Our method can also annotate multiple instances of the same
action in a video. We have validated our approach on three challenging action
datasets: UCF Sports, sub-JHMDB, and THUMOS'13, and have obtained promising
results compared to several baseline methods. Moreover, on UCF Sports, we
demonstrate that action classifiers trained on these automatically obtained
spatio-temporal annotations have comparable performance to the classifiers
trained on ground truth annotations.
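The selection step described in this abstract, choosing one proposal per video so that the chosen set is globally consistent, can be illustrated with a brute-force sketch. It enumerates every combination and keeps the one with the highest total pairwise similarity, which is the objective of a generalized maximum clique formulation but not the authors' iterative solver. The one-dimensional "features" and the `similarity` function are hypothetical placeholders for the shape, global, and fine-grained similarities mentioned above.

```python
from itertools import combinations, product

def select_consistent(proposals, similarity):
    """Pick one proposal index per video, maximizing summed pairwise similarity.

    Exhaustive search: only feasible for a handful of videos and proposals.
    """
    best_choice, best_score = None, float("-inf")
    for choice in product(*[range(len(p)) for p in proposals]):
        picked = [proposals[video][idx] for video, idx in enumerate(choice)]
        score = sum(similarity(p, q) for p, q in combinations(picked, 2))
        if score > best_score:
            best_choice, best_score = choice, score
    return best_choice, best_score

def similarity(p, q):
    """Toy similarity: closer feature values are more similar (assumption)."""
    return -abs(p - q)

# Three videos, two toy proposals each, described by a single feature value.
videos = [[0.1, 0.9], [0.85, 0.2], [0.5, 0.95]]

choice, score = select_consistent(videos, similarity)
print(choice, score)
```

The search space grows exponentially with the number of videos, which is why a practical system would need an approximate or iterative solver instead of enumeration.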
Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics
We address the problem of video representation learning without
human-annotated labels. While previous efforts address the problem by designing
novel self-supervised tasks using video data, the learned features are merely
frame-level and therefore not applicable to many video analysis
tasks where spatio-temporal features prevail. In this paper, we propose a
novel self-supervised approach to learn spatio-temporal features for video
representation. Inspired by the success of two-stream approaches in video
classification, we propose to learn visual features by regressing both motion
and appearance statistics along spatial and temporal dimensions, given only the
input video data. Specifically, we extract statistical concepts (fast-motion
region and the corresponding dominant direction, spatio-temporal color
diversity, dominant color, etc.) from simple patterns in both spatial and
temporal domains. Unlike prior pretext puzzles that are hard even for humans to solve,
the proposed approach is consistent with inherent human visual habits and
therefore easy to answer. We conduct extensive experiments with C3D to validate
the effectiveness of our proposed approach. The experiments show that our
approach can significantly improve the performance of C3D when applied to video
classification tasks. Code is available at
https://github.com/laura-wang/video_repres_mas.
Comment: CVPR 201
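As a rough illustration of the motion statistics mentioned in this abstract (the fast-motion region and its dominant direction), the sketch below derives both from a toy optical-flow field that is already split into named blocks. The block layout, the eight-bin direction histogram, and the function name `motion_statistics` are assumptions for illustration. The linked repository, not this sketch, defines the actual regression targets.

```python
import math

def motion_statistics(block_flows, num_bins=8):
    """Return (fastest block, dominant direction bin) for a blocked flow field.

    block_flows maps a block name to a list of (dx, dy) flow vectors.
    Direction bins each cover 360 / num_bins degrees, starting at 0 degrees.
    """
    def total_magnitude(vectors):
        return sum(math.hypot(dx, dy) for dx, dy in vectors)

    # The fast-motion region is the block with the largest summed flow magnitude.
    fastest = max(block_flows, key=lambda name: total_magnitude(block_flows[name]))

    # Magnitude-weighted histogram of flow angles inside that block.
    hist = [0.0] * num_bins
    for dx, dy in block_flows[fastest]:
        angle = math.atan2(dy, dx) % (2 * math.pi)
        bin_index = int(angle / (2 * math.pi) * num_bins) % num_bins
        hist[bin_index] += math.hypot(dx, dy)
    return fastest, hist.index(max(hist))

# Toy flow field: almost static top-left block, strong motion bottom-right.
flows = {
    "top-left": [(0.1, 0.05), (0.05, 0.1)],
    "bottom-right": [(-0.7, 2.0), (-0.5, 1.8), (-1.0, 2.2)],
}

region, direction_bin = motion_statistics(flows)
print(region, direction_bin)
```

Labels of this kind can be computed from the video alone, which is what makes them usable as self-supervised targets.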