Fine-Grained Spatiotemporal Motion Alignment for Contrastive Video Representation Learning
As the most essential property in a video, motion information is critical to
a robust and generalized video representation. To inject motion dynamics,
recent works have adopted frame difference as the source of motion information
in video contrastive learning, considering the trade-off between quality and
cost. However, existing works align motion features at the instance level,
which suffers from weak spatial and temporal alignment across modalities. In
this paper, we present a Fine-grained Motion
Alignment (FIMA) framework, capable of introducing well-aligned and
significant motion information. Specifically, we first develop a dense
contrastive learning framework in the spatiotemporal domain to generate
pixel-level motion supervision. Then, we design a motion decoder and a
foreground sampling strategy to eliminate the weak alignments in terms of time
and space. Moreover, a frame-level motion contrastive loss is presented to
improve the temporal diversity of the motion features. Extensive experiments
demonstrate that the representations learned by FIMA possess great
motion-awareness capabilities and achieve state-of-the-art or competitive
results on downstream tasks across UCF101, HMDB51, and Diving48 datasets. Code
is available at https://github.com/ZMHH-H/FIMA.
Comment: ACM MM 2023 Camera Read
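The FIMA abstract above names frame difference as a low-cost source of motion information. The following is a minimal sketch of that single idea, not the authors' implementation: it assumes a clip is given as a list of grayscale frames (nested lists of pixel intensities), and the function name `frame_differences` is hypothetical.

```python
def frame_differences(clip):
    """Return the pixel-wise difference between each pair of consecutive frames.

    Assumption: `clip` is a list of T frames, each an H x W nested list of
    grayscale intensities. The result has T - 1 difference frames.
    """
    diffs = []
    for prev_frame, next_frame in zip(clip, clip[1:]):
        diff = [
            [n - p for p, n in zip(prev_row, next_row)]
            for prev_row, next_row in zip(prev_frame, next_frame)
        ]
        diffs.append(diff)
    return diffs


# Toy 3-frame, 2x2 clip: a pixel brightens between consecutive frames.
clip = [
    [[0, 0], [0, 0]],
    [[1, 0], [0, 2]],
    [[1, 5], [0, 2]],
]
motion = frame_differences(clip)
```

Static pixels yield zeros, so nonzero entries mark where intensity changed between frames. A real pipeline would operate on tensors and color channels; this sketch only illustrates the arithmetic.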
Unbiased Directed Object Attention Graph for Object Navigation
Object navigation tasks require agents to locate specific objects in unknown
environments based on visual information. Previously, graph convolutions were
used to implicitly explore the relationships between objects. However, due to
differences in visibility among objects, it is easy to generate biases in
object attention. Thus, in this paper, we propose a directed object attention
(DOA) graph to guide the agent in explicitly learning the attention
relationships between objects, thereby reducing the object attention bias. In
particular, we use the DOA graph to perform unbiased adaptive object attention
(UAOA) on the object features and unbiased adaptive image attention (UAIA) on
the raw images, respectively. To distinguish features in different branches, a
concise adaptive branch energy distribution (ABED) method is proposed. We
assess our methods on the AI2-Thor dataset. Compared with the state-of-the-art
(SOTA) method, our method improves success rate (SR), success weighted by
path length (SPL), and success weighted by action efficiency (SAE) by 7.4%,
8.1%, and 17.6%, respectively.
Comment: 13 pages, ready to ACM Multimedia, under revie
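Two of the metrics named in the object-navigation abstract above, SR and SPL, have widely used definitions that can be sketched briefly. The example assumes the common formulation SPL = (1/N) Σ S_i · l_i / max(p_i, l_i), where S_i is a binary success flag, l_i the shortest-path length, and p_i the length of the path the agent actually took. The function names and episode values are illustrative, and SAE is omitted because the abstract does not define it.

```python
def success_rate(successes):
    """Fraction of episodes that ended in success (successes are 0 or 1)."""
    return sum(successes) / len(successes)


def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by path length, averaged over episodes.

    Assumption: all path lengths are positive, so max(p, l) is never zero.
    """
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        total += s * l / max(p, l)
    return total / len(successes)


# Three made-up episodes: a success with a detour, a failure, an optimal success.
successes = [1, 0, 1]
shortest = [2.0, 3.0, 4.0]
taken = [4.0, 3.0, 4.0]
sr_value = success_rate(successes)
spl_value = spl(successes, shortest, taken)
```

SPL can never exceed SR, because each successful episode contributes at most 1 and is discounted whenever the agent's path is longer than the shortest one.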
Prognostication of chronic disorders of consciousness using brain functional networks and clinical characteristics
Disorders of consciousness are a heterogeneous mixture of different diseases
or injuries. Although some indicators and models have been proposed for
prognostication, any single method when used alone carries a high risk of false
prediction. This study aimed to develop a multidomain prognostic model that
combines resting state functional MRI with three clinical characteristics to
predict one year outcomes at the single-subject level. The model discriminated
between patients who would later recover consciousness and those who would not
with an accuracy of around 90% on three datasets from two medical centers. It
was also able to identify the prognostic importance of different predictors,
including brain functions and clinical characteristics. To our knowledge, this
is the first implementation reported of a multidomain prognostic model based on
resting state functional MRI and clinical characteristics in chronic disorders
of consciousness. We therefore suggest that this novel prognostic model is
accurate, robust, and interpretable.
Comment: Although some prognostic indicators and models have been proposed for
disorders of consciousness, each single method when used alone carries risks
of false prediction. Song et al. report that a model combining resting state
functional MRI with clinical characteristics provided accurate, robust, and
interpretable prognostications. 52 pages, 1 table, 7 figure