149 research outputs found
Symmetry of Planar Four-Body Convex Central Configurations
We study the relationship between the masses and the geometric properties of central configurations. We prove that in the planar four-body problem, a convex central configuration is symmetric with respect to one diagonal if and only if the masses of the two particles on the other diagonal are equal. If these two masses are unequal, then the less massive one is closer to the former diagonal. Finally, we extend these results to the case of non-planar central configurations of five particles.
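For context, a configuration of point masses is central when the gravitational acceleration on each body points toward the center of mass with a common scaling factor. The following is the standard defining condition (not quoted from the paper, but the usual definition it builds on):

```latex
% Central configuration condition for masses m_i at positions x_i
% (G = 1), with c the center of mass and lambda > 0 a multiplier
% common to all bodies:
\sum_{j \neq i} \frac{m_j \,(x_j - x_i)}{\lVert x_j - x_i \rVert^{3}}
  = -\lambda \,(x_i - c),
\qquad
c = \frac{\sum_k m_k x_k}{\sum_k m_k}, \quad i = 1, \dots, n.
```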
VS-TransGRU: A Novel Transformer-GRU-based Framework Enhanced by Visual-Semantic Fusion for Egocentric Action Anticipation
Egocentric action anticipation is a challenging task that aims to predict
future actions in advance from current and historical observations in the
first-person view. Most existing methods focus on improving the model
architecture and loss function based on visual input and recurrent neural
networks to boost anticipation performance. However, these methods, which
merely consider visual information and rely on a single network architecture,
gradually reach a performance plateau. To fully understand what has been
observed and to adequately capture the dependencies between current
observations and future actions, we propose a novel visual-semantic fusion
enhanced and Transformer-GRU-based action anticipation framework. Firstly,
high-level semantic information is introduced for the first time to improve
action anticipation performance. We propose to use the semantic features
generated based on the class labels or directly from the visual observations to
augment the original visual features. Secondly, an effective visual-semantic
fusion module is proposed to bridge the semantic gap and fully exploit the
complementarity of different modalities. Thirdly, to take advantage of both
parallel and autoregressive models, we design a Transformer-based encoder for
long-term sequential modeling and a GRU-based decoder for flexible iterative
decoding. Extensive experiments on two large-scale first-person view datasets,
i.e., EPIC-Kitchens and EGTEA Gaze+, validate the effectiveness of our proposed
method, which achieves new state-of-the-art performance, outperforming previous
approaches by a large margin.
Comment: 12 pages, 7 figures
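The encoder/decoder split described above lends itself to a compact sketch. Below is a minimal, hypothetical PyTorch rendering of a Transformer encoder over fused visual-semantic features with a GRU decoder for autoregressive rollout; the concatenation-based fusion, module names, and dimensions are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TransGRUAnticipator(nn.Module):
    """Minimal sketch of a Transformer-encoder / GRU-decoder anticipation
    model. Fusion by concatenation and all dimensions are assumptions."""

    def __init__(self, feat_dim=1024, sem_dim=300, hidden=512,
                 n_heads=8, n_layers=4, n_classes=1000):
        super().__init__()
        # Visual-semantic fusion: concatenate and project (the paper
        # proposes a dedicated fusion module; this is a stand-in).
        self.fuse = nn.Linear(feat_dim + sem_dim, hidden)
        enc_layer = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=n_layers)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, vis_feats, sem_feats, n_future=4):
        # vis_feats: (B, T, feat_dim); sem_feats: (B, T, sem_dim)
        x = self.fuse(torch.cat([vis_feats, sem_feats], dim=-1))
        memory = self.encoder(x)                # parallel long-term modeling
        h = memory[:, -1:, :].transpose(0, 1).contiguous()  # (1, B, hidden)
        step = memory[:, -1:, :]                # seed with last observation
        preds = []
        for _ in range(n_future):               # flexible iterative decoding
            step, h = self.decoder(step, h)
            preds.append(self.classifier(step))
        return torch.cat(preds, dim=1)          # (B, n_future, n_classes)
```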
MixCycle: Mixup Assisted Semi-Supervised 3D Single Object Tracking with Cycle Consistency
3D single object tracking (SOT) is an indispensable part of automated
driving. Existing approaches rely heavily on large, densely labeled datasets.
However, annotating point clouds is both costly and time-consuming. Inspired by
the great success of cycle tracking in unsupervised 2D SOT, we introduce the
first semi-supervised approach to 3D SOT. Specifically, we introduce two
cycle-consistency strategies for supervision: 1) self-tracking cycles, which
leverage labels to help the model converge better in the early stages of
training; 2) forward-backward cycles, which strengthen the tracker's robustness
to motion variations and the template noise caused by the template update
strategy. Furthermore, we propose a data augmentation strategy named SOTMixup
to improve the tracker's robustness to point cloud diversity. SOTMixup
generates training samples by sampling points in two point clouds with a mixing
rate and assigns a reasonable loss weight for training according to the mixing
rate. The resulting MixCycle approach generalizes to appearance matching-based
trackers. On the KITTI benchmark, based on the P2B tracker, MixCycle trained
with a fraction of the labels outperforms P2B trained with the full label set,
and still achieves a precision improvement when using even fewer labels. Our
code will be released at
\url{https://github.com/Mumuqiao/MixCycle}.
Comment: Accepted by ICCV 2023
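The SOTMixup idea of sampling from two point clouds with a mixing rate and weighting the loss by that rate can be sketched as follows. This is a minimal interpretation under stated assumptions (uniform mixing rate, sampling with replacement only when a cloud is too small), not the paper's exact procedure.

```python
import numpy as np

def sot_mixup(pc_a, pc_b, n_points=1024, rng=None):
    """Hypothetical SOTMixup-style augmentation: draw a mixing rate,
    sample points from two clouds accordingly, and return the rate so
    the training loss can be weighted by it (as in standard mixup)."""
    rng = rng or np.random.default_rng()
    lam = float(rng.uniform())                       # mixing rate in [0, 1]
    n_a = int(round(lam * n_points))
    idx_a = rng.choice(len(pc_a), size=n_a, replace=len(pc_a) < n_a)
    idx_b = rng.choice(len(pc_b), size=n_points - n_a,
                       replace=len(pc_b) < n_points - n_a)
    mixed = np.concatenate([pc_a[idx_a], pc_b[idx_b]], axis=0)
    return mixed, lam

# Usage: weight the loss by the mixing rate, e.g.
#   loss = lam * loss_on_target_a + (1 - lam) * loss_on_target_b
```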
Driving Simulator Validity of Driving Behavior in Work Zones
Driving simulation is an efficient, safe, and data-collection-friendly method for examining driving behavior in a controlled environment. However, the validity of a driving simulator varies with the simulator type and the driving scenario. The purpose of this research is to verify driving simulator validity for driving behavior research in work zones. A field experiment and a corresponding simulation experiment were conducted to collect behavioral data. Indicators such as speed, car-following distance, and reaction delay time were chosen to examine the absolute and relative validity of the driving simulator. In particular, a survival analysis method was proposed in this research to examine the validity of reaction delay time. The results indicate the following: (1) Most indicators are valid for driving behavior research in the work zone; for example, spot speed, car-following distance, headway, and reaction delay time show absolute validity. (2) The standard deviation of the car-following distance shows relative validity. Consistent with previous research, some driving behaviors appear to be more aggressive in the simulation environment.
Document type: Article
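The survival-analysis treatment of reaction delay time can be illustrated with a minimal Kaplan-Meier estimator, where a driver who never reacts within the observation window counts as a censored observation. The code and data below are hypothetical, not drawn from the study.

```python
import numpy as np

def kaplan_meier(times, reacted):
    """Kaplan-Meier survival estimate over reaction delay times.
    `reacted` marks whether a reaction was observed within the window;
    unobserved reactions are treated as censored."""
    times = np.asarray(times, dtype=float)
    reacted = np.asarray(reacted, dtype=bool)
    s, curve = 1.0, []
    for t in np.unique(times[reacted]):
        at_risk = np.sum(times >= t)             # drivers not yet reacted
        events = np.sum((times == t) & reacted)  # reactions at time t
        s *= 1.0 - events / at_risk
        curve.append((t, s))
    return curve

# Hypothetical example: delays in seconds, 0 = censored (no reaction).
print(kaplan_meier([1.2, 1.5, 2.0, 2.4, 3.0], [1, 1, 1, 0, 1]))
```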
Boosting Multi-view Stereo with Late Cost Aggregation
Pairwise matching cost aggregation is a crucial step for modern
learning-based Multi-view Stereo (MVS). Prior works adopt an early aggregation
scheme, which sums pairwise costs into an intermediate cost volume. However,
our analysis shows that this process can degrade informative pairwise
matchings, thereby blocking the depth network from fully utilizing the
original geometric matching
cues. To address this challenge, we present a late aggregation approach that
allows for aggregating pairwise costs throughout the network feed-forward
process, achieving accurate estimations with only minor changes to the plain
CasMVSNet. Instead of building an intermediate cost by weighted sum, late
aggregation preserves all pairwise costs along a distinct view channel. This
enables the succeeding depth network to fully utilize the crucial geometric
cues without loss of cost fidelity. Grounded in the new aggregation scheme, we
propose further techniques addressing view order dependence inside the
preserved cost, handling flexible testing views, and improving the depth
filtering process. Despite its technical simplicity, our method improves
significantly upon the baseline cascade-based approach, achieving results
comparable to state-of-the-art methods with favorable computational overhead.
Comment: Code and models are available at https://github.com/Wuuu3511/LAMVSNE
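The contrast between early and late aggregation can be sketched in a few lines of PyTorch. Tensor shapes are assumptions for illustration; the point is that the early scheme collapses the view dimension by a weighted sum, while the late scheme keeps each pairwise cost along a distinct view channel.

```python
import torch

def early_aggregation(pairwise_costs, weights):
    # Prior scheme: weighted sum collapses the view dimension into one
    # intermediate cost volume. Assumed shapes (illustrative):
    # pairwise_costs (B, V, C, D, H, W), weights (B, V, 1, D, H, W).
    return (pairwise_costs * weights).sum(dim=1)       # (B, C, D, H, W)

def late_aggregation(pairwise_costs):
    # Late scheme, as the abstract describes it: keep every pairwise
    # cost along a distinct view channel so the depth network sees the
    # raw geometric matching cues and aggregates them itself.
    b, v, c, d, h, w = pairwise_costs.shape
    return pairwise_costs.reshape(b, v * c, d, h, w)   # views as channels
```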
Open-Vocabulary Video Anomaly Detection
Video anomaly detection (VAD) with weak supervision has achieved remarkable
performance in utilizing video-level labels to discriminate whether a video
frame is normal or abnormal. However, current approaches are inherently limited
to a closed-set setting and may struggle in open-world applications where there
can be anomaly categories in the test data unseen during training. A few recent
studies attempt to tackle a more realistic setting, open-set VAD, which aims to
detect unseen anomalies given seen anomalies and normal videos. However, such a
setting focuses on predicting frame anomaly scores, having no ability to
recognize the specific categories of anomalies, despite the fact that this
ability is essential for building more informed video surveillance systems.
This paper takes a step further and explores open-vocabulary video anomaly
detection (OVVAD), in which we aim to leverage pre-trained large models to
detect and categorize seen and unseen anomalies. To this end, we propose a
model that decouples OVVAD into two mutually complementary tasks --
class-agnostic detection and class-specific classification -- and jointly
optimizes both tasks. Particularly, we devise a semantic knowledge injection
module to introduce semantic knowledge from large language models for the
detection task, and design a novel anomaly synthesis module to generate pseudo
unseen anomaly videos with the help of large vision generation models for the
classification task. The injected semantic knowledge and synthesized anomalies
substantially extend our model's capability in detecting and categorizing a
variety of seen and unseen anomalies. Extensive experiments on three
widely-used benchmarks demonstrate our model achieves state-of-the-art
performance on the OVVAD task.
Comment: Submitted
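A minimal sketch of the described decoupling, under the assumption that frame features and category-name embeddings live in a shared space (as with CLIP-style encoders): a class-agnostic head scores anomaly likelihood, while classification is cosine similarity against text embeddings, so unseen categories only require new category names. Dimensions and the scoring rule are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OVVADHead(nn.Module):
    """Sketch of two decoupled tasks: a class-agnostic detector for
    frame-level anomaly scores, and open-vocabulary classification by
    cosine similarity to category-name text embeddings."""

    def __init__(self, feat_dim=512):
        super().__init__()
        self.detector = nn.Sequential(            # class-agnostic branch
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, frame_feats, text_embeds):
        # frame_feats: (T, feat_dim); text_embeds: (K, feat_dim), one
        # per anomaly category name -- unseen classes just add rows.
        anomaly_score = torch.sigmoid(self.detector(frame_feats))  # (T, 1)
        sims = F.normalize(frame_feats, dim=-1) @ \
               F.normalize(text_embeds, dim=-1).T                  # (T, K)
        return anomaly_score, sims
```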
S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning
The VQA Natural Language Explanation (VQA-NLE) task aims to explain the
decision-making process of VQA models in natural language. Unlike traditional
attention or gradient analysis, free-text rationales can be easier to
understand and gain users' trust. Existing methods mostly use post-hoc or
self-rationalization models to obtain a plausible explanation. However, these
frameworks are bottlenecked by the following challenges: 1) the reasoning
process is not faithfully reflected in the generated explanation, which
suffers from logical inconsistency; and 2) human-annotated explanations are
expensive and time-consuming to collect. In this paper, we propose a new
Semi-Supervised
VQA-NLE method via Self-Critical Learning (S3C), which evaluates candidate
explanations with answering rewards to improve the logical consistency between
answers and rationales. With a semi-supervised learning framework, S3C can
benefit from a large number of samples without human-annotated explanations.
Extensive automatic metrics and human evaluations both demonstrate the
effectiveness of our method. Meanwhile, the framework achieves new
state-of-the-art performance on the two VQA-NLE datasets.
Comment: CVPR 2023
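The self-critical idea of scoring candidate explanations with an answering reward resembles REINFORCE with a greedy baseline; a minimal sketch under that assumption (not the exact S3C objective) follows.

```python
import torch

def self_critical_loss(logprobs, sampled_reward, greedy_reward):
    """REINFORCE-with-baseline sketch: `logprobs` (B,) are summed
    log-probabilities of a sampled explanation, the rewards (B,) score
    each explanation by how well it supports the correct answer, and
    the greedily decoded explanation serves as the baseline."""
    advantage = (sampled_reward - greedy_reward).detach()
    return -(advantage * logprobs).mean()

# Hypothetical usage: the reward could be the VQA model's probability
# of the ground-truth answer conditioned on the generated rationale.
```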