Decomposed Human Motion Prior for Video Pose Estimation via Adversarial Training
Estimating human pose from video receives considerable attention due to its
applicability in numerous 3D fields. The complexity of the prior knowledge
of human body movements poses a challenge to neural network models that
regress keypoints. In this paper, we address this problem by incorporating
a motion prior in an adversarial way. Unlike previous methods, we propose to
decompose the holistic motion prior into joint-level motion priors, making
it easier for neural networks to learn from prior knowledge and thereby
boosting performance on the task. We also introduce a novel regularization
loss to balance accuracy against the smoothness imposed by the motion
prior. Our method achieves 9% lower PA-MPJPE and 29% lower acceleration
error than previous methods on 3DPW. The estimator further demonstrates its
robustness by achieving strong performance on in-the-wild datasets.
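The acceleration error reported above is a standard smoothness metric for video pose estimation. A minimal sketch of how such a metric is typically computed via second-order finite differences (the function name and averaging scheme are illustrative, not taken from the paper):

```python
import numpy as np

def acceleration_error(pred, gt):
    """Mean difference between predicted and ground-truth joint
    accelerations, estimated by second-order finite differences.

    pred, gt: arrays of shape (T, J, 3) -- T frames, J joints, 3D coords.
    Returns the mean L2 acceleration difference over frames and joints.
    """
    # Second-order finite difference: a[t] = x[t+1] - 2*x[t] + x[t-1]
    accel_pred = pred[2:] - 2 * pred[1:-1] + pred[:-2]
    accel_gt = gt[2:] - 2 * gt[1:-1] + gt[:-2]
    return np.linalg.norm(accel_pred - accel_gt, axis=-1).mean()
```

A jittery prediction inflates this metric even when per-frame joint error is low, which is why a smoothness-accuracy trade-off (as the regularization loss above targets) arises in the first place.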
Shareable Driving Style Learning and Analysis with a Hierarchical Latent Model
Driving style is usually used to characterize driving behavior for a driver
or a group of drivers. However, it remains unclear how one individual's driving
style shares certain common grounds with other drivers. Our insight is that
driving behavior is a sequence of responses to the weighted mixture of latent
driving styles that are shareable within and between individuals. To this end,
this paper develops a hierarchical latent model to learn the relationship
between driving behavior and driving styles. We first propose a fragment-based
approach to represent complex sequential driving behavior, allowing for
sufficiently representing driving behavior in a low-dimension feature space.
Then, we provide an analytical formulation for the interaction of driving
behavior and shareable driving style with a hierarchical latent model by
introducing the mechanism of Dirichlet allocation. Our model is then
validated and verified with 100 drivers in naturalistic urban and highway
driving settings. Experimental results reveal that driving styles are
shared both within and between individuals. We also analyzed the influence
of driver attributes (e.g., age, gender, and driving experience) on driving
styles and found that a naturally aggressive driver does not always drive
aggressively (i.e., can sometimes behave calmly) but exhibits a higher
proportion of aggressiveness than other types of drivers.
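The Dirichlet-allocation mechanism described above can be illustrated with a toy generative sketch: each driver draws a personal mixture over a shared pool of latent styles, and each behavior fragment is generated from one sampled style. The style count, fragment features, and all parameter values below are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

N_STYLES = 3                       # shared latent driving styles (assumed)
ALPHA = np.ones(N_STYLES) * 0.5    # Dirichlet concentration (assumed)

# Each shared style is summarized here by a mean feature vector of a
# behavior fragment (e.g., speed and acceleration statistics) -- a
# stand-in for whatever fragment representation the model learns.
style_means = np.array([[30.0, 0.5],    # calm
                        [50.0, 1.5],    # moderate
                        [70.0, 3.0]])   # aggressive

def generate_driver(n_fragments):
    """Sample one driver: a personal style mixture, then fragments."""
    mixture = rng.dirichlet(ALPHA)                 # driver-specific weights
    styles = rng.choice(N_STYLES, size=n_fragments, p=mixture)
    fragments = style_means[styles] + rng.normal(0, 0.1, (n_fragments, 2))
    return mixture, styles, fragments
```

This toy process mirrors the finding above: a driver whose mixture leans aggressive still emits some calm fragments, just with lower probability.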
Modeling and Recognizing Driver Behavior Based on Driving Data: A Survey
In recent years, modeling and recognizing driver behavior have become crucial to understanding intelligent transport systems, human-vehicle systems, and intelligent vehicle systems. This paper presents, from a control point of view, a wide range of mathematical identification and modeling methods for driver behavior based on driving data such as brake/throttle pedal position and steering wheel angle, among others. Subsequently, the driver characteristics derived from the driver model are embedded into advanced driver assistance systems, and the evaluation and verification of vehicle systems based on the driver model are described.
TinySAM: Pushing the Envelope for Efficient Segment Anything Model
Recently, the segment anything model (SAM) has shown powerful segmentation
capability and has drawn great attention in computer vision. Numerous
follow-up works have developed various applications based on the pretrained
SAM and achieved impressive performance on downstream vision tasks.
However, SAM has a heavy architecture and requires massive computational
capacity, which hinders its further application on computation-constrained
edge devices. To this end, in this paper we propose a
framework to obtain a tiny segment anything model (TinySAM) while maintaining
the strong zero-shot performance. We first propose a full-stage knowledge
distillation method with hard prompt sampling and hard mask weighting strategy
to distill a lightweight student model. We also adapt the post-training
quantization to the promptable segmentation task and further reduce the
computational cost. Moreover, a hierarchical segmenting-everything strategy
is proposed to accelerate the everything-mode inference with almost no
performance degradation. With all these proposed methods, our TinySAM leads to
orders of magnitude computational reduction and pushes the envelope for
efficient segment anything task. Extensive experiments on various zero-shot
transfer tasks demonstrate the significantly advantageous performance of our
TinySAM against counterpart methods. Pre-trained models and codes are available
at https://github.com/xinghaochen/TinySAM and
https://gitee.com/mindspore/models/tree/master/research/cv/TinySAM.
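One way to read the "hard mask weighting" idea above is to up-weight the pixels where the student disagrees most with the teacher during distillation. A toy numpy sketch, where the weighting scheme, threshold fraction, and names are our illustrative assumptions rather than TinySAM's exact formulation:

```python
import numpy as np

def weighted_distill_loss(student_logits, teacher_logits, top_frac=0.25):
    """Toy distillation loss: squared error to the teacher's mask
    logits, with extra weight on the hardest pixels (largest
    student-teacher gap).

    student_logits, teacher_logits: arrays of shape (H, W).
    top_frac: fraction of pixels treated as 'hard' (assumed value).
    """
    err = (student_logits - teacher_logits) ** 2
    k = max(1, int(err.size * top_frac))
    # Threshold at the k-th largest squared error.
    threshold = np.partition(err.ravel(), -k)[-k]
    weights = np.where(err >= threshold, 2.0, 1.0)  # double-weight hard pixels
    return float((weights * err).mean())
```

The effect is that easy, already-matched regions contribute less to the gradient than ambiguous mask boundaries, which is the usual motivation for hard-example weighting in distillation.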
Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising
Leveraging large-scale image-text datasets and advancements in diffusion
models, text-driven generative models have made remarkable strides in the field
of image generation and editing. This study explores the potential of extending
the text-driven ability to the generation and editing of multi-text conditioned
long videos. Current methodologies for video generation and editing, while
innovative, are often confined to extremely short videos (typically less than
24 frames) and are limited to a single text condition. These constraints
significantly limit their applications given that real-world videos usually
consist of multiple segments, each bearing different semantic information. To
address this challenge, we introduce a novel paradigm dubbed Gen-L-Video,
capable of extending off-the-shelf short video diffusion models for generating
and editing videos comprising hundreds of frames with diverse semantic segments
without introducing additional training, all while preserving content
consistency. We have implemented three mainstream text-driven video generation
and editing methodologies and extended them to accommodate longer videos imbued
with a variety of semantic segments with our proposed paradigm. Our
experimental outcomes reveal that our approach significantly broadens the
generative and editing capabilities of video diffusion models, offering new
possibilities for future research and applications. The code is available at
https://github.com/G-U-N/Gen-L-Video.
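The temporal co-denoising idea above — covering a long sequence with overlapping short windows and reconciling their per-frame predictions — can be sketched as an overlap-average. The window size, stride, and the stand-in per-window "denoiser" callable are illustrative; the actual method operates on diffusion latents at each denoising step:

```python
import numpy as np

def co_denoise(frames, window, stride, denoise_window):
    """Blend per-window predictions into one long sequence.

    frames: array (T, D) of per-frame features.
    denoise_window: callable mapping a (window, D) clip to a prediction
    of the same shape (stand-in for a short-video diffusion model).
    Overlapping predictions are averaged frame by frame.
    """
    T, D = frames.shape
    acc = np.zeros((T, D))
    counts = np.zeros((T, 1))
    for start in range(0, T - window + 1, stride):
        clip = frames[start:start + window]
        acc[start:start + window] += denoise_window(clip)
        counts[start:start + window] += 1
    return acc / counts
```

Because every frame is constrained by several overlapping windows (each possibly conditioned on a different text prompt), the averaged result stays consistent across segment boundaries without any extra training — the property the abstract emphasizes.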
Real-Time Scalable Visual Tracking via Quadrangle Kernelized Correlation Filters
The correlation filter (CF) has been widely used in tracking tasks due to its simplicity and high efficiency. However, conventional CF-based trackers fail to handle the scale variation that occurs when the target object is moving, which is one of the most notable unsolved problems of visual object tracking. In this paper, we propose a scalable visual tracking algorithm based on kernelized correlation filters, referred to as quadrangle kernelized correlation filters (QKCF). Unlike existing complicated scalable trackers that either perform the correlation filtering operation multiple times or extract many candidate windows at various scales, our tracker estimates the scale of the object based on the positions of its four corners, which can be detected using a new Gaussian training output matrix within one filtering process. After obtaining four peak values corresponding to the four corners, we measure the detection confidence of each part response by evaluating its spatial and temporal smoothness. On top of this, a weighted Bayesian inference framework is employed to estimate the final location and size of the bounding box from the response matrix, where the weights are synchronized with the calculated detection likelihoods. Experiments are performed on the OTB-100 data set and 16 benchmark sequences with significant scale variations. The results demonstrate the superiority of the proposed method in terms of both effectiveness and robustness, compared with the state-of-the-art methods.
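The final fusion step described above — combining the four corner detections according to their confidences — can be sketched as a confidence-weighted estimate of the bounding box. This is a simplified stand-in for the paper's weighted Bayesian inference; the corner ordering and weighting rule are our assumptions:

```python
import numpy as np

def fuse_corners(corners, confidences):
    """Estimate box center and size from four detected corners.

    corners: array (4, 2), ordered [top-left, top-right,
             bottom-left, bottom-right], each an (x, y) position
             (image coordinates, y increasing downward).
    confidences: array (4,) of detection likelihoods.
    Returns (center, width, height) with per-corner weighting.
    """
    w = np.asarray(confidences, dtype=float)
    w = w / w.sum()
    center = (w[:, None] * corners).sum(axis=0)
    tl, tr, bl, br = corners
    # Two independent width/height estimates, each weighted by the
    # combined confidence of the corner pair that produced it.
    widths = np.array([tr[0] - tl[0], br[0] - bl[0]])
    heights = np.array([bl[1] - tl[1], br[1] - tr[1]])
    wt_w = np.array([w[0] + w[1], w[2] + w[3]])
    wt_h = np.array([w[0] + w[2], w[1] + w[3]])
    width = (wt_w * widths).sum() / wt_w.sum()
    height = (wt_h * heights).sum() / wt_h.sum()
    return center, width, height
```

Down-weighting a low-confidence corner keeps a single occluded or drifting part response from corrupting the scale estimate, which is the intuition behind the likelihood-synchronized weights in the abstract.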
A New Type of Quartz Smog Chamber: Design and Characterization
Since the 1960s, many indoor and outdoor smog chambers have been developed worldwide. However, most of them are made of Teflon films, which have relatively high background contamination due to the wall effect. We developed the world's first medium-size quartz chamber (10 m^3), assembled from 32 pieces of 5 mm thick polished quartz glass and a stainless-steel frame. Characterizations show that this chamber exhibits excellent performance in terms of relative humidity (RH) control (2-80%), temperature control (15-30 ± 1 °C), mixing efficiency of the reactants (6-8 min), light transmittance (>90% above 290 nm), and wall loss of pollutants. The wall loss rates of the gas-phase pollutants are on the order of 10^-4 min^-1 at 298 K under dry conditions; the rate is 0.08 h^-1 for 100-500 nm particles, significantly lower than those of Teflon chambers. The photolysis rate of NO2 (J(NO2)) is automatically adjustable from 0 to 0.40 min^-1 to simulate the diurnal variation of solar irradiation. The inner surface of the chamber can be repeatedly washed with deionized water, resulting in low background contamination. Both experiments (toluene-NOx and α-pinene-ozone systems) and a box model demonstrate that this new quartz chamber can provide high-quality data for investigating SOA and O3 formation in the atmosphere.
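Wall loss rates like the 10^-4 min^-1 figures quoted above are typically extracted by fitting a first-order exponential decay to a concentration time series. A minimal sketch with synthetic data (the function name and the synthetic rate are assumptions for illustration):

```python
import numpy as np

def wall_loss_rate(times, concentrations):
    """Fit first-order decay C(t) = C0 * exp(-k t) and return k.

    times: minutes; concentrations: same-length positive values.
    A least-squares line through ln(C) versus t has slope -k.
    """
    slope, _ = np.polyfit(times, np.log(concentrations), 1)
    return -slope

# Synthetic check: decay with k = 5e-4 min^-1 observed over 8 hours.
t = np.linspace(0, 480, 50)
c = 100.0 * np.exp(-5e-4 * t)
k = wall_loss_rate(t, c)
```

Fitting in log space keeps the estimate robust to the absolute concentration scale, which is why first-order wall loss is conventionally reported as a rate constant rather than an absolute flux.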