495 research outputs found
Bifurcation and dynamic response analysis of rotating blade excited by upstream vortices
Acknowledgements: The authors acknowledge the support of the National Basic Research Program of China (973 Project) (No. 2015CB057405), the National Natural Science Foundation of China (No. 11372082), and the State Scholarship Fund of CSC. DW thanks the University of Aberdeen for its hospitality. Peer reviewed. Postprint
CtxMIM: Context-Enhanced Masked Image Modeling for Remote Sensing Image Understanding
Learning representations through self-supervision on unlabeled data has
proven highly effective for understanding diverse images. However, remote
sensing images often have complex and densely populated scenes with multiple
land objects and no clear foreground objects. This intrinsic property generates
high object density, resulting in false positive pairs or missing contextual
information in self-supervised learning. To address these problems, we propose
a context-enhanced masked image modeling method (CtxMIM), a simple yet
efficient MIM-based self-supervised learning for remote sensing image
understanding. CtxMIM formulates original image patches as a reconstructive
template and employs a Siamese framework to operate on two sets of image
patches. A context-enhanced generative branch is introduced to provide
contextual information through context consistency constraints in the
reconstruction. With the simple and elegant design, CtxMIM encourages the
pre-training model to learn object-level or pixel-level features on a
large-scale dataset without specific temporal or geographical constraints.
Finally, extensive experiments show that features learned by CtxMIM outperform
fully supervised and state-of-the-art self-supervised learning methods on
various downstream tasks, including land cover classification, semantic
segmentation, object detection, and instance segmentation. These results
demonstrate that CtxMIM learns impressive remote sensing representations with
high generalization and transferability. Code and data will be made publicly
available.
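The masked-image-modeling idea behind the abstract above can be illustrated with a minimal, generic sketch: split an image into patches, hide a random subset, and score a reconstruction only on the hidden patches. This is not the CtxMIM implementation; the Siamese framework and the context-enhanced generative branch are not modeled, and the function names, the mask ratio, and the mean-squared-error loss are assumptions chosen for illustration.

```python
import random


def split_into_patches(image, patch):
    """Split a 2D list (H x W) into non-overlapping patch x patch blocks, row-major."""
    height, width = len(image), len(image[0])
    assert height % patch == 0 and width % patch == 0, "image must tile evenly"
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([row[left:left + patch] for row in image[top:top + patch]])
    return patches


def random_mask(num_patches, mask_ratio, seed=0):
    """Return a list of booleans; True marks a patch hidden from the encoder."""
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [index in masked for index in range(num_patches)]


def masked_reconstruction_loss(predicted, target, mask):
    """Mean squared error computed over the masked patches only."""
    total, count = 0.0, 0
    for pred_patch, true_patch, is_masked in zip(predicted, target, mask):
        if not is_masked:
            continue
        for pred_row, true_row in zip(pred_patch, true_patch):
            for pred_value, true_value in zip(pred_row, true_row):
                total += (pred_value - true_value) ** 2
                count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Hypothetical 4 x 4 single-channel image, split into four 2 x 2 patches.
    image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    patches = split_into_patches(image, 2)
    mask = random_mask(len(patches), mask_ratio=0.5)
    # A perfect reconstruction (the original patches as the template) gives zero loss.
    print(masked_reconstruction_loss(patches, patches, mask))
```

In this sketch the original patches play the role of the reconstruction target, which loosely mirrors the abstract's description of original image patches as a reconstructive template.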
Learning Discriminative Representations for Skeleton Based Action Recognition
Human action recognition aims at classifying the category of human action
from a segment of a video. Recently, researchers have focused on designing GCN-based
models to extract features from skeletons for this task, because
skeleton representations are much more efficient and robust than other
modalities such as RGB frames. However, when skeleton data is used, some
important cues, such as related items, are also discarded. This results in some
ambiguous actions that are hard to distinguish and tend to be
misclassified. To alleviate this problem, we propose an auxiliary feature
refinement head (FR Head), which consists of spatial-temporal decoupling and
contrastive feature refinement, to obtain discriminative representations of
skeletons. Ambiguous samples are dynamically discovered and calibrated in the
feature space. Furthermore, FR Head could be imposed on different stages of
GCNs to build a multi-level refinement for stronger supervision. Extensive
experiments are conducted on NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets.
Our proposed models obtain results competitive with state-of-the-art methods
and can help to discriminate those ambiguous samples. Code is available at
https://github.com/zhysora/FR-Head. Comment: Accepted by CVPR 2023. 10 pages, 5 figures, 5 tables
Counting dense objects in remote sensing images
Accurately estimating the number of objects of interest in a given image is a
challenging yet important task. Significant efforts have been made to address
this problem and great progress has been achieved, yet counting ground objects
in remote sensing images has barely been studied. In this paper, we are interested
in counting dense objects in remote sensing images. Compared with object
counting in natural scenes, this task is challenging because of the following factors: large
scale variation, complex cluttered backgrounds, and arbitrary orientation.
More importantly, the scarcity of data severely limits the development of
research in this field. To address these issues, we first construct a
large-scale object counting dataset based on remote sensing images, which
contains four kinds of objects: buildings, crowded ships in harbors,
large vehicles, and small vehicles in parking lots. We then benchmark the dataset
by designing a novel neural network that generates a density map of an input
image. The proposed network consists of three parts: a convolution block
attention module (CBAM), a scale pyramid module (SPM), and a deformable convolution
module (DCM). Experiments on the proposed dataset and comparisons with
state-of-the-art methods demonstrate how challenging the proposed dataset is, as well as the
superiority and effectiveness of our method.
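Counting by density map, as mentioned in the abstract above, rests on one property: the map sums to the number of objects. The sketch below builds a target density map from point annotations by spreading a unit of mass around each point with a Gaussian kernel. That construction is a common convention in the counting literature and is an assumption here; the abstract does not say how its maps are built, and the CBAM, SPM, and DCM modules are not modeled. All function names, the kernel radius, and sigma are hypothetical.

```python
import math


def gaussian_kernel(radius, sigma):
    """Return a (2*radius+1) square Gaussian kernel normalized to sum to 1."""
    size = 2 * radius + 1
    kernel = [
        [math.exp(-((x - radius) ** 2 + (y - radius) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(sum(row) for row in kernel)
    return [[value / total for value in row] for row in kernel]


def density_map(points, height, width, radius=2, sigma=1.0):
    """Build a density map in which every annotated point contributes a mass of 1.

    `points` holds (row, col) positions that must lie inside the image.
    """
    kernel = gaussian_kernel(radius, sigma)
    density = [[0.0] * width for _ in range(height)]
    for point_row, point_col in points:
        # Keep only kernel cells that fall inside the image.
        cells = []
        for ky in range(2 * radius + 1):
            for kx in range(2 * radius + 1):
                y, x = point_row + ky - radius, point_col + kx - radius
                if 0 <= y < height and 0 <= x < width:
                    cells.append((y, x, kernel[ky][kx]))
        # Renormalize so objects near the border still contribute exactly 1.
        mass = sum(weight for _, _, weight in cells)
        for y, x, weight in cells:
            density[y][x] += weight / mass
    return density


def count_from_density(density):
    """The estimated object count is the sum over the density map."""
    return sum(sum(row) for row in density)


if __name__ == "__main__":
    # Three hypothetical annotations on a 10 x 10 image, one of them in a corner.
    annotations = [(0, 0), (5, 5), (9, 3)]
    print(round(count_from_density(density_map(annotations, 10, 10)), 6))
```

A counting network trained against such targets is then read out the same way, by summing its predicted map.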