DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation
Monocular depth estimation is a challenging task that predicts the pixel-wise
depth from a single 2D image. Current methods typically model this problem as a
regression or classification task. We propose DiffusionDepth, a new approach
that reformulates monocular depth estimation as a denoising diffusion process.
It learns an iterative process that denoises a random depth distribution
into a depth map under the guidance of monocular visual conditions. The process
is performed in the latent space encoded by a dedicated depth encoder and
decoder. Instead of diffusing ground truth (GT) depth, the model learns to
reverse the process of diffusing its own refined depth into a random depth
distribution. This self-diffusion formulation overcomes the difficulty of
applying generative models to sparse GT depth scenarios. The proposed approach
benefits this task by refining depth estimation step by step, which is superior
for generating accurate and highly detailed depth maps. Experimental results on
KITTI and NYU-Depth-V2 datasets suggest that a simple yet efficient diffusion
approach could reach state-of-the-art performance in both indoor and outdoor
scenarios with acceptable inference time.
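As a rough illustration of the diffusion process that the abstract describes reversing, the sketch below applies the standard closed-form forward noising step used by denoising diffusion models to a toy list of depth values. The linear beta schedule, the step count, and the function names are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule; the paper's schedule is not stated."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_t), usually written alpha_bar_t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def diffuse(depth, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise."""
    a = alpha_bars[t]
    return [
        math.sqrt(a) * d + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
        for d in depth
    ]


alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
toy_depth = [0.2, 0.5, 0.9]  # made-up normalized depth values
noisy = diffuse(toy_depth, t=999, alpha_bars=alpha_bars, rng=random.Random(0))
```

At the last step almost none of the original signal remains, which is why a learned denoiser is needed to walk back toward a depth map one step at a time.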
A Simple Baseline for Supervised Surround-view Depth Estimation
Depth estimation has been widely studied and serves as the fundamental step
of 3D perception for autonomous driving. Though significant progress has been
made for monocular depth estimation in the past decades, these attempts are
mainly conducted on the KITTI benchmark with only front-view cameras, which
ignores the correlations across surround-view cameras. In this paper, we
propose S3Depth, a Simple Baseline for Supervised Surround-view Depth
Estimation, to jointly predict the depth maps across multiple surrounding
cameras. Specifically, we employ a global-to-local feature extraction module
which combines CNN with transformer layers for enriched representations.
Further, the Adjacent-view Attention mechanism is proposed to enable the
intra-view and inter-view feature propagation. The former is achieved by the
self-attention module within each view, while the latter is realized by the
adjacent attention module, which computes the attention across multiple cameras to
exchange the multi-scale representations across surround-view feature maps.
Extensive experiments show that our method achieves superior performance over
existing state-of-the-art methods on both DDAD and nuScenes datasets.
GaitStrip: Gait Recognition via Effective Strip-based Feature Representations and Multi-Level Framework
Many gait recognition methods first partition the human gait into N-parts and
then combine them to establish part-based feature representations. Their gait
recognition performance is often affected by partitioning strategies, which are
empirically chosen in different datasets. However, we observe that strips, as
the basic component of parts, are agnostic to different partitioning
strategies. Motivated by this observation, we present a strip-based multi-level
gait recognition network, named GaitStrip, to extract comprehensive gait
information at different levels. To be specific, our high-level branch explores
the context of gait sequences and our low-level one focuses on detailed posture
changes. We introduce a novel StriP-Based feature extractor (SPB) to learn the
strip-based feature representations by directly taking each strip of the human
body as the basic unit. Moreover, we propose a novel multi-branch structure,
called Enhanced Convolution Module (ECM), to extract different representations
of gaits. ECM consists of the Spatial-Temporal feature extractor (ST), the
Frame-Level feature extractor (FL) and SPB, and has two obvious advantages:
First, each branch focuses on a specific representation, which can be used to
improve the robustness of the network. Specifically, ST aims to extract
spatial-temporal features of gait sequences, while FL is used to generate the
feature representation of each frame. Second, the parameters of the ECM can be
reduced at test time by introducing a structural re-parameterization technique.
Extensive experimental results demonstrate that our GaitStrip achieves
state-of-the-art performance in both normal walking and complex conditions.
Comment: Accepted to ACCV202
DyGait: Exploiting Dynamic Representations for High-performance Gait Recognition
Gait recognition is a biometric technology that recognizes the identity of
humans through their walking patterns. Compared with other biometric
technologies, gait recognition is more difficult to disguise and can be applied
at long distance without the cooperation of subjects. Thus, it
has unique potential and wide application for crime prevention and social
security. At present, most gait recognition methods directly extract features
from the video frames to establish representations. However, these
architectures learn representations from different features equally but do not
pay enough attention to dynamic features, which refer to representations of the
dynamic parts of silhouettes over time (e.g. legs). Since dynamic parts of the
human body are more informative than other parts (e.g. bags) during walking, in
this paper, we propose a novel and high-performance framework named DyGait.
This is the first framework on gait recognition that is designed to focus on
the extraction of dynamic features. Specifically, to take full advantage of the
dynamic information, we propose a Dynamic Augmentation Module (DAM), which can
automatically establish spatial-temporal feature representations of the dynamic
parts of the human body. The experimental results show that our DyGait network
outperforms other state-of-the-art gait recognition methods. It achieves an
average Rank-1 accuracy of 71.4% on the GREW dataset, 66.3% on the Gait3D
dataset, 98.4% on the CASIA-B dataset and 98.3% on the OU-MVLP dataset.
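For readers unfamiliar with the Rank-1 metric quoted above, here is a minimal sketch of how it is typically computed: each probe feature is matched to its nearest gallery feature, and the match counts as a hit when the identities agree. The Euclidean distance, the feature vectors, and the identity labels are illustrative assumptions; the benchmark protocols of GREW, Gait3D, CASIA-B and OU-MVLP add dataset-specific rules that are not reproduced here.

```python
import math


def rank1_accuracy(probe_feats, probe_ids, gallery_feats, gallery_ids):
    """Fraction of probes whose nearest gallery sample has the same identity."""
    hits = 0
    for feat, pid in zip(probe_feats, probe_ids):
        # Index of the gallery feature closest to this probe (Euclidean distance).
        nearest = min(
            range(len(gallery_feats)),
            key=lambda i: math.dist(feat, gallery_feats[i]),
        )
        if gallery_ids[nearest] == pid:
            hits += 1
    return hits / len(probe_ids)


# Made-up 2-D features, used only to exercise the function.
gallery = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
gallery_labels = ["a", "b", "c"]
probes = [(0.1, 0.0), (0.9, 1.2), (4.0, 4.0), (0.6, 0.6)]
probe_labels = ["a", "b", "c", "a"]

score = rank1_accuracy(probes, probe_labels, gallery, gallery_labels)
```

In this toy set the last probe lies closer to identity "b" than to its true identity "a", so three of the four probes are matched correctly.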
Multi-Prompt with Depth Partitioned Cross-Modal Learning
In recent years, soft prompt learning methods have been proposed to fine-tune
large-scale vision-language pre-trained models for various downstream tasks.
These methods typically combine learnable textual tokens with class tokens as
input for models with frozen parameters. However, they often employ a single
prompt to describe class contexts, failing to capture categories' diverse
attributes adequately. This study introduces the Partitioned Multi-modal Prompt
(PMPO), a multi-modal prompting technique that extends the soft prompt from a
single learnable prompt to multiple prompts. Our method divides the visual
encoder depths and connects learnable prompts to the separated visual depths,
enabling different prompts to capture the hierarchical contextual depths of
visual representations. Furthermore, to maximize the advantages of multi-prompt
learning, we incorporate prior information from manually designed templates and
learnable multi-prompts, thus improving the generalization capabilities of our
approach. We evaluate the effectiveness of our approach on three challenging
tasks: new class generalization, cross-dataset evaluation, and domain
generalization. For instance, our method achieves a harmonic mean,
averaged over 11 diverse image recognition datasets ( compared to CoOp),
demonstrating significant competitiveness compared to state-of-the-art
prompting methods.
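The harmonic mean mentioned above is, in prompt-learning evaluations of this kind, usually taken between the accuracy on base (seen) classes and the accuracy on new (unseen) classes. The sketch below shows that calculation under this assumption; the accuracy values are invented for illustration and are not results from the paper.

```python
def harmonic_mean(base_acc, new_acc):
    """Harmonic mean of base-class and new-class accuracy."""
    if base_acc + new_acc == 0:
        return 0.0
    return 2 * base_acc * new_acc / (base_acc + new_acc)


# Hypothetical accuracies in percent, not taken from the paper.
h = harmonic_mean(80.0, 60.0)
```

Because the harmonic mean is pulled toward the smaller of the two values, a method cannot score well by doing well on base classes alone.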
Temporal and spatial dynamics in soil acoustics and their relation to soil animal diversity
The observation and assessment of animal biodiversity using acoustic technology has developed considerably in recent years. Current eco-acoustic research focuses on automatic audio recorder arrays and acoustic indices, which may be used to study the spatial and temporal dynamics of local animal communities in high resolution. While such soundscapes have often been studied above ground, their applicability in soils has rarely been tested. For the first time, we applied acoustic and statistical methods to explore the spatial, diurnal, and seasonal dynamics of the soundscape in soils. We studied the dynamics of acoustic complexity in forest soils in the alpine Pfynwald forest in the Swiss canton of Valais and related them to meteorological and microclimatic data. To increase microclimatic variability, we used a long-term irrigation experiment. We also took soil samples close to the sensors on 6 days in different seasons. Daily and seasonal patterns of acoustic complexity were predicted to be associated with abiotic parameters—that is, meteorological and microclimatic conditions—and mediated by the dynamics of the diversity and activity of the soil fauna. Seasonal patterns in acoustic complexity showed the highest acoustic complexity values in spring and summer, decreasing in fall and winter. Diurnal acoustic complexity values were highest in the afternoon and lowest during the night. The measurement of acoustic diversity at the sampling site was significantly associated with soil communities, with relationships between taxa richness or community composition and acoustic complexity being strongest shortly before taking the soil samples. Our results suggest that the temporal and spatial dynamics of the diversity and community composition of soil organisms can be predicted by the acoustic complexity of soil soundscapes. 
This opens up the possibility of using soil soundscape analysis as a noninvasive and easy-to-use method for soil biodiversity monitoring programs.
ISSN:1932-620
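The abstract does not say how acoustic complexity was computed. One widely used measure is the Acoustic Complexity Index (ACI), which, for each frequency bin, sums the absolute intensity differences between adjacent time frames and divides by the total intensity in that bin. The sketch below assumes that definition and a toy spectrogram laid out as one row per frequency bin; both are assumptions, not details from the study.

```python
def acoustic_complexity_index(spectrogram):
    """ACI over one time window.

    `spectrogram` is a list of rows, one per frequency bin, each row holding
    non-negative intensities for consecutive time frames.
    """
    total = 0.0
    for row in spectrogram:
        energy = sum(row)
        if energy == 0:
            continue  # a silent bin contributes nothing
        change = sum(abs(row[k] - row[k + 1]) for k in range(len(row) - 1))
        total += change / energy
    return total


# Made-up intensities: a constant bin and a fluctuating bin.
toy_spectrogram = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 3.0, 1.0, 3.0],
]
aci = acoustic_complexity_index(toy_spectrogram)
```

The constant bin adds nothing to the index, while the fluctuating bin adds 6 / 8 = 0.75, which reflects the idea that variable signals score higher than steady background noise.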
Modeling the Corn Residue Coverage after Harvesting and before Sowing in Northeast China by Random Forest and Soil Texture Zoning
Crop residue cover is vital for reducing soil erosion and improving soil fertility, and it is an important conservation tillage practice for protecting the black soil in Northeast China. Knowing how much crop residue covers cropland is therefore important for black soil protection. In this study, Landsat-8 and Sentinel-2 images were used to estimate corn residue coverage (CRC) in Northeast China. To improve estimation accuracy, the CRC model was built with the random forest regression method from the optimal combination of spectral indices and textural features, based on soil texture zoning. Our results revealed that (1) the optimization C5 of spectral indices and textural features improves the CRC estimation accuracy after harvesting and before sowing, with determination coefficients (R²) of 0.78 and 0.73, respectively; (2) the random forest improves the CRC estimation accuracy after harvesting and before sowing, with R² of 0.81 and 0.77, respectively; (3) considering the spatial heterogeneity of the soil background and using soil texture zoning models increases the accuracy of CRC estimation after harvesting and before sowing, with R² of 0.84 and 0.81, respectively. In general, the CRC estimation accuracy after harvesting was better than that before sowing. The results revealed that the corn residue coverage in most of the study area was 0.3 to 0.6 and was mainly distributed in the Songnen Plain. The estimated corn residue coverage makes it possible to identify where conservation tillage practices have been implemented, which is vital for protecting the black soil in Northeast China.
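The determination coefficient (R²) reported above can be computed from observed and predicted coverage values as one minus the ratio of residual to total sum of squares. The sketch below assumes that common definition (some studies instead report the squared correlation coefficient), and the coverage values are invented for illustration.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


# Hypothetical field-measured and model-estimated residue coverage fractions.
measured = [0.2, 0.4, 0.6, 0.8]
estimated = [0.25, 0.35, 0.65, 0.75]
r2 = r_squared(measured, estimated)
```

With these made-up values the residual sum of squares is 0.01 against a total of 0.2, so the result is about 0.95.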
Source apportionment analysis from the Tibetan Plateau
These data give the contributions of different sources (marine and salt-lake, dust, biomass burning, and long-range transported anthropogenic pollutants) to the chemical components measured in rainwater, as resolved by the Positive Matrix Factorization model developed by the Environmental Protection Agency (EPA-PMF).
Ice-Nucleating Particle Concentrations and Sources in Rainwater over the Third Pole, Tibetan Plateau
Ice-nucleating particles (INPs) modulate the microphysics and radiative properties of clouds. However, little is known about their abundance and sources in the most pristine and climate-sensitive regions, such as the Tibetan Plateau (TP). Here, to the best of our knowledge, we conduct the first investigation of INPs in rainwater collected in the TP region under mixed-phase cloud conditions. The INP concentrations vary from 0.002 to 0.675 L⁻¹ air over the temperature range from -7.1 to -27.5 °C, falling within the INP spectra derived from precipitation under worldwide geophysical conditions, and are comparable to those in the Arctic region. The heating-sensitive INPs account for 57%±30% of the observed INPs at -20 °C and become increasingly important in the warmer temperature regime, indicating that biogenic particles are major contributors to INPs above -20 °C over the TP, especially on the day with additional input of biogenic materials carried by dust particles. Chemical analysis demonstrates that the rainwater components are a mixture of dust particles, marine aerosol, and anthropogenic pollutants. Dust particles transported from the surrounding deserts and originating from the ground surface of the TP may contribute to the heating-resistant INPs at temperatures below -20 °C.
Freezing experiment of rainwaters collected at the Nam Co Station in central part of the Tibetan Plateau
These data describe the freezing experiments on rainwater collected on the Tibetan Plateau (TP). The data set includes two parts: the results for untreated samples and the results for samples after being heated to 95 °C in 10 minutes.