
    DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation

    Monocular depth estimation is a challenging task that predicts pixel-wise depth from a single 2D image. Current methods typically model this problem as a regression or classification task. We propose DiffusionDepth, a new approach that reformulates monocular depth estimation as a denoising diffusion process. It learns an iterative denoising process to 'denoise' a random depth distribution into a depth map under the guidance of monocular visual conditions. The process is performed in a latent space encoded by a dedicated depth encoder and decoder. Instead of diffusing the ground-truth (GT) depth, the model learns to reverse the process of diffusing its own refined depth into a random depth distribution. This self-diffusion formulation overcomes the difficulty of applying generative models to sparse GT depth scenarios. The proposed approach benefits this task by refining the depth estimate step by step, which is well suited to producing accurate and highly detailed depth maps. Experimental results on the KITTI and NYU-Depth-V2 datasets suggest that a simple yet efficient diffusion approach can reach state-of-the-art performance in both indoor and outdoor scenarios with acceptable inference time.
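
    The abstract only sketches the method at a high level. A minimal, hypothetical illustration of the core idea, iteratively denoising a latent depth map conditioned on image features with a generic DDIM-style update (not the authors' actual architecture), could look like this:

```python
# Illustrative sketch only: denoise a latent depth map guided by image features.
import torch
import torch.nn as nn

class LatentDepthDenoiser(nn.Module):
    """Hypothetical denoiser: predicts the noise in a noisy depth latent,
    conditioned on visual features from an image encoder."""
    def __init__(self, latent_dim=64, cond_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(latent_dim + cond_dim, 128, 3, padding=1), nn.GELU(),
            nn.Conv2d(128, latent_dim, 3, padding=1),
        )

    def forward(self, z_t, cond):
        # z_t: noisy depth latent (B, latent_dim, H, W); cond: image features (B, cond_dim, H, W)
        return self.net(torch.cat([z_t, cond], dim=1))  # predicted noise

@torch.no_grad()
def ddim_step(denoiser, z_t, cond, alpha_t, alpha_prev):
    """One deterministic DDIM-style update; alpha_t, alpha_prev are cumulative
    noise-schedule values (0-d tensors) for the current and previous step."""
    eps = denoiser(z_t, cond)
    z0_hat = (z_t - (1 - alpha_t).sqrt() * eps) / alpha_t.sqrt()  # estimate of the clean latent
    return alpha_prev.sqrt() * z0_hat + (1 - alpha_prev).sqrt() * eps
```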

    A Simple Baseline for Supervised Surround-view Depth Estimation

    Depth estimation has been widely studied and serves as the fundamental step of 3D perception for autonomous driving. Though significant progress has been made in monocular depth estimation over the past decades, these attempts are mainly conducted on the KITTI benchmark with only front-view cameras, which ignores the correlations across surround-view cameras. In this paper, we propose S3Depth, a Simple Baseline for Supervised Surround-view Depth Estimation, to jointly predict the depth maps across multiple surrounding cameras. Specifically, we employ a global-to-local feature extraction module that combines CNN and transformer layers for enriched representations. Further, an Adjacent-view Attention mechanism is proposed to enable intra-view and inter-view feature propagation. The former is achieved by a self-attention module within each view, while the latter is realized by an adjacent attention module, which computes attention across cameras to exchange multi-scale representations across surround-view feature maps. Extensive experiments show that our method achieves superior performance over existing state-of-the-art methods on both the DDAD and nuScenes datasets.
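
    As an illustration of the inter-view mechanism described above, here is a minimal sketch (assumed, not the released S3Depth code) in which each camera's tokens first undergo self-attention and then attend to the tokens of the two adjacent cameras:

```python
# Illustrative sketch (assumed, not the released S3Depth code): each camera's
# tokens undergo self-attention (intra-view) and then cross-attention to the
# tokens of the two adjacent cameras (inter-view).
import torch
import torch.nn as nn

class AdjacentViewAttention(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feats):
        # feats: (B, V, N, C) flattened feature tokens for V surround-view cameras
        B, V, N, C = feats.shape
        out = []
        for v in range(V):
            q = feats[:, v]                                    # current view
            q = self.self_attn(q, q, q)[0]                     # intra-view propagation
            neigh = torch.cat([feats[:, (v - 1) % V],          # left neighbour
                               feats[:, (v + 1) % V]], dim=1)  # right neighbour
            q = self.cross_attn(q, neigh, neigh)[0]            # inter-view propagation
            out.append(q)
        return torch.stack(out, dim=1)                         # (B, V, N, C)
```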

    GaitStrip: Gait Recognition via Effective Strip-based Feature Representations and Multi-Level Framework

    Many gait recognition methods first partition the human gait into N parts and then combine them to establish part-based feature representations. Their gait recognition performance is often affected by the partitioning strategy, which is chosen empirically for different datasets. However, we observe that strips, as the basic components of parts, are agnostic to different partitioning strategies. Motivated by this observation, we present a strip-based multi-level gait recognition network, named GaitStrip, to extract comprehensive gait information at different levels. To be specific, our high-level branch explores the context of gait sequences and our low-level one focuses on detailed posture changes. We introduce a novel StriP-Based feature extractor (SPB) to learn strip-based feature representations by directly taking each strip of the human body as the basic unit. Moreover, we propose a novel multi-branch structure, called the Enhanced Convolution Module (ECM), to extract different representations of gait. ECM consists of the Spatial-Temporal feature extractor (ST), the Frame-Level feature extractor (FL) and SPB, and has two obvious advantages. First, each branch focuses on a specific representation, which improves the robustness of the network: ST aims to extract spatial-temporal features of gait sequences, while FL generates the feature representation of each frame. Second, the parameters of the ECM can be reduced at test time through a structural re-parameterization technique. Extensive experimental results demonstrate that our GaitStrip achieves state-of-the-art performance in both normal walking and complex conditions. Comment: Accepted to ACCV202
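
    For intuition, a minimal hypothetical sketch of strip-based pooling, in which each horizontal strip of the spatio-temporal feature map becomes one feature vector (the SPB details are not given in the abstract, so names and shapes here are assumptions):

```python
# Illustrative sketch (assumed): strip-based pooling in which each horizontal
# strip of the spatio-temporal feature map is treated as the basic unit.
import torch
import torch.nn as nn

class StripFeatureExtractor(nn.Module):
    def __init__(self, in_channels=128, out_dim=256):
        super().__init__()
        self.fc = nn.Linear(in_channels, out_dim)  # shared embedding for every strip

    def forward(self, x):
        # x: (B, C, T, H, W) features of a silhouette sequence
        strips = x.mean(dim=4)             # pool over width  -> (B, C, T, H)
        strips = strips.max(dim=2).values  # pool over frames -> (B, C, H)
        strips = strips.permute(0, 2, 1)   # one vector per horizontal strip -> (B, H, C)
        return self.fc(strips)             # strip-based representations (B, H, out_dim)
```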

    DyGait: Exploiting Dynamic Representations for High-performance Gait Recognition

    Gait recognition is a biometric technology that recognizes the identity of humans through their walking patterns. Compared with other biometric technologies, gait recognition is more difficult to disguise and can be applied at long distances without the cooperation of subjects. Thus, it has unique potential and wide applicability in crime prevention and social security. At present, most gait recognition methods directly extract features from video frames to establish representations. However, these architectures treat different features equally and do not pay enough attention to dynamic features, i.e., representations of the dynamic parts of the silhouettes over time (e.g., the legs). Since the dynamic parts of the human body are more informative than other parts (e.g., bags) during walking, in this paper we propose a novel and high-performance framework named DyGait. This is the first gait recognition framework designed to focus on the extraction of dynamic features. Specifically, to take full advantage of the dynamic information, we propose a Dynamic Augmentation Module (DAM), which can automatically establish spatial-temporal feature representations of the dynamic parts of the human body. The experimental results show that our DyGait network outperforms other state-of-the-art gait recognition methods. It achieves an average Rank-1 accuracy of 71.4% on the GREW dataset, 66.3% on the Gait3D dataset, 98.4% on the CASIA-B dataset and 98.3% on the OU-MVLP dataset.
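
    The abstract does not spell out how DAM isolates the dynamic parts; one simple, hypothetical way to emphasise motion over static regions is to measure each frame's deviation from the temporal mean of the features:

```python
# Illustrative sketch (assumed): emphasise dynamic body parts by measuring each
# frame's deviation from the temporal mean of the features; static regions
# (e.g. a carried bag) contribute little, moving parts (e.g. legs) contribute more.
import torch

def augment_with_dynamic_features(feats):
    # feats: (B, C, T, H, W) frame-level gait features
    static = feats.mean(dim=2, keepdim=True)   # temporal mean ~ static component
    dynamic = (feats - static).abs()           # per-frame motion component
    return feats + dynamic                     # feature map augmented with dynamic cues
```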

    Multi-Prompt with Depth Partitioned Cross-Modal Learning

    In recent years, soft prompt learning methods have been proposed to fine-tune large-scale vision-language pre-trained models for various downstream tasks. These methods typically combine learnable textual tokens with class tokens as input for models with frozen parameters. However, they often employ a single prompt to describe class contexts, failing to capture categories' diverse attributes adequately. This study introduces the Partitioned Multi-modal Prompt (PMPO), a multi-modal prompting technique that extends the soft prompt from a single learnable prompt to multiple prompts. Our method divides the visual encoder depths and connects learnable prompts to the separated visual depths, enabling different prompts to capture the hierarchical contextual depths of visual representations. Furthermore, to maximize the advantages of multi-prompt learning, we incorporate prior information from manually designed templates and learnable multi-prompts, thus improving the generalization capabilities of our approach. We evaluate the effectiveness of our approach on three challenging tasks: new class generalization, cross-dataset evaluation, and domain generalization. For instance, our method achieves a 79.28 harmonic mean, averaged over 11 diverse image recognition datasets (+7.62 compared to CoOp), demonstrating significant competitiveness compared to state-of-the-art prompting methods.
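
    A minimal sketch of the depth-partitioning idea described above, with learnable prompt tokens assigned to successive depth ranges of a frozen visual encoder (all names and sizes here are placeholders, not the PMPO implementation):

```python
# Illustrative sketch (assumed): learnable prompt tokens partitioned across the
# depth of a frozen ViT, so prompts attached to deeper layers see higher-level
# visual context; all names and sizes here are placeholders.
import torch
import torch.nn as nn

class DepthPartitionedPrompts(nn.Module):
    def __init__(self, num_layers=12, num_partitions=4, prompt_len=4, dim=768):
        super().__init__()
        self.layers_per_part = num_layers // num_partitions
        self.prompts = nn.ParameterList(
            [nn.Parameter(torch.randn(prompt_len, dim) * 0.02) for _ in range(num_partitions)]
        )

    def prompt_for_layer(self, layer_idx, batch_size):
        part = min(layer_idx // self.layers_per_part, len(self.prompts) - 1)
        return self.prompts[part].unsqueeze(0).expand(batch_size, -1, -1)

# Inside the frozen visual encoder, the prompt tokens returned for each layer
# would be concatenated to the patch tokens before that layer's attention block.
```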

    Temporal and spatial dynamics in soil acoustics and their relation to soil animal diversity

    The observation and assessment of animal biodiversity using acoustic technology has developed considerably in recent years. Current eco-acoustic research focuses on automatic audio recorder arrays and acoustic indices, which may be used to study the spatial and temporal dynamics of local animal communities at high resolution. While such soundscapes have often been studied above ground, their applicability in soils has rarely been tested. For the first time, we applied acoustic and statistical methods to explore the spatial, diurnal, and seasonal dynamics of the soundscape in soils. We studied the dynamics of acoustic complexity in forest soils in the alpine Pfynwald forest in the Swiss canton of Valais and related them to meteorological and microclimatic data. To increase microclimatic variability, we used a long-term irrigation experiment. We also took soil samples close to the sensors on six days in different seasons. Daily and seasonal patterns of acoustic complexity were predicted to be associated with abiotic parameters, that is, meteorological and microclimatic conditions, and to be mediated by the dynamics of the diversity and activity of the soil fauna. Seasonal patterns in acoustic complexity showed the highest values in spring and summer, decreasing in fall and winter. Diurnal acoustic complexity values were highest in the afternoon and lowest during the night. Acoustic diversity measured at the sampling sites was significantly associated with soil communities, with relationships between taxon richness or community composition and acoustic complexity being strongest shortly before the soil samples were taken. Our results suggest that the temporal and spatial dynamics of the diversity and community composition of soil organisms can be predicted from the acoustic complexity of soil soundscapes. This opens up the possibility of using soil soundscape analysis as a noninvasive and easy-to-use method in soil biodiversity monitoring programs.
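
    For readers unfamiliar with acoustic indices, one common way to quantify "acoustic complexity" is the Acoustic Complexity Index (Pieretti et al., 2011); whether this exact index was used in the study is not stated in the abstract, so the sketch below is only illustrative:

```python
# Illustrative sketch: a simplified Acoustic Complexity Index computed from a
# spectrogram (one common eco-acoustic index; its use here is an assumption).
import numpy as np
from scipy.signal import spectrogram

def acoustic_complexity_index(audio, fs, nperseg=512):
    f, t, S = spectrogram(audio, fs=fs, nperseg=nperseg)   # S: (freq bins, time frames)
    num = np.abs(np.diff(S, axis=1)).sum(axis=1)            # intensity variation per bin
    den = S[:, :-1].sum(axis=1) + 1e-12                     # total intensity per bin
    return float((num / den).sum())                         # summed over frequency bins
```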

    Modeling the Corn Residue Coverage after Harvesting and before Sowing in Northeast China by Random Forest and Soil Texture Zoning

    Crop residue cover is vital for reducing soil erosion and improving soil fertility, and maintaining it is an important element of conservation tillage for protecting the black soil of Northeast China. Knowing how much crop residue covers the cropland is therefore of significance for black soil protection. In this study, Landsat-8 and Sentinel-2 images were used to estimate corn residue coverage (CRC) in Northeast China. A CRC estimation model was established to improve estimation accuracy using the optimal combination of spectral indices and textural features, based on soil texture zoning, with the random forest regression method. Our results revealed that (1) the optimal combination (C5) of spectral indices and textural features improves the CRC estimation accuracy after harvesting and before sowing, with determination coefficients (R2) of 0.78 and 0.73, respectively; (2) the random forest improves the CRC estimation accuracy after harvesting and before sowing, with R2 of 0.81 and 0.77, respectively; and (3) accounting for the spatial heterogeneity of the soil background through soil texture zoning models further increases the CRC estimation accuracy after harvesting and before sowing, with R2 of 0.84 and 0.81, respectively. In general, the CRC estimation accuracy after harvesting was better than that before sowing. The results also revealed that the corn residue coverage in most of the study area was 0.3 to 0.6 and was mainly distributed in the Songnen Plain. Based on the estimated corn residue coverage, the implementation of conservation tillage practices can be identified, which is vital for protecting the black soil in Northeast China.
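
    A minimal sketch of the zoned random-forest regression described above, with one model fitted per soil-texture zone (function and variable names are placeholders, not the authors' code):

```python
# Illustrative sketch: one random-forest regressor per soil-texture zone, with
# spectral indices and textural features as predictors of corn residue coverage.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_zonal_models(X, y, zones, n_trees=500):
    """X: (n_samples, n_features) spectral indices + texture features,
    y: field-measured CRC, zones: soil-texture zone label of each sample."""
    models = {}
    for zone in np.unique(zones):
        mask = zones == zone
        models[zone] = RandomForestRegressor(n_estimators=n_trees, random_state=0).fit(X[mask], y[mask])
    return models

def predict_crc(models, X, zones):
    y_hat = np.empty(len(X))
    for zone, rf in models.items():
        mask = zones == zone
        y_hat[mask] = rf.predict(X[mask])
    return y_hat
```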

    Source apportionment analysis from the Tibetan Plateau

    This dataset gives the contributions of different sources (marine and salt-lake, dust, biomass burning, and long-range-transported anthropogenic pollutants) to the chemical components measured in rainwater, as resolved by the Positive Matrix Factorization model developed by the Environmental Protection Agency (EPA-PMF).
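
    EPA PMF is a standalone receptor-modelling tool that performs an uncertainty-weighted non-negative factorization; as a rough analogue of the underlying idea only (without the uncertainty weighting), the concentration matrix can be factorized into source contributions and source profiles:

```python
# Rough analogue only, not EPA PMF itself: factorize the sample-by-species
# concentration matrix X ~= G @ F into source contributions G and profiles F.
import numpy as np
from sklearn.decomposition import NMF

def apportion(X, n_sources=4, seed=0):
    model = NMF(n_components=n_sources, init="nndsvda", max_iter=2000, random_state=seed)
    G = model.fit_transform(X)    # (samples, sources): contribution of each source
    F = model.components_         # (sources, species): chemical profile of each source
    return G, F
```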

    Ice-Nucleating Particle Concentrations and Sources in Rainwater over the Third Pole, Tibetan Plateau

    Ice-nucleating particles (INPs) modulate the microphysics and radiative properties of clouds. However, little is known about their abundance and sources in the most pristine and climatically sensitive regions, such as the Tibetan Plateau (TP). Here, to the best of our knowledge, we conduct the first investigation of INPs in rainwater collected in the TP region under mixed-phase cloud conditions. The INP concentrations vary from 0.002 to 0.675 per litre of air over the temperature range from -7.1 to -27.5 °C, lying within the INP spectra derived from precipitation under worldwide geophysical conditions and comparable to those in the Arctic region. Heat-sensitive INPs account for 57% ± 30% of the observed INPs at -20 °C and become increasingly important in the warmer temperature regime, indicating biogenic particles as major contributors to INPs above -20 °C over the TP, especially on days with additional input of biogenic material carried by dust particles. Chemical analysis demonstrates that the rainwater components are a mixture of dust particles, marine aerosol, and anthropogenic pollutants. Dust particles transported from the surrounding deserts and originating from the ground surface of the TP may contribute to the heat-resistant INPs at temperatures below -20 °C.
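
    One standard way to derive cumulative INP concentrations from drop-freezing experiments is the Vali (1971) relation; whether it was used here is not stated in the abstract. A small sketch of that calculation (the conversion to a per-litre-of-air basis below assumes a nominal cloud-water content, which is an assumption, not a value from the paper):

```python
# Standard drop-freezing analysis (Vali, 1971), shown for illustration only.
import numpy as np

def inp_per_litre_water(n_frozen, n_total, drop_volume_L):
    """Cumulative INP concentration (L^-1 of water) at the temperature where
    n_frozen of n_total droplets of volume drop_volume_L have frozen."""
    unfrozen_fraction = (n_total - n_frozen) / n_total
    return -np.log(unfrozen_fraction) / drop_volume_L

def inp_per_litre_air(n_inp_water, cloud_water_content_g_m3=0.4):
    """Convert to L^-1 of air using an assumed cloud-water content."""
    litres_water_per_litre_air = cloud_water_content_g_m3 * 1e-6  # g m^-3 -> L water per L air
    return n_inp_water * litres_water_per_litre_air
```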

    Freezing experiment of rainwaters collected at the Nam Co Station in central part of the Tibetan Plateau

    These data describe the freezing experiments on rainwater collected on the Tibetan Plateau (TP). The dataset includes two parts: results for untreated samples and for samples after being heated to 95 °C for 10 minutes.