15 research outputs found

    REC-MV: REconstructing 3D Dynamic Cloth from Monocular Videos

    Reconstructing dynamic 3D garment surfaces with open boundaries from monocular videos is an important problem, as it provides a practical and low-cost solution for clothes digitization. Recent neural rendering methods achieve high-quality dynamic clothed human reconstruction from monocular video, but they cannot separate the garment surface from the body. Moreover, although existing garment reconstruction methods based on feature curve representation demonstrate impressive results on single images, they struggle to generate temporally consistent surfaces for video input. To address these limitations, we formulate the task as a joint optimization of 3D garment feature curves and surface reconstruction from monocular video. We introduce a novel approach, called REC-MV, to jointly optimize the explicit feature curves and the implicit signed distance field (SDF) of the garments. The open garment meshes can then be extracted via garment template registration in the canonical space. Experiments on multiple casually captured datasets show that our approach outperforms existing methods and can produce high-quality dynamic garment surfaces. The source code is available at https://github.com/GAP-LAB-CUHK-SZ/REC-MV. Comment: CVPR 2023; Project Page: https://lingtengqiu.github.io/2023/REC-MV

    SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Scene Segmentation

    We introduce SAMPro3D for zero-shot 3D indoor scene segmentation. Given a 3D point cloud and multiple posed 2D frames of a 3D scene, our approach segments the scene by applying the pretrained Segment Anything Model (SAM) to the 2D frames. Our key idea is to locate 3D points in the scene as natural 3D prompts and align their projected pixel prompts across frames, ensuring frame consistency in both the pixel prompts and their SAM-predicted masks. Moreover, we filter out low-quality 3D prompts based on feedback from all 2D frames to enhance segmentation quality. We also consolidate different 3D prompts that segment the same object, yielding more comprehensive segmentation. Notably, our method requires no additional training on domain-specific data, preserving the zero-shot power of SAM. Extensive qualitative and quantitative results show that our method consistently achieves higher-quality and more diverse segmentation than previous zero-shot or fully supervised approaches, and in many cases even surpasses human-level annotations. The project page can be accessed at https://mutianxu.github.io/sampro3d/.
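The frame-aligned pixel prompts described above come from ordinary pinhole projection of candidate 3D points into each posed frame. A minimal sketch of that projection step, assuming world-space points, per-frame intrinsics, and a world-to-camera extrinsic (function and parameter names are illustrative, not from the SAMPro3D codebase):

```python
import numpy as np

def project_prompts(points_w, K, T_wc):
    """Project 3D prompt points (N, 3) in world coords into one posed frame.

    K: (3, 3) camera intrinsics; T_wc: (4, 4) world-to-camera extrinsic.
    Returns (N, 2) pixel prompts and a visibility mask (in front of camera).
    """
    pts_h = np.hstack([points_w, np.ones((len(points_w), 1))])  # homogeneous
    pts_c = (T_wc @ pts_h.T).T[:, :3]                           # camera coords
    in_front = pts_c[:, 2] > 1e-6
    uv = (K @ pts_c.T).T
    uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-6, None)            # perspective divide
    return uv, in_front
```

Running SAM on the projected prompts is omitted; the point is that one 3D location yields mutually consistent pixel prompts across all frames.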

    Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks

    Despite the rapid advancement of unsupervised learning in visual representation, it requires training on large-scale datasets whose collection is costly and raises data-privacy concerns. Recently, synthetic images generated by text-to-image diffusion models have shown great potential for image recognition. Although promising, unsupervised learning on diffusion-generated images remains under-explored. To address this, we start by uncovering that diffusion models' cross-attention layers inherently provide annotation-free attention masks aligned with the corresponding text inputs on generated images. We then investigate the problems of three prevalent unsupervised learning techniques (i.e., contrastive learning, masked modeling, and vision-language pretraining) and introduce customized solutions that fully exploit these free attention masks. Our approach is validated through extensive experiments showing consistent improvements over baseline models across various downstream tasks, including image classification, detection, segmentation, and image-text retrieval. Our method helps close the performance gap between unsupervised pretraining on synthetic data and on real-world data.
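The "free attention mask" idea can be illustrated with a toy sketch: average one text token's cross-attention map over heads, normalize it, and threshold it into a binary mask. This is a deliberate simplification (real diffusion cross-attention spans many layers and timesteps); all names and the thresholding scheme are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def attention_to_mask(attn, token_idx, hw, thresh=0.5):
    """Turn cross-attention weights into a binary mask for one text token.

    attn: (heads, H*W, tokens) attention weights from a cross-attention layer.
    token_idx: index of the text token of interest.
    hw: (H, W) spatial size of the attention map.
    """
    amap = attn[:, :, token_idx].mean(axis=0)                    # average heads
    amap = (amap - amap.min()) / (amap.max() - amap.min() + 1e-8)  # to [0, 1]
    return (amap >= thresh).reshape(hw)                          # binary mask
```

Such masks come for free with each generated image, which is what lets them substitute for human annotation in the three pretraining setups above.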

    MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency

    Masked Modeling (MM) has demonstrated widespread success in various vision challenges by reconstructing masked visual patches. Yet applying MM to large-scale 3D scenes remains an open problem due to data sparsity and scene complexity. The conventional random masking paradigm used for 2D images often causes high ambiguity when recovering masked regions of 3D scenes. To this end, we propose a novel informative-preserved reconstruction, which exploits local statistics to discover and preserve representative structured points, effectively enhancing the masking pretext task for 3D scene understanding. Integrated with a progressive reconstruction scheme, our method can concentrate on modeling regional geometry with less ambiguity in masked reconstruction. Moreover, scenes with progressive masking ratios can also serve to self-distill their intrinsic spatial consistency, requiring the model to learn consistent representations from unmasked areas. By combining informative-preserved reconstruction on masked areas with consistency self-distillation from unmasked areas, we obtain a unified framework called MM-3DScene. We conduct comprehensive experiments on a host of downstream tasks. The consistent improvements (e.g., +6.1 mAP@0.5 on object detection and +2.2% mIoU on semantic segmentation) demonstrate the superiority of our approach.
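Informative-preserved masking selects structured points via local statistics rather than masking uniformly at random. As a rough, hypothetical illustration (this simple score is not the authors' actual statistic), one can rank each point by its distance to the centroid of its k nearest neighbors, so that points on edges and corners score highly and are preserved from masking:

```python
import numpy as np

def informative_mask(points, k=8, keep_ratio=0.3):
    """Keep the most 'structured' points of an (N, 3) cloud visible.

    Score = distance from each point to the centroid of its k nearest
    neighbors (a crude stand-in for the paper's local statistics).
    Returns a boolean mask: True = preserved (visible), False = masked.
    """
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)  # pairwise
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]                        # k neighbors
    score = np.linalg.norm(points - points[nn].mean(axis=1), axis=1)
    n_keep = int(round(len(points) * keep_ratio))
    keep = np.argsort(score)[-n_keep:]                             # top scores
    mask = np.zeros(len(points), dtype=bool)
    mask[keep] = True
    return mask
```

The brute-force pairwise distance matrix is only for clarity; a real scene would use a spatial index.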

    An experimental validation method on GNSS signal attenuation model in soil

    The attenuation of GNSS signals in soil is of great significance for research on using GNSS signals to measure soil moisture. In this paper, for the first time, the attenuation of BDS (BeiDou Navigation Satellite System) and GPS (Global Positioning System) signals in soil was studied through experiments. In the experimental design, the GNSS antenna was placed in the soil, and the soil thickness and moisture above the antenna were continuously varied to collect power-attenuation data for the GNSS signal. Finally, these data were used to retrieve soil moisture in order to validate the GNSS signal attenuation model. Experimental results show that soil can significantly attenuate GNSS signals: the greater the soil moisture and thickness, the more severe the attenuation. For clay-type soil with soil moisture of 0.15~0.30 cm3/cm3, the GNSS signal power is attenuated below the receiver's detection threshold once the soil thickness reaches 21 cm. Further retrieval of soil moisture based on the GNSS signal attenuation model shows that the model is more accurate when the soil thickness is at least 10 cm and the satellite elevation angle is larger than 50°. Under these conditions, the root mean square error of soil moisture retrieval using the BeiDou B1 signal and the GPS L1 signal is less than 0.04 cm3/cm3 and 0.09 cm3/cm3, respectively.
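The qualitative behaviour above (loss growing with both thickness and moisture) follows the textbook attenuation constant of a plane wave in a lossy dielectric. A sketch, assuming the GPS L1 frequency and treating the complex soil permittivity (which grows with moisture) as given; the function name and any permittivity values are illustrative, not the paper's calibrated model:

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s

def soil_attenuation_db(thickness_m, eps_real, eps_imag, freq_hz=1.57542e9):
    """One-way attenuation (dB) of a plane wave through a lossy soil layer.

    Standard attenuation constant for a lossy dielectric:
    alpha = (2*pi*f/c) * sqrt(eps'/2 * (sqrt(1 + (eps''/eps')**2) - 1)) [Np/m]
    eps_real / eps_imag: real / imaginary relative soil permittivity.
    """
    w_over_c = 2 * np.pi * freq_hz / C
    loss_tan = eps_imag / eps_real
    alpha = w_over_c * np.sqrt(eps_real / 2 * (np.sqrt(1 + loss_tan**2) - 1))
    return 8.686 * alpha * thickness_m  # nepers -> decibels
```

Wetter soil raises both permittivity components, so the loss in dB grows linearly with thickness and sharply with moisture, consistent with the undetectable-signal result at 21 cm.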

    TO-Scene: A Large-scale Dataset for Understanding 3D Tabletop Scenes

    Many basic indoor activities, such as eating or writing, are conducted on various tabletops (e.g., coffee tables, writing desks). Understanding tabletop scenes is therefore indispensable for 3D indoor scene parsing applications. Unfortunately, this demand is hard to meet by directly deploying data-driven algorithms, since 3D tabletop scenes are rarely available in current datasets. To remedy this, we introduce TO-Scene, a large-scale dataset focusing on tabletop scenes, which contains 20,740 scenes in three variants. To acquire the data, we design an efficient and scalable framework in which a crowdsourcing UI transfers CAD objects from ModelNet and ShapeNet onto tables from ScanNet; the resulting tabletop scenes are then simulated into realistic scans and annotated automatically. Furthermore, a tabletop-aware learning strategy is proposed to better perceive small-sized tabletop instances. Notably, we also provide a real scanned test set, TO-Real, to verify the practical value of TO-Scene. Experiments show that algorithms trained on TO-Scene indeed work on the realistic test data, and our tabletop-aware learning strategy greatly improves the state-of-the-art results on both 3D semantic segmentation and object detection. Dataset and code are available at https://github.com/GAP-LAB-CUHK-SZ/TO-Scene. Comment: ECCV 2022 (Oral Presentation)

    A Semi-Empirical SNR Model for Soil Moisture Retrieval Using GNSS SNR Data

    The Global Navigation Satellite System Interferometry and Reflectometry (GNSS-IR) technique for soil moisture remote sensing was studied. A semi-empirical Signal-to-Noise Ratio (SNR) model was proposed as a curve-fitting model for SNR data routinely collected by a GNSS receiver. The model aims to reconstruct the direct and reflected signals from SNR data while extracting the frequency and phase information affected by soil moisture, as proposed by K. M. Larson et al. This is achieved empirically by approximating the direct and reflected signals with a second-order and a fourth-order polynomial, respectively, based on the well-established SNR model. Compared with other models (K. M. Larson et al., T. Yang et al.), this model improves the Quality of Fit (QoF) with little prior knowledge needed and allows soil permittivity to be estimated from the reconstructed signals. In developing this model, we showed through simulations, under a bare-soil assumption, how noise affects the receiver's SNR estimation and thus the model performance. Results showed that reconstructed signals at grazing angles of 5°–15° were better suited for soil moisture retrieval. The QoF was improved by around 45%, which resulted in better estimation of the frequency and phase information, although the improvement in phase estimation was negligible. Experimental data collected at Lamasquère, France, were also used to validate the proposed model, and the results were compared with the simulation and previous works. The model was found to maintain good fitting quality even under irregular SNR variation. Additionally, the soil moisture calculated from the reconstructed signals was about 15% closer to the ground-truth measurements. A deeper analysis of the Larson model and the proposed model offers a possible explanation for this result.
Furthermore, the frequency and phase information extracted with this model were also studied for their capability to monitor soil moisture variation. Finally, phenomena such as retrieval ambiguity and error sensitivity were stated and discussed.
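The core curve-fitting idea shared by this family of models (a low-order polynomial for the direct-signal trend plus a sinusoid in sin(elevation) for the reflected component, whose frequency encodes the reflector height and whose phase tracks soil moisture) can be sketched as follows. This is a simplified stand-in using a plain polynomial detrend and a grid search, not the paper's semi-empirical model; all names are illustrative:

```python
import numpy as np

def fit_snr_residual(elev_deg, snr, poly_order=2, wavelength=0.1903):
    """Detrend SNR, then fit A*sin(4*pi*h/lam * sin(e) + phi) to the residual.

    elev_deg: satellite elevation angles (deg); snr: SNR samples (linear units).
    Returns reflector height h (m), amplitude A, and phase phi (rad).
    """
    x = np.sin(np.radians(elev_deg))
    # Low-order polynomial stands in for the direct-signal trend.
    resid = snr - np.polyval(np.polyfit(x, snr, poly_order), x)
    best = None
    for h in np.arange(0.5, 10.0, 0.01):        # candidate reflector heights
        w = 4 * np.pi * h / wavelength
        design = np.column_stack([np.sin(w * x), np.cos(w * x)])
        coef, *_ = np.linalg.lstsq(design, resid, rcond=None)
        sse = np.sum((design @ coef - resid) ** 2)
        if best is None or sse < best[0]:
            best = (sse, h, coef)
    _, h_best, (a, b) = best
    return h_best, np.hypot(a, b), np.arctan2(b, a)
```

The fitted frequency and phase are exactly the quantities whose moisture sensitivity the abstract discusses; the grid search here plays the role that spectral analysis (e.g., Lomb-Scargle) usually plays in GNSS-IR work.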

    Soil Moisture Retrieval Utilizing GNSS Interference Signal Amplitude

    A soil moisture retrieval model was developed using GNSS interference-signal amplitude, based on the interference phenomenon and the GNSS receiver's SNR estimation method. Antenna gain, soil permittivity, and noise effects were considered in this model. The AMPD algorithm was used to extract the interference peaks and valleys from the noisy normalized interference power, which were then used to retrieve soil permittivity and moisture; a simulation was performed to verify its feasibility. Results showed that soil moisture retrieval using interference valleys performed better than retrieval using peaks, that the relatively stable retrieval elevation-angle range is 5°~25°, and that the retrieved value was more accurate when moisture is larger than 0.06 cm3/cm3, with a standard deviation of around 0.01 cm3/cm3.
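The extrema-extraction step can be sketched compactly. The paper uses the AMPD (Automatic Multiscale Peak Detection) algorithm; as a stand-in, this sketch uses SciPy's find_peaks with a prominence threshold, which serves the same role of rejecting noise-induced extrema (valleys are found as peaks of the negated signal):

```python
import numpy as np
from scipy.signal import find_peaks

def interference_extrema(power, prominence=0.1):
    """Locate peaks and valleys of a noisy normalized interference-power curve.

    power: 1-D array of normalized interference power vs. elevation angle.
    Returns (peak_indices, valley_indices).
    """
    peaks, _ = find_peaks(power, prominence=prominence)
    valleys, _ = find_peaks(-power, prominence=prominence)
    return peaks, valleys
```

The recovered peak/valley positions are what the retrieval model inverts for soil permittivity and, from it, moisture.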

    HybridCap: Inertia-Aid Monocular Capture of Challenging Human Motions

    Monocular 3D motion capture (mocap) is beneficial to many applications. However, a single camera often fails to handle occlusions of different body parts, so it is limited to capturing relatively simple movements. We present a lightweight hybrid mocap technique called HybridCap that augments the camera with only four Inertial Measurement Units (IMUs) in a novel learning-and-optimization framework. We first employ a weakly supervised, hierarchical motion inference module built on cooperative pure-residual recurrent blocks that serve as limb, body, and root trackers as well as an inverse kinematics solver. Our network effectively narrows the search space of plausible motions via coarse-to-fine pose estimation and tackles challenging movements with high efficiency. We further develop a hybrid optimization scheme that combines inertial feedback and visual cues to improve tracking accuracy. Extensive experiments on various datasets demonstrate that HybridCap robustly handles challenging movements ranging from fitness actions to Latin dance. It also achieves real-time performance at up to 60 fps with state-of-the-art accuracy.