RDFC-GAN: RGB-Depth Fusion CycleGAN for Indoor Depth Completion
The raw depth images captured by indoor depth sensors usually contain extensive
regions of missing depth values due to inherent limitations, such as the inability
to perceive transparent objects and a limited distance range. Incomplete
depth maps with missing values burden many downstream vision tasks, and a
growing number of depth completion methods have been proposed to alleviate this
issue. While most existing methods can generate accurate dense depth maps from
sparse and uniformly sampled depth maps, they are not well suited to
completing large contiguous regions of missing depth values, which are common
and critical in images captured in indoor environments. To overcome these
challenges, we design a novel two-branch end-to-end fusion network named
RDFC-GAN, which takes a pair of RGB and incomplete depth images as input to
predict a dense and completed depth map. The first branch employs an
encoder-decoder structure that, adhering to the Manhattan world assumption and
using normal maps derived from RGB-D information as guidance, regresses local
dense depth values from the raw depth map. In the other branch, we propose an
RGB-depth fusion CycleGAN to transfer the RGB image to the fine-grained
textured depth map. We adopt adaptive fusion modules named W-AdaIN to propagate
the features across the two branches, and we append a confidence fusion head to
fuse the two outputs of the branches for the final depth map. Extensive
experiments on NYU-Depth V2 and SUN RGB-D demonstrate that our proposed method
clearly improves the depth completion performance, especially in a more
realistic setting of indoor environments, with the help of our proposed pseudo
depth maps in training.
Comment: Haowen Wang and Zhengping Che contributed equally. Under review. An earlier version was accepted by CVPR 2022 (arXiv:2203.10856).
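The abstract describes two fusion mechanisms: AdaIN-style feature propagation between branches (W-AdaIN) and a confidence head that blends the two depth predictions. The paper's exact formulations are not given here, so the sketch below shows only the classic AdaIN statistic-alignment step and a generic confidence-weighted blend; all array shapes and names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    # Align the channel-wise mean/std of `content` features (C, H, W)
    # to those of `style` features. The paper's W-AdaIN adds learned
    # weighting on top of this basic operation.
    c_mu = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True)
    s_mu = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    return s_std * (content - c_mu) / (c_std + eps) + s_mu

def confidence_fusion(depth_a, depth_b, conf_a, conf_b, eps=1e-8):
    # Pixel-wise convex combination of the two branch outputs,
    # weighted by (hypothetical) per-pixel confidence maps.
    w = conf_a / (conf_a + conf_b + eps)
    return w * depth_a + (1.0 - w) * depth_b
```

With equal confidence everywhere, the fusion reduces to a simple average; where one branch is more trusted (e.g. the CycleGAN branch inside a large missing region), its prediction dominates.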
DTF-Net: Category-Level Pose Estimation and Shape Reconstruction via Deformable Template Field
Estimating 6D poses and reconstructing 3D shapes of objects in open-world
scenes from RGB-depth image pairs is challenging. Many existing methods rely on
learning geometric features that correspond to specific templates while
disregarding shape variations and pose differences among objects in the same
category. As a result, these methods underperform when handling unseen object
instances in complex environments. In contrast, other approaches aim to achieve
category-level estimation and reconstruction by leveraging normalized geometric
structure priors, but the static prior-based reconstruction struggles with
substantial intra-class variations. To solve these problems, we propose the
DTF-Net, a novel framework for pose estimation and shape reconstruction based
on implicit neural fields of object categories. In DTF-Net, we design a
deformable template field to represent the general category-wise shape latent
features and intra-category geometric deformation features. The field
establishes continuous shape correspondences, deforming the category template
into arbitrary observed instances to accomplish shape reconstruction. We
introduce a pose regression module that shares the deformation features and
template codes from the fields to estimate the accurate 6D pose of each object
in the scene. We integrate a multi-modal representation extraction module to
extract object features and semantic masks, enabling end-to-end inference.
Moreover, during training, we implement a shape-invariant training strategy and
a viewpoint sampling method to further enhance the model's capability to
extract object pose features. Extensive experiments on the REAL275 and CAMERA25
datasets demonstrate the superiority of DTF-Net in both synthetic and real
scenes. Furthermore, we show that DTF-Net effectively supports grasping tasks
with a real robot arm.
Comment: The first two authors contributed equally. Paper accepted by ACM MM 202
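The core idea of the deformable template field is to represent each instance by warping query points into a shared category template and evaluating that template's implicit field. The toy sketch below uses a unit-sphere signed distance function as a stand-in template and a hand-rolled latent-conditioned offset in place of DTF-Net's learned deformation network; every function and shape here is an illustrative assumption.

```python
import numpy as np

def template_sdf(points):
    # Hypothetical category template: a unit-sphere signed distance field.
    # `points` has shape (N, 3); returns signed distances of shape (N,).
    return np.linalg.norm(points, axis=-1) - 1.0

def deform(points, latent, scale=0.1):
    # Toy latent-conditioned offset standing in for the learned
    # intra-category deformation field; `latent` is a (3, 3) code matrix.
    return points + scale * np.tanh(points @ latent)

def instance_sdf(points, latent):
    # Evaluate an instance shape by warping observed query points into
    # template space, then querying the shared template field.
    return template_sdf(deform(points, latent))
```

A zero latent code leaves points unwarped, so the instance reduces to the template itself; varying the code yields a family of deformed shapes with continuous correspondences to the template.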
Edge-Assisted Distributed DNN Collaborative Computing Approach for Mobile Web Augmented Reality in 5G Networks
Web-based DNNs provide accurate object recognition for mobile Web AR, which is newly emerging as a lightweight mobile AR solution, and are attracting a great deal of attention. However, balancing the UX against the computing cost of DNN-based object recognition on the Web is difficult for both self-contained and cloud-based offloading approaches, as it is a latency-sensitive service with high computing and networking requirements. Fortunately, emerging 5G networks promise not only bandwidth and latency improvements but also the pervasive deployment of edge servers closer to users. In this article, we propose the first edge-based collaborative object recognition solution for mobile Web AR in the 5G era. First, we explore fine-grained and adaptive DNN partitioning for collaboration among the cloud, the edge, and the mobile Web browser. Second, we propose a differentiated DNN computation scheduling approach designed specifically for the edge platform. On one hand, performing part of the DNN computation on the mobile Web without degrading the UX (i.e., keeping response latency below a specific threshold) effectively reduces the computing cost of the cloud system; on the other hand, performing the remaining DNN computation on the cloud (both remote and edge) improves inference latency, and thus UX, compared to the self-contained solution. Our collaborative solution therefore balances the interests of both users and service providers. Experiments conducted in a real deployed 5G trial network show the superiority of our proposed collaborative solution.
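The partitioning idea above can be sketched as a simple search over split points: run the first k layers in the browser, ship the activation over the uplink, and finish on the edge, subject to a device-side latency budget. The per-layer timings, activation sizes, and the budget below are hypothetical numbers, and this exhaustive search is only a minimal illustration of the latency/cost trade-off, not the paper's scheduling algorithm.

```python
def best_split(device_ms, edge_ms, activation_kb, uplink_kbps, budget_ms=None):
    """Pick a split index k: layers [0, k) run in the browser, [k, n) on the edge.

    device_ms[i] / edge_ms[i]: per-layer latency (ms) on each platform.
    activation_kb[k]: size of the tensor sent when splitting at k
    (activation_kb[0] is the raw input). All numbers are hypothetical.
    """
    n = len(device_ms)
    best_k, best_total = 0, float("inf")
    for k in range(n + 1):
        local = sum(device_ms[:k])              # browser-side compute
        if budget_ms is not None and local > budget_ms:
            break                               # UX latency threshold exceeded
        transfer = activation_kb[k] / uplink_kbps * 1000.0  # uplink time in ms
        remote = sum(edge_ms[k:])               # edge-side compute
        total = local + transfer + remote
        if total < best_total:
            best_k, best_total = k, total
    return best_k, best_total
```

With a fast uplink and a capable edge, the search tends to ship work early; tightening the device budget forces earlier splits and shifts cost onto the edge, which is exactly the tension the scheduling approach has to manage.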
Toward holographic video communications: a promising AI-driven solution
Real-time holographic video communications enable immersive experiences for next-generation video services in the future metaverse era. However, high-fidelity holographic videos require high bandwidth and significant computation resources, which exceed the transmission and computing capacity of 5G networks. This article reviews state-of-the-art holographic point cloud video transmission techniques and highlights the critical challenges of delivering such immersive services. We further implement a preliminary prototype of an AI-driven holographic video communication system and present key experimental results to evaluate its performance. Finally, we identify future research directions and discuss potential solutions for providing real-time, high-quality holographic experiences.
Nano-Montmorillonite Regulated Crystallization of Hierarchical Strontium Carbonate in a Microbial Mineralization System
In this paper, nano-montmorillonite (nano-MMT) was introduced into a microbial mineralization system for strontium carbonate (SrCO3). The mineralization mechanism was studied by varying the nano-MMT concentration and the mineralization time. SrCO3 superstructures with complex forms were obtained with nano-MMT acting as a crystal growth regulator. At low nano-MMT concentrations, a cross-shaped SrCO3 superstructure was obtained. As the concentration increased, flower-like SrCO3 crystals formed via dissolution and recrystallization processes. A self-assembly and crystal polymerization mechanism is proposed for the formation of complex flower-like SrCO3 superstructures at high nano-MMT concentrations. These results indicate that such biomimetic synthesis strategies in microbial systems not only provide a useful route for producing inorganic or inorganic/organic composites with novel morphologies and unique structures but also offer new ideas for the treatment of radionuclides.