Self-Supervised Pre-training for 3D Point Clouds via View-Specific Point-to-Image Translation
The past few years have witnessed the great success and prevalence of
self-supervised representation learning within the language and 2D vision
communities. However, such advancements have not been fully migrated to the
field of 3D point cloud learning. Unlike existing pre-training paradigms for
deep point cloud feature extractors, which fall within the scope of generative
modeling or contrastive learning, this paper proposes a translative
pre-training framework, namely PointVST, driven by a novel self-supervised
pretext task: cross-modal translation from 3D point clouds to their
corresponding diverse forms of 2D rendered images. More specifically, we
begin with deducing view-conditioned point-wise embeddings through the
insertion of the viewpoint indicator, and then adaptively aggregate a
view-specific global codeword, which can be further fed into subsequent 2D
convolutional translation heads for image generation. Extensive experimental
evaluations on various downstream task scenarios demonstrate that our PointVST
consistently and markedly outperforms current state-of-the-art approaches and
also exhibits satisfactory domain transfer capability. Our code will be
publicly available at https://github.com/keeganhk/PointVST.
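The view-conditioned aggregation described above can be sketched as a toy plain-Python routine; the attention-style pooling, feature sizes, and weight vector here are illustrative assumptions, not the paper's actual implementation:

```python
import math
import random

random.seed(0)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def view_specific_codeword(point_feats, view_vec, score_w):
    """Fuse a viewpoint indicator into each point-wise embedding, then
    adaptively pool a single view-specific global codeword."""
    # 1) view-conditioned point-wise embeddings: append the viewpoint indicator
    conditioned = [f + view_vec for f in point_feats]
    # 2) adaptive aggregation: attention-weighted sum over all points
    scores = [sum(w * x for w, x in zip(score_w, f)) for f in conditioned]
    attn = softmax(scores)
    dim = len(conditioned[0])
    return [sum(a * f[d] for a, f in zip(attn, conditioned))
            for d in range(dim)]

feats = [[random.random() for _ in range(4)] for _ in range(8)]  # 8 points, 4-D
view = [0.0, 1.0, 0.0]                          # e.g. a unit view direction
w = [random.random() for _ in range(7)]         # 4 feature dims + 3 view dims
code = view_specific_codeword(feats, view, w)
```

In the actual framework, such a codeword would then be fed into the 2D convolutional translation heads to render the view-specific image.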
PointMCD: Boosting Deep Point Cloud Encoders via Multi-view Cross-modal Distillation for 3D Shape Recognition
As two fundamental representation modalities of 3D objects, 3D point clouds
and multi-view 2D images record shape information from different domains of
geometric structures and visual appearances. In the current deep learning era,
remarkable progress in processing such two data modalities has been achieved
through respectively customizing compatible 3D and 2D network architectures.
However, unlike multi-view image-based 2D visual modeling paradigms, which have
shown leading performance in several common 3D shape recognition benchmarks,
point cloud-based 3D geometric modeling paradigms are still highly limited by
insufficient learning capacity, due to the difficulty of extracting
discriminative features from irregular geometric signals. In this paper, we
explore the possibility of boosting deep 3D point cloud encoders by
transferring visual knowledge extracted from deep 2D image encoders under a
standard teacher-student distillation workflow. Generally, we propose PointMCD,
a unified multi-view cross-modal distillation architecture, including a
pretrained deep image encoder as the teacher and a deep point encoder as the
student. To perform heterogeneous feature alignment between 2D visual and 3D
geometric domains, we further investigate visibility-aware feature projection
(VAFP), by which point-wise embeddings are reasonably aggregated into
view-specific geometric descriptors. By pair-wise alignment of multi-view
visual and geometric descriptors, we can obtain more powerful deep point
encoders without exhaustive and complicated network modifications. Experiments
on 3D shape classification, part segmentation, and unsupervised learning
strongly validate the effectiveness of our method. The code and data will be
publicly available at https://github.com/keeganhk/PointMCD.
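As a rough illustration of visibility-aware feature projection (VAFP), the sketch below pools only the points visible in each view into a view-specific descriptor and aligns student and teacher descriptors with a simple MSE; the masking scheme and loss are simplified assumptions, not the exact released method:

```python
def vafp(point_feats, visibility):
    """For each view, pool only the points visible from that viewpoint
    into one view-specific geometric descriptor (mean pooling here)."""
    dim = len(point_feats[0])
    descriptors = []
    for mask in visibility:  # visibility[v][i]: is point i seen in view v?
        visible = [f for f, m in zip(point_feats, mask) if m]
        if not visible:                      # degenerate view: nothing visible
            descriptors.append([0.0] * dim)
            continue
        descriptors.append([sum(f[d] for f in visible) / len(visible)
                            for d in range(dim)])
    return descriptors

def alignment_loss(student_desc, teacher_desc):
    """Pair-wise MSE between student (point) and teacher (image) descriptors."""
    total, count = 0.0, 0
    for s, t in zip(student_desc, teacher_desc):
        for a, b in zip(s, t):
            total += (a - b) ** 2
            count += 1
    return total / count

desc = vafp([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
            [[True, True, False], [False, False, True]])
loss = alignment_loss(desc, desc)  # zero when student matches teacher
```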
Bidirectional Propagation for Cross-Modal 3D Object Detection
Recent works have revealed the superiority of feature-level fusion for
cross-modal 3D object detection, where fine-grained feature propagation from 2D
image pixels to 3D LiDAR points has been widely adopted for performance
improvement. Still, the potential of heterogeneous feature propagation between
2D and 3D domains has not been fully explored. In this paper, in contrast to
existing pixel-to-point feature propagation, we investigate an opposite
point-to-pixel direction, allowing point-wise features to flow inversely into
the 2D image branch. Thus, when jointly optimizing the 2D and 3D streams, the
gradients back-propagated from the 2D image branch can boost the representation
ability of the 3D backbone network working on LiDAR point clouds. Then,
combining pixel-to-point and point-to-pixel information flow mechanisms, we
construct a bidirectional feature propagation framework, dubbed BiProDet. In
addition to the architectural design, we also propose normalized local
coordinates map estimation, a new 2D auxiliary task for the training of the 2D
image branch, which facilitates learning local spatial-aware features from the
image modality and implicitly enhances the overall 3D detection performance.
Extensive experiments and ablation studies validate the effectiveness of our
method. Notably, we achieve a leading rank on the highly competitive KITTI
benchmark for the cyclist class at the time of submission. The source code is
available at https://github.com/Eaphan/BiProDet.
Comment: Accepted by ICLR 2023.
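The point-to-pixel direction described above amounts to scattering point-wise features onto the image grid at their projected pixel locations; this plain-Python sketch (averaging collisions, with hypothetical shapes) is an illustrative assumption, not the released implementation:

```python
def point_to_pixel(point_feats, pixel_uv, height, width):
    """Scatter LiDAR point features onto the 2D image grid (point-to-pixel
    flow, the opposite of the usual pixel-to-point propagation). Points
    projecting to the same pixel are averaged."""
    dim = len(point_feats[0])
    grid = [[[0.0] * dim for _ in range(width)] for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for f, (u, v) in zip(point_feats, pixel_uv):
        if 0 <= v < height and 0 <= u < width:   # skip points outside the image
            count[v][u] += 1
            for d in range(dim):
                grid[v][u][d] += f[d]
    for v in range(height):
        for u in range(width):
            if count[v][u]:
                grid[v][u] = [x / count[v][u] for x in grid[v][u]]
    return grid

# Two points land on pixel (0, 0) and are averaged; one lands on (2, 1).
grid = point_to_pixel([[1.0], [3.0], [7.0]], [(0, 0), (0, 0), (2, 1)], 2, 3)
```

When the joint 2D/3D optimization backpropagates through such a scatter, gradients from the image branch reach the 3D backbone, which is the mechanism the abstract credits for the improved LiDAR representation.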
GLENet: Boosting 3D Object Detectors with Generative Label Uncertainty Estimation
The inherent ambiguity in ground-truth annotations of 3D bounding boxes
caused by occlusions, signal missing, or manual annotation errors can confuse
deep 3D object detectors during training, thus deteriorating the detection
accuracy. However, existing methods overlook such issues to some extent and
treat the labels as deterministic. In this paper, we formulate the label
uncertainty problem as the diversity of potentially plausible bounding boxes of
objects, then propose GLENet, a generative framework adapted from conditional
variational autoencoders, to model the one-to-many relationship between a
typical 3D object and its potential ground-truth bounding boxes with latent
variables. The label uncertainty generated by GLENet is a plug-and-play module
and can be conveniently integrated into existing deep 3D detectors to build
probabilistic detectors and supervise the learning of the localization
uncertainty. Besides, we propose an uncertainty-aware quality estimator
architecture in probabilistic detectors to guide the training of IoU-branch
with predicted localization uncertainty. We incorporate the proposed methods
into various popular base 3D detectors and demonstrate significant and
consistent performance gains on both KITTI and Waymo benchmark datasets.
In particular, the proposed GLENet-VR outperforms all published LiDAR-based
approaches by a large margin and achieves a leading rank among single-modal
methods on the challenging KITTI test set. We will make the source code and
pre-trained models publicly available.
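Conceptually, modeling label uncertainty as the diversity of plausible boxes can be mimicked by sampling latent variables, decoding each into a box, and taking the per-dimension variance; the toy decoder below is a stand-in assumption for GLENet's learned CVAE decoder:

```python
import random
import statistics

random.seed(0)

def decode_box(object_feat, z):
    """Stand-in for the CVAE decoder: maps (object feature, latent) to a
    box. A real decoder is a learned network; this toy version just
    perturbs a base box so the sampling logic is visible."""
    return [c + 0.1 * z * (i + 1) for i, c in enumerate(object_feat)]

def label_uncertainty(object_feat, n_samples=500):
    """Sample plausible boxes via latents; the per-dimension variance is
    the label uncertainty used to supervise a probabilistic detector."""
    samples = [decode_box(object_feat, random.gauss(0.0, 1.0))
               for _ in range(n_samples)]
    dims = len(object_feat)
    return [statistics.pvariance([s[d] for s in samples]) for d in range(dims)]

unc = label_uncertainty([1.0, 2.0, 3.0])
```

In the full method this per-box variance plugs into existing detectors as localization-uncertainty supervision and feeds the uncertainty-aware IoU-branch estimator.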
Multilevel Perception Boundary-guided Network for Breast Lesion Segmentation in Ultrasound Images
Automatic segmentation of breast tumors from ultrasound images is essential
for subsequent clinical diagnosis and treatment planning. Although existing
deep learning-based methods have achieved significant progress in automatic
breast tumor segmentation, their performance on tumors with intensity similar
to that of normal tissue is still unsatisfactory, especially at the tumor
boundaries. To address this issue, we propose PBNet, composed of a multilevel
global perception module (MGPM) and a boundary-guided module (BGM), to segment
breast tumors from ultrasound images. Specifically, in MGPM, the long-range
spatial dependencies between voxels in single-level feature maps are modeled,
and the multilevel semantic information is then fused to improve the model's
ability to recognize non-enhanced tumors. In BGM, tumor boundaries are
extracted from the high-level semantic maps using the dilation and erosion
effects of max pooling; these boundaries are then used to guide the fusion of
low- and high-level features. Moreover, to improve segmentation performance at
tumor boundaries, a multilevel boundary-enhanced segmentation (BS) loss is
proposed. Extensive comparison experiments on both a publicly available
dataset and an in-house dataset demonstrate that the proposed PBNet
outperforms state-of-the-art methods in terms of both qualitative
visualization results and quantitative evaluation metrics, with the Dice
score, Jaccard coefficient, specificity, and HD95 improved by 0.70%, 1.1%,
0.1%, and 2.5%, respectively. In addition, ablation experiments validate that
the proposed MGPM is indeed beneficial for distinguishing non-enhanced tumors,
and that the BGM and the BS loss are also helpful for refining the
segmentation contours of the tumor.
Comment: 12 pages, 5 figures
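The boundary extraction via the dilation and erosion effects of max pooling can be illustrated on a binary mask; this plain-Python 3x3 sketch is a simplified assumption (real models apply pooling layers to soft probability maps):

```python
def pool(mask, op):
    """Apply op (max = dilation, min = erosion) over a 3x3 neighborhood,
    clipping the window at image borders."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [mask[ii][jj]
                      for ii in range(max(0, i - 1), min(h, i + 2))
                      for jj in range(max(0, j - 1), min(w, j + 2))]
            out[i][j] = op(window)
    return out

def boundary(mask):
    """Boundary = dilation minus erosion of the mask; nonzero exactly on
    the band around the region border."""
    dilated = pool(mask, max)
    eroded = pool(mask, min)
    h, w = len(mask), len(mask[0])
    return [[dilated[i][j] - eroded[i][j] for j in range(w)] for i in range(h)]

# A 3x3 block of ones centered in a 5x5 mask.
mask = [[0] * 5 for _ in range(5)]
for i in range(1, 4):
    for j in range(1, 4):
        mask[i][j] = 1
b = boundary(mask)
```

Only the deep interior of the block survives the erosion, so the difference map lights up the boundary band, which is the signal used to guide the fusion of low- and high-level features.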
NeuroGF: A Neural Representation for Fast Geodesic Distance and Path Queries
Geodesics are essential in many geometry processing applications. However,
traditional algorithms for computing geodesic distances and paths on 3D mesh
models are often inefficient and slow. This makes them impractical for
scenarios that require extensive querying of arbitrary point-to-point
geodesics. Although neural implicit representations have emerged as a popular
way of representing 3D shape geometries, there is still no research on
representing geodesics with deep implicit functions. To bridge this gap, this
paper presents the first attempt to represent geodesics on 3D mesh models using
neural implicit functions. Specifically, we introduce neural geodesic fields
(NeuroGFs), which are learned to represent the all-pairs geodesics of a given
mesh. By using NeuroGFs, we can efficiently and accurately answer queries of
arbitrary point-to-point geodesic distances and paths, overcoming the
limitations of traditional algorithms. Evaluations on common 3D models show
that NeuroGFs exhibit exceptional performance in solving the single-source
all-destination (SSAD) and point-to-point geodesics, and achieve high accuracy
consistently. Moreover, NeuroGFs offer the unique advantage of encoding both 3D
geometry and geodesics in a unified representation. Code is made available at
https://github.com/keeganhk/NeuroGF/tree/master
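A NeuroGF-style query interface can be sketched as follows; `geodesic_net` is a stand-in that returns straight-line distance, whereas a trained neural geodesic field would return the geodesic distance along the mesh surface, and the symmetrization wrapper is an illustrative assumption rather than the paper's design:

```python
import math

def geodesic_net(p, q):
    """Stand-in for the learned implicit function over point pairs. A
    trained NeuroGF would answer with the on-surface geodesic distance;
    here we return Euclidean distance so the sketch runs."""
    return math.dist(p, q)

def query_geodesic(p, q):
    """Point-to-point geodesic query. Averaging the two call orders and
    pinning d(p, p) = 0 are cheap guarantees a raw network lacks."""
    if p == q:
        return 0.0
    return 0.5 * (geodesic_net(p, q) + geodesic_net(q, p))
```

The appeal of such a representation is that every query is a constant-cost network evaluation, instead of re-running a mesh-based shortest-path solver per source point.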
A new smart mobile system for chronic wound care management
Nonhealing wounds pose a major challenge in clinical medicine. Typical chronic wounds, such as diabetic foot ulcers and venous leg ulcers, cause substantial difficulties for millions of patients around the world. The management of chronic wound care remains challenging in terms of precise wound size measurement, comprehensive wound assessment, timely wound healing monitoring, and efficient wound case management. Despite the rapid progress of digital health technologies in recent years, practical smart wound care management systems have yet to be developed. One of the main difficulties is in-depth communication and interaction with nurses and doctors throughout the complex wound care process. This paper presents a systematic approach to the user-centered design and development of a new smart mobile system for chronic wound care management that supports the nurses' task flow and meets the requirements for the care of different types of wounds in both clinics and hospital wards. A system evaluation and satisfaction review was carried out with a group of ten nurses from various clinical departments after they had used the system for over one month. The survey results demonstrated the high effectiveness and usability of the smart mobile system for chronic wound care management, in contrast to the traditional pen-and-paper approach, in busy clinical contexts.
A study on the appropriate dose of rocuronium for intraoperative neuromonitoring in Da Vinci robot thyroid surgery: a randomized, double-blind, controlled trial
Background: This study explored the effect of different doses of rocuronium bromide on neuromonitoring during Da Vinci robot thyroid surgery.
Methods: This was a prospective, randomized, double-blind, controlled trial that included 189 patients who underwent Da Vinci robot thyroidectomy with intraoperative neuromonitoring (IONM). Patients were randomly divided into three groups and given three different doses of rocuronium (0.3 mg/kg, 0.6 mg/kg, 0.9 mg/kg). Outcome measurements included the IONM evoked potential, postoperative Voice Handicap Index-30 (VHI-30), intraoperative body movement incidence rate, Cooper score, and hemodynamic changes during anesthesia induction.
Results: The difference in IONM evoked potentials at various time points between the three groups was not statistically significant (P>0.05). The differences in Cooper scores and intraoperative body movement incidence rates between the 0.6 and 0.9 mg/kg groups and the 0.3 mg/kg group were statistically significant (both P<0.001). There was no statistically significant difference in VHI-30 scores or hemodynamic changes during anesthesia induction among the three groups (both P>0.05).
Conclusions: For patients undergoing Da Vinci robot thyroidectomy, a single dose of rocuronium at 0.6 or 0.9 mg/kg during anesthesia induction can provide a stable IONM evoked potential. Additionally, compared to 0.3 mg/kg, it can offer better tracheal intubation conditions and a lower incidence of body movement during surgery. It is worth noting that the use of higher doses of rocuronium should be adjusted based on the duration of IONM and local practice.