Hinge-Wasserstein: Mitigating Overconfidence in Regression by Classification
Modern deep neural networks are prone to being overconfident despite their
drastically improved performance. In ambiguous or even unpredictable real-world
scenarios, this overconfidence can pose a major risk to the safety of
applications. For regression tasks, the regression-by-classification approach
has the potential to alleviate these ambiguities by instead predicting a
discrete probability density over the desired output. However, a density
estimator still tends to be overconfident when trained with the common NLL
loss. To mitigate the overconfidence problem, we propose a loss function,
hinge-Wasserstein, based on the Wasserstein Distance. This loss significantly
improves the quality of both aleatoric and epistemic uncertainty, compared to
previous work. We demonstrate the capabilities of the new loss on a synthetic
dataset, where both types of uncertainty are controlled separately. Moreover,
as a demonstration for real-world scenarios, we evaluate our approach on the
benchmark dataset Horizon Lines in the Wild. On this benchmark, using the
hinge-Wasserstein loss reduces the Area Under Sparsification Error (AUSE) for
the horizon parameters slope and offset by 30.47% and 65.00%, respectively.
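
As a rough illustration of the distance this loss builds on, the sketch below computes the Wasserstein-1 distance between two discrete 1D distributions defined over the same bins, using the standard identity that in one dimension it equals the summed absolute difference of the two cumulative distributions. The abstract does not describe the hinge term, so none is included here; the function name `wasserstein_1d` and the `bin_width` parameter are hypothetical names chosen for this example, not the authors' code.

```python
from itertools import accumulate


def wasserstein_1d(p, q, bin_width=1.0):
    """Wasserstein-1 distance between two histograms on the same bins.

    Hypothetical helper for illustration only. Both inputs are assumed
    to be non-negative and to sum to 1.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    # Cumulative distribution functions of both histograms.
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    # In 1D, W1 is the area between the two CDFs.
    return bin_width * sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


if __name__ == "__main__":
    target = [0.0, 0.0, 1.0]      # all mass in the last bin
    near_miss = [0.0, 1.0, 0.0]   # one bin away from the target
    far_miss = [1.0, 0.0, 0.0]    # two bins away from the target
    print(wasserstein_1d(near_miss, target))
    print(wasserstein_1d(far_miss, target))
```

Unlike a per-bin NLL, this distance grows with how far the predicted mass lies from the target bin, which is one plausible reason to prefer it for regression by classification.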
Towards Safer Robot-Assisted Surgery: A Markerless Augmented Reality Framework
Robot-assisted surgery is rapidly developing in the medical field, and the
integration of augmented reality shows the potential to improve surgeons'
operating performance by providing more visual information. In this paper, we
propose a markerless augmented reality framework to enhance safety by avoiding
intra-operative bleeding, a high risk caused by collisions between
the surgical instruments and the blood vessel. Advanced stereo reconstruction
and segmentation networks are compared to find out the best combination to
reconstruct the intra-operative blood vessel in the 3D space for the
registration of the pre-operative model, and the minimum distance detection
between the instruments and the blood vessel is implemented. A robot-assisted
lymphadenectomy is simulated on the da Vinci Research Kit in a dry lab, and ten
human subjects performed this operation to explore the usability of the
proposed framework. The results show that the augmented reality framework can
help users avoid dangerous collisions between the instruments and the
blood vessel without introducing an extra load. It provides a flexible
framework that integrates augmented reality into the medical robot platform to
enhance safety during the operation.
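
The minimum distance detection mentioned above can be pictured with a brute-force sketch over two 3D point sets. This is only an assumption about how such a check might look: the abstract does not say how the instruments or the vessel are represented, and the names `min_distance`, `is_too_close`, and the `safety_margin` parameter are hypothetical.

```python
import math


def min_distance(instrument_pts, vessel_pts):
    """Smallest Euclidean distance between any instrument point and any
    vessel point. Brute force, O(n * m); hypothetical helper."""
    if not instrument_pts or not vessel_pts:
        raise ValueError("both point sets must be non-empty")
    return min(math.dist(a, b) for a in instrument_pts for b in vessel_pts)


def is_too_close(instrument_pts, vessel_pts, safety_margin):
    """True when the instrument is within safety_margin of the vessel.
    The margin and its units are assumptions made for this example."""
    return min_distance(instrument_pts, vessel_pts) < safety_margin


if __name__ == "__main__":
    instrument = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    vessel = [(1.0, 0.0, 3.0), (5.0, 5.0, 5.0)]
    print(min_distance(instrument, vessel))
    print(is_too_close(instrument, vessel, safety_margin=5.0))
```

For dense reconstructions a spatial index such as a k-d tree would be a more practical choice than the nested loop shown here.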
DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors
Camera-based 3D object detectors are attractive because cameras are more widely
deployed and cheaper than LiDAR sensors. We revisit the prior stereo model DSGN
and its stereo volume construction for representing both 3D geometry and
semantics. We refine the stereo modeling and propose our approach, DSGN++,
which aims to improve information flow throughout the 2D-to-3D pipeline in the
following three main aspects. First, to effectively lift the 2D information to
stereo volume, we propose depth-wise plane sweeping (DPS) that allows denser
connections and extracts depth-guided features. Second, to better capture
differently spaced features, we present a novel stereo volume -- Dual-view
Stereo Volume (DSV) that integrates front-view and top-view features and
reconstructs sub-voxel depth in the camera frustum. Third, as the foreground
region becomes less dominant in 3D space, we are the first to propose a multi-modal data
editing strategy -- Stereo-LiDAR Copy-Paste, which ensures cross-modal
alignment and improves data efficiency. Without bells and whistles, extensive
experiments in various modality setups on the popular KITTI benchmark show that
our method consistently outperforms other camera-based 3D detectors for all
categories. Code will be released at https://github.com/chenyilun95/DSGN2
DDL-MVS: Depth Discontinuity Learning for MVS Networks
Traditional multi-view stereo (MVS) methods have good accuracy but struggle with
completeness, while recently developed learning-based MVS techniques have
improved completeness at the cost of accuracy. We propose depth
discontinuity learning for MVS methods, which further improves accuracy while
retaining the completeness of the reconstruction. Our idea is to jointly
estimate the depth and boundary maps where the boundary maps are explicitly
used for further refinement of the depth maps. We validate our idea and
demonstrate that our strategies can be easily integrated into the existing
learning-based MVS pipeline where the reconstruction depends on high-quality
depth map estimation. Extensive experiments on various datasets show that our
method improves reconstruction quality compared to baseline. Experiments also
demonstrate that the presented model and strategies have good generalization
capabilities. The source code will be available soon.