Robust 2D and 3D registration with deep neural networks
Recovering 3D geometry is a crucial task in computer vision, essential for accurate world reconstruction and perception. Modern applications in AR, VR, autonomous driving, and medical imaging rely heavily on 3D and 4D reconstruction techniques. This thesis aims to enhance registration methods, which play a key role in reconstruction, by fusing classical multi-view geometry and deep neural networks. We explore this theme in three primary directions, each distinguished by registration dimensionality: 3D–3D, 3D–2D, and 2D–2D.
First, we focus on improving the alignment of 3D point clouds in both rigid and non-rigid scenarios. In non-rigid 3D registration, traditional methods directly optimize a motion field between a source and target surface. Such optimization often converges slowly and becomes trapped in local minima. We introduce a neural network-based scene flow to initialize the optimization, providing a more efficient and robust solution. Additionally, we present a novel surface normal estimation technique that aids both rigid and non-rigid registration. Unlike conventional methods that use a fixed global neighbor parameter, our approach employs a self-attention mechanism to adapt to local geometry variations.
Second, we address the challenge of registering 2D images to 3D Neural Radiance Fields (NeRF) through joint optimization of NeRF and camera parameters. Original NeRF training mandates pre-processed camera parameters, creating a bottleneck in the workflow. Our approach allows for end-to-end camera parameter estimation during NeRF training while reusing the existing photometric loss in NeRF. We further extend this to account for larger camera movements by incorporating a monocular depth prior.
Lastly, we propose a method for interest point discovery, which is beneficial for 2D image registration. Existing interest point identification methods degrade under significant viewpoint changes and at occlusion boundaries; our multi-view interest point discovery approach addresses these limitations. Our method is trained in a self-supervised fashion with purely geometric constraints that encourage point identification repeatability, sparsity, and multi-view consistency.
In summary, this thesis explores the fusion of traditional multi-view geometry concepts with deep learning priors in various registration tasks, including point cloud registration, image-to-NeRF registration, and image-to-image registration.
Neighbourhood-Insensitive Point Cloud Normal Estimation Network
We introduce a novel self-attention-based normal estimation network that is
able to focus softly on relevant points and adjust the softness by learning a
temperature parameter, making it able to work naturally and effectively within
a large neighbourhood range. As a result, our model outperforms all existing
normal estimation algorithms by a large margin, achieving 94.1% accuracy in
comparison with the previous state of the art of 91.2%, with a 25x smaller
model and 12x faster inference time. We also use point-to-plane Iterative
Closest Point (ICP) as an application case to show that our normal estimations
lead to faster convergence than normal estimations from other methods, without
manually fine-tuning neighbourhood range parameters. Code available at
https://code.active.vision.
Comment: Accepted in BMVC 2020 as oral presentation. Code available at
https://code.active.vision and project page at http://ninormal.active.visio
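The idea of focusing softly on relevant points with an adjustable softness can be illustrated with a temperature-scaled softmax. The sketch below is only an illustration under stated assumptions: the function name `attention_weights` and the neighbour scores are hypothetical, and the temperature is fixed by hand here, whereas the abstract describes it as a learned parameter.

```python
import math


def attention_weights(scores, temperature):
    """Temperature-scaled softmax over per-neighbour relevance scores.

    A small temperature concentrates the weight on the highest-scoring
    neighbours; a large temperature spreads it more evenly.
    """
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical relevance scores for five neighbours of one query point.
scores = [2.0, 1.5, 0.2, -1.0, -3.0]

soft = attention_weights(scores, temperature=5.0)   # broad attention
sharp = attention_weights(scores, temperature=0.2)  # focused attention

print("soft :", [round(w, 3) for w in soft])
print("sharp:", [round(w, 3) for w in sharp])
```

With a mechanism of this kind, a large neighbourhood can be passed in and irrelevant points are down-weighted rather than excluded by a hand-tuned radius or neighbour count.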
Direct-PoseNet: Absolute Pose Regression with Photometric Consistency
We present a relocalization pipeline, which combines an absolute pose
regression (APR) network with a novel view synthesis based direct matching
module, offering superior accuracy while maintaining low inference time. Our
contribution is twofold: i) we design a direct matching module that supplies a
photometric supervision signal to refine the pose regression network via
differentiable rendering; ii) we modify the rotation representation from the
classical quaternion to SO(3) in pose regression, removing the need for
balancing rotation and translation loss terms. As a result, our network
Direct-PoseNet achieves state-of-the-art performance among
single-image APR methods on the 7-Scenes benchmark and the LLFF dataset.
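The abstract does not say how the SO(3) output is parameterised. As an illustration only, the sketch below uses Gram-Schmidt orthonormalisation of two regressed 3-vectors, one common way to turn unconstrained network outputs into a valid rotation matrix. This is a swapped-in technique, not necessarily the layer used in Direct-PoseNet, and the function names and input values are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def rotation_from_six(a, b):
    """Map two unconstrained 3-vectors to a rotation matrix (list of rows).

    Gram-Schmidt: normalise a, remove its component from b, normalise the
    remainder, and complete the right-handed frame with a cross product.
    Assumes a and b are non-zero and not parallel.
    """
    x = normalize(a)
    proj = dot(x, b)
    y = normalize([bi - proj * xi for bi, xi in zip(b, x)])
    z = cross(x, y)
    return [x, y, z]


# Hypothetical raw outputs of a pose-regression head.
R = rotation_from_six([1.0, 2.0, 0.5], [0.3, -1.0, 2.0])
for row in R:
    print([round(v, 4) for v in row])
```

Because the output is a valid rotation by construction, a loss can be applied to the matrix directly without a separate unit-norm constraint of the kind a quaternion output requires.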
Observation of the decays B0(s) → Ds1(2536)∓K±
This paper reports the observation of the decays $B^0_{(s)} \to D_{s1}(2536)^{\mp}K^{\pm}$ using proton-proton collision data collected by the LHCb experiment, corresponding to an integrated luminosity of 9 fb$^{-1}$. The branching fractions of these decays are measured relative to the normalisation channel $B^0 \to D^0 K^+K^-$. The $D_{s1}(2536)^-$ meson is reconstructed in the $D^{*}(2007)^0 K^-$ decay channel and the products of branching fractions are measured to be
$$\mathcal{B}(B^0_s \to D_{s1}(2536)^{\mp}K^{\pm}) \times \mathcal{B}(D_{s1}(2536)^- \to D^{*}(2007)^0 K^-) = (2.49 \pm 0.11 \pm 0.12 \pm 0.25 \pm 0.06) \times 10^{-5},$$
$$\mathcal{B}(B^0 \to D_{s1}(2536)^{\mp}K^{\pm}) \times \mathcal{B}(D_{s1}(2536)^- \to D^{*}(2007)^0 K^-) = (0.510 \pm 0.021 \pm 0.036 \pm 0.050) \times 10^{-5}.$$
The first uncertainty is statistical, the second systematic, and the third arises from the uncertainty of the branching fraction of the $B^0 \to D^0 K^+K^-$ normalisation channel. The last uncertainty in the $B^0_s$ result is due to the limited knowledge of the fragmentation fraction ratio, $f_s/f_d$. The significance of both the $B^0_s$ and $B^0$ signals is larger than $10\sigma$. The ratio of the helicity amplitudes which governs the angular distribution of the $D_{s1}(2536)^- \to D^{*}(2007)^0 K^-$ decay is determined from the data. The ratio of the S- and D-wave amplitudes is found to be $1.11 \pm 0.15 \pm 0.06$ and the phase difference between them $0.70 \pm 0.09 \pm 0.04$ rad, where the first uncertainty is statistical and the second systematic.
We express our gratitude to our colleagues in the CERN accelerator departments for the excellent performance of the LHC. We thank the technical and administrative staff at the LHCb institutes. We acknowledge support from CERN and from the national agencies: CAPES, CNPq, FAPERJ and FINEP (Brazil); MOST and NSFC (China); CNRS/IN2P3 (France); BMBF, DFG and MPG (Germany); INFN (Italy); NWO (Netherlands); MNiSW and NCN (Poland); MCID/IFA (Romania); MICINN (Spain); SNSF and SER (Switzerland); NASU (Ukraine); STFC (United Kingdom); DOE NP and NSF (USA). We acknowledge the computing resources that are provided by CERN, IN2P3 (France), KIT and DESY (Germany), INFN (Italy), SURF (Netherlands), PIC (Spain), GridPP (United Kingdom), CSCS (Switzerland), IFIN-HH (Romania), CBPF (Brazil), Polish WLCG (Poland) and NERSC (USA). We are indebted to the communities behind the multiple open-source software packages on which we depend. Individual groups or members have received support from ARC and ARDC (Australia); Minciencias (Colombia); AvH Foundation (Germany); EPLANET, Marie Skłodowska-Curie Actions, ERC and NextGenerationEU (European Union); A*MIDEX, ANR, IPhU and Labex P2IO, and Région Auvergne-Rhône-Alpes (France); Key Research Program of Frontier Sciences of CAS, CAS PIFI, CAS CCEPP, Fundamental Research Funds for the Central Universities, and Sci. & Tech. Program of Guangzhou (China); GVA, XuntaGal, GENCAT, Inditex, InTalent and Prog. Atracción Talento, CM (Spain); SRC (Sweden); the Leverhulme Trust, the Royal Society and UKRI (United Kingdom).
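To read each of the two branching-fraction products quoted in the abstract as a single value with one overall uncertainty, the separate components can be combined. The sketch below adds them in quadrature, which assumes the components are independent and symmetric; that is a common convention and an assumption here, not something the abstract states. The function name is hypothetical, and the numbers are copied from the abstract in units of $10^{-5}$.

```python
import math


def combine_in_quadrature(uncertainties):
    """Square root of the sum of squared uncertainty components."""
    return math.sqrt(sum(u * u for u in uncertainties))


# Central values and uncertainty components from the abstract (x 1e-5).
# Order: statistical, systematic, normalisation channel, and (B0s only) fs/fd.
bs_central, bs_parts = 2.49, [0.11, 0.12, 0.25, 0.06]
bd_central, bd_parts = 0.510, [0.021, 0.036, 0.050]

bs_total = combine_in_quadrature(bs_parts)
bd_total = combine_in_quadrature(bd_parts)

print(f"B0s: ({bs_central:.2f} +/- {bs_total:.2f}) x 1e-5")
print(f"B0:  ({bd_central:.3f} +/- {bd_total:.3f}) x 1e-5")
```

In both cases the normalisation-channel component is the largest single contribution to the combined uncertainty.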
Reachability-Based Confidence-Aware Probabilistic Collision Detection in Highway Driving
Risk assessment is a crucial component of collision warning and avoidance
systems in intelligent vehicles. To accurately detect potential vehicle
collisions, reachability-based formal approaches have been developed to ensure
driving safety, but suffer from over-conservatism, potentially leading to
false-positive risk events in complicated real-world applications. In this
work, we combine two reachability analysis techniques, i.e., backward reachable
set (BRS) and stochastic forward reachable set (FRS), and propose an integrated
probabilistic collision detection framework for highway driving. Within the
framework, we first use a BRS to formally check whether a two-vehicle
interaction is safe; otherwise, a prediction-based stochastic FRS is employed
to estimate a collision probability at each future time step. In doing so, the
framework can not only identify non-risky events with guaranteed safety, but
also provide accurate collision risk estimation in safety-critical events. To
construct the stochastic FRS, we develop a neural network-based acceleration
model for surrounding vehicles, and further incorporate confidence-aware
dynamic belief to improve the prediction accuracy. Extensive experiments are
conducted to validate the performance of the acceleration prediction model
based on naturalistic highway driving data, and the efficiency and
effectiveness of the framework with the infused confidence belief are tested
in both naturalistic and simulated highway scenarios. The proposed risk
assessment framework is promising in real-world applications.
Comment: Under review at Engineering. arXiv admin note: text overlap with
arXiv:2205.0135
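The notion of estimating a collision probability at each future time step can be illustrated with a small Monte Carlo simulation. This is a deliberately simplified stand-in, not the paper's stochastic FRS: it assumes a one-dimensional car-following scene, a constant-speed ego vehicle, and a Gaussian acceleration model for the lead vehicle in place of the neural network predictor. All names and parameter values are hypothetical, and the BRS safety check is not modelled.

```python
import random


def collision_probability(gap, ego_speed, lead_speed, accel_mean, accel_std,
                          steps=10, dt=0.5, samples=2000, seed=0):
    """Estimate P(collision has occurred by step k) for k = 1..steps.

    gap is the initial distance (m) from the ego vehicle to the lead vehicle;
    a collision is counted once the gap has dropped to zero or below.
    """
    rng = random.Random(seed)
    hits = [0] * steps
    for _ in range(samples):
        g, v_lead = gap, lead_speed
        collided = False
        for k in range(steps):
            # Sample the lead vehicle's acceleration for this step.
            a = rng.gauss(accel_mean, accel_std)
            v_lead = max(0.0, v_lead + a * dt)
            g += (v_lead - ego_speed) * dt
            if g <= 0.0:
                collided = True
            if collided:
                hits[k] += 1
    return [h / samples for h in hits]


# A large gap with matched speeds, and a short gap with a braking lead vehicle.
safe = collision_probability(gap=200.0, ego_speed=25.0, lead_speed=25.0,
                             accel_mean=0.0, accel_std=0.5)
risky = collision_probability(gap=10.0, ego_speed=30.0, lead_speed=20.0,
                              accel_mean=-2.0, accel_std=0.5)

print("safe :", safe)
print("risky:", risky)
```

The returned values are cumulative, so they never decrease over the horizon; a per-step probability could be obtained by differencing consecutive entries.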