Multi-Modal Dataset Acquisition for Photometrically Challenging Object
This paper addresses the limitations of current datasets for 3D vision tasks
in terms of accuracy, size, realism, and suitable imaging modalities for
photometrically challenging objects. We propose a novel annotation and
acquisition pipeline that enhances existing 3D perception and 6D object pose
datasets. Our approach integrates robotic forward-kinematics, external infrared
trackers, and improved calibration and annotation procedures. We present a
multi-modal sensor rig, mounted on a robotic end-effector, and demonstrate how
it is integrated into the creation of highly accurate datasets. Additionally,
we introduce a freehand procedure for wider viewpoint coverage. Both approaches
yield high-quality 3D data with accurate object and camera pose annotations.
Our methods overcome the limitations of existing datasets and provide valuable
resources for 3D vision research. Comment: Accepted at ICCV 2023 TRICKY Workshop
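Combining robotic forward kinematics with a calibrated camera mount comes down to chaining rigid transforms. Below is a minimal sketch, assuming 4x4 homogeneous matrices expressed in a shared robot-base frame: `base_T_ee` would come from forward kinematics, `ee_T_cam` from a hand-eye calibration, and `base_T_obj` from the object annotation. All names are hypothetical, and this is not the paper's pipeline.

```python
import math
from typing import List

Mat4 = List[List[float]]


def matmul(a: Mat4, b: Mat4) -> Mat4:
    """Multiply two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def rigid_inverse(t: Mat4) -> Mat4:
    """Invert a rigid transform [R | p] as [R^T | -R^T p] (no general matrix inverse needed)."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    p_inv = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[0] + [p_inv[0]], r_t[1] + [p_inv[1]], r_t[2] + [p_inv[2]], [0.0, 0.0, 0.0, 1.0]]


def rot_z(theta: float, tx: float = 0.0, ty: float = 0.0, tz: float = 0.0) -> Mat4:
    """Rotation by theta about z, followed by translation (tx, ty, tz)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, tx], [s, c, 0.0, ty], [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


def object_in_camera(base_T_ee: Mat4, ee_T_cam: Mat4, base_T_obj: Mat4) -> Mat4:
    """Object pose in the camera frame: inv(base_T_ee @ ee_T_cam) @ base_T_obj."""
    base_T_cam = matmul(base_T_ee, ee_T_cam)  # camera pose via the kinematic chain
    return matmul(rigid_inverse(base_T_cam), base_T_obj)
```

Because every camera pose is derived from the kinematic chain rather than estimated per frame, the accuracy of the annotation then hinges on the forward kinematics and the hand-eye calibration.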
Analytic Derivatives of Quartic-Scaling Doubly Hybrid XYGJ-OS Functional: Theory, Implementation, and Benchmark Comparison with M06-2X and MP2 Geometries for Nonbonded Complexes
The analytic first-derivative expression of the opposite-spin (OS) ansatz-adapted, quartic-scaling doubly hybrid XYGJ-OS functional is derived and implemented in Q-Chem. By combining the Laplace transformation with the density-fitting technique, the resulting algorithm scales quartically with system size, as does the OS-MP2 gradient. The performance of XYGJ-OS geometry optimization is assessed by comparing bond lengths and intermolecular properties with those obtained from reference coupled-cluster methods. For the selected nonbonded complexes from the S22 and S66 data sets used in the present benchmark, XYGJ-OS geometries are shown to be more accurate than those from M06-2X and RI-MP2, the two quantum chemical methods widely used to obtain accurate geometries for practical systems, and comparable to CCSD(T) geometries.
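The quartic scaling quoted above stems from the standard Laplace-transform and density-fitting factorisation of the opposite-spin MP2 term. The derivation below sketches it for the closed-shell OS-MP2 energy only, not for the XYGJ-OS gradient derived in the paper. It assumes real orbitals with occupied indices i, j, virtual indices a, b, auxiliary indices P, Q, fitted three-index coefficients B, and K quadrature points (t_k, w_k).

```latex
\begin{aligned}
E_{\mathrm{OS}} &= -\sum_{ijab} \frac{(ia|jb)^2}{\Delta_{ij}^{ab}},
  \qquad \Delta_{ij}^{ab} = \varepsilon_a + \varepsilon_b - \varepsilon_i - \varepsilon_j > 0,\\
\frac{1}{\Delta_{ij}^{ab}} &= \int_0^\infty e^{-\Delta_{ij}^{ab} t}\,dt
  \approx \sum_{k=1}^{K} w_k\, e^{-\Delta_{ij}^{ab} t_k},\\
(ia|jb) &\approx \sum_{P} B_{ia}^{P} B_{jb}^{P},\\
E_{\mathrm{OS}} &\approx -\sum_{k=1}^{K} w_k \sum_{PQ} \bigl(X_{PQ}^{(k)}\bigr)^2,
  \qquad X_{PQ}^{(k)} = \sum_{ia} B_{ia}^{P} B_{ia}^{Q}\, e^{-(\varepsilon_a - \varepsilon_i) t_k}.
\end{aligned}
```

Squaring the fitted integral gives a sum over P, Q of B^P_{ia} B^Q_{ia} B^P_{jb} B^Q_{jb}. The exponential also splits into an (ia) factor times a (jb) factor, so the ijab sum collapses into the square of X. Building each X^{(k)} costs O(o v N_aux^2), which is fourth order in system size, compared with the fifth-order cost of conventional MP2.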
Learning to Discriminate Information for Online Action Detection
From a streaming video, online action detection aims to identify actions in
the present. For this task, previous methods use recurrent networks to model
the temporal sequence of current action frames. However, these methods overlook
the fact that an input image sequence includes background and irrelevant
actions as well as the action of interest. In this paper, we propose a novel
recurrent unit for online action detection that explicitly discriminates the
information relevant to an ongoing action from other information. Our unit, named
Information Discrimination Unit (IDU), decides whether to accumulate input
information based on its relevance to the current action. This enables our
recurrent network with IDU to learn a more discriminative representation for
identifying ongoing actions. In experiments on two benchmark datasets, TVSeries
and THUMOS-14, the proposed method outperforms state-of-the-art methods by a
significant margin. Moreover, we demonstrate the effectiveness of our recurrent
unit through comprehensive ablation studies. Comment: To appear in CVPR 202
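The abstract does not give the IDU equations. The sketch below illustrates only the general idea of relevance-gated accumulation. Each incoming frame feature is blended into a hidden state through a sigmoid gate driven by the feature's similarity to the current frame. The dot-product gate, the `scale` parameter, the choice of the latest frame as the "current action" reference, and all function names are hypothetical simplifications, not the authors' IDU.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def relevance_gated_update(
    hidden: Sequence[float], frame: Sequence[float], current: Sequence[float], scale: float = 1.0
) -> Tuple[List[float], float]:
    """Blend `frame` into `hidden` in proportion to its relevance to `current`."""
    # Gate in (0, 1): close to 1 when the frame resembles the current-action feature.
    gate = sigmoid(scale * dot(frame, current))
    new_hidden = [gate * f + (1.0 - gate) * h for h, f in zip(hidden, frame)]
    return new_hidden, gate


def accumulate(frames: Sequence[Sequence[float]], scale: float = 1.0) -> Tuple[List[float], List[float]]:
    """Run the gated update over a frame sequence; the latest frame is the reference (assumption)."""
    current = frames[-1]
    hidden = [0.0] * len(current)
    gates = []
    for frame in frames:
        hidden, g = relevance_gated_update(hidden, frame, current, scale)
        gates.append(g)
    return hidden, gates
```

A frame that points away from the current one (for example background) receives a gate near 0 and barely changes the state. A frame aligned with it receives a gate near 1 and dominates the accumulated representation.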
Polarimetric Information for Multi-Modal 6D Pose Estimation of Photometrically Challenging Objects with Limited Data
6D pose estimation pipelines that rely on RGB-only or RGB-D data show
limitations for photometrically challenging objects with, e.g., textureless
surfaces, reflections, or transparency. A supervised learning-based method
utilising complementary polarisation information as input modality is proposed
to overcome such limitations. This supervised approach is then extended to a
self-supervised paradigm by leveraging physical characteristics of polarised
light, thus eliminating the need for annotated real data. The methods achieve
significant advancements in pose estimation by leveraging geometric information
from polarised light and incorporating shape priors and invertible physical
constraints. Comment: Accepted at ICCV 2023 TRICKY Workshop
Robust Monocular Depth Estimation under Challenging Conditions
While state-of-the-art monocular depth estimation approaches achieve
impressive results in ideal settings, they are highly unreliable under
challenging illumination and weather conditions, such as at nighttime or in the
presence of rain. In this paper, we uncover these safety-critical issues and
tackle them with md4all: a simple and effective solution that works reliably
under both adverse and ideal conditions, as well as for different types of
learning supervision. We achieve this by exploiting the efficacy of existing
methods under ideal settings, which lets us provide valid training signals
independently of what is in the input. First, we generate a set of complex
samples corresponding to the normal training ones. Then, we guide the model's
self- or full supervision by feeding it the generated samples while computing
the standard losses on the corresponding original images. Doing so
enables a single model to recover information across diverse conditions without
modifications at inference time. Extensive experiments on two challenging
public datasets, namely nuScenes and Oxford RobotCar, demonstrate the
effectiveness of our techniques, outperforming prior works by a large margin in
both standard and challenging conditions. Source code and data are available
at: https://md4all.github.io. Comment: ICCV 2023.
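The training recipe described above can be sketched generically: the model receives a generated sample, but the standard loss is computed on the original. This is a minimal sketch under stated assumptions. `translate` stands in for whatever produces the complex counterpart of a normal sample, and `predict` stands in for the depth network. `target` is the supervision signal of the original sample, i.e. ground-truth depth for full supervision or the original frames for a self-supervised photometric loss. `p_adverse` is a hypothetical mixing probability. None of these names come from the md4all code.

```python
import random
from typing import Callable, Sequence, TypeVar

Sample = TypeVar("Sample")
Target = TypeVar("Target")
Pred = TypeVar("Pred")


def train_step_loss(
    images: Sequence[Sample],
    targets: Sequence[Target],
    translate: Callable[[Sample], Sample],
    predict: Callable[[Sample], Pred],
    loss_fn: Callable[[Pred, Target], float],
    p_adverse: float = 0.5,
    seed: int = 0,
) -> float:
    """Mean loss when inputs are sometimes swapped for generated counterparts."""
    rng = random.Random(seed)
    losses = []
    for image, target in zip(images, targets):
        # Feed the generated complex sample with probability p_adverse...
        model_input = translate(image) if rng.random() < p_adverse else image
        prediction = predict(model_input)
        # ...but always score against the ORIGINAL sample's supervision signal.
        losses.append(loss_fn(prediction, target))
    return sum(losses) / len(losses)
```

Keeping the loss tied to the original sample is what supplies a valid training signal even when the input itself (night, rain) would make a direct photometric or depth supervision unreliable.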
Polarimetric Pose Prediction
Light has many properties that vision sensors can passively measure.
Colour-band separated wavelength and intensity are arguably the most commonly
used for monocular 6D object pose estimation. This paper explores how
complementary polarisation information, i.e., the orientation of light wave
oscillations, influences the accuracy of pose predictions. A hybrid model that
leverages physical priors jointly with a data-driven learning strategy is
designed and carefully tested on objects with different levels of photometric
complexity. Our design significantly improves the pose accuracy compared to
state-of-the-art photometric approaches and enables object pose estimation for
highly reflective and transparent objects. A new multi-modal instance-level 6D
object pose dataset with highly accurate pose annotations for multiple objects
with varying photometric complexity is introduced as a benchmark. Comment: Accepted at ECCV 2022; 25 pages (14 main paper + References + 7
Appendix)
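Polarisation cameras commonly expose intensities behind linear polarisers at 0°, 45°, 90° and 135°. The paper's exact sensor format is not stated here, so this is an assumption. From these four channels, the sketch below derives the linear Stokes parameters, the degree of linear polarisation (DoLP) and the angle of linear polarisation (AoLP), which are the usual physical quantities obtained from such data. The function name is hypothetical.

```python
import math
from typing import Tuple


def linear_polarisation(i0: float, i45: float, i90: float, i135: float) -> Tuple[float, float, float]:
    """Return (S0, DoLP, AoLP in radians) from four polariser-angle intensities."""
    # Linear Stokes parameters from the four analyser orientations.
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90                        # preference for 0° over 90°
    s2 = i45 - i135                      # preference for 45° over 135°
    if s0 <= 0.0:
        return 0.0, 0.0, 0.0  # no light: polarisation undefined, report zeros
    dolp = math.hypot(s1, s2) / s0
    aolp = 0.5 * math.atan2(s2, s1)  # lies in (-pi/2, pi/2]
    return s0, dolp, aolp
```

Under Malus's law, fully polarised light at 0° gives intensities (1, 0.5, 0, 0.5), for which the function returns DoLP = 1 and AoLP = 0. Unpolarised light gives four equal intensities and DoLP = 0. DoLP and AoLP carry the surface-orientation cues that make polarisation useful for reflective and transparent objects.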
Assessment of the modulation degrees of intensity-modulated radiation therapy plans
Background
To evaluate modulation indices (MIs) as predictors of the plan delivery accuracy of intensity-modulated radiation therapy (IMRT) plans.
Methods
A total of 100 dynamic IMRT plans delivered with TrueBeam STx and 102 dynamic IMRT plans delivered with Trilogy were selected. For each plan, various MIs were calculated, including the modulation complexity score (MCS), plan-averaged beam area (PA), plan-averaged beam irregularity (PI), plan-averaged beam modulation (PM), the MI quantifying multi-leaf collimator (MLC) speeds (MIs), the MI quantifying MLC acceleration (MIa), and the MI quantifying MLC acceleration and segment aperture irregularity (MIc,IMRT). To determine plan delivery accuracy, global gamma passing rates, MLC errors from log files, and differences in dose-volumetric parameters between the original and log file-reconstructed IMRT plans were obtained. To assess the ability of each MI to predict plan delivery accuracy, Spearman's rank correlation coefficients (rs) between the MIs and the plan delivery accuracy measures were calculated.
Results
PI showed moderately strong correlations with gamma passing rates in MapCHECK2 measurements for both TrueBeam STx and Trilogy (rs = −0.591 and −0.427, respectively, both with p < 0.001, using a gamma criterion of 2%/2 mm). PI also showed moderately strong correlations with the gamma passing rates in ArcCHECK measurements for TrueBeam STx and Trilogy (rs = −0.545 and −0.581, respectively, both with p < 0.001, using a gamma criterion of 2%/2 mm). PI showed the second-strongest correlation with MLC errors for both TrueBeam STx and Trilogy (rs = 0.861 and 0.767, respectively, both with p < 0.001). In general, PI showed moderately strong correlations with every plan delivery accuracy measure.
Conclusions
PI showed moderately strong correlations with every plan delivery accuracy measure and is therefore a useful predictor of IMRT delivery accuracy. This work was supported by a National Research Foundation of Korea (NRF) grant from the Korea government (MSIP) (Nos. 2017M2A2A7A02020639, 2017M2A2A7A02020640, 2017M2A2A7A02020641, and 2017M2A2A7A02020643).
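The study relates each MI to each delivery-accuracy measure through Spearman's rs, which is the Pearson correlation computed on ranks, with tied values sharing their average rank. The pure-Python sketch below shows how rs itself is computed. Function names are hypothetical, and p-values are not computed.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rs(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation coefficient."""
    return pearson(average_ranks(x), average_ranks(y))
```

A negative rs between an MI and a gamma passing rate, as reported for PI, means that plans ranked as more complex by the MI tend to rank lower in passing rate. In practice, `scipy.stats.spearmanr` returns both the coefficient and a p-value.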
On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks
Learning-based methods to solve dense 3D vision problems typically train on
3D sensor data. Each sensor's principle of measuring distances has its own
advantages and drawbacks, which are typically neither compared nor discussed in the
literature due to a lack of multi-modal datasets. Texture-less regions are
problematic for structure from motion and stereo, reflective material poses
issues for active sensing, and distances for translucent objects are intricate
to measure with existing hardware. Training on inaccurate or corrupt data
induces model bias and hampers generalisation capabilities. These effects
remain unnoticed if the sensor measurement is considered as ground truth during
the evaluation. This paper investigates the effect of sensor errors for the
dense 3D vision tasks of depth estimation and reconstruction. We rigorously
show the significant impact of sensor characteristics on the learned
predictions and observe generalisation issues arising from various technologies
in everyday household environments. For evaluation, we introduce a carefully
designed dataset (available at https://github.com/Junggy/HAMMER-dataset)
comprising measurements from
commodity sensors, namely D-ToF, I-ToF, passive/active stereo, and monocular
RGB+P. Our study quantifies the considerable sensor noise impact and paves the
way to improved dense vision estimates and targeted data fusion. Comment: Accepted at CVPR 2023, Main Paper + Supp. Mat. arXiv admin note:
substantial text overlap with arXiv:2205.0456
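The claim that sensor errors remain unnoticed when the sensor measurement is treated as ground truth can be illustrated with two common depth metrics, AbsRel and RMSE, computed over valid pixels. The function name, the convention that a non-positive reference value marks a missing reading, and the toy numbers are all hypothetical.

```python
import math
from typing import Dict, Sequence


def depth_errors(pred: Sequence[float], reference: Sequence[float]) -> Dict[str, float]:
    """AbsRel and RMSE of `pred` against `reference`, skipping pixels where reference <= 0."""
    # A non-positive reference value is treated as a missing measurement (assumption).
    pairs = [(p, r) for p, r in zip(pred, reference) if r > 0]
    if not pairs:
        raise ValueError("no valid reference pixels")
    abs_rel = sum(abs(p - r) / r for p, r in pairs) / len(pairs)
    rmse = math.sqrt(sum((p - r) ** 2 for p, r in pairs) / len(pairs))
    return {"abs_rel": abs_rel, "rmse": rmse, "valid": float(len(pairs))}
```

Consider toy ground truth `[1.0, 2.0, 2.0, 4.0]` and a sensor reading `[1.0, 2.5, 0.0, 4.0]` with one biased pixel and one missing pixel. A prediction `[1.0, 2.5, 2.5, 4.0]` that reproduces the sensor bias then scores AbsRel = 0 against the sensor but AbsRel = 0.125 against the ground truth.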
HouseCat6D -- A Large-Scale Multi-Modal Category Level 6D Object Pose Dataset with Household Objects in Realistic Scenarios
Estimating the 6D pose of objects is a major 3D computer vision problem.
Following the promising results of instance-level approaches, research is
also moving towards category-level pose estimation for more practical application
scenarios. However, unlike well-established instance-level pose datasets,
available category-level datasets fall short in annotation quality and in the number
of provided poses. We propose the new category-level 6D pose dataset HouseCat6D
featuring 1) Multi-modality of Polarimetric RGB and Depth (RGBD+P), 2) 194 highly
diverse objects from 10 household object categories, including 2
photometrically challenging categories, 3) High-quality pose annotation with an
error range of only 1.35 mm to 1.74 mm, 4) 41 large-scale scenes with extensive
viewpoint coverage and occlusions, 5) Checkerboard-free environment throughout
the entire scene, and 6) Additionally annotated dense 6D parallel-jaw grasps.
Furthermore, we provide benchmark results of state-of-the-art
category-level pose estimation networks
- …