66 research outputs found

    Multi-Modal Dataset Acquisition for Photometrically Challenging Objects

    Full text link
    This paper addresses the limitations of current datasets for 3D vision tasks in terms of accuracy, size, realism, and suitable imaging modalities for photometrically challenging objects. We propose a novel annotation and acquisition pipeline that enhances existing 3D perception and 6D object pose datasets. Our approach integrates robotic forward kinematics, external infrared trackers, and improved calibration and annotation procedures. We present a multi-modal sensor rig, mounted on a robotic end-effector, and demonstrate how it is integrated into the creation of highly accurate datasets. Additionally, we introduce a freehand procedure for wider viewpoint coverage. Both approaches yield high-quality 3D data with accurate object and camera pose annotations. Our methods overcome the limitations of existing datasets and provide valuable resources for 3D vision research. Comment: Accepted at ICCV 2023 TRICKY Workshop.
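
    A minimal sketch of the pose-chaining step that such robot-mounted acquisition relies on: the camera pose follows from the robot's forward kinematics composed with a fixed hand-eye transform obtained by calibration (function names and numbers below are illustrative, not the paper's implementation):

```python
import numpy as np

def camera_pose_from_kinematics(T_base_ee: np.ndarray,
                                T_ee_cam: np.ndarray) -> np.ndarray:
    """Chain robot forward kinematics with a hand-eye calibration.

    T_base_ee: 4x4 end-effector pose in the robot base frame
               (from forward kinematics).
    T_ee_cam:  4x4 camera pose in the end-effector frame
               (from hand-eye calibration).
    Returns the 4x4 camera pose in the robot base frame.
    """
    return T_base_ee @ T_ee_cam

# Example: end-effector rotated 90 degrees about z, offset 0.5 m in x.
T_base_ee = np.array([[0.0, -1.0, 0.0, 0.5],
                      [1.0,  0.0, 0.0, 0.0],
                      [0.0,  0.0, 1.0, 0.8],
                      [0.0,  0.0, 0.0, 1.0]])
T_ee_cam = np.eye(4)
T_ee_cam[:3, 3] = [0.0, 0.0, 0.1]  # camera mounted 10 cm along the tool axis
print(camera_pose_from_kinematics(T_base_ee, T_ee_cam))
```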

    Analytic Derivatives of Quartic-Scaling Doubly Hybrid XYGJ-OS Functional: Theory, Implementation, and Benchmark Comparison with M06-2X and MP2 Geometries for Nonbonded Complexes

    Get PDF
    The analytic first-derivative expression of the opposite-spin (OS) ansatz-adapted, quartic-scaling doubly hybrid XYGJ-OS functional is derived and implemented in Q-Chem. The resulting algorithm scales quartically with system size, as in the OS-MP2 gradient, by combining the Laplace transformation with the density fitting technique. The performance of XYGJ-OS geometry optimization is assessed by comparing bond lengths and intermolecular properties against reference coupled-cluster methods. For the nonbonded complexes selected from the S22 and S66 data sets for the present benchmark test, XYGJ-OS geometries are shown to be more accurate than those of M06-2X and RI-MP2, the two quantum chemical methods widely used to obtain accurate geometries for practical systems, and comparable to CCSD(T) geometries.
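
    For context, the quartic scaling rests on the standard Laplace-transform identity that factorizes the orbital-energy denominator of the opposite-spin correlation energy (textbook form with occupied orbitals i, j and virtuals a, b, approximated by a numerical quadrature; not copied from the paper):

```latex
\frac{1}{\varepsilon_a + \varepsilon_b - \varepsilon_i - \varepsilon_j}
  = \int_0^{\infty} e^{-(\varepsilon_a + \varepsilon_b - \varepsilon_i - \varepsilon_j)\,t}\,\mathrm{d}t
  \approx \sum_{\alpha=1}^{n_\tau} w_\alpha\,
      e^{-(\varepsilon_a - \varepsilon_i)\,t_\alpha}\,
      e^{-(\varepsilon_b - \varepsilon_j)\,t_\alpha}
```

    With the denominator factorized per quadrature point and the two-electron integrals density-fitted, the opposite-spin sum decouples over index pairs, which is what reduces the formal cost of the energy and gradient from fifth order to fourth order in system size.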

    Learning to Discriminate Information for Online Action Detection

    Full text link
    From a streaming video, online action detection aims to identify the action occurring at the present moment. For this task, previous methods use recurrent networks to model the temporal sequence of current action frames. However, these methods overlook the fact that an input image sequence includes background and irrelevant actions as well as the action of interest. In this paper, we propose a novel recurrent unit for online action detection that explicitly discriminates the information relevant to an ongoing action from the rest. Our unit, named Information Discrimination Unit (IDU), decides whether to accumulate input information based on its relevance to the current action. This enables our recurrent network with IDU to learn a more discriminative representation for identifying ongoing actions. In experiments on two benchmark datasets, TVSeries and THUMOS-14, the proposed method outperforms state-of-the-art methods by a significant margin. Moreover, we demonstrate the effectiveness of our recurrent unit through comprehensive ablation studies. Comment: To appear in CVPR 2020.
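
    A toy PyTorch sketch of the gated-accumulation idea the abstract describes (illustrative only; the module name, sizes, and gating form are assumptions, not the authors' exact IDU):

```python
import torch
import torch.nn as nn

class RelevanceGatedUnit(nn.Module):
    """Toy recurrent unit in the spirit of IDU: a learned gate decides
    how much of each incoming frame feature to accumulate into the
    hidden state, based on its agreement with the state built so far.
    (Sketch of the idea only, not the authors' formulation.)"""

    def __init__(self, feat_dim: int, hidden_dim: int):
        super().__init__()
        self.input_proj = nn.Linear(feat_dim, hidden_dim)
        self.gate = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (T, feat_dim), ordered oldest to newest
        h = torch.zeros(self.input_proj.out_features)
        for x_t in frames:
            x_t = torch.tanh(self.input_proj(x_t))
            # Relevance gate: near 0 for background/irrelevant frames,
            # near 1 for frames matching the ongoing action.
            r = torch.sigmoid(self.gate(torch.cat([h, x_t])))
            h = (1 - r) * h + r * x_t  # accumulate only relevant info
        return h

unit = RelevanceGatedUnit(feat_dim=2048, hidden_dim=512)
out = unit(torch.randn(16, 2048))  # 16 streaming frame features
print(out.shape)                   # torch.Size([512])
```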

    Polarimetric Information for Multi-Modal 6D Pose Estimation of Photometrically Challenging Objects with Limited Data

    Full text link
    6D pose estimation pipelines that rely on RGB-only or RGB-D data show limitations for photometrically challenging objects with, e.g., textureless surfaces, reflections, or transparency. A supervised learning-based method utilising complementary polarisation information as an input modality is proposed to overcome such limitations. This supervised approach is then extended to a self-supervised paradigm by leveraging physical characteristics of polarised light, thus eliminating the need for annotated real data. The methods achieve significant advancements in pose estimation by leveraging geometric information from polarised light and incorporating shape priors and invertible physical constraints. Comment: Accepted at ICCV 2023 TRICKY Workshop.
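
    As background on the polarisation input modality: a common way to encode polarised captures for a network is via the linear Stokes parameters and the derived degree and angle of linear polarisation. A sketch of that standard pre-processing (typical practice, not necessarily the paper's exact pipeline):

```python
import numpy as np

def polarisation_features(i0, i45, i90, i135):
    """Compute degree and angle of linear polarisation (DoLP, AoLP)
    from four intensity images captured behind linear polarisers at
    0/45/90/135 degrees."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity
    s1 = i0 - i90                        # horizontal vs. vertical
    s2 = i45 - i135                      # diagonal components
    dolp = np.sqrt(s1**2 + s2**2) / np.clip(s0, 1e-6, None)
    aolp = 0.5 * np.arctan2(s2, s1)
    return dolp, aolp
```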

    Robust Monocular Depth Estimation under Challenging Conditions

    Full text link
    While state-of-the-art monocular depth estimation approaches achieve impressive results in ideal settings, they are highly unreliable under challenging illumination and weather conditions, such as at nighttime or in the presence of rain. In this paper, we uncover these safety-critical issues and tackle them with md4all: a simple and effective solution that works reliably under both adverse and ideal conditions, as well as for different types of learning supervision. We achieve this by exploiting the efficacy of existing methods under perfect settings. Therefore, we provide valid training signals independently of what is in the input. First, we generate a set of complex samples corresponding to the normal training ones. Then, we train the model by guiding its self- or full supervision by feeding the generated samples and computing the standard losses on the corresponding original images. Doing so enables a single model to recover information across diverse conditions without modifications at inference time. Extensive experiments on two challenging public datasets, namely nuScenes and Oxford RobotCar, demonstrate the effectiveness of our techniques, outperforming prior works by a large margin in both standard and challenging conditions. Source code and data are available at: https://md4all.github.io. Comment: ICCV 2023. Source code and data: https://md4all.github.io
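
    In pseudo-Python, the training idea reads roughly as follows (function names are hypothetical; the actual md4all implementation lives at the linked repository):

```python
# Minimal sketch of the idea described in the abstract: feed a
# generated "adverse" sample to the model, but compute the standard
# (self- or fully supervised) loss against the corresponding
# ideal-condition image, so the supervision signal stays valid.
def training_step(model, loss_fn, translate_to_adverse, clear_image):
    adverse_image = translate_to_adverse(clear_image)  # e.g. day -> night
    pred_depth = model(adverse_image)                  # input: adverse
    return loss_fn(pred_depth, clear_image)            # target: clear
```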

    Polarimetric Pose Prediction

    Full text link
    Light has many properties that vision sensors can passively measure. Colour-band-separated wavelength and intensity are arguably the most commonly used for monocular 6D object pose estimation. This paper explores how complementary polarisation information, i.e. the orientation of light wave oscillations, influences the accuracy of pose predictions. A hybrid model that leverages physical priors jointly with a data-driven learning strategy is designed and carefully tested on objects with different levels of photometric complexity. Our design significantly improves the pose accuracy compared to state-of-the-art photometric approaches and enables object pose estimation for highly reflective and transparent objects. A new multi-modal instance-level 6D object pose dataset with highly accurate pose annotations for multiple objects with varying photometric complexity is introduced as a benchmark. Comment: Accepted at ECCV 2022; 25 pages (14 main paper + References + 7 Appendix).
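
    One widely used physical prior of this kind relates the measured degree of linear polarisation of diffuse reflection to the zenith angle of the surface normal via the Fresnel model (the standard shape-from-polarisation relation; whether the paper uses exactly this form is an assumption):

```latex
\rho_d(\theta, n) =
  \frac{\left(n - \tfrac{1}{n}\right)^2 \sin^2\theta}
       {2 + 2n^2 - \left(n + \tfrac{1}{n}\right)^2 \sin^2\theta
          + 4\cos\theta\,\sqrt{n^2 - \sin^2\theta}}
```

    Here theta is the zenith angle of the surface normal and n the refractive index; the angle of polarisation constrains the normal's azimuth up to a pi ambiguity, which is why such priors carry geometric information for pose estimation.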

    Assessment of the modulation degrees of intensity-modulated radiation therapy plans

    Get PDF
    Background: To evaluate modulation indices (MIs) for predicting the plan delivery accuracy of intensity-modulated radiation therapy (IMRT) plans. Methods: A total of 100 dynamic IMRT plans delivered with TrueBeam STx and 102 dynamic IMRT plans delivered with Trilogy were selected. For each plan, various MIs were calculated, including the modulation complexity score (MCS), plan-averaged beam area (PA), plan-averaged beam irregularity (PI), plan-averaged beam modulation (PM), an MI quantifying multi-leaf collimator (MLC) speed (MIs), an MI quantifying MLC acceleration (MIa), and an MI quantifying MLC acceleration and segment aperture irregularity (MIc,IMRT). To determine plan delivery accuracy, global gamma passing rates, MLC errors from log files, and dose-volumetric parameter differences between the original and log-file-reconstructed IMRT plans were obtained. To assess the ability of each MI to predict plan delivery accuracy, Spearman's rank correlation coefficients (rs) between the MIs and the plan delivery accuracy measures were calculated. Results: PI showed moderately strong correlations with gamma passing rates in MapCHECK2 measurements for both TrueBeam STx and Trilogy (rs = −0.591 with p < 0.001 and rs = −0.427 with p < 0.001, respectively, at the 2%/2 mm gamma criterion). PI likewise showed moderately strong correlations with gamma passing rates in ArcCHECK measurements for TrueBeam STx and Trilogy (rs = −0.545 with p < 0.001 and rs = −0.581 with p < 0.001, respectively, at the 2%/2 mm gamma criterion). PI showed the second strongest correlation with MLC errors for both TrueBeam STx and Trilogy (rs = 0.861 with p < 0.001 and rs = 0.767 with p < 0.001, respectively). In general, PI showed moderately strong correlations with every plan delivery accuracy measure. Conclusions: PI showed moderately strong correlations with every plan delivery accuracy measure and is therefore a useful predictor of IMRT delivery accuracy. This work was supported by National Research Foundation of Korea (NRF) grants funded by the Korea government (MSIP) (Nos. 2017M2A2A7A02020639, 2017M2A2A7A02020640, 2017M2A2A7A02020641, and 2017M2A2A7A02020643).
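
    For readers reproducing this kind of analysis, the correlation measure is straightforward to compute; a minimal sketch using SciPy, with made-up numbers purely for demonstration (not the study's data):

```python
import numpy as np
from scipy.stats import spearmanr

# Correlate a modulation index (e.g. PI) with measured gamma passing
# rates across plans. The values below are invented for illustration.
pi_values = np.array([0.05, 0.12, 0.08, 0.20, 0.15, 0.03])
gamma_pass = np.array([98.5, 94.2, 96.8, 90.1, 92.7, 99.0])  # % at 2%/2 mm

rs, p = spearmanr(pi_values, gamma_pass)
print(f"rs = {rs:.3f}, p = {p:.3g}")  # a negative rs means more
                                      # modulation, lower accuracy
```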

    On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks

    Get PDF
    Learning-based methods for dense 3D vision problems typically train on 3D sensor data. Each measurement principle used for capturing distances has its own advantages and drawbacks, which are typically neither compared nor discussed in the literature due to a lack of multi-modal datasets. Texture-less regions are problematic for structure from motion and stereo, reflective material poses issues for active sensing, and distances for translucent objects are intricate to measure with existing hardware. Training on inaccurate or corrupt data induces model bias and hampers generalisation capabilities. These effects remain unnoticed if the sensor measurement is considered as ground truth during the evaluation. This paper investigates the effect of sensor errors on the dense 3D vision tasks of depth estimation and reconstruction. We rigorously show the significant impact of sensor characteristics on the learned predictions and observe generalisation issues arising from various technologies in everyday household environments. For evaluation, we introduce a carefully designed dataset (available at https://github.com/Junggy/HAMMER-dataset) comprising measurements from commodity sensors, namely D-ToF, I-ToF, passive/active stereo, and monocular RGB+P. Our study quantifies the considerable sensor noise impact and paves the way to improved dense vision estimates and targeted data fusion. Comment: Accepted at CVPR 2023, Main Paper + Supp. Mat. arXiv admin note: substantial text overlap with arXiv:2205.0456
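
    A common mitigation that these findings motivate is to keep invalid or corrupt sensor readings out of the training signal rather than treating raw sensor output as ground truth; a generic sketch (standard practice, not the paper's method):

```python
import torch

def masked_depth_loss(pred, sensor_depth, min_d=0.1, max_d=10.0):
    """L1 depth loss restricted to plausible sensor readings: zeros and
    out-of-range values (dropouts, saturation) are excluded so they do
    not bias the model."""
    valid = (sensor_depth > min_d) & (sensor_depth < max_d)
    return torch.abs(pred[valid] - sensor_depth[valid]).mean()
```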

    HouseCat6D -- A Large-Scale Multi-Modal Category Level 6D Object Pose Dataset with Household Objects in Realistic Scenarios

    Full text link
    Estimating the 6D pose of objects is a major 3D computer vision problem. Following the promising outcomes of instance-level approaches, research is also moving towards category-level pose estimation for more practical application scenarios. However, unlike well-established instance-level pose datasets, available category-level datasets lack annotation quality and sufficient pose quantity. We propose the new category-level 6D pose dataset HouseCat6D, featuring 1) multi-modality of polarimetric RGB and depth (RGBD+P), 2) 194 highly diverse objects across 10 household object categories, including 2 photometrically challenging categories, 3) high-quality pose annotation with an error range of only 1.35 mm to 1.74 mm, 4) 41 large-scale scenes with extensive viewpoint coverage and occlusions, 5) a checkerboard-free environment throughout every scene, and 6) additionally annotated dense 6D parallel-jaw grasps. Furthermore, we provide benchmark results of state-of-the-art category-level pose estimation networks.
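
    To put the quoted millimetre-level annotation accuracy in context, pose error is typically reported as a translation distance plus a geodesic rotation angle; a minimal sketch of those metrics (generic convention, not the benchmark's exact protocol):

```python
import numpy as np

def pose_errors(T_est, T_gt):
    """Translation error (in the poses' unit, e.g. mm) and rotation
    error (degrees) between two 4x4 object poses -- the scale at which
    millimetre-level annotation accuracy matters."""
    t_err = np.linalg.norm(T_est[:3, 3] - T_gt[:3, 3])
    R_rel = T_est[:3, :3].T @ T_gt[:3, :3]
    cos = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    r_err = np.degrees(np.arccos(cos))
    return t_err, r_err
```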