A Minimalist Approach to Type-Agnostic Detection of Quadrics in Point Clouds
This paper proposes a segmentation-free, automatic and efficient procedure to
detect general geometric quadric forms in point clouds, where clutter and
occlusions are inevitable. Our everyday world is dominated by man-made objects
designed from 3D primitives such as planes, cones, spheres, and cylinders;
these objects are also omnipresent in industrial environments. This makes it
possible to abstract 3D scenes through primitives, thereby positioning these
geometric forms as an integral part of perception and high-level 3D scene
understanding.
In contrast to the state of the art, where a tailored algorithm treats each
primitive type separately, we propose to encapsulate all types in a single
robust detection procedure. At the center of our approach lies a closed-form 3D
quadric fit, operating in both primal and dual spaces and requiring as few as
four oriented points. Around this fit, we design a novel, local null-space
voting strategy that reduces the four-point case to three. Voting is coupled
with RANSAC and makes our algorithm orders of magnitude faster than its
conventional
counterparts. This is the first method capable of generic cross-type,
multi-object primitive detection in difficult scenes. Results on synthetic and
real datasets support the validity of our method.
Comment: Accepted for publication at CVPR 2020
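For intuition, here is a minimal sketch of the classical algebraic quadric fit that such closed-form solvers build on: the ten homogeneous coefficients of a quadric lie in the null space of a design matrix of point monomials. This least-squares variant needs nine or more plain points; it is illustrative only and is not the paper's primal-dual, four-oriented-point solver.

```python
# Minimal sketch: algebraic quadric fit via the null space of a design matrix.
# Illustrative least-squares variant, not the paper's primal-dual 4-point fit.
import numpy as np

def fit_quadric(points: np.ndarray) -> np.ndarray:
    """points: (N, 3), N >= 9. Returns the 10 quadric coefficients up to scale."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # One row of quadric monomials per point.
    D = np.column_stack([x*x, y*y, z*z, x*y, x*z, y*z, x, y, z, np.ones_like(x)])
    # The coefficients span the (approximate) null space of D: take the right
    # singular vector associated with the smallest singular value.
    return np.linalg.svd(D)[2][-1]

# Usage: points on the unit sphere recover x^2 + y^2 + z^2 - 1 = 0 up to scale.
rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
q = fit_quadric(pts)
print(q / q[0])  # approx [1, 1, 1, 0, 0, 0, 0, 0, 0, -1]
```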
Multi-Modal Dataset Acquisition for Photometrically Challenging Objects
This paper addresses the limitations of current datasets for 3D vision tasks
in terms of accuracy, size, realism, and suitable imaging modalities for
photometrically challenging objects. We propose a novel annotation and
acquisition pipeline that enhances existing 3D perception and 6D object pose
datasets. Our approach integrates robotic forward kinematics, external infrared
trackers, and improved calibration and annotation procedures. We present a
multi-modal sensor rig, mounted on a robotic end-effector, and demonstrate how
it is integrated into the creation of highly accurate datasets. Additionally,
we introduce a freehand procedure for wider viewpoint coverage. Both approaches
yield high-quality 3D data with accurate object and camera pose annotations.
Our methods overcome the limitations of existing datasets and provide valuable
resources for 3D vision research.
Comment: Accepted at the ICCV 2023 TRICKY Workshop
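As a sketch of how forward kinematics yields pose annotations, the camera pose in the robot base frame can be obtained by chaining the base-to-end-effector transform with a fixed hand-eye calibration, and the object annotation follows by a change of frame. The transform names and values below are illustrative assumptions, not the paper's pipeline.

```python
# Minimal sketch of pose chaining for dataset annotation; all transforms are
# 4x4 homogeneous matrices and the example values are purely illustrative.
import numpy as np

def camera_pose(base_T_ee: np.ndarray, ee_T_cam: np.ndarray) -> np.ndarray:
    """Camera in base frame: forward kinematics composed with hand-eye calib."""
    return base_T_ee @ ee_T_cam

def object_in_camera(base_T_cam: np.ndarray, base_T_obj: np.ndarray) -> np.ndarray:
    """Object pose annotation in the camera frame (e.g. base_T_obj from a tracker)."""
    return np.linalg.inv(base_T_cam) @ base_T_obj

# Example: end-effector 0.5 m above the base, camera offset 5 cm on the flange,
# object 1 m in front of the base.
base_T_ee = np.eye(4); base_T_ee[2, 3] = 0.5
ee_T_cam = np.eye(4); ee_T_cam[0, 3] = 0.05
base_T_obj = np.eye(4); base_T_obj[0, 3] = 1.0
print(object_in_camera(camera_pose(base_T_ee, ee_T_cam), base_T_obj))
```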
TexPose: Neural Texture Learning for Self-Supervised 6D Object Pose Estimation
In this paper, we introduce neural texture learning for 6D object pose
estimation from synthetic data and a few unlabelled real images. Our major
contribution is a novel learning scheme that removes the drawbacks of previous
works, namely the strong dependency on co-modalities or additional refinement,
which were previously necessary to provide training signals for convergence. We
formulate this scheme as two sub-optimisation problems on
texture learning and pose learning. We separately learn to predict realistic
texture of objects from real image collections and learn pose estimation from
pixel-perfect synthetic data. Combining these two capabilities then allows us
to synthesise photorealistic novel views to supervise the pose estimator with
accurate geometry. To alleviate pose noise and segmentation imperfections
present during the texture learning phase, we propose a surfel-based
adversarial training loss together with texture regularisation from synthetic
data. We demonstrate that the proposed approach significantly outperforms
recent state-of-the-art methods without ground-truth pose annotations and
generalises substantially better to unseen scenes. Remarkably, our scheme
substantially improves the adopted pose estimators even when they are
initialised with much weaker performance.
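As a sketch of the two sub-optimisation problems, the scheme can be pictured as alternating updates: texture parameters are optimised with the pose detached, then pose parameters with the texture detached. The toy renderer, shapes, and losses below are placeholders, not TexPose's surfel-based formulation.

```python
# Minimal sketch of alternating texture/pose optimisation with PyTorch.
# The render function and loss are stand-ins for the real pipeline.
import torch
import torch.nn.functional as F

texture = torch.randn(16, requires_grad=True)  # stand-in texture parameters
pose = torch.zeros(6, requires_grad=True)      # stand-in pose parameters
opt_tex = torch.optim.Adam([texture], lr=1e-2)
opt_pose = torch.optim.Adam([pose], lr=1e-2)
target = torch.randn(16)                       # stand-in for real observations

def render(tex, p):
    # Placeholder differentiable "renderer" mixing texture and pose.
    return tex + p.sum()

for step in range(100):
    # Sub-problem 1: texture learning, pose held fixed (detached).
    loss_tex = F.mse_loss(render(texture, pose.detach()), target)
    opt_tex.zero_grad(); loss_tex.backward(); opt_tex.step()
    # Sub-problem 2: pose learning against synthesised views, texture fixed.
    loss_pose = F.mse_loss(render(texture.detach(), pose), target)
    opt_pose.zero_grad(); loss_pose.backward(); opt_pose.step()
```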
RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance
Conversational AI tools that can generate and discuss clinically correct
radiology reports for a given medical image have the potential to transform
radiology. Such a human-in-the-loop radiology assistant could facilitate a
collaborative diagnostic process, thus saving time and improving the quality of
reports. Towards this goal, we introduce RaDialog, the first thoroughly
evaluated and publicly available large vision-language model for radiology
report generation and interactive dialog. RaDialog effectively integrates
visual image features and structured pathology findings with a large language
model (LLM) while simultaneously adapting it to a specialized domain using
parameter-efficient fine-tuning. To preserve the conversational abilities of
the underlying LLM, we propose a comprehensive, semi-automatically labeled,
image-grounded instruct dataset for chest X-ray radiology tasks. By training
with this dataset, our method achieves state-of-the-art clinical correctness in
report generation and shows impressive abilities in interactive tasks such as
correcting reports and answering questions, serving as a foundational step
toward clinical dialog systems. Our code is available on GitHub:
https://github.com/ChantalMP/RaDialog.
Comment: 12 pages, 7 figures
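As a sketch of parameter-efficient fine-tuning, the snippet below attaches LoRA adapters to a base language model with the Hugging Face peft library, so that only a small fraction of weights is trained. The base model and target modules are illustrative assumptions, not necessarily RaDialog's exact configuration.

```python
# Minimal sketch of parameter-efficient fine-tuning with LoRA adapters.
# The base model ("gpt2") and target modules are illustrative choices.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base LLM
config = LoraConfig(
    r=8,                        # adapter rank
    lora_alpha=16,              # adapter scaling
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the small LoRA adapters train
```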