CASAPose: Class-Adaptive and Semantic-Aware Multi-Object Pose Estimation
Applications in the field of augmented reality or robotics often require
joint localisation and 6D pose estimation of multiple objects. However, most
approaches require one trained network per object class to achieve the best
results, and analysing all visible objects thus demands multiple inferences,
which is memory- and time-consuming.
which is memory and time-consuming. We present a new single-stage architecture
called CASAPose that determines 2D-3D correspondences for pose estimation of
multiple different objects in RGB images in one pass. It is fast and memory
efficient, and achieves high accuracy for multiple objects by exploiting the
output of a semantic segmentation decoder as control input to a keypoint
recognition decoder via local class-adaptive normalisation. Our new
differentiable regression of keypoint locations significantly contributes to a
faster closing of the domain gap between real test and synthetic training data.
We apply segmentation-aware convolutions and upsampling operations to increase
the focus inside the object mask and to reduce mutual interference of occluding
objects. For each additional object, the network grows by only one output
segmentation map and a negligible number of parameters. We outperform
state-of-the-art approaches in challenging multi-object scenes with
inter-object occlusion and synthetic training.
Comment: BMVC 2022, camera-ready version (this submission includes the paper
and supplementary material)
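As a rough illustration of the class-adaptive normalisation idea described
above, the sketch below conditions keypoint-decoder features on a per-pixel
class map via learned per-class scale and shift parameters. This is a minimal
PyTorch sketch under assumed shapes and layer choices, not the CASAPose
implementation; all names are hypothetical.

```python
# Minimal sketch of a class-adaptive normalisation layer: decoder features are
# modulated per pixel by parameters looked up from the segmentation labels.
# Shapes and layer choices are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class ClassAdaptiveNorm(nn.Module):
    def __init__(self, num_features: int, num_classes: int):
        super().__init__()
        # Parameter-free normalisation of the decoder features.
        self.norm = nn.InstanceNorm2d(num_features, affine=False)
        # Per-class scale and shift; adding one more object class costs only
        # one extra embedding row, consistent with the abstract's claim of
        # negligible per-object parameter growth.
        self.gamma = nn.Embedding(num_classes, num_features)
        self.beta = nn.Embedding(num_classes, num_features)

    def forward(self, x: torch.Tensor, seg: torch.Tensor) -> torch.Tensor:
        # x:   (B, C, H, W) keypoint-decoder features
        # seg: (B, H, W) integer class labels from the segmentation decoder
        h = self.norm(x)
        gamma = self.gamma(seg).permute(0, 3, 1, 2)  # (B, C, H, W)
        beta = self.beta(seg).permute(0, 3, 1, 2)
        return h * (1.0 + gamma) + beta

# Usage: modulate decoder features with predicted per-pixel class labels.
feat = torch.randn(2, 64, 32, 32)
labels = torch.randint(0, 8, (2, 32, 32))
out = ClassAdaptiveNorm(64, 8)(feat, labels)
print(out.shape)  # torch.Size([2, 64, 32, 32])
```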
BTSeg: Barlow Twins Regularization for Domain Adaptation in Semantic Segmentation
Semantic image segmentation is a critical component in many computer vision
systems, such as autonomous driving. In such applications, adverse conditions
(heavy rain, nighttime, snow, extreme lighting) pose specific challenges, yet
are typically underrepresented in the available datasets.
Generating more training data is cumbersome and expensive, and the process
itself is error-prone due to the inherent aleatoric uncertainty. To address
this challenging problem, we propose BTSeg, which exploits image-level
correspondences as a weak supervision signal to learn a segmentation model that
is agnostic to adverse conditions. To this end, our approach uses the Barlow
twins loss from the field of unsupervised learning and treats images taken at
the same location but under different adverse conditions as "augmentations" of
the same unknown underlying base image. This allows the training of a
segmentation model that is robust to appearance changes introduced by different
adverse conditions. We evaluate our approach on ACDC and the new challenging
ACG benchmark to demonstrate its robustness and generalization capabilities.
Our approach performs favorably when compared to the current state-of-the-art
methods, while also being simpler to implement and train. The code will be
released upon acceptance.
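For readers unfamiliar with the Barlow twins objective that BTSeg builds on, a
minimal sketch follows. It computes the cross-correlation matrix between
embeddings of two views of the same scene and penalises deviation from the
identity matrix; the embedding step, dimensions, and the weight `lam` are
illustrative assumptions, not BTSeg's actual configuration.

```python
# Minimal sketch of the Barlow Twins loss applied to embeddings of two images
# taken at the same location under different conditions. The lambda value and
# embedding dimensions are illustrative assumptions.
import torch

def barlow_twins_loss(z_a: torch.Tensor, z_b: torch.Tensor,
                      lam: float = 5e-3) -> torch.Tensor:
    # z_a, z_b: (N, D) embeddings of the two "augmented" views.
    n, d = z_a.shape
    # Standardise each feature dimension across the batch.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-6)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-6)
    # Empirical cross-correlation matrix between the two views.
    c = (z_a.T @ z_b) / n
    # Diagonal pulled to 1 (invariance), off-diagonal to 0 (redundancy reduction).
    on_diag = (torch.diagonal(c) - 1.0).pow(2).sum()
    off_diag = (c - torch.diag_embed(torch.diagonal(c))).pow(2).sum()
    return on_diag + lam * off_diag

# Usage with random stand-in embeddings:
loss = barlow_twins_loss(torch.randn(16, 128), torch.randn(16, 128))
print(loss.item())
```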
Video-Driven Animation of Neural Head Avatars
We present a new approach for video-driven animation of high-quality neural
3D head models, addressing the challenge of person-independent animation from
video input. Typically, high-quality generative models are learned for specific
individuals from multi-view video footage, resulting in person-specific latent
representations that drive the generation process. In order to achieve
person-independent animation from video input, we introduce an LSTM-based
animation network capable of translating person-independent expression features
into personalized animation parameters of person-specific 3D head models. Our
approach combines the advantages of personalized head models (high quality and
realism) with the convenience of video-driven animation employing multi-person
facial performance capture. We demonstrate the effectiveness of our approach on
high-quality animations synthesized from different source videos, as well as
through an ablation study.
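A minimal sketch of what such an LSTM-based translation network could look like
is given below; all dimensions, layer counts, and names are assumptions for
illustration, not the paper's architecture.

```python
# Minimal sketch of an LSTM network that maps a sequence of person-independent
# expression features to personalized animation parameters of a head model.
# Dimensions and layer counts are assumptions.
import torch
import torch.nn as nn

class AnimationLSTM(nn.Module):
    def __init__(self, expr_dim: int = 128, param_dim: int = 64,
                 hidden: int = 256):
        super().__init__()
        # Temporal model over per-frame expression features.
        self.lstm = nn.LSTM(expr_dim, hidden, num_layers=2, batch_first=True)
        # Per-frame regression head to person-specific animation parameters.
        self.head = nn.Linear(hidden, param_dim)

    def forward(self, expr_seq: torch.Tensor) -> torch.Tensor:
        # expr_seq: (B, T, expr_dim) -> (B, T, param_dim)
        h, _ = self.lstm(expr_seq)
        return self.head(h)

# Usage: translate a 30-frame expression sequence into animation parameters.
params = AnimationLSTM()(torch.randn(1, 30, 128))
print(params.shape)  # torch.Size([1, 30, 64])
```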
Automatic Reconstruction of Semantic 3D Models from 2D Floor Plans
Digitalization of existing buildings and the creation of 3D BIM models for
them has become crucial for many tasks. Of particular importance are floor
plans, which contain information about building layouts and are vital for
processes such as construction, maintenance or refurbishing. However, this data
is not always available in digital form, especially for older buildings
constructed before CAD tools were widely available, or it lacks semantic
information. Digitalizing such information usually requires the manual work of
an expert, who must reconstruct the layouts by hand, which is a
cumbersome and error-prone process. In this paper, we present a pipeline for
reconstructing vectorized 3D models from scanned 2D plans, aiming to make this
process more efficient. The presented method achieves state-of-the-art results
on the public CubiCasa5k dataset and shows good
generalization to different types of plans. Our vectorization approach is
particularly effective, outperforming previous methods.
Comment: 5 pages, 1 figure
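The abstract does not detail the vectorization step, but a common baseline for
turning a predicted wall mask into vector geometry is contour extraction
followed by polygon simplification. The sketch below shows that baseline with
standard OpenCV primitives; it is an assumed stand-in, not the paper's method.

```python
# Illustrative sketch of a baseline vectorization step: turning a binary wall
# mask (e.g., from a segmentation network) into simplified polygons. This uses
# standard OpenCV primitives and is not the paper's actual approach.
import cv2
import numpy as np

def vectorize_mask(mask: np.ndarray, eps_frac: float = 0.01) -> list:
    # mask: (H, W) uint8 binary image, 255 where a wall was predicted.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    polygons = []
    for c in contours:
        # Ramer-Douglas-Peucker simplification keeps corners, drops raster noise.
        eps = eps_frac * cv2.arcLength(c, closed=True)
        polygons.append(cv2.approxPolyDP(c, eps, closed=True).reshape(-1, 2))
    return polygons

# Usage on a synthetic 100x100 mask containing one rectangular "wall":
mask = np.zeros((100, 100), np.uint8)
cv2.rectangle(mask, (20, 30), (80, 40), 255, thickness=-1)
print(vectorize_mask(mask))  # ~4 corner points of the rectangle
```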
Automated Damage Inspection of Power Transmission Towers from UAV Images
Infrastructure inspection is a very costly task, requiring technicians to
access remote or hard-to-reach places. This is the case for power transmission
towers, which are sparsely located and require trained workers to climb them to
search for damages. Recently, the use of drones or helicopters for remote
recording is increasing in the industry, sparing the technicians this perilous
task. This, however, leaves the problem of analyzing large volumes of images,
which has great potential for automation. This is a challenging task for
several reasons. First, the lack of freely available training data and the
difficulty of collecting it complicate the problem. Additionally, the boundaries
of what constitutes damage are fuzzy, introducing a degree of subjectivity into
the labelling of the data. The unbalanced class distribution in the images also
plays a role in increasing the difficulty of the task. This paper tackles the
problem of structural damage detection in transmission towers, addressing these
issues. Our main contributions are the development of a system for damage
detection on remotely acquired drone images, applying techniques to overcome
the issue of data scarcity and ambiguity, as well as the evaluation of the
viability of such an approach to solve this particular problem.
Comment: 8 pages, 10 figures, accepted for VISAPP 202
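One standard countermeasure for the class imbalance mentioned above is to
weight the training loss by inverse class frequency, so that rare damage
examples contribute more to the gradient. The sketch below shows this generic
technique; the two-class setup and all names are illustrative assumptions, not
the paper's training recipe.

```python
# Hedged sketch of a standard remedy for class imbalance: inverse-frequency
# class weights in the loss. The background-vs-damage setup is an assumption.
import torch
import torch.nn as nn

def inverse_frequency_weights(labels: torch.Tensor,
                              num_classes: int) -> torch.Tensor:
    # labels: integer class labels of all training pixels/patches.
    counts = torch.bincount(labels.flatten(), minlength=num_classes).float()
    # Rare classes get proportionally larger weights; clamp avoids div-by-zero.
    return counts.sum() / (num_classes * counts.clamp(min=1.0))

# Usage: 990 background vs. 10 damage samples -> damage weighted ~100x more.
labels = torch.cat([torch.zeros(990, dtype=torch.long),
                    torch.ones(10, dtype=torch.long)])
w = inverse_frequency_weights(labels, num_classes=2)
criterion = nn.CrossEntropyLoss(weight=w)
print(w)  # tensor([ 0.5051, 50.0000])
```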
Multispectral Stereo-Image Fusion for 3D Hyperspectral Scene Reconstruction
Spectral imaging enables the analysis of optical material properties that are
invisible to the human eye. Different spectral capturing setups, e.g., based on
filter-wheel, push-broom, line-scanning, or mosaic cameras, have been
introduced in the last years to support a wide range of applications in
agriculture, medicine, and industrial surveillance. However, these systems
often suffer from disadvantages such as a lack of real-time capability, limited
spectral coverage, or low spatial resolution. To address
these drawbacks, we present a novel approach combining two calibrated
multispectral real-time capable snapshot cameras, covering different spectral
ranges, into a stereo system, so that a hyperspectral data cube can be captured
continuously. The combined use of different multispectral snapshot
cameras enables both 3D reconstruction and spectral analysis. Both captured
images are demosaicked without loss of spatial resolution. We fuse the spectral
data from one camera into the other to obtain a video stream with high spatial
and spectral resolution. Experiments demonstrate the feasibility of this
approach and the system is investigated with regard to its applicability for
surgical assistance monitoring.
Comment: VISAPP 2024 - 19th International Conference on Computer Vision Theory
and Applications
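As a rough illustration of the fusion step, the sketch below warps demosaicked
bands from the second camera into the first camera's view using a per-pixel
disparity map, assuming rectified stereo and nearest-neighbour sampling; this
is a simplified stand-in, not the paper's pipeline.

```python
# Illustrative sketch of spectral fusion under a rectified-stereo assumption:
# bands from camera B are warped into camera A's view via a disparity map.
# Interpolation scheme and shapes are assumptions.
import numpy as np

def fuse_spectral(bands_b: np.ndarray, disparity: np.ndarray) -> np.ndarray:
    # bands_b:   (H, W, C) demosaicked bands from camera B
    # disparity: (H, W) disparity of camera A's pixels into camera B
    h, w, c = bands_b.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # For rectified cameras, the match lies on the same row, shifted by disparity.
    xs_b = np.clip(np.round(xs - disparity).astype(int), 0, w - 1)
    # Nearest-neighbour lookup per pixel; (H, W, C) bands in camera A's view.
    return bands_b[ys, xs_b]

# Usage with random stand-ins for demosaicked data and a disparity map:
fused = fuse_spectral(np.random.rand(64, 64, 16), np.full((64, 64), 3.0))
print(fused.shape)  # (64, 64, 16)
```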