Real-Time Seamless Single Shot 6D Object Pose Prediction
We propose a single-shot approach for simultaneously detecting an object in
an RGB image and predicting its 6D pose without requiring multiple stages or
having to examine multiple hypotheses. Unlike a recently proposed single-shot
technique for this task (Kehl et al., ICCV'17) that only predicts an
approximate 6D pose that must then be refined, ours is accurate enough not to
require additional post-processing. As a result, it is much faster - 50 fps on
a Titan X (Pascal) GPU - and more suitable for real-time processing. The key
component of our method is a new CNN architecture inspired by the YOLO network
design that directly predicts the 2D image locations of the projected vertices
of the object's 3D bounding box. The object's 6D pose is then estimated using a
PnP algorithm.
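The final step described above, recovering a 6D pose from the predicted 2D projections of the 3D bounding-box corners, is a standard PnP problem. A minimal sketch using a direct linear transform (DLT) follows; the cube corners, intrinsics, and ground-truth pose below are illustrative placeholders, not values from the paper, and real systems would use a robust solver such as OpenCV's `solvePnP`.

```python
import numpy as np

def rodrigues(rvec):
    """Axis-angle vector -> 3x3 rotation matrix."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def dlt_pnp(pts3d, pts2d, K):
    """Recover [R|t] from >= 6 non-coplanar, noise-free 3D-2D correspondences."""
    rows = []
    for X, (u, v) in zip(pts3d, pts2d):
        Xh = np.append(X, 1.0)
        rows.append(np.concatenate([Xh, np.zeros(4), -u * Xh]))
        rows.append(np.concatenate([np.zeros(4), Xh, -v * Xh]))
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    P = Vt[-1].reshape(3, 4)          # projection matrix, up to scale
    M = np.linalg.inv(K) @ P          # proportional to [R|t]
    s = np.cbrt(np.linalg.det(M[:, :3]))  # det(s*R) = s^3, fixes scale and sign
    Rt = M / s
    return Rt[:, :3], Rt[:, 3]

# Illustrative 3D bounding-box corners (object frame, metres) and intrinsics.
corners = np.array([[x, y, z] for x in (-0.05, 0.05)
                              for y in (-0.05, 0.05)
                              for z in (-0.05, 0.05)])
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
R_gt = rodrigues(np.array([0.1, -0.2, 0.05]))
t_gt = np.array([0.02, -0.01, 0.5])

# Stand-in for the network's 2D predictions: exact projections of the corners.
cam = (R_gt @ corners.T).T + t_gt
proj = (K @ cam.T).T
pts2d = proj[:, :2] / proj[:, 2:3]

R, t = dlt_pnp(corners, pts2d, K)
print(np.allclose(t, t_gt, atol=1e-4))
```

With noise-free correspondences the DLT recovers the pose exactly up to floating-point error; with real network predictions, an iterative refinement (e.g. Levenberg-Marquardt) on top of this initial estimate is the usual choice.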
For single object and multiple object pose estimation on the LINEMOD and
OCCLUSION datasets, our approach substantially outperforms other recent
CNN-based approaches when they are all used without post-processing. During
post-processing, a pose refinement step can be used to boost the accuracy of
the existing methods, but at 10 fps or less, they are much slower than our
method.

Comment: CVPR 2018
A Review and Characterization of Progressive Visual Analytics
Progressive Visual Analytics (PVA) has gained increasing attention over the past years.
It brings the user into the loop during otherwise long-running and non-transparent computations
by producing intermediate partial results. These partial results can be shown to the user
for early and continuous interaction with the emerging end result even while it is still being
computed. Yet as clear-cut as this fundamental idea seems, the existing body of literature puts forth
various interpretations and instantiations that have created a research domain of competing terms,
various definitions, as well as long lists of practical requirements and design guidelines spread across
different scientific communities. This makes it more and more difficult to get a succinct understanding
of PVA’s principal concepts, let alone an overview of this increasingly diverging field. The review and
discussion of PVA presented in this paper address these issues and provide (1) a literature collection
on this topic, (2) a conceptual characterization of PVA, as well as (3) a consolidated set of practical
recommendations for implementing and using PVA-based visual analytics solutions.
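The core PVA loop, a long-running computation that emits intermediate partial results the user can inspect while the final answer is still being computed, can be sketched as a Python generator. The mean-estimation task here is only an illustrative stand-in for a real analytics computation.

```python
import random

def progressive_mean(data, chunk_size=1000):
    """Yield successively refined estimates of the mean while the
    full computation is still running (PVA-style partial results)."""
    total, seen = 0.0, 0
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        total += sum(chunk)
        seen += len(chunk)
        # Each yield is an intermediate result a UI could visualize,
        # together with how much of the data it is based on.
        yield total / seen, seen / len(data)

random.seed(0)
data = [random.gauss(5.0, 2.0) for _ in range(10_000)]
for estimate, progress in progressive_mean(data):
    print(f"{progress:4.0%} done, running mean ~ {estimate:.3f}")
```

In a real PVA system the consumer of these partial results would be an interactive visualization, and the user could steer or cancel the computation between yields.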
Volumetric real-time particle-based representation of large unstructured tetrahedral polygon meshes
In this paper we propose a particle-based volume rendering approach for unstructured, three-dimensional, tetrahedral polygon meshes. We stochastically generate millions of particles per second and project them on the screen in real time. In contrast to previous rendering techniques for tetrahedral volume meshes, our method does not require prior depth sorting of the geometry. Instead, the rendered image is generated by choosing the particles closest to the camera. Furthermore, we use spatial superimposition: each pixel is constructed from multiple subpixels. This approach not only increases projection accuracy, but also allows combining subpixels into one superpixel, which creates the well-known translucency effect of volume rendering. We show that our method is fast enough for the visualization of unstructured three-dimensional grids under hard real-time constraints and that it scales well to a high number of particles.
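The two central ideas above, per-subpixel selection of the particle closest to the camera (instead of depth-sorting the mesh) and averaging subpixels into superpixels, can be sketched in a few lines of NumPy. All shapes, particle counts, and the scalar "color" are illustrative choices, not the paper's parameters.

```python
import numpy as np

rng = np.random.default_rng(42)

W, H, SS = 4, 3, 2          # superpixel grid and subpixel supersampling factor
sw, sh = W * SS, H * SS     # subpixel resolution

# Hypothetical particles already projected to subpixel coordinates,
# each with a depth and a scalar "color" sampled from the volume data.
n = 5000
px = rng.integers(0, sw, n)
py = rng.integers(0, sh, n)
depth = rng.uniform(0.1, 1.0, n)
color = rng.uniform(0.0, 1.0, n)

# Per-subpixel nearest-particle selection: a simple z-buffer test per
# generated particle, with no depth sorting of the geometry.
zbuf = np.full((sh, sw), np.inf)
img = np.zeros((sh, sw))
for x, y, z, c in zip(px, py, depth, color):
    if z < zbuf[y, x]:
        zbuf[y, x] = z
        img[y, x] = c

# Combine each SSxSS block of subpixels into one superpixel; averaging
# particles at different depths produces the translucency-like effect.
super_img = img.reshape(H, SS, W, SS).mean(axis=(1, 3))
print(super_img.shape)  # (3, 4)
```

The real method generates particles stochastically inside the tetrahedra on the GPU; the per-subpixel depth test shown here is the part that replaces geometry sorting.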
Parallel Mesh Processing
Current research in computer graphics strives to meet users' growing expectations and produces ever more realistic-looking images. Accordingly, the scenes and the techniques used to render them are becoming increasingly complex. Such a development inevitably entails a rise in the required computing power, since the models that make up a scene can consist of billions of polygons and must be rendered in real time.
Realistic image synthesis rests on three pillars: models, materials, and lighting. Today there are several techniques for the efficient and realistic approximation of global illumination, and likewise algorithms for creating realistic materials. Techniques for rendering models in real time also exist, but they mostly work only for scenes of moderate complexity and fail on very complex scenes.
Models form the foundation of a scene; optimizing them has a direct
impact on the efficiency of the material and lighting techniques, so that only an optimized model representation makes real-time rendering possible. Many of the models used in computer graphics are represented by triangle meshes. The data volume they contain is enormous, needed to capture the richness of detail of the respective objects and to meet the growing demand for realism. Rendering complex models consisting of millions of triangles
is a major challenge even for modern graphics cards.
It is therefore necessary, particularly for real-time simulations, to develop efficient algorithms. On the one hand, such algorithms should support visibility culling, level of detail (LOD), out-of-core memory management, and compression. On the other hand, this optimization should itself run very efficiently so as not to further impede the rendering. This calls for the development of parallel methods capable of processing the enormous flood of data efficiently.
The core contribution of this thesis is a set of novel algorithms and data structures designed specifically for efficient parallel data processing, capable of rendering as well as modeling very complex models and scenes in real time. These algorithms operate in two phases: first, in an offline phase, the data structure is built and optimized for parallel processing; the optimized data structure is then used in the second phase for real-time rendering.
A further contribution of this thesis is an algorithm capable of procedurally generating a very realistic-looking planet and rendering it in real time.
CheckerPose: Progressive Dense Keypoint Localization for Object Pose Estimation with Graph Neural Network
Estimating the 6-DoF pose of a rigid object from a single RGB image is a
crucial yet challenging task. Recent studies have shown the great potential of
dense correspondence-based solutions, yet improvements are still needed to
reach practical deployment. In this paper, we propose a novel pose estimation
algorithm named CheckerPose, which improves on three main aspects. Firstly,
CheckerPose densely samples 3D keypoints from the surface of the 3D object and
finds their 2D correspondences progressively in the 2D image. Compared to
previous solutions that conduct dense sampling in the image space, our strategy
enables correspondence search over a 2D grid (i.e., pixel coordinates).
Secondly, for our 3D-to-2D correspondence, we design a compact binary code
representation for 2D image locations. This representation not only allows for
progressive correspondence refinement but also converts the correspondence
regression to a more efficient classification problem. Thirdly, we adopt a
graph neural network to explicitly model the interactions among the sampled 3D
keypoints, further boosting the reliability and accuracy of the
correspondences. Together, these novel components make our CheckerPose a strong
pose estimation algorithm. When evaluated on the popular Linemod, Linemod-O,
and YCB-V object pose estimation benchmarks, CheckerPose clearly boosts the
accuracy of correspondence-based methods and achieves state-of-the-art
performance.
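The binary-code idea above, representing a 2D image location as a string of bits so that each bit becomes a classification target and predicting more bits progressively refines the location, can be illustrated as follows. The encoding details here are our own illustrative choice, not necessarily the paper's exact scheme.

```python
def encode_xy(x, y, bits=7):
    """Encode a 2D grid location as two binary codes (MSB first).
    Predicting bit k for every keypoint is a binary classification;
    adding bits refines the location coarse-to-fine."""
    bx = [(x >> (bits - 1 - k)) & 1 for k in range(bits)]
    by = [(y >> (bits - 1 - k)) & 1 for k in range(bits)]
    return bx, by

def decode_prefix(bx, by, bits=7):
    """Decode a (possibly partial) code prefix to the cell it pins down:
    with k of `bits` bits known, the location is fixed up to a
    2**(bits-k) x 2**(bits-k) cell."""
    k = len(bx)
    x = sum(b << (bits - 1 - i) for i, b in enumerate(bx))
    y = sum(b << (bits - 1 - i) for i, b in enumerate(by))
    return (x, y), 1 << (bits - k)

bx, by = encode_xy(83, 21)
# The first 3 bits give a coarse 16x16 cell; all 7 bits give the exact pixel.
print(decode_prefix(bx[:3], by[:3]))  # ((80, 16), 16)
print(decode_prefix(bx, by))          # ((83, 21), 1)
```

Compared to regressing real-valued coordinates, each bit is an easier, well-calibrated binary decision, which is the efficiency argument the abstract makes.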
DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting
In this paper, we propose a new operator, called 3D DeFormable Attention
(DFA3D), for 2D-to-3D feature lifting, which transforms multi-view 2D image
features into a unified 3D space for 3D object detection. Existing feature
lifting approaches, such as Lift-Splat-based and 2D attention-based, either use
estimated depth to get pseudo LiDAR features and then splat them to a 3D space,
which is a one-pass operation without feature refinement, or ignore depth and
lift features by 2D attention mechanisms, which achieve finer semantics while
suffering from a depth ambiguity problem. In contrast, our DFA3D-based method
first leverages the estimated depth to expand each view's 2D feature map to 3D
and then utilizes DFA3D to aggregate features from the expanded 3D feature
maps. With the help of DFA3D, the depth ambiguity problem can be effectively
alleviated from the root, and the lifted features can be progressively refined
layer by layer, thanks to the Transformer-like architecture. In addition, we
propose a mathematically equivalent implementation of DFA3D which can
significantly improve its memory efficiency and computational speed. We
integrate DFA3D into several methods that use 2D attention-based feature
lifting with only a few modifications in code and evaluate on the nuScenes
dataset. The experiment results show a consistent improvement of +1.41\% mAP on
average, and up to +15.1\% mAP improvement when high-quality depth information
is available, demonstrating the superiority, applicability, and huge potential
of DFA3D. The code is available at
https://github.com/IDEA-Research/3D-deformable-attention.git
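The first step described above, expanding each view's 2D feature map into 3D using an estimated per-pixel depth distribution, amounts to an outer product along the depth axis (as in Lift-Splat-style lifting). A minimal NumPy sketch, with all shapes chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

H, W, C, D = 4, 6, 8, 16   # illustrative: image size, channels, depth bins

feat2d = rng.standard_normal((H, W, C))        # per-view 2D features
depth_logits = rng.standard_normal((H, W, D))  # predicted depth distribution

# Softmax over depth bins: each pixel spreads its feature along its ray.
depth_prob = np.exp(depth_logits)
depth_prob /= depth_prob.sum(axis=-1, keepdims=True)

# Expand to a 3D feature volume: feature weighted by depth probability.
# (H, W, D, C) = (H, W, D, 1) * (H, W, 1, C)
feat3d = depth_prob[..., None] * feat2d[:, :, None, :]

# Summing the volume over depth recovers the 2D feature exactly,
# since the depth weights along each ray sum to one.
print(np.allclose(feat3d.sum(axis=2), feat2d))
```

DFA3D's contribution sits on top of this expansion: a 3D deformable attention operator aggregates from the expanded volume, and the paper's memory-efficient equivalent implementation avoids ever materializing the full `(H, W, D, C)` tensor, which the naive version above does.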
Planet-Sized Batched Dynamic Adaptive Meshes (P-BDAM)
This paper describes an efficient technique for out-of-core management and interactive rendering of planet-sized textured terrain surfaces. The technique, called planet-sized batched dynamic adaptive meshes (P-BDAM), extends the BDAM approach by using as its basic primitive a general triangulation of points on a displaced triangle. The proposed framework introduces several advances with respect to the state of the art: thanks to a batched host-to-graphics communication model, we outperform current adaptive tessellation solutions in terms of rendering speed; we guarantee overall geometric continuity, exploiting programmable graphics hardware to cope with the accuracy issues introduced by single-precision floating point; we exploit a compressed out-of-core representation and speculative prefetching to hide disk latency while rendering out-of-core data; and we efficiently construct high-quality simplified representations with a novel distributed out-of-core simplification algorithm running on a standard PC network.
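The single-precision accuracy issue mentioned above arises because a float32 has about seven significant digits, so at planet-scale magnitudes its representable values are roughly half a metre apart. A common remedy, which we sketch here as our own illustration rather than P-BDAM's exact scheme, is to split each coordinate into a coarse part plus a small residual so the GPU only ever handles small, precisely representable offsets:

```python
import numpy as np

# A coordinate on a planet-sized body, metres from the planet centre.
p = np.float64(6_371_004.375)

# Naive float32 storage loses sub-metre detail at this magnitude:
# near 6.37e6 the spacing between adjacent float32 values is 0.5 m.
naive = np.float32(p)
print(float(np.float64(naive) - p))   # rounding error on the order of 0.1 m

# Split into a high part and a small residual; both fit in float32
# with little or no loss, and their sum restores the coordinate.
hi = np.float32(p)                    # coarse part
lo = np.float32(p - np.float64(hi))   # small residual, representable precisely
recon = np.float64(hi) + np.float64(lo)
print(float(recon - p))               # far smaller error than the naive store
```

In practice the `hi` part is folded into a per-patch local origin (or into the view matrix on the CPU in double precision), so vertices sent to the GPU are small offsets and the rounding error never reaches the screen.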