Pose Normalization of Indoor Mapping Datasets Partially Compliant with the Manhattan World Assumption
In this paper, we present a novel pose normalization method for indoor mapping point clouds and triangle meshes that is robust against large fractions of the indoor mapping geometries deviating from an ideal Manhattan World structure. For building structures that contain multiple Manhattan World systems, the dominant Manhattan World structure supported by the largest fraction of geometries is determined and used for alignment. First, a vertical alignment is conducted that orients a chosen axis to be orthogonal to the horizontal floor and ceiling surfaces. Subsequently, a rotation around the resulting vertical axis is determined that aligns the dataset horizontally with the coordinate axes. The proposed method is evaluated quantitatively on several publicly available indoor mapping datasets. Our implementation of the proposed procedure, along with code for reproducing the evaluation, will be made available to the public upon acceptance for publication.
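As an illustration of the two-step scheme described above, the following Python sketch aligns a point cloud with given unit normals: it first rotates the dominant floor/ceiling normal onto the z-axis, then finds the dominant wall azimuth modulo 90 degrees and removes it by a rotation about the vertical axis. This is a minimal sketch under simplifying assumptions (normals are provided; a simple histogram vote stands in for the paper's robust estimation), not the authors' implementation.

    import numpy as np

    def rotation_between(a, b):
        # Rodrigues rotation taking unit vector a onto unit vector b (a != -b).
        v, c = np.cross(a, b), float(np.dot(a, b))
        K = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
        return np.eye(3) + K + K @ K / (1.0 + c)

    def pose_normalize(points, normals):
        # Step 1 (vertical alignment): average the near-vertical normals,
        # flipped to a common hemisphere, and rotate that "up" onto the z-axis.
        vert = normals[np.abs(normals[:, 2]) > 0.8]
        up = (vert * np.sign(vert[:, 2:3])).mean(axis=0)
        up /= np.linalg.norm(up)
        R1 = rotation_between(up, np.array([0.0, 0.0, 1.0]))
        pts, nrm = points @ R1.T, normals @ R1.T

        # Step 2 (horizontal alignment): histogram the wall-normal azimuths
        # modulo 90 degrees; the fullest bin is the Manhattan yaw supported
        # by the largest fraction of geometries.
        walls = nrm[np.abs(nrm[:, 2]) < 0.2]
        az = np.arctan2(walls[:, 1], walls[:, 0]) % (np.pi / 2)
        hist, edges = np.histogram(az, bins=90, range=(0.0, np.pi / 2))
        yaw = 0.5 * (edges[np.argmax(hist)] + edges[np.argmax(hist) + 1])
        c, s = np.cos(-yaw), np.sin(-yaw)
        R2 = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        return pts @ R2.T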
Indoor Mapping and Reconstruction with Mobile Augmented Reality Sensor Systems
Augmented Reality (AR) makes it possible to display virtual, three-dimensional content directly within the real environment. Rather than merely showing arbitrary virtual objects at arbitrary locations, however, AR technology can also be used to present geodata in situ, at the very place the data refer to. AR thus opens up the possibility of enriching the real world with virtual, location-based information. In the present work, this flavor of AR is termed "Fused Reality" and discussed in depth. The practical value offered by this concept of Fused Reality is well demonstrated by its application to digital building models, where building-specific information, for example the routing of pipes and cables inside the walls, can be displayed in its correct position on the real object. To realize the outlined concept of an indoor Fused Reality application, some fundamental requirements must be met. A given building can only be augmented with location-based information if a digital model of that building is available. While larger construction projects today are often planned and executed with the help of Building Information Modelling (BIM), so that a digital model comes into being together with the real building, digital models are usually not available for older existing buildings. Creating a digital model of an existing building manually is possible, but involves considerable effort. Once a suitable building model is available, an AR device must moreover be able to determine its own position and orientation within the building relative to this model in order to display augmentations in the correct position.
This work examines and discusses various aspects of this problem. First, different ways of capturing indoor building geometry with sensor systems are discussed. Subsequently, an investigation is presented into how far modern AR devices, which typically also feature a multitude of sensors, are themselves suited for use as indoor mapping systems. The resulting indoor mapping datasets can then be used to reconstruct building models automatically. To this end, an automated, voxel-based indoor reconstruction method is presented (a minimal voxelization sketch follows after this abstract); it is evaluated quantitatively on four datasets captured for this purpose, together with corresponding reference data. Furthermore, various ways of localizing mobile AR devices within a building and the associated building model are discussed. In this context, the evaluation of a marker-based indoor localization method is also presented. Finally, a new approach for aligning indoor mapping datasets with the axes of the coordinate system is presented.
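To make the voxel discretization at the heart of such a reconstruction method concrete, here is a minimal Python sketch that converts a point cloud into an occupancy grid. The actual thesis method is considerably more involved (e.g., it also has to reason about surfaces and free space), so this is illustrative only.

    import numpy as np

    def voxelize(points, voxel_size=0.05):
        # Discretize a point cloud (n, 3) into an occupancy grid: each point
        # votes for the voxel containing it; duplicates collapse to one voxel.
        origin = points.min(axis=0)
        idx = np.floor((points - origin) / voxel_size).astype(int)
        occupied = np.unique(idx, axis=0)
        centers = origin + (occupied + 0.5) * voxel_size  # voxel midpoints
        return occupied, centers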
Convex Decomposition of Indoor Scenes
We describe a method to parse a complex, cluttered indoor scene into primitives that offer a parsimonious abstraction of scene structure. Our primitives are simple convexes. Our method uses a learned regression procedure to parse a scene into a fixed number of convexes from RGBD input, and can optionally accept segmentations to improve the decomposition. The result is then polished with a descent method that adjusts the convexes to produce a very good fit and greedily removes superfluous primitives. Because the entire scene is parsed, we can evaluate using traditional depth, normal, and segmentation error metrics. Our evaluation procedure demonstrates that the error of our primitive representation is comparable to that of predicting depth from a single image.
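The greedy removal of superfluous primitives can be pictured with the following sketch. It assumes a hypothetical scene_error callback measuring the fit error (e.g., depth or normal error) of the reconstruction induced by a set of convexes; the authors' polishing step and exact removal criterion may differ.

    def prune_convexes(convexes, scene_error, tol=1e-3):
        # Greedily remove primitives whose absence barely changes the fit:
        # accept a removal whenever the error grows by at most `tol`.
        kept = list(convexes)
        base = scene_error(kept)
        improved = True
        while improved:
            improved = False
            for c in list(kept):
                trial = [k for k in kept if k is not c]
                err = scene_error(trial)
                if err <= base + tol:
                    kept, base, improved = trial, err, True
                    break
        return kept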
Finding Good Configurations of Planar Primitives in Unorganized Point Clouds
We present an algorithm for detecting planar primitives in unorganized 3D point clouds. Departing from an initial configuration, the algorithm refines both the continuous plane parameters and the discrete assignment of input points to them by seeking high fidelity, high simplicity, and high completeness. Our key contribution lies in the design of an exploration mechanism guided by a multi-objective energy function. The transitions within the large solution space are handled by five geometric operators that create, remove, and modify primitives. We demonstrate the potential of our method on a variety of scenes, from organic shapes to man-made objects, and sensors, from multi-view stereo to laser. We show its efficacy with respect to existing primitive fitting approaches and illustrate its applicative interest in compact mesh reconstruction, when combined with a plane assembly method.
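A multi-objective energy balancing fidelity, simplicity, and completeness could look roughly like the following Python sketch; the terms and weights here are illustrative assumptions, not the paper's exact formulation.

    import numpy as np

    def energy(points, planes, labels, w_f=1.0, w_s=0.01, w_c=1.0):
        # points: (n, 3); planes: (m, 4) rows [a, b, c, d] with unit normals;
        # labels: (n,) index of the assigned plane, or -1 for unassigned.
        assigned = labels >= 0
        if assigned.any():
            pl = planes[labels[assigned]]
            # Fidelity: mean point-to-plane distance of assigned points.
            resid = np.abs((points[assigned] * pl[:, :3]).sum(axis=1) + pl[:, 3])
            fidelity = resid.mean()
        else:
            fidelity = 0.0
        simplicity = len(planes)                 # fewer primitives = simpler
        incompleteness = 1.0 - assigned.mean()   # fraction of unassigned points
        return w_f * fidelity + w_s * simplicity + w_c * incompleteness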
iBARLE: imBalance-Aware Room Layout Estimation
Room layout estimation predicts layouts from a single panorama. It requires datasets with large-scale and diverse room shapes to train the models. However, real-world datasets exhibit significant imbalances along the dimensions of layout complexity, camera location, and variation in scene appearance, and these imbalances considerably affect model training. In this work, we propose the imBalance-Aware Room Layout Estimation (iBARLE) framework to address these issues. iBARLE consists of (1) an Appearance Variation Generation (AVG) module, which promotes domain generalization across visual appearance, (2) a Complex Structure Mix-up (CSMix) module, which enhances generalizability with respect to room structure, and (3) a gradient-based layout objective function, which accounts more effectively for occlusions in complex layouts. All modules are trained jointly and complement each other to achieve the best performance. Experiments and ablation studies on the ZInD dataset (Cruz et al., 2021) show that iBARLE achieves state-of-the-art performance compared with other layout estimation baselines.
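As a rough illustration of what a gradient-based layout objective can look like, the sketch below penalizes differences between the horizontal gradients of predicted and ground-truth per-column boundary curves, which emphasizes abrupt jumps such as occluded corners. This is an assumed stand-in, not the exact iBARLE loss.

    import numpy as np

    def gradient_layout_loss(pred_boundary, gt_boundary, w_grad=1.0):
        # pred_boundary, gt_boundary: (w,) boundary position per panorama column.
        # The gradient term weights columns where the layout changes abruptly.
        data_term = np.abs(pred_boundary - gt_boundary).mean()
        grad_term = np.abs(np.diff(pred_boundary) - np.diff(gt_boundary)).mean()
        return data_term + w_grad * grad_term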
Line Primitives and Their Applications in Geometric Computer Vision
Line primitives are widely found in structured scenes, where they provide higher-level structure information than point primitives. Furthermore, line primitives in space are closely related to Euclidean transformations, because the dual vector (also known as PlĂŒcker coordinates) representation of a 3D line is the counterpart of the dual quaternion, which represents a Euclidean transformation. These geometric properties of line primitives motivate the work in this thesis, with the following contributions:
Firstly, by combining the local appearance of lines and geometric constraints between line pairs in images, a line segment matching algorithm is developed that constructs a novel line band descriptor to depict the local appearance of a line and builds a relational graph to measure the pairwise consistency between line correspondences. Experiments show that the matching algorithm is robust to various image transformations and more efficient than conventional graph-based line matching algorithms.
Secondly, by investigating the symmetry of line directions in space, this thesis presents a complete analysis of the solutions of the Perspective-3-Line (P3L) problem, which estimates the camera pose from three reference lines in space and their 2D projections. For three spatial lines in general configuration, a P3L polynomial is derived and employed to develop a solution to the Perspective-n-Line (PnL) problem. The proposed robust PnL algorithm can efficiently and accurately estimate the camera pose for both small and large numbers of line correspondences. For three spatial lines in special configurations (e.g., in a Manhattan world, which consists of three mutually orthogonal dominant directions), the solution of the P3L problem is employed to solve the vanishing point estimation and line classification problem. The proposed vanishing point estimation algorithm achieves high accuracy and efficiency by thoroughly exploiting the Manhattan world characteristic. A further advantage of the proposed framework is that it can easily be generalized to images taken by central catadioptric or uncalibrated cameras.
The third major contribution of this thesis concerns structure-from-motion using line primitives. To circumvent the PlĂŒcker constraint on the PlĂŒcker coordinates of lines, a Cayley representation of lines is developed, inspired by the geometric properties of PlĂŒcker coordinates. To build the line observation model, two derivations of the line projection function are presented: one based on the dual relationship between points and lines, the other on the relationship between PlĂŒcker coordinates and the PlĂŒcker matrix. The motion and structure parameters are then initialized by an incremental approach and optimized by sparse bundle adjustment. Quantitative validations show an increase in performance compared to conventional line reconstruction algorithms.
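For reference, the PlĂŒcker coordinates mentioned above consist of a direction vector and a moment vector tied together by a bilinear side constraint. The small sketch below constructs them from two points and checks the constraint that the thesis' Cayley representation is designed to sidestep.

    import numpy as np

    def pluecker_from_points(p, q):
        # Line through points p and q: direction d = q - p, moment m = p x q.
        # Any point x on the line satisfies x cross d = m.
        d = q - p
        m = np.cross(p, q)
        return d, m

    def pluecker_side_constraint(d, m):
        # Valid lines lie on the Klein quadric: d . m must equal zero.
        return float(np.dot(d, m))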
PARSAC: Accelerating Robust Multi-Model Fitting with Parallel Sample Consensus
We present a real-time method for robust estimation of multiple instances of
geometric models from noisy data. Geometric models such as vanishing points,
planar homographies or fundamental matrices are essential for 3D scene
analysis. Previous approaches discover distinct model instances in an iterative
manner, thus limiting their potential for speedup via parallel computation. In
contrast, our method detects all model instances independently and in parallel.
A neural network segments the input data into clusters representing potential
model instances by predicting multiple sets of sample and inlier weights. Using
the predicted weights, we determine the model parameters for each potential
instance separately in a RANSAC-like fashion. We train the neural network via
task-specific loss functions, i.e. we do not require a ground-truth
segmentation of the input data. As suitable training data for homography and
fundamental matrix fitting is scarce, we additionally present two new synthetic
datasets. We demonstrate state-of-the-art performance on these as well as
multiple established datasets, with inference times as small as five
milliseconds per image.
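A hedged sketch of the per-instance, RANSAC-like estimation stage might look as follows: for each putative instance, minimal sets are drawn according to its predicted sample weights and hypotheses are scored with its predicted inlier weights, so all instances can be processed independently (and hence in parallel). The callbacks fit_minimal and residual, and the weight layout, are assumptions for illustration.

    import numpy as np

    def fit_instances(data, sample_w, inlier_w, fit_minimal, residual,
                      min_set, iters=64, thresh=1e-2, seed=0):
        # data: (n, d) observations; sample_w, inlier_w: (n, k) predicted
        # weights, one column per putative model instance.
        rng = np.random.default_rng(seed)
        n, k = sample_w.shape
        models = []
        for j in range(k):  # independent per instance -> parallelizable
            p = sample_w[:, j] / sample_w[:, j].sum()
            best, best_score = None, -np.inf
            for _ in range(iters):
                idx = rng.choice(n, size=min_set, replace=False, p=p)
                model = fit_minimal(data[idx])
                if model is None:
                    continue  # degenerate minimal set
                inliers = residual(model, data) < thresh
                score = float((inlier_w[:, j] * inliers).sum())
                if score > best_score:
                    best, best_score = model, score
            models.append(best)
        return models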
3D Scene Geometry Estimation from 360 Imagery: A Survey
This paper provides a comprehensive survey of pioneering and state-of-the-art 3D scene geometry estimation methodologies based on single, two, or multiple images captured with omnidirectional optics. We first revisit the basic concepts of the spherical camera model, and review the most common acquisition technologies and representation formats suitable for omnidirectional (also called 360, spherical, or panoramic) images and videos. We then survey monocular layout and depth inference approaches, highlighting the recent advances in learning-based solutions suited for spherical data. Classical stereo matching is then revisited in the spherical domain, where methodologies for detecting and describing sparse and dense features become crucial. The
stereo matching concepts are then extrapolated for multiple view camera setups,
categorizing them among light fields, multi-view stereo, and structure from
motion (or visual simultaneous localization and mapping). We also compile and
discuss commonly adopted datasets and figures of merit indicated for each
purpose and list recent results for completeness. We conclude this paper by
pointing out current and future trends.
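The spherical camera model that such surveys build on can be summarized in a few lines: an equirectangular pixel maps to longitude and latitude, and from there to a unit ray on the view sphere. A minimal sketch follows; the axis conventions are one common choice among several.

    import numpy as np

    def pixel_to_ray(u, v, width, height):
        # Equirectangular mapping: u spans longitude [-pi, pi), v spans
        # latitude [pi/2, -pi/2] from the top of the panorama to the bottom.
        lon = (u + 0.5) / width * 2.0 * np.pi - np.pi
        lat = np.pi / 2.0 - (v + 0.5) / height * np.pi
        return np.array([np.cos(lat) * np.sin(lon),   # x: right
                         np.sin(lat),                 # y: up
                         np.cos(lat) * np.cos(lon)])  # z: forward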