138 research outputs found
AirLine: Efficient Learnable Line Detection with Local Edge Voting
Line detection is widely used in many robotic tasks such as scene
recognition, 3D reconstruction, and simultaneous localization and mapping
(SLAM). Compared to points, lines can provide both low-level and high-level
geometrical information for downstream tasks. In this paper, we propose a novel
edge-based line detection algorithm, AirLine, which can be applied to various
tasks. In contrast to existing learnable endpoint-based methods which are
sensitive to the geometrical condition of environments, AirLine can extract
line segments directly from edges, resulting in a better generalization ability
for unseen environments. To balance efficiency and accuracy, we also introduce
a region-growing algorithm and a local edge voting scheme for line parameterization.
To the best of our knowledge, AirLine is one of the first learnable edge-based
line detection methods. Our extensive experiments show that it retains
state-of-the-art precision while running 3-80 times faster than other
learning-based methods, which is critical for low-power robots.
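The edge-based pipeline the abstract outlines can be sketched in two steps: grow a region of similarly oriented edge pixels, then let every pixel in the region "vote" equally for the line parameters through a total-least-squares fit. This is a minimal illustrative sketch; the data layout, the orientation tolerance, and the fitting details are assumptions, not AirLine's actual implementation.

```python
import math

def grow_region(edges, seed, angle_tol=math.radians(10)):
    # edges: dict mapping (x, y) -> gradient angle of an edge pixel.
    # Grow an 8-connected region of edge pixels whose orientation stays
    # within angle_tol of the seed pixel's orientation.
    region, frontier = {seed}, [seed]
    base = edges[seed]
    while frontier:
        x, y = frontier.pop()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                p = (x + dx, y + dy)
                if p in edges and p not in region:
                    if abs(edges[p] - base) <= angle_tol:
                        region.add(p)
                        frontier.append(p)
    return region

def fit_line(points):
    # Total-least-squares line fit: each edge pixel votes equally.
    # Returns the centroid and the unit direction of the principal axis.
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (cx, cy), (math.cos(theta), math.sin(theta))
```

Growing regions before fitting keeps the per-line cost proportional to the line's support, which is one plausible source of the runtime advantage the abstract claims.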
Stereo Visual Odometry with Deep Learning-Based Point and Line Feature Matching using an Attention Graph Neural Network
Robust feature matching forms the backbone for most Visual Simultaneous
Localization and Mapping (vSLAM), visual odometry, 3D reconstruction, and
Structure from Motion (SfM) algorithms. However, recovering feature matches
from texture-poor scenes is a major challenge and remains an open area of
research. In this paper, we present a Stereo Visual Odometry (StereoVO)
technique based on point and line features which uses a novel feature-matching
mechanism based on an Attention Graph Neural Network that is designed to
perform well even under adverse weather conditions such as fog, haze, rain, and
snow, and dynamic lighting conditions such as nighttime illumination and glare
scenarios. We perform experiments on multiple real and synthetic datasets to
validate the ability of our method to perform StereoVO under low visibility
weather and lighting conditions through robust point and line matches. The
results demonstrate that our method achieves more line feature matches than
state-of-the-art line matching algorithms, which when complemented with point
feature matches perform consistently well in adverse weather and dynamic
lighting conditions.
EVOLIN Benchmark: Evaluation of Line Detection and Association
Lines are interesting geometric features commonly seen in indoor and urban
environments. A complete benchmark is still missing in which lines from a
sequential stream of images can be evaluated at all stages: line detection,
line association, and pose error. To fill this gap, we present a complete and
exhaustive benchmark for visual lines in a SLAM front-end, for both RGB and
RGB-D, providing a plethora of complementary metrics. We have also labelled
data from well-known SLAM datasets so that poses and accurately annotated
lines are available in one place. In particular, we have evaluated 17 line
detection algorithms, 5 line association methods, and the resulting pose error
for aligning a pair of frames with several detector-association combinations.
We have packaged all methods and evaluation metrics and made them publicly
available at https://prime-slam.github.io/evolin/.
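The benchmark's final stage, pose error between a pair of frames, can be illustrated with a minimal relative-pose error computation: the rotation error as the geodesic angle of the relative rotation and the translation error as a Euclidean distance. The function names and the trace-based angle formula are standard conventions chosen for illustration, not EVOLIN's actual evaluation code.

```python
import math

def rotation_angle_deg(R):
    # Geodesic angle of a 3x3 rotation matrix, from its trace:
    # theta = arccos((trace(R) - 1) / 2), clamped for numerical safety.
    tr = R[0][0] + R[1][1] + R[2][2]
    c = max(-1.0, min(1.0, (tr - 1.0) / 2.0))
    return math.degrees(math.acos(c))

def pose_error(R_est, t_est, R_gt, t_gt):
    # Rotation error: angle of R_gt^T @ R_est.
    # Translation error: Euclidean distance between translation vectors.
    R_rel = [[sum(R_gt[k][i] * R_est[k][j] for k in range(3))
              for j in range(3)] for i in range(3)]
    t_err = math.sqrt(sum((a - b) ** 2 for a, b in zip(t_est, t_gt)))
    return rotation_angle_deg(R_rel), t_err
```

A detector-association combination that yields many correct line matches should drive both error terms toward zero when the frames are aligned.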
Visual Odometry Using Line Features and Machine Learning Enhanced Line Description
Research on 2D lines in images has grown strongly in the last decade, on the one hand due to the greater computing power available, and on the other due to increased interest in odometry methods and autonomous systems. Line features have some advantages over the more thoroughly researched point features. Lines are detected on gradients, so they do not need texture to be found. Thus, as long as there are gradients between homogeneous regions, they can cope with difficult situations in which mostly homogeneous areas are present. By being detected on gradients, they are also well suited to represent structure. Furthermore, lines have very high accuracy orthogonal to their direction, as they consist of numerous points which all lie on the gradient and contribute to this locational accuracy. First, we introduce a visual odometry approach which achieves real-time performance and runs solely on line features; it does not require point features. We developed a heuristic filter algorithm which takes neighbouring line features into account and thereby improves the tracking and matching of lines in images taken from arbitrary camera locations. This increases the number of tracked lines and is especially beneficial in difficult scenes where it is hard to match lines by tracking them. Additionally, we employed the Cayley representation for 3D lines to avoid overparameterization in the optimization. To show the advancement of the method, it is benchmarked on commonly used datasets and compared to other state-of-the-art approaches. Second, we developed a machine-learning-based line feature descriptor for line matching. This descriptor can be used to match lines from arbitrary camera locations. The training data was created synthetically using Unreal Engine 4. We trained a model based on the ResNet architecture using a triplet loss. We evaluated the descriptor on real-world scenes and show its improvement over the well-known Line Band Descriptor.
Third, we build upon our previous descriptor to create an improved version. To that end, we added an image pyramid and Gabor wavelets, and increased the descriptor size. The evaluation of the new descriptor additionally includes competing new approaches which are also machine-learning based, and shows that our improved approach outperforms them. Finally, we provide an extended evaluation of our descriptor which shows the influence of different settings and processing steps, and we present an analysis of settings for practical usage scenarios. The influence of a maximum descriptor distance threshold, of a left-right consistency check, and of a descriptor distance ratio threshold between the first and second best match were investigated. It turns out that, for the ratio of true to false matches, it is almost always better to use a descriptor distance ratio threshold than a maximum descriptor distance threshold.
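The distance ratio threshold favoured by that evaluation can be sketched as a Lowe-style nearest-neighbour test: a match is accepted only when the best candidate is clearly closer than the runner-up. The brute-force search and the default ratio of 0.8 are illustrative assumptions, not the thesis's actual matching code.

```python
def match_descriptors(desc_a, desc_b, ratio=0.8):
    # For each descriptor in desc_a, find its two nearest neighbours in
    # desc_b and accept the best match only if its distance is below
    # ratio times the second-best distance (distance ratio test).
    def dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5
    matches = []
    for i, d in enumerate(desc_a):
        ranked = sorted(range(len(desc_b)), key=lambda j: dist(d, desc_b[j]))
        best, second = ranked[0], ranked[1]
        if dist(d, desc_b[best]) < ratio * dist(d, desc_b[second]):
            matches.append((i, best))
    return matches
```

Unlike an absolute distance threshold, the ratio test adapts to the local density of descriptors, which is consistent with the abstract's finding that it gives a better true-to-false match ratio.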
DeepLSD: Line Segment Detection and Refinement with Deep Image Gradients
Line segments are ubiquitous in our human-made world and are increasingly
used in vision tasks. They are complementary to feature points thanks to their
spatial extent and the structural information they provide. Traditional line
detectors based on the image gradient are extremely fast and accurate, but lack
robustness in noisy images and challenging conditions. Their learned
counterparts are more repeatable and can handle challenging images, but at the
cost of a lower accuracy and a bias towards wireframe lines. We propose to
combine traditional and learned approaches to get the best of both worlds: an
accurate and robust line detector that can be trained in the wild without
ground truth lines. Our new line segment detector, DeepLSD, processes images
with a deep network to generate a line attraction field, before converting it
to a surrogate image gradient magnitude and angle, which is then fed to any
existing handcrafted line detector. Additionally, we propose a new optimization
tool to refine line segments based on the attraction field and vanishing
points. This refinement improves the accuracy of current deep detectors by a
large margin. We demonstrate the performance of our method on low-level line
detection metrics, as well as on several downstream tasks using multiple
challenging datasets. The source code and models are available at
https://github.com/cvg/DeepLSD. (Comment: Accepted at CVPR 2023)
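The bridge between the learned and handcrafted stages described above can be sketched as follows: given an attraction field (per-pixel distance to the nearest line and that line's angle), build a surrogate gradient whose magnitude decays with distance to the line and whose angle is perpendicular to the line direction. The linear falloff and the d_max parameter are illustrative assumptions; DeepLSD's exact parameterization may differ.

```python
import math

def surrogate_gradient(dist_field, angle_field, d_max=5.0):
    # dist_field[y][x]: distance from pixel (x, y) to the nearest line.
    # angle_field[y][x]: angle of that line, in radians.
    # Returns a surrogate gradient magnitude (high on lines, fading to 0
    # at d_max) and a gradient angle perpendicular to the line direction,
    # suitable as input to a handcrafted detector expecting image gradients.
    h, w = len(dist_field), len(dist_field[0])
    mag = [[max(0.0, 1.0 - dist_field[y][x] / d_max) for x in range(w)]
           for y in range(h)]
    ang = [[(angle_field[y][x] + math.pi / 2) % math.pi for x in range(w)]
           for y in range(h)]
    return mag, ang
```

Because the handcrafted detector only ever sees this synthetic gradient map, it inherits the network's robustness to noise while keeping its own sub-pixel accuracy.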
Camera Re-Localization with Data Augmentation by Image Rendering and Image-to-Image Translation
The self-localization of automobiles, robots, or unmanned aerial vehicles, as well as the self-localization of pedestrians, is and will remain of high interest for a variety of applications.
One main task is the autonomous navigation of such vehicles, in which localization within the surrounding scene is a key component.
Since cameras are established, permanently installed sensors in automobiles, robots, and unmanned aerial vehicles, the additional effort of using them for localization tasks is low to non-existent.
The same applies to the self-localization of pedestrians, where smartphones serve as mobile camera platforms.
Camera re-localization, in which the pose of a camera is determined with respect to a fixed environment, is a valuable process for providing or supporting a localization solution for vehicles or pedestrians.
Cameras are, moreover, inexpensive sensors that are well established in the everyday life of humans and machines.
The support provided by camera re-localization is not limited to navigation applications, but can generally be used to support image analysis and image processing tasks such as scene reconstruction, detection, classification, and similar applications.
To these ends, this thesis addresses the improvement of the camera re-localization process.
Since convolutional neural networks (CNNs) and hybrid solutions for estimating camera poses have come to compete with established, manually designed methods in recent years, the focus of this thesis is on the former.
The main contributions of this work include the design of a CNN for camera pose estimation, with an emphasis on a shallow architecture that meets the requirements of mobile platforms.
This network achieves accuracies on a par with deeper CNNs of considerably larger model size.
Furthermore, the performance of CNNs depends heavily on the quantity and quality of the training data used for optimization.
The further contributions of this thesis therefore concern the rendering of images and image-to-image translation for extending such training data; the general extension of training data is called data augmentation (DA).
3D models are used to render images that usefully extend the training data.
Generative adversarial networks (GANs) serve for image-to-image translation. While image rendering increases the quantity of an image dataset, image-to-image translation improves the quality of the rendered data.
Experiments are conducted both with datasets augmented by rendered images and with translated images.
Both DA approaches contribute to improving localization accuracy.
Thus, in this work, camera re-localization with state-of-the-art methods is improved through DA.
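Training a CNN to regress camera poses, as described above, requires a loss that combines position and orientation errors. A common formulation (in the style of PoseNet) sums the position error with a weighted quaternion orientation error; the weighting factor beta and the quaternion parameterization here are illustrative assumptions, not values taken from the thesis.

```python
import math

def pose_loss(p_est, q_est, p_gt, q_gt, beta=250.0):
    # p_est, p_gt: 3D camera positions; q_est, q_gt: unit quaternions.
    # Loss = ||p_est - p_gt|| + beta * ||q_est/|q_est| - q_gt||.
    # beta trades off position against orientation accuracy and is
    # typically tuned per scene.
    pos = math.sqrt(sum((a - b) ** 2 for a, b in zip(p_est, p_gt)))
    n = math.sqrt(sum(x * x for x in q_est)) or 1.0
    q_unit = [x / n for x in q_est]
    ori = math.sqrt(sum((a - b) ** 2 for a, b in zip(q_unit, q_gt)))
    return pos + beta * ori
```

Normalizing the predicted quaternion before comparison keeps the orientation term meaningful even when the network's raw output drifts off the unit sphere.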