
    Depth- and Semantics-aware Multi-modal Domain Translation: Generating 3D Panoramic Color Images from LiDAR Point Clouds

    This work presents a new depth- and semantics-aware conditional generative model, named TITAN-Next, for cross-domain image-to-image translation in a multi-modal setup between LiDAR and camera sensors. The proposed model leverages scene semantics as a mid-level representation and is able to translate raw LiDAR point clouds to RGB-D camera images by solely relying on semantic scene segments. We claim that this is the first framework of its kind and it has practical applications in autonomous vehicles, such as providing a fail-safe mechanism and augmenting available data in the target image domain. The proposed model is evaluated on the large-scale and challenging Semantic-KITTI dataset, and experimental findings show that it considerably outperforms the original TITAN-Net and other strong baselines by a 23.7% margin in terms of IoU.
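    As a rough illustration of the two-stage idea described in the abstract (project the LiDAR point cloud to a range image, segment it, and feed the semantic map to a conditional generator that produces RGB-D output), a minimal PyTorch-style sketch follows. All module names, channel sizes, and the input layout are assumptions for illustration and are not taken from the TITAN-Next implementation.

# Minimal sketch of a two-stage LiDAR -> semantics -> RGB-D pipeline, loosely
# following the idea described in the abstract. Module names, channel sizes,
# and the range-image layout are illustrative assumptions.
import torch
import torch.nn as nn

NUM_CLASSES = 20  # Semantic-KITTI-like label set (assumption)

class SegmentationNet(nn.Module):
    """Predicts per-pixel semantic logits from a projected LiDAR range image."""
    def __init__(self, in_ch=5, num_classes=NUM_CLASSES):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, num_classes, 1),
        )
    def forward(self, range_img):
        return self.net(range_img)  # (B, num_classes, H, W)

class Seg2RGBDGenerator(nn.Module):
    """Conditional generator that maps semantic maps to RGB-D images."""
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_classes, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 4, 1),  # 3 color channels + 1 depth channel
        )
    def forward(self, seg_logits):
        seg_probs = torch.softmax(seg_logits, dim=1)
        return self.net(seg_probs)  # (B, 4, H, W)

if __name__ == "__main__":
    # Range image with e.g. depth, intensity, x, y, z channels (assumption).
    lidar_range_img = torch.randn(1, 5, 64, 1024)
    rgbd = Seg2RGBDGenerator()(SegmentationNet()(lidar_range_img))
    print(rgbd.shape)  # torch.Size([1, 4, 64, 1024])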

    Deep learning methods applied to digital elevation models: state of the art

    Deep Learning (DL) has a wide variety of applications in various thematic domains, including spatial information. Although still subject to limitations, it is also starting to be considered in operations related to Digital Elevation Models (DEMs). This study aims to review the methods of DL applied in the field of altimetric spatial information in general, and DEMs in particular. Void Filling (VF), Super-Resolution (SR), landform classification and hydrography extraction are just some of the operations where traditional methods are being replaced by DL methods. Our review concludes that although these methods have great potential, there are aspects that need to be improved. More appropriate terrain information and algorithm parameterisation are some of the challenges that this methodology still needs to face. Funding: 'Functional Quality of Digital Elevation Models in Engineering' of the State Research Agency of Spain, PID2019-106195RB-I00/AEI/10.13039/50110001103

    Survey on Controllable Image Synthesis with Deep Learning

    Image synthesis has attracted growing research interest in academic and industrial communities. Deep learning technologies, especially generative models, have greatly inspired controllable image synthesis approaches and applications, which aim to generate particular visual contents with latent prompts. In order to further investigate the low-level controllable image synthesis problem, which is crucial for fine image rendering and editing tasks, we present a survey of some recent works on 3D controllable image synthesis using deep learning. We first introduce the datasets and evaluation indicators for 3D controllable image synthesis. Then, we review the state-of-the-art research for geometrically controllable image synthesis in two aspects: 1) viewpoint/pose-controllable image synthesis; 2) structure/shape-controllable image synthesis. Furthermore, photometrically controllable image synthesis approaches are also reviewed for 3D re-lighting research. While the emphasis is on 3D controllable image synthesis algorithms, the related applications, products and resources are also briefly summarized for practitioners. Comment: 19 pages, 17 figures

    A Generalized Multi-Task Learning Approach to Stereo DSM Filtering in Urban Areas

    City models and height maps of urban areas serve as a valuable data source for numerous applications, such as disaster management or city planning. While this information is not globally available, it can be substituted by digital surface models (DSMs), automatically produced from inexpensive satellite imagery. However, stereo DSMs often suffer from noise and blur. Furthermore, they are heavily distorted by vegetation, which is of lesser relevance for most applications. Such basic models can be filtered by convolutional neural networks (CNNs), trained on labels derived from digital elevation models (DEMs) and 3D city models, in order to obtain a refined DSM. We propose a modular multi-task learning concept that consolidates existing approaches into a generalized framework. Our encoder-decoder models with shared encoders and multiple task-specific decoders leverage roof type classification as a secondary task and combine multiple objectives, including a conditional adversarial term. The contributing single-objective losses are automatically weighted in the final multi-task loss function based on learned uncertainty estimates. We evaluated the performance of specific instances of this family of network architectures. Our method consistently outperforms the state of the art on common data, both quantitatively and qualitatively, and generalizes well to a new dataset of an independent study area. Comment: This paper was accepted for publication in the ISPRS Journal of Photogrammetry and Remote Sensing.
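    The automatic weighting of the single-objective losses described above can be realized with learned homoscedastic-uncertainty weights in the style of Kendall et al.: each task contributes its loss scaled by a learned precision, and the log-variance term discourages the trivial solution of down-weighting every task to zero. A minimal PyTorch-style sketch is shown below; the task names and the exact weighting formula are assumptions for illustration, not the paper's released code.

# Minimal sketch of uncertainty-based multi-task loss weighting
# (one learnable log-variance per task, optimized jointly with the network).
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    def __init__(self, num_tasks=2):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        total = 0.0
        for i, loss in enumerate(task_losses):
            precision = torch.exp(-self.log_vars[i])
            # Down-weight uncertain tasks and regularize the log-variance.
            total = total + precision * loss + self.log_vars[i]
        return total

if __name__ == "__main__":
    criterion = UncertaintyWeightedLoss(num_tasks=2)
    dsm_regression_loss = torch.tensor(0.8)  # e.g. L1 loss on filtered DSM heights
    roof_type_cls_loss = torch.tensor(1.5)   # e.g. cross-entropy on roof classes
    total_loss = criterion([dsm_regression_loss, roof_type_cls_loss])
    print(total_loss.item())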

    AI-generated Content for Various Data Modalities: A Survey

    AI-generated content (AIGC) methods aim to produce text, images, videos, 3D assets, and other media using AI algorithms. Due to its wide range of applications and the demonstrated potential of recent works, AIGC has recently been attracting considerable attention, and AIGC methods have been developed for various data modalities, such as image, video, text, 3D shape (as voxels, point clouds, meshes, and neural implicit fields), 3D scene, 3D human avatar (body and head), 3D motion, and audio -- each presenting different characteristics and challenges. Furthermore, there have also been many significant developments in cross-modality AIGC methods, where generative methods can receive conditioning input in one modality and produce outputs in another. Examples include going from various modalities to image, video, 3D shape, 3D scene, 3D avatar (body and head), 3D motion (skeleton and avatar), and audio modalities. In this paper, we provide a comprehensive review of AIGC methods across different data modalities, including both single-modality and cross-modality methods, highlighting the various challenges, representative works, and recent technical directions in each setting. We also survey the representative datasets throughout the modalities, and present comparative results for various modalities. Moreover, we also discuss the challenges and potential future research directions.

    Scalability of Learning Tasks on 3D CAE Models Using Point Cloud Autoencoders

    Geometric Deep Learning (GDL) methods have recently gained interest as powerful, high-dimensional models for approaching various geometry processing tasks. However, training deep neural network models on geometric input requires considerable computational effort, even more so if one considers typical problem sizes found in application domains such as engineering tasks, where geometric data are often orders of magnitude larger than the inputs currently considered in the GDL literature. Hence, an assessment of the scalability of the training task is necessary, where model and data set parameters can be mapped to the computational demand during training. The present paper therefore studies the effects of data set size and the number of free model parameters on the computational effort of training a Point Cloud Autoencoder (PC-AE). We further review pre-processing techniques to obtain efficient representations of high-dimensional inputs to the PC-AE and investigate the effects of these techniques on the information abstracted by the trained model. We perform these experiments on synthetic geometric data inspired by engineering applications, using computing hardware featuring recent graphics processing units (GPUs) with high memory capacity. The present study thus provides a comprehensive evaluation of how to scale geometric deep learning architectures to high-dimensional inputs, allowing state-of-the-art deep learning methods to be applied in real-world tasks. Algorithms and the Foundations of Software Technology
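    To make the discussion of training cost concrete, a minimal point cloud autoencoder with a PointNet-style encoder, a fully connected decoder, and a naive O(N^2) Chamfer distance is sketched below. The layer sizes, point count, and loss are assumptions for illustration, not the configuration studied in the paper.

# Minimal point cloud autoencoder (PC-AE) sketch: shared point-wise encoder,
# global max pooling, fully connected decoder, and a brute-force Chamfer loss.
import torch
import torch.nn as nn

class PointCloudAE(nn.Module):
    def __init__(self, num_points=2048, latent_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, latent_dim, 1),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, num_points * 3),
        )
        self.num_points = num_points

    def forward(self, pts):                        # pts: (B, N, 3)
        feats = self.encoder(pts.transpose(1, 2))  # (B, latent, N)
        latent = feats.max(dim=2).values           # global max pooling
        out = self.decoder(latent)
        return out.view(-1, self.num_points, 3)

def chamfer_distance(a, b):
    # Symmetric nearest-neighbour distance; quadratic in N, fine for a sketch.
    d = torch.cdist(a, b)                          # (B, N, M)
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

if __name__ == "__main__":
    model = PointCloudAE()
    cloud = torch.rand(2, 2048, 3)
    recon = model(cloud)
    print(chamfer_distance(recon, cloud).item())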

    Intelligent Sensing and Learning for Advanced MIMO Communication Systems


    LiDAR Domain Adaptation - Automotive 3D Scene Understanding

    Environment perception and scene understanding play an essential role in autonomous vehicles. A vehicle must be aware of the geometry and semantics of its surroundings in order to predict the behavior of other road users and to localize itself within the drivable space, and thus to navigate correctly. Today, virtually all modern perception systems for automated driving use deep neural networks. Training them requires enormous amounts of data with matching annotations. Acquiring the data is relatively straightforward, since a vehicle equipped with the right sensors merely has to drive around. Creating annotations, however, is a very time-consuming and expensive process. To make matters more difficult, autonomous vehicles have to operate practically everywhere (e.g. in Europe and Asia, in rural and urban areas) and at any time (e.g. day and night, summer and winter, rain and fog). This requires the data to cover an even larger number of different scenarios and domains. Collecting and annotating data for such a multitude of domains is not practical. However, training only on data from a single domain leads to poor performance in a different target domain because of differences in the data. For a safety-critical application, this is not acceptable. The field of domain adaptation introduces methods that help to close these domain gaps without using annotations from the target domain, thereby working towards scalable perception systems. The majority of work on domain adaptation focuses on two-dimensional camera perception. In autonomous vehicles, however, a three-dimensional understanding of the scene is essential, and LiDAR sensors are commonly used for this purpose today. This dissertation addresses domain adaptation for LiDAR perception from several angles. First, a set of techniques is presented that improves the performance and runtime of semantic segmentation systems. The insights gained are integrated into the perception model used in this dissertation to evaluate the effectiveness of the proposed domain adaptation approaches. Second, existing approaches are discussed and research gaps are identified by formulating open research questions. To answer some of these questions, this dissertation introduces a novel quantitative metric that allows the realism of LiDAR data to be estimated, which is crucial for the performance of a perception system. The metric is used to assess the quality of LiDAR point clouds generated for domain mapping, where data are transferred from one domain to another, enabling the reuse of annotations from a source domain in the target domain. In a further area of domain adaptation, this dissertation proposes a novel method that exploits the geometry of the scene to learn domain-invariant features. The geometric information helps to improve the domain adaptation capabilities of the segmentation model and to achieve the best performance without additional overhead at inference time. Finally, a novel method is proposed for generating semantically meaningful object shapes from continuous descriptions, which, with additional work, can be used to augment scenes and thereby improve the recognition capabilities of the models. In summary, this dissertation presents a comprehensive system for domain adaptation and semantic segmentation of LiDAR point clouds in the context of autonomous driving.
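    As an illustration of the geometry-aided idea sketched in the abstract (a shared backbone trained with an auxiliary geometric objective that is dropped at inference, so no extra runtime cost remains), a minimal PyTorch-style sketch is given below. The network layout, the use of surface normals as the geometric signal, and all layer sizes are assumptions for illustration, not the dissertation's actual model.

# Minimal sketch: shared backbone with a segmentation head and an auxiliary
# geometric head (surface normals) used only during training.
import torch
import torch.nn as nn

class GeometryAidedSegNet(nn.Module):
    def __init__(self, in_ch=5, num_classes=20):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv2d(64, num_classes, 1)
        self.normal_head = nn.Conv2d(64, 3, 1)  # auxiliary geometric output

    def forward(self, x, with_geometry=False):
        feats = self.backbone(x)
        if with_geometry:                        # training: both objectives
            return self.seg_head(feats), self.normal_head(feats)
        return self.seg_head(feats)              # inference: no extra cost

if __name__ == "__main__":
    net = GeometryAidedSegNet()
    range_img = torch.randn(1, 5, 64, 1024)
    logits, normals = net(range_img, with_geometry=True)
    print(logits.shape, normals.shape)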