32 research outputs found
Heightfields for Efficient Scene Reconstruction for AR
3D scene reconstruction from a sequence of posed RGB images is a cornerstone task for computer vision and augmented reality (AR). While depth-based fusion is the foundation of most real-time approaches for 3D reconstruction, recent learning based methods that operate directly on RGB images can achieve higher quality reconstructions, but at the cost of increased runtime and memory requirements, making them unsuitable for AR applications. We propose an efficient learning-based method that refines the 3D reconstruction obtained by a traditional fusion approach. By leveraging a top-down heightfield representation, our method remains real-time while approaching the quality of other learning-based methods. Despite being a simplification, our heightfield is perfectly appropriate for robotic path planning or augmented reality character placement. We outline several innovations that push the performance beyond existing top-down prediction baselines, and we present an evaluation framework on the challenging ScanNetV2 dataset, targeting AR tasks
Advances in Data-Driven Analysis and Synthesis of 3D Indoor Scenes
This report surveys advances in deep learning-based modeling techniques that
address four different 3D indoor scene analysis tasks, as well as synthesis of
3D indoor scenes. We describe different kinds of representations for indoor
scenes, various indoor scene datasets available for research in the
aforementioned areas, and discuss notable works employing machine learning
models for such scene modeling tasks based on these representations.
Specifically, we focus on the analysis and synthesis of 3D indoor scenes. With
respect to analysis, we focus on four basic scene understanding tasks -- 3D
object detection, 3D scene segmentation, 3D scene reconstruction and 3D scene
similarity. And for synthesis, we mainly discuss neural scene synthesis works,
though also highlighting model-driven methods that allow for human-centric,
progressive scene synthesis. We identify the challenges involved in modeling
scenes for these tasks and the kind of machinery that needs to be developed to
adapt to the data representation, and the task setting in general. For each of
these tasks, we provide a comprehensive summary of the state-of-the-art works
across different axes such as the choice of data representation, backbone,
evaluation metric, input, output, etc., providing an organized review of the
literature. Towards the end, we discuss some interesting research directions
that have the potential to make a direct impact on the way users interact and
engage with these virtual scene models, making them an integral part of the
metaverse.Comment: Published in Computer Graphics Forum, Aug 202
Exploiting Novel Deep Learning Architecture in Character Animation Pipelines
This doctoral dissertation aims to show a body of work proposed for improving different blocks in the character animation pipelines resulting in less manual work and more realistic character animation. To that purpose, we describe a variety of cutting-edge deep learning approaches that have been applied to the field of human motion modelling and character animation.
The recent advances in motion capture systems and processing hardware have shifted from physics-based approaches to data-driven approaches that are heavily used in the current game production frameworks. However, despite these
significant successes, there are still shortcomings to address. For example, the existing production pipelines contain processing steps such as marker
labelling in the motion capture pipeline or annotating motion primitives, which should be done manually. In addition, most of the current approaches for character animation used in game production are limited by the amount of stored animation data resulting in many duplicates and repeated patterns.
We present our work in four main chapters. We first present a large dataset of human motion called MoVi. Secondly, we show how machine learning approaches can be used to automate proprocessing data blocks of optical motion capture pipelines. Thirdly, we show how generative models can be used to generate batches of synthetic motion sequences given only weak control signals. Finally, we show how novel generative models can be applied to real-time character control in the game production
Exploiting Novel Deep Learning Architecture in Character Animation Pipelines
This doctoral dissertation aims to show a body of work proposed for improving different blocks in the character animation pipelines resulting in less manual work and more realistic character animation. To that purpose, we describe a variety of cutting-edge deep learning approaches that have been applied to the field of human motion modelling and character animation.
The recent advances in motion capture systems and processing hardware have shifted from physics-based approaches to data-driven approaches that are heavily used in the current game production frameworks. However, despite these
significant successes, there are still shortcomings to address. For example, the existing production pipelines contain processing steps such as marker
labelling in the motion capture pipeline or annotating motion primitives, which should be done manually. In addition, most of the current approaches for character animation used in game production are limited by the amount of stored animation data resulting in many duplicates and repeated patterns.
We present our work in four main chapters. We first present a large dataset of human motion called MoVi. Secondly, we show how machine learning approaches can be used to automate proprocessing data blocks of optical motion capture pipelines. Thirdly, we show how generative models can be used to generate batches of synthetic motion sequences given only weak control signals. Finally, we show how novel generative models can be applied to real-time character control in the game production
Learning-based depth and pose prediction for 3D scene reconstruction in endoscopy
Colorectal cancer is the third most common cancer worldwide. Early detection and treatment of pre-cancerous tissue during colonoscopy is critical to improving prognosis. However, navigating within the colon and inspecting the endoluminal tissue comprehensively are challenging, and success in both varies based on the endoscopist's skill and experience. Computer-assisted interventions in colonoscopy show much promise in improving navigation and inspection. For instance, 3D reconstruction of the colon during colonoscopy could promote more thorough examinations and increase adenoma detection rates which are associated with improved survival rates. Given the stakes, this thesis seeks to advance the state of research from feature-based traditional methods closer to a data-driven 3D reconstruction pipeline for colonoscopy.
More specifically, this thesis explores different methods that improve subtasks of learning-based 3D reconstruction. The main tasks are depth prediction and camera pose estimation. As training data is unavailable, the author, together with her co-authors, proposes and publishes several synthetic datasets and promotes domain adaptation models to improve applicability to real data. We show, through extensive experiments, that our depth prediction methods produce more robust results than previous work. Our pose estimation network trained on our new synthetic data outperforms self-supervised methods on real sequences. Our box embeddings allow us to interpret the geometric relationship and scale difference between two images of the same surface without the need for feature matches that are often unobtainable in surgical scenes. Together, the methods introduced in this thesis help work towards a complete, data-driven 3D reconstruction pipeline for endoscopy
Indoor Mapping and Reconstruction with Mobile Augmented Reality Sensor Systems
Augmented Reality (AR) ermöglicht es, virtuelle, dreidimensionale Inhalte direkt
innerhalb der realen Umgebung darzustellen. Anstatt jedoch beliebige virtuelle
Objekte an einem willkĂŒrlichen Ort anzuzeigen, kann AR Technologie auch genutzt
werden, um Geodaten in situ an jenem Ort darzustellen, auf den sich die Daten
beziehen. Damit eröffnet AR die Möglichkeit, die reale Welt durch virtuelle, ortbezogene
Informationen anzureichern. Im Rahmen der vorliegenen Arbeit wird diese
Spielart von AR als "Fused Reality" definiert und eingehend diskutiert.
Der praktische Mehrwert, den dieses Konzept der Fused Reality bietet, lÀsst sich
gut am Beispiel seiner Anwendung im Zusammenhang mit digitalen GebÀudemodellen
demonstrieren, wo sich gebÀudespezifische Informationen - beispielsweise der
Verlauf von Leitungen und Kabeln innerhalb der WĂ€nde - lagegerecht am realen
Objekt darstellen lassen. Um das skizzierte Konzept einer Indoor Fused Reality
Anwendung realisieren zu können, mĂŒssen einige grundlegende Bedingungen erfĂŒllt
sein. So kann ein bestimmtes GebÀude nur dann mit ortsbezogenen Informationen
augmentiert werden, wenn von diesem GebĂ€ude ein digitales Modell verfĂŒgbar ist.
Zwar werden gröĂere Bauprojekt heutzutage oft unter Zuhilfename von Building
Information Modelling (BIM) geplant und durchgefĂŒhrt, sodass ein digitales Modell
direkt zusammen mit dem realen GebÀude ensteht, jedoch sind im Falle Àlterer
BestandsgebĂ€ude digitale Modelle meist nicht verfĂŒgbar. Ein digitales Modell eines
bestehenden GebĂ€udes manuell zu erstellen, ist zwar möglich, jedoch mit groĂem
Aufwand verbunden. Ist ein passendes GebÀudemodell vorhanden, muss ein AR
GerĂ€t auĂerdem in der Lage sein, die eigene Position und Orientierung im GebĂ€ude
relativ zu diesem Modell bestimmen zu können, um Augmentierungen lagegerecht
anzeigen zu können.
Im Rahmen dieser Arbeit werden diverse Aspekte der angesprochenen Problematik
untersucht und diskutiert. Dabei werden zunÀchst verschiedene Möglichkeiten
diskutiert, Indoor-GebĂ€udegeometrie mittels Sensorsystemen zu erfassen. AnschlieĂend
wird eine Untersuchung prÀsentiert, inwiefern moderne AR GerÀte, die
in der Regel ebenfalls ĂŒber eine Vielzahl an Sensoren verfĂŒgen, ebenfalls geeignet
sind, als Indoor-Mapping-Systeme eingesetzt zu werden. Die resultierenden Indoor
Mapping DatensÀtze können daraufhin genutzt werden, um automatisiert
GebÀudemodelle zu rekonstruieren. Zu diesem Zweck wird ein automatisiertes,
voxel-basiertes Indoor-Rekonstruktionsverfahren vorgestellt. Dieses wird auĂerdem
auf der Grundlage vierer zu diesem Zweck erfasster DatensÀtze mit zugehörigen
Referenzdaten quantitativ evaluiert. Desweiteren werden verschiedene
Möglichkeiten diskutiert, mobile AR GerÀte innerhalb eines GebÀudes und des zugehörigen
GebĂ€udemodells zu lokalisieren. In diesem Kontext wird auĂerdem auch
die Evaluierung einer Marker-basierten Indoor-Lokalisierungsmethode prÀsentiert.
AbschlieĂend wird zudem ein neuer Ansatz, Indoor-Mapping DatensĂ€tze an den
Achsen des Koordinatensystems auszurichten, vorgestellt
The Impact of Digital Technologies on Public Health in Developed and Developing Countries
This open access book constitutes the refereed proceedings of the 18th International Conference on String Processing and Information Retrieval, ICOST 2020, held in Hammamet, Tunisia, in June 2020.* The 17 full papers and 23 short papers presented in this volume were carefully reviewed and selected from 49 submissions. They cover topics such as: IoT and AI solutions for e-health; biomedical and health informatics; behavior and activity monitoring; behavior and activity monitoring; and wellbeing technology. *This conference was held virtually due to the COVID-19 pandemic
Kimera: from SLAM to Spatial Perception with 3D Dynamic Scene Graphs
Humans are able to form a complex mental model of the environment they move
in. This mental model captures geometric and semantic aspects of the scene,
describes the environment at multiple levels of abstractions (e.g., objects,
rooms, buildings), includes static and dynamic entities and their relations
(e.g., a person is in a room at a given time). In contrast, current robots'
internal representations still provide a partial and fragmented understanding
of the environment, either in the form of a sparse or dense set of geometric
primitives (e.g., points, lines, planes, voxels) or as a collection of objects.
This paper attempts to reduce the gap between robot and human perception by
introducing a novel representation, a 3D Dynamic Scene Graph(DSG), that
seamlessly captures metric and semantic aspects of a dynamic environment. A DSG
is a layered graph where nodes represent spatial concepts at different levels
of abstraction, and edges represent spatio-temporal relations among nodes. Our
second contribution is Kimera, the first fully automatic method to build a DSG
from visual-inertial data. Kimera includes state-of-the-art techniques for
visual-inertial SLAM, metric-semantic 3D reconstruction, object localization,
human pose and shape estimation, and scene parsing. Our third contribution is a
comprehensive evaluation of Kimera in real-life datasets and photo-realistic
simulations, including a newly released dataset, uHumans2, which simulates a
collection of crowded indoor and outdoor scenes. Our evaluation shows that
Kimera achieves state-of-the-art performance in visual-inertial SLAM, estimates
an accurate 3D metric-semantic mesh model in real-time, and builds a DSG of a
complex indoor environment with tens of objects and humans in minutes. Our
final contribution shows how to use a DSG for real-time hierarchical semantic
path-planning. The core modules in Kimera are open-source.Comment: 34 pages, 25 figures, 9 tables. arXiv admin note: text overlap with
arXiv:2002.0628