Surface Modeling and Analysis Using Range Images: Smoothing, Registration, Integration, and Segmentation
This dissertation presents a framework for 3D reconstruction and scene analysis using a set of range images. The motivation for developing this framework came from the need to reconstruct the surfaces of small mechanical parts in reverse engineering tasks, to build virtual environments of indoor and outdoor scenes, and to understand 3D images.
The input of the framework is a set of range images of an object or a scene captured by range scanners. The output is a triangulated surface that can be segmented into meaningful parts. A textured surface can be reconstructed if color images are provided. The framework consists of surface smoothing, registration, integration, and segmentation.
Surface smoothing eliminates the noise present in raw measurements from range scanners. This research proposes an area-decreasing flow that is theoretically identical to mean curvature flow. With area-decreasing flow, there is no need to estimate curvature values, and an optimal step size of the flow can be obtained. Crease edges and sharp corners are preserved by an adaptive scheme.
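As a rough illustration of curvature-driven smoothing (a generic sketch, not the dissertation's adaptive area-decreasing scheme, which also preserves creases and derives an optimal step size), the snippet below applies a discrete analogue of mean curvature flow to a closed 2D polyline: each vertex moves toward the midpoint of its neighbors, decreasing enclosed area and removing measurement noise.

```python
import numpy as np

def smooth_curve(points, steps=50, dt=0.5):
    """Discrete curvature-flow smoothing of a closed 2D polyline.

    Each vertex moves toward the midpoint of its two neighbors, a
    discrete analogue of mean-curvature (area-decreasing) flow.
    """
    pts = np.asarray(points, dtype=float).copy()
    for _ in range(steps):
        midpoints = 0.5 * (np.roll(pts, 1, axis=0) + np.roll(pts, -1, axis=0))
        pts += dt * (midpoints - pts)
    return pts

# A noisy circle: the flow should suppress the radial noise.
rng = np.random.default_rng(0)
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
noisy = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.standard_normal((200, 2))
smoothed = smooth_curve(noisy, steps=20)
radii_before = np.linalg.norm(noisy, axis=1)
radii_after = np.linalg.norm(smoothed, axis=1)
```

Note the trade-off this toy version exposes: plain curvature flow also shrinks the shape slightly, which is one reason the dissertation's adaptive scheme matters in practice.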
Surface registration aligns measurements from different viewpoints in a common coordinate system. This research proposes a new surface representation scheme named point fingerprint. Surfaces are registered by finding corresponding point pairs in an overlapping region based on fingerprint comparison.
Surface integration merges registered surface patches into a whole surface. This research employs an implicit-surface-based integration technique. The proposed algorithm can generate watertight models by space carving or by filling holes based on volumetric interpolation. Textures from different views are integrated inside a volumetric grid.

Surface segmentation is useful for decomposing CAD models in reverse engineering tasks and for aiding object recognition in a 3D scene. This research proposes a watershed-based surface mesh segmentation approach. The new algorithm accurately segments plateaus by geodesic erosion using the fast marching method.
The performance of the framework is demonstrated using both synthetic and real-world data from different range scanners. The dissertation concludes by summarizing the development of the framework and suggesting future research topics.
Multimodal Three Dimensional Scene Reconstruction, The Gaussian Fields Framework
The focus of this research is on building 3D representations of real-world scenes and objects using different imaging sensors: primarily range acquisition devices (such as laser scanners and stereo systems) that allow the recovery of 3D geometry, and multi-spectral image sequences, including visual and thermal IR images, that provide additional scene characteristics. The crucial technical challenge we addressed is the automatic point-set registration task. In this context, our main contribution is the development of an optimization-based method at the core of which lies a unified criterion that solves simultaneously for the dense point correspondence and transformation recovery problems. The new criterion has a straightforward expression in terms of the datasets and the alignment parameters and was used primarily for 3D rigid registration of point-sets; however, it also proved useful for feature-based multimodal image alignment. We derived our method from simple Boolean matching principles by approximation and relaxation. One of the main advantages of the proposed approach, compared to the widely used class of Iterative Closest Point (ICP) algorithms, is convexity in the neighborhood of the registration parameters and continuous differentiability, allowing the use of standard gradient-based optimization techniques. Physically, the criterion is interpreted in terms of a Gaussian force field exerted by one point-set on the other. This formulation proved useful for controlling and increasing the region of convergence, and hence for allowing more autonomy in correspondence tasks. Furthermore, the criterion can be computed with linear complexity using recently developed Fast Gauss Transform numerical techniques. In addition, we introduced a new local feature descriptor, derived from visual saliency principles, which significantly enhanced the performance of the registration algorithm.
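The unified criterion described above amounts to a sum of Gaussian affinities over all point pairs, E(theta) = sum_i sum_j exp(-||T_theta(p_i) - q_j||^2 / sigma^2), which is smooth in the alignment parameters. The snippet below is a minimal brute-force illustration of that idea (the function name and toy data are my own; the linear-complexity Fast Gauss Transform evaluation used in the dissertation is not shown):

```python
import numpy as np

def gaussian_fields_energy(moving, fixed, sigma=1.0):
    """Registration criterion: sum of Gaussian affinities over all
    pairs between the moving and fixed point-sets (brute force)."""
    diff = moving[:, None, :] - fixed[None, :, :]   # (N, M, 3) pairwise
    d2 = np.sum(diff ** 2, axis=-1)                 # squared distances
    return np.exp(-d2 / sigma ** 2).sum()

rng = np.random.default_rng(1)
fixed = rng.standard_normal((100, 3))
aligned = fixed.copy()                         # perfect alignment
shifted = fixed + np.array([2.0, 0.0, 0.0])    # gross misalignment
e_aligned = gaussian_fields_energy(aligned, fixed, sigma=0.5)
e_shifted = gaussian_fields_energy(shifted, fixed, sigma=0.5)
```

Because the criterion is differentiable everywhere, its gradient with respect to the alignment parameters can be fed to standard optimizers, which is the key advantage over ICP's discrete correspondence step; the parameter sigma controls the width of the basin of convergence.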
The resulting technique was subjected to a thorough experimental analysis that highlighted its strengths and showed its limitations. Our current applications are in the field of 3D modeling for inspection, surveillance, and biometrics. However, since this matching framework can be applied to any type of data that can be represented as N-dimensional point-sets, the scope of the method is shown to reach many more pattern analysis applications.
DeepRM: Deep Recurrent Matching for 6D Pose Refinement
Precise 6D pose estimation of rigid objects from RGB images is a critical but challenging task in robotics and augmented reality. To address this problem, we propose DeepRM, a novel recurrent network architecture for 6D pose refinement. DeepRM leverages initial coarse pose estimates to render synthetic images of target objects. The rendered images are then matched with the observed images to predict a rigid transform for updating the previous pose estimate. This process is repeated to incrementally refine the estimate at each iteration. LSTM units are used to propagate information through each refinement step, significantly improving overall performance. In contrast to many 2-stage Perspective-n-Point based solutions, DeepRM is trained end-to-end, and uses a scalable backbone that can be tuned via a single parameter for accuracy and efficiency. During training, a multi-scale optical flow head is added to predict the optical flow between the observed and synthetic images. Optical flow prediction stabilizes the training process and enforces the learning of features that are relevant to the task of pose estimation. Our results demonstrate that DeepRM achieves state-of-the-art performance on two widely accepted, challenging datasets.
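The render-match-update loop can be illustrated schematically. In the sketch below, `toy_matcher` is a stand-in of my own for DeepRM's learned render-and-compare network (which operates on rendered and observed images with LSTM state); it simply removes a fraction of the remaining translation error at each step, showing how composing predicted delta transforms incrementally refines a pose estimate:

```python
import numpy as np

def compose(T_delta, T):
    """Apply a predicted corrective transform to the current estimate."""
    return T_delta @ T

def toy_matcher(T_est, T_true, gain=0.5):
    """Stand-in for the learned matching network: given the current
    estimate, predict a delta transform that removes part of the
    remaining error (translation only, for simplicity)."""
    err = T_true @ np.linalg.inv(T_est)
    delta = np.eye(4)
    delta[:3, 3] = gain * err[:3, 3]
    return delta

T_true = np.eye(4)
T_true[:3, 3] = [0.3, -0.2, 1.0]          # ground-truth object pose
T_est = np.eye(4)                          # coarse initial estimate
for _ in range(10):                        # iterative refinement
    T_est = compose(toy_matcher(T_est, T_true), T_est)
residual = np.linalg.norm(T_est[:3, 3] - T_true[:3, 3])
```

Each iteration halves the remaining error here, so ten steps reduce it by roughly three orders of magnitude; the real network must of course infer the corrective transform from image evidence rather than from the ground-truth pose.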
Lidar-based scene understanding for autonomous driving using deep learning
With over 1.35 million fatalities related to traffic accidents worldwide, autonomous driving was foreseen at the beginning of this century as a feasible solution to improve safety on our roads. It is also set to disrupt our transportation paradigm, reducing congestion, pollution, and costs while increasing the accessibility, efficiency, and reliability of transportation for both people and goods. Although some advances have gradually been transferred into commercial vehicles in the form of Advanced Driving Assistance Systems (ADAS), such as adaptive cruise control, blind-spot detection, or automatic parking, the technology is far from mature. A full understanding of the scene is needed so that vehicles are aware of their surroundings, knowing the elements present in the scene as well as their motion, intentions, and interactions.
In this PhD dissertation, we explore new approaches for understanding driving scenes from 3D LiDAR point clouds by using Deep Learning methods. To this end, in Part I we analyze the scene from a static perspective, using independent frames to detect the neighboring vehicles. Next, in Part II we develop new ways of understanding the dynamics of the scene. Finally, in Part III we apply all the developed methods to accomplish higher-level challenges such as segmenting moving obstacles while obtaining their rigid motion vector over the ground.
More specifically, in Chapter 2 we develop a 3D vehicle detection pipeline based on a multi-branch deep-learning architecture and propose a Front view (FR-V) and a Bird's Eye view (BE-V) as 2D representations of the 3D point cloud to serve as input for training our models. Later on, in Chapter 3 we apply and further test this method on two real use cases: pre-filtering moving obstacles while creating maps used to better localize ourselves on subsequent days, and vehicle tracking. From the dynamic perspective, in Chapter 4 we learn from the 3D point cloud a novel dynamic feature that resembles optical flow in RGB images. To that end, we develop a new approach that leverages RGB optical flow as pseudo ground truth for training while requiring only 3D LiDAR data at inference time. Additionally, in Chapter 5 we explore the benefits of combining classification and regression learning problems to address the optical flow estimation task in a joint coarse-and-fine manner. Lastly, in Chapter 6 we bring the previous methods together and demonstrate that these independent tasks can guide the learning of more challenging problems, such as segmentation and motion estimation of moving vehicles from our own moving perspective.
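The bird's-eye-view representation used as network input in Chapter 2 can be sketched minimally: the snippet below rasterizes a LiDAR point cloud into a 2D occupancy grid. The grid extents and cell size are illustrative assumptions of mine, not the thesis's actual FR-V/BE-V encoding (which would typically also include height, density, or intensity channels):

```python
import numpy as np

def birds_eye_view(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0), cell=0.5):
    """Project LiDAR points of shape (N, 3) onto a 2D bird's-eye-view
    occupancy grid with the given metric extents and cell size."""
    x, y = points[:, 0], points[:, 1]
    keep = (x >= x_range[0]) & (x < x_range[1]) & \
           (y >= y_range[0]) & (y < y_range[1])
    x, y = x[keep], y[keep]
    rows = ((x - x_range[0]) / cell).astype(int)
    cols = ((y - y_range[0]) / cell).astype(int)
    h = int((x_range[1] - x_range[0]) / cell)
    w = int((y_range[1] - y_range[0]) / cell)
    grid = np.zeros((h, w), dtype=np.float32)
    grid[rows, cols] = 1.0   # occupancy only; add channels as needed
    return grid

# Three points: two fall in the same cell, one in a distant cell.
pts = np.array([[10.0, 0.0, -1.2],
                [10.2, 0.1, -1.0],
                [35.0, 15.0, 0.3]])
bev = birds_eye_view(pts)
```

The resulting image-like tensor is what makes standard 2D convolutional backbones applicable to unordered 3D point clouds.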
Perception systems for robust autonomous navigation in natural environments
2022 Spring. As assistive robotics continues to develop thanks to rapid advances in artificial intelligence, smart sensors, the Internet of Things, and robotics, industry has begun introducing robots that perform various functions to make humans' lives more comfortable and enjoyable. While the principal purpose of deploying robots has been productivity enhancement, their usability has expanded widely. Examples include assisting people with disabilities (e.g., Toyota's Human Support Robot), providing driverless transportation (e.g., Waymo's driverless cars), and helping with tedious house chores (e.g., iRobot). The challenge in these applications is that the robots have to function appropriately in continuously changing environments and harsh real-world conditions, deal with significant amounts of noise and uncertainty, and operate autonomously without the intervention or supervision of an expert. To meet these challenges, a robust perception system is vital. This dissertation casts light on the perception component of autonomous mobile robots, highlights its major capabilities, and analyzes the factors that affect its performance. In short, the approaches developed in this dissertation cover the following four topics: (1) learning the detection and identification of objects in the environment in which the robot is operating, (2) estimating the 6D pose of objects of interest to the robot, (3) studying the importance of tracking information in the motion prediction module, and (4) analyzing the performance of three motion prediction methods, comparing them, and highlighting their strengths and weaknesses. All techniques developed in this dissertation have been implemented and evaluated on popular public benchmarks.
Extensive experiments have been conducted to analyze and validate the properties of the developed methods and to demonstrate this dissertation's conclusions on the robustness, performance, and utility of the proposed approaches for intelligent mobile robots.
DeepSketch2Face: A Deep Learning Based Sketching System for 3D Face and Caricature Modeling
Face modeling has received much attention in the field of visual computing.
There exist many scenarios, including cartoon characters, avatars for social
media, 3D face caricatures, as well as face-related art and design, where
low-cost interactive face modeling is a popular approach, especially among
amateur users. In this paper, we propose a deep learning based sketching
system for 3D face and caricature modeling. This system has a labor-efficient
sketching interface that allows the user to draw freehand, imprecise yet
expressive 2D lines representing the contours of facial features. A novel
CNN-based deep regression network is designed for inferring 3D face models
from 2D sketches. Our network fuses both CNN and shape-based features of the
input sketch, and has two independent branches of fully connected layers
generating independent subsets of coefficients for a bilinear face
representation. Our system also supports gesture-based interactions for users
to further manipulate initial face models. Both user studies and numerical
results indicate that our sketching system can help users create face models
quickly and effectively. A significantly expanded face database with diverse
identities, expressions, and levels of exaggeration is constructed to promote
further research and evaluation of face modeling techniques.
Comment: 12 pages, 16 figures, to appear in SIGGRAPH 201
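The bilinear face representation mentioned above can be sketched as a core tensor contracted with identity and expression coefficient vectors; the paper's two fully connected branches would each predict one coefficient vector. The tensor sizes below are arbitrary placeholders of mine, not those of the paper's face database:

```python
import numpy as np

def bilinear_face(core, w_id, w_exp):
    """Evaluate a bilinear face model: geometry is the core tensor
    contracted along the identity and expression modes."""
    # core: (n_coords, n_id, n_exp); w_id: (n_id,); w_exp: (n_exp,)
    return np.einsum('vij,i,j->v', core, w_id, w_exp)

rng = np.random.default_rng(2)
core = rng.standard_normal((30, 5, 4))   # e.g. 10 vertices x 3 coords, 5 ids, 4 expressions
w_id = np.zeros(5); w_id[0] = 1.0        # one-hot: pick identity 0
w_exp = np.zeros(4); w_exp[1] = 1.0      # one-hot: pick expression 1
geom = bilinear_face(core, w_id, w_exp)
```

With one-hot coefficients the contraction just selects a stored face; the interesting behavior comes from interpolated coefficients, which blend identities and expressions continuously, and it is these continuous coefficients that a regression network can predict from a sketch.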
MV-ROPE: Multi-view Constraints for Robust Category-level Object Pose and Size Estimation
We propose a novel framework for RGB-based category-level 6D object pose and
size estimation. Our approach relies on the prediction of normalized object
coordinate space (NOCS), which serves as an efficient and effective object
canonical representation that can be extracted from RGB images. Unlike previous
approaches that heavily relied on additional depth readings as input, our
novelty lies in leveraging multi-view information, which is commonly available
in practical scenarios where a moving camera continuously observes the
environment. By introducing multi-view constraints, we can obtain accurate
camera pose and depth estimation from a monocular dense SLAM framework.
Additionally, by incorporating constraints on the camera relative pose, we can
apply trimming strategies and robust pose averaging on the multi-view object
poses, resulting in more accurate and robust estimations of category-level
object poses even in the absence of direct depth readings. Furthermore, we
introduce a novel NOCS prediction network that significantly improves
performance. Our experimental results demonstrate the strong performance of our
proposed method, even comparable to state-of-the-art RGB-D methods across
public dataset sequences. Additionally, we showcase the generalization ability
of our method by evaluating it on self-collected datasets.
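The trimming and robust pose-averaging step over multi-view object poses can be sketched generically. The snippet below is my own simplification (translation-based trimming followed by a chordal mean of rotations via SVD projection), not MV-ROPE's exact procedure:

```python
import numpy as np

def project_to_so3(M):
    """Project a 3x3 matrix to the nearest rotation matrix (SVD)."""
    U, _, Vt = np.linalg.svd(M)
    R = U @ Vt
    if np.linalg.det(R) < 0:       # fix an improper reflection
        U[:, -1] *= -1
        R = U @ Vt
    return R

def robust_pose_average(rotations, translations, trim_ratio=0.25):
    """Drop translation outliers, then average the remaining poses:
    arithmetic mean for translations, chordal mean for rotations."""
    t = np.asarray(translations)
    med = np.median(t, axis=0)
    dist = np.linalg.norm(t - med, axis=1)
    keep = np.argsort(dist)[: max(1, int(len(t) * (1 - trim_ratio)))]
    R_mean = project_to_so3(np.mean([rotations[i] for i in keep], axis=0))
    return R_mean, t[keep].mean(axis=0)

# Four consistent single-view pose estimates plus one gross outlier.
rots = [np.eye(3)] * 5
trans = [np.array([1.0, 0.0, 0.0])] * 4 + [np.array([10.0, 5.0, 0.0])]
R_avg, t_avg = robust_pose_average(rots, trans)
```

Trimming against the median makes the average robust to a minority of bad single-view estimates, which is the role the multi-view constraints play when direct depth readings are unavailable.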