    This dissertation addresses the difficulties of semantic segmentation when dealing with an extensive collection of images and 3D point clouds. Due to the ubiquity of digital cameras that help capture the world around us, as well as the advanced scanning techniques that are able to record 3D replicas of real cities, the sheer amount of visual data available presents many opportunities for both academic research and industrial applications. But the mere quantity of data also poses a tremendous challenge. In particular, the problem of distilling useful information from such a large repository of visual data has attracted ongoing interest in the fields of computer vision and data mining. Structural semantics are fundamental to understanding both natural and man-made objects. Buildings, for example, are like languages in that they are made up of repeated structures or patterns that can be captured in images. In order to find these recurring patterns in images, I present an unsupervised frequent visual pattern mining approach that goes beyond co-location to identify spatially coherent visual patterns, regardless of their shape, size, location, and orientation. First, my approach categorizes visual items from scale-invariant image primitives with similar appearance using a suite of polynomial-time algorithms that have been designed to identify consistent structural associations among visual items, representing frequent visual patterns. After detecting repetitive image patterns, I use unsupervised and automatic segmentation of the identified patterns to generate more semantically meaningful representations. The underlying assumption is that pixels capturing the same portion of image patterns are visually consistent, while pixels that come from different backdrops are usually inconsistent. I further extend this approach to perform automatic segmentation of foreground objects from an Internet photo collection of landmark locations.
New scanning technologies have successfully advanced the digital acquisition of large-scale urban landscapes. In addressing semantic segmentation and reconstruction of this data using LiDAR point clouds and geo-registered images of large-scale residential areas, I develop a complete system that simultaneously uses classification and segmentation methods to first identify different object categories and then applies category-specific reconstruction techniques to create visually pleasing and complete scene models.

    Ship Multimodel 3D Reconstruction and Corrosion Detection

    3D reconstruction has been an area of increased interest due to growing demand in applications such as virtual reality, 3D mapping, medical imaging, and many others. However, many problems remain in reconstructing a real-life object, such as capturing occluded zones, noise, and processing time. Furthermore, as deep learning technologies advance, there has been a growing interest in using such methods to replace human-driven tasks such as corrosion inspection, since doing so decreases the risk of injury to the inspector, takes less time, and saves cost. This dissertation proposes a method for reconstructing a 3D model of ships using aerial RGB images and terrestrial RGB-D images, along with a system capable of detecting the corroded parts of the ship and highlighting them in the model. Using two different sensors from two different vantage points (aerial and terrestrial) mitigates some of the occlusion problems and increases the final model's accuracy. The dissertation also aims to pick the methods that have the best trade-off between accuracy and computational speed. The final model can be advantageous for corrosion inspectors: with the model of the ship and its corroded zones in hand, they can choose the next steps without needing to inspect the ship manually or even be at the same site as the ship. The final model is a fusion of three different 3D models. The model obtained from RGB images uses a Structure from Motion (SfM) algorithm, which recovers the 3D shape of the ship from 2D images. As for the remaining models, RGB-D images were used in conjunction with the Open3D library to create 3D structures from both sides of the ship. The corrosion classifier model was trained in Google Colab and achieved an accuracy of 97.44% on the test dataset.
The images used to create the SfM 3D model were each divided into 40 regions and fed into the classifier, approximating a coarse image detection algorithm rather than whole-image classification. The results were encoded into the 3D model, highlighting the corroded zones.
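The region-wise classification step described in the ship reconstruction abstract can be sketched as follows. This is a minimal illustration only: the abstract states that each image was divided into 40 regions but not how, so the 8 x 5 grid, the function names `tile_boxes` and `corroded_regions`, and the `classify` callback are all assumptions rather than the author's implementation.

```python
def tile_boxes(width, height, cols=8, rows=5):
    """Return (left, top, right, bottom) boxes that tile a width x height image.

    The 8 x 5 layout (40 tiles) is an assumed grid; the abstract only
    gives the total number of regions.
    """
    boxes = []
    for r in range(rows):
        top = r * height // rows
        bottom = (r + 1) * height // rows
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            boxes.append((left, top, right, bottom))
    return boxes


def corroded_regions(image, classify, cols=8, rows=5):
    """Run a tile-level classifier over an image and return the flagged boxes.

    image    -- list of pixel rows (each row is a list of pixel values)
    classify -- hypothetical callable taking a tile and returning True if corroded
    """
    height, width = len(image), len(image[0])
    flagged = []
    for left, top, right, bottom in tile_boxes(width, height, cols, rows):
        tile = [row[left:right] for row in image[top:bottom]]
        if classify(tile):
            flagged.append((left, top, right, bottom))
    return flagged
```

Running a whole-image classifier on each tile yields a coarse localization: a flagged tile says where in the image corrosion was found, which is what allows the result to be mapped onto the 3D model.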

    Multi-target tracking using appearance models for identity maintenance

    This thesis considers perception systems for urban environments. It focuses on the task of tracking dynamic objects and in particular on methods that can maintain the identities of targets through periods of ambiguity. Examples of such ambiguous situations occur when targets interact with each other, or when they are occluded by other objects or the environment. With the development of self-driving cars, the push for autonomous delivery of packages, and an increasing use of technology for security, surveillance and public-safety applications, robust perception in crowded urban spaces is more important than ever before. A critical part of perception systems is the ability to understand the motion of objects in a scene. Tracking strategies that merge closely-spaced targets together into groups have been shown to offer improved robustness, but in doing so sacrifice the concept of target identity. Additionally, the primary sensor used for the tracking task may not provide the information required to reason about the identity of individual objects. There are three primary contributions in this work. The first is the development of 3D lidar tracking methods that are better able to track closely-spaced targets and can determine when target identities have become ambiguous. Secondly, this thesis defines appearance models suitable for the task of determining the identities of previously-observed targets, which may include the use of data from additional sensing modalities. The final contribution of this work is the combination of lidar tracking and appearance modelling, to enable the clarification of target identities in the presence of ambiguities caused by scene complexity. The algorithms presented in this work are validated on both carefully controlled and unconstrained datasets. The experiments show that in complex dynamic scenes with interacting targets, the proposed methods achieve significant improvements in tracking performance.

    Rich probabilistic models for semantic labeling

    The aim of this monograph is to explore the methods and applications of semantic labeling. Our contributions to this rapidly developing topic concern particular aspects of modeling and inference in probabilistic models, and their applications in the interdisciplinary fields of computer vision, medical image processing, and remote sensing.

    Sensor Fusion for Object Detection and Tracking in Autonomous Vehicles

    Autonomous driving vehicles depend on their perception system to understand the environment and identify all static and dynamic obstacles surrounding the vehicle. The perception system in an autonomous vehicle uses the sensory data obtained from different sensor modalities to understand the environment and perform a variety of tasks such as object detection and object tracking. Combining the outputs of different sensors to obtain a more reliable and robust outcome is called sensor fusion. This dissertation studies the problem of sensor fusion for object detection and object tracking in autonomous driving vehicles and explores different approaches for utilizing deep neural networks to accurately and efficiently fuse sensory data from different sensing modalities. In particular, this dissertation focuses on fusing radar and camera data for 2D and 3D object detection and object tracking tasks. First, the effectiveness of radar and camera fusion for 2D object detection is investigated by introducing a radar region proposal algorithm for generating object proposals in a two-stage object detection network. The evaluation results show significant improvement in speed and accuracy compared to a vision-based proposal generation method. Next, radar and camera fusion is used for the task of joint object detection and depth estimation, where the radar data is not only used in conjunction with image features to generate object proposals but also provides accurate depth estimation for the detected objects in the scene. A fusion algorithm is also proposed for 3D object detection, where the depth and velocity data obtained from the radar is fused with the camera images to detect objects in 3D and also accurately estimate their velocities without requiring any temporal information. Finally, radar and camera sensor fusion is used for 3D multi-object tracking by introducing an end-to-end trainable and online network capable of tracking objects in real time.
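To make the idea of radar-driven region proposals more concrete, the sketch below projects a radar detection into the image with a pinhole camera model and places a box around it whose size shrinks with depth. It is a hypothetical illustration, not the dissertation's algorithm: the function names, the assumed physical object size, and the assumption that the radar point is already expressed in camera coordinates (x right, y down, z forward) are all introduced here.

```python
def project_point(x, y, z, fx, fy, cx, cy):
    """Project a 3D point in camera coordinates to pixel coordinates (pinhole model).

    Returns None for points at or behind the camera plane.
    """
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def radar_proposal(point, intrinsics, object_size_m=(2.0, 1.5)):
    """Build one image-space box proposal around a projected radar detection.

    point         -- (x, y, z) radar detection in camera coordinates, in metres
    intrinsics    -- (fx, fy, cx, cy) camera intrinsics in pixels
    object_size_m -- assumed (width, height) of the object in metres (hypothetical prior)
    """
    x, y, z = point
    fx, fy, cx, cy = intrinsics
    center = project_point(x, y, z, fx, fy, cx, cy)
    if center is None:
        return None
    u, v = center
    # Apparent size in pixels is inversely proportional to depth.
    w = fx * object_size_m[0] / z
    h = fy * object_size_m[1] / z
    return (u - w / 2, v - h / 2, u + w / 2, v + h / 2)
```

Because radar measures range directly, each proposal comes with a depth value at no extra cost, which is one plausible reason such proposals can be cheaper to generate than purely vision-based ones.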

    Computer Vision Problems in 3D Plant Phenotyping

    In recent years, there has been significant progress in Computer Vision based plant phenotyping (quantitative analysis of biological properties of plants) technologies. Traditional methods of plant phenotyping are destructive, manual and error prone. Due to non-invasiveness and non-contact properties as well as increased accuracy, imaging techniques are becoming state-of-the-art in plant phenotyping. Among several parameters of plant phenotyping, growth analysis is very important for biological inference. Automating the growth analysis can result in accelerating the throughput in crop production. This thesis contributes to the automation of plant growth analysis. First, we present a novel system for automated and non-invasive/non-contact plant growth measurement. We exploit the recent advancements of sophisticated robotic technologies and near infrared laser scanners to build a 3D imaging system and use state-of-the-art Computer Vision algorithms to fully automate growth measurement. We have set up a gantry robot system having 7 degrees of freedom hanging from the roof of a growth chamber. The payload is a range scanner, which can measure dense depth maps (raw 3D coordinate points in mm) on the surface of an object (the plant). The scanner can be moved around the plant to scan from different viewpoints by programming the robot with a specific trajectory. The sequence of overlapping images can be aligned to obtain a full 3D structure of the plant in raw point cloud format, which can be triangulated to obtain a smooth surface (triangular mesh), enclosing the original plant. We show the capability of the system to capture the well known diurnal pattern of plant growth computed from the surface area and volume of the plant meshes for a number of plant species. Second, we propose a technique to detect branch junctions in plant point cloud data. 
We demonstrate that, using these junctions as feature points, the correspondence estimation can be formulated as a subgraph matching problem, and that better matching results than the state of the art can be achieved. This idea also removes the requirement, imposed by the original registration algorithm for complex plant data, of a priori knowledge about the rotational angles between adjacent scanning viewpoints. Previously, this angle information had to be approximately known. Third, we present an algorithm to classify partially occluded leaves by their contours. In general, partial contour matching is an NP-hard problem. We propose a suboptimal matching solution and show that our method outperforms the state of the art on 3 public leaf datasets. We anticipate using this algorithm to track growing segmented leaves in our plant range data, even when a leaf becomes partially occluded by other plant matter over time. Finally, we perform some experiments to demonstrate the capabilities and limitations of the system and highlight future research directions for Computer Vision based plant phenotyping.
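The growth measurements mentioned in the plant phenotyping abstract are computed from the surface area and volume of a triangular mesh. A minimal sketch of those two quantities is given below, assuming a closed mesh with consistently oriented faces; the function name and data layout are illustrative choices, not the thesis code. Area is the sum of triangle areas, and volume is the sum of signed tetrahedron volumes formed by each face and the origin.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def mesh_area_volume(vertices, faces):
    """Return (surface_area, volume) of a closed triangular mesh.

    vertices -- list of (x, y, z) tuples
    faces    -- list of (i, j, k) vertex-index triples, consistently oriented
    """
    area = 0.0
    signed_volume = 0.0
    for i, j, k in faces:
        a, b, c = vertices[i], vertices[j], vertices[k]
        normal = _cross(_sub(b, a), _sub(c, a))
        area += 0.5 * math.sqrt(_dot(normal, normal))
        # Signed volume of the tetrahedron (origin, a, b, c).
        signed_volume += _dot(a, _cross(b, c)) / 6.0
    return area, abs(signed_volume)
```

Tracking these two numbers across scans taken at different times gives a growth curve. The volume formula is only meaningful for a watertight mesh, so a raw triangulated scan would likely need hole filling first.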

    Multi Sensor Multi Object Tracking in Autonomous Vehicles

    Indiana University-Purdue University Indianapolis (IUPUI). Self-driving cars, which navigate using their own intelligence and take appropriate actions at the right time, are becoming more popular. Safety is the key factor in a driving environment; a single failure to act can cause many fatalities. Computer vision plays a major part in achieving safety by helping the autonomous vehicle perceive its surroundings. Detection is a very popular technique for capturing the surroundings of an autonomous car, and tracking also plays an important role by providing the dynamics of detected objects. Autonomous cars combine a variety of sensors such as RADAR, LiDAR, sonar, GPS, odometry and inertial measurement units to perceive their surroundings. Driver-assistive technologies like Adaptive Cruise Control, Forward Collision Warning (FCW) and Collision Mitigation by Braking (CMbB) ensure safety while driving. Perceiving information from the environment involves setting up sensors on the car. These sensors collect data, which is then processed to decide on actions. The sensor system can consist of a single sensor or multiple sensors. Different sensors have different strengths and weaknesses, which makes combining them important for technologies like autonomous driving. Each sensor has limited accuracy in its readings, so a multi-sensor system can help overcome these shortcomings. This thesis is an attempt to develop a multi-sensor multi-object tracking method to perceive the surroundings of the ego vehicle. Whereas object detection gives information about the presence of objects in a frame, object tracking goes beyond simple observation to the more useful task of monitoring objects. The experimental results on the KITTI dataset indicate that our proposed state estimation system for multi-object tracking works well in various challenging environments.
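The claim that combining sensors compensates for the limited accuracy of each one can be illustrated with inverse-variance weighting, a standard way to fuse independent noisy measurements of the same quantity. This sketch is not the thesis's state estimation method, which the abstract does not describe; the function name and the assumption of independent, unbiased sensor errors are introduced here for illustration.

```python
def fuse_measurements(measurements):
    """Fuse independent measurements of one quantity by inverse-variance weighting.

    measurements -- list of (value, variance) pairs, one per sensor;
                    variances must be positive.
    Returns (fused_value, fused_variance).
    """
    weights = [1.0 / variance for _, variance in measurements]
    total_weight = sum(weights)
    fused_value = sum(w * value for w, (value, _) in zip(weights, measurements)) / total_weight
    # The fused variance is never larger than the smallest input variance.
    fused_variance = 1.0 / total_weight
    return fused_value, fused_variance
```

For example, fusing a range of 10.0 m with variance 4.0 and a range of 20.0 m with variance 1.0 pulls the estimate toward the more accurate sensor and yields a variance lower than either input. The same weighting underlies the measurement update of a Kalman filter in the scalar case.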

    Point Cloud Processing for Environmental Analysis in Autonomous Driving using Deep Learning

    One of the main objectives of leading automotive companies is autonomous self-driving cars. They need a very precise perception system of their environment, working for every conceivable scenario. Therefore, different kinds of sensor types are in use. Besides cameras, lidar scanners have become very important. The development in that field is significant for future applications and system integration because lidar offers a more accurate depth representation, independent of environmental illumination. Algorithms and machine learning approaches based on raw laser scanner data, including Deep Learning and Artificial Intelligence, are especially important due to the long range and three-dimensional resolution of the measured point clouds. Consequently, a broad field of research with many challenges and unsolved tasks has been established. This thesis aims to address this deficit and contribute highly efficient algorithms for 3D object recognition to the scientific community. It provides a Deep Neural Network with specific layers and a novel loss to safely localize and estimate the orientation of objects from point clouds originating from lidar sensors. First, a single-shot 3D object detector is developed that outputs dense predictions in only one forward pass. Next, this detector is refined by fusing complementary semantic features from cameras and joint probabilistic tracking to stabilize predictions and filter outliers. In the last part, a concept for deployment into an existing test vehicle focuses on the semi-automated generation of a suitable dataset. Subsequently, an evaluation of data from automotive-grade lidar scanners is presented. A Generative Adversarial Network is also being developed as an alternative for target-specific artificial data generation. Experiments on the acquired application-specific and benchmark datasets show that the presented methods compete with a variety of state-of-the-art algorithms while being optimized for efficiency for use in self-driving cars. Furthermore, they include an extensive set of standard evaluation metrics and results to form a solid baseline for future research.