
    Minimizing Supervision for Vision-Based Perception and Control in Autonomous Driving

    The research presented in this dissertation focuses on reducing the need for supervision in two tasks related to autonomous driving: end-to-end steering and free space segmentation. For end-to-end steering, we devise a new regularization technique that relies on pixel-relevance heatmaps to force the steering model to focus on lane markings, improving performance across a variety of offline metrics. In relation to this work, we publicly release the RoboBus dataset, which consists of extensive driving data recorded using a commercial bus on a cross-border public transport route on the Luxembourgish-French border. We also tackle pseudo-supervised free space segmentation from three different angles: (1) we propose a Stochastic Co-Teaching training scheme that explicitly attempts to filter out the noise in pseudo-labels, (2) we study the impact of self-training and of different data augmentation techniques, and (3) we devise a novel pseudo-label generation method based on road plane distance estimation from approximate depth maps. Finally, we investigate semi-supervised free space estimation and find that combining our techniques with a restricted subset of labeled samples yields substantial improvements in IoU, precision, and recall.
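    Since the reported gains are measured in IoU, precision, and recall, a minimal sketch of how these metrics are computed for binary segmentation masks may be useful. The flattened masks and values below are purely illustrative, not taken from the dissertation:

    ```python
    def segmentation_metrics(pred, truth):
        """Compute IoU, precision and recall for flat binary masks."""
        tp = sum(p and t for p, t in zip(pred, truth))          # true positives
        fp = sum(p and not t for p, t in zip(pred, truth))      # false positives
        fn = sum(not p and t for p, t in zip(pred, truth))      # false negatives
        iou = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
        precision = tp / (tp + fp) if (tp + fp) else 1.0
        recall = tp / (tp + fn) if (tp + fn) else 1.0
        return iou, precision, recall

    # Toy example: 1 marks predicted/actual free space in a flattened mask.
    pred  = [1, 1, 0, 0, 1, 0]
    truth = [1, 0, 0, 1, 1, 0]
    iou, p, r = segmentation_metrics(pred, truth)
    ```

    In practice these counts are accumulated over whole images (or a dataset) rather than a six-pixel toy mask, but the definitions are the same.
    
    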

    Computer Vision-based Monitoring of Harvest Quality


    Exploring the Potential of Convolutional Neural Networks in Healthcare Engineering for Skin Disease Identification

    Skin disorders affect millions of individuals worldwide, underscoring the need for swift and accurate detection to achieve optimal treatment outcomes. Convolutional Neural Networks (CNNs) have emerged as valuable tools for automating the identification of skin conditions. This paper conducts an extensive review of the latest advancements in CNN-driven skin condition detection. In dermatological applications, CNNs analyze intricate visual patterns and extract distinctive features from skin imaging datasets. When trained on large data repositories, CNNs can classify an array of skin conditions such as melanoma, psoriasis, eczema, and acne. The paper highlights key advances in CNN-based skin disease diagnosis, covering diverse CNN architectures, refinement methodologies, and data augmentation tactics. Moreover, the integration of transfer learning and ensemble approaches has further improved the efficacy of CNN models. Despite this substantial potential, challenges remain. Comprehensive coverage of skin conditions and the mitigation of biases require access to large and varied data pools, and understanding the decision-making processes of CNN models remains an open problem. Ethical concerns such as algorithmic bias and data privacy also warrant serious consideration. By carefully examining the advances, obstacles, and potential of CNN-based skin disorder diagnosis, this review provides valuable insights to researchers and medical professionals, underscoring the importance of precise and effective diagnostic instruments in improving patient outcomes and curbing healthcare expenditures.
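    Data augmentation is one of the tactics the paper surveys. A minimal sketch of label-preserving augmentations, using nested lists as a stand-in for real image arrays (the 2x2 "image" and the chosen transforms are illustrative; real pipelines apply many more, such as crops and color jitter):

    ```python
    def hflip(img):
        """Mirror an image (a list of pixel rows) left-right."""
        return [row[::-1] for row in img]

    def rot90(img):
        """Rotate an image 90 degrees clockwise."""
        return [list(row) for row in zip(*img[::-1])]

    def augment(img):
        """Yield simple label-preserving variants of a lesion image."""
        yield img          # the original sample
        yield hflip(img)   # mirrored variant
        yield rot90(img)   # rotated variant

    # Toy 2x2 "image"; each number stands in for a pixel.
    img = [[1, 2],
           [3, 4]]
    variants = list(augment(img))
    ```

    Each variant keeps the same diagnostic label, which is what lets augmentation enlarge a training set without new annotation effort.
    
    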

    Deep Learning for 3D Information Extraction from Indoor and Outdoor Point Clouds

    This thesis focuses on the challenges and opportunities that come with deep learning in the extraction of 3D information from point clouds. To achieve this, 3D information such as point-based or object-based attributes needs to be extracted from highly accurate and information-rich 3D data, which are commonly collected by LiDAR or RGB-D cameras from real-world environments. Driven by the breakthroughs brought by deep learning techniques and the accessibility of reliable 3D datasets, 3D deep learning frameworks have been investigated with a string of empirical successes. However, two main challenges complicate deep-learning-based per-point labeling and object detection in real scenes. First, the variation of sensing conditions and unconstrained environments results in unevenly distributed point clouds with various geometric patterns and incomplete shapes. Second, the irregular data format and the requirements for both accurate and efficient algorithms pose problems for deep learning models. To deal with these two challenges, this doctoral dissertation mainly considers the following four features when constructing 3D deep models for point-based or object-based information extraction: (1) the exploration of geometric correlations between local points when defining convolution kernels, (2) hierarchical local and global feature learning within an end-to-end trainable framework, (3) relation feature learning from nearby objects, and (4) leveraging 2D images for 3D object detection from point clouds. Correspondingly, this doctoral thesis proposes a set of deep learning frameworks for 3D information extraction, specifically scene segmentation and object detection from indoor and outdoor point clouds. Firstly, an end-to-end geometric graph convolution architecture on the graph representation of a point cloud is proposed for semantic scene segmentation.
    Secondly, a 3D proposal-based object detection framework is constructed to extract the geometric information of objects and relation features among proposals for bounding box reasoning. Thirdly, a 2D-driven approach is proposed to detect 3D objects from point clouds in indoor and outdoor scenes; both semantic features from 2D images and context information in 3D space are explicitly exploited to enhance 3D detection performance. Qualitative and quantitative experiments comparing the proposed frameworks with existing state-of-the-art models on indoor and outdoor datasets demonstrate their effectiveness. A discussion of remaining challenges and future research directions that could help advance deep learning approaches for 3D information extraction from point clouds concludes the thesis.
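    The graph convolution architecture above operates on a graph representation of the point cloud; one common way to build such a graph is k-nearest-neighbour search. A minimal sketch, assuming a brute-force search over toy coordinates (real pipelines would use spatial indexing such as k-d trees for large clouds):

    ```python
    import math

    def knn_graph(points, k):
        """Build k-nearest-neighbour edges for a point cloud.

        points: list of (x, y, z) tuples.
        Returns {point index: [indices of its k nearest neighbours]}.
        """
        graph = {}
        for i, p in enumerate(points):
            # Distance to every other point, sorted ascending.
            dists = sorted(
                (math.dist(p, q), j)
                for j, q in enumerate(points) if j != i
            )
            graph[i] = [j for _, j in dists[:k]]
        return graph

    # Four points, three clustered on a line and one far away.
    cloud = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0)]
    edges = knn_graph(cloud, k=2)
    ```

    Convolution kernels defined over these local neighbourhoods can then exploit the geometric correlations between a point and its neighbours.
    
    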

    Multi-task near-field perception for autonomous driving using surround-view fisheye cameras

    The formation of eyes led to the big bang of evolution. The dynamics changed from a primitive organism passively waiting for food to come into contact with it, to organisms that actively seek food using visual sensors. The human eye is one of the most sophisticated developments of evolution, but it still has defects. Over millions of years, humans have evolved a biological perception algorithm capable of driving cars, operating machinery, piloting aircraft, and navigating ships. Automating these capabilities for computers is critical for various applications, including self-driving cars, augmented reality, and architectural surveying. Near-field visual perception in the context of self-driving cars covers the environment in a range of 0 to 10 meters with 360° coverage around the vehicle. It is a critical decision-making component in the development of safer automated driving. Recent advances in computer vision and deep learning, in conjunction with high-quality sensors such as cameras and LiDARs, have fueled mature visual perception solutions. Until now, far-field perception has been the primary focus. Another significant issue is the limited processing power available for developing real-time applications. Because of this bottleneck, there is frequently a trade-off between performance and run-time efficiency. To address these issues, we concentrate on the following: 1) developing near-field perception algorithms with high performance and low computational complexity for various visual perception tasks, such as geometric and semantic tasks, using convolutional neural networks, and 2) using multi-task learning to overcome computational bottlenecks by sharing initial convolutional layers between tasks and developing optimization strategies that balance the tasks.
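    The multi-task idea of sharing initial layers between tasks can be sketched structurally. The functions below are hypothetical stand-ins for a convolutional encoder and two task heads (none of them come from the thesis); the point is only that one shared computation feeds both a geometric and a semantic output:

    ```python
    def shared_encoder(pixels):
        """Stand-in for the shared initial convolutional layers:
        reduces the raw input to one common feature vector."""
        mean = sum(pixels) / len(pixels)
        peak = max(pixels)
        return (mean, peak)

    def depth_head(features):
        """Hypothetical geometric-task head: shared features -> depth score."""
        mean, peak = features
        return 0.5 * mean + 0.5 * peak

    def segmentation_head(features):
        """Hypothetical semantic-task head: shared features -> class label."""
        mean, _ = features
        return "free_space" if mean > 0.5 else "obstacle"

    def multi_task_forward(pixels):
        """One forward pass: the encoder runs ONCE, both heads reuse its output."""
        features = shared_encoder(pixels)
        return depth_head(features), segmentation_head(features)

    depth, label = multi_task_forward([0.2, 0.8, 0.9, 0.7])
    ```

    Running the encoder once per frame instead of once per task is exactly where the computational saving comes from; the remaining difficulty, as noted above, is balancing the tasks' losses during joint training.
    
    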

    Image segmentation, evaluation, and applications

    This thesis aims to advance research in image segmentation by developing robust techniques for evaluating image segmentation algorithms. The key contributions of this work are as follows. First, we investigate the characteristics of existing measures for the supervised evaluation of automatic image segmentation algorithms. We show which of these measures is most effective at distinguishing perceptually accurate image segmentations from inaccurate ones. We then apply these measures to evaluate four state-of-the-art automatic image segmentation algorithms and establish which best emulates human perceptual grouping. Second, we develop a complete framework for evaluating interactive segmentation algorithms by means of user experiments. Our system comprises evaluation measures, ground truth data, and implementation software. We validate our proposed measures by showing their correlation with perceived accuracy. We then use our framework to evaluate four popular interactive segmentation algorithms and compare their performance. Finally, acknowledging that user experiments are sometimes prohibitive in practice, we propose a method of evaluating interactive segmentation by algorithmically simulating the user interactions. We explore four strategies for this simulation and demonstrate that the best of these produces results very similar to those from the user experiments.
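    One plausible strategy for algorithmically simulating user interactions (a simplification for illustration; the abstract does not specify the thesis's four strategies) is to place each corrective click near the middle of the current error region:

    ```python
    def next_click(pred, truth):
        """Simulated-user strategy (illustrative): place the next correction
        click at the mislabeled pixel closest to the centroid of all
        currently mislabeled pixels."""
        errors = [(r, c)
                  for r, row in enumerate(truth)
                  for c, t in enumerate(row)
                  if pred[r][c] != t]
        if not errors:
            return None  # segmentation already matches the ground truth
        # Centroid of the error region.
        cr = sum(r for r, _ in errors) / len(errors)
        cc = sum(c for _, c in errors) / len(errors)
        return min(errors, key=lambda rc: (rc[0] - cr) ** 2 + (rc[1] - cc) ** 2)

    # Toy 3x3 masks: the segmentation has missed part of the object.
    truth = [[1, 1, 0],
             [1, 1, 0],
             [0, 0, 0]]
    pred  = [[0, 0, 0],
             [0, 1, 0],
             [0, 0, 0]]
    click = next_click(pred, truth)
    ```

    Iterating this loop, feeding each simulated click back to the interactive algorithm and re-running it, yields an accuracy-versus-interaction curve without recruiting human participants.
    
    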

    Development of Machine Learning Based Analytical Tools for Pavement Performance Assessment and Crack Detection

    Pavement Management System (PMS) analytical tools mainly consist of pavement condition investigation and evaluation tools, pavement condition rating and assessment tools, pavement performance prediction tools, and treatment prioritization and implementation tools. The effectiveness of a PMS depends heavily on the efficiency and reliability of its pavement condition evaluation tools. Traditionally, pavement condition investigation and evaluation practices have been based on manual distress surveys and performance level assessments, which suffer from low efficiency and low reliability. Such manual surveys are labor-intensive and unsafe due to the surveyors' proximity to live traffic, and their accuracy is reduced by the subjective judgment of the evaluators. Considering these factors, semiautomated and automated pavement condition evaluation tools have been under development for several years. In recent years, advanced computational technologies have produced successful applications in diverse engineering fields, and incorporating these techniques into pavement condition evaluation and distress detection can improve the performance of existing PMSs. Hence, this research aims to bridge the gap between advanced Machine Learning Techniques (MLTs) and the existing analytical tools of current PMSs. The research outputs are intended to provide pavement condition evaluation tools that meet the requirements of high efficiency, accuracy, and reliability. To achieve the objectives of this research, six pavement damage condition and performance evaluation methodologies are developed. The roughness condition of the pavement surface directly influences the riding quality experienced by users. The International Roughness Index (IRI) is used worldwide by research institutions and pavement condition evaluation and management agencies to evaluate the roughness condition of the pavement.
    IRI is a time-dependent variable that generally increases with pavement service life. With this in mind, an IRI prediction model based on multi-granularity fuzzy time series analysis is developed, and the Particle Swarm Optimization (PSO) method is used for model optimization to obtain satisfactory IRI prediction results. Historical IRI data extracted from the InfoPave website are used for training and testing the model, and the experimental results demonstrate the effectiveness of this method. Automated pavement condition evaluation tools can provide overall performance indices, which can then be used for treatment planning. Calculating those performance indices requires both surface distress level and roughness condition evaluations; however, pavement surface roughness conditions are hard to obtain from surface image indicators. With this consideration, tools for predicting pavement roughness and overall performance from image indicators are developed. The state-of-the-art machine learning technique XGBoost is utilized as the main method for model training, validation, and testing. To find the dominant image indicators that influence pavement roughness and overall performance, the comprehensive pavement performance evaluation data collected by ARAN 900 are analyzed. A Back Propagation Neural Network (BPNN) is used to develop the performance prediction models, and on this basis the mean impact values (MIVs) for each input factor are calculated to evaluate the contributions of the input indicators. Indicators of wheel-path cracking are observed to have the highest MIVs, which emphasizes the importance of cracking-focused maintenance treatments. A further issue is that current automated pavement condition evaluation systems analyze only pavement surface distresses, without considering the structural capacity of the actual pavement.
    Hence, pavement performance prediction tools based on structural performance analysis are developed using Support Vector Machines (SVMs). To guarantee the overall performance of the proposed methodologies, heuristic methods, including the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), are selected to optimize the models. The experimental results show the promise of machine learning based pavement structural performance prediction. Automated pavement condition analyzers usually detect pavement surface distress from collected pavement surface images; distress types, severities, quantities, and other parameters are then calculated for the overall performance index. Cracks are among the most important pavement surface distresses to quantify, and traditional approaches are less accurate and efficient in locating, counting, and quantifying the various types of cracks initiated on the pavement surface. An integrated Crack Deep Net (CrackDN) is therefore developed based on deep learning technologies. Through model training, validation, and testing, CrackDN is shown to detect pavement surface cracks against complex backgrounds with high accuracy. Moreover, combining box-level crack locating with pixel-level crack calculation enables comprehensive crack analysis, so that more effective maintenance treatments can be assigned. Hence, a pixel-level crack detection methodology called CrackU-net is proposed. CrackU-net is composed of several convolutional, max-pooling, and up-convolutional layers, and is developed based on innovations in deep-learning-based segmentation. Pavement crack data are collected by multiple devices, including automated pavement condition survey vehicles, smartphones, and action cameras. The proposed CrackU-net is tested on a separate crack image set that was not used for training the model, and the results demonstrate its promise for use in PMSs.
    Finally, the proposed toolboxes are validated through comparative experiments in terms of accuracy (precision, recall, and F-measure) and error levels. The accuracies of all the models are higher than 0.9 and the errors are lower than 0.05. The findings of this research also suggest that wheel-path cracking should be a priority when planning maintenance activities. Benefiting from advanced machine learning technologies, pavement roughness condition and overall performance levels can be predicted from extracted image indicators, and deep learning methods can achieve both box-level and pixel-level pavement crack detection with satisfactory performance. It is therefore suggested that these state-of-the-art toolboxes be integrated into current PMSs to upgrade their service levels.
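    Particle Swarm Optimization, used above to tune both the IRI and the SVM models, can be sketched in a few lines. The sphere objective is a toy stand-in for an actual model-fitting loss, and all hyperparameters are illustrative defaults rather than the thesis's settings:

    ```python
    import random

    def pso(objective, dim, n_particles=20, iters=100, seed=0,
            w=0.7, c1=1.5, c2=1.5, lo=-5.0, hi=5.0):
        """Minimal PSO: each particle tracks its own best position and the
        swarm's best; velocities blend inertia with pulls toward both."""
        rng = random.Random(seed)
        pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
        vel = [[0.0] * dim for _ in range(n_particles)]
        pbest = [p[:] for p in pos]                    # per-particle best positions
        pbest_val = [objective(p) for p in pos]
        g = min(range(n_particles), key=lambda i: pbest_val[i])
        gbest, gbest_val = pbest[g][:], pbest_val[g]   # swarm-wide best
        for _ in range(iters):
            for i in range(n_particles):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][d] = (w * vel[i][d]
                                 + c1 * r1 * (pbest[i][d] - pos[i][d])
                                 + c2 * r2 * (gbest[d] - pos[i][d]))
                    pos[i][d] += vel[i][d]
                val = objective(pos[i])
                if val < pbest_val[i]:
                    pbest[i], pbest_val[i] = pos[i][:], val
                    if val < gbest_val:
                        gbest, gbest_val = pos[i][:], val
        return gbest, gbest_val

    # Toy stand-in for a model-fitting loss: a 2-D sphere function.
    best, best_val = pso(lambda x: sum(v * v for v in x), dim=2)
    ```

    In the pavement models, `objective` would instead score a candidate set of model parameters (e.g. SVM hyperparameters) by validation error, with PSO searching that space in place of grid search.
    
    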