32 research outputs found

    Sematic understanding of large-scale outdoor web images: From emotion recognition to scene classification

    Get PDF
    Facial expression recognition and scene-based image clustering are very popular topics in the fields of human-computer interaction and computer vision. Their relationship has been rarely investigated but is a very attractive topic that has many potential applications, such as landscape design, instructions for vacation choices, or plant layout design in the public space. In this research, we use the existing deep learning algorithms to study two issues, i.e., facial expression recognition and scene-based image clustering for large scale outdoor web images. This research paves a path for a future attempt that explores their relationship in real-world images. First, we concentrate on emotion recognition and investigate the performance of the well-known algorithms including Visual Geometry Group Network (VGG network) and Residual Net (ResNet) on the emotions in images captured from a public park. Then we introduce some approaches to address the challenges of the occluded or children's faces. Our proposed pre-processing schemes not only allow the algorithm to detect more faces but also to increase the rate of recognition accuracy under the complex environment. We also investigate the visual analysis of landscape by introducing a set of scene labels for a large set of natural scene images collected from an online source. Then the weakly supervised method - Curriculum Net is applied for scene labeling of our dataset. In Curriculum Net, the training dataset is split into two parts, clean (easy) and noisy (hard) datasets by using a Density Peak Clustering algorithm, from which Curriculum Net is trained from easy to hard data. Particularly, we adopt a more effective density clustering method, Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), to improve the clean-noisy separation of training images that leads to the improved scene labeling performance. By summarizing the work in emotion recognition and scene-based image clustering, we prepare the future research to reveal the relationship between the two aspects in real-world scenarios

    Visual and Camera Sensors

    Get PDF
    This book includes 13 papers published in Special Issue ("Visual and Camera Sensors") of the journal Sensors. The goal of this Special Issue was to invite high-quality, state-of-the-art research papers dealing with challenging issues in visual and camera sensors

    Performance and Power Optimization of Multi-kernel Applications on Multi-FPGA Platforms

    Get PDF
    L'abstract è presente nell'allegato / the abstract is in the attachmen

    Point Cloud Processing for Environmental Analysis in Autonomous Driving using Deep Learning

    Get PDF
    Autonomous self-driving cars need a very precise perception system of their environment, working for every conceivable scenario. Therefore, different kinds of sensor types, such as lidar scanners, are in use. This thesis contributes highly efficient algorithms for 3D object recognition to the scientific community. It provides a Deep Neural Network with specific layers and a novel loss to safely localize and estimate the orientation of objects from point clouds originating from lidar sensors. First, a single-shot 3D object detector is developed that outputs dense predictions in only one forward pass. Next, this detector is refined by fusing complementary semantic features from cameras and joint probabilistic tracking to stabilize predictions and filter outliers. The last part presents an evaluation of data from automotive-grade lidar scanners. A Generative Adversarial Network is also being developed as an alternative for target-specific artificial data generation.One of the main objectives of leading automotive companies is autonomous self-driving cars. They need a very precise perception system of their environment, working for every conceivable scenario. Therefore, different kinds of sensor types are in use. Besides cameras, lidar scanners became very important. The development in that field is significant for future applications and system integration because lidar offers a more accurate depth representation, independent from environmental illumination. Especially algorithms and machine learning approaches, including Deep Learning and Artificial Intelligence based on raw laser scanner data, are very important due to the long range and three-dimensional resolution of the measured point clouds. Consequently, a broad field of research with many challenges and unsolved tasks has been established. This thesis aims to address this deficit and contribute highly efficient algorithms for 3D object recognition to the scientific community. It provides a Deep Neural Network with specific layers and a novel loss to safely localize and estimate the orientation of objects from point clouds. First, a single shot 3D object detector is developed that outputs dense predictions in only one forward pass. Next, this detector is refined by fusing complementary semantic features from cameras and a joint probabilistic tracking to stabilize predictions and filter outliers. In the last part, a concept for deployment into an existing test vehicle focuses on the semi-automated generation of a suitable dataset. Subsequently, an evaluation of data from automotive-grade lidar scanners is presented. A Generative Adversarial Network is also being developed as an alternative for target-specific artificial data generation. Experiments on the acquired application-specific and benchmark datasets show that the presented methods compete with a variety of state-of-the-art algorithms while being trimmed down to efficiency for use in self-driving cars. Furthermore, they include an extensive set of standard evaluation metrics and results to form a solid baseline for future research.Eines der Hauptziele führender Automobilhersteller sind autonome Fahrzeuge. Sie benötigen ein sehr präzises System für die Wahrnehmung der Umgebung, dass für jedes denkbare Szenario überall auf der Welt funktioniert. Daher sind verschiedene Arten von Sensoren im Einsatz, sodass neben Kameras u. a. auch Lidar Sensoren ein wichtiger Bestandteil sind. Die Entwicklung auf diesem Gebiet ist für künftige Anwendungen von höchster Bedeutung, da Lidare eine genauere, von der Umgebungsbeleuchtung unabhängige, Tiefendarstellung bieten. Insbesondere Algorithmen und maschinelle Lernansätze wie Deep Learning, die Rohdaten über Lernzprozesse direkt verarbeiten können, sind aufgrund der großen Reichweite und der dreidimensionalen Auflösung der gemessenen Punktwolken sehr wichtig. Somit hat sich ein weites Forschungsfeld mit vielen Herausforderungen und ungelösten Problemen etabliert. Diese Arbeit zielt darauf ab, dieses Defizit zu verringern und effiziente Algorithmen zur 3D-Objekterkennung zu entwickeln. Sie stellt ein tiefes Neuronales Netzwerk mit spezifischen Schichten und einer neuartigen Fehlerfunktion zur sicheren Lokalisierung und Schätzung der Orientierung von Objekten aus Punktwolken bereit. Zunächst wird ein 3D-Detektor entwickelt, der in nur einem Vorwärtsdurchlauf aus einer Punktwolke alle Objekte detektiert. Anschließend wird dieser Detektor durch die Fusion von komplementären semantischen Merkmalen aus Kamerabildern und einem gemeinsamen probabilistischen Tracking verfeinert, um die Detektionen zu stabilisieren und Ausreißer zu filtern. Im letzten Teil wird ein Konzept für den Einsatz in einem bestehenden Testfahrzeug vorgestellt, das sich auf die halbautomatische Generierung eines geeigneten Datensatzes konzentriert. Hierbei wird eine Auswertung auf Daten von Automotive-Lidaren vorgestellt. Als Alternative zur zielgerichteten künstlichen Datengenerierung wird ein weiteres generatives Neuronales Netzwerk untersucht. Experimente mit den erzeugten anwendungsspezifischen- und Benchmark-Datensätzen zeigen, dass sich die vorgestellten Methoden mit dem Stand der Technik messen können und gleichzeitig auf Effizienz für den Einsatz in selbstfahrenden Autos optimiert sind. Darüber hinaus enthalten sie einen umfangreichen Satz an Evaluierungsmetriken und -ergebnissen, die eine solide Grundlage für die zukünftige Forschung bilden

    Multimodal perception for autonomous driving

    Get PDF
    Mención Internacional en el título de doctorAutonomous driving is set to play an important role among intelligent transportation systems in the coming decades. The advantages of its large-scale implementation –reduced accidents, shorter commuting times, or higher fuel efficiency– have made its development a priority for academia and industry. However, there is still a long way to go to achieve full self-driving vehicles, capable of dealing with any scenario without human intervention. To this end, advances in control, navigation and, especially, environment perception technologies are yet required. In particular, the detection of other road users that may interfere with the vehicle’s trajectory is a key element, since it allows to model the current traffic situation and, thus, to make decisions accordingly. The objective of this thesis is to provide solutions to some of the main challenges of on-board perception systems, such as extrinsic calibration of sensors, object detection, and deployment on real platforms. First, a calibration method for obtaining the relative transformation between pairs of sensors is introduced, eliminating the complex manual adjustment of these parameters. The algorithm makes use of an original calibration pattern and supports LiDARs, and monocular and stereo cameras. Second, different deep learning models for 3D object detection using LiDAR data in its bird’s eye view projection are presented. Through a novel encoding, the use of architectures tailored to image detection is proposed to process the 3D information of point clouds in real time. Furthermore, the effectiveness of using this projection together with image features is analyzed. Finally, a method to mitigate the accuracy drop of LiDARbased detection networks when deployed in ad-hoc configurations is introduced. For this purpose, the simulation of virtual signals mimicking the specifications of the desired real device is used to generate new annotated datasets that can be used to train the models. The performance of the proposed methods is evaluated against other existing alternatives using reference benchmarks in the field of computer vision (KITTI and nuScenes) and through experiments in open traffic with an automated vehicle. The results obtained demonstrate the relevance of the presented work and its suitability for commercial use.La conducción autónoma está llamada a jugar un papel importante en los sistemas inteligentes de transporte de las próximas décadas. Las ventajas de su implementación a larga escala –disminución de accidentes, reducción del tiempo de trayecto, u optimización del consumo– han convertido su desarrollo en una prioridad para la academia y la industria. Sin embargo, todavía hay un largo camino por delante hasta alcanzar una automatización total, capaz de enfrentarse a cualquier escenario sin intervención humana. Para ello, aún se requieren avances en las tecnologías de control, navegación y, especialmente, percepción del entorno. Concretamente, la detección de otros usuarios de la carretera que puedan interferir en la trayectoria del vehículo es una pieza fundamental para conseguirlo, puesto que permite modelar el estado actual del tráfico y tomar decisiones en consecuencia. El objetivo de esta tesis es aportar soluciones a algunos de los principales retos de los sistemas de percepción embarcados, como la calibración extrínseca de los sensores, la detección de objetos, y su despliegue en plataformas reales. En primer lugar, se introduce un método para la obtención de la transformación relativa entre pares de sensores, eliminando el complejo ajuste manual de estos parámetros. El algoritmo hace uso de un patrón de calibración propio y da soporte a cámaras monoculares, estéreo, y LiDAR. En segundo lugar, se presentan diferentes modelos de aprendizaje profundo para la detección de objectos en 3D utilizando datos de escáneres LiDAR en su proyección en vista de pájaro. A través de una nueva codificación, se propone la utilización de arquitecturas de detección en imagen para procesar en tiempo real la información tridimensional de las nubes de puntos. Además, se analiza la efectividad del uso de esta proyección junto con características procedentes de imágenes. Por último, se introduce un método para mitigar la pérdida de precisión de las redes de detección basadas en LiDAR cuando son desplegadas en configuraciones ad-hoc. Para ello, se plantea la simulación de señales virtuales con las características del modelo real que se quiere utilizar, generando así nuevos conjuntos anotados para entrenar los modelos. El rendimiento de los métodos propuestos es evaluado frente a otras alternativas existentes haciendo uso de bases de datos de referencia en el campo de la visión por computador (KITTI y nuScenes), y mediante experimentos en tráfico abierto empleando un vehículo automatizado. Los resultados obtenidos demuestran la relevancia de los trabajos presentados y su viabilidad para un uso comercial.Programa de Doctorado en Ingeniería Eléctrica, Electrónica y Automática por la Universidad Carlos III de MadridPresidente: Jesús García Herrero.- Secretario: Ignacio Parra Alonso.- Vocal: Gustavo Adolfo Peláez Coronad

    Fundamentals

    Get PDF
    Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters

    Evaluering av maskinlæringsmetoder for automatisk tumorsegmentering

    Get PDF
    The definition of target volumes and organs at risk (OARs) is a critical part of radiotherapy planning. In routine practice, this is typically done manually by clinical experts who contour the structures in medical images prior to dosimetric planning. This is a time-consuming and labor-intensive task. Moreover, manual contouring is inherently a subjective task and substantial contour variability can occur, potentially impacting on radiotherapy treatment and image-derived biomarkers. Automatic segmentation (auto-segmentation) of target volumes and OARs has the potential to save time and resources while reducing contouring variability. Recently, auto-segmentation of OARs using machine learning methods has been integrated into the clinical workflow by several institutions and such tools have been made commercially available by major vendors. The use of machine learning methods for auto-segmentation of target volumes including the gross tumor volume (GTV) is less mature at present but is the focus of extensive ongoing research. The primary aim of this thesis was to investigate the use of machine learning methods for auto-segmentation of the GTV in medical images. Manual GTV contours constituted the ground truth in the analyses. Volumetric overlap and distance-based metrics were used to quantify auto-segmentation performance. Four different image datasets were evaluated. The first dataset, analyzed in papers I–II, consisted of positron emission tomography (PET) and contrast-enhanced computed tomography (ceCT) images of 197 patients with head and neck cancer (HNC). The ceCT images of this dataset were also included in paper IV. Two datasets were analyzed separately in paper III, namely (i) PET, ceCT, and low-dose CT (ldCT) images of 86 patients with anal cancer (AC), and (ii) PET, ceCT, ldCT, and T2 and diffusion-weighted (T2W and DW, respectively) MR images of a subset (n = 36) of the aforementioned AC patients. The last dataset consisted of ceCT images of 36 canine patients with HNC and was analyzed in paper IV. In paper I, three approaches to auto-segmentation of the GTV in patients with HNC were evaluated and compared, namely conventional PET thresholding, classical machine learning algorithms, and deep learning using a 2-dimensional (2D) U-Net convolutional neural network (CNN). For the latter two approaches the effect of imaging modality on auto-segmentation performance was also assessed. Deep learning based on multimodality PET/ceCT image input resulted in superior agreement with the manual ground truth contours, as quantified by geometric overlap and distance-based performance evaluation metrics calculated on a per patient basis. Moreover, only deep learning provided adequate performance for segmentation based solely on ceCT images. For segmentation based on PET-only, all three approaches provided adequate segmentation performance, though deep learning ranked first, followed by classical machine learning, and PET thresholding. In paper II, deep learning-based auto-segmentation of the GTV in patients with HNC using a 2D U-Net architecture was evaluated more thoroughly by introducing new structure-based performance evaluation metrics and including qualitative expert evaluation of the resulting auto-segmentation quality. As in paper I, multimodal PET/ceCT image input provided superior segmentation performance, compared to the single modality CNN models. The structure-based metrics showed quantitatively that the PET signal was vital for the sensitivity of the CNN models, as the superior PET/ceCT-based model identified 86 % of all malignant GTV structures whereas the ceCT-based model only identified 53 % of these structures. Furthermore, the majority of the qualitatively evaluated auto-segmentations (~ 90 %) generated by the best PET/ceCT-based CNN were given a quality score corresponding to substantial clinical value. Based on papers I and II, deep learning with multimodality PET/ceCT image input would be the recommended approach for auto-segmentation of the GTV in human patients with HNC. In paper III, deep learning-based auto-segmentation of the GTV in patients with AC was evaluated for the first time, using a 2D U-Net architecture. Furthermore, an extensive comparison of the impact of different single modality and multimodality combinations of PET, ceCT, ldCT, T2W, and/or DW image input on quantitative auto-segmentation performance was conducted. For both the 86-patient and 36-patient datasets, the models based on PET/ceCT provided the highest mean overlap with the manual ground truth contours. For this task, however, comparable auto-segmentation quality was obtained for solely ceCT-based CNN models. The CNN model based solely on T2W images also obtained acceptable auto-segmentation performance and was ranked as the second-best single modality model for the 36-patient dataset. These results indicate that deep learning could prove a versatile future tool for auto-segmentation of the GTV in patients with AC. Paper IV investigated for the first time the applicability of deep learning-based auto-segmentation of the GTV in canine patients with HNC, using a 3-dimensional (3D) U-Net architecture and ceCT image input. A transfer learning approach where CNN models were pre-trained on the human HNC data and subsequently fine-tuned on canine data was compared to training models from scratch on canine data. These two approaches resulted in similar auto-segmentation performances, which on average was comparable to the overlap metrics obtained for ceCT-based auto-segmentation in human HNC patients. Auto-segmentation in canine HNC patients appeared particularly promising for nasal cavity tumors, as the average overlap with manual contours was 25 % higher for this subgroup, compared to the average for all included tumor sites. In conclusion, deep learning with CNNs provided high-quality GTV autosegmentations for all datasets included in this thesis. In all cases, the best-performing deep learning models resulted in an average overlap with manual contours which was comparable to the reported interobserver agreements between human experts performing manual GTV contouring for the given cancer type and imaging modality. Based on these findings, further investigation of deep learning-based auto-segmentation of the GTV in the given diagnoses would be highly warranted.Definisjon av målvolum og risikoorganer er en kritisk del av planleggingen av strålebehandling. I praksis gjøres dette vanligvis manuelt av kliniske eksperter som tegner inn strukturenes konturer i medisinske bilder før dosimetrisk planlegging. Dette er en tids- og arbeidskrevende oppgave. Manuell inntegning er også subjektiv, og betydelig variasjon i inntegnede konturer kan forekomme. Slik variasjon kan potensielt påvirke strålebehandlingen og bildebaserte biomarkører. Automatisk segmentering (auto-segmentering) av målvolum og risikoorganer kan potensielt spare tid og ressurser samtidig som konturvariasjonen reduseres. Autosegmentering av risikoorganer ved hjelp av maskinlæringsmetoder har nylig blitt implementert som del av den kliniske arbeidsflyten ved flere helseinstitusjoner, og slike verktøy er kommersielt tilgjengelige hos store leverandører av medisinsk teknologi. Auto-segmentering av målvolum inkludert tumorvolumet gross tumor volume (GTV) ved hjelp av maskinlæringsmetoder er per i dag mindre teknologisk modent, men dette området er fokus for omfattende pågående forskning. Hovedmålet med denne avhandlingen var å undersøke bruken av maskinlæringsmetoder for auto-segmentering av GTV i medisinske bilder. Manuelle GTVinntegninger utgjorde grunnsannheten (the ground truth) i analysene. Mål på volumetrisk overlapp og avstand mellom sanne og predikerte konturer ble brukt til å kvantifisere kvaliteten til de automatisk genererte GTV-konturene. Fire forskjellige bildedatasett ble evaluert. Det første datasettet, analysert i artikkel I–II, bestod av positronemisjonstomografi (PET) og kontrastforsterkede computertomografi (ceCT) bilder av 197 pasienter med hode/halskreft. ceCT-bildene i dette datasettet ble også inkludert i artikkel IV. To datasett ble analysert separat i artikkel III, nemlig (i) PET, ceCT og lavdose CT (ldCT) bilder av 86 pasienter med analkreft, og (ii) PET, ceCT, ldCT og T2- og diffusjonsvektet (henholdsvis T2W og DW) MR-bilder av en undergruppe (n = 36) av de ovennevnte analkreftpasientene. Det siste datasettet, som bestod av ceCT-bilder av 36 hunder med hode/halskreft, ble analysert i artikkel IV

    Fundamentals

    Get PDF
    Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters
    corecore