1,797 research outputs found

    Visual Semantic Embedding Model based on DeViSE for medical imaging

    Master's dissertation in Informatics Engineering. Over the last decades, artificial intelligence algorithms have evolved to the point where they can identify and navigate roads, identify fraudulent transactions, personalize crops to individual conditions, discover new consumer trends, predict personalized health outcomes, optimize merchandising strategies, predict maintenance, optimize pricing and scheduling in real time, and diagnose diseases, among many other tasks. However, they need all the data to be correctly labelled. An algorithm cannot, for example, diagnose a stroke if it does not know what a stroke is, so if it has never been trained to identify strokes, either a new algorithm has to be created or the current one has to be retrained. Similar issues arise in the other examples. This work focuses on this problem and tries to solve it by using a related high-dimensional vector space, called the semantic space, where knowledge from known classes can be transferred to unknown classes.
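    The idea of transferring knowledge through a semantic space can be sketched as a nearest-neighbour lookup: an image is projected into the same vector space as the class-label embeddings, and the closest label is returned even if that class had no training images. The snippet below is only an illustration under stated assumptions; the function names, the three-dimensional vectors, and the label set are hypothetical and are not taken from the dissertation.

```python
import numpy as np

def cosine_similarity(vec, matrix):
    """Cosine similarity between one vector and each row of a matrix."""
    vec = vec / np.linalg.norm(vec)
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix @ vec

def predict_label(image_embedding, label_embeddings, label_names):
    """Return the label whose semantic vector lies closest to the image embedding."""
    scores = cosine_similarity(image_embedding, label_embeddings)
    return label_names[int(np.argmax(scores))]

# Toy semantic space (hypothetical vectors). In a real system these would
# come from a word-embedding model, so a class such as "stroke" can have a
# vector even when no labelled stroke images were available for training.
label_names = ["stroke", "tumour", "fracture"]
label_embeddings = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.1, 0.9],
])

# Assumed output of a visual model that maps an image into the semantic space.
image_embedding = np.array([0.8, 0.3, 0.1])

predicted = predict_label(image_embedding, label_embeddings, label_names)
print(predicted)
```

    Cosine similarity is used here as one common choice of distance in embedding spaces; the dissertation may use a different similarity measure.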

    Map-Guided Curriculum Domain Adaptation and Uncertainty-Aware Evaluation for Semantic Nighttime Image Segmentation

    We address the problem of semantic nighttime image segmentation and improve the state-of-the-art, by adapting daytime models to nighttime without using nighttime annotations. Moreover, we design a new evaluation framework to address the substantial uncertainty of semantics in nighttime images. Our central contributions are: 1) a curriculum framework to gradually adapt semantic segmentation models from day to night through progressively darker times of day, exploiting cross-time-of-day correspondences between daytime images from a reference map and dark images to guide the label inference in the dark domains; 2) a novel uncertainty-aware annotation and evaluation framework and metric for semantic segmentation, including image regions beyond human recognition capability in the evaluation in a principled fashion; 3) the Dark Zurich dataset, comprising 2416 unlabeled nighttime and 2920 unlabeled twilight images with correspondences to their daytime counterparts plus a set of 201 nighttime images with fine pixel-level annotations created with our protocol, which serves as a first benchmark for our novel evaluation. Experiments show that our map-guided curriculum adaptation significantly outperforms state-of-the-art methods on nighttime sets both for standard metrics and our uncertainty-aware metric. Furthermore, our uncertainty-aware evaluation reveals that selective invalidation of predictions can improve results on data with ambiguous content such as our benchmark and benefit safety-oriented applications involving invalid inputs. Comment: IEEE T-PAMI 202

    Deep Domain Adaptation for Detecting Bomb Craters in Aerial Images

    The aftermath of air raids can still be seen for decades after the devastating events. Unexploded ordnance (UXO) is an immense danger to human life and the environment. Through the assessment of wartime images, experts can infer the occurrence of a dud. The current manual analysis process is expensive and time-consuming, so automated detection of bomb craters using deep learning is a promising way to improve the UXO disposal process. However, these methods require a large amount of manually labeled training data. This work leverages domain adaptation with moon surface images to address the problem of automated bomb crater detection with deep learning under the constraint of limited training data. This paper contributes to both academia and practice (1) by providing a solution approach for automated bomb crater detection with limited training data and (2) by demonstrating the usability and associated challenges of using synthetic images for domain adaptation. Comment: 56th Annual Hawaii International Conference on System Sciences (HICSS-56)

    Photo Enhancement On Mobile Devices Using Deep Neural Networks

    In recent years, the resurgence of Artificial Neural Networks has led to the greatest improvements in the field of Artificial Intelligence, owing to the wide diversity of applications that deep learning models have across research fields and to the growing capacity of information processing systems. This thesis studies which deep neural network models are most suitable for photo enhancement, that is, for generating images with certain desired characteristics. Model selection was done by comparing supervised models (Convolutional Neural Networks) with unsupervised models (Generative Adversarial Networks). Generative Adversarial Networks are shown to have great potential, with results that compete with the state of the art. The chosen model is a Generative Adversarial model that outperforms the rest on a combination of enhancement quality and processing time. Moreover, since the model is compatible with mobile devices, it has been integrated and evaluated on a BQ smartphone to prove its viability on mobile devices. Double Degree in Computer Engineering and Business Administration

    Visibility in underwater robotics: Benchmarking and single image dehazing

    Dealing with underwater visibility is one of the most important challenges in autonomous underwater robotics. Light transmission in water degrades images, making the scene difficult to interpret and consequently compromising the whole intervention. This thesis analyses the impact of underwater image degradation on commonly used vision algorithms through benchmarking. An online framework for underwater research that makes it possible to analyse results under different conditions is presented. Finally, motivated by the results of experimentation with the developed framework, a deep learning solution is proposed that is capable of dehazing a degraded image in real time, restoring the original colors of the image.

    Head motion tracking in 3D space for drivers

    This work presents a computer vision module capable of tracking a driver's head motion in 3D space. The module was designed to be part of an integrated system for analyzing driver behaviour, replacing costly equipment and accessories that track the driver's head but are often cumbersome for the user. The vision module operates in five stages: image acquisition, head detection, facial feature extraction, facial feature detection, and 3D reconstruction of the tracked facial features. First, in the image acquisition stage, two synchronized monochrome cameras are used to set up a stereoscopic system that later simplifies the 3D reconstruction of the head. Second, the driver's head is detected to reduce the size of the search space for finding facial features. Third, after a pair of images has been obtained from the two cameras, the facial feature extraction stage combines image processing algorithms and epipolar geometry to track the chosen features, which in our case are the two eyes and the tip of the nose. Fourth, in the detection stage, the 2D tracking results are consolidated by combining a neural network algorithm with the geometry of the human face to filter out erroneous results. Finally, in the last stage, the 3D model of the head is reconstructed from the 2D tracking results (i.e. tracking performed in each image independently) and the calibration of the stereo pair. In addition, 3D measurements are obtained along the six axes of motion known as the degrees of freedom of the head (longitudinal, vertical, lateral, roll, pitch and yaw). The results are validated by running our algorithms on pre-recorded video sequences of drivers using a driving simulator, and the resulting 3D measurements are then compared with those provided by a motion tracking device installed on the driver's head
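    The final reconstruction stage, recovering a 3D facial feature from its 2D positions in a calibrated stereo pair, can be illustrated with standard linear (DLT) triangulation. This is a minimal sketch under stated assumptions and not the thesis implementation: the intrinsic matrix `K`, the 12 cm baseline, and the nose-tip coordinates are made-up values, and the helper names are hypothetical.

```python
import numpy as np

def triangulate(P_left, P_right, x_left, x_right):
    """Linear (DLT) triangulation of one 3D point from two calibrated views."""
    # Each image observation contributes two linear constraints on the
    # homogeneous 3D point X, giving a 4x4 system A @ X = 0.
    A = np.stack([
        x_left[0] * P_left[2] - P_left[0],
        x_left[1] * P_left[2] - P_left[1],
        x_right[0] * P_right[2] - P_right[0],
        x_right[1] * P_right[2] - P_right[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point into pixel coordinates with a 3x4 camera matrix."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical calibration: identical intrinsics, right camera shifted 12 cm
# along the x axis (all values are assumptions for illustration only).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P_left = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P_right = K @ np.hstack([np.eye(3), np.array([[-0.12], [0.0], [0.0]])])

# Assumed true nose-tip position in metres, in the left camera frame.
nose_tip = np.array([0.05, -0.02, 0.70])

# Simulate the 2D tracking result in each image, then reconstruct the point.
recovered = triangulate(P_left, P_right,
                        project(P_left, nose_tip),
                        project(P_right, nose_tip))
print(recovered)
```

    With three reconstructed features (two eyes and the nose tip), a rigid head pose and hence the six degrees of freedom could in principle be estimated per frame; that step is not shown here.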