
    Improving Cancer Classification With Domain Adaptation Techniques

    Background: As the quantity and complexity of oncological data continue to increase, machine learning (ML) has become an important tool in helping clinicians better understand malignancies and provide personalized care. Diagnostic image analysis, in particular, has benefited from the advent of ML methods to improve image classification and generate prognostic information from imaging collected in routine clinical practice [1-3]. Deep learning, a subset of ML, has achieved especially remarkable performance in medical imaging, including segmentation [4, 5], object detection, classification [6], and diagnosis [7]. Despite the notable success of deep learning computer vision models on oncologic imaging data, recent studies have identified significant weaknesses in deep learning models used on diagnostic images. Specifically, deep learning models have difficulty generalizing to data that was not well represented during training. One potential solution is the use of domain adaptation (DA) techniques, which adapt a deep learning model trained on one domain so that it generalizes better to data from a target domain. Techniques: In this study, we evaluate the efficacy of four common DA techniques (MMD, CORAL, iDANN, and AdaBN) applied to deep learning models trained on common diagnostic imaging modalities in oncology. We used two datasets, of mammographic imaging and CT scans, to test the prediction accuracy of models using each of these DA techniques, and compared them to the performance of transfer learning. Results: In the mammographic imaging data, MMD, CORAL, and iDANN increased the target test accuracy for all four domains: MMD increased target accuracies by 3.6-45%, CORAL by 4-48%, and iDANN by 1.5-49.4%. For the CT scan dataset, MMD, CORAL, and iDANN increased the target test accuracy for all domains: MMD increased the target accuracy by 2.0-13.9%, CORAL by 2.4-15.8%, and iDANN by 2.0-11.1%. In both the mammographic imaging data and the CT scans, AdaBN performed worse than or comparably to transfer learning. Conclusion: We found that DA techniques significantly improve model performance and generalizability. These findings suggest that there is potential to further study how multiple DA techniques can work together, and that they can help us create more robust, generalizable models.
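    Of the four techniques compared, CORAL is the simplest to illustrate: it aligns the second-order statistics (feature covariances) of source and target feature batches. The following is a minimal PyTorch sketch of a Deep CORAL-style loss, offered as an illustration of the general technique rather than this study's implementation; the variable names and the weighting in the usage comment are assumptions.

```python
import torch

def coral_loss(source: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Deep CORAL-style loss: squared Frobenius distance between the
    feature covariance matrices of a source and a target batch.
    source, target: (batch, features) activations from a shared encoder."""
    d = source.size(1)
    # Center the features, then form the two covariance matrices.
    cs = source - source.mean(dim=0, keepdim=True)
    ct = target - target.mean(dim=0, keepdim=True)
    cov_s = (cs.t() @ cs) / (source.size(0) - 1)
    cov_t = (ct.t() @ ct) / (target.size(0) - 1)
    # Normalize by 4d^2, as in the Deep CORAL formulation.
    return ((cov_s - cov_t) ** 2).sum() / (4 * d * d)

# Hypothetical usage: add to the supervised loss on labeled source data,
#   loss = ce_loss(logits_src, y_src) + lam * coral_loss(feat_src, feat_tgt)
```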

    Semantic Segmentation from Sparse Labeling Using Multi-Level Superpixels

    Semantic segmentation is a challenging problem that can benefit numerous robotics applications, since it provides information about the content at every image pixel. Solutions to this problem have recently witnessed a boost in performance and results thanks to deep learning approaches. Unfortunately, common deep learning models for semantic segmentation present several challenges which hinder real-life applicability in many domains. A significant challenge is the need for pixel-level labeling on large amounts of training images to be able to train those models, which implies a very high cost. This work proposes and validates a simple but effective approach to train dense semantic segmentation models from sparsely labeled data. Labeling only a few pixels per image reduces the human interaction required. We find many available datasets, e.g., environment monitoring data, that provide this kind of sparse labeling. Our approach is based on augmenting the sparse annotation to a dense one with the proposed adaptive superpixel segmentation propagation. We show that this label augmentation enables effective learning of state-of-the-art segmentation models, getting similar results to those models trained with dense ground truth.
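    The core idea, growing a handful of clicked pixels into a dense training mask via superpixels, can be sketched in a few lines. The following is a simplified illustration using SLIC superpixels computed at several scales; it is not the paper's adaptive propagation algorithm, and the function name, level sizes, and majority-vote rule are assumptions for illustration.

```python
import numpy as np
from skimage.segmentation import slic

def propagate_sparse_labels(image, sparse_labels, n_segments_levels=(100, 400, 1600)):
    """Densify sparse point annotations by majority vote inside superpixels,
    refining from coarse to fine segmentation levels.
    image:         (H, W, 3) RGB image
    sparse_labels: (H, W) int array, -1 for unlabeled pixels
    Returns an (H, W) dense label map (-1 where no level had evidence)."""
    dense = np.full(sparse_labels.shape, -1, dtype=np.int64)
    for n_segments in n_segments_levels:
        segments = slic(image, n_segments=n_segments, start_label=0)
        for sp in np.unique(segments):
            mask = segments == sp
            votes = sparse_labels[mask]
            votes = votes[votes >= 0]
            if votes.size:  # finer levels overwrite coarser guesses
                dense[mask] = np.bincount(votes).argmax()
    return dense
```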

    Semantic segmentation with deep learning models and non-dense labels

    Semantic segmentation is a widely studied problem in computer vision that consists of classifying images at the pixel level, i.e., assigning a label or value to every pixel of the image. It has a wide range of applications, from interpreting the content of urban scenes for autonomous driving tasks to medical applications that help a physician analyze patient information for diagnosis or surgery. As in many other computer vision problems and tasks, great advances in semantic segmentation methods have been proposed and demonstrated in recent years, largely thanks to the rise of methods based on deep learning.

    Despite the constant improvements of recent years, deep learning models for semantic segmentation face a challenge that hinders their applicability to real-life problems: they need large amounts of annotations to train the models. This is very costly, especially because the annotation must be done at the pixel level. Many real datasets, for example data acquired for environmental monitoring tasks (recordings of natural environments, satellite imagery), typically contain only a few labeled pixels per image, which usually come from a few clicks by an expert to indicate certain areas of interest in those images. This kind of labeling makes it very difficult to train dense models that could automatically process and extract a greater amount of information from all these datasets.

    The goal of this work is to propose new methods to solve this problem. The main idea is to use an initial multi-level segmentation of the image to propagate the little information available. This novel approach makes it possible to augment the annotation, and we show that, despite being somewhat noisy, it enables effective learning of a model that produces the desired segmentation. The method is applicable to any sparsity pattern of the annotations, being independent of the number of annotated pixels. The main tasks carried out in this project are:
    - Study of the state of the art in semantic segmentation techniques (most of them based on deep learning)
    - Proposal and evaluation of methods to augment (propagate) the labels of the training images when these are sparse and scarce
    - Design and evaluation of the neural network architectures best suited to solve this problem

    To validate our proposals, we focus on an application case with underwater images, captured for monitoring coral reef areas. We also show that the proposed method can be applied to other types of images, such as aerial images, multispectral images, and instance segmentation datasets.
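    Once the sparse annotation has been densified, training can proceed with a standard per-pixel loss that simply ignores the pixels left unlabeled. A minimal PyTorch sketch of this step, with a toy stand-in model and hypothetical tensors, assuming -1 marks unknown pixels:

```python
import torch
import torch.nn as nn

# Toy stand-ins: any per-pixel classifier and a densified label map where
# -1 marks pixels that no superpixel level could label.
model = nn.Conv2d(3, 5, kernel_size=1)        # hypothetical 5-class "segmenter"
images = torch.randn(2, 3, 64, 64)
dense_labels = torch.randint(-1, 5, (2, 64, 64))

# ignore_index keeps unlabeled pixels out of the loss, so the noisy
# augmented annotation can be used directly as the training target.
criterion = nn.CrossEntropyLoss(ignore_index=-1)
loss = criterion(model(images), dense_labels)
loss.backward()
```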

    Combine and conquer: representation learning from multimodal data

    Supervised learning, which involves training a model using a labelled dataset, is becoming less popular due to its high cost and various issues with generalization and robustness. This is unsurprising, as data such as images and language are complex and cannot be accurately represented by a single label. When models are trained using this method, they often learn features that spuriously correlate with the label, resulting in poor performance when deployed in the real world. This thesis explores representation learning using multiple sources of data, such as images and language or photos and sketches. We demonstrate through both generative and discriminative models that extracting common abstract concepts between multiple modalities or domains can lead to more accurate and generalisable representations. In addition, we investigate ways to improve the data efficiency of these models, including using fewer multimodal pairs through contrastive-style objectives and generating multimodal pairs through masked image modeling. Finally, we systematically evaluate the robustness of different learning objectives on distribution-shift tasks in order to understand their usefulness in the real world.
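    As one concrete example of the contrastive-style objectives mentioned above, a symmetric InfoNCE loss pulls matched cross-modal pairs together while pushing apart mismatched pairs within a batch. This is a generic CLIP-style sketch under assumed paired image/text encoders, not the thesis' exact objective:

```python
import torch
import torch.nn.functional as F

def infonce_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor, tau: float = 0.07):
    """Symmetric InfoNCE over a batch of paired embeddings from two
    modalities (e.g., image and text encoders).
    img_emb, txt_emb: (batch, dim), matched pairs row-wise."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / tau                        # (batch, batch) similarities
    targets = torch.arange(img.size(0), device=img.device)  # i-th image <-> i-th text
    # Pull matched pairs together, push apart every other pairing in the batch.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```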

    Make the U in UDA Matter: Invariant Consistency Learning for Unsupervised Domain Adaptation

    Domain Adaptation (DA) is always challenged by the spurious correlation between domain-invariant features (e.g., class identity) and domain-specific features (e.g., environment) that does not generalize to the target domain. Unfortunately, even enriched with additional unsupervised target domains, existing Unsupervised DA (UDA) methods still suffer from it. This is because the source domain supervision only considers the target domain samples as auxiliary data (e.g., by pseudo-labeling), yet the inherent distribution in the target domain, where the valuable de-correlation clues hide, is disregarded. We propose to make the U in UDA matter by giving equal status to the two domains. Specifically, we learn an invariant classifier whose prediction is simultaneously consistent with the labels in the source domain and the clusters in the target domain; hence, the spurious correlation that is inconsistent in the target domain is removed. We dub our approach Invariant CONsistency learning (ICON). Extensive experiments show that ICON achieves state-of-the-art performance on the classic UDA benchmarks Office-Home and VisDA-2017, and outperforms all conventional methods on the challenging WILDS 2.0 benchmark. Code is available at https://github.com/yue-zhongqi/ICON. (Accepted at NeurIPS 2023.)
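    The abstract's central idea, giving the target domain equal status, can be caricatured as one classifier trained to agree with source labels and with target cluster assignments at the same time. The sketch below is a deliberate simplification based only on the abstract, not the released ICON implementation; the cluster-to-class mapping is assumed to be given:

```python
import torch.nn.functional as F

def icon_style_loss(logits_src, labels_src, logits_tgt, cluster_tgt, lam=1.0):
    """Consistency idea from the abstract: one classifier whose predictions
    agree with ground-truth labels on the source domain AND with unsupervised
    cluster assignments on the target domain.
    logits_src: (Ns, C), labels_src: (Ns,) source ground truth
    logits_tgt: (Nt, C), cluster_tgt: (Nt,) e.g. k-means ids mapped to classes"""
    loss_src = F.cross_entropy(logits_src, labels_src)
    # Treat target cluster assignments as supervision of equal status.
    loss_tgt = F.cross_entropy(logits_tgt, cluster_tgt)
    return loss_src + lam * loss_tgt
```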

    Detection of central serous retinopathy using deep learning through retinal images

    The human eye is responsible for the visual recognition of objects in the environment. The eye is divided into different layers and front/back areas; however, the most important part is the retina, which is responsible for capturing light and generating electrical impulses for further processing in the brain. Several manual and automated methods have been proposed to detect retinal diseases, though these techniques are time-consuming, inefficient, and unpleasant for patients. This research proposes deep learning-based central serous retinopathy (CSR) detection employing two imaging techniques: optical coherence tomography (OCT) and fundus photography. The input images are manually augmented before classification, and DarkNet and DenseNet networks are then trained on both datasets, with the pre-trained DarkNet and DenseNet classifiers modified as needed. Finally, the performance of the two networks on their respective datasets is compared using standard evaluation metrics. After several experiments, the best results, an accuracy of 99.78%, sensitivity of 99.6%, specificity of 100%, and F1 score of 99.52%, were achieved on OCT images using the DenseNet network. The experimental results demonstrate that the proposed model is effective and efficient for CSR detection using the OCT dataset and suitable for deployment in clinical applications. This work was supported by the Riphah Artificial Intelligence Research (RAIR) Lab, Riphah International University, Faisalabad Campus, Pakistan. Open Access funding was provided by the Qatar National Library. Qatar National Library and Qatar University Internal Grant IRCC-2021-010 funded this work.
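    The "modified pre-trained classifier" step is standard transfer learning: load an ImageNet-pretrained backbone and replace its classification head for the two-class CSR task. A minimal torchvision sketch, assuming the DenseNet-121 variant (the abstract does not state which DenseNet was used):

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained DenseNet and swap its classifier head for
# the two-class (CSR vs. healthy) problem.
model = models.densenet121(weights=models.DenseNet121_Weights.DEFAULT)
model.classifier = nn.Linear(model.classifier.in_features, 2)

# Optionally freeze the feature extractor and train only the new head.
for p in model.features.parameters():
    p.requires_grad = False

logits = model(torch.randn(1, 3, 224, 224))  # (1, 2) class scores
```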

    Data-Efficient Machine Learning with Focus on Transfer Learning

    Machine learning (ML) has attracted a significant amount of attention from the artificial intelligence community. ML has shown state-of-the-art performance in various fields, such as signal processing, healthcare systems, and natural language processing (NLP). However, most conventional ML algorithms suffer from three significant difficulties: 1) insufficient high-quality training data, 2) a costly training process, and 3) domain discrepancy. It is therefore important to develop solutions for these problems so that the future of ML will be more sustainable. Recently, a new concept, data-efficient machine learning (DEML), has been proposed to deal with the current bottlenecks of ML. Transfer learning (TL), one of the most active areas within DEML, has been considered an effective solution to the three shortcomings of conventional ML, and significant progress has been made in TL over the past ten years. In this dissertation, I propose to address the three problems by developing a software-oriented framework and TL algorithms. First, I present the first well-defined DEML framework together with an evaluation system, and show how it can address the challenges in ML. I then give an updated overview of the state of the art and open challenges in TL, and introduce two novel algorithms for two of the most challenging TL topics: distant-domain TL and cross-modality TL (image-text). A detailed introduction of the algorithms and preliminary results on real-world applications (Covid-19 diagnosis and image classification) are presented. Finally, I discuss current trends in TL algorithms and real-world applications, and close with conclusions and future research directions.

    Semantic Segmentation for Real-World Applications

    In computer vision, scene understanding aims at extracting useful information about a scene from raw sensor data. For instance, it can classify the whole image into a particular category (e.g., kitchen or living room) or identify important elements within it (e.g., bottles and cups on a table, or surfaces). In this general context, semantic segmentation provides a semantic label for every single element of the raw data, e.g., for all image pixels or all point-cloud points. This information is essential for many applications relying on computer vision, such as AR, driving, medical or robotic applications. It provides computers with the understanding of the environment needed to make autonomous decisions, and detailed information to people interacting with intelligent systems.

    The current state of the art in semantic segmentation is led by supervised deep learning methods. However, real-world scenarios and conditions introduce several challenges and restrictions for the application of these semantic segmentation models. This thesis tackles several of these challenges, namely: 1) the limited amount of labeled data available for training deep learning models, 2) the time and computation restrictions present in real-time applications and/or in systems with limited computational power, such as a mobile phone or an IoT node, and 3) the ability to perform semantic segmentation when dealing with sensors other than the standard RGB camera.

    The main contributions presented in this thesis are the following:
    1. A novel approach to address the problem of limited annotated data, training semantic segmentation models from sparse annotations. Fully supervised deep learning models lead the state of the art, but we show how to train them using only a few sparsely labeled pixels in the training images. Our approach obtains performance similar to that of models trained with fully labeled images. We demonstrate the relevance of this technique in environmental monitoring scenarios, where sparse image labels provided by human experts are very common, as well as in more general domains.
    2. Also dealing with limited training data, a novel method for semi-supervised semantic segmentation, i.e., when only a small number of fully labeled images and a large set of unlabeled data are available. We demonstrate how contrastive learning can be applied to the semantic segmentation task and show its advantages, especially when the availability of labeled data is limited. Our approach improves state-of-the-art results, showing the potential of contrastive learning in this task. Learning from unlabeled data opens great opportunities for real-world scenarios, since it is an economical solution.
    3. Novel efficient image semantic segmentation models. We develop semantic segmentation models that are efficient in execution time, memory requirements, and computation requirements. Some of our models are able to run on CPU at high speed with high accuracy. This is very important for real set-ups and applications, since high-end GPUs are not always available, and building models that consume fewer resources, memory, and time increases the range of applications that can benefit from them.
    4. Novel methods for semantic segmentation with non-RGB sensors. We propose a method for LiDAR point-cloud segmentation that combines efficient learning operations in both 2D and 3D. It surpasses state-of-the-art segmentation performance at really fast rates. We also show how to improve the robustness of these models by tackling the overfitting and domain adaptation problems. Besides, we present the first work on semantic segmentation with event-based cameras, coping with the lack of labeled data.

    To increase the impact of these contributions and ease their application in real-world settings, we have made an open-source implementation of all proposed solutions available to the scientific community.
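    One common way to combine efficient 2D and 3D operations for LiDAR segmentation, as in the fourth contribution, is to project the point cloud onto a 2D range image on which fast 2D convolutions can run. The sketch below shows a generic spherical projection with field-of-view values typical of a 64-beam sensor; it illustrates the general technique, not the thesis' specific architecture:

```python
import numpy as np

def spherical_projection(points, h=64, w=1024, fov_up=3.0, fov_down=-25.0):
    """Project a LiDAR point cloud (N, 3) onto a 2D range image (h, w),
    so that efficient 2D operations can be applied to the 3D scan."""
    fov_up, fov_down = np.radians(fov_up), np.radians(fov_down)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-8
    yaw = np.arctan2(y, x)           # azimuth angle
    pitch = np.arcsin(z / r)         # elevation angle
    # Map angles to pixel coordinates (u: horizontal, v: vertical).
    u = (0.5 * (1.0 - yaw / np.pi) * w).astype(int) % w
    v = ((1.0 - (pitch - fov_down) / (fov_up - fov_down)) * h)
    v = np.clip(v, 0, h - 1).astype(int)
    image = np.zeros((h, w), dtype=np.float32)
    image[v, u] = r                  # store range per pixel
    return image
```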