1,201 research outputs found

    Weakly supervised human skin segmentation using guidance attention mechanisms

    Human skin segmentation is a crucial task in computer vision and biometric systems, yet it poses several challenges such as variability in skin colour, pose, and illumination. This paper presents a robust data-driven skin segmentation method for a single image that addresses these challenges through the integration of contextual information and efficient network design. In addition to robustness and accuracy, the integration into real-time systems requires a careful balance between computational power, speed, and performance. The proposed method incorporates two attention modules, Body Attention and Skin Attention, that utilize contextual information to improve segmentation results. These modules draw attention to the desired areas, focusing on the body boundaries and skin pixels, respectively. Additionally, an efficient network architecture is employed in the encoder part to minimize computational power while retaining high performance. To handle the issue of noisy labels in skin datasets, the proposed method uses a weakly supervised training strategy, relying on the Skin Attention module. The results of this study demonstrate that the proposed method is comparable to, or outperforms, state-of-the-art methods on benchmark datasets. This work is part of the visuAAL project on Privacy-Aware and Acceptable Video-Based Technologies and Services for Active and Assisted Living (https://www.visuaal-itn.eu/). This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 861091.
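The abstract does not detail the internals of the Body and Skin Attention modules, but the general attention-gating idea they rely on (scaling features by a learned attention map so that favoured regions pass through and others are suppressed) can be sketched as follows. All names here are hypothetical illustrations, not the paper's implementation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def attention_gate(features, attention_logits):
    # Elementwise gating: each feature value is scaled by a sigmoid
    # attention weight in [0, 1], emphasizing the attended regions.
    return [[f * sigmoid(a) for f, a in zip(f_row, a_row)]
            for f_row, a_row in zip(features, attention_logits)]

feats  = [[1.0, 2.0],
          [3.0, 4.0]]
logits = [[10.0, -10.0],   # strongly keep / strongly suppress
          [0.0,  10.0]]    # neutral (weight 0.5) / strongly keep
gated = attention_gate(feats, logits)
```

In a real network the attention logits would themselves be produced by learned convolutional layers; here they are fixed constants to show the gating effect in isolation.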

    Medical Image Segmentation Review: The success of U-Net

    Automatic medical image segmentation is a crucial topic in the medical domain and, consequently, a critical component of the computer-aided diagnosis paradigm. U-Net is the most widespread image segmentation architecture due to its flexibility, optimized modular design, and success across all medical image modalities. Over the years, the U-Net model has attracted tremendous attention from academic and industrial researchers. Several extensions of this network have been proposed to address the scale and complexity created by medical tasks. Addressing the deficiencies of the naive U-Net model is the foremost step for vendors seeking the proper U-Net variant for their business. Having a compendium of different variants in one place makes it easier for builders to identify the relevant research, and it helps ML researchers understand the challenges of the biological tasks that test the model. To address this, we discuss the practical aspects of the U-Net model and suggest a taxonomy to categorize each network variant. Moreover, to measure the performance of these strategies in a clinical application, we propose fair evaluations of some unique and famous designs on well-known datasets. We provide a comprehensive implementation library with trained models for future research. In addition, for ease of future studies, we created an online list of U-Net papers with their possible official implementations. All information is gathered in the https://github.com/NITR098/Awesome-U-Net repository. Comment: Submitted to the IEEE Transactions on Pattern Analysis and Machine Intelligence Journal
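The "fair evaluations on well-known datasets" the review proposes typically rest on standard overlap metrics; the Dice coefficient is the most common one in medical segmentation. A minimal sketch (the function name and example masks are illustrative, not from the paper):

```python
def dice_coefficient(pred, target):
    # Dice score between two binary masks given as flat lists of 0/1:
    # 2 * |A ∩ B| / (|A| + |B|); defined as 1.0 when both masks are empty.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * intersection / total

pred   = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
score = dice_coefficient(pred, target)  # 2*2 / (3+3) = 0.666...
```

Real evaluation pipelines compute this per volume over 2-D/3-D arrays, but the arithmetic is the same.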

    Lightweight real-time hand segmentation leveraging MediaPipe landmark detection

    Real-time hand segmentation is a key process in applications that require human–computer interaction, such as gesture recognition or augmented reality systems. However, the infinite shapes and orientations that hands can adopt, their variability in skin pigmentation, and the self-occlusions that continuously appear in images make hand segmentation a truly complex problem, especially under uncontrolled lighting conditions and backgrounds. The development of robust, real-time hand segmentation algorithms is essential to achieve immersive augmented reality and mixed reality experiences by correctly interpreting collisions and occlusions. In this paper, we present a simple but powerful algorithm based on the MediaPipe Hands solution, a highly optimized neural network. The algorithm processes the landmarks provided by MediaPipe using morphological and logical operators to obtain the masks that allow dynamic updating of the skin color model. Different experiments were carried out comparing the influence of the color space on skin segmentation, with the CIELab color space chosen as the best option. An average intersection over union of 0.869 was achieved on the demanding Ego2Hands dataset, running at 90 frames per second on a conventional computer without any hardware acceleration. Finally, the proposed segmentation procedure was implemented in an augmented reality application to add hand occlusion for improved user immersion. An open-source implementation of the algorithm is publicly available at https://github.com/itap-robotica-medica/lightweight-hand-segmentation. Funded by the Ministerio de Ciencia e Innovación (under Grant Agreement No. RTC2019-007350-1). Open-access publication funded by the Consorcio de Bibliotecas Universitarias de Castilla y León (BUCLE), under the 2014ES16RFOP009 FEDER 2014-2020 Operational Programme of Castilla y León, Action 20007-CL - Apoyo Consorcio BUCLE.
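The 0.869 figure reported above is an average intersection over union (IoU), the standard overlap metric for this kind of benchmark. A minimal sketch of the metric on flat binary masks (names and example data are illustrative, not from the paper's code):

```python
def intersection_over_union(pred, target):
    # IoU = |A ∩ B| / |A ∪ B| for two binary masks given as flat 0/1 lists;
    # defined as 1.0 when both masks are empty.
    inter = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    union = sum(1 for p, t in zip(pred, target) if p == 1 or t == 1)
    return 1.0 if union == 0 else inter / union

pred   = [1, 1, 0, 1]
target = [1, 0, 0, 1]
iou = intersection_over_union(pred, target)  # 2 / 3
```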

    An attention residual U-Net with differential preprocessing and geometric postprocessing: Learning how to segment vasculature including intracranial aneurysms

    Objective: Intracranial aneurysms (IAs) are lethal, with high morbidity and mortality rates. Reliable, rapid, and accurate segmentation of IAs and their adjacent vasculature from medical imaging data is important to improve the clinical management of patients with IAs. However, due to the blurred boundaries and complex structure of IAs and their overlap with brain tissue or other cerebral arteries, image segmentation of IAs remains challenging. This study aimed to develop an attention residual U-Net (ARU-Net) architecture with differential preprocessing and geometric postprocessing for automatic segmentation of IAs and their adjacent arteries from 3D rotational angiography (3DRA) images. Methods: The proposed ARU-Net followed the classic U-Net framework with the following key enhancements. First, we preprocessed the 3DRA images based on boundary enhancement to capture more contour information and enhance the presence of small vessels. Second, we introduced long skip connections with attention gates at each layer of the fully convolutional encoder-decoder structure to emphasize the field of view (FOV) for IAs. Third, residual-based short skip connections were also embedded in each layer to implement deep supervision and help the network converge. Fourth, we devised a multiscale supervision strategy for independent prediction at different levels of the decoding path, integrating multiscale semantic information to facilitate the segmentation of small vessels. Fifth, a 3D conditional random field (3DCRF) and 3D connected component optimization (3DCCO) were exploited as postprocessing to refine the segmentation results. Results: Comprehensive experimental assessments validated the effectiveness of our ARU-Net. The proposed ARU-Net model achieved comparable or superior performance to state-of-the-art methods in quantitative and qualitative evaluations. Notably, we found that ARU-Net improved the identification of arteries connecting to an IA, including small arteries that were hard to recognize by other methods. Consequently, IA geometries segmented by the proposed ARU-Net model yielded superior performance in subsequent computational hemodynamic studies (also known as patient-specific computational fluid dynamics [CFD] simulations). Furthermore, an ablation study confirmed the contribution of each of the five key enhancements described above. Conclusions: The proposed ARU-Net model can automatically segment IAs in 3DRA images with relatively high accuracy and potentially has significant value for clinical computational hemodynamic analysis.
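The 3D connected component optimization (3DCCO) step above typically discards spurious small blobs and keeps the main vascular structure. A minimal 2-D sketch of the core operation (keeping only the largest connected component; the paper works in 3-D, and all names here are illustrative):

```python
from collections import deque

def largest_component(mask):
    # Keep only the largest 4-connected foreground component of a 2-D
    # binary mask (list of lists of 0/1), zeroing out smaller blobs.
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                comp, queue = [], deque([(r, c)])
                seen[r][c] = True
                while queue:                      # BFS over one component
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(comp) > len(best):
                    best = comp
    out = [[0] * cols for _ in range(rows)]
    for y, x in best:
        out[y][x] = 1
    return out

mask = [[1, 1, 0, 0],
        [1, 0, 0, 1],
        [0, 0, 0, 1]]       # 3-pixel blob (left) and 2-pixel blob (right)
cleaned = largest_component(mask)
```

Extending this to 3-D means adding the two extra neighbour offsets along the depth axis; the traversal logic is unchanged.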

    ConvMixerSeg: Weakly Supervised Semantic Segmentation for CT Liver Images

    The predictive power of modern deep learning approaches is poised to revolutionize the medical imaging field; however, their usefulness and applicability are severely limited by the lack of well-annotated data. Liver segmentation in CT images is an application that could benefit particularly well from less data-hungry methods and potentially lead to better liver volume estimation and tumor detection. To this end, we propose a new semantic segmentation model called ConvMixerSeg and experimentally show that it outperforms an FCN with a ResNet-50 backbone when trained to segment livers on a subset of the Liver Tumor Segmentation Benchmark data set (LiTS). We have further developed a novel Class Activation Map (CAM) based method to train semantic segmentation models with image-level labels without adding parameters. The proposed CAM method includes a Neighborhood Correlation Enforcement module using Gaussian smoothing that reduces part domination and prediction noise. Additionally, our experiments show that the proposed CAM method outperforms the original CAM method for both classification and segmentation with high statistical significance given the same ConvMixerSeg backbone.
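The Gaussian smoothing underlying the Neighborhood Correlation Enforcement module spreads activation mass to neighbouring positions, which is what suppresses isolated, noisy peaks. A minimal 1-D sketch of Gaussian smoothing of an activation profile (the module itself operates on 2-D maps inside the network; names and parameters here are illustrative):

```python
import math

def gaussian_kernel(radius, sigma):
    # Discrete, normalized Gaussian weights over [-radius, radius].
    vals = [math.exp(-(i * i) / (2 * sigma * sigma))
            for i in range(-radius, radius + 1)]
    s = sum(vals)
    return [v / s for v in vals]

def smooth_cam(cam, radius=1, sigma=1.0):
    # Convolve a 1-D activation profile with a Gaussian kernel,
    # clamping indices at the borders (replicate padding).
    k = gaussian_kernel(radius, sigma)
    n = len(cam)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(k):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += w * cam[idx]
        out.append(acc)
    return out

cam = [0.0, 0.0, 1.0, 0.0, 0.0]   # one isolated activation spike
smoothed = smooth_cam(cam)         # spike spreads to its neighbours
```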

    Deep Learning Models For Biomedical Data Analysis

    The field of biomedical data analysis is a vibrant area of research dedicated to extracting valuable insights from a wide range of biomedical data sources, including biomedical images and genomics data. The emergence of deep learning, an artificial intelligence approach, presents significant prospects for enhancing biomedical data analysis and knowledge discovery. This dissertation focused on exploring innovative deep-learning methods for biomedical image processing and gene data analysis. During the COVID-19 pandemic, biomedical imaging data, including CT scans and chest x-rays, played a pivotal role in identifying COVID-19 cases by categorizing patient chest x-ray outcomes as COVID-19-positive or negative. While supervised deep learning methods have effectively recognized COVID-19 patterns in chest x-ray datasets, the availability of annotated training data remains limited. To address this challenge, the thesis introduced a semi-supervised deep learning model named ssResNet, built upon the Residual Neural Network (ResNet) architecture. The model combines supervised and unsupervised paths, incorporating a weighted supervised loss function to manage data imbalance. Strategies to diminish prediction uncertainty in deep learning models for critical applications like medical image processing are also explored, through an ensemble deep learning model integrating bagging and model calibration techniques. This ensemble model not only boosts biomedical image segmentation accuracy but also reduces prediction uncertainty, as validated on a comprehensive chest x-ray image segmentation dataset. Furthermore, the thesis introduced an ensemble model integrating Proformer and ensemble learning methodologies. This model constructs multiple independent Proformers for predicting gene expression; their predictions are combined through weighted averaging to generate final predictions. Experimental outcomes underscore the efficacy of this ensemble model in enhancing prediction performance across various metrics. In conclusion, this dissertation advances biomedical data analysis by harnessing the potential of deep learning techniques. It devises innovative approaches for processing biomedical images and gene data. By leveraging deep learning's capabilities, this work paves the way for further progress in biomedical data analytics and its applications within clinical contexts. Index Terms: biomedical data analysis, COVID-19, deep learning, ensemble learning, gene data analytics, medical image segmentation, prediction uncertainty, Proformer, Residual Neural Network (ResNet), semi-supervised learning
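The weighted averaging used to combine the independent Proformer predictions is straightforward to sketch. The weights and prediction vectors below are made-up illustrations of the aggregation step, not the dissertation's actual values:

```python
def weighted_average(predictions, weights):
    # Combine per-model prediction vectors: each position of the output is
    # the weight-normalized sum of the models' predictions at that position.
    total = sum(weights)
    n = len(predictions[0])
    return [sum(w * p[i] for p, w in zip(predictions, weights)) / total
            for i in range(n)]

preds = [[0.9, 0.1],    # model 1 (e.g. trusted more, weight 2)
         [0.6, 0.4],    # model 2
         [0.8, 0.2]]    # model 3
combined = weighted_average(preds, [2.0, 1.0, 1.0])
# (2*0.9 + 0.6 + 0.8) / 4 = 0.8 and (2*0.1 + 0.4 + 0.2) / 4 = 0.2
```

With all weights equal this reduces to a plain ensemble mean; unequal weights let better-calibrated models dominate.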

    An empirical study on ensemble of segmentation approaches

    Recognizing objects in images requires complex skills that involve knowledge of the context and the ability to identify the borders of the objects. In computer vision, this task is called semantic segmentation and it pertains to the classification of each pixel in an image. The task is of central importance in many real-life scenarios: in autonomous vehicles, it allows the identification of objects surrounding the vehicle; in medical diagnosis, it improves the ability to detect dangerous pathologies early and thus mitigate the risk of serious consequences. In this work, we propose a new ensemble method able to solve the semantic segmentation task. The model is based on convolutional neural networks (CNNs) and transformers. An ensemble uses many different models whose predictions are aggregated to form the output of the ensemble system. The performance and quality of the ensemble prediction are strongly connected with several factors, one of the most important being the diversity among individual models. In our approach, this is enforced by adopting different loss functions and testing different data augmentation methods. We developed the proposed method by combining DeepLabV3+, HarDNet-MSEG, and Pyramid Vision Transformers (PVT). The developed solution was then assessed through an extensive empirical evaluation in five different scenarios: polyp detection, skin detection, leukocyte recognition, environmental microorganism detection, and butterfly recognition. The model provides state-of-the-art results. All resources are available online at https://github.com/AlbertoFormaggio1/Ensemble-Of-Segmentation.
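Aggregating the predictions of several segmentation models is commonly done by per-pixel voting or averaging. A minimal sketch of per-pixel majority voting over binary masks (the abstract does not specify the aggregation rule, so this is one common choice, with illustrative names and data):

```python
def majority_vote(masks):
    # Per-pixel majority vote across binary masks (flat 0/1 lists),
    # one mask per model; ties round down to 0.
    n_models = len(masks)
    return [1 if sum(m[i] for m in masks) * 2 > n_models else 0
            for i in range(len(masks[0]))]

masks = [[1, 0, 1, 1],    # model A
         [1, 1, 0, 1],    # model B
         [0, 0, 1, 1]]    # model C
fused = majority_vote(masks)   # votes per pixel: 2, 1, 2, 3
```

For soft (probability) outputs, averaging before thresholding usually behaves better than hard voting, since it preserves each model's confidence.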

    Weather Image Generation using a Generative Adversarial Network

    This thesis investigates whether coupling a simple U-Net segmentation model with an image-to-image translation Generative Adversarial Network, CycleGAN, improves data augmentation results compared to CycleGAN alone. To evaluate the proposed method, a dataset consisting of weather images of different weather conditions and corresponding segmentation masks is used. Furthermore, we investigate the performance of different pre-trained CNNs in the encoder part of the U-Net model. The main goal is to provide a solution for generating data to be used in future data augmentation projects for real applications. Images from the proposed segmentation and CycleGAN pipeline are evaluated with the Fréchet Inception Distance metric and compared to results from CycleGAN alone. The results indicate an increase in generated image quality when coupling a segmentation model with the generator of CycleGAN, at least on the dataset used. Additional improvements might be achieved by adding an attention model to the pipeline or by changing the segmentation or generative adversarial network architectures.
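The Fréchet Inception Distance compares the Gaussian statistics of Inception features from real and generated images; the full metric uses means and covariance matrices. For intuition, the univariate case has a simple closed form, sketched below (this is an illustration of the distance formula, not the thesis's evaluation code):

```python
def frechet_distance_1d(mu1, sigma1, mu2, sigma2):
    # Fréchet distance between two univariate Gaussians:
    # d^2 = (mu1 - mu2)^2 + (sigma1 - sigma2)^2.
    # FID applies the multivariate generalization to Inception features,
    # where the last term involves a matrix square root of the covariances.
    return (mu1 - mu2) ** 2 + (sigma1 - sigma2) ** 2

d = frechet_distance_1d(0.0, 1.0, 3.0, 2.0)   # (0-3)^2 + (1-2)^2 = 10
```

Identical distributions give distance 0; lower FID therefore means the generated images' feature statistics are closer to the real ones.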