
    Saliency maps on image hierarchies

    In this paper we propose two saliency models for salient object segmentation based on a hierarchical image segmentation: a tree-like structure that represents regions at different scales, from fine details to the whole image (e.g. gPb-UCM, BPT). The first model is based on a hierarchy of image partitions. The saliency at each level is computed on a region basis, taking into account the contrast between regions, and the maps obtained for the different partitions are then integrated into a final saliency map. The second model works directly on the structure created by the segmentation algorithm, computing saliency at each node and integrating these cues in a straightforward manner into a single saliency map. We show that the proposed models produce high-quality saliency maps, and objective evaluation demonstrates that both methods achieve state-of-the-art performance on several benchmark datasets.
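
    The first model's region-contrast step can be made concrete with a short sketch. Below is a minimal, illustrative implementation (not the paper's exact formulation) of saliency for one level of the hierarchy, assuming a precomputed label map for that partition (e.g. one cut of a gPb-UCM or BPT hierarchy) and using an area-weighted Lab color distance as the contrast measure; the per-level maps would then be fused into the final saliency map.

    ```python
    # Minimal sketch: region-contrast saliency for one partition level.
    # The contrast definition (area-weighted Lab distance to all other
    # regions) is an illustrative assumption, not the paper's formula.
    import numpy as np

    def region_contrast_saliency(labels, lab_image):
        ids = np.unique(labels)
        # Mean color and pixel count per region.
        means = np.array([lab_image[labels == i].mean(axis=0) for i in ids])
        areas = np.array([(labels == i).sum() for i in ids], dtype=float)
        weights = areas / areas.sum()
        saliency = np.zeros(labels.shape, dtype=float)
        for k, i in enumerate(ids):
            # Saliency of region i: color distance to every other region,
            # weighted by that region's relative area.
            dist = np.linalg.norm(means - means[k], axis=1)
            saliency[labels == i] = (weights * dist).sum()
        return saliency / (saliency.max() + 1e-8)

    # Toy usage: a two-region partition of random "Lab" values.
    labels = np.zeros((64, 64), dtype=int); labels[:, 32:] = 1
    smap = region_contrast_saliency(labels, np.random.rand(64, 64, 3))
    ```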

    A morphological approach for segmentation and tracking of human faces

    A new technique for segmenting and tracking human faces in video sequences is presented. The technique relies on morphological tools: connected operators to extract the connected component that most likely belongs to a face, and partition projection to track this component through the sequence. A binary partition tree (BPT) is used to implement the connected operator. The BPT is constructed using a chrominance criterion, and its nodes are analyzed to select the one that maximizes an estimate of the likelihood of being part of a face. Tracking is performed with a partition projection approach: images are divided into face and non-face parts, which are tracked through the sequence. The technique has been successfully assessed on several test sequences from the MPEG-4 (raw format) and MPEG-7 (MPEG-1 format) databases.
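
    The node-selection step can be sketched as follows, assuming each BPT node carries its region's mean CbCr chrominance and area. The Gaussian skin-chrominance model (mean and covariance values) is an illustrative placeholder, not the paper's trained likelihood.

    ```python
    # Minimal sketch: selecting the BPT node most likely to be a face.
    import numpy as np

    SKIN_MEAN = np.array([120.0, 150.0])            # assumed CbCr skin prototype
    SKIN_COV_INV = np.linalg.inv(np.diag([15.0, 15.0]) ** 2)

    def face_likelihood(node):
        d = node["mean_cbcr"] - SKIN_MEAN
        return np.exp(-0.5 * d @ SKIN_COV_INV @ d)

    def select_face_node(root):
        # Depth-first scan of the BPT; keep the node whose region maximizes
        # the area-weighted skin likelihood. This acts as the connected
        # operator that extracts the face component.
        best, best_score = None, -1.0
        stack = [root]
        while stack:
            node = stack.pop()
            score = face_likelihood(node) * node["area"]
            if score > best_score:
                best, best_score = node, score
            stack.extend(node.get("children", []))
        return best

    # Toy usage: a skin-colored leaf beats a neutral-colored root.
    leaf = {"mean_cbcr": np.array([118., 152.]), "area": 400, "children": []}
    root = {"mean_cbcr": np.array([80., 100.]), "area": 4096, "children": [leaf]}
    print(select_face_node(root) is leaf)  # True
    ```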

    Monte-Carlo sampling applied to multiple instance learning for whole slide image classification

    In this paper we propose a patch sampling strategy for Whole Slide Image classification in the context of Multiple Instance Learning, based on sequential Monte-Carlo methods. We show its capability to achieve high generalization performance on the differentiation between sun-exposed and non-sun-exposed pieces of skin tissue.
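
    A minimal sketch of the sequential Monte-Carlo idea is given below: particles are patch centers that are repeatedly scored, resampled, and perturbed, so sampling concentrates on relevant tissue. The `score` stub and the Gaussian perturbation step are illustrative assumptions, not the paper's exact procedure.

    ```python
    # Minimal sketch: sequential Monte-Carlo sampling of WSI patch centers.
    import numpy as np

    rng = np.random.default_rng(0)

    def score(xy):
        # Stub relevance function; a real MIL model would score the image
        # patch at normalized location xy.
        return np.exp(-5 * np.linalg.norm(xy - np.array([0.7, 0.3])))

    def smc_sample_patches(n_particles=64, n_iters=5):
        xy = rng.random((n_particles, 2))       # patch centers in [0, 1]^2
        for _ in range(n_iters):
            w = np.array([score(p) for p in xy])
            w /= w.sum()
            idx = rng.choice(n_particles, size=n_particles, p=w)  # resample
            xy = np.clip(xy[idx] + rng.normal(0, 0.05, xy.shape), 0, 1)
        return xy  # particles concentrate on high-relevance regions

    patches = smc_sample_patches()
    ```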

    Brain MRI super-resolution using generative adversarial networks

    In this work we propose an adversarial learning approach to generate high-resolution MRI scans from low-resolution images. The architecture, based on the SRGAN model, adopts 3D convolutions to exploit volumetric information. For the discriminator, the adversarial loss uses least squares in order to stabilize training. For the generator, the loss function combines a least-squares adversarial loss with a content term based on mean squared error and image gradients, in order to improve the quality of the generated images. We explore different solutions for the upsampling phase. We present promising results that improve on classical interpolation, showing the potential of the approach for 3D medical imaging super-resolution.
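
    The generator objective described above can be sketched in PyTorch as a least-squares adversarial term plus a content term mixing voxel-wise MSE with an image-gradient difference; the loss weights are illustrative assumptions.

    ```python
    # Minimal sketch: LSGAN adversarial loss + MSE/gradient content loss
    # for 3D volumes of shape (N, C, D, H, W).
    import torch
    import torch.nn.functional as F

    def gradient_3d(x):
        # Finite differences along the three spatial axes.
        dz = x[:, :, 1:, :, :] - x[:, :, :-1, :, :]
        dy = x[:, :, :, 1:, :] - x[:, :, :, :-1, :]
        dx = x[:, :, :, :, 1:] - x[:, :, :, :, :-1]
        return dz, dy, dx

    def generator_loss(fake, real, d_fake, lambda_adv=1e-3, lambda_grad=1.0):
        adv = F.mse_loss(d_fake, torch.ones_like(d_fake))  # least-squares GAN
        content = F.mse_loss(fake, real)                   # voxel-wise MSE
        grad = sum(F.mse_loss(gf, gr)                      # gradient term
                   for gf, gr in zip(gradient_3d(fake), gradient_3d(real)))
        return lambda_adv * adv + content + lambda_grad * grad
    ```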

    Action tube extraction based 3D-CNN for RGB-D action recognition

    In this paper we propose a novel action tube extractor for RGB-D action recognition in trimmed videos. The extractor takes a video as input and outputs an action tube. The method consists of two parts: spatial tube extraction and temporal sampling. The first part is built upon MobileNet-SSD and defines the spatial region where the action takes place. The second part is based on the structural similarity index (SSIM) and removes frames without obvious motion from the primary action tube. The final extracted action tube has two benefits: 1) a higher ratio of ROI (the subjects of the action) to background; 2) most frames contain obvious motion change. We use a two-stream (RGB and depth) I3D architecture as our 3D-CNN model. Our approach outperforms state-of-the-art methods on the OA and NTU RGB-D datasets.
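
    The temporal sampling part can be sketched as follows: frames too similar (by SSIM) to the last kept frame are dropped, keeping only frames with obvious motion change. The threshold value is an illustrative assumption.

    ```python
    # Minimal sketch: SSIM-based temporal sampling of an action tube.
    from skimage.metrics import structural_similarity as ssim

    def temporal_sample(frames, thresh=0.95):
        # frames: list of grayscale arrays from the spatially cropped tube.
        kept = [frames[0]]
        for f in frames[1:]:
            # Low SSIM with the last kept frame means noticeable motion.
            if ssim(kept[-1], f, data_range=f.max() - f.min()) < thresh:
                kept.append(f)
        return kept
    ```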

    MRI brain tumor segmentation and uncertainty estimation using 3D-UNet architectures

    Automation of brain tumor segmentation in 3D magnetic resonance images (MRIs) is key to assessing the diagnosis and treatment of the disease. In recent years, convolutional neural networks (CNNs) have shown improved results on this task; however, high memory consumption is still a problem in 3D-CNNs. Moreover, most methods do not include uncertainty information, which is especially critical in medical diagnosis. This work studies 3D encoder-decoder architectures trained with patch-based techniques to reduce memory consumption and the effect of unbalanced data. The trained models are then combined into an ensemble that leverages the properties of each model, increasing performance. We also introduce voxel-wise uncertainty information, both epistemic and aleatoric, estimated with test-time dropout (TTD) and test-time data augmentation (TTA), respectively. In addition, a hybrid approach is proposed that helps increase the accuracy of the segmentation. The model and uncertainty estimation measures proposed in this work have been used in the BraTS'20 Challenge for tasks 1 and 3, on tumor segmentation and uncertainty estimation. This work has been partially supported by the project MALEGRA TEC2016-75976-R, financed by the Spanish Ministerio de Economía y Competitividad.
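
    The two uncertainty estimates can be sketched in PyTorch as below: epistemic uncertainty via test-time dropout (dropout layers kept active at inference) and aleatoric uncertainty via test-time augmentation. The random-flip augmentation and sample count are illustrative assumptions, and `model` stands for any 3D segmentation network with dropout layers.

    ```python
    # Minimal sketch: voxel-wise uncertainty via TTD and TTA.
    import random
    import torch

    def mc_predictions(model, x, n=10, augment=False):
        model.eval()
        if not augment:
            # TTD: re-enable only the dropout layers at inference time.
            for m in model.modules():
                if isinstance(m, torch.nn.Dropout3d):
                    m.train()
        preds = []
        with torch.no_grad():
            for _ in range(n):
                if augment:
                    # TTA: flip the input along random spatial axes of a
                    # (N, C, D, H, W) volume, then flip the prediction back.
                    dims = random.sample((2, 3, 4), k=random.randint(1, 3))
                    preds.append(torch.flip(model(torch.flip(x, dims)), dims))
                else:
                    preds.append(model(x))
        stacked = torch.stack(preds)            # (n, N, C, D, H, W)
        # Mean = segmentation estimate; variance = voxel-wise uncertainty.
        return stacked.mean(0), stacked.var(0)
    ```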

    Self-supervised graph representations of WSIs

    In this manuscript we propose a framework for the analysis of whole slide images (WSIs) in the cell-entity space using self-supervised deep learning on graphs, and explore the quality of its representations at different levels of application. It consists of a two-step process. First, cell-level analysis is performed locally on clusters of nearby cells, which can be seen as small regions of the image, to learn representations that capture the cell environment and distribution. In a second stage, a WSI graph is generated with these regions as nodes and the learned representations as initial node embeddings. The graph is leveraged for a downstream task, region-of-interest (ROI) detection, addressed as graph clustering. The representations outperform the evaluation baselines at both levels of application; evaluation was carried out by predicting, with a logistic regressor on the learned representations, whether a cell or region is tumorous. This work has been supported by the Spanish Research Agency (AEI) under project PID2020-116907RB-I00 of the call MCIN/AEI/10.13039/501100011033 and the FI-AGAUR grant funded by Direcció General de Recerca (DGR) of Departament de Recerca i Universitats (REU) of the Generalitat de Catalunya.
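
    The second stage (region graph construction) can be sketched as follows, with k-means standing in for the clustering of nearby cells and mean-pooled cell embeddings standing in for the self-supervised region representations; the cluster count and neighborhood size are illustrative assumptions.

    ```python
    # Minimal sketch: building a WSI graph of cell-cluster regions.
    import numpy as np
    import networkx as nx
    from sklearn.cluster import KMeans
    from sklearn.neighbors import NearestNeighbors

    def build_wsi_graph(cell_xy, cell_emb, n_regions=50, k=5):
        # Group nearby cells into regions (small areas of the slide).
        regions = KMeans(n_clusters=n_regions, n_init=10).fit_predict(cell_xy)
        centroids = np.array([cell_xy[regions == r].mean(0)
                              for r in range(n_regions)])
        # Region embedding: mean of its cells' embeddings; in the paper
        # these come from the local self-supervised model.
        node_emb = np.array([cell_emb[regions == r].mean(0)
                             for r in range(n_regions)])
        # Connect each region to its k nearest regions in slide space.
        nbrs = NearestNeighbors(n_neighbors=k + 1).fit(centroids)
        _, idx = nbrs.kneighbors(centroids)
        g = nx.Graph()
        for r in range(n_regions):
            g.add_node(r, x=node_emb[r], pos=centroids[r])
            for j in idx[r, 1:]:            # skip the self-neighbor
                g.add_edge(r, int(j))
        return g  # ready for graph clustering / ROI detection downstream
    ```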

    Picking groups instead of samples: a close look at Static Pool-based Meta-Active Learning

    Active Learning techniques are used to tackle learning problems where obtaining training labels is costly. In this work we use Meta-Active Learning to learn to select a subset of samples from a pool of unlabeled inputs for further annotation. This scenario is called Static Pool-based Meta-Active Learning. We propose to extend existing approaches by performing the selection in a manner that, unlike previous works, conditions the selection of each sample on the whole selected subset.
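
    As an illustration of subset-conditioned selection, the sketch below greedily picks each sample based on its distance to the subset chosen so far (farthest-point sampling); this is a hand-crafted stand-in for the learned meta-active policy, not the paper's method.

    ```python
    # Minimal sketch: each pick is conditioned on the whole current subset.
    import numpy as np

    def select_subset(pool, budget):
        # pool: (n, d) array of sample features; budget: subset size.
        chosen = [int(np.linalg.norm(pool, axis=1).argmax())]  # seed pick
        while len(chosen) < budget:
            # Distance of every sample to its nearest already-chosen sample.
            d = np.min([np.linalg.norm(pool - pool[c], axis=1)
                        for c in chosen], axis=0)
            d[chosen] = -np.inf          # never re-pick a selected sample
            chosen.append(int(d.argmax()))  # farthest from current subset
        return chosen

    subset = select_subset(np.random.rand(200, 16), budget=10)
    ```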

    BCN20000: dermoscopic lesions in the wild

    This article summarizes the BCN20000 dataset, composed of 19,424 dermoscopic images of skin lesions captured from 2010 to 2016 at the Hospital Clínic in Barcelona. With this dataset, we aim to study the problem of unconstrained classification of dermoscopic images of skin cancer, including lesions found in hard-to-diagnose locations (nails and mucosa), large lesions that do not fit in the aperture of the dermoscopy device, and hypo-pigmented lesions. BCN20000 will be provided to the participants of the ISIC Challenge 2019 [8], who will be asked to train algorithms to automatically classify dermoscopic images of skin cancer.

    SEG-ESRGAN: A multi-task network for super-resolution and semantic segmentation of remote sensing images

    The production of highly accurate land cover maps is one of the primary challenges in remote sensing and depends on the spatial resolution of the input images. High-resolution imagery is sometimes unavailable, or too expensive for covering large areas or performing multitemporal analysis. In this context, we propose a multi-task network that takes advantage of freely available Sentinel-2 imagery to produce both a super-resolved image, with a scaling factor of 5, and the corresponding high-resolution land cover map. Our proposal, named SEG-ESRGAN, consists of two branches: a super-resolution branch, which produces Sentinel-2 multispectral images at 2 m resolution, and an encoder-decoder semantic segmentation branch, which generates the enhanced land cover map. Several skip connections are retrieved from the super-resolution branch and concatenated with features from the different stages of the segmentation encoder, promoting the flow of meaningful information and boosting segmentation accuracy. Our model is trained with a multi-loss approach on a novel dataset, developed from Sentinel-2 and WorldView-2 image pairs, for training and testing the super-resolution stage; in addition, we generated a dataset with ground-truth labels for the segmentation task. To assess the super-resolution improvement, the PSNR, SSIM, ERGAS, and SAM metrics were considered, while classification performance was measured with IoU, the confusion matrix, and the F1-score. Experimental results demonstrate that the SEG-ESRGAN model outperforms different full segmentation and dual network models (U-Net, DeepLabV3+, HRNet and Dual_DeepLab), allowing the generation of high-resolution land cover maps in challenging scenarios using Sentinel-2 10 m bands. This work was funded by the Spanish Agencia Estatal de Investigación (AEI) under projects ARTEMISAT-2 (CTM 2016-77733-R), PID2020-117142GB-I00 and PID2020-116907RB-I00 (MCIN/AEI call 10.13039/501100011033). L.S. would like to acknowledge the BECAL (Becas Carlos Antonio López) scholarship for the financial support.
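
    The cross-branch fusion can be sketched in PyTorch as below: a feature map retrieved from the super-resolution branch is resized to the segmentation encoder's spatial size and concatenated with the encoder features at the matching stage. Channel sizes and the fusion convolution are illustrative assumptions.

    ```python
    # Minimal sketch: fusing an SR-branch skip into the segmentation encoder.
    import torch
    import torch.nn as nn

    class FusionStage(nn.Module):
        def __init__(self, seg_ch, sr_ch, out_ch):
            super().__init__()
            self.conv = nn.Conv2d(seg_ch + sr_ch, out_ch, 3, padding=1)

        def forward(self, seg_feat, sr_feat):
            # Resize the SR skip to the encoder's spatial size, then fuse.
            sr_feat = nn.functional.interpolate(
                sr_feat, size=seg_feat.shape[-2:], mode="bilinear",
                align_corners=False)
            return torch.relu(self.conv(torch.cat([seg_feat, sr_feat], dim=1)))

    # Toy usage with assumed channel counts and spatial sizes.
    fuse = FusionStage(seg_ch=64, sr_ch=32, out_ch=64)
    y = fuse(torch.randn(1, 64, 40, 40), torch.randn(1, 32, 80, 80))
    ```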