9,447 research outputs found

    Learning to Segment Breast Biopsy Whole Slide Images

    We trained and applied an encoder-decoder model to semantically segment breast biopsy images into biologically meaningful tissue labels. Since conventional encoder-decoder networks cannot be applied directly to large biopsy images, and the differently sized structures in biopsies present novel challenges, we propose four modifications: (1) an input-aware encoding block to compensate for information loss, (2) a new dense connection pattern between encoder and decoder, (3) dense and sparse decoders to combine multi-level features, and (4) a multi-resolution network that fuses the results of encoder-decoders run at different resolutions. Our model outperforms a feature-based approach and conventional encoder-decoders from the literature. We use the semantic segmentations produced by our model in an automated diagnosis task and obtain higher accuracies than a baseline that uses an SVM for feature-based segmentation, with both pipelines using the same segmentation-based diagnostic features.
    Comment: Added more WSI images in appendix
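    Modification (4) suggests a simple structure; the sketch below is a hypothetical PyTorch-style reading, not the authors' released code. It runs two encoder-decoders at full and half resolution and fuses their upsampled logits with a 1x1 convolution, which is one plausible fusion rule.

        # Hypothetical sketch: fuse segmentation logits from encoder-decoders
        # run at two resolutions (modification 4); not the authors' implementation.
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class MultiResolutionFusion(nn.Module):
            def __init__(self, seg_net_full, seg_net_half, num_classes):
                super().__init__()
                self.seg_full = seg_net_full   # encoder-decoder at full resolution
                self.seg_half = seg_net_half   # encoder-decoder at half resolution
                # 1x1 conv merges the two per-class logit maps (assumed fusion rule)
                self.fuse = nn.Conv2d(2 * num_classes, num_classes, kernel_size=1)

            def forward(self, x):
                logits_full = self.seg_full(x)
                half = F.interpolate(x, scale_factor=0.5, mode='bilinear',
                                     align_corners=False)
                logits_half = self.seg_half(half)
                # upsample the coarse prediction back to full resolution
                logits_half = F.interpolate(logits_half, size=logits_full.shape[2:],
                                            mode='bilinear', align_corners=False)
                return self.fuse(torch.cat([logits_full, logits_half], dim=1))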

    Large-scale Land Cover Classification in GaoFen-2 Satellite Imagery

    Many important applications, such as change detection and disaster monitoring, need land cover information from remote sensing images acquired over different areas and at different times. However, it is difficult to find a generic land cover classification scheme that works across such images because of the spectral shift caused by diverse acquisition conditions. In this paper, we develop a novel land cover classification method that can deal with large-scale data captured over widely distributed areas and at different times. Additionally, we establish a large-scale land cover classification dataset consisting of 150 GaoFen-2 images to support model training and performance evaluation. In our experiments, the method achieves higher classification accuracy than traditional approaches.
    Comment: IGARSS'18 conference paper

    Automatic annotation for weakly supervised learning of detectors

    Object detection in images and action detection in videos are among the most widely studied computer vision problems, with applications in consumer photography, surveillance, and automatic media tagging. Typically, these standard detectors are fully supervised: they require a large body of training data in which the locations of the objects/actions in images/videos have been manually annotated. With the emergence of digital media and the rise of high-speed internet, raw images and video are available at little to no cost, but the manual annotation of object and action locations remains tedious, slow, and expensive. As a result, there has been great interest in training detectors with weak supervision, where only the presence or absence of an object/action in an image/video is needed, not its location. This thesis presents approaches for weakly supervised learning of object/action detectors, focusing on automatically annotating object and action locations in images/videos using only binary weak labels that indicate the presence or absence of the object/action.

    First, a framework for weakly supervised learning of object detectors in images is presented. The proposed approach uses a variation of the multiple instance learning (MIL) technique to automatically annotate object locations in weakly labelled data; unlike existing approaches, it fuses inter-class and intra-class cues to obtain the initial annotation. The initial annotation then seeds an iterative process in which standard object detectors refine the location annotation. To ensure that the iterative training of detectors does not drift from the object of interest, a scheme for detecting model drift is also presented. Furthermore, unlike most other methods, this weakly supervised approach is evaluated on data without manual pose (object orientation) annotation.

    Second, an analysis of the initial annotation of objects using inter-class and intra-class cues is carried out. Building on this analysis, a new method based on negative mining (NegMine) is presented for the initial annotation of both object and action data. The NegMine-based approach is a much simpler formulation that uses only an inter-class measure and requires no complex combinatorial optimisation, yet it can meet or outperform existing approaches, including the inter-/intra-class cue fusion approach presented earlier. Furthermore, NegMine can be combined with existing approaches to boost their performance.

    Finally, the thesis steps back and looks at the use of generic object detectors as prior knowledge in weakly supervised learning of object detectors. These generic object detectors are typically based on sampling saliency maps that indicate whether a pixel belongs to the background or the foreground. A new approach to generating saliency maps is presented that, unlike existing approaches, looks beyond the current image of interest to images similar to it. We show that this generic object proposal method can by itself annotate the weakly labelled object data with surprisingly high accuracy.
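    As an illustration of the negative-mining idea, the sketch below is a hypothetical minimal version, not the thesis's actual NegMine formulation or features: in a weakly labelled positive image, it selects the candidate window whose features are farthest from features sampled from negative (object-absent) images, i.e. the window least explained by background.

        # Hypothetical sketch of a NegMine-style initial annotation step.
        # Feature extractor and scoring rule are illustrative assumptions.
        import numpy as np

        def negmine_select(window_feats, negative_feats):
            """window_feats: (n_windows, d) features of candidate windows in a
            positive image; negative_feats: (n_neg, d) features sampled from
            negative images. Returns the index of the window to annotate."""
            # squared distance of each window to every negative feature
            d2 = ((window_feats[:, None, :] - negative_feats[None, :, :]) ** 2).sum(-1)
            inter_class_score = d2.min(axis=1)   # high = unlike any background
            return int(np.argmax(inter_class_score))

        # toy usage with random features
        rng = np.random.default_rng(0)
        windows = rng.normal(size=(50, 128))
        negatives = rng.normal(size=(200, 128))
        best = negmine_select(windows, negatives)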

    Multi-atlas label fusion by using supervised local weighting for brain image segmentation

    The automatic segmentation of structures of interest supports the morphological analysis of brain magnetic resonance imaging volumes. It demands significant effort because of the structures' complicated shapes, the low contrast between tissues, and inter-subject anatomical variability. One aspect that reduces the accuracy of multi-atlas-based segmentation is the label fusion assumption of one-to-one correspondences between target and atlas voxels. To improve performance, label fusion approaches include spatial and intensity information through voxel-wise weighted voting strategies. Although the weights are assessed for a predefined atlas set, they are not very efficient for labeling intricate structures, since most tissue shapes are not uniformly distributed in the images. This paper proposes a voxel-wise feature extraction methodology based on a linear combination of patch intensities. To the best of our knowledge, this is the first attempt to learn the features locally by maximizing the centered kernel alignment function. Our methodology aims to build discriminative representations, deal with complex structures, and reduce the influence of image artifacts, resulting in an enhanced patch-based segmentation of brain images. For validation, the proposed approach is compared against Bayesian-based and patch-wise label fusion on three different brain image datasets. In terms of the Dice similarity index, our proposal shows the highest segmentation accuracy (90.3% on average), with sufficient robustness to artifacts and suitable repeatability of the segmentation results.
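    The voxel-wise weighted voting that the paper builds on can be illustrated compactly. In the sketch below, a plain Gaussian patch-similarity weight stands in for the representation the paper learns by maximizing centered kernel alignment; the function names and parameters are illustrative assumptions.

        # Hypothetical sketch of voxel-wise weighted voting label fusion.
        # A Gaussian patch-similarity weight stands in for the learned
        # (centered-kernel-alignment) representation used in the paper.
        import numpy as np

        def weighted_vote(target_patch, atlas_patches, atlas_labels,
                          n_labels, sigma=1.0):
            """target_patch: (d,) intensities around the target voxel;
            atlas_patches: (n_atlas, d) corresponding atlas patches;
            atlas_labels: (n_atlas,) label each atlas proposes for the voxel."""
            d2 = ((atlas_patches - target_patch) ** 2).sum(axis=1)
            w = np.exp(-d2 / (2.0 * sigma ** 2))   # similarity-based weights
            votes = np.zeros(n_labels)
            for label, weight in zip(atlas_labels, w):
                votes[label] += weight             # accumulate weighted votes
            return int(np.argmax(votes))

        # toy usage: 5 atlases, 3x3x3 patches, 4 tissue labels
        rng = np.random.default_rng(1)
        tp = rng.normal(size=27)
        ap = tp + 0.1 * rng.normal(size=(5, 27))
        lbl = weighted_vote(tp, ap, rng.integers(0, 4, size=5), n_labels=4)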

    Multimodal non-linear latent semantic method for information retrieval

    Multimodal information retrieval is an information retrieval sub-task in which queries and database elements are composed of several modalities or views. A modality is a representation of a complex phenomenon, captured and measured by a particular sensor or information source; each modality encodes complementary and shared information about the phenomenon of interest, and this additional information can be used to improve the retrieval process. Several methods have been developed to take advantage of information distributed across modalities: some exploit statistical properties of multimodal data to find correlations and implicit relationships, others learn heterogeneous distance functions, and others learn linear and non-linear projections that transform data from the original input space to a common latent semantic space where different modalities are comparable. In spite of this attention, multimodal information retrieval remains an open problem. This thesis presents a multimodal information retrieval system that learns several mapping functions to transform multimodal data into a latent semantic space, where the modalities are combined and can be compared to build a multimodal ranking. Additionally, a multimodal kernelized latent semantic embedding method is proposed to construct a supervised multimodal index that integrates multimodal data and label supervision; this method can map the data to three different spaces, supporting several retrieval setups. The proposed system and method were evaluated on a multimodal medical case-based retrieval task in which each case comprises a whole-slide image of a prostate tissue sample, the pathologist's text report, and a Gleason score as the supervision label. Multimodal data and labels were combined into a multimodal index, which achieved outstanding retrieval results compared with previous work on this task. Non-linear mappings give the model more flexibility and representation capacity, but constructing them with kernel methods on a large dataset is computationally costly; to reduce this cost and enable large-scale application, a budget technique was introduced, showing a good trade-off between speed and effectiveness.
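    As a simplified stand-in for the latent semantic mapping described above, the sketch below uses plain linear CCA rather than the thesis's kernelized, label-supervised embedding. It projects precomputed image and text features into a shared space, where a text query ranks image cases by cosine similarity; all feature dimensions and names are assumptions.

        # Hypothetical sketch: project two modalities into a shared latent
        # space with linear CCA and rank by cosine similarity. The thesis
        # uses a kernelized, supervised embedding; CCA is a stand-in.
        import numpy as np
        from sklearn.cross_decomposition import CCA

        rng = np.random.default_rng(2)
        img_feats = rng.normal(size=(100, 64))   # e.g. CNN features per case
        txt_feats = rng.normal(size=(100, 32))   # e.g. report text embeddings

        cca = CCA(n_components=8)
        img_lat, txt_lat = cca.fit_transform(img_feats, txt_feats)

        def cosine_rank(query_lat, db_lat):
            q = query_lat / np.linalg.norm(query_lat)
            db = db_lat / np.linalg.norm(db_lat, axis=1, keepdims=True)
            return np.argsort(-(db @ q))         # indices, most similar first

        # rank image cases against the first text report's latent query
        ranking = cosine_rank(txt_lat[0], img_lat)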