440 research outputs found

    Classification and Retrieval of Digital Pathology Scans: A New Dataset

    Full text link
    In this paper, we introduce a new dataset, \textbf{Kimia Path24}, for image classification and retrieval in digital pathology. We use the whole scan images of 24 different tissue textures to generate 1,325 test patches of size 1000×\times1000 (0.5mm×\times0.5mm). Training data can be generated according to preferences of algorithm designer and can range from approximately 27,000 to over 50,000 patches if the preset parameters are adopted. We propose a compound patch-and-scan accuracy measurement that makes achieving high accuracies quite challenging. In addition, we set the benchmarking line by applying LBP, dictionary approach and convolutional neural nets (CNNs) and report their results. The highest accuracy was 41.80\% for CNN.Comment: Accepted for presentation at Workshop for Computer Vision for Microscopy Image Analysis (CVMI 2017) @ CVPR 2017, Honolulu, Hawai

    Multi-Magnification Search in Digital Pathology

    Get PDF
    This research study investigates the effect of magnification on content-based image search in digital pathology archives and proposes to use multi-magnification image representation. Image search in large archives of digital pathology slides provides researchers and medical professionals with an opportunity to match records of current and past patients and learn from evidently diagnosed and treated cases. When working with microscopes, pathologists switch between different magnification levels while examining tissue specimens to find and evaluate various morphological features. Inspired by the conventional pathology workflow, this thesis investigates several magnification levels in digital pathology and their combinations to minimize the gap between AI-enabled image search methods and clinical settings. This thesis suggests two approaches for combining magnification levels and compares their performance. The first approach obtains a single-vector deep feature representation for a WSI, whereas the second approach works with a multi-vector deep feature representation. The proposed content-based searching framework does not rely on any pixel-level annotation and potentially applies to millions of unlabelled (raw) WSIs. This thesis proposes using binary masks generated by U-Net as the primary step of patch preparation to locating tissue regions in a WSI. As a part of this thesis, a multi-magnification dataset of histopathology patches is created by applying the proposed patch preparation method on more than 8,000 WSIs of TCGA repository. The performance of both MMS methods is evaluated by investigating the top three most similar WSIs to a query WSI found by the search. The search is considered successful if two out of three matched cases have the same malignancy subtype as the query WSI. Experimental search results across tumors of several anatomical sites at different magnification levels, i.e., 20×, 10×, and 5× magnifications and their combinations, are reported in this thesis. The experiments verify that cell-level information at the highest magnification is essential for searching for diagnostic purposes. In contrast, low-magnification information may improve this assessment depending on the tumor type. Both proposed search methods generally performed more accurately at 20× magnification or the combination of the 20× magnification with 10×, 5×, or both. The multi-magnification searching approach achieved up to 11% increase in F1-score for searching among some tumor types, including the urinary tract and brain tumor subtypes compared to the single-magnification image search

    Multimodal non-linear latent semantic method for information retrieval

    Get PDF
    La búsqueda y recuperación de datos multimodales es una importante tarea dentro del campo de búsqueda y recuperación de información, donde las consultas y los elementos de la base de datos objetivo están representados por un conjunto de modalidades, donde cada una de ellas captura un aspecto de un fenómeno de interés. Cada modalidad contiene información complementaria y común a otras modalidades. Con el fin de tomar ventaja de la información adicional distribuida a través de las distintas modalidades han sido desarrollados muchos algoritmos y métodos que utilizan las propiedades estadísticas en los datos multimodales para encontrar correlaciones implícitas, otros aprenden a calcular distancias heterogéneas, otros métodos aprenden a proyectar los datos desde el espacio de entrada hasta un espacio semántico común, donde las diferentes modalidades son comparables y se puede construir un ranking a partir de ellas. En esta tesis se presenta el diseño de un sistema para la búsqueda y recuperación de información multimodal que aprende varias proyecciones no lineales a espacios semánticos latentes donde las distintas modalidades son representadas en conjunto y es posible realizar comparaciones y medidas de similitud para construir rankings multimodales. Adicionalmente se propone un método kernelizado para la proyección de datos a un espacio semántico latente usando la información de las etiquetas como método de supervisión para construir índice multimodal que integra los datos multimodales y la información de las etiquetas; este método puede proyectar los datos a tres diferentes espacios semánticos donde varias configuraciones de búsqueda y recuperación de información pueden ser aplicadas. El sistema y el método propuestos fueron evaluados en un conjunto de datos compuesto por casos médicos, donde cada caso consta de una imagen de tejido prostático, un reporte de texto del patólogo y un valor de Gleason score como etiqueta de supervisión. Combinando la información multimodal y la información en las etiquetas se generó un índice multimodal que se utilizó para realizar la tarea de búsqueda y recuperación de información por contenido obteniendo resultados sobresalientes. Las proyecciones no-lineales permiten al modelo una mayor flexibilidad y capacidad de representación. Sin embargo calcular estas proyecciones no-lineales en un conjunto de datos enorme es computacionalmente costoso, para reducir este costo y habilitar el modelo para procesar datos a gran escala, la técnica del budget fue utilizada, mostrando un buen compromiso entre efectividad y velocidad.Multimodal information retrieval is an information retrieval sub-task where queries and database target elements are composed of several modalities or views. A modality is a representation of complex phenomena, captured and measured by different sensors or information sources, each one encodes some information about it. Each modality representation contains complementary and shared information about the phenomenon of interest, this additional information can be used to improve the information retrieval process. Several methods have been developed to take advantage of additional information distributed across different modalities. Some of them exploit statistical properties in multimodal data to find correlations and implicit relationships, others learn heterogeneous distance functions, and others learn linear and non-linear projections that transform data from the original input space to a common latent semantic space where different modalities are comparable. In spite of the attention dedicated to this issue, multimodal information retrieval is still an open problem. This thesis presents a multimodal information retrieval system designed to learn several mapping functions to transform multimodal data to a latent semantic space, where different modalities are combined and can be compared to build a multimodal ranking and perform a multimodal information retrieval task. Additionally, a multimodal kernelized latent semantic embedding method is proposed to construct a supervised multimodal index, integrating multimodal data and label supervision. This method can perform mappings to three different spaces where some information retrieval task setups can be performed. The proposed system and method were evaluated in a multimodal medical case-based retrieval task where data is composed of whole-slide images of prostate tissue samples, pathologist’s text report and Gleason score as a supervised label. Multimodal data and labels were combined to produce a multimodal index. This index was used to retrieve multimodal information and achieves outstanding results compared with previous works on this topic. Non-linear mappings provide more flexibility and representation capacity to the proposed model. However, constructing the non-linear mapping in a large dataset using kernel methods can be computationally costly. To reduce the cost and allow large scale applications, the budget technique was introduced, showing good performance between speed and effectiveness.COLCIENCIASJóvenes investigadores 761/2016Línea de investigación: Ciencias de la computaciónMaestrí

    Towards a Visual-Language Foundation Model for Computational Pathology

    Full text link
    The accelerated adoption of digital pathology and advances in deep learning have enabled the development of powerful models for various pathology tasks across a diverse array of diseases and patient cohorts. However, model training is often difficult due to label scarcity in the medical domain and the model's usage is limited by the specific task and disease for which it is trained. Additionally, most models in histopathology leverage only image data, a stark contrast to how humans teach each other and reason about histopathologic entities. We introduce CONtrastive learning from Captions for Histopathology (CONCH), a visual-language foundation model developed using diverse sources of histopathology images, biomedical text, and notably over 1.17 million image-caption pairs via task-agnostic pretraining. Evaluated on a suite of 13 diverse benchmarks, CONCH can be transferred to a wide range of downstream tasks involving either or both histopathology images and text, achieving state-of-the-art performance on histology image classification, segmentation, captioning, text-to-image and image-to-text retrieval. CONCH represents a substantial leap over concurrent visual-language pretrained systems for histopathology, with the potential to directly facilitate a wide array of machine learning-based workflows requiring minimal or no further supervised fine-tuning

    Radon Projections as Image Descriptors for Content-Based Retrieval of Medical Images

    Get PDF
    Clinical analysis and medical diagnosis of diverse diseases adopt medical imaging techniques to empower specialists to perform their tasks by visualizing internal body organs and tissues for classifying and treating diseases at an early stage. Content-Based Image Retrieval (CBIR) systems are a set of computer vision techniques to retrieve similar images from a large database based on proper image representations. Particularly in radiology and histopathology, CBIR is a promising approach to effectively screen, understand, and retrieve images with similar level of semantic descriptions from a database of previously diagnosed cases to provide physicians with reliable assistance for diagnosis, treatment planning and research. Over the past decade, the development of CBIR systems in medical imaging has expedited due to the increase in digitized modalities, an increase in computational efficiency (e.g., availability of GPUs), and progress in algorithm development in computer vision and artificial intelligence. Hence, medical specialists may use CBIR prototypes to query similar cases from a large image database based solely on the image content (and no text). Understanding the semantics of an image requires an expressive descriptor that has the ability to capture and to represent unique and invariant features of an image. Radon transform, one of the oldest techniques widely used in medical imaging, can capture the shape of organs in form of a one-dimensional histogram by projecting parallel rays through a two-dimensional object of concern at a specific angle. In this work, the Radon transform is re-designed to (i) extract features and (ii) generate a descriptor for content-based retrieval of medical images. Radon transform is applied to feed a deep neural network instead of raw images in order to improve the generalization of the network. Specifically, the framework is composed of providing Radon projections of an image to a deep autoencoder, from which the deepest layer is isolated and fed into a multi-layer perceptron for classification. This approach enables the network to (a) train much faster as the Radon projections are computationally inexpensive compared to raw input images, and (b) perform more accurately as Radon projections can make more pronounced and salient features to the network compared to raw images. This framework is validated on a publicly available radiography data set called "Image Retrieval in Medical Applications" (IRMA), consisting of 12,677 train and 1,733 test images, for which an classification accuracy of approximately 82% is achieved, outperforming all autoencoder strategies reported on the Image Retrieval in Medical Applications (IRMA) dataset. The classification accuracy is calculated by dividing the total IRMA error, a calculation outlined by the authors of the data set, with the total number of test images. Finally, a compact handcrafted image descriptor based on Radon transform was designed in this work that is called "Forming Local Intersections of Projections" (FLIP). The FLIP descriptor has been designed, through numerous experiments, for representing histopathology images. The FLIP descriptor is based on Radon transform wherein parallel projections are applied in a local 3x3 neighborhoods with 2 pixel overlap of gray-level images (staining of histopathology images is ignored). Using four equidistant projection directions in each window, the characteristics of the neighborhood is quantified by taking an element-wise minimum between each adjacent projection in each window. Thereafter, the FLIP histogram (descriptor) for each image is constructed. A multi-resolution FLIP (mFLIP) scheme is also proposed which is observed to outperform many state-of-the-art methods, among others deep features, when applied on the histopathology data set KIMIA Path24. Experiments show a total classification accuracy of approximately 72% using SVM classification, which surpasses the current benchmark of approximately 66% on the KIMIA Path24 data set

    Content-based Image Retrieval of Gigapixel Histopathology Scans: A Comparative Study of Convolution Neural Network, Local Binary Pattern, and Bag of visual Words

    Get PDF
    The state-of-the-art image analysis algorithms offer a unique opportunity to extract semantically meaningful features from medical images. The advantage of this approach is automation in terms of content-based image retrieval (CBIR) of medical images. Such an automation leads to more reliable diagnostic decisions by clinicians as the direct beneficiary of these algorithms. Digital pathology (DP), or whole slide imaging (WSI), is a new avenue for image-based diagnosis in histopathology. WSI technology enables the digitization of traditional glass slides to ultra high-resolution digital images (or digital slides). Digital slides are more commonly used for CBIR research than other modalities of medical images due to their enormous size, increasing adoption among hospitals, and their various benefits offered to pathologists (e.g., digital telepathology). Pathology laboratories are under constant pressure to meet increasingly complex demands from hospitals. Many diseases (such as cancer) continue to grow which creates a pressing need to utilize existing innovative machine learning schemes to harness the knowledge contained in digital slides for more effective and efficient histopathology. This thesis provides a qualitative assessment of three popular image analysis techniques, namely Local Binary Pattern (LBP), Bag of visual Words (BoW), and Convolution Neural Networks (CNN) in their abilities to extract the discriminative features from gigapixel histopathology images. LBP and BoW are well-established techniques used in different image analysis problems. Over the last 5-10 years, CNN has become a frequent research topic in computer vision. CNN offers a domain-agnostic approach for the automatic extraction of discriminative image features, used for either classification or retrieval purposes. Therefore, it is imperative that this thesis gives more emphasis to CNN as a viable approach for the analysis of DP images. A new dataset, Kimia Path24 is specially designed and developed to facilitate the research in classification and CBIR of DP images. Kimia Path24 is used to measure the quality of image features extracted from LBP, BoW, and CNN; resulting in the best accuracy values of 41.33%, 54.67%, and 56.98% respectively. The results are somewhat surprising, suggesting that the handcrafted feature extraction algorithm, i.e., LBP can reach very close to the deep features extracted from CNN. It is unanticipated, considering that CNN requires much more computational resources and efforts for designing and fine-tuning. One of the conclusions is that CNN needs to be trained for the problem with a large number of training images to realize its comprehensive benefits. However, there are many situations where large, balanced, and the labeled dataset is not available; one such area is histopathology at present
    corecore