752 research outputs found
Towards a Visual-Language Foundation Model for Computational Pathology
The accelerated adoption of digital pathology and advances in deep learning
have enabled the development of powerful models for various pathology tasks
across a diverse array of diseases and patient cohorts. However, model training
is often difficult due to label scarcity in the medical domain and the model's
usage is limited by the specific task and disease for which it is trained.
Additionally, most models in histopathology leverage only image data, a stark
contrast to how humans teach each other and reason about histopathologic
entities. We introduce CONtrastive learning from Captions for Histopathology
(CONCH), a visual-language foundation model developed using diverse sources of
histopathology images, biomedical text, and notably over 1.17 million
image-caption pairs via task-agnostic pretraining. Evaluated on a suite of 13
diverse benchmarks, CONCH can be transferred to a wide range of downstream
tasks involving either or both histopathology images and text, achieving
state-of-the-art performance on histology image classification, segmentation,
captioning, text-to-image and image-to-text retrieval. CONCH represents a
substantial leap over concurrent visual-language pretrained systems for
histopathology, with the potential to directly facilitate a wide array of
machine learning-based workflows requiring minimal or no further supervised
fine-tuning.
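The abstract names contrastive learning from image-caption pairs but does not give the objective. As a minimal sketch, assuming a generic symmetric (CLIP-style) contrastive loss rather than CONCH's actual training recipe, matched image and caption embeddings are pulled together and mismatched ones pushed apart. The function names, the toy embeddings, and the temperature value are all illustrative assumptions.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric image-to-text and text-to-image contrastive loss (assumed form)."""
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))

# Toy batch: pair k of images matches pair k of captions.
aligned_images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = contrastive_loss(aligned_images, aligned_texts)
loss_swapped = contrastive_loss(aligned_images, swapped_texts)
```

The loss is near zero when each image is most similar to its own caption and large when the pairing is swapped, which is the property such pretraining relies on.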
Multimodal non-linear latent semantic method for information retrieval
Multimodal information retrieval is an information retrieval sub-task in which queries and database target elements are composed of several modalities or views. A modality is a representation of a complex phenomenon, captured and measured by different sensors or information sources, each of which encodes some information about it. Each modality contains complementary and shared information about the phenomenon of interest, and this additional information can be used to improve the retrieval process. Several methods have been developed to take advantage of the information distributed across modalities. Some exploit statistical properties of multimodal data to find correlations and implicit relationships, others learn heterogeneous distance functions, and others learn linear and non-linear projections that transform data from the original input space to a common latent semantic space where different modalities are comparable. Despite the attention dedicated to this issue, multimodal information retrieval is still an open problem. This thesis presents a multimodal information retrieval system designed to learn several mapping functions that transform multimodal data to a latent semantic space, where different modalities are combined and can be compared to build a multimodal ranking and perform a multimodal information retrieval task. Additionally, a multimodal kernelized latent semantic embedding method is proposed to construct a supervised multimodal index that integrates multimodal data and label supervision. This method can perform mappings to three different semantic spaces, in which several information retrieval task setups can be applied.
The proposed system and method were evaluated on a multimodal medical case-based retrieval task in which each case is composed of a whole-slide image of a prostate tissue sample, the pathologist’s text report, and the Gleason score as a supervised label. Multimodal data and labels were combined to produce a multimodal index. This index was used for content-based retrieval of multimodal information and achieves outstanding results compared with previous works on this topic. Non-linear mappings give the proposed model more flexibility and representation capacity. However, constructing the non-linear mapping on a large dataset using kernel methods can be computationally costly. To reduce this cost and allow large-scale applications, the budget technique was used, showing a good trade-off between speed and effectiveness.
COLCIENCIAS, Jóvenes investigadores 761/2016. Research line: Computer science. Maestrí
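The abstract does not define the budget technique. As a rough sketch, assuming "budget" means restricting kernel evaluations to a small fixed subset of samples instead of the full dataset, each item can be mapped to a vector of kernel values against that subset and ranked by distance in that space. The RBF kernel, the function names, and the toy data are assumptions for illustration, not the thesis's method.

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def budget_features(x, budget, gamma=1.0):
    """Represent x by its kernel values against the budget samples only."""
    return [rbf(x, b, gamma) for b in budget]

def rank(query, database, budget, gamma=1.0):
    """Return database indices ordered from most to least similar to the query."""
    q = budget_features(query, budget, gamma)
    scored = []
    for idx, item in enumerate(database):
        f = budget_features(item, budget, gamma)
        dist = sum((a - b) ** 2 for a, b in zip(q, f))
        scored.append((dist, idx))
    return [idx for _, idx in sorted(scored)]

# Hypothetical toy data: three database items, a budget of two samples.
database = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
budget = [[0.0, 0.0], [5.0, 5.0]]
ranking = rank([0.1, 0.1], database, budget)
```

With a budget of b samples, each item costs b kernel evaluations regardless of dataset size, which is where the speed gain in this reading would come from.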
Quilt-1M: One Million Image-Text Pairs for Histopathology
Recent accelerations in multi-modal applications have been made possible with
the plethora of image and text data available online. However, the scarcity of
analogous data in the medical field, specifically in histopathology, has halted
comparable progress. To enable similar representation learning for
histopathology, we turn to YouTube, an untapped resource of videos, offering
hours of valuable educational histopathology videos from expert
clinicians. From YouTube, we curate Quilt: a large-scale vision-language
dataset consisting of image and text pairs. Quilt was automatically
curated using a mixture of models, including large language models, handcrafted
algorithms, human knowledge databases, and automatic speech recognition. In
comparison, the most comprehensive datasets curated for histopathology amass
only around K samples. We combine Quilt with datasets from other sources,
including Twitter, research papers, and the internet in general, to create an
even larger dataset: Quilt-1M, with 1M paired image-text samples, marking it
as the largest vision-language histopathology dataset to date. We demonstrate
the value of Quilt-1M by fine-tuning a pre-trained CLIP model. Our model
outperforms state-of-the-art models on both zero-shot and linear probing tasks
for classifying new histopathology images across diverse patch-level
datasets of different sub-pathologies and cross-modal retrieval tasks.
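Zero-shot classification with a CLIP-style model is commonly done by comparing an image embedding with the embeddings of one text prompt per class and picking the closest. The sketch below illustrates that general idea only; the embeddings are made-up toy vectors, the class names are hypothetical, and no real encoder is called.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_classify(image_emb, class_text_embs):
    """Pick the class whose prompt embedding is most similar to the image."""
    scores = {label: cosine(image_emb, emb)
              for label, emb in class_text_embs.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical prompt embeddings, standing in for text-encoder outputs.
class_text_embs = {
    "tumor": [1.0, 0.1],
    "normal": [0.1, 1.0],
}
label, scores = zero_shot_classify([0.9, 0.2], class_text_embs)
```

Linear probing differs in that it trains a small classifier on frozen image features, so it needs labels, whereas the zero-shot route above needs only class prompts.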
Quantitative analysis with machine learning models for multi-parametric brain imaging data
Gliomas are considered the most common primary malignant brain tumors in adults. With dramatic increases in computational power and improvements in image analysis algorithms, computer-aided medical image analysis has been introduced into clinical applications. Precise tumor grading and genotyping play an indispensable role in clinical diagnosis, treatment and prognosis. Glioma diagnostic procedures include histopathological imaging tests, molecular imaging scans and tumor grading. Pathologic review of tumor morphology in histologic sections is the traditional method for cancer classification and grading, yet human review has limitations that can result in low reproducibility and low inter-observer agreement. Compared with histopathological images, magnetic resonance (MR) imaging presents different structural and functional features, which might serve as noninvasive surrogates for tumor genotypes. Computer-aided image analysis has therefore been adopted in clinical applications, where it might partially overcome these shortcomings thanks to its capacity to quantitatively and reproducibly measure multilevel features on multi-parametric medical information. Imaging features obtained from a single modality do not fully represent the disease, so quantitative imaging features, including morphological, structural, cellular and molecular level features, derived from multi-modality medical images should be integrated into computer-aided medical image analysis. The difference in image quality between modalities is a challenge in this field. In this thesis, we aim to integrate the quantitative imaging data obtained from multiple modalities into mathematical models for predicting tumor response, to gain additional insight into their practical predictive value. Our major contributions in this thesis are: 1. 
Firstly, to address differences in imaging quality and observer dependence in histological image diagnosis, we propose an automated machine-learning brain tumor-grading platform to investigate the contributions of multiple parameters from multimodal data, including imaging parameters or features from Whole Slide Images (WSI) and the proliferation marker Ki-67. For each WSI, we extract both visual parameters, such as morphology parameters, and sub-visual parameters, including first-order and second-order features. A quantitative interpretable machine learning approach (Local Interpretable Model-Agnostic Explanations) is used to measure the contribution of each feature for a single case. Most grading systems based on machine learning models are considered “black boxes,” whereas this system can reveal clinically trusted reasoning. The quantitative analysis and explanation may help clinicians better understand the disease and choose optimal treatments to improve clinical outcomes. 2. Building on the proposed automated brain tumor-grading platform, we introduce multimodal Magnetic Resonance Images (MRIs) into our research. A new imaging–tissue correlation based approach called RA-PA-Thomics is proposed to predict the IDH genotype. Inspired by the concept of image fusion, we integrate multimodal MRIs and scans of histopathological images for indirect, fast, and cost-saving IDH genotyping. The proposed model has been verified with multiple evaluation criteria on the integrated data set and compared with results from prior work. The experimental data set includes public data sets and image information from two hospitals. Experimental results indicate that the proposed model improves the accuracy of glioma grading and genotyping.
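The abstract mentions first-order and second-order features without listing them. As an illustrative sketch, assuming first-order features are statistics of the intensity histogram and second-order features are computed from co-occurrences of neighbouring pixel values (a GLCM-style contrast is used here as one example), both can be computed on a small grayscale patch. The specific features and the toy patches are assumptions, not the thesis's feature set.

```python
import math
from collections import Counter

def first_order_features(img):
    """Histogram-based statistics that ignore where pixels are located."""
    pixels = [p for row in img for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    counts = Counter(pixels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "variance": variance, "entropy": entropy}

def cooccurrence_contrast(img):
    """Contrast over horizontally adjacent pixel pairs (a second-order feature)."""
    pairs = Counter()
    for row in img:
        for a, b in zip(row, row[1:]):
            pairs[(a, b)] += 1
    total = sum(pairs.values())
    return sum(((a - b) ** 2) * c / total for (a, b), c in pairs.items())

# Toy 2x2 patches with integer gray levels.
flat = [[1, 1], [1, 1]]
checker = [[0, 1], [1, 0]]

flat_stats = first_order_features(flat)
checker_stats = first_order_features(checker)
flat_contrast = cooccurrence_contrast(flat)
checker_contrast = cooccurrence_contrast(checker)
```

Second-order features depend on spatial arrangement: shuffling the pixels of a patch leaves its first-order statistics unchanged but can change the co-occurrence contrast.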
Foundational Models in Medical Imaging: A Comprehensive Survey and Future Vision
Foundation models, large-scale, pre-trained deep-learning models adapted to a
wide range of downstream tasks, have recently gained significant interest in
various deep-learning problems, which are undergoing a paradigm shift with the
rise of these models. Trained on large-scale datasets to bridge the gap between
different modalities, foundation models facilitate contextual reasoning,
generalization, and prompt capabilities at test time. The predictions of these
models can be adjusted for new tasks by augmenting the model input with
task-specific hints called prompts without requiring extensive labeled data and
retraining. Capitalizing on the advances in computer vision, medical imaging
has also marked a growing interest in these models. To assist researchers in
navigating this direction, this survey intends to provide a comprehensive
overview of foundation models in the domain of medical imaging. Specifically,
we initiate our exploration by providing an exposition of the fundamental
concepts forming the basis of foundation models. Subsequently, we offer a
methodical taxonomy of foundation models within the medical domain, proposing a
classification system primarily structured around training strategies, while
also incorporating additional facets such as application domains, imaging
modalities, specific organs of interest, and the algorithms integral to these
models. Furthermore, we emphasize the practical use case of some selected
approaches and then discuss the opportunities, applications, and future
directions of these large-scale pre-trained models for analyzing medical
images. In the same vein, we address the prevailing challenges and research
pathways associated with foundational models in medical imaging. These
encompass the areas of interpretability, data management, computational
requirements, and the nuanced issue of contextual comprehension.
Comment: The paper is currently in the process of being prepared for
submission to MI
Rotation-Agnostic Image Representation Learning for Digital Pathology
This paper addresses complex challenges in histopathological image analysis
through three key contributions. Firstly, it introduces a fast patch selection
method, FPS, for whole-slide image (WSI) analysis, significantly reducing
computational cost while maintaining accuracy. Secondly, it presents PathDino,
a lightweight histopathology feature extractor with a minimal configuration of
five Transformer blocks and only 9 million parameters, markedly fewer than
alternatives. Thirdly, it introduces a rotation-agnostic representation
learning paradigm using self-supervised learning, effectively mitigating
overfitting. We also show that our compact model outperforms existing
state-of-the-art histopathology-specific vision transformers on 12 diverse
datasets, including both internal datasets spanning four sites (breast, liver,
skin, and colorectal) and seven public datasets (PANDA, CAMELYON16, BRACS,
DigestPath, Kather, PanNuke, and WSSS4LUAD). Notably, even with a training
dataset of 6 million histopathology patches from The Cancer Genome Atlas
(TCGA), our approach demonstrates an average 8.5% improvement in patch-level
majority vote performance. These contributions provide a robust framework for
enhancing image analysis in digital pathology, rigorously validated through
extensive evaluation. Project Page: https://rhazeslab.github.io/PathDino-Page/
Comment: 23 pages, 10 figures, 18 tables. Histopathological Image Analysi
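The abstract reports patch-level majority vote performance without describing the voting procedure. A minimal sketch, assuming it means taking the most frequent label among a set of patch-level predictions (for example, predictions for patches of one slide or labels of retrieved patches), could look like this. The function name and the example labels are hypothetical.

```python
from collections import Counter

def majority_vote(patch_labels):
    """Return the most frequent label among patch-level predictions.

    Ties are resolved in favour of the label that appears first in the input,
    because Counter.most_common keeps first-seen order for equal counts.
    """
    if not patch_labels:
        raise ValueError("at least one patch label is required")
    return Counter(patch_labels).most_common(1)[0][0]

# Hypothetical predictions for five patches.
decision = majority_vote(["tumor", "normal", "tumor", "tumor", "normal"])
```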
Feature Fusion of Raman Chemical Imaging and Digital Histopathology using Machine Learning for Prostate Cancer Detection
The diagnosis of prostate cancer is challenging due to the heterogeneity of its presentations, leading to the over diagnosis and treatment of non-clinically important disease. Accurate diagnosis can directly benefit a patient’s quality of life and prognosis. Towards addressing this issue, we present a learning model for the automatic identification of prostate cancer. While many prostate cancer studies have adopted Raman spectroscopy approaches, none have utilised the combination of Raman Chemical Imaging (RCI) and other imaging modalities. This study uses multimodal images formed from stained Digital Histopathology (DP) and unstained RCI. The approach was developed and tested on a set of 178 clinical samples from 32 patients, containing a range of non-cancerous, Gleason grade 3 (G3) and grade 4 (G4) tissue microarray samples. For each histological sample, there is a pathologist labelled DP - RCI image pair. The hypothesis tested was whether multimodal image models can outperform single modality baseline models in terms of diagnostic accuracy. Binary non-cancer/cancer models and the more challenging G3/G4 differentiation were investigated. Regarding G3/G4 classification, the multimodal approach achieved a sensitivity of 73.8% and specificity of 88.1% while the baseline DP model showed a sensitivity and specificity of 54.1% and 84.7% respectively. The multimodal approach demonstrated a statistically significant 12.7% AUC advantage over the baseline with a value of 85.8% compared to 73.1%, also outperforming models based solely on RCI and median Raman spectra. Feature fusion of DP and RCI does not improve the more trivial task of tumour identification but does deliver an observed advantage in G3/G4 discrimination. Building on these promising findings, future work could include the acquisition of larger datasets for enhanced model generalization.
Comment: 19 pages, 8 tables, 18 figures
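For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity and AUC are computed in general, and checks the arithmetic of the reported AUC advantage (85.8% versus 73.1%). The confusion counts and scores are made-up illustrative values, not data from the study.

```python
def sensitivity(tp, fn):
    """Fraction of true positives among all actual positives."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of true negatives among all actual negatives."""
    return tn / (tn + fp)

def auc(pos_scores, neg_scores):
    """Probability that a random positive scores above a random negative.

    Ties count as half a win (the pairwise, Mann-Whitney form of AUC).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical counts and scores, for illustration only.
sens = sensitivity(tp=8, fn=2)
spec = specificity(tn=9, fp=1)
toy_auc = auc([0.9, 0.8], [0.1, 0.85])

# AUC values reported in the abstract, in percentage points.
auc_gain = round(85.8 - 73.1, 1)
```

The reported 12.7% advantage is the absolute difference between the two AUC values in percentage points, not a relative improvement.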