195 research outputs found

    HistoMIL: A Python package for training multiple instance learning models on histopathology slides

    Hematoxylin and eosin (H&E) stained slides are widely used in disease diagnosis. Remarkable advances in deep learning have made it possible to detect complex molecular patterns in these histopathology slides, suggesting automated approaches could help inform pathologists’ decisions. Multiple instance learning (MIL) algorithms have shown promise in this context, outperforming transfer learning (TL) methods for various tasks, but their implementation and usage remain complex. We introduce HistoMIL, a Python package designed to streamline the implementation, training and inference process of MIL-based algorithms for computational pathologists and biomedical researchers. It integrates a self-supervised learning module for feature encoding, and a full pipeline encompassing TL and three MIL algorithms: ABMIL, DSMIL, and TransMIL. The PyTorch Lightning framework enables effortless customization and algorithm implementation. We illustrate HistoMIL's capabilities by building predictive models for 2,487 cancer hallmark genes on breast cancer histology slides, achieving AUROC performances of up to 85%.
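
To make the MIL idea concrete, the following is a minimal sketch of attention-based pooling in the spirit of ABMIL: each tile of a slide is an instance, and the slide-level (bag) embedding is a softmax-weighted average of the tile feature vectors. It is not HistoMIL's API. The function name `attention_mil_pool`, the parameters `V` and `w`, and the toy tile features are all assumptions made for illustration; in practice the parameters would be learned and the features would come from an encoder.

```python
import math

def attention_mil_pool(instances, V, w):
    """Pool instance features into one bag embedding (hypothetical helper).

    instances: list of feature vectors, each of length d
    V: d x h projection matrix given as a list of d rows (assumed parameter)
    w: attention vector of length h (assumed parameter)
    """
    scores = []
    for h in instances:
        # Hidden representation tanh(h V), one value per attention unit.
        hidden = [math.tanh(sum(h[i] * V[i][j] for i in range(len(h))))
                  for j in range(len(w))]
        # Unnormalised attention score for this instance.
        scores.append(sum(w[j] * hidden[j] for j in range(len(w))))
    # Numerically stable softmax over the instances in the bag.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Bag embedding: attention-weighted sum of the instance features.
    dim = len(instances[0])
    bag = [sum(weights[k] * instances[k][i] for k in range(len(instances)))
           for i in range(dim)]
    return bag, weights

# Toy bag of three 2-dimensional tile features (made-up numbers).
tiles = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[0.5, -0.5], [0.25, 0.75]]
w = [1.0, -1.0]
bag, weights = attention_mil_pool(tiles, V, w)
```

The attention weights sum to one, so they can also be read as a rough per-tile importance score for the slide-level prediction.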

    Visualization and analysis strategies for dynamic gene-phenotype relationships and their biological interpretation

    The complexity of biological systems is one of their most fascinating and, at the same time, most cryptic aspects. Despite the progress of technology that has enabled measuring biological parameters at deeper levels of detail in time and space, the ability to decipher meaning from these large amounts of heterogeneous data is limited. In order to address this challenge, both analysis and visualization strategies need to be adapted to handle this complexity. At system-wide level, we are still limited in our ability to infer genetic and environmental causes of disease, or consistently compare and link phenotypes. Moreover, despite the increasing availability of time-resolved experiments, the temporal context is often lost. In my thesis, I explored a series of analysis and visualization strategies to compare and connect dynamic phenotypic outcomes of cellular perturbations in a genetic and network context. More specifically, in the first part of my thesis, I focused on the cell cycle as one of the best examples of a complex, highly dynamic process. I applied analysis and data integration methods to investigate phenotypes derived from cell division failure. I examined how such phenotypes may arise as a result of perturbations in the underlying network. To this purpose, I investigated the role of short structural elements at binding interfaces of proteins, called linear motifs, in shaping the cell division network. I assessed their association to different phenotypes, in the context of local perturbations and of disease. This analysis enabled a more detailed understanding of the regulatory mechanisms beyond the malfunctioning of cell division processes, but the ability to compare phenotypes and track their evolution was limited. Exploring large-scale, time-resolved phenotypic screens is still a bottleneck, especially in the visualization area. 
To help address this question, in the subsequent parts of the thesis I proposed novel visualization approaches that would leverage pattern discovery in such heterogeneous, dynamic datasets and enable the generation of new hypotheses. First, I extended an existing visualization tool, Arena3D, to enable the comparison of phenotypes in a genetic and network context. I used this tool to continue the exploration of phenotype-wide differences between outcomes of gene function suppression within mitosis. I also applied it to an investigation of systemic changes in the network of embryonic stem cell fate determinants upon downregulation of the pluripotency factor Nanog. Second, time-resolved tracking of phenotypes opens up new possibilities in exploring how genetic and phenotypic connections evolve through time, an aspect that is largely missing in the visualization area. I developed a novel visualization approach that uses 2D/3D projections to enable the discovery of genetic determinants linking phenotypes through time. I used the resulting tool, PhenoTimer, to investigate the patterns of transitions between phenotypes in cell populations upon perturbation of cell division and the timing of cancer-relevant transcriptional events. I showed the potential of discovering drug synergistic effects by visual mapping of similarities in their mechanisms of action. Overall, these approaches help clarify aspects of the consequences of cell division failure and provide general visualization frameworks that should be of interest to the wider scientific community, for use in the analysis of multidimensional phenotypic screens.

    Pan-Cancer Survey of Tumor Mass Dormancy and Underlying Mutational Processes

    Tumor mass dormancy is the key intermediate step between immune surveillance and cancer progression, yet due to its transitory nature it has been difficult to capture and characterize. Little is understood of its prevalence across cancer types and of the mutational background that may favor such a state. While this balance is finely tuned internally by the equilibrium between cell proliferation and cell death, the main external factors contributing to tumor mass dormancy are immunological and angiogenic. To understand the genomic and cellular context in which tumor mass dormancy may develop, we comprehensively profiled signals of immune and angiogenic dormancy in 9,631 cancers from the Cancer Genome Atlas and linked them to tumor mutagenesis. We find evidence for immunological and angiogenic dormancy-like signals in 16.5% of bulk sequenced tumors, with a frequency of up to 33% in certain tissues. Mutations in the CASP8 and HRAS oncogenes were positively selected in dormant tumors, suggesting an evolutionary pressure for controlling cell growth/apoptosis signals. By surveying the mutational damage patterns left in the genome by known cancer risk factors, we found that aging-induced mutations were relatively depleted in these tumors, while patterns of smoking and defective base excision repair were linked with increased tumor mass dormancy. Furthermore, we identified a link between APOBEC mutagenesis and dormancy, which comes in conjunction with immune exhaustion and may partly depend on the expression of the angiogenesis regulator PLG as well as interferon and chemokine signals. Tumor mass dormancy also appeared to be impaired in hypoxic conditions in the majority of cancers. The microenvironment of dormant cancers was enriched in cytotoxic and regulatory T cells, as expected, but also in macrophages and showed a reduction in inflammatory Th17 signals. Finally, tumor mass dormancy was linked with improved patient survival outcomes. 
Our analysis sheds light onto the complex interplay between dormancy, exhaustion, APOBEC activity and hypoxia, and sets directions for future mechanistic explorations.

    Multi-scale characterisation of homologous recombination deficiency in breast cancer

    BACKGROUND: Homologous recombination is a robust, broadly error-free mechanism of double-strand break repair, and deficiencies lead to PARP inhibitor sensitivity. Patients displaying homologous recombination deficiency can be identified using 'mutational signatures'. However, these patterns are difficult to reliably infer from exome sequencing. Additionally, as mutational signatures are a historical record of mutagenic processes, this limits their utility in describing the current status of a tumour. METHODS: We apply two methods for characterising homologous recombination deficiency in breast cancer to explore the features and heterogeneity associated with this phenotype. We develop a likelihood-based method which leverages small insertions and deletions for high-confidence classification of homologous recombination deficiency for exome-sequenced breast cancers. We then use multinomial elastic net regression modelling to develop a transcriptional signature of heterogeneous homologous recombination deficiency. This signature is then applied to single-cell RNA-sequenced breast cancer cohorts enabling analysis of homologous recombination deficiency heterogeneity and differential patterns of tumour microenvironment interactivity. RESULTS: We demonstrate that the inclusion of indel events, even at low levels, improves homologous recombination deficiency classification. Whilst BRCA-positive homologous recombination deficient samples display strong similarities to those harbouring BRCA1/2 defects, they appear to deviate in microenvironmental features such as hypoxic signalling. We then present a 228-gene transcriptional signature which simultaneously characterises homologous recombination deficiency and BRCA1/2-defect status, and is associated with PARP inhibitor response. 
Finally, we show that this signature is applicable to single-cell transcriptomics data and predict that these cells present a distinct milieu of interactions with their microenvironment compared to their homologous recombination proficient counterparts, typified by a decreased cancer cell response to TNFα signalling. CONCLUSIONS: We apply multi-scale approaches to characterise homologous recombination deficiency in breast cancer through the development of mutational and transcriptional signatures. We demonstrate how indels can improve homologous recombination deficiency classification in exome-sequenced breast cancers. Additionally, we demonstrate the heterogeneity of homologous recombination deficiency, especially in relation to BRCA1/2-defect status, and show that indications of this feature can be captured at a single-cell level, enabling further investigations into interactions between DNA repair deficient cells and their tumour microenvironment.

    A Comparison of Low Read Depth QuantSeq 3′ Sequencing to Total RNA-Seq in FUS Mutant Mice

    Transcriptomics is a developing field with new methods of analysis being produced which may hold advantages in price, accuracy, or information output. QuantSeq is a form of 3′ sequencing produced by Lexogen which aims to obtain similar gene-expression information to RNA-seq with significantly fewer reads, and therefore at a lower cost. QuantSeq is also able to provide information on differential polyadenylation. We applied both QuantSeq at low read depth and total RNA-seq to the same two sets of mouse spinal cord RNAs, each comprising four controls and four mutants related to the neurodegenerative disease amyotrophic lateral sclerosis. We found substantial differences in which genes were found to be significantly differentially expressed by the two methods. Some of this difference is likely due to the difference in number of reads between our QuantSeq and RNA-seq data. Other sources of difference can be explained by the differences in the way the two methods handle genes with different primary transcript lengths and how likely each method is to find a gene to be differentially expressed at different levels of overall gene expression. This work highlights how different methods aiming to assess expression difference can lead to different results.

    Genomic and microenvironmental heterogeneity shaping epithelial-to-mesenchymal trajectories in cancer

    The epithelial to mesenchymal transition (EMT) is a key cellular process underlying cancer progression, with multiple intermediate states whose molecular hallmarks remain poorly characterised. To fill this gap, we present a method to robustly evaluate EMT transformation in individual tumours based on transcriptomic signals. We apply this approach to explore EMT trajectories in 7180 tumours of epithelial origin and identify three macro-states with prognostic and therapeutic value, attributable to epithelial, hybrid E/M and mesenchymal phenotypes. We show that the hybrid state is relatively stable and linked with increased aneuploidy. We further employ spatial transcriptomics and single cell datasets to explore the spatial heterogeneity of EMT transformation and distinct interaction patterns with cytotoxic, NK cells and fibroblasts in the tumour microenvironment. Additionally, we provide a catalogue of genomic events underlying distinct evolutionary constraints on EMT transformation. This study sheds light on the aetiology of distinct stages along the EMT trajectory, and highlights broader genomic and environmental hallmarks shaping the mesenchymal transformation of primary tumours.

    Visualizing time-related data in biology, a review

    Time is of the essence, also in biology. Monitoring disease progression and timing developmental defects are key aspects of drug discovery and therapy trials. Furthermore, before deciphering the course of evolution of these complex processes, we need an understanding of the basic dynamics of biological phenomena that are often strictly time-regulated (e.g. circadian rhythms). With the advances in technologies able to measure timing effects and dynamics of regulatory aspects, visualization and analysis tools try to keep pace with the new challenge. Beyond the classical timeline plots, notable attempts at more involved temporal interpretation have been made in recent years, but awareness of the available resources is still limited within the scientific community. Here we review some of the advances in biological visualization of time-driven processes and look at how they allow analyzing data now and in the future.

    How neurons migrate: a dynamic in-silico model of neuronal migration in the developing cortex

    BACKGROUND: Neuronal migration, the process by which neurons migrate from their place of origin to their final position in the brain, is a central process for normal brain development and function. Advances in experimental techniques have revealed much about many of the molecular components involved in this process. Notwithstanding these advances, how the molecular machinery works together to govern the migration process has yet to be fully understood. Here we present a computational model of neuronal migration, in which four key molecular entities, Lis1, DCX, Reelin and GABA, form a molecular program that mediates the migration process. RESULTS: The model simulated the dynamic migration process, consistent with in-vivo observations of morphological, cellular and population-level phenomena. Specifically, the model reproduced migration phases, cellular dynamics and population distributions that concur with experimental observations in normal neuronal development. We tested the model under reduced activity of Lis1 and DCX and found an aberrant development similar to observations in Lis1 and DCX silencing expression experiments. Analysis of the model gave rise to unforeseen insights that could guide future experimental study. Specifically: (1) the model revealed the possibility that under conditions of Lis1 reduced expression, neurons experience an oscillatory neuron-glial association prior to the multipolar stage; and (2) we hypothesized that observed morphology variations in rats and mice may be explained by a single difference in the way that Lis1 and DCX stimulate bipolar motility.
From this we make the following predictions: (1) under reduced Lis1 and enhanced DCX expression, we predict a reduced bipolar migration in rats, and (2) under enhanced DCX expression in mice we predict a normal or a higher bipolar migration. CONCLUSIONS: We present here a system-wide computational model of neuronal migration that integrates theory and data within a precise, testable framework. Our model accounts for a range of observable behaviors and affords a computational framework to study aspects of neuronal migration as a complex process that is driven by a relatively simple molecular program. Analysis of the model generated new hypotheses and yet unobserved phenomena that may guide future experimental studies. This paper thus reports a first step toward a comprehensive in-silico model of neuronal migration.

    Simulation-based model selection for dynamical systems in systems and population biology

    Computer simulations have become an important tool across the biomedical sciences and beyond. For many important problems several different models or hypotheses exist and choosing which one best describes reality or observed data is not straightforward. We therefore require suitable statistical tools that allow us to choose rationally between different mechanistic models of e.g. signal transduction or gene regulation networks. This is particularly challenging in systems biology where only a small number of molecular species can be assayed at any given time and all measurements are subject to measurement uncertainty. Here we develop such a model selection framework based on approximate Bayesian computation and employing sequential Monte Carlo sampling. We show that our approach can be applied across a wide range of biological scenarios, and we illustrate its use on real data describing influenza dynamics and the JAK-STAT signalling pathway. Bayesian model selection strikes a balance between the complexity of the simulation models and their ability to describe observed data. The present approach enables us to employ the whole formal apparatus to any system that can be (efficiently) simulated, even when exact likelihoods are computationally intractable. Comment: This article is in press in Bioinformatics, 2009. Advance Access is available on Bioinformatics webpage.
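
As a rough illustration of simulation-based model selection, the sketch below uses plain ABC rejection sampling rather than the sequential Monte Carlo scheme the abstract describes, and a toy coin-flip problem rather than the influenza or JAK-STAT data. Everything in it is an assumption made for illustration: the two candidate models (a fair coin versus a coin with unknown bias drawn uniformly from [0, 1]), the function names, and the tolerance `eps`. The fraction of accepted simulations per model approximates the posterior model probabilities.

```python
import random

def simulate(model, n, rng):
    """Simulate the number of heads in n flips under a candidate model.

    Model 0 (assumed): fair coin, no free parameter.
    Model 1 (assumed): bias p drawn from a Uniform(0, 1) prior.
    """
    p = 0.5 if model == 0 else rng.random()
    return sum(rng.random() < p for _ in range(n))

def abc_model_selection(k_obs, n, n_draws=20000, eps=0, seed=1):
    """Approximate posterior model probabilities by ABC rejection."""
    rng = random.Random(seed)
    accepted = [0, 0]
    for _ in range(n_draws):
        model = rng.randrange(2)  # uniform prior over the two models
        # Keep the draw only if the simulated data are close to the observation.
        if abs(simulate(model, n, rng) - k_obs) <= eps:
            accepted[model] += 1
    total = sum(accepted)
    if total == 0:
        raise ValueError("no simulations accepted; increase eps or n_draws")
    return [a / total for a in accepted]

# 18 heads in 20 flips is hard to explain with a fair coin.
posterior_skewed = abc_model_selection(k_obs=18, n=20)
# 10 heads in 20 flips is explained well by the simpler, fair-coin model.
posterior_balanced = abc_model_selection(k_obs=10, n=20)
```

The second call shows the complexity trade-off mentioned in the abstract: the more flexible model can also produce 10 heads, but it spreads its predictions over many outcomes, so the simpler model collects more of the accepted simulations.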