4,093 research outputs found

    Development of a simple artificial intelligence method to accurately subtype breast cancers based on gene expression barcodes

    Magister Scientiae - MSc. INTRODUCTION: Breast cancer is a highly heterogeneous disease. The complexity of achieving an accurate diagnosis and an effective treatment regimen lies within this heterogeneity. Subtypes of the disease are not simply molecular, i.e. hormone receptor over-expression or absence; the tumour itself is also heterogeneous in terms of tissue of origin, metastases, and histopathological variability. Accurate tumour classification vastly improves treatment decisions, patient outcomes and 5-year survival rates. Gene expression studies based on transcriptomic technologies such as microarrays and next-generation sequencing (e.g. RNA-Sequencing) have helped oncology researchers and clinicians understand the complex molecular portraits of malignant breast tumours. Mechanisms governing cancers, which include tumorigenesis, gene fusions, gene over-expression and suppression, and cellular process and pathway involvement, have been elucidated through comprehensive analyses of the cancer transcriptome. Over the past 20 years, gene expression signatures discovered with both microarray and RNA-Seq have reached clinical and commercial application through the development of tests such as Mammaprint®, OncotypeDX®, and FoundationOne® CDx, all of which focus on chemotherapy sensitivity, prediction of cancer recurrence, and tumour mutational level. The Gene Expression Barcode (GExB) algorithm was developed to allow for easy interpretation and integration of microarray data through data normalization with frozen RMA (fRMA) preprocessing and conversion of relative gene expression to a sequence of 1's and 0's. Unfortunately, the algorithm has not yet been developed for RNA-Seq data. However, implementation of the GExB with feature-selection would contribute to a robust machine-learning-based breast cancer and subtype classifier. METHODOLOGY: For microarray data, we applied the GExB algorithm to generate barcodes for normal breast and breast tumour samples. 
A two-class classifier for malignancy was developed through feature-selection on barcoded samples by selecting genes with 85% stable absence or presence within a tissue type that were differentially stable between tissues. A multi-class feature-selection method was employed to identify genes with variable expression in one subtype, but 80% stable absence or presence in all other subtypes, i.e. 80% in n-1 subtypes. For RNA-Seq data, a barcoding method needed to be developed which could mimic the GExB algorithm for microarray data. A z-score-to-barcode method was implemented, together with differential gene expression analysis that selected the top 100 genes as informative features for classification. The accuracy and discriminatory capability of both the microarray-based and the RNA-Seq-based gene signatures were assessed through unsupervised and supervised machine-learning algorithms, i.e., K-means and hierarchical clustering, as well as binary and multi-class Support Vector Machine (SVM) implementations. RESULTS: The GExB-FS method for microarray data yielded an 85-probe and a 346-probe informative set for the two-class and multi-class classifiers, respectively. The two-class classifier predicted samples as either normal or malignant with 100% accuracy, and the multi-class classifier predicted molecular subtype with 96.5% accuracy with SVM. Combining RNA-Seq DE analysis for feature-selection with the z-score-to-barcode method resulted in a two-class classifier for malignancy, and a multi-class classifier for normal-from-healthy, normal-adjacent-tumour (from cancer patients), and breast tumour samples with 100% accuracy. Most notably, a normal-adjacent-tumour gene expression signature emerged, which differentiated it from normal breast tissues in healthy individuals. CONCLUSION: A potentially novel method for microarray and RNA-Seq data transformation, feature selection and classifier development was established. 
The universal application of the microarray signatures and the validity of the z-score-to-barcode method were demonstrated by the 95% accurate classification of RNA-Seq barcoded samples with a microarray-discovered gene expression signature. The results from this comprehensive study into the discovery of robust gene expression signatures hold immense potential for further R&D towards implementation at the clinical endpoint, and translation to simpler and cost-effective laboratory methods such as qtPCR-based tests.
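The barcoding and stability-based feature selection described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact z-score rule, so the version below is an assumption: a gene is coded 1 in a sample when its per-gene z-score exceeds a threshold of 0. The function names (`zscore_barcode`, `stable_state`, `discriminating_genes`) are hypothetical and are not taken from the thesis.

```python
import numpy as np

def zscore_barcode(expr, threshold=0.0):
    """Binarize a genes x samples expression matrix (assumed rule, not the thesis's).

    Each gene is standardized across samples; entries with z > threshold become 1.
    """
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1, keepdims=True)
    std = expr.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # constant genes get z = 0 everywhere
    z = (expr - mean) / std
    return (z > threshold).astype(int)

def stable_state(barcodes, frac=0.85):
    """Per gene: 1 if present in >= frac of samples, 0 if absent in >= frac, else -1."""
    present = np.asarray(barcodes).mean(axis=1)
    return np.where(present >= frac, 1, np.where(present <= 1 - frac, 0, -1))

def discriminating_genes(normal_bc, tumour_bc, frac=0.85):
    """Indices of genes that are stable in both tissues but in opposite states."""
    a = stable_state(normal_bc, frac)
    b = stable_state(tumour_bc, frac)
    return np.flatnonzero((a >= 0) & (b >= 0) & (a != b))
```

In practice the barcodes would be computed on the pooled sample matrix and then split by tissue type before calling `discriminating_genes`; the 0.85 default mirrors the 85% stability figure quoted in the abstract.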

    Algorithms for cancer genome data analysis - Learning techniques for ITH modeling and gene fusion classification

    The abstract is in the attachment.

    A Path to Implement Precision Child Health Cardiovascular Medicine.

    Congenital heart defects (CHDs) affect approximately 1% of live births and are a major source of childhood morbidity and mortality even in countries with advanced healthcare systems. Along with phenotypic heterogeneity, the underlying etiology of CHDs is multifactorial, involving genetic, epigenetic, and/or environmental contributors. Clear dissection of the underlying mechanism is a powerful step to establish individualized therapies. However, for the majority of CHDs the underlying genetic and environmental factors are yet to be clearly identified, and even fewer have effective therapies. Although the survival rate for CHDs is steadily improving, there is still a significant unmet need for refining diagnostic precision and establishing targeted therapies to optimize life quality and to minimize future complications. In particular, proper identification of disease-associated genetic variants in humans has been challenging, and this greatly impedes our ability to delineate gene-environment interactions that contribute to the pathogenesis of CHDs. Implementing a systematic multileveled approach can establish a continuum from phenotypic characterization in the clinic to molecular dissection using combined next-generation sequencing platforms and validation studies in suitable models at the bench. Key elements necessary to advance the field are: first, proper delineation of the phenotypic spectrum of CHDs; second, defining the molecular genotype/phenotype by combining whole-exome sequencing and transcriptome analysis; third, integration of phenotypic, genotypic, and molecular datasets to identify molecular networks contributing to CHDs; fourth, generation of relevant disease models and multileveled experimental investigations. In order to achieve all these goals, access to high-quality biological specimens from well-defined patient cohorts is a crucial step. 
Therefore, establishing a CHD BioCore is an essential infrastructure and a critical step on the path toward precision child health cardiovascular medicine

    Network-Based Approaches To Identify The Impacted Genes And Active Interactions

    A very important step in systems biology is the identification of the networks that are most impacted in a given phenotype. Such networks explain how the target genes are affected by other genes, and therefore describe the mechanisms involved in a biological process. The identified networks are used to: 1) predict the disease or the responses of the system to a specific impact, 2) find the subset of genes that interact with each other and play an important role in the condition of interest, and 3) understand the mechanisms involved in that condition. In this thesis, we propose an approach that takes advantage of pre-defined pathways obtained from existing databases to identify the impact of the studied phenotype on such pathways. Next, we introduce a method able to build a network that captures the putative mechanisms at play in the given condition, by using datasets from multiple experiments studying the same phenotype. This method takes advantage of known interactions extracted from multiple sources such as protein-protein interactions and curated biological pathways. Based on such prior knowledge, we overcome the drawbacks of snap-shot data by considering the possible effects of each gene on its neighbors.

    An Optimal Time for Treatment-Predicting Circadian Time by Machine Learning and Mathematical Modelling

    Tailoring medical interventions to a particular patient and pathology has been termed personalized medicine. The outcome of cancer treatments is improved when the intervention is timed in accordance with the patient's internal time. Yet, one challenge of personalized medicine is how to consider the biological time of the patient. A prerequisite for this so-called chronotherapy is an accurate characterization of the internal circadian time of the patient. As an alternative to time-consuming measurements in a sleep laboratory, recent studies in chronobiology predict circadian time by applying machine learning approaches and mathematical modelling to more easily accessible observables such as gene expression. Embedding these results into the mathematical dynamics between clock and cancer in mammals, we review the precision of predictions and the potential usage with respect to cancer treatment, and discuss whether the patient's internal time and circadian observables may provide an additional indication for individualized treatment timing. Besides the health improvement, timing treatment may imply financial advantages by ameliorating side effects of treatments, thus reducing costs. Summarizing the advances of recent years, this review brings together the current clinical standard for measuring biological time, the general assessment of circadian rhythmicity, the usage of rhythmic variables to predict biological time and models of circadian rhythmicity.

    An updated State-of-the-Art Overview of transcriptomic Deconvolution Methods

    Although bulk transcriptomic analyses have significantly contributed to an enhanced comprehension of multifaceted diseases, their exploration capacity is impeded by the heterogeneous compositions of biological samples. Indeed, by averaging expression of multiple cell types, RNA-Seq analysis is oblivious to variations in cellular changes, hindering the identification of the internal constituents of tissues involved in disease progression. On the other hand, single-cell techniques are still time-, manpower- and resource-consuming analyses. To address the intrinsic limitations of both bulk and single-cell methodologies, computational deconvolution techniques have been developed to estimate the frequencies of cell subtypes within complex tissues. These methods are especially valuable for dissecting intricate tissue niches, with a particular focus on tumour microenvironments (TME). In this paper, we offer a comprehensive overview of deconvolution techniques, classifying them based on their methodological characteristics, the type of prior knowledge required for the algorithm, and the statistical constraints they address. Within each category identified, we delve into the theoretical aspects for implementing the underlying method, while providing an in-depth discussion of their main advantages and disadvantages in supplementary materials. Notably, we emphasise the advantages of cutting-edge deconvolution tools based on probabilistic models, as they offer robust statistical frameworks that closely align with biological realities. We anticipate that this review will provide valuable guidelines for computational bioinformaticians in order to select the appropriate method in alignment with their statistical and biological objectives. We ultimately end this review by discussing open challenges that must be addressed to accurately quantify closely related cell types from RNA sequencing data, and the complementary role of single-cell RNA-Seq to that purpose.

    Systems Biology Approaches For The Analysis Of High-Throughput Biological Data

    The identification of biological processes involved with a certain phenotype, such as a disease or drug treatment, is the goal of the majority of life sciences experiments. Pathway analysis methods are used to interpret high-throughput biological data to identify such processes by incorporating information on biological systems to translate data into biological knowledge. Although widely used, current methods share a number of limitations. First, they do not take into account the individual contribution of each gene to the phenotype under analysis. Second, most of the methods include parameters that are difficult to interpret and often arbitrarily set. Third, the results of all methods are affected by the fact that pathways are not independent entities, but communicate with each other by a phenomenon referred to as crosstalk. Crosstalk effects heavily influence the results of pathway analysis methods, adding a number of false positives and false negatives, making them difficult to interpret. We developed methods that address these limitations by i) allowing for the incorporation of individual gene contributions, ii) developing objective methods for the estimation of parameters of pathway analysis methods, and iii) developing an approach able to detect, quantify, and correct for crosstalk effects. We show on a number of real and simulated datasets that our approaches increase the specificity and sensitivity of pathway analysis, allowing for a more effective identification of the processes and mechanisms underlying biological phenomena.

    An Investigation Of Gene Networks Influenced By Low Dose Ionizing Radiation Using Statistical And Graph Theoretical Algorithms

    Increased application of radiation in the health and security sectors has raised concerns about its deleterious effects. Ionizing radiation (IR) of less than 10 cGy is considered low dose ionizing radiation (LDIR) by the National Research Committee to assess health risks from exposure to low levels of IR. It is hard to extract the effects of a mild stimulus such as LDIR on gene expression profiles using simple differential expression. We hypothesized that differential correlation would instead capture the effects of LDIR on mutual relationships between genes. We tested this hypothesis on expression profiles from five inbred strains of mice treated with LDIR. Whereas ANOVA detected little effect of LDIR on gene expression, a differential correlation graph generated by a two-stage statistical filter revealed gene networks enriched with genes implicated in radiation response, DNA damage repair, apoptosis, cancer and the immune system. To mimic the effects of radiation on human populations, we profiled baseline expression of recombinant inbred strains of BXD mice derived from a cross between the C57BL/6J and DBA/2J standard inbred strains. To establish a threshold for extraction of gene networks from the baseline expression profiles, we compared gene enrichment in paracliques obtained at different absolute Pearson correlations (APC) using graph algorithms. Gene networks extracted at a statistically significant APC (r≈0.41) exhibited even better enrichment of genes participating in common biological processes than networks extracted at higher APCs from 0.6 to 0.875. Since the immune response is influenced by LDIR, we investigated the effects of genetic background on the variability of the immune system in a population of BXD mice. Considering immune response as a complex trait, we identified significant QTLs explaining the ratio of CD8+ and CD4+ T-cells. 
Multiple regression modeling of genes neighboring statistically significant QTLs identified three candidate genes (Ptprk, Acp1 and Lamb1-1) explaining 61% of the variance in the ratio of CD4+ and CD8+ T cells. Expression profiling of the parental strains of BXD mice also revealed effects of LDIR and LDIR*strain on the expression of genes related to immune response. Thus, using an integrated approach involving transcriptomic, SNP and immunological data, we have developed novel methods to pinpoint candidate gene networks putatively influenced by LDIR.
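The idea of differential correlation used in this abstract can be sketched as follows. The abstract does not describe its two-stage statistical filter, so this example substitutes a single, standard test as an assumption: Pearson correlations are computed per group, Fisher z-transformed, and gene pairs whose correlation differs significantly between control and treated samples are kept as graph edges. The function name `differential_correlation` and the `alpha` default are hypothetical.

```python
import numpy as np
from math import erfc, sqrt

def differential_correlation(ctrl, treat, alpha=0.01):
    """Return (i, j, p) for gene pairs whose correlation differs between groups.

    ctrl, treat: genes x samples matrices with the same gene order.
    Uses a Fisher z-test as a stand-in for the abstract's two-stage filter.
    """
    ctrl = np.asarray(ctrl, dtype=float)
    treat = np.asarray(treat, dtype=float)
    n1, n2 = ctrl.shape[1], treat.shape[1]
    if min(n1, n2) <= 3:
        raise ValueError("each group needs more than 3 samples")
    # Clip so that arctanh stays finite for perfectly correlated pairs.
    r1 = np.clip(np.corrcoef(ctrl), -0.999999, 0.999999)
    r2 = np.clip(np.corrcoef(treat), -0.999999, 0.999999)
    z = (np.arctanh(r1) - np.arctanh(r2)) / sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    edges = []
    rows, cols = np.triu_indices(ctrl.shape[0], k=1)
    for i, j in zip(rows, cols):
        p = erfc(abs(z[i, j]) / sqrt(2.0))  # two-sided normal p-value
        if p < alpha:
            edges.append((int(i), int(j), float(p)))
    return edges
```

The returned edge list can be loaded into any graph library to look for densely connected gene groups; no multiple-testing correction is applied here, which a real analysis over many gene pairs would need.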