143 research outputs found

    A Comparison of Methods for Data-Driven Cancer Outlier Discovery, and An Application Scheme to Semisupervised Predictive Biomarker Discovery

    Get PDF
    A core component in translational cancer research is biomarker discovery using gene expression profiling for clinical tumors. This is often based on cell line experiments; one population is sampled for inference in another. We disclose a semisupervised workflow focusing on binary (switch-like, bimodal) informative genes that are likely cancer relevant, to mitigate this non-statistical problem. Outlier detection is a key enabling technology of the workflow, and aids in identifying the focus genes

    Gene Expression Profiling Predicts Survival in Conventional Renal Cell Carcinoma

    Get PDF
    BACKGROUND: Conventional renal cell carcinoma (cRCC) accounts for most of the deaths due to kidney cancer. Tumor stage, grade, and patient performance status are used currently to predict survival after surgery. Our goal was to identify gene expression features, using comprehensive gene expression profiling, that correlate with survival. METHODS AND FINDINGS: Gene expression profiles were determined in 177 primary cRCCs using DNA microarrays. Unsupervised hierarchical clustering analysis segregated cRCC into five gene expression subgroups. Expression subgroup was correlated with survival in long-term follow-up and was independent of grade, stage, and performance status. The tumors were then divided evenly into training and test sets that were balanced for grade, stage, performance status, and length of follow-up. A semisupervised learning algorithm (supervised principal components analysis) was applied to identify transcripts whose expression was associated with survival in the training set, and the performance of this gene expression-based survival predictor was assessed using the test set. With this method, we identified 259 genes that accurately predicted disease-specific survival among patients in the independent validation group (p < 0.001). In multivariate analysis, the gene expression predictor was a strong predictor of survival independent of tumor stage, grade, and performance status (p < 0.001). CONCLUSIONS: cRCC displays molecular heterogeneity and can be separated into gene expression subgroups that correlate with survival after surgery. We have identified a set of 259 genes that predict survival after surgery independent of clinical prognostic factors

    Gains in Power from Structured Two-Sample Tests of Means on Graphs

    Get PDF
    We consider multivariate two-sample tests of means, where the location shift between the two populations is expected to be related to a known graph structure. An important application of such tests is the detection of differentially expressed genes between two patient populations, as shifts in expression levels are expected to be coherent with the structure of graphs reflecting gene properties such as biological process, molecular function, regulation, or metabolism. For a fixed graph of interest, we demonstrate that accounting for graph structure can yield more powerful tests under the assumption of smooth distribution shift on the graph. We also investigate the identification of non-homogeneous subgraphs of a given large graph, which poses both computational and multiple testing problems. The relevance and benefits of the proposed approach are illustrated on synthetic data and on breast cancer gene expression data analyzed in context of KEGG pathways

    Machine learning applied to enzyme turnover numbers reveals protein structural correlates and improves metabolic models.

    Get PDF
    Knowing the catalytic turnover numbers of enzymes is essential for understanding the growth rate, proteome composition, and physiology of organisms, but experimental data on enzyme turnover numbers is sparse and noisy. Here, we demonstrate that machine learning can successfully predict catalytic turnover numbers in Escherichia coli based on integrated data on enzyme biochemistry, protein structure, and network context. We identify a diverse set of features that are consistently predictive for both in vivo and in vitro enzyme turnover rates, revealing novel protein structural correlates of catalytic turnover. We use our predictions to parameterize two mechanistic genome-scale modelling frameworks for proteome-limited metabolism, leading to significantly higher accuracy in the prediction of quantitative proteome data than previous approaches. The presented machine learning models thus provide a valuable tool for understanding metabolism and the proteome at the genome scale, and elucidate structural, biochemical, and network properties that underlie enzyme kinetics

    Application of Artificial Intelligence in Modern Healthcare System

    Get PDF
    Artificial intelligence (AI) has the potential of detecting significant interactions in a dataset and also it is widely used in several clinical conditions to expect the results, treat, and diagnose. Artificial intelligence (AI) is being used or trialed for a variety of healthcare and research purposes, including detection of disease, management of chronic conditions, delivery of health services, and drug discovery. In this chapter, we will discuss the application of artificial intelligence (AI) in modern healthcare system and the challenges of this system in detail. Different types of artificial intelligence devices are described in this chapter with the help of working mechanism discussion. Alginate, a naturally available polymer found in the cell wall of the brown algae, is used in tissue engineering because of its biocompatibility, low cost, and easy gelation. It is composed of α-L-guluronic and β-D-manuronic acid. To improve the cell-material interaction and erratic degradation, alginate is blended with other polymers. Here, we discuss the relationship of artificial intelligence with alginate in tissue engineering fields

    Mimicry Embedding Facilitates Advanced Neural Network Training for Image-Based Pathogen Detection.

    Get PDF
    The use of deep neural networks (DNNs) for analysis of complex biomedical images shows great promise but is hampered by a lack of large verified data sets for rapid network evolution. Here, we present a novel strategy, termed "mimicry embedding," for rapid application of neural network architecture-based analysis of pathogen imaging data sets. Embedding of a novel host-pathogen data set, such that it mimics a verified data set, enables efficient deep learning using high expressive capacity architectures and seamless architecture switching. We applied this strategy across various microbiological phenotypes, from superresolved viruses to in vitro and in vivo parasitic infections. We demonstrate that mimicry embedding enables efficient and accurate analysis of two- and three-dimensional microscopy data sets. The results suggest that transfer learning from pretrained network data may be a powerful general strategy for analysis of heterogeneous pathogen fluorescence imaging data sets.IMPORTANCE In biology, the use of deep neural networks (DNNs) for analysis of pathogen infection is hampered by a lack of large verified data sets needed for rapid network evolution. Artificial neural networks detect handwritten digits with high precision thanks to large data sets, such as MNIST, that allow nearly unlimited training. Here, we developed a novel strategy we call mimicry embedding, which allows artificial intelligence (AI)-based analysis of variable pathogen-host data sets. We show that deep learning can be used to detect and classify single pathogens based on small differences

    Multiplatform biomarker identification using a data-driven approach enables single-sample classification

    Get PDF
    Background: High-throughput gene expression profiles have allowed discovery of potential biomarkers enabling early diagnosis, prognosis and developing individualized treatment. However, it remains a challenge to identify a set of reliable and reproducible biomarkers across various gene expression platforms and laboratories for single sample diagnosis and prognosis. We address this need with our Data-Driven Reference (DDR) approach, which employs stably expressed housekeeping genes as references to eliminate platform-specific biases and non-biological variabilities. Results: Our method identifies biomarkers with “built-in” features, and these features can be interpreted consistently regardless of profiling technology, which enable classification of single-sample independent of platforms. Validation with RNA-seq data of blood platelets shows that DDR achieves the superior performance in classification of six different tumor types as well as molecular target statuses (such as MET or HER2-positive, and mutant KRAS, EGFR or PIK3CA) with smaller sets of biomarkers. We demonstrate on the three microarray datasets that our method is capable of identifying robust biomarkers for subgrouping medulloblastoma samples with data perturbation due to different microarray platforms. In addition to identifying the majority of subgroup-specific biomarkers in CodeSet of nanoString, some potential new biomarkers for subgrouping medulloblastoma were detected by our method. Conclusions: In this study, we present a simple, yet powerful data-driven method which contributes significantly to identification of robust cross-platform gene signature for disease classification of single-patient to facilitate precision medicine. In addition, our method provides a new strategy for transcriptome analysis
    corecore