Search CORE

143 research outputs found

A Comparison of Methods for Data-Driven Cancer Outlier Discovery, and An Application Scheme to Semisupervised Predictive Biomarker Discovery

Author: Karrila Seppo
Lee Julian Hock Ean
Tucker-Kellogg Greg
Publication venue: Libertas Academica
Publication date: 01/01/2011

A core component of translational cancer research is biomarker discovery using gene expression profiling of clinical tumors. This discovery is often based on cell line experiments, in which one population is sampled to make inferences about another. We present a semisupervised workflow that focuses on binary (switch-like, bimodal) informative genes that are likely to be cancer-relevant, to mitigate this non-statistical problem. Outlier detection is a key enabling technology of the workflow and aids in identifying the focus genes.

Crossref

Directory of Open Access Journals

PubMed Central

ScholarBank@NUS
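The abstract names outlier detection as the step that picks out switch-like genes, but it does not say which detector the workflow uses. As a rough illustration only, the Python sketch below applies a generic median/MAD robust z-score rule, keeping a gene when a minority of its samples are extreme outliers. The function names, the 3.5 cutoff, and the 5 to 50 percent subgroup bounds are hypothetical choices, not the authors' settings.

```python
import numpy as np

def robust_z(expr: np.ndarray) -> np.ndarray:
    """Robust z-scores computed per gene (rows = genes, columns = samples)."""
    med = np.median(expr, axis=1, keepdims=True)
    mad = np.median(np.abs(expr - med), axis=1, keepdims=True)
    # 1.4826 * MAD estimates the standard deviation for normal data;
    # the floor avoids division by zero for genes whose MAD is 0.
    scale = np.maximum(1.4826 * mad, 1e-12)
    return (expr - med) / scale

def switch_like_genes(expr: np.ndarray, z_cut: float = 3.5,
                      min_frac: float = 0.05, max_frac: float = 0.5) -> np.ndarray:
    """Indices of genes whose outlying samples form a minority subgroup.

    Hypothetical rule: a gene qualifies if between min_frac and max_frac
    of its samples have |robust z| > z_cut.
    """
    frac_out = np.mean(np.abs(robust_z(expr)) > z_cut, axis=1)
    return np.flatnonzero((frac_out >= min_frac) & (frac_out <= max_frac))

# Toy data: gene 0 is unimodal; gene 1 is "switched on" in 2 of 20 samples.
expr = np.vstack([np.linspace(-1, 1, 20), np.linspace(-1, 1, 20)])
expr[1, -2:] = 10.0
print(switch_like_genes(expr))  # [1]
```

The median and MAD are used instead of the mean and standard deviation so that the switched-on samples do not inflate the scale estimate and hide themselves.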

Gene Expression Profiling Predicts Survival in Conventional Renal Cell Carcinoma

Author: Bair
Ben-Tovim Jones
Boer
Bui
Bullinger
Börje Ljungberg
Eisen
Flanigan
Francesco Marincola
Frank
Gnarra
Gollub
Higgins
Hongjuan Zhao
Ivan
James D Brooks
Jemal
Kjell Grankvist
Lossos
Negrier
Paik
Patard
Perou
Ramaswamy
Robert Tibshirani
Rosenwald
Schuetz
Sorbellini
Takahashi
Torgny Rasmuson
van de Vijver
Vasselli
Vogelzang
Yao
Publication venue: Public Library of Science
Publication date: 01/12/2005

BACKGROUND: Conventional renal cell carcinoma (cRCC) accounts for most deaths due to kidney cancer. Tumor stage, grade, and patient performance status are currently used to predict survival after surgery. Our goal was to use comprehensive gene expression profiling to identify gene expression features that correlate with survival.

METHODS AND FINDINGS: Gene expression profiles were determined in 177 primary cRCCs using DNA microarrays. Unsupervised hierarchical clustering segregated cRCC into five gene expression subgroups. Expression subgroup correlated with survival in long-term follow-up and was independent of grade, stage, and performance status. The tumors were then divided evenly into training and test sets balanced for grade, stage, performance status, and length of follow-up. A semisupervised learning algorithm (supervised principal components analysis) was applied to identify transcripts whose expression was associated with survival in the training set, and the performance of this gene expression-based survival predictor was assessed on the test set. With this method, we identified 259 genes that accurately predicted disease-specific survival among patients in the independent validation group (p < 0.001). In multivariate analysis, the gene expression predictor was a strong predictor of survival independent of tumor stage, grade, and performance status (p < 0.001).

CONCLUSIONS: cRCC displays molecular heterogeneity and can be separated into gene expression subgroups that correlate with survival after surgery. We have identified a set of 259 genes that predict survival after surgery independent of clinical prognostic factors.

Crossref

Directory of Open Access Journals

PubMed Central

FigShare
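Supervised principal components, as named in the abstract, screens genes by their univariate association with the outcome and then summarizes the retained genes by their first principal component. The sketch below is a simplified illustration under stated assumptions: it uses a continuous outcome with Pearson correlation for screening, whereas in the survival setting the screening is usually based on Cox regression scores. The function names, the toy data, and the `n_genes` cutoff are hypothetical.

```python
import numpy as np

def fit_supervised_pc(X, y, n_genes=10):
    """X: samples x genes (training set); y: continuous outcome per sample.

    Returns the screened gene indices, the first-PC loading, and gene means.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    yc = y - y.mean()
    # Step 1: univariate screening by absolute Pearson correlation with y.
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = (Xc.T @ yc) / np.maximum(denom, 1e-12)
    selected = np.argsort(-np.abs(corr))[:n_genes]
    # Step 2: first principal component of the screened genes (via SVD).
    _, _, vt = np.linalg.svd(Xc[:, selected], full_matrices=False)
    return selected, vt[0], mean

def supervised_pc_score(X, selected, loading, mean):
    """Project (new) samples onto the supervised principal component."""
    return (X - mean)[:, selected] @ loading

# Toy data: 10 of 200 genes track a latent outcome; split into train/test halves.
rng = np.random.default_rng(0)
latent = rng.normal(size=100)
X = rng.normal(size=(100, 200))
X[:, :10] = latent[:, None] + 0.1 * rng.normal(size=(100, 10))
sel, loading, mean = fit_supervised_pc(X[:50], latent[:50])
test_score = supervised_pc_score(X[50:], sel, loading, mean)
print(abs(np.corrcoef(test_score, latent[50:])[0, 1]))
```

Fitting on the first half and scoring the held-out half loosely mirrors the training/test split described in the abstract; on this toy data the held-out score should correlate strongly with the outcome. The sign of a principal component is arbitrary, which is why the check uses the absolute correlation.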

Examining the Classification Accuracy of TSVMs with Feature Selection in Comparison with the GLAD Algorithm

Author: A Gommerman
A S M Yong
A Zien
C Harris
F Valafar
Hala Helmi
I Guyon
J Han
Jonathan M. Garibaldi
K Bennett
M A Shipp
M P Brown
R Collobert
R Zhang
S Abney
T Jaakkola
T Joachims
T R Golub
Uwe Aickelin
X Zhu
Publication venue: Elsevier BV
Publication date: 01/01/2011

Crossref

Gains in Power from Structured Two-Sample Tests of Means on Graphs

Author: Dudoit Sandrine
Jacob Laurent
Neuvial Pierre
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2010

We consider multivariate two-sample tests of means, where the location shift between the two populations is expected to be related to a known graph structure. An important application of such tests is the detection of differentially expressed genes between two patient populations, as shifts in expression levels are expected to be coherent with the structure of graphs reflecting gene properties such as biological process, molecular function, regulation, or metabolism. For a fixed graph of interest, we demonstrate that accounting for graph structure can yield more powerful tests under the assumption of a smooth distribution shift on the graph. We also investigate the identification of non-homogeneous subgraphs of a given large graph, which poses both computational and multiple testing problems. The relevance and benefits of the proposed approach are illustrated on synthetic data and on breast cancer gene expression data analyzed in the context of KEGG pathways.

arXiv.org e-Print Archive

Collection Of Biostatistics Research Archive
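One way to let a graph inform a two-sample test of means is to transform each sample into the graph's Fourier basis (the eigenvectors of the graph Laplacian) and run Hotelling's T² test on only the k smoothest components, which concentrates power on shifts that vary slowly over the graph. The sketch below illustrates that idea in the spirit of the abstract; it is not the authors' code, and the function names, the path graph, and the choice k = 2 are assumptions for illustration.

```python
import numpy as np

def graph_laplacian(adj: np.ndarray) -> np.ndarray:
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    return np.diag(adj.sum(axis=1)) - adj

def graph_hotelling(X1, X2, adj, k):
    """Hotelling T^2 on the k smoothest graph-Fourier components.

    X1: n1 x p and X2: n2 x p samples measured on the p graph nodes.
    Returns (T2, F); under Gaussian assumptions F ~ F(k, n1 + n2 - k - 1).
    """
    # numpy's eigh returns eigenvalues in ascending order, so the first k
    # eigenvectors are the lowest-frequency (smoothest) graph signals.
    _, U = np.linalg.eigh(graph_laplacian(adj))
    Z1, Z2 = X1 @ U[:, :k], X2 @ U[:, :k]
    n1, n2 = len(Z1), len(Z2)
    diff = Z1.mean(axis=0) - Z2.mean(axis=0)
    # Pooled covariance in the reduced k-dimensional space.
    pooled = np.atleast_2d(((n1 - 1) * np.cov(Z1, rowvar=False)
                            + (n2 - 1) * np.cov(Z2, rowvar=False)) / (n1 + n2 - 2))
    t2 = n1 * n2 / (n1 + n2) * diff @ np.linalg.solve(pooled, diff)
    f_stat = (n1 + n2 - k - 1) / (k * (n1 + n2 - 2)) * t2
    return float(t2), float(f_stat)

# Toy example: path graph with 6 nodes; the shift is constant (perfectly smooth).
p = 6
adj = np.zeros((p, p))
for i in range(p - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
rng = np.random.default_rng(1)
X1 = rng.normal(size=(30, p))
X2_null = rng.normal(size=(30, p))
X2_shift = rng.normal(size=(30, p)) + 1.0
print(graph_hotelling(X1, X2_null, adj, k=2))
print(graph_hotelling(X1, X2_shift, adj, k=2))
```

The constant shift lies entirely in the smoothest component, so the second call should yield a much larger statistic than the first. A p-value could be obtained from the F distribution, for example with `scipy.stats.f.sf(F, k, n1 + n2 - k - 1)`.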

Machine learning applied to enzyme turnover numbers reveals protein structural correlates and improves metabolic models.

Author: Desouki Abdelmoneim Amer
Ha Yuanchi
Haiman Zachary B
Heckmann David
Lercher Martin J
Lloyd Colton J
Mih Nathan
Palsson Bernhard O
Zielinski Daniel C
Publication venue: eScholarship, University of California
Publication date: 01/01/2018

Knowing the catalytic turnover numbers of enzymes is essential for understanding the growth rate, proteome composition, and physiology of organisms, but experimental data on enzyme turnover numbers are sparse and noisy. Here, we demonstrate that machine learning can successfully predict catalytic turnover numbers in Escherichia coli based on integrated data on enzyme biochemistry, protein structure, and network context. We identify a diverse set of features that are consistently predictive for both in vivo and in vitro enzyme turnover rates, revealing novel protein structural correlates of catalytic turnover. We use our predictions to parameterize two mechanistic genome-scale modelling frameworks for proteome-limited metabolism, leading to significantly higher accuracy in the prediction of quantitative proteome data than previous approaches. The presented machine learning models thus provide a valuable tool for understanding metabolism and the proteome at the genome scale, and elucidate structural, biochemical, and network properties that underlie enzyme kinetics.

Directory of Open Access Journals

eScholarship - University of California

Online Research Database In Technology

Application of Artificial Intelligence in Modern Healthcare System

Author: Barua Ranjit
Das Jonali
Datta Sudipto
Publication venue: IntechOpen
Publication date: 12/12/2019

Artificial intelligence (AI) has the potential to detect significant interactions in a dataset, and it is widely used in several clinical conditions to predict outcomes, treat, and diagnose. AI is being used or trialed for a variety of healthcare and research purposes, including disease detection, management of chronic conditions, delivery of health services, and drug discovery. In this chapter, we discuss in detail the application of AI in the modern healthcare system and the challenges it faces. Different types of AI devices are described, together with a discussion of their working mechanisms. Alginate, a naturally available polymer found in the cell walls of brown algae, is used in tissue engineering because of its biocompatibility, low cost, and easy gelation. It is composed of α-L-guluronic and β-D-mannuronic acid. To improve cell-material interaction and control its erratic degradation, alginate is blended with other polymers. Here, we discuss the relationship between AI and alginate in tissue engineering.

IntechOpen

Mimicry Embedding Facilitates Advanced Neural Network Training for Image-Based Pathogen Detection.

Author: Clough Barbara
Frickel Eva-Maria
Huttunen Moona
Mercer Jason
Mostowy Serge
Samolej Jerzy
Yakimovich Artur
Yoshida Nagisa
Publication venue: American Society for Microbiology
Publication date: 01/09/2020

The use of deep neural networks (DNNs) for analysis of complex biomedical images shows great promise but is hampered by a lack of large verified data sets for rapid network evolution. Here, we present a novel strategy, termed "mimicry embedding," for rapid application of neural network architecture-based analysis to pathogen imaging data sets. Embedding a novel host-pathogen data set so that it mimics a verified data set enables efficient deep learning using high-expressive-capacity architectures and seamless architecture switching. We applied this strategy across various microbiological phenotypes, from superresolved viruses to in vitro and in vivo parasitic infections. We demonstrate that mimicry embedding enables efficient and accurate analysis of two- and three-dimensional microscopy data sets. The results suggest that transfer learning from pretrained network data may be a powerful general strategy for analysis of heterogeneous pathogen fluorescence imaging data sets.

IMPORTANCE: In biology, the use of DNNs for analysis of pathogen infection is hampered by a lack of the large verified data sets needed for rapid network evolution. Artificial neural networks detect handwritten digits with high precision thanks to large data sets, such as MNIST, that allow nearly unlimited training. Here, we developed a novel strategy we call mimicry embedding, which allows artificial intelligence (AI)-based analysis of variable pathogen-host data sets. We show that deep learning can be used to detect and classify single pathogens based on small differences.

LSHTM Research Online

University of Birmingham Research Portal

Directory of Open Access Journals

UCL Discovery

Multiplatform biomarker identification using a data-driven approach enables single-sample classification

Author: Bastola Dhundy Raj
Haas Christian
Thapa Ishwor
Zhang Ling
Publication venue: DigitalCommons@UNO
Publication date: 01/01/2019

Background: High-throughput gene expression profiles have enabled the discovery of potential biomarkers for early diagnosis, prognosis, and the development of individualized treatment. However, it remains a challenge to identify a set of reliable and reproducible biomarkers across gene expression platforms and laboratories for single-sample diagnosis and prognosis. We address this need with our Data-Driven Reference (DDR) approach, which uses stably expressed housekeeping genes as references to eliminate platform-specific biases and non-biological variability.

Results: Our method identifies biomarkers with “built-in” features that can be interpreted consistently regardless of profiling technology, enabling single-sample classification independent of platform. Validation with RNA-seq data from blood platelets shows that DDR achieves superior performance, with smaller sets of biomarkers, in classifying six different tumor types as well as molecular target statuses (such as MET or HER2 positivity and mutant KRAS, EGFR, or PIK3CA). Using three microarray datasets, we demonstrate that our method can identify robust biomarkers for subgrouping medulloblastoma samples despite data perturbation due to different microarray platforms. In addition to identifying the majority of subgroup-specific biomarkers in the nanoString CodeSet, our method detected some potential new biomarkers for subgrouping medulloblastoma.

Conclusions: We present a simple yet powerful data-driven method that contributes significantly to the identification of robust cross-platform gene signatures for single-patient disease classification, facilitating precision medicine. In addition, our method provides a new strategy for transcriptome analysis.

The University of Nebraska, Omaha
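The abstract does not spell out how DDR derives its “built-in” features, so the following is only a minimal sketch of the general idea it describes: each gene is expressed relative to the average log expression of stably expressed reference genes measured in the same sample. This cancels a platform-wide additive offset on the log scale and needs no other samples. The reference genes (`ACTB`, `GAPDH`), the marker `MARKER1`, and the function name are hypothetical choices for illustration, not the DDR algorithm itself.

```python
from statistics import fmean

def reference_normalize(log_expr: dict[str, float],
                        reference_genes: list[str]) -> dict[str, float]:
    """Express each non-reference gene relative to the sample's own reference level.

    log_expr maps gene name -> log2 expression for ONE sample, so no other
    samples (or batch statistics) are needed.
    """
    ref_level = fmean(log_expr[g] for g in reference_genes)
    return {g: v - ref_level for g, v in log_expr.items()
            if g not in reference_genes}

# Hypothetical reference set and marker; platform B adds a constant log2 offset.
refs = ["ACTB", "GAPDH"]
sample_a = {"ACTB": 10.0, "GAPDH": 12.0, "MARKER1": 8.5}
sample_b = {g: v + 3.0 for g, v in sample_a.items()}
print(reference_normalize(sample_a, refs))  # {'MARKER1': -2.5}
print(reference_normalize(sample_b, refs))  # {'MARKER1': -2.5}
```

Because the offset between the two simulated platforms is the same for every gene, both samples map to the same normalized value. Real cross-platform differences are rarely a pure constant offset, so this illustrates the principle rather than a complete correction.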