17,270 research outputs found

Bayesian hierarchical clustering for studying cancer gene expression data with unknown statistics

Author: A Su
B Frey
C Nutt
C Rasmussen
C Rasmussen
D Arango
D Jiang
D Singh
David R. J. Snead
E Cooke
Ferdinando Di Cunto
G Brock
J Ihmels
J Yao
K Yeung
Korsuk Sirinukunwattana
L Hubert
L McQuitty
LF Wu
M De Souto
M Eisen
M Shipp
Muhammad F. Bari
Nasir M. Rajpoot
P D'haeseleer
P Laiho
R Neal
R Savage
R Sokal
Richard S. Savage
S Armstrong
S Datta
S Eschrich
S Falcon
S Matsui
S Pomeroy
S Ramaswamy
S Varambally
T Golub
Y Wang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

Clustering analysis is an important tool in studying gene expression data. The Bayesian hierarchical clustering (BHC) algorithm can automatically infer the number of clusters and uses Bayesian model selection to improve clustering quality. In this paper, we present an extension of the BHC algorithm. Our Gaussian BHC (GBHC) algorithm represents data as a mixture of Gaussian distributions. It uses a normal-gamma distribution as a conjugate prior on the mean and precision of each of the Gaussian components. We tested GBHC on 11 cancer and 3 synthetic datasets. The results on cancer datasets show that in sample clustering, GBHC on average produces a clustering partition that is more concordant with the ground truth than those obtained from other commonly used algorithms. Furthermore, GBHC frequently infers a number of clusters that is close to the ground truth. In gene clustering, GBHC also produces a clustering partition that is more biologically plausible than those of several other state-of-the-art methods. This suggests GBHC as an alternative tool for studying gene expression data. The implementation of GBHC is available at https://sites.google.com/site/gaussianbhc
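
To illustrate the role of the normal-gamma conjugate prior mentioned in this abstract, here is a minimal one-dimensional sketch of the closed-form log marginal likelihood of Gaussian data. It is not the authors' GBHC implementation: the function name and the default hyperparameters (mu0, kappa0, alpha0, beta0) are assumptions chosen for illustration, and the hierarchical merging procedure itself is not reproduced.

```python
import math

def log_marginal_likelihood(xs, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Log evidence of 1-D data under a Gaussian likelihood whose mean and
    precision have a normal-gamma prior (hyperparameter defaults are
    illustrative assumptions, not values from the paper)."""
    n = len(xs)
    mean = sum(xs) / n
    # Sum of squared deviations from the sample mean.
    ss = sum((x - mean) ** 2 for x in xs)
    # Standard conjugate posterior updates.
    kappa_n = kappa0 + n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (mean - mu0) ** 2 / (2.0 * kappa_n)
    # Closed-form evidence, computed in log space for numerical stability.
    return (math.lgamma(alpha_n) - math.lgamma(alpha0)
            + alpha0 * math.log(beta0) - alpha_n * math.log(beta_n)
            + 0.5 * (math.log(kappa0) - math.log(kappa_n))
            - 0.5 * n * math.log(2.0 * math.pi))

# Hypothetical expression values for two well-separated groups.
tight = [0.1, -0.2, 0.05]
far = [5.0, 5.2, 4.9]
separate = log_marginal_likelihood(tight) + log_marginal_likelihood(far)
merged = log_marginal_likelihood(tight + far)
```

Comparing `merged` with `separate` shows the kind of evidence comparison that Bayesian model selection relies on: for clearly separated groups, modelling them as two components typically has the higher evidence.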

CiteSeerX

Public Library of Science (PLOS)

Qatar University Institutional Repository

Crossref

Directory of Open Access Journals

PubMed Central

Warwick Research Archives Portal Repository

Machine Learning and Integrative Analysis of Biomedical Big Data.

Author: Choi Howard
Chung Neo Christopher
Mirza Bilal
Ping Peipei
Wang Jie
Wang Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.

Multidisciplinary Digital Publishing Institute

Ezid

Directory of Open Access Journals

eScholarship - University of California

A Framework to Discover Emerging Patterns for Application in Microarray Data

Author: Boulesteix Anne-Laure
Tutz Gerhard
Publication date: 01/01/2003

Various supervised learning and gene selection methods have been used for cancer diagnosis. Most of these methods do not consider interactions between genes, although this might be interesting biologically and improve classification accuracy. Here we introduce a new CART-based method to discover emerging patterns. Emerging patterns are structures of the form (X1 > a1) AND (X2 < a2) that have differing frequencies in the considered classes. Interaction structures of this kind are of great interest in cancer research. Moreover, they can be used to define new variables for classification. Using simulated data sets, we show that our method allows the identification of emerging patterns with high efficiency. We also perform classification using two publicly available data sets (leukemia and colon cancer). For each data set, the method allows efficient classification as well as the identification of interesting patterns.
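
The definition of an emerging pattern given in this abstract can be made concrete with a small sketch that measures how often a pattern of the form (X1 > a1) AND (X2 < a2) holds in each class. This only evaluates a given pattern and does not implement the CART-based discovery method; the function name, sample values, and class labels are hypothetical.

```python
def pattern_frequencies(samples, labels, i, a, j, b):
    """Return, per class, the fraction of samples with x[i] > a and x[j] < b."""
    counts, totals = {}, {}
    for x, y in zip(samples, labels):
        totals[y] = totals.get(y, 0) + 1
        if x[i] > a and x[j] < b:
            counts[y] = counts.get(y, 0) + 1
    # Classes in which the pattern never occurs get frequency 0.0.
    return {y: counts.get(y, 0) / totals[y] for y in totals}

# Hypothetical two-gene expression values for four samples.
samples = [(2.0, 0.1), (3.0, 0.2), (0.5, 0.9), (2.5, 0.8)]
labels = ["tumor", "tumor", "normal", "normal"]
freqs = pattern_frequencies(samples, labels, i=0, a=1.0, j=1, b=0.5)
```

A pattern whose frequencies differ strongly between classes, as in this toy data, is the kind of structure the abstract calls an emerging pattern.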

CiteSeerX

Open Access LMU

Multi-test Decision Tree and its Application to Microarray Data Classification

Author: Armstrong
Berzal
Breiman
Breiman
Breiman
Brodley
Brown
Brown
Che
Chen
Cohen
Cordell
Cowell
Czajkowski
Demsar
Dettling
Diaz-Uriarte
Dramiński
Fayyad
Freund
Freund
Ge
Golub
Grześ
Hall
Hastie
Hu
Kuo
Li
Marcin Czajkowski
Marek Grześ
Marek Kretowski
Murthy
Murthy
Pagallo
Qu
Quinlan
Robnik-Siikonja
Rokach
Rokach
Sebastiani
Shalev-Shwartz
Shi
Tan
Tan
Wold
Yeoh
Publication venue: 'Elsevier BV'
Publication date: 01/05/2014

Objective: A desirable property of tools used to investigate biological data is that their models and predictive decisions are easy to understand. Decision trees are particularly promising in this regard due to their comprehensible nature, which resembles the hierarchical process of human decision making. However, existing algorithms for learning decision trees have a tendency to underfit gene expression data. The main aim of this work is to improve the performance and stability of decision trees with only a small increase in their complexity. Methods: We propose a multi-test decision tree (MTDT); our main contribution is the application of several univariate tests in each non-terminal node of the decision tree. We also search for alternative, lower-ranked features in order to obtain more stable and reliable predictions. Results: Experimental validation was performed on several real-life gene expression datasets. Comparison with eight classifiers shows that MTDT has a statistically significantly higher accuracy than popular decision tree classifiers, and it was highly competitive with ensemble learning algorithms. The proposed solution managed to outperform its baseline algorithm on 14 datasets by an average of 6 percent. A study performed on one of the datasets showed that the discovered genes used in the MTDT classification model are supported by biological evidence in the literature. Conclusion: This paper introduces a new type of decision tree which is more suitable for solving biological problems. MTDTs are relatively easy to analyze and much more powerful in modeling high-dimensional microarray data than their popular counterparts.
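
The idea of applying several univariate tests in one non-terminal node can be sketched as follows. The abstract does not say how the tests are combined, so majority voting is an assumption made here for illustration, as are the function name, feature indices, and thresholds.

```python
def multi_test_split(sample, tests):
    """Route a sample at one node using several univariate threshold tests.

    tests is a list of (feature_index, threshold) pairs; each test votes
    "left" when sample[feature_index] <= threshold. Majority voting is an
    assumed combination rule, not one confirmed by the abstract.
    Returns True for the left branch, False for the right branch.
    """
    votes = sum(1 for idx, thr in tests if sample[idx] <= thr)
    return votes * 2 > len(tests)

# Hypothetical node with three tests on three different genes.
node_tests = [(0, 1.0), (1, 2.0), (2, 0.5)]
goes_left = multi_test_split((0.5, 3.0, 0.2), node_tests)
goes_right = not multi_test_split((2.0, 3.0, 0.2), node_tests)
```

Using several tests per node means that a single noisy gene measurement is less likely to send a sample down the wrong branch, at the cost of a slightly larger node.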

Crossref

Kent Academic Repository

Towards precise classification of cancers based on robust gene functional expression profiles

Author: Guo Zheng
Li Xia
Rao Shaoqi
Topol Eric J
Wang Chenguang
Wang Haiyun
Wang Qi
Wang Qing
Xu Jianzhen
Yu Hui
Zhang Tianwen
Zhu Jing
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: Development of robust and efficient methods for analyzing and interpreting high-dimensional gene expression profiles continues to be a focus in computational biology. The accumulated experimental evidence supports the assumption that genes express and perform their functions in a modular fashion in cells. Therefore, there is an open space for the development of timely and relevant computational algorithms that use robust functional expression profiles towards precise classification of complex human diseases at the modular level. RESULTS: Inspired by the insight that genes act as a module to carry out a highly integrated cellular function, we define a low-dimensional functional expression profile for data reduction. After annotating each individual gene to functional categories defined in a proper gene function classification system, such as the Gene Ontology applied in this study, we identify those functional categories enriched with differentially expressed genes. For each functional category or functional module, we compute a summary measure(s) for the raw expression values of the annotated genes to capture the overall activity level of the module. In this way, we can treat the gene expressions within a functional module as an integrative data point to replace the multiple values of individual genes. We compare the classification performance of decision trees based on functional expression profiles with that of decision trees based on conventional gene expression profiles using four publicly available datasets; the comparison indicates that precise classification of tumour types and improved interpretation can be achieved with the reduced functional expression profiles. CONCLUSION: This modular approach is demonstrated to be a powerful alternative approach to analyzing high-dimensional microarray data and is robust to high measurement noise and intrinsic biological variance inherent in microarray data. Furthermore, efficient integration with current biological knowledge has facilitated the interpretation of the underlying molecular mechanisms for complex human diseases at the modular level.
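
The data-reduction step described in this abstract, replacing the expression values of a module's genes with one summary value, might look like the sketch below. The abstract does not specify the summary measure, so the arithmetic mean is an assumption; the gene identifiers and module names are hypothetical.

```python
from statistics import mean

def functional_profile(expression, modules):
    """Collapse one sample's gene-level expression into module-level values.

    expression maps gene identifier -> expression value for a single sample.
    modules maps module name -> list of annotated gene identifiers.
    The mean is an assumed summary measure; modules with no measured genes
    are omitted from the result.
    """
    profile = {}
    for name, genes in modules.items():
        values = [expression[g] for g in genes if g in expression]
        if values:
            profile[name] = mean(values)
    return profile

# Hypothetical sample and module annotations.
sample = {"g1": 2.0, "g2": 4.0, "g3": 10.0}
annotations = {"apoptosis": ["g1", "g2"], "cell cycle": ["g3", "g9"], "unmeasured": ["g8"]}
profile = functional_profile(sample, annotations)
```

The resulting profile has one feature per functional module instead of one per gene, which is what allows a classifier such as a decision tree to operate on a much lower-dimensional input.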

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Elephant Search with Deep Learning for Microarray Data Analysis

Author: Panda Mrutyunjaya
Publication date: 12/07/2017

Despite a plethora of research on microarray gene expression data analysis, effectively and efficiently analyzing the large and complex expression of genes remains a challenge for researchers. The feature (gene) selection method is of paramount importance for understanding the differences in biological and non-biological variation between samples. To address this problem, a novel elephant search (ES) based optimization is proposed to select the best gene expressions from the large volume of microarray data. Further, a promising machine learning method is envisioned to leverage such high-dimensional and complex microarray datasets for extracting hidden patterns, in order to make meaningful predictions and accurate classifications. In particular, stochastic gradient descent based deep learning (DL) with a softmax activation function is then used on the reduced features (genes) for better classification of different samples according to their gene expression levels. The experiments are carried out on nine of the most popular cancer microarray gene selection datasets, obtained from the UCI machine learning repository. The empirical results obtained by the proposed elephant search based deep learning (ESDL) approach are compared with the most recent published article for its suitability in future bioinformatics research. Comment: 12 pages, 5 Tabl
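
For readers unfamiliar with the softmax activation mentioned in this abstract, a minimal sketch of the function itself follows. It shows only how raw class scores become class probabilities; the elephant search feature selection and the network training are not reproduced, and the example scores are hypothetical.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtracting the maximum avoids overflow without changing the result.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical output-layer scores for three sample classes.
probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))
```

The predicted class is the index with the highest probability, which is how a softmax output layer assigns a sample to one of several classes.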

arXiv.org e-Print Archive

A cDNA Microarray Gene Expression Data Classifier for Clinical Diagnostics Based on Graph Theory

Author: Benso Alfredo
Di Carlo Stefano
Politano Gianfranco Michele Maria
Publication venue: IEEE Computer Society
Publication date: 01/01/2011

Despite great advances in discovering cancer molecular profiles, the proper application of microarray technology to routine clinical diagnostics is still a challenge. Current practices in the classification of microarray data show two main limitations: the reliability of the training data sets used to build the classifiers, and the classifiers' performance, especially when the sample to be classified does not belong to any of the available classes. In this case, state-of-the-art algorithms usually produce a high rate of false positives that, in real diagnostic applications, is unacceptable. To address this problem, this paper presents a new cDNA microarray data classification algorithm that is based on graph theory and is able to overcome most of the limitations of known classification methodologies. The classifier works by analyzing gene expression data organized in an innovative data structure based on graphs, where vertices correspond to genes and edges to gene expression relationships. To demonstrate the novelty of the proposed approach, the authors present an experimental performance comparison between the proposed classifier and several state-of-the-art classification algorithms.

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Visual and computational analysis of structure-activity relationships in high-throughput screening data

Author: Agrafiotis
Agrafiotis
Ahlberg
Ajay
Ajay
Bayada
Bemis
Bernard
Bonabeau
Brown
Calvert
Card
Chen
Chen
Cho
Christianini
Clark
Clark
Cox
Duda
Edwards
Engels
Frimurer
Gao
Garrido
Ghose
Gillet
Gillet
Hand
Hann
Haupts
Hayward
Hertzberg
Izrailev
Jiang
Jones-Hertzog
Kirew
Kobayashi
Kohonen
Ladd
Lee
Lepre
Martin
Mason
Mello
Meyer
Miller
Mitchell
Oprea
Peter Gedeck
Peter Willett
Poroikov
Rhodes
Roberts
Roberts
Ros
Rusinko
Sadowski
Sadowski
Scherf
Sheridan
Shi
Stanton
Su
Teague
Thompson
Tropsha
Tufte
Tufte
Wagener
Walters
Wang
Wedin
Xie
Xu
Zupan
Publication venue: 'Elsevier BV'
Publication date: 01/08/2001

Novel analytic methods are required to assimilate the large volumes of structural and bioassay data generated by combinatorial chemistry and high-throughput screening programmes in the pharmaceutical and agrochemical industries. This paper reviews recent work in visualisation and data mining that can be used to develop structure-activity relationships from such chemical/biological datasets.

Crossref

White Rose Research Online