Search CORE

2,096 research outputs found

Machine Learning and Integrative Analysis of Biomedical Big Data.

Author: Choi Howard
Chung Neo Christopher
Mirza Bilal
Ping Peipei
Wang Jie
Wang Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

Multidisciplinary Digital Publishing Institute

Ezid

Directory of Open Access Journals

eScholarship - University of California

Application of Volcano Plots in Analyses of mRNA Differential Expressions with Microarrays

Author: Alvord W. G.
Auer P. L.
Chen Y.
Chen Z.
Cohen J.
Fechner G. T.
Guyon I.
Göhlmann H.
Lee J.
Li C.
Schwender H.
Smyth G. K.
Snedecor G. W.
Trevino V.
Vandesompele J.
Welsh B. L.
WENTIAN LI
Zhao C.
Publication venue: World Scientific Pub Co Pte Ltd
Publication date: 28/08/2013

A volcano plot displays the unstandardized signal (e.g., log-fold-change) against the noise-adjusted/standardized signal (e.g., the t-statistic or -log10(p-value) from the t-test). We review the basic and interactive uses of the volcano plot and its crucial role in understanding the regularized t-statistic. The joint filtering gene selection criterion based on regularized statistics has a curved discriminant line in the volcano plot, as compared to the two perpendicular lines of the "double filtering" criterion. This review attempts to provide a unifying framework for discussions of alternative measures of differential expression, improved methods for estimating variance, and visual display of microarray analysis results. We also discuss the possibility of applying volcano plots to fields beyond microarrays. Comment: 8 figures
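As a rough illustration of the two selection criteria contrasted above, the Python sketch below computes, for one gene, the log-fold-change and a Welch-type standard error from two groups of replicates. It then applies a "double filtering" rule, with separate cut-offs on |log-fold-change| and |t|, and a "joint filtering" rule based on a regularized t-statistic that adds a constant s0 to the denominator. The assumptions are ours, not the review's: inputs are already log2-transformed, each group has at least two replicates with nonzero variance, and the thresholds and s0 = 0.5 are arbitrary illustrative values. `gene_stats`, `double_filter` and `joint_filter` are hypothetical helper names.

```python
# Hypothetical sketch: double vs. joint (regularized-t) filtering on volcano-plot axes.
import math
from statistics import mean, variance


def gene_stats(a, b):
    """Return (log-fold-change, standard error) for one gene.

    a, b: replicate values for the two conditions, assumed already log2-transformed,
    so the difference of means is the log2 fold change (Welch-type standard error).
    """
    lfc = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return lfc, se


def double_filter(lfc, se, lfc_cut=1.0, t_cut=3.0):
    # Ordinary t = lfc / se: one vertical and one horizontal cut-off on the plot.
    return abs(lfc) >= lfc_cut and abs(lfc / se) >= t_cut


def joint_filter(lfc, se, s0=0.5, cut=3.0):
    # Regularized t = lfc / (se + s0). On volcano axes (lfc, t) with se = |lfc| / |t|,
    # the boundary becomes |lfc| = cut * s0 * |t| / (|t| - cut) for |t| > cut: a curve.
    return abs(lfc) / (se + s0) >= cut
```

For example, a gene with log-fold-change 1.2 and standard error 0.1 (t = 12) passes `double_filter` but fails `joint_filter`, since 1.2 / (0.1 + 0.5) = 2 < 3: the regularization shrinks |t| most for genes whose small standard error, rather than a large fold change, drives the ordinary t-statistic.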

arXiv.org e-Print Archive

Crossref

Non-stationary continuous dynamic Bayesian networks

Author: Grzegorczyk M.
Husmeier D.
Publication venue: Curran Associates, Inc.
Publication date: 01/01/2009

Enlighten

EapGAFS: Microarray Dataset for Ensemble Classification for Diseases Prediction

Author: Krishna Peddarapu Rama
Rajarajeswari Pothuraju
Publication venue: Auricle Technologies, Pvt., Ltd.
Publication date: 31/08/2022

Microarray data record the measured expression levels of thousands of genes simultaneously, which helps researchers gain insight into biological and prognostic information. Cancer is a deadly disease that develops over time and involves the uncontrolled division of body cells. In cancer, many genes are responsible for cell growth and division, but different kinds of cancer are caused by different sets of genes. To better understand, diagnose and treat cancer, it is therefore essential to know which genes in the cancer cells are behaving abnormally. Advances in data mining, machine learning, soft computing and pattern recognition have helped researchers address the challenge of developing computationally effective models that identify new classes of disease and diagnostic or therapeutic targets. This paper proposes an Ensemble Apriori Genetic Algorithm Feature Selection (EapGAFS) method for microarray dataset classification. The proposed algorithm combines a genetic algorithm with Apriori learning to classify microarray attributes: EapGAFS uses rule-set mining within the genetic algorithm to process the microarray dataset, extracts attribute features from the dataset through the resulting rule set, and finally classifies the dataset with an ensemble classifier model. The performance of EapGAFS is compared with that of conventional classifiers on the collected datasets for breast cancer, hepatitis, diabetes and BUPA. The comparative analysis shows that EapGAFS improves microarray dataset classification, with performance ~4–6% higher than that of conventional classifiers such as AdaBoost and ensemble
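The abstract combines a genetic algorithm, Apriori rule mining and an ensemble classifier but gives no implementation details. As a hedged illustration of only the genetic-algorithm part, the Python sketch below evolves boolean feature masks with elitism, tournament selection, one-point crossover and bit-flip mutation. Each mask is scored by the training accuracy of a simple nearest-centroid classifier minus a small penalty per selected feature; a real pipeline would score masks on held-out data to avoid over-fitting. This is a generic GA wrapper for feature selection, not EapGAFS: the Apriori stage and the ensemble are omitted, and the fitness function, parameter values and function names are assumptions.

```python
# Hypothetical sketch: generic GA wrapper feature selection, NOT the EapGAFS algorithm.
import random


def centroid_accuracy(X, y, mask):
    """Training accuracy of a nearest-centroid classifier on the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]

    def dist(x, c):
        return sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c]))

    correct = sum(min(classes, key=lambda c: dist(x, c)) == label for x, label in zip(X, y))
    return correct / len(y)


def ga_feature_selection(X, y, pop_size=20, generations=30, p_mut=0.1,
                         penalty=0.01, seed=0):
    """Evolve boolean feature masks; fitness = accuracy - penalty * number selected."""
    rng = random.Random(seed)
    n = len(X[0])

    def fitness(mask):
        return centroid_accuracy(X, y, mask) - penalty * sum(mask)

    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        children = ranked[:2]  # elitism: the two best masks survive unchanged
        while len(children) < pop_size:
            # Tournament selection: best of 3 random individuals, once per parent.
            p1 = max(rng.sample(ranked, 3), key=fitness)
            p2 = max(rng.sample(ranked, 3), key=fitness)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each gene toggled with probability p_mut.
            children.append([(not g) if rng.random() < p_mut else g for g in child])
        pop = children
    return max(pop, key=fitness)
```

The per-feature penalty pushes the search toward small gene subsets, which is the usual motivation for wrapper selection on microarray data where features far outnumber samples.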

International Journal on Recent and Innovation Trends in Computing and Communication

Challenges in the Analysis of Mass-Throughput Data: A Technical Commentary from the Statistical Machine Learning Perspective

Author: Aliferis Constantin F.
Statnikov Alexander
Tsamardinos Ioannis
Publication venue: Libertas Academica
Publication date: 01/01/2006

Sound data analysis is critical to the success of modern molecular medicine research that involves the collection and interpretation of mass-throughput data. The novel nature and high dimensionality of such datasets pose a series of nontrivial data analysis problems. This technical commentary discusses the problems of over-fitting, error estimation, the curse of dimensionality, causal versus predictive modeling, integration of heterogeneous types of data, and the lack of standard protocols for data analysis. We attempt to shed light on the nature and causes of these problems and to outline viable methodological approaches to overcome them.
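The commentary highlights over-fitting and error estimation. One commonly recommended safeguard, shown here as an assumption rather than as the authors' protocol, is to repeat any feature (gene) selection inside each training fold of cross-validation, so that held-out samples never influence which features are chosen; selecting features on the full dataset first and cross-validating afterwards tends to give optimistically biased error estimates. The Python sketch assumes two classes present in every training fold, round-robin fold assignment, ranking by absolute difference of class means, and a nearest-centroid classifier; `class_means` and `cv_error` are hypothetical names.

```python
# Hypothetical sketch: k-fold CV with feature selection redone inside each training fold.


def class_means(X, y, rows, cols):
    """Per-class mean of each selected column, computed from the given rows only."""
    means = {}
    for c in sorted({y[i] for i in rows}):
        members = [i for i in rows if y[i] == c]
        means[c] = [sum(X[i][j] for i in members) / len(members) for j in cols]
    return means


def cv_error(X, y, k=3, n_keep=1):
    """Cross-validated error rate; assumes exactly two classes in every training fold."""
    n, p = len(X), len(X[0])
    wrong = 0
    for fold in range(k):
        test = range(fold, n, k)  # round-robin fold assignment
        train = [i for i in range(n) if i % k != fold]
        # 1) Rank features on the TRAINING rows only; test rows never affect selection.
        m = class_means(X, y, train, range(p))
        c0, c1 = sorted(m)
        cols = sorted(range(p), key=lambda j: abs(m[c1][j] - m[c0][j]), reverse=True)[:n_keep]
        # 2) Fit a nearest-centroid classifier on the selected features.
        cents = class_means(X, y, train, cols)
        # 3) Count misclassified held-out rows.
        for i in test:
            pred = min(cents, key=lambda c: sum((X[i][j] - v) ** 2
                                                for j, v in zip(cols, cents[c])))
            wrong += pred != y[i]
    return wrong / n
```

Keeping selection inside the loop costs k separate rankings, but it is what makes the returned error an honest estimate for the whole pipeline rather than for the classifier alone.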

Directory of Open Access Journals

PubMed Central

Inference from binary gene expression data

Author: Tuna Salih
Publication date: 01/10/2009

Microarrays provide a practical method for measuring the mRNA abundances of thousands of genes in a single experiment. Analysing such high-dimensional data is a challenge that attracts researchers from many different fields, machine learning among them. However, the biological properties of mRNA, such as its low stability, together with the fact that measurements are taken from a population of cells rather than from a single cell, should make researchers sceptical about the high numerical precision reported and thus about the reproducibility of these measurements. In this study we explore data representation at lower numerical precision, down to binary (retaining only whether a gene is expressed or not), thereby improving the quality of inferences drawn from microarray studies. With the binary representation, we propose a way to reduce the effect of algorithmic choices in the pre-processing stages. First, we compare the information lost when inferences are made from quantized transcriptome data rather than from continuous values. Classification, clustering, periodicity detection and analysis of developmental time series data are considered here. Our results show that little information is lost with binary data. Then, focusing on the two most widely used inference tools, classification and clustering, we show that inferences drawn from transcriptome data can actually be improved with a metric suitable for binary data. This is explained by the uncertainties in the probe-level data. We also show that binary transcriptome data can be used in cross-platform studies and that, when used with the Tanimoto kernel, it increases the performance of inferences compared with individual datasets. In the last part of this work we show that binary transcriptome data reduce the effect of the choice of algorithm for pre-processing raw data. While there are many different algorithms for the pre-processing stages, there are few guidelines to help users choose among them.
Many studies have shown that the choice of algorithms has a significant impact on the overall results of microarray studies. Here we show, for classification, that binarizing transcriptome data after pre-processing with any combination of algorithms simultaneously reduces the variability of the results and increases the performance of the classifier.
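To make the binary representation and the Tanimoto kernel concrete, the Python sketch below binarizes expression profiles with a single fixed threshold and computes Tanimoto similarities, |a AND b| / |a OR b|, between the resulting 0/1 vectors, plus a Gram matrix that a kernel method could consume. The fixed threshold is an assumption for illustration: the abstract does not say how a gene is called "expressed". `binarize`, `tanimoto` and `gram_matrix` are hypothetical names.

```python
# Hypothetical sketch: threshold binarization plus a Tanimoto kernel on 0/1 profiles.


def binarize(profile, threshold):
    """1 if the gene is called 'expressed' (value above the assumed threshold), else 0."""
    return [1 if v > threshold else 0 for v in profile]


def tanimoto(a, b):
    """Tanimoto kernel for 0/1 vectors: |a AND b| / |a OR b|.

    Equals <a,b> / (<a,a> + <b,b> - <a,b>) for binary vectors. Two all-zero
    vectors are treated as identical (similarity 1.0), which is a convention.
    """
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 1.0


def gram_matrix(samples):
    """Pairwise Tanimoto similarities, e.g. as a precomputed kernel for an SVM."""
    return [[tanimoto(s, t) for t in samples] for s in samples]
```

For instance, profiles [0.2, 5.1, 3.0, 0.9] and [2.5, 0.1, 4.0, 0.3] with threshold 1.0 become [0, 1, 1, 0] and [1, 0, 1, 0]; they share one expressed gene out of three expressed in either, so their Tanimoto similarity is 1/3.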

Southampton (e-Prints Soton)

Fuzzy-Granular Based Data Mining for Effective Decision Support in Biomedical Applications

Author: He Yuanchen
Publication venue: ScholarWorks @ Georgia State University
Publication date: 04/12/2006

Due to the complexity of biomedical problems, adaptive and intelligent knowledge discovery and data mining systems are needed to help humans understand the inherent mechanisms of diseases. For biomedical classification problems, it is typically impossible to build a perfect classifier with 100% prediction accuracy, so a more realistic target is an effective Decision Support System (DSS). In this dissertation, a novel adaptive Fuzzy Association Rules (FARs) mining algorithm, named FARM-DS, is proposed to build such a DSS for binary classification problems in the biomedical domain. Empirical studies show that FARM-DS is competitive with state-of-the-art classifiers in terms of prediction accuracy. More importantly, FARs can provide strong decision support for disease diagnosis because they are easy to interpret. This dissertation also proposes a fuzzy-granular method for selecting informative and discriminative genes from huge microarray gene expression datasets. Fuzzy granulation reduces the information lost during gene selection; as a result, more informative genes for cancer classification are selected and more accurate classifiers can be built. Empirical studies show that the proposed method is more accurate than traditional algorithms for cancer classification, and we therefore expect the selected genes to be more helpful for further biological studies.
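FARM-DS itself is not specified in this abstract, so the Python sketch below only illustrates generic ingredients of fuzzy association rules: a triangular membership function that turns a value into a degree of membership in a fuzzy set such as "gene high", fuzzy support computed with the min t-norm, and rule confidence as support(A and B) / support(A). The triangular shape, the min operator, the dictionary layout and all names (`tri`, `fuzzy_support`, `fuzzy_confidence`) are assumptions, not the dissertation's method.

```python
# Hypothetical sketch: generic fuzzy association-rule support/confidence, NOT FARM-DS.


def tri(x, a, b, c):
    """Triangular membership with support (a, c) and peak b; assumes a < b < c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def membership(record, item, fuzzy_sets):
    """Degree to which a record (dict attribute -> value) belongs to a fuzzy item."""
    attr, (a, b, c) = fuzzy_sets[item]
    return tri(record[attr], a, b, c)


def fuzzy_support(records, itemset, fuzzy_sets):
    """Mean over records of the smallest membership among the items (min t-norm)."""
    total = sum(min(membership(r, it, fuzzy_sets) for it in itemset) for r in records)
    return total / len(records)


def fuzzy_confidence(records, antecedent, consequent, fuzzy_sets):
    """Confidence of the rule antecedent -> consequent."""
    s_a = fuzzy_support(records, antecedent, fuzzy_sets)
    if s_a == 0:
        return 0.0
    return fuzzy_support(records, antecedent + consequent, fuzzy_sets) / s_a


records = [{"gene": 8.0, "label": 1}, {"gene": 2.0, "label": 0}, {"gene": 6.0, "label": 1}]
fuzzy_sets = {"gene_high": ("gene", (4.0, 8.0, 12.0)),
              "diseased": ("label", (0, 1, 2))}  # crisp 0/1 label as a degenerate fuzzy set
print(fuzzy_confidence(records, ["gene_high"], ["diseased"], fuzzy_sets))  # prints 1.0
```

Here the class label is encoded as a triangle peaking at 1, so label 1 has membership 1.0 and label 0 has membership 0.0; every record that is partly "gene high" is diseased, which is why the rule's confidence is 1.0.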

ScholarWorks @ Georgia State University

Gene Expression-Based Glioma Classification Using Hierarchical Bayesian Vector Machines

Author: Chakraborty Sounak
Dougherty Edward
Ghosh Debashis
Ghosh Malay
Mallick Bani K., 1965-
Publication venue: Indian Statistical Institute
Publication date: 01/01/2007

This paper considers several Bayesian classification methods for analysing glioma cancer with microarray data, based on a reproducing kernel Hilbert space (RKHS) in the multiclass setting. We consider the multinomial logit likelihood as well as the likelihood related to the multiclass Support Vector Machine (SVM) model. We show that our proposed Bayesian classification models with multiple shrinkage parameters can produce a more accurate classification scheme for glioma cancer than several existing classical methods. We also propose a Bayesian variable selection scheme, integrated with our model, for selecting differentially expressed genes. This integrated approach improves classifier design by performing gene selection simultaneously.
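The paper's models are Bayesian, with multiple shrinkage parameters on the kernel coefficients. The Python sketch below covers only one ingredient named in the abstract, the multinomial logit likelihood, evaluated for fixed coefficients, with class scores given by an RKHS kernel expansion f_c(x) = sum_i beta_ic K(x, x_i). The Gaussian (RBF) kernel, its parameter `gamma`, and the function names are assumptions for illustration; priors, posterior inference, the SVM-based likelihood and gene selection are not shown.

```python
# Hypothetical sketch: multinomial logit likelihood over an RKHS kernel expansion.
import math


def rbf(x, z, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||x - z||^2) (an assumed kernel choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def class_scores(x, X_train, beta, gamma=1.0):
    """f_c(x) = sum_i beta[i][c] * K(x, x_i) for every class c."""
    k = [rbf(x, xi, gamma) for xi in X_train]
    return [sum(k_i * b_i[c] for k_i, b_i in zip(k, beta)) for c in range(len(beta[0]))]


def softmax(scores):
    """Multinomial logit probabilities, shifted by the max for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood(X, y, beta, gamma=1.0):
    """Log-likelihood of labels y (0..C-1) under the multinomial logit model."""
    return sum(math.log(softmax(class_scores(x, X, beta, gamma))[label])
               for x, label in zip(X, y))
```

With all coefficients at zero, every class gets probability 1/C, so the log-likelihood of n samples is n * log(1/C); a Bayesian treatment would combine this term with priors on `beta` rather than maximize it directly.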

CiteSeerX

University of Missouri: MOspace