A Novel Bayesian Rank-Based Framework for the Classification of High-Dimensional Biological Data
Statistical analysis of high-dimensional biological data is the central component of “personalized
medicine” and “translational bioinformatics.” Two major barriers limit the application of the
extracted information in clinical studies. These barriers are small sample size and lack of biological
interpretability due to the complex classification boundaries of current algorithms.
To remove these barriers, this dissertation introduces novel algorithms for the statistical
analysis of high-dimensional biological data. We first introduce a novel predictive
model. In particular, we extend the top-scoring pair algorithm to a Bayesian setting. We test the
performance on several real datasets and various simulated data scenarios and show the proposed
method has the best overall performance. Besides having high accuracy rates on real and simulated
data sets, the proposed algorithm has the potential to discover gene markers that may be missed
via other algorithms.
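
For readers unfamiliar with the rank-based idea that this work builds on, the following is a minimal sketch of the classical (non-Bayesian) top-scoring pair rule for two classes. It is not the Bayesian extension proposed in this dissertation, whose details are not given here. The function names `fit_tsp` and `predict_tsp` and the toy data are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def fit_tsp(X, y):
    """Classical top-scoring pair fit (illustrative sketch only).

    X: list of samples, each a list of expression values (one per gene).
    y: list of class labels, 0 or 1, one per sample.
    """
    n_genes = len(X[0])
    class0 = [x for x, label in zip(X, y) if label == 0]
    class1 = [x for x, label in zip(X, y) if label == 1]

    best = None
    for i, j in combinations(range(n_genes), 2):
        # Fraction of samples in each class where gene i is ranked below gene j.
        p0 = sum(x[i] < x[j] for x in class0) / len(class0)
        p1 = sum(x[i] < x[j] for x in class1) / len(class1)
        score = abs(p0 - p1)
        # Keep the first pair that attains the highest score.
        if best is None or score > best[0]:
            best = (score, i, j, p0, p1)

    score, i, j, p0, p1 = best
    # If "gene i < gene j" is more frequent in class 0, that ordering votes for class 0.
    return {"i": i, "j": j, "score": score, "label_if_less": 0 if p0 > p1 else 1}


def predict_tsp(model, x):
    """Classify one sample from the ordering of the selected gene pair."""
    less = x[model["i"]] < x[model["j"]]
    return model["label_if_less"] if less else 1 - model["label_if_less"]


# Hypothetical toy data: gene 0 < gene 1 in class 0, reversed in class 1.
X_toy = [[1, 5, 3], [2, 6, 1], [0, 4, 9], [7, 2, 3], [8, 1, 5], [9, 3, 0]]
y_toy = [0, 0, 0, 1, 1, 1]
tsp_model = fit_tsp(X_toy, y_toy)
```

Because the decision depends only on the relative order of two genes within a sample, the rule is invariant to monotone transformations of the data and is easy to interpret, which is the property that rank-based methods of this kind aim to exploit.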
We also propose the Bayesian Top-Scoring Pair (BTSP) algorithm as a feature selection method. We
compare the proposed algorithm with many well-known feature selection methods by combining
each feature selection method with several well-known classifiers. We evaluate the performance
of all feature selection methods on different data sets and for different numbers of selected genes. The
proposed BTSP algorithm has the best overall accuracy rates.
Finally, we introduce a novel algorithm based on biological pathway data (BTSPP). This algorithm
uses all pairwise interactions at the gene level and the pathway level. We apply the proposed
method and well-known pathway-based algorithms to different real data sets, compare how
accurately they classify independent test sets, and show the superiority of the proposed
algorithm. We also assess the ability of these pathway-based algorithms, over-representation
analysis (ORA), and gene set enrichment analysis (GSEA) to find biologically validated pathways
related to diseases. The proposed pathway analysis method has the potential to find
the biologically validated pathways, whereas the other methods do not detect
them.
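
As background on one of the comparison methods named above, over-representation analysis is commonly computed as a one-sided hypergeometric test on the overlap between a selected gene list and a pathway. The sketch below assumes that standard formulation; the function name `ora_p_value` and its parameter names are hypothetical, and it does not reproduce the specific ORA implementation used in the dissertation.

```python
from math import comb


def ora_p_value(n_background, n_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value for pathway over-representation.

    n_background: number of genes in the background set.
    n_pathway:    number of background genes annotated to the pathway.
    n_selected:   number of genes in the selected (e.g. significant) list.
    n_overlap:    number of selected genes that belong to the pathway.
    """
    total = comb(n_background, n_selected)
    max_overlap = min(n_pathway, n_selected)
    # Probability of observing an overlap of at least n_overlap by chance.
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total
```

A small p-value indicates that the pathway contains more of the selected genes than expected under random sampling. Unlike a pair-based approach, this test treats genes as an unordered list and ignores interactions between them.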