3 research outputs found
Gene Shaving using influence function of a kernel method
Identifying significant subsets of genes, known as gene shaving, is an essential
and challenging problem in biomedical research because of the huge number of
genes and the complex nature of biological networks. Since positive definite
kernel based methods on genomic information can improve the prediction of
diseases, in this paper we propose a new method, "kernel gene shaving" (gene
shaving based on kernel canonical correlation analysis (kernel CCA)). The
problem is addressed using the influence function of kernel CCA. To compare the
performance of the proposed method with that of three popular gene selection
methods (T-test, SAM and LIMMA), we used extensive simulated and real
microarray gene expression datasets. The performance measure, AUC, was
computed for each of the methods. The proposed method performed better than
the three well-known gene selection methods. In the real data analysis, the
proposed method identified a subset of the genes. The network of these genes
has significantly more interactions than expected, which indicates that they
may function in a concerted effort on colon cancer.
Comment: 14 pages, 6 figures, submitted to ICCIT2018, Bangladesh
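The AUC comparison mentioned in the abstract above can be illustrated with a small sketch. It assumes that each selection method assigns every gene a score, that a simulation provides a ground-truth label marking the truly relevant genes, and that AUC means the area under the ROC curve. The function name `auc_from_scores` and the toy scores are hypothetical; this is not the authors' evaluation code.

```python
import numpy as np

def auc_from_scores(scores, is_true_gene):
    """ROC AUC via the rank-sum (Mann-Whitney U) identity, with tie handling."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(is_true_gene, dtype=bool)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one relevant and one irrelevant gene")

    # Assign 1-based ranks in ascending score order; tied scores share
    # the average of the ranks they occupy.
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(scores.size)
    i = 0
    while i < scores.size:
        j = i
        while j + 1 < scores.size and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0
        i = j + 1

    # U statistic of the relevant genes, normalized to [0, 1].
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

# Toy example (hypothetical scores): genes 0 and 1 are the relevant ones.
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_labels = [1, 1, 0, 0]
toy_auc = auc_from_scores(toy_scores, toy_labels)
```

An AUC of 1 means every relevant gene is scored above every irrelevant one, while 0.5 corresponds to a ranking no better than chance.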
Gene-Gene association for Imaging Genetics Data using Robust Kernel Canonical Correlation Analysis
In genome-wide interaction studies, most methods for detecting gene-gene
interactions fall into two categories: single nucleotide polymorphism (SNP)
based and gene-based methods. In general, gene-based methods are more
effective than methods based on a single SNP. In recent years, a kernel
canonical correlation analysis (classical kernel CCA) based U statistic
(KCCU) has been proposed to detect nonlinear relationships between genes. To
estimate the variance of KCCU, its authors used resampling-based methods,
which are highly computationally intensive. In addition, classical kernel CCA
is not robust to contaminated data. We therefore first discuss the robust
kernel mean element and the robust kernel covariance and cross-covariance
operators. Second, we propose a method based on the influence function to
estimate the variance of the KCCU. Third, we propose a nonparametric robust
KCCU method based on robust kernel CCA, which is designed for contaminated
data and is less sensitive to noise than classical kernel CCA. Finally, we
apply the proposed methods to synthesized data and an imaging genetics data
set. Based on gene ontology and pathway analysis, the experiments on
synthesized and imaging genetics data demonstrate that the proposed robust
method performs better than the state-of-the-art methods.
Comment: arXiv admin note: substantial text overlap with arXiv:1602.0556
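To make the idea of detecting nonlinear relationships with classical kernel CCA concrete, here is a minimal sketch of the first kernel canonical correlation. It assumes a Gaussian kernel and one common regularized formulation, in which the correlation is the largest singular value of the product of the two regularized, centered Gram matrices. The function names, the bandwidth `sigma`, and the regularization `reg` are illustrative choices, not taken from the paper, and the sketch covers neither the KCCU statistic nor the robust variant.

```python
import numpy as np

def centered_gaussian_gram(x, sigma=1.0):
    """Centered Gaussian-kernel Gram matrix of the samples in x."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq = np.sum(x ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return H @ K @ H

def first_kernel_canonical_correlation(x, y, sigma=1.0, reg=0.05):
    """Largest regularized kernel canonical correlation between x and y."""
    Kx = centered_gaussian_gram(x, sigma)
    Ky = centered_gaussian_gram(y, sigma)
    n = Kx.shape[0]
    c = n * reg  # regularization strength (an assumed scaling)
    # R = (K + cI)^{-1} K; its eigenvalues lie in [0, 1).
    Rx = np.linalg.solve(Kx + c * np.eye(n), Kx)
    Ry = np.linalg.solve(Ky + c * np.eye(n), Ky)
    return float(np.linalg.svd(Rx @ Ry, compute_uv=False)[0])

# Synthetic check: y_dep depends on x only through x**2, y_ind is independent.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y_dep = x ** 2 + 0.1 * rng.normal(size=200)
y_ind = rng.normal(size=200)
rho_dep = first_kernel_canonical_correlation(x, y_dep)
rho_ind = first_kernel_canonical_correlation(x, y_ind)
```

In this toy setting the quadratic dependence is nearly invisible to a linear correlation, whereas the kernel canonical correlation for `y_dep` should be clearly larger than for `y_ind`. The regularization matters: with `reg` close to zero the estimate approaches 1 even for independent data.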
Robust Kernel (Cross-) Covariance Operators in Reproducing Kernel Hilbert Space toward Kernel Methods
To the best of our knowledge, there are no general well-founded robust
methods for statistical unsupervised learning. Most of the unsupervised methods
explicitly or implicitly depend on the kernel covariance operator (kernel CO)
or kernel cross-covariance operator (kernel CCO). They are sensitive to
contaminated data, even when using bounded positive definite kernels. First, we
propose a robust kernel covariance operator (robust kernel CO) and a robust
kernel cross-covariance operator (robust kernel CCO) based on a generalized
loss function instead of the quadratic loss function. Second, we propose the
influence function of classical kernel canonical correlation analysis
(classical kernel CCA). Third, using this influence function, we propose a visualization method
to detect influential observations from two sets of data. Finally, we propose a
method based on robust kernel CO and robust kernel CCO, called robust kernel
CCA, which is designed for contaminated data and less sensitive to noise than
classical kernel CCA. The principles we describe also apply to many kernel
methods which must deal with the issue of kernel CO or kernel CCO. Experiments
on synthesized data and imaging genetics data demonstrate that the proposed
visualization and robust kernel CCA can be applied effectively to both ideal
data and contaminated data. The robust methods show superior performance
over the state-of-the-art methods.
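The replacement of the quadratic loss by a generalized loss can be sketched for the simplest case, a robust kernel mean element. The example below assumes a Huber loss and a kernelized iteratively reweighted least squares scheme, which is a standard way to solve such M-estimation problems in an RKHS; the abstract does not state which loss or solver the authors use. All names, the Gaussian kernel, and the threshold rule (median of the initial residuals) are assumptions made for illustration.

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Gaussian-kernel Gram matrix of the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def rkhs_residuals(K, w):
    """RKHS distance of each feature map Phi(x_i) to sum_j w_j Phi(x_j)."""
    r2 = np.diag(K) - 2.0 * K @ w + w @ K @ w
    return np.sqrt(np.maximum(r2, 0.0))

def robust_kernel_mean_weights(K, n_iter=100, tol=1e-8):
    """Weights w of a Huber-loss kernel mean, f = sum_i w_i Phi(x_i)."""
    n = K.shape[0]
    w = np.full(n, 1.0 / n)  # start from the classical (uniform) mean
    c = np.median(rkhs_residuals(K, w))  # Huber threshold (assumed rule)
    for _ in range(n_iter):
        r = rkhs_residuals(K, w)
        # Huber psi(r)/r: 1 inside the threshold, c/r outside it.
        psi_over_r = np.where(r <= c, 1.0, c / np.maximum(r, 1e-12))
        w_new = psi_over_r / psi_over_r.sum()
        converged = np.max(np.abs(w_new - w)) < tol
        w = w_new
        if converged:
            break
    return w

# Synthetic data: 95 inliers around the origin, 5 outliers far away.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(95, 2)), rng.uniform(8.0, 12.0, size=(5, 2))])
weights = robust_kernel_mean_weights(gaussian_gram(X))
```

With uniform weights this reduces to the classical kernel mean element. Under the Huber loss, observations far from the bulk of the data in feature space receive smaller weights, which is the mechanism that limits the effect of contaminated observations.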