An Efficient Method to Estimate the Optimum Regularization Parameter in RLDA
Motivation: The biomarker discovery process in high-throughput genomic profiles has presented the statistical learning community with a challenging problem, namely learning when the number of variables is comparable to or exceeds the sample size. In these settings, many classical techniques, including linear discriminant analysis (LDA), falter. The poor performance of LDA is attributed to the ill-conditioned nature of the sample covariance matrix when the dimension and sample size are comparable. To alleviate this problem, regularized LDA (RLDA) has classically been proposed, in which the sample covariance matrix is replaced by its ridge estimate. However, the performance of RLDA depends heavily on the regularization parameter used in the ridge estimate of the sample covariance matrix.
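As a concrete illustration of the RLDA construction described above, the following is a minimal sketch in Python/numpy. It is not the thesis's estimator: the regularization parameter gamma, and the helper names used here, are assumptions for illustration; gamma would in practice be chosen by a method such as cross-validation or the estimator the thesis develops.

```python
import numpy as np

def rlda_fit(X0, X1, gamma):
    """Fit a two-class regularized LDA (RLDA) discriminant.

    X0, X1 : arrays of shape (n_k, p), samples from each class.
    gamma  : ridge regularization parameter (assumed supplied).
    """
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    # pooled sample covariance (ill-conditioned or singular when p >= n)
    S = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (n0 + n1 - 2)
    # ridge estimate: shift the spectrum by gamma so the inverse exists
    S_ridge = S + gamma * np.eye(S.shape[0])
    w = np.linalg.solve(S_ridge, mu1 - mu0)
    b = -0.5 * w @ (mu0 + mu1)
    return w, b

def rlda_predict(x, w, b):
    # assign class 1 when the linear discriminant is positive
    return int(x @ w + b > 0)
```

Because S_ridge is positive definite for any gamma > 0, the solve succeeds even when p exceeds the sample size, which is exactly the regime where plain LDA breaks down.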
HIGH-DIMENSIONAL SIGNAL PROCESSING AND STATISTICAL LEARNING
Classical statistical and signal processing techniques are generally not
useful in situations where the dimensionality (p) of the observations is comparable
to or exceeds the sample size (n). This is mainly because
the performance of these techniques is guaranteed through the classical notion of
statistical consistency, which is itself fashioned for situations where n >> p.
Statistical consistency has been vigorously used in the past century to develop
many signal processing and statistical learning techniques. However, in recent
years, two sets of mathematical machineries have emerged that show the possibility
of developing superior techniques suitable for analyzing high-dimensional
observations, i.e., situations where p >> n. In this thesis, we refer to these
techniques, which are grounded either in double asymptotic regimes or sparsity
assumptions, as high-dimensional techniques.
In this thesis, we examine and develop a set of high-dimensional techniques
with applications in classification. The thesis is mainly divided into three
parts. In the first part, we introduce a novel approach based on double asymptotics
to estimate the regularization parameter used in a well-known technique
known as the RLDA classifier. We examine the robustness of the developed approach
to the Gaussianity assumption on which the core estimator is built.
The performance of the technique in terms of accuracy and efficiency is verified
against other popular methods such as cross-validation. In the second part of
the thesis, the newly developed RLDA classifier and several other
classifiers are compared in situations where p is comparable to or exceeds n.
While in the first two parts of the thesis we focus more on double asymptotic
methods, in the third part we study two important classes of techniques
based on sparsity assumption. One of these techniques known as LASSO has
gained much attention in recent years within the statistical community, while the
second one, known as compressed sensing, has become very popular in signal
processing literature. Although both of these techniques use sparsity assumptions
as well as L1 minimization, the objective functions and constraints they are
constructed on are different. In the third part of the thesis, we demonstrate the
application of both techniques in high-dimensional classification and compare
them in terms of shrinkage rate and classification accuracy.
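The difference in objectives and constraints noted above can be stated compactly. In a standard formulation (assuming noisy measurements for LASSO and exact measurements for the basis-pursuit form of compressed sensing; lambda is a tuning parameter):

```latex
\hat{\beta}_{\text{LASSO}} = \arg\min_{\beta} \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_1
\qquad
\hat{\beta}_{\text{CS}} = \arg\min_{\beta} \; \|\beta\|_1 \ \text{subject to} \ y = X\beta
```

Both recover sparse solutions through the L1 norm, but LASSO penalizes a squared-error fit while basis pursuit minimizes the L1 norm subject to an exact equality constraint.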