An Efficient Method to Estimate the Optimum Regularization Parameter in RLDA
Motivation: The biomarker discovery process in high-throughput genomic profiles has presented the statistical learning community with a challenging problem, namely learning when the number of variables is comparable to or exceeds the sample size. In these settings, many classical techniques, including linear discriminant analysis (LDA), falter. The poor performance of LDA is attributed to the ill-conditioned nature of the sample covariance matrix when the dimension and sample size are comparable. To alleviate this problem, regularized LDA (RLDA) has classically been proposed, in which the sample covariance matrix is replaced by its ridge estimate. However, the performance of RLDA depends heavily on the regularization parameter used in the ridge estimate of the sample covariance matrix.
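As a concrete illustration of the RLDA construction described above, the following is a minimal sketch in Python/numpy. It is not the thesis's estimator: the regularization parameter gamma, and the helper names used here, are assumptions for illustration; gamma would in practice be chosen by a method such as cross-validation or the estimator the thesis develops.

```python
import numpy as np

def rlda_fit(X0, X1, gamma):
    """Fit a two-class regularized LDA (RLDA) discriminant.

    X0, X1 : arrays of shape (n_k, p), samples from each class.
    gamma  : ridge regularization parameter (assumed supplied).
    """
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    # pooled sample covariance (ill-conditioned or singular when p >= n)
    S = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (n0 + n1 - 2)
    # ridge estimate: shift the spectrum by gamma so the inverse exists
    S_ridge = S + gamma * np.eye(S.shape[0])
    w = np.linalg.solve(S_ridge, mu1 - mu0)
    b = -0.5 * w @ (mu0 + mu1)
    return w, b

def rlda_predict(x, w, b):
    # assign class 1 when the linear discriminant is positive
    return int(x @ w + b > 0)
```

Because S_ridge is positive definite for any gamma > 0, the solve succeeds even when p exceeds the sample size, which is exactly the regime where plain LDA breaks down.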
HIGH-DIMENSIONAL SIGNAL PROCESSING AND STATISTICAL LEARNING
Classical statistical and signal processing techniques are generally not
useful in situations where the dimensionality (p) of the observations is comparable
to or exceeds the sample size (n). This is mainly because
the performance of these techniques is guaranteed through the classical notion of
statistical consistency, which is itself fashioned for situations where n >> p.
Statistical consistency has been vigorously used in the past century to develop
many signal processing and statistical learning techniques. However, in recent
years, two sets of mathematical machineries have emerged that show the possibility
of developing superior techniques suitable for analyzing high-dimensional
observations, i.e., situations where p >> n. In this thesis, we refer to these
techniques, which are grounded either in double asymptotic regimes or sparsity
assumptions, as high-dimensional techniques.
In this thesis, we examine and develop a set of high-dimensional techniques
with applications in classification. The thesis is mainly divided into three
parts. In the first part, we introduce a novel approach based on double asymptotics
to estimate the regularization parameter used in a well-known technique
known as the RLDA classifier. We examine the robustness of the developed approach
to the Gaussianity assumption on which the core estimator is built.
The performance of the technique in terms of accuracy and efficiency is verified
against other popular methods such as cross-validation. In the second part of
the thesis, the newly developed RLDA classifier and several other
classifiers are compared in situations where p is comparable to or exceeds n.
While in the first two parts of the thesis we focus more on double asymptotic
methods, in the third part we study two important classes of techniques
based on sparsity assumption. One of these techniques known as LASSO has
gained much attention in recent years within the statistical community, while the
second one, known as compressed sensing, has become very popular in signal
processing literature. Although both of these techniques use sparsity assumptions
as well as L1 minimization, the objective functions and constraints they are
constructed on are different. In the third part of the thesis, we demonstrate the
application of both techniques in high-dimensional classification and compare
them in terms of shrinkage rate and classification accuracy.
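The difference in objectives and constraints noted above can be stated compactly. In a standard formulation (assuming noisy measurements for LASSO and exact measurements for the basis-pursuit form of compressed sensing; lambda is a tuning parameter):

```latex
\hat{\beta}_{\text{LASSO}} = \arg\min_{\beta} \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_1
\qquad
\hat{\beta}_{\text{CS}} = \arg\min_{\beta} \; \|\beta\|_1 \ \text{subject to} \ y = X\beta
```

Both recover sparse solutions through the L1 norm, but LASSO penalizes a squared-error fit while basis pursuit minimizes the L1 norm subject to an exact equality constraint.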