54 research outputs found
Dynamic Linear Discriminant Analysis in High Dimensional Space
High-dimensional data that evolve dynamically feature prominently in the
modern data era. As a partial response to this, recent years have seen
increasing emphasis on addressing the dimensionality challenge. However, the
non-static nature of these datasets is largely ignored. This paper addresses
both challenges by proposing a novel yet simple dynamic linear programming
discriminant (DLPD) rule for binary classification. Different from the usual
static linear discriminant analysis, the new method is able to capture the
changing distributions of the underlying populations by modeling their means
and covariances as smooth functions of covariates of interest. Under an
approximate sparse condition, we show that the conditional misclassification
rate of the DLPD rule converges to the Bayes risk in probability uniformly over
the range of the variables used for modeling the dynamics, when the
dimensionality is allowed to grow exponentially with the sample size. The
minimax lower bound of the estimation of the Bayes risk is also established,
implying that the misclassification rate of our proposed rule is minimax-rate
optimal. The promising performance of the DLPD rule is illustrated via
extensive simulation studies and the analysis of a breast cancer dataset.
Comment: 34 pages; 3 figures
MARS: A second-order reduction algorithm for high-dimensional sparse precision matrices estimation
Estimation of the precision matrix (or inverse covariance matrix) is of great
importance in statistical data analysis. However, as the number of parameters
scales quadratically with the dimension p, computation becomes very challenging
when p is large. In this paper, we propose an adaptive sieving reduction
algorithm to generate a solution path for the estimation of precision matrices
under the penalized D-trace loss, with each subproblem being solved by
a second-order algorithm. In each iteration of our algorithm, we are able to
greatly reduce the number of variables in the problem based on the
Karush-Kuhn-Tucker (KKT) conditions and the sparse structure of the estimated
precision matrix in the previous iteration. As a result, our algorithm is
capable of handling datasets with very high dimensions that may go beyond the
capacity of the existing methods. Moreover, for the sub-problem in each
iteration, rather than solving the primal problem directly, we develop a
semismooth Newton augmented Lagrangian algorithm with global linear convergence
on the dual problem to improve the efficiency. Theoretical properties of our
proposed algorithm have been established. In particular, we show that the
convergence rate of our algorithm is asymptotically superlinear. The high
efficiency and promising performance of our algorithm are illustrated via
extensive simulation studies and real data applications, with comparison to
several state-of-the-art solvers.
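As a rough illustration of the objective named in the abstract above, the following sketch evaluates a penalized D-trace loss and the gradient of its smooth part. The abstract does not specify the penalty, so the off-diagonal l1 penalty used here is an assumption, and the function names `d_trace_objective` and `d_trace_gradient` are hypothetical. This is not the adaptive sieving or semismooth Newton algorithm itself.

```python
import numpy as np


def d_trace_objective(theta, sigma_hat, lam):
    """0.5 * tr(Theta^2 Sigma) - tr(Theta) + lam * (off-diagonal l1 norm).

    The l1 penalty on off-diagonal entries is an assumed choice.
    """
    smooth = 0.5 * np.trace(theta @ theta @ sigma_hat) - np.trace(theta)
    off_diag = theta - np.diag(np.diag(theta))
    return smooth + lam * np.abs(off_diag).sum()


def d_trace_gradient(theta, sigma_hat):
    """Gradient of the smooth part for symmetric Theta:
    0.5 * (Theta Sigma + Sigma Theta) - I."""
    p = theta.shape[0]
    return 0.5 * (theta @ sigma_hat + sigma_hat @ theta) - np.eye(p)


# Toy 2x2 covariance matrix (made-up numbers, for illustration only).
sigma_demo = np.array([[2.0, 0.5], [0.5, 1.0]])
theta_demo = np.linalg.inv(sigma_demo)

# Without a penalty, the gradient vanishes at the true precision matrix.
grad_at_inverse = d_trace_gradient(theta_demo, sigma_demo)
```

The unpenalized loss is minimized at the inverse of the covariance matrix, which is why the gradient above is zero at `theta_demo`. The loss is quadratic in the precision matrix, which makes it a natural target for second-order methods.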
HiQR: An efficient algorithm for high dimensional quadratic regression with penalties
This paper investigates the efficient solution of penalized quadratic
regressions in high-dimensional settings. We propose a novel and efficient
algorithm for ridge-penalized quadratic regression that leverages the matrix
structures of the regression with interactions. Building on this formulation,
we develop an alternating direction method of multipliers (ADMM) framework for
penalized quadratic regression with general penalties, including both single
and hybrid penalty functions. Our approach greatly simplifies the calculations
to basic matrix-based operations, making it appealing in terms of both memory
storage and computational complexity.
Comment: 18 pages
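To make the point about matrix structure concrete, the sketch below computes the interaction term x^T Omega x for every observation in two ways: by expanding all p^2 pairwise products, and by plain matrix operations that never form the expanded design matrix. It only illustrates the kind of saving such structure allows and is not the HiQR algorithm; the function names are hypothetical.

```python
import numpy as np


def quadratic_term_expanded(X, omega):
    """Naive route: build the n x p^2 matrix of pairwise products first."""
    n, p = X.shape
    Z = np.einsum("ij,ik->ijk", X, X).reshape(n, p * p)
    return Z @ omega.reshape(p * p)


def quadratic_term_matrix(X, omega):
    """Matrix route: x_i^T Omega x_i for each row, using O(n p) extra memory."""
    return ((X @ omega) * X).sum(axis=1)


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 6))
omega_demo = rng.normal(size=(6, 6))
omega_demo = 0.5 * (omega_demo + omega_demo.T)  # symmetric interaction matrix

fit_expanded = quadratic_term_expanded(X_demo, omega_demo)
fit_matrix = quadratic_term_matrix(X_demo, omega_demo)
```

Both routes give the same fitted values, but the second avoids storing n times p^2 numbers, which matters once p is in the thousands.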
Autoregressive Networks
We propose a first-order autoregressive model for dynamic network processes
in which edges change over time while nodes remain unchanged. The model depicts
the dynamic changes explicitly. It also facilitates simple and efficient
statistical inference, such as maximum likelihood estimators that are
proved to be (uniformly) consistent and asymptotically normal. Model
diagnostic checking can be carried out easily using a permutation test. The
proposed model applies to any network process with various underlying
structures but with independent edges. As an illustration, an autoregressive
stochastic block model has been investigated in depth, which characterizes the
latent communities by the transition probabilities over time. This leads to a
more effective spectral clustering algorithm for identifying the latent
communities. Inference for a change point is incorporated into the
autoregressive stochastic block model to cater for possible structure changes.
The developed asymptotic theory and the simulation study affirm the
performance of the proposed methods. Applications to three real data sets
illustrate both the relevance and usefulness of the proposed models.
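The abstract does not spell out the model equations, so the following is only a minimal sketch under stated assumptions: each edge evolves independently as a two-state Markov chain, an absent edge appears with probability `alpha`, a present edge disappears with probability `beta`, and both probabilities are shared by all edges. The function names are hypothetical. The estimators are the pooled transition frequencies, which are the maximum likelihood estimators under these assumptions.

```python
import numpy as np


def simulate_ar_network(n_nodes, n_steps, alpha, beta, rng):
    """Simulate independent edges; each column is one undirected edge (i < j)."""
    n_edges = n_nodes * (n_nodes - 1) // 2
    edges = np.empty((n_steps + 1, n_edges), dtype=np.int8)
    # Start from the stationary edge probability alpha / (alpha + beta).
    edges[0] = rng.random(n_edges) < alpha / (alpha + beta)
    for t in range(1, n_steps + 1):
        u = rng.random(n_edges)
        prev = edges[t - 1]
        # Absent edge appears w.p. alpha; present edge survives w.p. 1 - beta.
        edges[t] = np.where(prev == 0, u < alpha, u >= beta)
    return edges


def estimate_transition_probs(edges):
    """Pooled transition frequencies (MLE when alpha, beta are common to all edges)."""
    prev = edges[:-1].astype(float)
    curr = edges[1:].astype(float)
    alpha_hat = (curr * (1 - prev)).sum() / (1 - prev).sum()
    beta_hat = ((1 - curr) * prev).sum() / prev.sum()
    return alpha_hat, beta_hat


rng = np.random.default_rng(0)
edges_demo = simulate_ar_network(n_nodes=30, n_steps=200, alpha=0.2, beta=0.3, rng=rng)
alpha_hat, beta_hat = estimate_transition_probs(edges_demo)
```

With 435 edges observed over 200 steps, the two estimates should land close to the values used in the simulation. Edge-specific or community-specific probabilities, as in a stochastic block structure, would replace the pooled sums with sums over the relevant group of edges.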
A direct approach for sparse quadratic discriminant analysis
Quadratic discriminant analysis (QDA) is a standard tool for classification due to its simplicity and flexibility. However, because the number of its parameters scales quadratically with the number of variables, QDA is not practical when the dimensionality is relatively large. To address this, we propose a novel procedure named DA-QDA for QDA in analyzing high-dimensional data. Formulated in a simple and coherent framework, DA-QDA aims to directly estimate the key quantities in the Bayes discriminant function, including quadratic interactions and a linear index of the variables for classification. Under appropriate sparsity assumptions, we establish consistency results for estimating the interactions and the linear index, and further demonstrate that the misclassification rate of our procedure converges to the optimal Bayes risk, even when the dimensionality is exponentially high with respect to the sample size. An efficient algorithm based on the alternating direction method of multipliers (ADMM) is developed for finding interactions, which is much faster than its competitor in the literature. The promising performance of DA-QDA is illustrated via extensive simulation studies and the analysis of four real datasets.
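For orientation, the sketch below writes the Bayes discriminant function for two Gaussian classes as a quadratic interaction matrix, a linear index, and a constant, and checks it against the direct log-density ratio. It assumes equal class priors and known parameters; the parameterization used in the paper may differ, and the function names are hypothetical. It shows the quantities that a direct approach would target, not the DA-QDA estimator.

```python
import numpy as np


def qda_parts(mu1, sigma1, mu2, sigma2):
    """Split log f1(x) - log f2(x) into quadratic, linear and constant parts."""
    omega1 = np.linalg.inv(sigma1)
    omega2 = np.linalg.inv(sigma2)
    quad = omega2 - omega1                 # quadratic interactions (p x p)
    lin = omega1 @ mu1 - omega2 @ mu2      # linear index (length p)
    _, logdet1 = np.linalg.slogdet(sigma1)
    _, logdet2 = np.linalg.slogdet(sigma2)
    const = 0.5 * (mu2 @ omega2 @ mu2 - mu1 @ omega1 @ mu1 + logdet2 - logdet1)
    return quad, lin, const


def discriminant(x, quad, lin, const):
    """Assign x to class 1 when this value is positive (equal priors assumed)."""
    return 0.5 * x @ quad @ x + lin @ x + const


def gaussian_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma) at x, used only as a cross-check."""
    d = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (d @ np.linalg.solve(sigma, d) + logdet + len(x) * np.log(2 * np.pi))


# Toy parameters (made-up numbers, for illustration only).
mu1_demo = np.array([0.0, 0.0, 0.0])
mu2_demo = np.array([1.0, -0.5, 0.0])
sigma1_demo = np.eye(3)
sigma2_demo = np.array([[1.5, 0.3, 0.0], [0.3, 1.0, 0.0], [0.0, 0.0, 2.0]])
x_demo = np.array([0.4, -0.2, 1.1])

parts_demo = qda_parts(mu1_demo, sigma1_demo, mu2_demo, sigma2_demo)
score_demo = discriminant(x_demo, *parts_demo)
direct_demo = gaussian_logpdf(x_demo, mu1_demo, sigma1_demo) - gaussian_logpdf(
    x_demo, mu2_demo, sigma2_demo
)
```

The interaction matrix is the difference of the two precision matrices, so it can be sparse even when neither precision matrix is, which is one motivation for estimating it directly rather than inverting two covariance estimates.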
- …