Statistical viewpoints on network model, PDE Identification, low-rank matrix estimation and deep learning
The phenomenal advances in modern computational infrastructure have made it possible to acquire massive amounts of data in high-dimensional feature spaces.
More specifically, the largest datasets available in industry often involve billions of samples and millions of features.
Datasets arising in modern science and engineering are sometimes even more extreme, often with dimension of the same order as, or even larger than, the sample size.
A cornerstone of modern statistics and machine learning has been a precise characterization of how well we can estimate the objects of interest from such high-dimensional datasets.
While consistent estimation is impossible in this high-dimensional regime in general, a large body of research has investigated structural assumptions under which statistical recovery is possible even in these seemingly ill-posed scenarios.
Examples include a long line of work on sparsity, low-rank assumptions, and more abstract generalizations of these.
These structural assumptions on signals are often realized through specially designed norms: for instance, the entry-wise L1-norm is used to induce sparsity in a vector or matrix, and the nuclear norm is used to induce a low-rank matrix.
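The two penalties just mentioned can be sketched in a few lines. This is a minimal NumPy illustration (the rank-1 example matrix is ours, not from the text): the L1-norm sums absolute entries, while the nuclear norm sums singular values.

```python
import numpy as np

def l1_norm(v):
    """Entry-wise L1 norm: promotes sparsity when used as a penalty."""
    return np.abs(v).sum()

def nuclear_norm(M):
    """Sum of singular values: promotes low rank when used as a penalty."""
    return np.linalg.svd(M, compute_uv=False).sum()

# A rank-1 matrix has a small nuclear norm relative to its entry-wise L1 norm.
u = np.ones((5, 1))
M = u @ u.T                       # rank-1 matrix of all ones
print(l1_norm(M.ravel()))         # 25.0
print(nuclear_norm(M))            # 5.0 (a single nonzero singular value)
```

Minimizing a loss plus one of these norms (scaled by a tuning parameter) is the standard way such structural assumptions enter an estimation procedure.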
High-dimensional datasets are common in real-world applications not only for parametric but also for non-parametric models.
Deep neural networks, among the most successful models in modern machine learning across a variety of tasks, are a primary example of non-parametric models for function estimation.
Tasks such as image classification or speech recognition often involve data in high-dimensional spaces.
To estimate functions accurately while avoiding the well-known curse of dimensionality, special structural assumptions are imposed on the regression functions.
Under such structural assumptions, the main emphasis of this thesis proposal is on exploring how various regularizing penalties can be utilized to estimate parameters and functions in parametric and non-parametric statistical problems.
Specifically, our main focus will be on problems in network science, PDE identification, and neural networks.
On Excess Risk Convergence Rates of Neural Network Classifiers
The recent success of neural networks in pattern recognition and
classification problems suggests that neural networks possess qualities
distinct from other more classical classifiers such as SVMs or boosting
classifiers. This paper studies the performance of plug-in classifiers based on
neural networks in a binary classification setting as measured by their excess
risks. Compared to the typical settings imposed in the literature, we consider
a more general scenario that resembles actual practice in two respects: first,
the function class to be approximated includes the Barron functions as a proper
subset, and second, the neural network classifier constructed is the minimizer
of a surrogate loss instead of the 0-1 loss so that gradient-descent-based
numerical optimizations can be easily applied. While the class of functions we
consider is large enough that optimal rates are necessarily slow, it is a
regime in which dimension-free rates are possible and the approximation power
of neural networks can be taken advantage of. In
particular, we analyze the estimation and approximation properties of neural
networks to obtain a dimension-free, uniform rate of convergence for the excess
risk. Finally, we show that the rate obtained is in fact minimax optimal up to
a logarithmic factor, and the minimax lower bound shows the effect of the
margin assumption in this regime.
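The plug-in construction described above can be sketched concretely. In this toy NumPy example (our own illustration, not the paper's estimator, and with a simple parametric model standing in for the neural network), a conditional-probability estimate is fit by minimizing a smooth surrogate (logistic) loss via gradient descent, and the classifier then predicts the label 1 wherever the estimated probability exceeds 1/2:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-d binary classification data with P(Y=1 | x) = sigmoid(2x).
n = 2000
x = rng.normal(size=n)
p_true = 1.0 / (1.0 + np.exp(-2.0 * x))
y = (rng.random(n) < p_true).astype(float)

# Minimize the logistic surrogate loss by gradient descent; the 0-1 loss is
# piecewise constant, so it cannot be optimized this way directly.
w, b = 0.0, 0.0
for _ in range(500):
    eta = 1.0 / (1.0 + np.exp(-(w * x + b)))   # estimated P(Y=1 | x)
    w -= 0.5 * np.mean((eta - y) * x)
    b -= 0.5 * np.mean(eta - y)

# Plug-in rule: classify as 1 whenever the estimated probability exceeds 1/2.
eta_hat = 1.0 / (1.0 + np.exp(-(w * x + b)))
pred = (eta_hat > 0.5).astype(float)
print("accuracy:", np.mean(pred == y))
```

The excess risk of such a classifier is controlled by how well `eta_hat` approximates the true conditional probability near the decision boundary, which is where the margin assumption enters.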
AutoDiff: combining Auto-encoder and Diffusion model for tabular data synthesizing
Diffusion models have become a main paradigm for synthetic data generation in
many subfields of modern machine learning, including computer vision, language
modeling, and speech synthesis. In this paper, we leverage the power of
diffusion models to generate synthetic tabular data. The heterogeneous
features in tabular data have been a main obstacle in tabular data synthesis,
and we tackle this problem by employing an auto-encoder architecture. When
compared with state-of-the-art tabular synthesizers, the synthetic tables
produced by our model show good statistical fidelity to the real data and
perform well in downstream machine-learning tasks. We conducted experiments
on publicly available datasets. Notably, our model adeptly captures the
correlations among features, which has been a long-standing challenge in
tabular data synthesis. Our code is available at
https://github.com/UCLA-Trustworthy-AI-Lab/AutoDiffusion
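The role of the auto-encoder can be motivated with a short sketch. Gaussian forward diffusion operates on continuous vectors, so a heterogeneous table (numeric plus categorical columns) is first mapped into a continuous latent space before noising. The schedule values and shapes below are generic illustrations, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Standard linear noise schedule for the forward diffusion process.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)   # cumulative signal-retention factor

def noise(z0, t):
    """Sample z_t ~ N(sqrt(alpha_bar_t) * z0, (1 - alpha_bar_t) * I)."""
    eps = rng.normal(size=z0.shape)
    return np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps

# Latent codes of 4 table rows, as an auto-encoder would produce: continuous
# vectors, even when the original columns were categorical.
z0 = rng.normal(size=(4, 8))
z_end = noise(z0, T - 1)              # by the final step, nearly pure noise
print(alpha_bar[-1])                  # close to 0: original signal is gone
```

A reverse-process network is then trained to denoise such latents step by step, and the auto-encoder's decoder maps generated latents back to mixed-type table rows.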
High-Dimensional Multivariate Linear Regression with Weighted Nuclear Norm Regularization
We consider a low-rank matrix estimation problem where the data are assumed to be generated from a multivariate linear regression model. To induce a low-rank coefficient matrix, we employ the weighted nuclear norm (WNN) penalty, defined as the weighted sum of the singular values of the matrix. The weights are set in nondecreasing order, which makes the WNN objective function non-convex in the parameter space. Although this penalty has been widely applied, studies on the estimation properties of the resulting estimator are limited. We propose an efficient algorithm under the framework of the alternating direction method of multipliers (ADMM) to estimate the coefficient matrix. The estimator from the suggested algorithm converges to a stationary point of an augmented Lagrangian function. Under the orthogonal design setting, we derive the effects of the weights on estimating the singular values of the ground-truth coefficient matrix. Under the Gaussian design setting, we derive a minimax convergence rate for the estimation error. We also propose a generalized cross-validation (GCV) criterion for selecting the tuning parameter and an iterative algorithm for updating the weights. Simulations and a real data analysis demonstrate the competitive performance of our new method. Supplementary materials for this article are available online.
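The key computational building block in such ADMM schemes is the proximal operator of the WNN penalty. Under nondecreasing weights (so larger singular values are penalized less), it has a known closed form: soft-threshold each singular value by its weight. The sketch below is a generic illustration of that operator, not the paper's full algorithm, and the example matrix and weight choices are ours:

```python
import numpy as np

def prox_wnn(Y, weights):
    """Proximal operator of the weighted nuclear norm: shrink each singular
    value sigma_i of Y by its weight w_i and truncate at zero. With
    nondecreasing weights, the shrunken values stay sorted, so this
    SVD-based closed form is valid."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - weights, 0.0)
    return U @ (s_shrunk[:, None] * Vt)

rng = np.random.default_rng(2)
# Noisy observation of a rank-2 coefficient matrix.
L = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 15))
Y = L + 0.1 * rng.normal(size=(20, 15))

w = np.linspace(0.5, 5.0, 15)        # nondecreasing weights
X = prox_wnn(Y, w)
print(np.linalg.matrix_rank(X))      # small noise singular values are zeroed
```

Inside ADMM, this operator is applied once per iteration to the low-rank block, with the other updates handling the regression fit and the dual variables.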