Statistical viewpoints on network model, PDE Identification, low-rank matrix estimation and deep learning
The phenomenal advances in modern computational infrastructure have made it possible to acquire massive amounts of data in high-dimensional feature spaces.
More specifically, the largest datasets available in industry often involve billions of samples and millions of features.
Datasets arising in modern science and engineering are sometimes even more extreme, often with dimension of the same order as, or even larger than, the sample size.
A cornerstone of modern statistics and machine learning has been a precise characterization of how well we can estimate the objects of interest from such high-dimensional datasets.
While consistent estimation is impossible in this high-dimensional regime in general, a large body of research has investigated structural assumptions under which statistical recovery is possible even in these seemingly ill-posed scenarios.
Examples include a long line of work on sparsity, low-rank assumptions, and more abstract generalizations of these.
These structural assumptions on signals are often realized through specially designed norms: for instance, the entry-wise L1-norm is used to induce sparsity in a vector or matrix, and the nuclear norm is used to induce a low-rank matrix.
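The two penalties just mentioned can be sketched in a few lines. This is a minimal NumPy illustration (the rank-1 example matrix is ours, not from the text): the L1-norm sums absolute entries, while the nuclear norm sums singular values.

```python
import numpy as np

def l1_norm(v):
    """Entry-wise L1 norm: promotes sparsity when used as a penalty."""
    return np.abs(v).sum()

def nuclear_norm(M):
    """Sum of singular values: promotes low rank when used as a penalty."""
    return np.linalg.svd(M, compute_uv=False).sum()

# A rank-1 matrix has a small nuclear norm relative to its entry-wise L1 norm.
u = np.ones((5, 1))
M = u @ u.T                       # rank-1 matrix of all ones
print(l1_norm(M.ravel()))         # 25.0
print(nuclear_norm(M))            # 5.0 (a single nonzero singular value)
```

Minimizing a loss plus one of these norms (scaled by a tuning parameter) is the standard way such structural assumptions enter an estimation procedure.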
High-dimensional datasets are common in real-world applications not only for parametric but also for non-parametric models.
Deep neural networks, among the most successful models in modern machine learning across a variety of tasks, are a primary example of non-parametric models for function estimation.
Tasks such as image classification or speech recognition often involve data in high-dimensional spaces.
To estimate functions accurately while avoiding the well-known curse of dimensionality, special structural assumptions are imposed on the regression functions.
Under such structural assumptions, the main emphasis of this thesis proposal is on exploring how various regularizing penalties can be utilized to estimate parameters and functions in parametric and non-parametric statistical problems.
Specifically, our main focus will be on problems in network science, PDE identification, and neural networks.
On Excess Risk Convergence Rates of Neural Network Classifiers
The recent success of neural networks in pattern recognition and
classification problems suggests that neural networks possess qualities
distinct from other more classical classifiers such as SVMs or boosting
classifiers. This paper studies the performance of plug-in classifiers based on
neural networks in a binary classification setting as measured by their excess
risks. Compared to the typical settings imposed in the literature, we consider
a more general scenario that resembles actual practice in two respects: first,
the function class to be approximated includes the Barron functions as a proper
subset, and second, the neural network classifier constructed is the minimizer
of a surrogate loss instead of the 0-1 loss so that gradient-descent-based
numerical optimizations can be easily applied. While the class of functions we
consider is large enough that optimal rates are necessarily slow, it is a
regime in which dimension-free rates are possible and the approximation power
of neural networks can be taken advantage of. In
particular, we analyze the estimation and approximation properties of neural
networks to obtain a dimension-free, uniform rate of convergence for the excess
risk. Finally, we show that the rate obtained is in fact minimax optimal up to
a logarithmic factor, and the minimax lower bound shows the effect of the
margin assumption in this regime.
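The plug-in construction described above can be sketched concretely. In this toy NumPy example (our own illustration, not the paper's estimator, and with a simple parametric model standing in for the neural network), a conditional-probability estimate is fit by minimizing a smooth surrogate (logistic) loss via gradient descent, and the classifier then predicts the label 1 wherever the estimated probability exceeds 1/2:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-d binary classification data with P(Y=1 | x) = sigmoid(2x).
n = 2000
x = rng.normal(size=n)
p_true = 1.0 / (1.0 + np.exp(-2.0 * x))
y = (rng.random(n) < p_true).astype(float)

# Minimize the logistic surrogate loss by gradient descent; the 0-1 loss is
# piecewise constant, so it cannot be optimized this way directly.
w, b = 0.0, 0.0
for _ in range(500):
    eta = 1.0 / (1.0 + np.exp(-(w * x + b)))   # estimated P(Y=1 | x)
    w -= 0.5 * np.mean((eta - y) * x)
    b -= 0.5 * np.mean(eta - y)

# Plug-in rule: classify as 1 whenever the estimated probability exceeds 1/2.
eta_hat = 1.0 / (1.0 + np.exp(-(w * x + b)))
pred = (eta_hat > 0.5).astype(float)
print("accuracy:", np.mean(pred == y))
```

The excess risk of such a classifier is controlled by how well `eta_hat` approximates the true conditional probability near the decision boundary, which is where the margin assumption enters.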
AutoDiff: combining Auto-encoder and Diffusion model for tabular data synthesizing
Diffusion models have become a main paradigm for synthetic data generation in
many subfields of modern machine learning, including computer vision, language
modeling, and speech synthesis. In this paper, we leverage the power of
diffusion models to generate synthetic tabular data. The heterogeneous
features in tabular data have been a main obstacle in tabular data synthesis,
and we tackle this problem by employing an auto-encoder architecture. When
compared with state-of-the-art tabular synthesizers, the synthetic tables
produced by our model show good statistical fidelity to the real data and
perform well in downstream machine-learning tasks. We conducted experiments
on publicly available datasets. Notably, our model adeptly captures the
correlations among features, which has been a long-standing challenge in
tabular data synthesis. Our code is available at
https://github.com/UCLA-Trustworthy-AI-Lab/AutoDiffusion
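The role of the auto-encoder can be motivated with a short sketch. Gaussian forward diffusion operates on continuous vectors, so a heterogeneous table (numeric plus categorical columns) is first mapped into a continuous latent space before noising. The schedule values and shapes below are generic illustrations, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Standard linear noise schedule for the forward diffusion process.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)   # cumulative signal-retention factor

def noise(z0, t):
    """Sample z_t ~ N(sqrt(alpha_bar_t) * z0, (1 - alpha_bar_t) * I)."""
    eps = rng.normal(size=z0.shape)
    return np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps

# Latent codes of 4 table rows, as an auto-encoder would produce: continuous
# vectors, even when the original columns were categorical.
z0 = rng.normal(size=(4, 8))
z_end = noise(z0, T - 1)              # by the final step, nearly pure noise
print(alpha_bar[-1])                  # close to 0: original signal is gone
```

A reverse-process network is then trained to denoise such latents step by step, and the auto-encoder's decoder maps generated latents back to mixed-type table rows.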
High-Dimensional Multivariate Linear Regression with Weighted Nuclear Norm Regularization
We consider a low-rank matrix estimation problem where the data are assumed to be generated from a multivariate linear regression model. To induce a low-rank coefficient matrix, we employ the weighted nuclear norm (WNN) penalty, defined as the weighted sum of the singular values of the matrix. The weights are set in nondecreasing order, which makes the WNN objective function non-convex in the parameter space. Although this penalty has been widely applied, studies on the estimation properties of the resulting estimator are limited. We propose an efficient algorithm under the framework of the alternating direction method of multipliers (ADMM) to estimate the coefficient matrix. The estimator from the suggested algorithm converges to a stationary point of an augmented Lagrangian function. Under the orthogonal design setting, we derive the effects of the weights on estimating the singular values of the ground-truth coefficient matrix. Under the Gaussian design setting, we derive a minimax convergence rate for the estimation error. We also propose a generalized cross-validation (GCV) criterion for selecting the tuning parameter and an iterative algorithm for updating the weights. Simulations and a real data analysis demonstrate the competitive performance of our new method. Supplementary materials for this article are available online.
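The key computational building block in such ADMM schemes is the proximal operator of the WNN penalty. Under nondecreasing weights (so larger singular values are penalized less), it has a known closed form: soft-threshold each singular value by its weight. The sketch below is a generic illustration of that operator, not the paper's full algorithm, and the example matrix and weight choices are ours:

```python
import numpy as np

def prox_wnn(Y, weights):
    """Proximal operator of the weighted nuclear norm: shrink each singular
    value sigma_i of Y by its weight w_i and truncate at zero. With
    nondecreasing weights, the shrunken values stay sorted, so this
    SVD-based closed form is valid."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - weights, 0.0)
    return U @ (s_shrunk[:, None] * Vt)

rng = np.random.default_rng(2)
# Noisy observation of a rank-2 coefficient matrix.
L = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 15))
Y = L + 0.1 * rng.normal(size=(20, 15))

w = np.linspace(0.5, 5.0, 15)        # nondecreasing weights
X = prox_wnn(Y, w)
print(np.linalg.matrix_rank(X))      # small noise singular values are zeroed
```

Inside ADMM, this operator is applied once per iteration to the low-rank block, with the other updates handling the regression fit and the dual variables.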