Dimensionality Reduction for Stationary Time Series via Stochastic Nonconvex Optimization
Stochastic optimization naturally arises in machine learning. Efficient
algorithms with provable guarantees, however, are still largely missing when
the objective function is nonconvex and the data points are dependent. This
paper studies this fundamental challenge through a streaming PCA problem for
stationary time series data. Specifically, our goal is to estimate the
principal component of time series data with respect to the covariance matrix
of the stationary distribution. Computationally, we propose a variant of Oja's
algorithm combined with downsampling to control the bias of the stochastic
gradient caused by the data dependency. Theoretically, we quantify the
uncertainty of our proposed stochastic algorithm based on diffusion
approximations. This allows us to prove the asymptotic rate of convergence and
further implies near-optimal asymptotic sample complexity. Numerical
experiments are provided to support our analysis.
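The downsampling idea can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the step size `eta`, the gap `gap`, and the AR(1) toy data are all our own assumptions.

```python
import numpy as np

def oja_downsampled(stream, dim, eta=0.01, gap=5):
    """Estimate the top principal component of a dependent data stream.

    Only every `gap`-th sample is used (downsampling), which weakens the
    temporal dependence of consecutive updates and thereby reduces the
    bias of the stochastic gradient.
    """
    w = np.ones(dim) / np.sqrt(dim)        # deterministic unit-norm start
    for t, x in enumerate(stream):
        if t % gap != 0:                   # skip correlated neighbors
            continue
        w = w + eta * x * (x @ w)          # Oja update: w += eta * (x x^T) w
        w /= np.linalg.norm(w)             # project back onto the unit sphere
    return w

# Toy AR(1) series in R^3: the stationary covariance is diagonal and
# dominated by the first coordinate, so the principal component is ~e1.
rng = np.random.default_rng(0)
A = np.diag([0.9, 0.5, 0.5])
xs, x = [], np.zeros(3)
for _ in range(20000):
    x = A @ x + rng.standard_normal(3)
    xs.append(x.copy())
w = oja_downsampled(xs, dim=3, eta=0.005, gap=5)
print(abs(w[0]))  # close to 1: w aligns with the leading eigenvector e1
```

The point of skipping samples is that consecutive observations of a stationary series are correlated, so the naive stochastic gradient x x^T w is biased for the stationary covariance; widening the gap between used samples trades data for a smaller bias.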
Representation and statistical properties of deep neural networks on structured data
Significant success of deep learning has brought unprecedented challenges to conventional wisdom in statistics, optimization, and applied mathematics. In many high-dimensional applications, e.g., image data of hundreds of thousands of pixels, deep learning is remarkably scalable and mysteriously generalizes well. Although such appealing behavior stimulates wide applications, a fundamental theoretical challenge -- curse of data dimensionality -- naturally arises. Roughly put, the sample complexity in practical applications is significantly smaller than that predicted by theory. It is a common belief that deep neural networks are good at learning various geometric structures hidden in data sets. However, little theory has been established to explain such a power. This thesis aims to bridge the gap between theory and practice by studying function approximation and statistical theories of deep neural networks that exploit geometric structures in data.
-- Function Approximation Theories on Low-dimensional Manifolds using Deep Neural Networks.
We first develop an efficient universal approximation theory for functions on a low-dimensional Riemannian manifold. A feedforward network architecture is constructed for function approximation, where the size of the network grows depending on the manifold dimension. Furthermore, we prove an efficient approximation theory for convolutional residual networks in approximating Besov functions. Lastly, we demonstrate the benefit of overparameterized neural networks in function approximation. Specifically, we show that large neural networks are capable of accurately approximating a target function, and the network itself enjoys Lipschitz continuity.
-- Statistical Theories on Low-dimensional Data using Deep Neural Networks.
Efficient approximation theories of neural networks provide valuable guidelines to properly choose network architectures when data exhibit geometric structures. In combination with statistical tools, we prove that neural networks can circumvent the curse of data dimensionality and enjoy fast statistical convergence in various learning problems, including nonparametric regression/classification, generative distribution estimation, and doubly-robust policy learning.
Statistical Guarantees of Generative Adversarial Networks for Distribution Estimation
Generative Adversarial Networks (GANs) have achieved great success in
unsupervised learning. Despite the remarkable empirical performance, there are
limited theoretical understandings on the statistical properties of GANs. This
paper provides statistical guarantees of GANs for the estimation of data
distributions which have densities in a Hölder space. Our main result shows
that, if the generator and discriminator network architectures are properly
chosen (universally for all distributions with Hölder densities), GANs are
consistent estimators of the data distributions under strong discrepancy
metrics, such as the Wasserstein distance. To the best of our knowledge, this is the
first statistical theory of GANs for Hölder densities. In comparison with
existing works, our theory requires minimum assumptions on data distributions.
Our generator and discriminator networks utilize general weight matrices and
the non-invertible ReLU activation function, while many existing works only
apply to invertible weight matrices and invertible activation functions. In our
analysis, we decompose the error into a statistical error and an approximation
error by a new oracle inequality, which may be of independent interest.
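The guarantee is stated under a strong discrepancy metric such as the Wasserstein-1 distance. As an illustration of what consistency under this metric means (our own toy example, not part of the paper), the 1-D empirical Wasserstein-1 distance between a data sample and a generator's sample reduces to sorting:

```python
import numpy as np

def w1_empirical(a, b):
    """Wasserstein-1 distance between two equal-size 1-D samples:
    the mean absolute difference after sorting both."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

rng = np.random.default_rng(1)
data = rng.normal(0.0, 1.0, 100_000)      # "true" data distribution
good_gen = rng.normal(0.0, 1.0, 100_000)  # generator matching the data
bad_gen = rng.normal(0.5, 1.0, 100_000)   # generator with a mean shift
print(w1_empirical(data, good_gen))  # near 0
print(w1_empirical(data, bad_gen))   # near 0.5, the size of the shift
```

A consistency result of the kind described says the estimated distribution's distance to the truth, measured this way, vanishes as the sample size grows.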
Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data
Diffusion models achieve state-of-the-art performance in various generation
tasks. However, their theoretical foundations fall far behind. This paper
studies score approximation, estimation, and distribution recovery of diffusion
models, when data are supported on an unknown low-dimensional linear subspace.
Our result provides sample complexity bounds for distribution estimation using
diffusion models. We show that with a properly chosen neural network
architecture, the score function can be both accurately approximated and
efficiently estimated. Furthermore, the generated distribution based on the
estimated score function captures the data geometric structures and converges
to a close vicinity of the data distribution. The convergence rate depends on
the subspace dimension, indicating that diffusion models can circumvent the
curse of the ambient data dimensionality.
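A minimal numerical sketch of the setting, under our own simplifying assumptions (Gaussian factors on a linear subspace and a *linear* score model fit by denoising score matching; the paper uses neural network score approximators):

```python
import numpy as np

# Data supported on an r-dimensional linear subspace of R^D (x = z A^T),
# noised as y = x + sigma * eps.  A linear "score model" s(y) = y @ W is
# fit by denoising score matching: regress the target -eps/sigma on y.
rng = np.random.default_rng(0)
D, r, n, sigma = 10, 2, 50_000, 0.5
A = rng.standard_normal((D, r))
z = rng.standard_normal((n, r))
x = z @ A.T                        # clean data on the subspace
eps = rng.standard_normal((n, D))
y = x + sigma * eps                # noised data

W, *_ = np.linalg.lstsq(y, -eps / sigma, rcond=None)

# For Gaussian factors the true score of the noised density is linear:
# score(y) = -(A A^T + sigma^2 I)^{-1} y.
S_true = -np.linalg.inv(A @ A.T + sigma**2 * np.eye(D))
rel_err = np.linalg.norm(W - S_true) / np.linalg.norm(S_true)
print(rel_err)  # small: the fitted linear score matches the true one
```

In the Gaussian-linear toy case the regression recovers the true score almost exactly; the paper's contribution is precisely the nonparametric analogue, where the score must be approximated and estimated by a network whose rates depend on the subspace dimension r rather than the ambient D.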
Counterfactual Generative Models for Time-Varying Treatments
Estimating the counterfactual outcome of treatment is essential for
decision-making in public health and clinical science, among others. Often,
treatments are administered in a sequential, time-varying manner, leading to an
exponentially increased number of possible counterfactual outcomes.
Furthermore, in modern applications, the outcomes are high-dimensional and
conventional average treatment effect estimation fails to capture disparities
among individuals. To tackle these challenges, we propose a novel conditional
generative framework capable of producing counterfactual samples under
time-varying treatment, without the need for explicit density estimation. Our
method carefully addresses the distribution mismatch between the observed and
counterfactual distributions via a loss function based on inverse probability
weighting. We present a thorough evaluation of our method using both synthetic
and real-world data. Our results demonstrate that our method is capable of
generating high-quality counterfactual samples and outperforms the
state-of-the-art baselines.
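The inverse probability weighting device for sequential treatments can be sketched as follows; the interface and the assumption of known propensities are ours for illustration (in practice the propensities are estimated, and the weights enter the generative training loss).

```python
import numpy as np

def ipw_weights(treatments, propensities):
    """Product over time of 1 / P(observed treatment | history).

    treatments:   (n, T) array of observed 0/1 treatments.
    propensities: (n, T) array of P(A_t = 1 | history).
    """
    p_obs = np.where(treatments == 1, propensities, 1.0 - propensities)
    return 1.0 / np.prod(p_obs, axis=1)

rng = np.random.default_rng(0)
n, T = 5, 3
A_obs = rng.integers(0, 2, size=(n, T))  # observed treatment sequences
p = np.full((n, T), 0.5)                 # coin-flip propensities
w = ipw_weights(A_obs, p)
print(w)  # every weight is 1 / 0.5**3 = 8.0
```

Reweighting each observed trajectory by the inverse probability of its full treatment sequence corrects the mismatch between the observed and counterfactual distributions, which is why the number of weights, not the number of explicit densities, grows with the treatment horizon.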
Efficient RL with Impaired Observability: Learning to Act with Delayed and Missing State Observations
In real-world reinforcement learning (RL) systems, various forms of impaired
observability can complicate matters. These situations arise when an agent is
unable to observe the most recent state of the system due to latency or lossy
channels, yet the agent must still make real-time decisions. This paper
introduces a theoretical investigation into efficient RL in control systems
where agents must act with delayed and missing state observations. We establish
near-optimal regret bounds for RL in both the delayed and missing observation settings.
Despite impaired observability posing significant challenges to the policy
class and planning, our results demonstrate that learning remains efficient,
with the regret bound optimally depending on the state-action size of the
original system. Additionally, we provide a characterization of the performance
of the optimal policy under impaired observability, comparing it to the optimal
value obtained with full observability.
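A common device for acting under observation delay, which this sketch illustrates, is to augment the last observed state with the actions taken since it was observed; the `DelayedAgent` interface below is hypothetical, not the paper's.

```python
from collections import deque

class DelayedAgent:
    """Act on an information state: the last observed (delayed) state
    plus the actions taken since it was observed."""

    def __init__(self, policy, delay):
        self.policy = policy                # (obs, pending_actions) -> action
        self.pending = deque(maxlen=delay)  # actions not yet reflected in obs

    def act(self, delayed_obs):
        a = self.policy(delayed_obs, tuple(self.pending))
        self.pending.append(a)
        return a

# Toy use: in a chain MDP where "move right" (action 1) is always optimal,
# the policy can ignore the pending actions entirely.
agent = DelayedAgent(policy=lambda obs, pending: 1, delay=2)
actions = [agent.act(obs) for obs in [0, 0, 1]]
print(actions)  # [1, 1, 1]
```

The augmented state (delayed observation, pending actions) is what the policy must condition on in general; the abstract's point is that despite this blow-up in the policy's input, the regret can still scale with the state-action size of the original system.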