MOD-Net: A Machine Learning Approach via Model-Operator-Data Network for Solving PDEs
In this paper, we propose a model-operator-data network (MOD-Net) for solving
PDEs. A MOD-Net is driven by a model to solve PDEs through an operator
representation, with regularization from data. In this work, we use a deep
neural network to parameterize the Green's function. The empirical risk
consists of the mean squared residuals of the governing equation and the
boundary conditions, together with the mean squared error on a few labels,
which are computed numerically by traditional schemes on coarse grid points at
low computational cost. Neither the labeled dataset alone nor the model
constraints alone are sufficient to accurately train a MOD-Net for complicated
problems. Intuitively, the labeled dataset acts as a regularization in
addition to the model constraints. The MOD-Net is much more efficient than the
original neural operator because the MOD-Net also uses the information of the
governing equation and the boundary conditions of the PDE rather than purely
the expensive labels. Since the MOD-Net learns the Green's function of a PDE,
it solves a class of PDEs rather than a specific case. We numerically show
that MOD-Net is very efficient at solving the Poisson equation and the
one-dimensional Boltzmann equation. For nonlinear PDEs, where the concept of
the Green's function does not apply, a nonlinear MOD-Net can similarly be used
as an ansatz for solving nonlinear PDEs.
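As a rough illustration of this loss construction, the sketch below applies the Green's-function parameterization to the 1D Poisson problem -u''(x) = f(x) with u(0) = u(1) = 0, one of the equations treated in the paper. The class and function names, network size, quadrature rule, and equal loss weights are assumptions made for the sketch, not details taken from the paper.

```python
# Minimal sketch of a MOD-Net-style loss for the 1D Poisson problem
# -u''(x) = f(x) with u(0) = u(1) = 0. All names and hyperparameters
# here are illustrative assumptions, not details from the paper.
import torch
import torch.nn as nn

class GreenNet(nn.Module):
    """DNN parameterizing the Green's function G_theta(x, xi)."""
    def __init__(self, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, 1),
        )

    def forward(self, x, xi):
        return self.net(torch.cat([x, xi], dim=-1))

def solve(G, x, xi, f_xi, dxi):
    """u(x) = integral of G(x, xi) f(xi) dxi, via a Riemann sum on a grid."""
    X = x.repeat_interleave(xi.shape[0], dim=0)      # (n*m, 1)
    Xi = xi.repeat(x.shape[0], 1)                    # (n*m, 1)
    Gv = G(X, Xi).view(x.shape[0], xi.shape[0])      # (n, m)
    return (Gv * f_xi.view(1, -1)).sum(dim=1, keepdim=True) * dxi

def mod_net_loss(G, f, x_int, xi, dxi, x_lbl=None, u_lbl=None):
    x = x_int.clone().requires_grad_(True)
    u = solve(G, x, xi, f(xi), dxi)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    eq = (-d2u - f(x)).pow(2).mean()                 # governing equation residual
    xb = torch.tensor([[0.0], [1.0]])
    bc = solve(G, xb, xi, f(xi), dxi).pow(2).mean()  # boundary conditions
    lbl = torch.tensor(0.0)
    if x_lbl is not None:                            # a few coarse-grid labels
        lbl = (solve(G, x_lbl, xi, f(xi), dxi) - u_lbl).pow(2).mean()
    return eq + bc + lbl
```

Because the network represents G rather than a single solution u, training such a model against many source terms f yields a solver for the whole family of problems, which is the sense in which MOD-Net solves a class of PDEs rather than a specific case.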
Optimistic Estimate Uncovers the Potential of Nonlinear Models
We propose an optimistic estimate to evaluate the best possible fitting
performance of nonlinear models. It yields an optimistic sample size that
quantifies the smallest possible sample size to fit/recover a target function
using a nonlinear model. We estimate the optimistic sample sizes for matrix
factorization models, deep models, and deep neural networks (DNNs) with
fully-connected or convolutional architectures. For each nonlinear model, our
estimate predicts a specific subset of targets that can be fitted at
overparameterization, a prediction confirmed by our experiments. Our optimistic
estimate reveals two special properties of the DNN models -- free
expressiveness in width and costly expressiveness in connection. These
properties suggest the following architecture design principles of DNNs: (i)
feel free to add neurons/kernels; (ii) restrain from connecting neurons.
Overall, our optimistic estimate theoretically unveils the vast potential of
nonlinear models in fitting at overparameterization. Based on this framework,
we anticipate that, in the near future, we will gain a deeper understanding of
how and why numerous nonlinear models such as DNNs effectively realize their
potential in practice.
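To make the matrix factorization case concrete: the set of rank-r matrices in R^(m x n) forms a manifold of dimension r(m + n - r), far smaller than the m*n entries parameterized by the factorized model when r is small. Reading that dimension as a stand-in for the optimistic sample size is a paraphrase of the framework, not the paper's exact estimate; the snippet below simply compares the two counts.

```python
# Compare the dimension of the rank-r target manifold (a stand-in for the
# optimistic sample size; the paper's exact estimate may differ) with the
# parameter count of an m x n factorization model.
def rank_r_manifold_dim(m: int, n: int, r: int) -> int:
    """Dimension of the manifold of rank-r m x n matrices."""
    return r * (m + n - r)

m = n = 100
for r in (1, 5, 20):
    dim = rank_r_manifold_dim(m, n, r)
    print(f"rank {r:2d}: manifold dim {dim:5d} vs {m * n} model parameters")
```

For r = 1 this gives 199 versus 10,000 parameters, illustrating how a heavily overparameterized model may still recover a simple target from relatively few samples.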
Linear Stability Hypothesis and Rank Stratification for Nonlinear Models
Models with nonlinear architectures/parameterizations such as deep neural
networks (DNNs) are well known for their mysteriously good generalization
performance at overparameterization. In this work, we tackle this mystery from
a novel perspective focusing on the transition of the target recovery/fitting
accuracy as a function of the training data size. We propose a rank
stratification for general nonlinear models to uncover a model rank as an
"effective size of parameters" for each function in the function space of the
corresponding model. Moreover, we establish a linear stability theory proving
that a target function almost surely becomes linearly stable when the training
data size equals its model rank. Supported by our experiments, we propose a
linear stability hypothesis that linearly stable functions are preferred by
nonlinear training. By these results, the model rank of a target function
predicts the minimal training data size needed for its successful recovery.
Specifically, for the
matrix factorization model and DNNs of fully-connected or convolutional
architectures, our rank stratification shows that the model rank for specific
target functions can be much lower than the number of model parameters. This
result predicts the target recovery capability even at heavy
overparameterization for these nonlinear models, as demonstrated
quantitatively by our experiments. Overall, our work provides a unified
framework with quantitative prediction power for understanding the mysterious
target recovery behavior at overparameterization of general nonlinear models.
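One way to build intuition for the model rank is a numerical probe: at a parameter point realizing a simple target, the rank of the Jacobian of model outputs with respect to the parameters measures how many directions in function space are locally reachable. The sketch below runs this probe on a tiny fully-connected network in which only one hidden neuron is active; the construction and all names are an illustration under that assumption, not the paper's formal definition of model rank.

```python
# Numerical probe of an "effective size of parameters": the rank of the
# Jacobian of model outputs w.r.t. parameters at a parameter point that
# realizes a one-neuron target. An illustrative assumption, not the
# paper's formal definition of model rank.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 8), nn.Tanh(), nn.Linear(8, 1)).double()
with torch.no_grad():
    # Diverse first-layer weights keep the hidden features independent.
    net[0].weight.copy_(torch.linspace(1.0, 8.0, 8).view(8, 1))
    net[0].bias.copy_(torch.linspace(-2.0, 2.0, 8))
    # Only one active output weight: the network realizes a 1-neuron target.
    net[2].weight.copy_(torch.tensor([[1.0, 0, 0, 0, 0, 0, 0, 0]]))

xs = torch.linspace(-1.0, 1.0, 50, dtype=torch.float64).view(-1, 1)
params = list(net.parameters())
n_params = sum(p.numel() for p in params)

# Build the (n_samples x n_params) Jacobian row by row.
rows = []
for x in xs:
    y = net(x.view(1, 1))
    grads = torch.autograd.grad(y.sum(), params)
    rows.append(torch.cat([g.reshape(-1) for g in grads]))
J = torch.stack(rows)

rank = torch.linalg.matrix_rank(J).item()
print(f"parameters: {n_params}, Jacobian rank at this point: {rank}")
# The rank comes out well below the 25 parameters: gradients w.r.t. the
# inactive neurons' input weights and biases vanish at this point.
```

The gap between the parameter count and the Jacobian rank at such a point mirrors the paper's observation that the model rank of a simple target can be far smaller than the number of model parameters.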