Discussion of ``2004 IMS Medallion Lecture: Local Rademacher complexities and oracle inequalities in risk minimization'' by V. Koltchinskii
Discussion of ``2004 IMS Medallion Lecture: Local Rademacher complexities and
oracle inequalities in risk minimization'' by V. Koltchinskii [arXiv:0708.0083].
Comment: Published at http://dx.doi.org/10.1214/009053606000001055 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
Generalization error for multi-class margin classification
In this article, we study rates of convergence of the generalization error of
multi-class margin classifiers. In particular, we develop an upper bound theory
quantifying the generalization error of various large margin classifiers. The
theory permits a treatment of general margin losses, convex or nonconvex, in
the presence or absence of a dominating class. Three main results are established.
First, for any fixed margin loss, there may be a trade-off between the ideal
and actual generalization performances with respect to the choice of the class
of candidate decision functions, which is governed by the trade-off between the
approximation and estimation errors. In fact, different margin losses lead to
different ideal or actual performances in specific cases. Second, we
demonstrate, in a problem of linear learning, that the convergence rate can be
arbitrarily fast in the sample size $n$, depending on the joint distribution of
the input/output pair. This goes beyond the anticipated rate $O(n^{-1})$.
Third, we establish rates of convergence of several margin classifiers in
feature selection with the number of candidate variables $p$ allowed to greatly
exceed the sample size $n$ but no faster than $\exp(n)$.
Comment: Published at http://dx.doi.org/10.1214/07-EJS069 in the Electronic
Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
Conversions between barycentric, RKFUN, and Newton representations of rational interpolants
We derive explicit formulas for converting between rational interpolants in
barycentric, rational Krylov (RKFUN), and Newton form. We show applications of
these conversions when working with rational approximants produced by the AAA
algorithm [Y. Nakatsukasa, O. Sète, L. N. Trefethen, arXiv preprint
1612.00337, 2016] within the Rational Krylov Toolbox and for the solution of
nonlinear eigenvalue problems.
Boosting Data Analytics With Synthetic Volume Expansion
Synthetic data generation, a cornerstone of Generative Artificial
Intelligence, promotes a paradigm shift in data science by addressing data
scarcity and privacy while enabling unprecedented performance. As synthetic
data becomes more prevalent, concerns emerge regarding the accuracy of
statistical methods when applied to synthetic data in contrast to raw data.
This article explores the effectiveness of statistical methods on synthetic
data and the privacy risks of synthetic data. Regarding effectiveness, we
present the Synthetic Data Generation for Analytics framework. This framework
applies statistical approaches to high-quality synthetic data produced by
generative models like tabular diffusion models, which, initially trained on
raw data, benefit from insights from pertinent studies through transfer
learning. A key finding within this framework is the generational effect, which
reveals that the error rate of statistical methods on synthetic data decreases
with the addition of more synthetic data but may eventually rise or stabilize.
This phenomenon, stemming from the challenge of accurately mirroring raw data
distributions, highlights a "reflection point": an ideal volume of synthetic
data defined by specific error metrics. Through three case studies (sentiment
analysis, predictive modeling of structured data, and inference in tabular
data), we validate the superior performance of this framework compared to
conventional approaches. On privacy, synthetic data imposes lower risks while
supporting the differential privacy standard. These studies underscore
synthetic data's untapped potential in redefining data science's landscape.
Perturbation-Assisted Sample Synthesis: A Novel Approach for Uncertainty Quantification
This paper introduces a novel generator called Perturbation-Assisted Sample
Synthesis (PASS), designed for drawing reliable conclusions from complex data,
especially when using advanced modeling techniques like deep neural networks.
PASS utilizes perturbation to generate synthetic data that closely mirrors the
distribution of raw data, encompassing numerical and unstructured data types
such as gene expression, images, and text. By estimating the data-generating
distribution and leveraging large pre-trained generative models, PASS enhances
estimation accuracy, providing an estimated distribution of any statistic
through Monte Carlo experiments. Building on PASS, we propose a generative
inference framework called Perturbation-Assisted Inference (PAI), which offers
a statistical guarantee of validity. In pivotal inference, PAI enables accurate
conclusions without knowing a pivotal's distribution as in simulations, even
with limited data. In non-pivotal situations, we train PASS using an
independent holdout sample, resulting in credible conclusions. To showcase
PAI's capability in tackling complex problems, we highlight its applications in
three domains: image synthesis inference, sentiment word inference, and
multimodal inference via stable diffusion.