Boosting Data Analytics With Synthetic Volume Expansion
Synthetic data generation, a cornerstone of Generative Artificial
Intelligence, promotes a paradigm shift in data science by addressing data
scarcity and privacy while enabling unprecedented performance. As synthetic
data becomes more prevalent, concerns emerge regarding the accuracy of
statistical methods when applied to synthetic data in contrast to raw data.
This article explores the effectiveness of statistical methods on synthetic
data and the privacy risks of synthetic data. Regarding effectiveness, we
present the Synthetic Data Generation for Analytics framework. This framework
applies statistical approaches to high-quality synthetic data produced by
generative models like tabular diffusion models, which, initially trained on
raw data, benefit from insights from pertinent studies through transfer
learning. A key finding within this framework is the generational effect, which
reveals that the error rate of statistical methods on synthetic data decreases
with the addition of more synthetic data but may eventually rise or stabilize.
This phenomenon, stemming from the challenge of accurately mirroring raw data
distributions, highlights a "reflection point": an ideal volume of synthetic
data defined by specific error metrics. Through three case studies (sentiment
analysis, predictive modeling of structured data, and inference in tabular
data), we validate the superior performance of this framework compared to
conventional approaches. On privacy, synthetic data poses lower risks while
supporting the differential privacy standard. These studies underscore
synthetic data's untapped potential in redefining the landscape of data science.
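The generational effect lends itself to a simple empirical check: train a generator once, evaluate the downstream statistical method on progressively larger synthetic samples, and take the volume at which the chosen error metric bottoms out as the reflection point. The Python sketch below illustrates only that idea and is not the paper's implementation; the mock generator, the logistic-regression task, and the candidate volumes are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import zero_one_loss

rng = np.random.default_rng(0)

def sample_raw(n):
    """Stand-in for the raw data-generating process."""
    X = rng.normal(size=(n, 5))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y

def sample_synthetic(n):
    """Hypothetical trained generator: a slightly biased copy of the raw law."""
    X, y = sample_raw(n)
    return X + 0.15 * rng.normal(size=X.shape), y  # imperfect fidelity

X_test, y_test = sample_raw(5000)      # held-out raw data to score against

volumes = [200, 500, 1000, 2000, 5000, 20000]
errors = []
for m in volumes:
    X_syn, y_syn = sample_synthetic(m)
    clf = LogisticRegression().fit(X_syn, y_syn)
    errors.append(zero_one_loss(y_test, clf.predict(X_test)))

# The empirical "reflection point" is the volume minimizing the error metric.
reflection = volumes[int(np.argmin(errors))]
print(list(zip(volumes, np.round(errors, 4))), "->", reflection)
```

With a faithful generator the error curve keeps falling; with an imperfect one it flattens at a floor set by generator fidelity, which is exactly why the volume search matters.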
Perturbation-Assisted Sample Synthesis: A Novel Approach for Uncertainty Quantification
This paper introduces a novel generator called Perturbation-Assisted Sample
Synthesis (PASS), designed for drawing reliable conclusions from complex data,
especially when using advanced modeling techniques like deep neural networks.
PASS utilizes perturbation to generate synthetic data that closely mirrors the
distribution of raw data, encompassing numerical and unstructured data types
such as gene expression, images, and text. By estimating the data-generating
distribution and leveraging large pre-trained generative models, PASS enhances
estimation accuracy, providing an estimated distribution of any statistic
through Monte Carlo experiments. Building on PASS, we propose a generative
inference framework called Perturbation-Assisted Inference (PAI), which offers
a statistical guarantee of validity. In pivotal inference, PAI enables accurate
conclusions without knowing the pivotal quantity's distribution, as in
simulations, even with limited data. In non-pivotal situations, we train PASS
on an independent holdout sample, yielding credible conclusions. To showcase
PAI's capability in tackling complex problems, we highlight its applications in
three domains: image synthesis inference, sentiment word inference, and
multimodal inference via Stable Diffusion.
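The core mechanic of PASS, estimating the distribution of a statistic by Monte Carlo over perturbed synthetic draws, can be sketched compactly. The code below is a deliberate simplification rather than the PASS algorithm: the learned generator is replaced by a naive resampler with Gaussian perturbation, and the statistic, sample size, and perturbation scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sketch of perturbation-assisted Monte Carlo inference: perturb resampled
# raw data to mimic draws from an estimated data-generating distribution,
# then read off the spread of any statistic. A real PASS generator would be
# a learned model; this resampler is a deliberately naive stand-in.

raw = rng.lognormal(mean=0.0, sigma=1.0, size=300)   # observed sample
sigma = 0.1 * raw.std()                              # perturbation scale (assumption)

def synthetic_draw(n):
    base = rng.choice(raw, size=n, replace=True)     # estimated distribution
    return base + rng.normal(scale=sigma, size=n)    # perturbation step

stat = np.median                                     # any statistic of interest
mc = np.array([stat(synthetic_draw(raw.size)) for _ in range(2000)])

# Monte Carlo estimate of the statistic's distribution and a 95% interval.
lo, hi = np.quantile(mc, [0.025, 0.975])
print(f"median estimate {stat(raw):.3f}, 95% MC interval [{lo:.3f}, {hi:.3f}]")
```

Stripped down this far, the sketch is essentially a smoothed bootstrap; what PASS adds is a learned generative model in place of the naive resampler, together with the validity guarantee supplied by PAI.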
Scattering for defocusing mass sub-critical NLS
In this paper, we consider the scattering of defocusing mass sub-critical
nonlinear Schrödinger equations with initial data of low weight.
It is known that scattering holds for data in suitably weighted spaces, while
the continuity of the inverse wave operator breaks down at lower weights.
Moreover, for large data in the intermediate range of weights, only the wave
operator result is available, and scattering results are lacking.
Our aim is to study scattering in spaces of low weight. Our results are
divided into two parts. The first presents a systematic study of scattering in
weighted spaces for a certain range of weights, without any restriction of
smallness or radial symmetry; this extends previous results to spaces with
lower weights. The second establishes almost sure scattering by introducing a
"narrowed" Wiener randomization in physical space. For mass sub-critical NLS
in the relevant range of nonlinearities, this represents the first scattering
result that imposes no conditions of smallness, radial symmetry, or weight on
the initial data.
Comment: 86 pages
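For orientation, the equation class in question can be written out; the exponent range below is the standard definition of mass sub-criticality, supplied as background rather than drawn from the abstract.

```latex
% Defocusing nonlinear Schrodinger equation on R^d (standard background):
\[
  i\,\partial_t u + \Delta u = |u|^{p-1} u, \qquad u(0) = u_0,
\]
% in the mass sub-critical regime, i.e. below the L^2-critical exponent:
\[
  1 < p < 1 + \tfrac{4}{d}.
\]
% "Scattering" means the solution converges to a free (linear) evolution:
\[
  \lim_{t \to \pm\infty} \bigl\| u(t) - e^{it\Delta} u_{\pm} \bigr\|_X = 0
\]
% for asymptotic states u_{\pm} in the relevant space X.
```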
Hot Mums. Motherhood and Feminism in Post-socialist China
The term “hot mum” (La Ma, 辣妈) has become popular in the Chinese media in the 21st century, being regarded as a “feminist” image of the modern mother, as it breaks with the stereotype of the traditional Chinese mother. Departing from a historical framework of motherhood and feminism, as well as Western theories of subjectification and individualization, the article explores the discourses of hot mums in contemporary China. Based on an analysis of more than eight hundred articles in a Chinese database, it examines the impact of the hot mum image on practices of motherhood among contemporary Chinese women. The findings show that the notion of the hot mum has been transformed into that of the “all-around hot mum”, who takes care of both family and career. It is argued that this process has changed neither the power relations between men and women nor the roles of father and mother. Commercial and market forces have turned hot mums from an initial expression of women’s subjectivity, with its particular maternal values, into subjects of consumerism. The hot mum discourse thus appears to contribute to the oppression rather than the empowerment of Chinese women, let alone to any increased sense of individuality.
STL-SGD: Speeding Up Local SGD with Stagewise Communication Period
Distributed parallel stochastic gradient descent algorithms are workhorses
for large scale machine learning tasks. Among them, local stochastic gradient
descent (Local SGD) has attracted significant attention due to its low
communication complexity. Previous studies prove that the communication
complexity of Local SGD with a fixed or an adaptive communication period is in
the order of $O(N^{3/2} T^{1/2})$ and $O(N^{3/4} T^{3/4})$ when the data
distributions on clients are identical (IID) or otherwise (Non-IID), where $N$
is the number of clients and $T$ is the number of iterations. In this paper,
to accelerate convergence by reducing the
communication complexity, we propose STagewise Local SGD (STL-SGD), which
increases the communication period gradually as the learning rate decreases.
We prove that STL-SGD can keep the same
convergence rate and linear speedup as mini-batch SGD. In addition, as a
benefit of the increasing communication period, when the objective is strongly
convex or satisfies the Polyak-Łojasiewicz condition, the communication
complexity of STL-SGD is $O(N \log T)$ and $O(N^{1/2} T^{1/2})$ for the IID
case and the Non-IID case respectively, achieving significant improvements
over Local SGD. Experiments on both convex and non-convex problems demonstrate
the superior performance of STL-SGD.
Comment: Accepted by AAAI202
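The stagewise schedule is straightforward to state in code: run Local SGD, and at the end of each stage shrink the learning rate while lengthening the interval between averaging rounds. The simulation below illustrates the schedule on a toy least-squares problem; the client count, stage multipliers, and data are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy simulation of stagewise Local SGD on least squares with N clients.
# Each stage halves the learning rate and doubles the communication period,
# so model-averaging rounds become rarer as the iterates stabilize.

N, d = 8, 10
w_true = rng.normal(size=d)
data = [(rng.normal(size=(100, d)),) for _ in range(N)]
data = [(A, A @ w_true + 0.1 * rng.normal(size=100)) for (A,) in data]

w = np.zeros((N, d))                    # one model copy per client
lr, period = 0.05, 1                    # initial learning rate / comm. period

for stage in range(5):
    for t in range(200):                # iterations within a stage
        for i, (A, b) in enumerate(data):
            j = rng.integers(len(b))    # one-sample stochastic gradient
            g = (A[j] @ w[i] - b[j]) * A[j]
            w[i] -= lr * g
        if (t + 1) % period == 0:       # communication round: average models
            w[:] = w.mean(axis=0)
    lr *= 0.5                           # decrease learning rate ...
    period *= 2                         # ... while increasing the period

w[:] = w.mean(axis=0)
print("parameter error:", np.linalg.norm(w[0] - w_true))
```

The design intuition is that early iterates change quickly and need frequent synchronization, while later iterates, under a small learning rate, drift little between rounds, so communication can safely become sparse.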