356 research outputs found

    Boosting Data Analytics With Synthetic Volume Expansion

    Synthetic data generation, a cornerstone of Generative Artificial Intelligence, promotes a paradigm shift in data science by addressing data scarcity and privacy concerns while enabling unprecedented performance. As synthetic data becomes more prevalent, concerns arise about the accuracy of statistical methods applied to synthetic rather than raw data. This article examines both the effectiveness of statistical methods on synthetic data and the privacy risks synthetic data carries. On effectiveness, we present the Synthetic Data Generation for Analytics framework, which applies statistical approaches to high-quality synthetic data produced by generative models such as tabular diffusion models; these models, initially trained on raw data, benefit from insights of pertinent studies through transfer learning. A key finding within this framework is the generational effect: the error rate of statistical methods on synthetic data decreases as more synthetic data is added, but may eventually rise or stabilize. This phenomenon, stemming from the difficulty of accurately mirroring the raw data distribution, highlights a "reflection point": an ideal volume of synthetic data defined by specific error metrics. Through three case studies, sentiment analysis, predictive modeling of structured data, and inference in tabular data, we validate the superior performance of this framework over conventional approaches. On privacy, synthetic data carries lower risks while supporting the differential privacy standard. These studies underscore synthetic data's untapped potential in redefining the landscape of data science.
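The generational effect described in the abstract can be illustrated with a deliberately simple simulation. The sketch below is not the paper's framework: it stands in a tabular diffusion model with a slightly misspecified Gaussian generator (hypothetical parameters, chosen only to make the bias visible) and estimates a mean from growing synthetic samples. Error first shrinks with more synthetic data, then stabilizes near the generator's bias instead of vanishing, which is the "reflection point" behavior.

```python
import random
import statistics

# Illustrative sketch of the "generational effect". The true distribution
# is N(0, 1); the generator (standing in for an imperfectly trained
# generative model) produces N(0.05, 1.1). Both are hypothetical choices.

def estimation_error(n_synthetic, seed=0):
    """Absolute error of the mean estimated from n synthetic samples."""
    rng = random.Random(seed)
    samples = [rng.gauss(0.05, 1.1) for _ in range(n_synthetic)]
    return abs(statistics.fmean(samples) - 0.0)  # true mean is 0

errors = {n: estimation_error(n) for n in (10, 1_000, 100_000)}
# As n grows, sampling variance shrinks, but the error settles near the
# generator's bias (~0.05) rather than going to zero.
```

With a perfect generator the error would keep decreasing; the residual bias is what caps the useful volume of synthetic data.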

    Perturbation-Assisted Sample Synthesis: A Novel Approach for Uncertainty Quantification

    This paper introduces a novel generator called Perturbation-Assisted Sample Synthesis (PASS), designed for drawing reliable conclusions from complex data, especially when using advanced modeling techniques like deep neural networks. PASS uses perturbation to generate synthetic data that closely mirrors the distribution of the raw data, covering numerical and unstructured data types such as gene expression, images, and text. By estimating the data-generating distribution and leveraging large pre-trained generative models, PASS enhances estimation accuracy and provides an estimated distribution of any statistic through Monte Carlo experiments. Building on PASS, we propose a generative inference framework called Perturbation-Assisted Inference (PAI), which offers a statistical guarantee of validity. In pivotal inference, PAI enables accurate conclusions without knowing the distribution of the pivotal quantity, as in simulations, even with limited data. In non-pivotal situations, we train PASS using an independent holdout sample, yielding credible conclusions. To showcase PAI's capability in tackling complex problems, we highlight its applications in three domains: image synthesis inference, sentiment word inference, and multimodal inference via Stable Diffusion.
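The Monte Carlo idea behind PASS can be sketched with a toy stand-in (the actual PASS generator in the paper is far more sophisticated): resample the raw data with replacement, add small Gaussian perturbations to produce synthetic replicates, and use the replicates to estimate the sampling distribution of a statistic. The noise scale, replicate count, and interval cut-offs below are illustrative choices, not the paper's procedure.

```python
import random
import statistics

def synthetic_replicate(data, rng, noise=0.1):
    # One synthetic dataset: resample with replacement, then perturb.
    return [rng.choice(data) + rng.gauss(0.0, noise) for _ in data]

def mc_statistic_distribution(data, stat=statistics.median, n_rep=500, seed=1):
    """Monte Carlo estimate of the sampling distribution of `stat`."""
    rng = random.Random(seed)
    return sorted(stat(synthetic_replicate(data, rng)) for _ in range(n_rep))

rng0 = random.Random(0)
data = [rng0.gauss(0.0, 1.0) for _ in range(200)]   # raw sample
dist = mc_statistic_distribution(data)
lo, hi = dist[12], dist[487]   # crude central 95% interval (500 replicates)
```

This mirrors the abstract's claim that an estimated distribution of any statistic can be read off from Monte Carlo replicates of the synthetic generator.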

    Scattering for defocusing mass sub-critical NLS

    In this paper, we consider the $L_x^2$-scattering of defocusing mass sub-critical nonlinear Schr\"odinger equations with low weighted initial data. It is known that scattering holds for $\mathcal{F}H^1$-data, while the continuity of the inverse wave operator breaks down for $L^2$-data. Moreover, for large $\mathcal{F}H^s$-data with $s<1$, only wave operator results exist; scattering results are lacking. Our subject is to study scattering in spaces of low weight. Our results are divided into two parts. The first presents a systematic study of scattering on $\mathcal{F}H^s$ for certain $s<1$, without any restriction of smallness or radial symmetry, extending previous results to spaces with lower weights. The second is almost sure scattering on $L^2$, obtained by introducing a ``narrowed'' Wiener randomization in physical space. For mass sub-critical NLS with $d\ge 2$, this represents the first scattering result that imposes no smallness, radial symmetry, or weight conditions on the initial data. Comment: 86 pages
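For reference, the defocusing nonlinear Schrödinger equation in its standard form (the paper's precise setup may differ in normalization) is

```latex
\begin{aligned}
  i\partial_t u + \Delta u &= |u|^{p-1}u, \qquad u(0,x) = u_0(x),\quad x \in \mathbb{R}^d,\\
  1 < p &< 1 + \tfrac{4}{d} \quad \text{(mass sub-critical range)},
\end{aligned}
```

and $L_x^2$-scattering means the solution approaches a free evolution, $\|u(t) - e^{it\Delta}u_\pm\|_{L_x^2} \to 0$ as $t \to \pm\infty$.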

    Hot Mums. Motherhood and Feminism in Post-socialist China

    The term "hot mum" (La Ma, 辣妈) has become popular in the Chinese media in the 21st century, being regarded as a "feminist" image of the modern mother, as it breaks with the stereotype of the traditional Chinese mother. Drawing on a historical framework of motherhood and feminism, as well as western theories of subjectification and individualization, the article explores the discourses of hot mums in contemporary China. Based on an analysis of more than eight hundred articles in a Chinese database, this article explores the impacts of the image of the hot mum upon practices of motherhood among contemporary Chinese women. The findings show that the notion of the hot mum has been transformed into the concept of "all-around hot mums" who take care of both their families and their careers. It is argued that this process has changed neither power relations between men and women nor the roles of father and mother. Commercial and market forces have turned hot mums from an initial expression of women's subjectivity with particular maternal values into subjects of consumerism. The hot mum discourse thus apparently contributes to the oppression rather than the empowerment of Chinese women, and does little for their sense of individuality.

    STL-SGD: Speeding Up Local SGD with Stagewise Communication Period

    Distributed parallel stochastic gradient descent algorithms are workhorses for large-scale machine learning tasks. Among them, local stochastic gradient descent (Local SGD) has attracted significant attention due to its low communication complexity. Previous studies show that the communication complexity of Local SGD with a fixed or an adaptive communication period is of order $O(N^{3/2} T^{1/2})$ when the data distributions on clients are identical (IID) and $O(N^{3/4} T^{3/4})$ otherwise (Non-IID), where $N$ is the number of clients and $T$ is the number of iterations. In this paper, to accelerate convergence by reducing the communication complexity, we propose \textit{ST}agewise \textit{L}ocal \textit{SGD} (STL-SGD), which gradually increases the communication period while decreasing the learning rate. We prove that STL-SGD retains the same convergence rate and linear speedup as mini-batch SGD. In addition, as a benefit of the increasing communication period, when the objective is strongly convex or satisfies the Polyak-Łojasiewicz condition, the communication complexity of STL-SGD is $O(N \log T)$ in the IID case and $O(N^{1/2} T^{1/2})$ in the Non-IID case, a significant improvement over Local SGD. Experiments on both convex and non-convex problems demonstrate the superior performance of STL-SGD. Comment: Accepted by AAAI202
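The stagewise schedule described in the abstract can be sketched on a toy 1-D least-squares problem: each client $i$ holds a target $b_i$ and minimizes $(x - b_i)^2$; clients run local gradient steps between averaging rounds, and each stage doubles the communication period while halving the learning rate. The stage lengths, constants, and noise level below are illustrative choices, not the paper's tuned schedule.

```python
import random

def stl_sgd(targets, stages=5, steps_per_stage=100, lr0=0.1, h0=1, seed=0):
    """Stagewise Local SGD sketch: period doubles, learning rate halves."""
    rng = random.Random(seed)
    n = len(targets)
    x = [0.0] * n                      # one local iterate per client
    lr, period = lr0, h0
    for _ in range(stages):
        for step in range(steps_per_stage):
            for i in range(n):
                # noisy gradient of (x_i - b_i)^2 on client i
                grad = 2.0 * (x[i] - targets[i]) + rng.gauss(0.0, 0.01)
                x[i] -= lr * grad
            if (step + 1) % period == 0:   # communication round: average
                avg = sum(x) / n
                x = [avg] * n
        lr *= 0.5                      # decrease the learning rate ...
        period *= 2                    # ... while increasing the period
    return sum(x) / n

# The global minimizer is the mean of the client targets (here 2.5).
targets = [1.0, 2.0, 3.0, 4.0]
x_final = stl_sgd(targets)
```

As the learning rate shrinks, the local iterates drift less between syncs, which is why the communication period can grow without hurting convergence; this is the intuition behind the improved $O(N \log T)$ complexity in the strongly convex case.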