61 research outputs found

    AutoDiff: combining Auto-encoder and Diffusion model for tabular data synthesizing

    Diffusion models have become a main paradigm for synthetic data generation in many subfields of modern machine learning, including computer vision, language modeling, and speech synthesis. In this paper, we leverage the power of diffusion models to generate synthetic tabular data. The heterogeneous features in tabular data have been the main obstacle in tabular data synthesis, and we tackle this problem by employing an auto-encoder architecture. Compared with state-of-the-art tabular synthesizers, the synthetic tables from our model show good statistical fidelity to the real data and perform well on downstream machine learning utility tasks. We conducted the experiments over 15 publicly available datasets. Notably, our model adeptly captures the correlations among features, which has been a long-standing challenge in tabular data synthesis. Our code is available at https://github.com/UCLA-Trustworthy-AI-Lab/AutoDiffusion

    Inverse sequential simulation: Performance and implementation details

    For good groundwater flow and solute transport numerical modeling, it is important to characterize the formation properties. In this paper, we analyze the performance and important implementation details of a new approach for stochastic inverse modeling called inverse sequential simulation (iSS). This approach is capable of characterizing conductivity fields with heterogeneity patterns that are difficult to capture with standard multiGaussian-based inverse approaches. The method is based on the multivariate sequential simulation principle, but the covariances and cross-covariances used to compute the local conditional probability distributions by simple co-kriging are derived from an ensemble of conductivity and piezometric head fields, in a manner similar to how experimental covariances are computed in ensemble Kalman filtering. A sensitivity analysis is performed on a synthetic aquifer regarding the number of members of the ensemble of realizations, the number of conditioning data, the number of piezometers at which piezometric heads are observed, and the number of nodes retained within the search neighborhood when computing the local conditional probabilities. The results show the importance of having sufficiently large values for all of these parameters for the algorithm to properly characterize hydraulic conductivity fields with clear non-multiGaussian features. © 2015 Elsevier Ltd. All rights reserved.
    The first author acknowledges the financial support from the China Scholarship Council (CSC [2010]3010). Financial support to carry out this work was also received from the Spanish Ministry of Economy and Competitiveness through Project CGL2014-59841-P. We thank the three reviewers for their thorough review and their insightful comments, which have helped to improve the final manuscript.
    Xu, T.; Gómez-Hernández, JJ. (2015). Inverse sequential simulation: Performance and implementation details. Advances in Water Resources. 86B:311-326. https://doi.org/10.1016/j.advwatres.2015.04.015

    The electronic structure of delta doped semiconductors

    No full text

    Geological Resources Modeling in Mining

    No full text