4 research outputs found
Synthetic dataset generation with itemset-based generative models
This paper proposes three different data generators, tailored to transactional datasets, based on existing itemset-based
generative models. All these generators are intuitive and easy to implement, and they show satisfactory performance. The quality of each generator is assessed by means of three different methods that capture how well the original dataset structure is preserved. Both authors have been partially supported by TIN2017-89244-R from MINECO (Spain's Ministerio de Economia, Industria y Competitividad) and by the recognition 2017SGR-856 (MACDA) from AGAUR (Generalitat de Catalunya). Christian Lezcano is supported by Paraguay's Foreign Postgraduate Scholarship Programme Don Carlos Antonio López (BECAL). Peer reviewed. Postprint (author's final draft).
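The abstract does not specify the three generators, but one of the simplest itemset-based generative models for transactional data is an independence model: estimate each item's marginal support from the original dataset and include items independently when sampling. A minimal sketch, assuming that model (the function names here are illustrative, not from the paper):

```python
import random

def fit_item_probs(transactions):
    """Estimate each item's marginal probability from a transactional dataset."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    return {item: c / n for item, c in counts.items()}

def generate(probs, n_transactions, rng=None):
    """Sample synthetic transactions: each item is included independently
    with its estimated marginal probability."""
    rng = rng or random.Random(0)
    data = []
    for _ in range(n_transactions):
        data.append({item for item, p in probs.items() if rng.random() < p})
    return data

# Toy transactional dataset (sets of items per transaction).
real = [{"bread", "milk"}, {"bread", "butter"}, {"milk"}, {"bread", "milk", "butter"}]
probs = fit_item_probs(real)
synthetic = generate(probs, 100)
```

A generator like this preserves item marginals but not item co-occurrence structure, which is exactly what the paper's quality-assessment methods (how well the original dataset structure is preserved) are meant to measure.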
Synthetic Dataset Generation with Itemset-Based Generative Models
This paper proposes three different data generators, tailored to
transactional datasets, based on existing itemset-based generative models. All
these generators are intuitive and easy to implement and show satisfactory
performance. The quality of each generator is assessed by means of three
different methods that capture how well the original dataset structure is
preserved. Comment: IEEE International Symposium on Software Reliability Engineering
Workshops (ISSREW@RDSA 2019), Oct 2019
Machine Learning Methods for Generating High Dimensional Discrete Datasets
The development of platforms and techniques for emerging Big Data and Machine Learning applications requires the availability of real-life datasets. A possible solution is to synthesize datasets that reflect patterns of real ones using a two-step approach: first, a real dataset X is analyzed to derive relevant patterns Z and, then, such patterns are used to reconstruct a new dataset X' that preserves the main characteristics of X. This survey explores two possible approaches: (1) constraint-based generation and (2) probabilistic generative modeling. The former is devised using inverse frequent itemset mining (IFM) techniques and consists of generating a dataset satisfying given support constraints on the itemsets of an input set, typically the frequent ones. By contrast, for the latter approach, recent developments in probabilistic generative modeling (PGM) are explored that model the generation as a sampling process from a parametric distribution, typically encoded as a neural network. The two approaches are compared by providing an overview of their instantiations for the case of discrete data and discussing their pros and cons.
Machine learning methods for generating high dimensional discrete datasets
The development of platforms and techniques for emerging Big Data and Machine Learning applications requires the availability of real-life datasets. A possible solution is to synthesize datasets that reflect patterns of real ones using a two-step approach: first, a real dataset X is analyzed to derive relevant patterns Z and, then, such patterns are used to reconstruct a new dataset X' that preserves the main characteristics of X. This survey explores two possible approaches: (1) constraint-based generation and (2) probabilistic generative modeling. The former is devised using inverse frequent itemset mining (IFM) techniques and consists of generating a dataset satisfying given support constraints on the itemsets of an input set, typically the frequent ones. By contrast, for the latter approach, recent developments in probabilistic generative modeling (PGM) are explored that model the generation as a sampling process from a parametric distribution, typically encoded as a neural network. The two approaches are compared by providing an overview of their instantiations for the case of discrete data and discussing their pros and cons. This article is categorized under: Fundamental Concepts of Data and Knowledge > Big Data Mining; Technologies > Machine Learning; Algorithmic Development > Structure Discovery.
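The two-step approach described above (derive patterns Z from X, then reconstruct X') can be illustrated with a toy sketch. This is not any specific method from the survey: it assumes a naive pattern miner over 1- and 2-itemsets and a simple sampler that draws whole itemsets with probability proportional to their support, and all function names are hypothetical:

```python
import random
from itertools import combinations

def mine_frequent_itemsets(X, min_support):
    """Step 1: derive patterns Z -- all 1- and 2-itemsets meeting min_support."""
    n = len(X)
    candidates = set()
    for t in X:
        candidates.update((i,) for i in t)
        candidates.update(combinations(sorted(t), 2))
    Z = {}
    for c in candidates:
        sup = sum(1 for t in X if set(c) <= t) / n
        if sup >= min_support:
            Z[c] = sup
    return Z

def reconstruct(Z, size, rng=None):
    """Step 2: build X' by sampling itemsets with probability
    proportional to their support in the original dataset."""
    rng = rng or random.Random(0)
    patterns = list(Z)
    weights = [Z[p] for p in patterns]
    return [set(rng.choices(patterns, weights)[0]) for _ in range(size)]

X = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}]
Z = mine_frequent_itemsets(X, min_support=0.5)
X_prime = reconstruct(Z, len(X))
```

A real IFM-based generator would instead solve for a dataset whose itemset supports satisfy the mined constraints exactly, while a PGM approach would fit a parametric distribution (e.g. a neural network) and sample from it; this sketch only conveys the mine-then-reconstruct pipeline.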