5 research outputs found

Multi-Sorted Inverse Frequent Itemsets Mining: On-Going Research

Author: Piccolo Antonio
Saccà Domenico
Serra Edoardo
Publication venue: 'IUScholarWorks'
Publication date: 01/01/2016

Inverse frequent itemset mining (IFM) consists of generating artificial transactional databases that reflect patterns of real ones and, in particular, satisfy given frequency constraints on the itemsets. An extension of IFM, called many-sorted IFM, is introduced, in which the schemes for the datasets to be generated are those typical of Big Tables, as required in emerging big data applications, e.g., social network analytics.
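As a rough illustration of what "satisfying given frequency constraints on the itemsets" can mean, the sketch below counts the support of each itemset in a small transactional database and checks it against a lower and an upper bound. The function names, the toy database, and the bounds are assumptions made for illustration only; they are not taken from the paper, and the sketch only verifies constraints rather than generating a database.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Transaction = Set[str]
Constraints = Dict[FrozenSet[str], Tuple[int, int]]  # itemset -> (min, max) support


def support(database: List[Transaction], itemset: FrozenSet[str]) -> int:
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for transaction in database if itemset <= transaction)


def satisfies_constraints(database: List[Transaction], constraints: Constraints) -> bool:
    """Return True if every itemset's support lies within its (min, max) bounds."""
    return all(
        low <= support(database, itemset) <= high
        for itemset, (low, high) in constraints.items()
    )


# Hypothetical toy data: four transactions over the items a, b, c.
database = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"c"}]

# Hypothetical constraints: absolute support bounds per itemset.
constraints = {
    frozenset({"a"}): (3, 3),
    frozenset({"a", "b"}): (2, 3),
    frozenset({"b", "c"}): (0, 1),
}

print(satisfies_constraints(database, constraints))
```

An IFM procedure would search for a database that passes such a check; the search itself is beyond this sketch.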

Boise State University - ScholarWorks

Generating Synthetic Discrete Datasets with Machine Learning

Author: Manco Giuseppe
Ritacco Ettore
Rullo Antonino
Saccà Domenico
Serra Edoardo
Publication venue: 'IUScholarWorks'
Publication date: 01/01/2022

Real data are not always available, accessible, or sufficient, and in many cases they are incomplete and lack the semantic content needed to define optimization processes. In this paper we discuss synthetic data generation from two different perspectives. The common core idea is to analyze a limited set of real data to learn the main patterns that characterize them and to exploit this knowledge to generate brand new data. The first perspective is constraint-based generation and consists of generating a synthetic dataset satisfying given support constraints on the real frequent patterns. The second one is based on probabilistic generative modeling and considers synthetic generation as a sampling process from a parametric distribution learned on the real data, typically encoded as a neural network (e.g., Variational Autoencoders, Generative Adversarial Networks).

Archivio istituzionale della ricerca - Università degli Studi di Udine

Boise State University - ScholarWorks

Machine Learning Methods for Generating High Dimensional Discrete Datasets

Author: Manco Giuseppe
Ritacco Ettore
Rullo Antonino
Saccà Domenico
Serra Edoardo
Publication venue: 'IUScholarWorks'
Publication date: 01/03/2022

The development of platforms and techniques for emerging Big Data and Machine Learning applications requires the availability of real-life datasets. A possible solution is to synthesize datasets that reflect patterns of real ones using a two-step approach: first, a real dataset X is analyzed to derive relevant patterns Z and, then, such patterns are used to reconstruct a new dataset X' that preserves the main characteristics of X. This survey explores two possible approaches: (1) constraint-based generation and (2) probabilistic generative modeling. The former is devised using inverse mining (IFM) techniques, and consists of generating a dataset satisfying given support constraints on the itemsets of an input set, which are typically the frequent ones. By contrast, for the latter approach, recent developments in probabilistic generative modeling (PGM) are explored that model the generation as a sampling process from a parametric distribution, typically encoded as a neural network. The two approaches are compared by providing an overview of their instantiations for the case of discrete data and discussing their pros and cons.
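The two-step scheme in this abstract (derive patterns Z from X, then sample a new dataset X') can be sketched with a deliberately simple parametric distribution. The code below assumes that Z is just the relative frequency of each item and that items are drawn independently; this is a stand-in chosen for brevity, not the neural models (VAEs, GANs) the survey refers to, and all names are hypothetical.

```python
import random
from typing import Dict, List, Set


def derive_patterns(X: List[Set[str]], items: List[str]) -> Dict[str, float]:
    """Step 1: summarize X as per-item relative frequencies (the patterns Z here)."""
    n = len(X)
    return {item: sum(1 for t in X if item in t) / n for item in items}


def sample_dataset(Z: Dict[str, float], size: int, rng: random.Random) -> List[Set[str]]:
    """Step 2: build each transaction by including item i with probability Z[i]."""
    return [
        {item for item, p in Z.items() if rng.random() < p}
        for _ in range(size)
    ]


# Hypothetical toy dataset X over the items a, b, c.
X = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"c"}]
Z = derive_patterns(X, ["a", "b", "c"])

# Seeded generator so that the sampled dataset X' is reproducible.
X_new = sample_dataset(Z, size=1000, rng=random.Random(0))

freq_a = sum(1 for t in X_new if "a" in t) / len(X_new)
print(Z, round(freq_a, 2))
```

Because items are sampled independently, this sketch preserves single-item frequencies only approximately and ignores co-occurrence between items, which is exactly the kind of structure that richer generative models or support constraints on larger itemsets are meant to capture.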

Boise State University - ScholarWorks

Machine learning methods for generating high dimensional discrete datasets

Author: Manco G.
Ritacco E.
Rullo A.
Sacca D.
Serra E.
Publication date: 01/01/2022

The development of platforms and techniques for emerging Big Data and Machine Learning applications requires the availability of real-life datasets. A possible solution is to synthesize datasets that reflect patterns of real ones using a two-step approach: first, a real dataset X is analyzed to derive relevant patterns Z and, then, such patterns are used to reconstruct a new dataset X' that preserves the main characteristics of X. This survey explores two possible approaches: (1) constraint-based generation and (2) probabilistic generative modeling. The former is devised using inverse mining (IFM) techniques, and consists of generating a dataset satisfying given support constraints on the itemsets of an input set, which are typically the frequent ones. By contrast, for the latter approach, recent developments in probabilistic generative modeling (PGM) are explored that model the generation as a sampling process from a parametric distribution, typically encoded as a neural network. The two approaches are compared by providing an overview of their instantiations for the case of discrete data and discussing their pros and cons. This article is categorized under: Fundamental Concepts of Data and Knowledge > Big Data Mining Technologies > Machine Learning Algorithmic Development > Structure Discover

Archivio istituzionale della ricerca - Università degli Studi di Udine