Massive Data Exploration using Estimated Cardinalities

Authors: Lesot Marie-Jeanne, Nerzic Pierre, Pivert Olivier, Smits Grégory
Publication venue: HAL CCSD
Publication date: 18/07/2022

Audience: International. Linguistic summaries are used in this work to provide personalized exploration functionalities on massive relational data. To ensure fluid exploration, the cardinalities of the data properties described in the summaries are estimated from statistics about the data distribution. The proposed workflow also involves a mechanism that infers a vocabulary from these statistics and a sampling-based approach that consolidates the estimated cardinalities. The paper shows that soft computing techniques are particularly relevant for building concrete, functional business intelligence solutions.

INRIA a CCSD electronic archive server
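The abstract does not detail how cardinalities are derived from the distribution statistics. As a minimal sketch, assume the statistics take the form of an equi-depth histogram (similar in spirit to the `histogram_bounds` column of PostgreSQL's `pg_stats` view) and that values are spread uniformly inside each bucket. The cardinality of a fuzzy property such as "price is low" can then be approximated by its sigma-count, the sum of membership degrees over all rows. The names `Trapezoid` and `estimate_fuzzy_cardinality` are hypothetical and are not part of FuzViz.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Trapezoid:
    """Hypothetical trapezoidal membership function: support [a, d], core [b, c]."""
    a: float
    b: float
    c: float
    d: float

    def __call__(self, x: float) -> float:
        if x < self.a or x > self.d:
            return 0.0                                 # outside the support
        if self.b <= x <= self.c:
            return 1.0                                 # inside the core
        if x < self.b:
            return (x - self.a) / (self.b - self.a)    # rising edge
        return (self.d - x) / (self.d - self.c)        # falling edge


def estimate_fuzzy_cardinality(
    bounds: Sequence[float],
    total_rows: int,
    mu: Callable[[float], float],
    steps: int = 100,
) -> float:
    """Estimate the sigma-count of a fuzzy property from equi-depth histogram bounds.

    Assumes each of the len(bounds) - 1 buckets holds the same number of rows
    and that values are uniformly distributed inside a bucket.
    """
    n_buckets = len(bounds) - 1
    rows_per_bucket = total_rows / n_buckets
    estimate = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        width = (hi - lo) / steps
        # Midpoint rule: average membership degree over the bucket's value range.
        avg = sum(mu(lo + (i + 0.5) * width) for i in range(steps)) / steps
        estimate += rows_per_bucket * avg
    return estimate


low_price = Trapezoid(0, 0, 10, 20)
print(estimate_fuzzy_cardinality([0, 10, 20, 30, 40], 1000, low_price))
```

With bounds `[0, 10, 20, 30, 40]` over 1,000 rows, the first bucket's 250 rows are fully "low" and the second bucket's 250 rows are on average half "low", so the estimate is about 375 without touching the data itself.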

Massive Data Exploration Using Cardinality Estimates (Exploration de données massives à l'aide d'estimations de cardinalités)

Authors: Lesot Marie-Jeanne, Nerzic Pierre, Pivert Olivier, Smits Grégory
Publication venue: HAL CCSD
Publication date: 20/10/2022

Audience: National. This paper describes FuzViz, a tool for interactively exploring massive relational data stored in a database management system (DBMS). FuzViz relies on a method that automatically builds linguistic summaries, which provide concise and intelligible insights into the data content, and it offers an interactive view of these summaries that is recomputed dynamically on user demand. To ensure fluid exploration, FuzViz uses a highly efficient method for estimating the cardinalities of the summary properties: the estimates are derived from the data-distribution statistics maintained by the DBMS and consolidated by a sampling-based approach. The workflow also includes a mechanism for inferring a fuzzy vocabulary from these statistics.

INRIA a CCSD electronic archive server
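The abstract does not specify how the fuzzy vocabulary is inferred or how sampling consolidates the estimates. A minimal sketch under two stated assumptions: the vocabulary is a strong fuzzy partition {low, medium, high} anchored on an attribute's quartiles, and consolidation means checking a statistics-based estimate against a sigma-count extrapolated from a uniform random sample. All names (`trapezoid`, `infer_vocabulary`, `sampled_sigma_count`) are hypothetical. In a DBMS the quartiles would come from catalog statistics and the sample from a clause such as SQL `TABLESAMPLE`; here both are computed from an in-memory list for illustration.

```python
import random
import statistics
from typing import Callable, Dict, Sequence

Membership = Callable[[float], float]


def trapezoid(a: float, b: float, c: float, d: float) -> Membership:
    """Trapezoidal membership function: support [a, d], core [b, c]."""
    def mu(x: float) -> float:
        if x < a or x > d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)   # rising edge
        return (d - x) / (d - c)       # falling edge
    return mu


def infer_vocabulary(values: Sequence[float]) -> Dict[str, Membership]:
    """Hypothetical inference of a {low, medium, high} partition from quartiles.

    "inclusive" quantiles keep q1, median and q3 inside [min, max].
    """
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lo, hi = min(values), max(values)
    return {
        "low": trapezoid(lo, lo, q1, med),
        "medium": trapezoid(q1, med, med, q3),
        "high": trapezoid(med, q3, hi, hi),
    }


def sampled_sigma_count(
    values: Sequence[float], mu: Membership, sample_size: int, seed: int = 0
) -> float:
    """Extrapolate the sum of mu(v) over all values from a uniform random sample."""
    rng = random.Random(seed)
    sample = rng.sample(list(values), min(sample_size, len(values)))
    return len(values) * sum(mu(v) for v in sample) / len(sample)


prices = [float(v) for v in range(100)]
vocab = infer_vocabulary(prices)
print(sampled_sigma_count(prices, vocab["low"], sample_size=20))
```

Because the partition is strong, the three membership degrees sum to 1 at every point of the domain, so each row is fully distributed across the vocabulary; a large gap between the sampled sigma-count and the histogram-based one would signal that the catalog statistics are stale.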