Search CORE

6 research outputs found

Evaluating Visual Data Analysis Systems: A Discussion Report

Author: Angelini Marco
Battle Leilani
Binnig Carsten
Catarci Tiziana
Eichmann Philipp
Fekete Jean-Daniel
Santucci Giuseppe
Sedlmair Michael
Willett Wesley
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018
Field of study

International audienceVisual data analysis is a key tool for helping people to make sense of and interact with massive data sets. However, existing evaluation methods (e.g., database benchmarks, individual user studies) fail to capture the key points that make systems for visual data analysis (or visual data systems) challenging to design. In November 2017, members of both the Database and Visualization communities came together in a Dagstuhl seminar to discuss the grand challenges in the intersection of data analysis and interactive visualization. In this paper, we report on the discussions of the working group on the evaluation of visual data systems, which addressed questions centered around developing better evaluation methods, such as " How do the different communities evaluate visual data systems? " and " What we could learn from each other to develop evaluation techniques that cut across areas? ". In their discussions, the group brainstormed initial steps towards new joint evaluation methods and developed a first concrete initiative — a trace repository of various real-world workloads and visual data systems — that enables researchers to derive evaluation setups (e.g., performance benchmarks, user studies) under more realistic assumptions, and enables new evaluation perspectives (e.g., broader meta analysis across analysis contexts, reproducibility and comparability across systems)

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

Archivio della ricerca- Università di Roma La Sapienza

HAL-Rennes 1

Navigating Diverse Datasets in the Face of Uncertainty

Author: Alvarez-Ayllon Alejandro
Publication venue
Publication date: 11/09/2023
Field of study

When exploring big volumes of data, one of the challenging aspects is their diversity of origin. Multiple files that have not yet been ingested into a database system may contain information of interest to a researcher, who must curate, understand and sieve their content before being able to extract knowledge. Performance is one of the greatest difficulties in exploring these datasets. On the one hand, examining non-indexed, unprocessed files can be inefficient. On the other hand, any processing before its understanding introduces latency and potentially un- necessary work if the chosen schema matches poorly the data. We have surveyed the state-of-the-art and, fortunately, there exist multiple proposal of solutions to handle data in-situ performantly. Another major difficulty is matching files from multiple origins since their schema and layout may not be compatible or properly documented. Most surveyed solutions overlook this problem, especially for numeric, uncertain data, as is typical in fields like astronomy. The main objective of our research is to assist data scientists during the exploration of unprocessed, numerical, raw data distributed across multiple files based solely on its intrinsic distribution. In this thesis, we first introduce the concept of Equally-Distributed Dependencies, which provides the foundations to match this kind of dataset. We propose PresQ, a novel algorithm that finds quasi-cliques on hypergraphs based on their expected statistical properties. The probabilistic approach of PresQ can be successfully exploited to mine EDD between diverse datasets when the underlying populations can be assumed to be the same. Finally, we propose a two-sample statistical test based on Self-Organizing Maps (SOM). This method can outperform, in terms of power, other classifier-based two- sample tests, being in some cases comparable to kernel-based methods, with the advantage of being interpretable. Both PresQ and the SOM-based statistical test can provide insights that drive serendipitous discoveries

Repositorio de Objetos de Docencia e Investigación de la Universidad de Cádiz

BIG DATA AND ANALYTICS AS A NEW FRONTIER OF ENTERPRISE DATA MANAGEMENT

Author: FADLER Martin
Publication venue: Université de Lausanne, Faculté des hautes études commerciales
Publication date: 01/01/2021
Field of study

Big Data and Analytics (BDA) promises significant value generation opportunities across industries. Even though companies increase their investments, their BDA initiatives fall short of expectations and they struggle to guarantee a return on investments. In order to create business value from BDA, companies must build and extend their data-related capabilities. While BDA literature has emphasized the capabilities needed to analyze the increasing volumes of data from heterogeneous sources, EDM researchers have suggested organizational capabilities to improve data quality. However, to date, little is known how companies actually orchestrate the allocated resources, especially regarding the quality and use of data to create value from BDA. Considering these gaps, this thesis – through five interrelated essays – investigates how companies adapt their EDM capabilities to create additional business value from BDA. The first essay lays the foundation of the thesis by investigating how companies extend their Business Intelligence and Analytics (BI&A) capabilities to build more comprehensive enterprise analytics platforms. The second and third essays contribute to fundamental reflections on how organizations are changing and designing data governance in the context of BDA. The fourth and fifth essays look at how companies provide high quality data to an increasing number of users with innovative EDM tools, that are, machine learning (ML) and enterprise data catalogs (EDC). The thesis outcomes show that BDA has profound implications on EDM practices. In the past, operational data processing and analytical data processing were two “worlds” that were managed separately from each other. With BDA, these "worlds" are becoming increasingly interdependent and organizations must manage the lifecycles of data and analytics products in close coordination. Also, with BDA, data have become the long-expected, strategically relevant resource. As such data must now be viewed as a distinct value driver separate from IT as it requires specific mechanisms to foster value creation from BDA. BDA thus extends data governance goals: in addition to data quality and regulatory compliance, governance should facilitate data use by broadening data availability and enabling data monetization. Accordingly, companies establish comprehensive data governance designs including structural, procedural, and relational mechanisms to enable a broad network of employees to work with data. Existing EDM practices therefore need to be rethought to meet the emerging BDA requirements. While ML is a promising solution to improve data quality in a scalable and adaptable way, EDCs help companies democratize data to a broader range of employees

Serveur académique lausannois

Revisiting the essence and purpose of regional planning:A historical analysis of regional questions - going forward?

Author: Galland Daniel
Harrison John
Tewdwr-Jones Mark
Publication venue
Publication date: 01/01/2021
Field of study

VBN

Periodising planning history in metropolitan regions:An historical-discursive institutionalist approach

Author: Galland Daniel
Stead Dominic
Publication venue
Publication date: 01/01/2022
Field of study

VBN