12 research outputs found

    Big Data Visualization Tools

    Data visualization is the presentation of data in a pictorial or graphical format, and a data visualization tool is the software that generates this presentation. Data visualization provides users with intuitive means to interactively explore and analyze data, enabling them to effectively identify interesting patterns, infer correlations and causalities, and support sense-making activities. Comment: This article appears in Encyclopedia of Big Data Technologies, Springer, 201
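    As a minimal illustration of the kind of pictorial presentation such a tool generates, the sketch below renders the same synthetic values under two visual encodings; the data and chart choices are illustrative only, not tied to any specific tool surveyed in the article.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
values = rng.normal(loc=50, scale=15, size=1_000)   # some raw records

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(values, bins=30)              # distribution at a glance
ax1.set_title("Histogram")
ax2.boxplot(values, vert=False)        # same data, different visual encoding
ax2.set_title("Box plot")
fig.tight_layout()
plt.show()
```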

    XSnippets: exploring semi-structured data via snippets

    Users are usually not familiar with the content and structure of a data source when they begin to explore it. To improve exploration usability, they therefore need some primary hints about the data source. These hints should represent the overall picture of the data source and include the trending issues that can be extracted from the query log. In this paper, we propose a two-phase interactive exploratory search framework for clueless users that exploits snippets for conducting search on XML data. In the first phase, we present primary snippets, generated from the keywords of the query log, to start the exploration. To retrieve the primary snippets, we develop an A* search-based technique on the keyword space of the query log. To improve performance, we store the primary snippet computations in an index data structure so they can be reused in later steps. In the second phase, we exploit the co-occurring content of the snippets to generate more specific snippets through user interaction. To expedite performance, we design two pruning techniques, called inter-snippet and intra-snippet pruning, to stop unnecessary computations. Finally, we discuss a termination condition that checks the cardinality of the snippets to stop the interactive phase, and we present the final top-l snippets to the user. Our experiments on real datasets verify the effectiveness and efficiency of the proposed framework. © 2019 Elsevier B.V.
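    The abstract does not give the concrete scoring or expansion rules, so the sketch below is only a hypothetical A*-style best-first search over keyword combinations drawn from a query log: an assumed co-occurrence score and an optimistic heuristic stand in for the paper's actual cost model, and all names are illustrative.

```python
import heapq
from itertools import count

def astar_snippets(seed_keywords, query_log, top_k=5, max_size=3):
    # States are frozensets of keywords; the cost rewards combinations that
    # co-occur in many logged queries, and the heuristic optimistically
    # assumes the best single-keyword frequency for every slot left to fill.
    # (Both are assumptions for illustration, not the paper's formulation.)
    def cooccurrence(keywords):
        # number of logged queries containing every keyword of the state
        return sum(1 for q in query_log if keywords <= q)

    best_single = max(cooccurrence(frozenset([k])) for k in seed_keywords)

    def heuristic(state):
        # optimistic estimate of the gain from keywords not yet added
        return -best_single * (max_size - len(state))

    tie = count()                        # tie-breaker so heapq never compares sets
    start = frozenset()
    frontier = [(heuristic(start), next(tie), start)]
    seen, results = {start}, []

    while frontier and len(results) < top_k:
        _, _, state = heapq.heappop(frontier)
        if len(state) == max_size:       # goal: a full-size keyword combination
            results.append((set(state), cooccurrence(state)))
            continue
        for kw in seed_keywords:
            child = state | {kw}
            if len(child) > max_size or child in seen:
                continue
            seen.add(child)
            g = -cooccurrence(child)     # prefer frequently co-occurring keywords
            heapq.heappush(frontier, (g + heuristic(child), next(tie), child))
    return results

# Toy query log: each logged query is a set of keywords.
log = [frozenset(q.split()) for q in
       ["big data tools", "data visualization tools", "exploring data sources"]]
print(astar_snippets({"data", "tools", "visualization", "sources"}, log))
```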

    DBEst: revisiting approximate query processing engines with machine learning models

    In the era of big data, computing exact answers to analytical queries becomes prohibitively expensive. This greatly increases the value of approaches that can efficiently compute approximate, but highly accurate, answers to analytical queries. Alas, the state of the art still suffers from many shortcomings: errors remain high unless large memory investments are made; many important analytics tasks are not supported; and query response times are too long, so approaches rely on parallel execution of queries atop large big data analytics clusters, in situ or in the cloud, whose acquisition and use are costly. Hence, the following question is crucial: can we develop AQP engines that reduce response times by orders of magnitude, ensure high accuracy, and support most aggregate functions, while having smaller memory footprints and small overheads to build the state upon which they are based? With this paper, we show that the answer can be positive. The paper presents DBEst, a system based on machine learning models (regression models and probability density estimators). It discusses its limitations and promises and how it can complement existing systems, and it substantiates its advantages using queries and data from the TPC-DS benchmark and real-life datasets, compared against state-of-the-art AQP engines.
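    As a rough illustration of the regression-plus-density-estimator idea, the sketch below answers a range-predicate COUNT/SUM/AVG query from two small models fitted on a sample instead of scanning the table. The specific model choices (Gaussian KDE, polynomial regression), the synthetic data, and the query shape are assumptions for illustration, not DBEst's actual implementation.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
x = rng.uniform(0, 100, 100_000)            # predicate column
y = 3.0 * x + rng.normal(0, 10, x.size)     # aggregated column

# "Training": a density estimator over x and a regression model y ~ r(x),
# both fitted on a small sample rather than on the full table.
idx = rng.choice(x.size, 2_000, replace=False)
kde = gaussian_kde(x[idx])
reg = np.polynomial.Polynomial.fit(x[idx], y[idx], deg=2)

def approx_query(lo, hi, n_rows=x.size, grid=512):
    """Approximate COUNT, SUM and AVG of y over rows with lo <= x <= hi."""
    sel = kde.integrate_box_1d(lo, hi)        # estimated selectivity P(lo <= x <= hi)
    xs = np.linspace(lo, hi, grid)
    avg = np.average(reg(xs), weights=kde(xs))  # density-weighted mean of r(x)
    cnt = sel * n_rows
    return {"count": round(cnt), "sum": cnt * avg, "avg": avg}

exact = y[(x >= 20) & (x <= 40)]
print("approx:", approx_query(20, 40))
print("exact :", {"count": exact.size, "sum": exact.sum(), "avg": exact.mean()})
```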

    View recommendation for visual data exploration


    Scalable diversification for data exploration platforms


    Similarity-aware query refinement for data exploration
