7 research outputs found
Declarative Data Analytics: a Survey
The area of declarative data analytics explores the application of the
declarative paradigm to data science and machine learning. It proposes
declarative languages for expressing data analysis tasks and develops systems
which optimize programs written in those languages. The execution engine can be
either centralized or distributed, as the declarative paradigm advocates
independence from particular physical implementations. The survey explores a
wide range of declarative data analysis frameworks by examining both the
programming model and the optimization techniques used, in order to provide
conclusions on the current state of the art in the area and identify open
challenges.
Comment: 36 pages, 2 figures
Progressive join algorithms considering user preference
Progressive query processing is a new attractive paradigm for exploratory data analysis. This paper considers the case where users want to receive results ordered according to their preference, and specifically focuses on the design of join algorithms. We investigate the use of contour lines in progressive algorithms with user preferences, and propose ContourJoin to reduce sorting overhead of progressive preference-aware joins. Experimental results show that compared with the naïve blocking algorithm and the top-k RankJoin algorithm, ContourJoin has superior performance in both early result generation and total result computation.
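To make the contrast between blocking and progressive evaluation concrete, here is a minimal sketch. It is neither ContourJoin nor RankJoin, whose details the abstract does not give. It assumes a hypothetical schema of (join key, score) rows and a preference equal to the sum of the two scores. The blocking baseline computes every match and sorts at the end; the progressive variate walks a heap-based frontier over the two score-sorted inputs and yields matches in descending preference as soon as they are known.

```python
import heapq
from typing import Iterator, List, Tuple

Row = Tuple[str, float]  # (join key, preference score); hypothetical schema


def blocking_join(left: List[Row], right: List[Row]) -> List[Row]:
    """Naive blocking baseline: materialize all matches, then sort once."""
    out = [(lk, ls + rs) for lk, ls in left for rk, rs in right if lk == rk]
    return sorted(out, key=lambda r: -r[1])


def progressive_join(left: List[Row], right: List[Row]) -> Iterator[Row]:
    """Yield matches in descending combined score, one at a time.

    Assumption: the preference is the sum of the two scores, so it is
    monotone in each input and a frontier over the sorted inputs suffices.
    """
    left = sorted(left, key=lambda r: -r[1])
    right = sorted(right, key=lambda r: -r[1])
    if not left or not right:
        return
    # Max-heap (via negated scores) of candidate index pairs (i, j).
    heap = [(-(left[0][1] + right[0][1]), 0, 0)]
    seen = {(0, 0)}
    while heap:
        neg, i, j = heapq.heappop(heap)
        if left[i][0] == right[j][0]:
            yield (left[i][0], -neg)
        # Expand the frontier to the two neighbours of (i, j).
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-(left[ni][1] + right[nj][1]), ni, nj))


left_rows = [("a", 5.0), ("b", 3.0), ("a", 1.0)]
right_rows = [("a", 4.0), ("b", 2.5), ("c", 10.0)]
first_result = next(progressive_join(left_rows, right_rows))
```

The sketch still inspects non-matching pairs, so it illustrates only the ordering guarantee and the early delivery of the first result, not the sorting-overhead reduction the paper reports.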
Evaluation of query modifications in progressive data processing
In recent years we have seen an increasing interest in interactive data exploration and visualization applications due to the large volume and heterogeneity of data. Users typically interact with these applications through interface components following a trial-and-error approach: they submit different queries and refine them further based on the results. As a consequence, such applications call for a progressive processing paradigm, where periodic feedback is returned to the user, instead of the traditional execution model, where a complete answer needs to be computed.
Data from a recent user study [1] on interactive visualization applications show that although real-time interactions pose performance challenges to underlying database systems, the queries generated by many of these interactions present significant similarities. The reason is that these queries are usually the result of a steering action on a running query through mechanisms like sliders, panning, adding attributes, etc., which creates a modified version of the running query. Typical RDBMSes treat queries independently. Recycling of intermediate results, as proposed in previous work [2], mainly targets traditional query processing and assumes reuse of well-defined state or complete results. We examine the sharing of incomplete state/results between relevant queries in the context of the progressive processing paradigm, in order to evaluate such queries more efficiently by avoiding redundant work.
Two questions naturally arise: how to handle incomplete state/results when an interruption happens during query execution, and how to communicate query modifications to the system. For the former we propose evaluation strategies that map each query modification to a reusable state between the initial and the altered version of a query, and change the query plan accordingly to take the reusable state into account. For the latter we elaborate on the idea of the ALTER QUERY command discussed in [3] and design a SQL extension as a means to communicate modifications on specific parts of a query to the RDBMS.
Preliminary experimental results on specific query modifications and evaluation strategies demonstrate considerable performance gains.
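As an illustration of reusing incomplete state after a query modification, the following minimal sketch patches a running aggregate when a slider moves, instead of restarting the query. It is not the authors' evaluation strategy, and it does not model the ALTER QUERY syntax; the class `ProgressiveSum`, its method `alter_threshold`, and the filter `v < threshold` are all hypothetical names chosen for the example.

```python
from typing import List, Sequence


class ProgressiveSum:
    """Progressive SUM(v) over rows with v < threshold, chunk by chunk."""

    def __init__(self, chunks: Sequence[List[float]], threshold: float):
        self.chunks = chunks
        self.threshold = threshold
        self.done = 0        # number of chunks consumed so far
        self.partial = 0.0   # incomplete aggregate state, reusable on change

    def step(self) -> float:
        """Consume one more chunk and return the current partial answer."""
        if self.done < len(self.chunks):
            chunk = self.chunks[self.done]
            self.partial += sum(v for v in chunk if v < self.threshold)
            self.done += 1
        return self.partial

    def alter_threshold(self, new_threshold: float) -> None:
        """Apply a slider move by correcting the state for the changed band.

        Only values between the old and new threshold in already-consumed
        chunks change their membership; everything else is reused as is.
        """
        lo, hi = sorted((self.threshold, new_threshold))
        delta = sum(
            v for c in self.chunks[: self.done] for v in c if lo <= v < hi
        )
        if new_threshold > self.threshold:
            self.partial += delta   # predicate widened: add newly admitted rows
        else:
            self.partial -= delta   # predicate narrowed: drop excluded rows
        self.threshold = new_threshold


chunks = [[1.0, 5.0, 9.0], [2.0, 6.0], [3.0, 7.0]]
query = ProgressiveSum(chunks, threshold=4.0)
after_two_chunks = [query.step(), query.step()]
query.alter_threshold(8.0)          # user drags the slider mid-execution
final_answer = query.step()
```

The sketch rescans the consumed chunks to compute the correction; it is meant only to show that the state before the modification remains usable, not to suggest how a real engine would locate the affected rows.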
References
[1] Database Benchmarking for Supporting Real-Time Interactive Querying of Large Multi-Dimensional Data. L. Battle, P. Eichmann, M. Angel