Explora : interactive querying of multidimensional data in the context of smart cities
Citizen engagement is one of the key factors for smart city initiatives to remain sustainable over time. This in turn entails providing citizens and other relevant stakeholders with the latest data and tools that enable them to derive insights that add value to their day-to-day lives. The massive volume of data constantly produced in these smart city environments makes satisfying this requirement particularly challenging. This paper introduces Explora, a generic framework for serving the interactive low-latency requests typical of visual exploratory applications on spatiotemporal data. Explora leverages stream processing to derive, at ingestion time, synopsis data structures that concisely capture the spatial and temporal trends and dynamics of the sensed variables and serve as compacted data sets that provide fast (approximate) answers to visual queries on smart city data. The experimental evaluation, conducted on proof-of-concept implementations of Explora based on traditional database and distributed data processing setups, shows a decrease of up to 2 orders of magnitude in query latency compared to queries running on the base raw data, at the expense of less than 10% query accuracy and 30% data footprint. The implementation of the framework on real smart city data, along with the obtained experimental results, proves the feasibility of the proposed approach.
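To illustrate what a synopsis built at ingestion time might look like, the following is a minimal sketch and not Explora's actual implementation. It assumes a fixed latitude/longitude grid and fixed-width time buckets, and it keeps only a count and a sum per cell; the names `Synopsis`, `tile_deg`, and `bucket_s` are hypothetical and do not come from the paper.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Cell:
    """Running aggregate for one (tile, time bucket) pair."""
    count: int = 0
    total: float = 0.0


class Synopsis:
    """Hypothetical spatiotemporal synopsis: one Cell per grid tile and time bucket."""

    def __init__(self, tile_deg=0.01, bucket_s=3600):
        self.tile_deg = tile_deg    # assumed tile size in degrees
        self.bucket_s = bucket_s    # assumed time bucket width in seconds
        self.cells = defaultdict(Cell)

    def _key(self, lat, lon, ts):
        return (
            math.floor(lat / self.tile_deg),
            math.floor(lon / self.tile_deg),
            int(ts // self.bucket_s),
        )

    def ingest(self, lat, lon, ts, value):
        """Fold one sensor reading into its cell as it arrives."""
        cell = self.cells[self._key(lat, lon, ts)]
        cell.count += 1
        cell.total += value

    def mean(self, lat, lon, t_start, t_end):
        """Approximate mean for the tile containing (lat, lon) over [t_start, t_end].

        The answer is approximate because whole time buckets are included.
        """
        tile_y, tile_x, first = self._key(lat, lon, t_start)
        last = int(t_end // self.bucket_s)
        count, total = 0, 0.0
        for bucket in range(first, last + 1):
            cell = self.cells.get((tile_y, tile_x, bucket))
            if cell is not None:
                count += cell.count
                total += cell.total
        return total / count if count else None
```

A query then touches a handful of cells instead of every raw reading, which is the kind of trade between latency and accuracy the abstract describes.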
Qd-tree: Learning Data Layouts for Big Data Analytics
Corporations today collect data at an unprecedented and accelerating scale,
making the need to run queries on large datasets increasingly important.
Technologies such as columnar block-based data organization and compression
have become standard practice in most commercial database systems. However, the
problem of best assigning records to data blocks on storage is still open. For
example, today's systems usually partition data by arrival time into row
groups, or range/hash partition the data based on selected fields. For a given
workload, however, such techniques are unable to optimize for the important
metric of the number of blocks accessed by a query. This metric directly
relates to the I/O cost, and therefore performance, of most analytical queries.
Further, they are unable to exploit additional available storage to drive this
metric down further.
In this paper, we propose a new framework called a query-data routing tree,
or qd-tree, to address this problem, and propose two algorithms for its
construction based on greedy and deep reinforcement learning techniques.
Experiments over benchmark and real workloads show that a qd-tree can provide
physical speedups of more than an order of magnitude compared to current
blocking schemes, and can reach within 2X of the lower bound for data skipping
based on selectivity, while providing complete semantic descriptions of created
blocks.
Comment: ACM SIGMOD 202
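The idea of routing both records and queries through one tree, and measuring how many blocks a query must access, can be sketched as follows. This is an illustrative toy under stated assumptions, not the paper's construction algorithms: it assumes each inner node holds a single `column < threshold` cut, that queries are half-open ranges per column, and that the tree is given by hand. The names `Cut`, `Leaf`, `route`, and `blocks_for_query` are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """A leaf corresponds to one storage block."""
    block_id: int


@dataclass
class Cut:
    """Inner node: an assumed predicate of the form `column < threshold`."""
    column: str
    threshold: float
    left: "Node"    # records with column < threshold
    right: "Node"   # records with column >= threshold


Node = Union[Leaf, Cut]


def route(node, record):
    """Return the block id a record is assigned to."""
    while isinstance(node, Cut):
        if record[node.column] < node.threshold:
            node = node.left
        else:
            node = node.right
    return node.block_id


def blocks_for_query(node, ranges):
    """Return the set of block ids a range query may need to read.

    `ranges` maps a column to a half-open interval (lo, hi), meaning
    lo <= value < hi. Columns absent from `ranges` are unconstrained.
    """
    if isinstance(node, Leaf):
        return {node.block_id}
    lo, hi = ranges.get(node.column, (float("-inf"), float("inf")))
    hit = set()
    if lo < node.threshold:      # query overlaps values below the cut
        hit |= blocks_for_query(node.left, ranges)
    if hi > node.threshold:      # query overlaps values at or above the cut
        hit |= blocks_for_query(node.right, ranges)
    return hit
```

For a tree such as `Cut("age", 30, Leaf(0), Cut("price", 100.0, Leaf(1), Leaf(2)))`, the size of the set returned by `blocks_for_query` is the blocks-accessed metric the abstract refers to; the path of cuts leading to a leaf also serves as a description of what that block contains.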
A Quality Model for Actionable Analytics in Rapid Software Development
Background: Accessing relevant data on the product, process, and usage
perspectives of software as well as integrating and analyzing such data is
crucial for getting reliable and timely actionable insights aimed at
continuously managing software quality in Rapid Software Development (RSD). In
this context, several software analytics tools have been developed in recent
years. However, there is a lack of explainable software analytics that software
practitioners trust. Aims: We aimed at creating a quality model (called
Q-Rapids quality model) for actionable analytics in RSD, implementing it, and
evaluating its understandability and relevance. Method: We performed workshops
at four companies in order to determine relevant metrics as well as product and
process factors. We also elicited how these metrics and factors are used and
interpreted by practitioners when making decisions in RSD. We specified the
Q-Rapids quality model by comparing and integrating the results of the four
workshops. Then we implemented the Q-Rapids tool to support the usage of the
Q-Rapids quality model as well as the gathering, integration, and analysis of
the required data. Afterwards we installed the Q-Rapids tool in the four
companies and performed semi-structured interviews with eight product owners to
evaluate the understandability and relevance of the Q-Rapids quality model.
Results: The participants of the evaluation perceived the metrics as well as
the product and process factors of the Q-Rapids quality model as
understandable. Also, they considered the Q-Rapids quality model relevant for
identifying product and process deficiencies (e.g., blocking code situations).
Conclusions: By means of heterogeneous data sources, the Q-Rapids quality model
enables detecting problems that take more time to find manually and adds
transparency among the perspectives of system, process, and usage.
Comment: This is an Author's Accepted Manuscript of a paper to be published by
IEEE in the 44th Euromicro Conference on Software Engineering and Advanced
Applications (SEAA) 2018. The final authenticated version will be available
onlin