
    Explora: interactive querying of multidimensional data in the context of smart cities

    Citizen engagement is one of the key factors for smart city initiatives to remain sustainable over time. This in turn entails providing citizens and other relevant stakeholders with up-to-date data and tools that enable them to derive insights that add value to their day-to-day lives. The massive volume of data constantly produced in these smart city environments makes satisfying this requirement particularly challenging. This paper introduces Explora, a generic framework for serving the interactive, low-latency requests typical of visual exploratory applications on spatiotemporal data. Explora leverages stream processing to derive, at ingestion time, synopsis data structures that concisely capture the spatial and temporal trends and dynamics of the sensed variables and that serve as compact data sets for providing fast (approximate) answers to visual queries on smart city data. An experimental evaluation, conducted on proof-of-concept implementations of Explora based on traditional database and distributed data processing setups, shows a decrease of up to two orders of magnitude in query latency compared to queries running on the raw data, at the expense of less than 10% query accuracy and a 30% increase in data footprint. The implementation of the framework on real smart city data, along with the experimental results, demonstrates the feasibility of the proposed approach.
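
    To make the synopsis idea concrete, here is a minimal Python sketch of an ingestion-time spatiotemporal synopsis: readings are pre-aggregated per grid cell and time bucket, so a visual query can be answered approximately from the compact aggregates instead of the raw stream. The class, method names, and resolutions are illustrative assumptions, not the paper's actual API.

```python
from collections import defaultdict

class GridSynopsis:
    """Pre-aggregates streamed readings per (grid cell, time bucket)."""

    def __init__(self, cell_deg=0.01, bucket_s=3600):
        self.cell_deg = cell_deg                  # spatial resolution (degrees)
        self.bucket_s = bucket_s                  # temporal resolution (seconds)
        self.agg = defaultdict(lambda: [0, 0.0])  # (ci, cj, ck) -> [count, sum]

    def _key(self, lat, lon, ts):
        return (int(lat // self.cell_deg), int(lon // self.cell_deg),
                int(ts // self.bucket_s))

    def ingest(self, lat, lon, ts, value):
        """Update the synopsis as each reading arrives on the stream."""
        entry = self.agg[self._key(lat, lon, ts)]
        entry[0] += 1
        entry[1] += value

    def avg(self, lat0, lat1, lon0, lon1, t0, t1):
        """Approximate average over a spatiotemporal window, scanning only
        the compact synopsis. Cells are included or excluded whole, which
        is where the approximation error comes from."""
        k0, k1 = self._key(lat0, lon0, t0), self._key(lat1, lon1, t1)
        count, total = 0, 0.0
        for key, (n, s) in self.agg.items():
            if all(k0[i] <= key[i] <= k1[i] for i in range(3)):
                count += n
                total += s
        return total / count if count else None

syn = GridSynopsis()
syn.ingest(51.05, 3.72, 1_600_000_000, 21.5)  # one temperature reading
print(syn.avg(51.0, 51.1, 3.7, 3.8, 1_600_000_000, 1_600_003_600))  # -> 21.5
```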

    Qd-tree: Learning Data Layouts for Big Data Analytics

    Corporations today collect data at an unprecedented and accelerating scale, making the need to run queries on large datasets increasingly important. Technologies such as columnar, block-based data organization and compression have become standard practice in most commercial database systems. However, the problem of how best to assign records to data blocks on storage remains open. For example, today's systems usually partition data by arrival time into row groups, or range/hash partition the data on selected fields. For a given workload, however, such techniques are unable to optimize for the important metric of the number of blocks accessed by a query. This metric directly relates to the I/O cost, and therefore the performance, of most analytical queries. Further, they are unable to exploit additional available storage to drive this metric down further. In this paper, we propose a new framework called the query-data routing tree, or qd-tree, to address this problem, and propose two algorithms for its construction based on greedy and deep reinforcement learning techniques. Experiments over benchmark and real workloads show that a qd-tree can provide physical speedups of more than an order of magnitude compared to current blocking schemes, and can reach within 2x of the lower bound for data skipping based on selectivity, while providing complete semantic descriptions of the created blocks. Comment: ACM SIGMOD 2020
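
    As a rough illustration of the routing idea (not the paper's greedy or reinforcement-learning construction algorithms), the sketch below shows how a qd-tree's internal nodes can hold cut predicates that route each record to the data block of the leaf it falls into. The columns and cutoffs are hand-picked assumptions for the example.

```python
class QdNode:
    """A qd-tree node: internal nodes cut on a column, leaves name a block."""

    def __init__(self, column=None, cutoff=None, left=None, right=None,
                 block_id=None):
        self.column, self.cutoff = column, cutoff  # cut predicate (internal)
        self.left, self.right = left, right        # children (internal)
        self.block_id = block_id                   # target block (leaf)

    def route(self, record):
        """Send a record to the block of the leaf it falls into."""
        if self.block_id is not None:
            return self.block_id
        child = self.left if record[self.column] < self.cutoff else self.right
        return child.route(record)

# Example layout: cut first on 'date', then on 'price' for recent records.
tree = QdNode("date", 20200101,
              left=QdNode(block_id=0),
              right=QdNode("price", 100.0,
                           left=QdNode(block_id=1),
                           right=QdNode(block_id=2)))

print(tree.route({"date": 20200315, "price": 42.0}))  # -> 1
```

    A query whose predicates contradict a node's cut never needs to read the blocks under that subtree, which is how the layout drives down the number of blocks accessed.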

    Time Series Management Systems: A 2022 Survey


    A Quality Model for Actionable Analytics in Rapid Software Development

    Background: Accessing relevant data on the product, process, and usage perspectives of software, as well as integrating and analyzing such data, is crucial for obtaining reliable and timely actionable insights aimed at continuously managing software quality in Rapid Software Development (RSD). In this context, several software analytics tools have been developed in recent years. However, there is a lack of explainable software analytics that software practitioners trust. Aims: We aimed at creating a quality model (called the Q-Rapids quality model) for actionable analytics in RSD, implementing it, and evaluating its understandability and relevance. Method: We performed workshops at four companies to determine relevant metrics as well as product and process factors. We also elicited how these metrics and factors are used and interpreted by practitioners when making decisions in RSD. We specified the Q-Rapids quality model by comparing and integrating the results of the four workshops. We then implemented the Q-Rapids tool to support the usage of the Q-Rapids quality model as well as the gathering, integration, and analysis of the required data. Afterwards, we installed the Q-Rapids tool in the four companies and performed semi-structured interviews with eight product owners to evaluate the understandability and relevance of the Q-Rapids quality model. Results: The participants of the evaluation perceived the metrics as well as the product and process factors of the Q-Rapids quality model as understandable. They also considered the Q-Rapids quality model relevant for identifying product and process deficiencies (e.g., blocking code situations). Conclusions: By means of heterogeneous data sources, the Q-Rapids quality model enables detecting problems that would take longer to find manually and adds transparency across the perspectives of system, process, and usage. Comment: This is an Author's Accepted Manuscript of a paper to be published by IEEE in the 44th Euromicro Conference on Software Engineering and Advanced Applications (SEAA) 2018. The final authenticated version will be available online.
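
    As a loose illustration of the general shape such a metrics-to-factors quality model can take, the sketch below aggregates normalized metrics into a factor score with a weighted sum. The metric names, factor, and weights are invented for illustration and are not the actual Q-Rapids model.

```python
def factor_score(metrics, weights):
    """Aggregate normalized metrics (0..1, higher is better) into one factor."""
    return sum(metrics[name] * w for name, w in weights.items())

# Hypothetical metrics gathered from heterogeneous data sources.
metrics = {
    "test_coverage": 0.72,      # e.g. normalized from the CI pipeline
    "fixed_issue_ratio": 0.85,  # issues fixed / issues reported
    "build_stability": 0.90,    # successful builds / total builds
}

# Hypothetical weights a company's workshop might assign to a factor.
code_quality = factor_score(metrics, {
    "test_coverage": 0.5,
    "fixed_issue_ratio": 0.3,
    "build_stability": 0.2,
})
print(f"code quality factor: {code_quality:.2f}")  # weighted aggregate
```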