Statistical structures for internet-scale data management
Efficient query processing in traditional database management systems relies on statistics on base data. For centralized systems, there is a rich body of research results on such statistics, from simple aggregates to more elaborate synopses such as sketches and histograms. For Internet-scale distributed systems, on the other hand, statistics management still poses major challenges. With the work in this paper we aim to endow peer-to-peer data management over structured overlays with the power associated with such statistical information, with emphasis on meeting the scalability challenge. To this end, we first contribute efficient, accurate, and decentralized algorithms that can compute key aggregates such as Count, CountDistinct, Sum, and Average. We then show how to construct several types of histograms, such as simple Equi-Width, Average-Shifted Equi-Width, and Equi-Depth histograms. We present a full-fledged open-source implementation of these tools for distributed statistical synopses and report on a comprehensive experimental evaluation of their efficiency, accuracy, and scalability.
Big data analytics for time critical maritime and aerial mobility forecasting
The correlated exploitation of heterogeneous data sources offering very large archival and streaming data is important for increasing the accuracy of computations when analysing and predicting future states of moving entities. Aiming to significantly advance the capacity of systems to improve the safety and effectiveness of critical operations involving a large number of moving entities in large geographical areas, this paper describes progress towards time-critical big data analytics solutions to user-defined challenges in the air-traffic management and maritime domains. The paper also presents further research challenges concerning data integration and management, predictive analytics for trajectory and event forecasting, and visual analytics.