
    Quality of Service Aware Data Stream Processing for Highly Dynamic and Scalable Applications

    Huge amounts of georeferenced data streams arrive daily at data stream management systems deployed to serve highly scalable and dynamic applications. There are innumerable ways in which these loads can be exploited to gain deep insights in various domains. Decision makers require interactive visualization of such data, in the form of maps and dashboards, for decision making and strategic planning. Data streams normally exhibit fluctuation and oscillation in arrival rates and skewness; these are the two predominant factors that greatly impact the overall quality of service. Data stream management systems therefore need to be attuned to those factors, in addition to the spatial shape of the data, which may exaggerate their negative impact. Current systems do not natively support services with quality guarantees for dynamic scenarios, leaving the handling of those logistics to the user, which is challenging and cumbersome. Three workloads are predominant for any data stream: batch processing, scalable storage, and stream processing. In this thesis, we have designed a quality-of-service-aware system, SpatialDSMS, that comprises several subsystems covering those loads and any mixed load that results from intermixing them. Most importantly, we have natively incorporated quality-of-service optimizations for processing avalanches of geo-referenced data streams in highly dynamic application scenarios. This has been achieved transparently on top of the codebases of emerging de facto standard, best-in-class representatives, thus relieving users in the presentation layer from having to reason about those services. Instead, users express their queries with quality goals, and our system optimizer compiles them down into query plans with embedded quality guarantees, leaving logistics handling to the underlying layers. We have developed standards-compliant prototypes for all the subsystems that constitute SpatialDSMS.
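The abstract describes users attaching quality goals to queries and an optimizer compiling them into plans with embedded guarantees, but gives no interface details. As an illustration only, here is a toy sketch of that idea; every name (`QualityGoal`, `QueryPlan`, `compile_with_goal`) and the sampling-rate heuristic are hypothetical, not SpatialDSMS's actual API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class QualityGoal:
    """Hypothetical user-facing quality goals attached to a streaming query."""
    max_latency_ms: Optional[int] = None    # latency target
    min_accuracy: Optional[float] = None    # accuracy target in [0, 1]

@dataclass
class QueryPlan:
    operator: str
    sampling_rate: float  # QoS knob embedded by the optimizer

def compile_with_goal(operator: str, goal: QualityGoal,
                      est_full_latency_ms: float) -> QueryPlan:
    """Toy optimizer: pick a sampling rate that scales the estimated
    full-data latency down to the latency goal, while not dropping
    below a crude accuracy floor."""
    rate = 1.0
    if goal.max_latency_ms is not None:
        rate = min(rate, goal.max_latency_ms / est_full_latency_ms)
    if goal.min_accuracy is not None:
        rate = max(rate, goal.min_accuracy)  # more data as a proxy for accuracy
    return QueryPlan(operator, max(0.01, min(1.0, rate)))
```

For example, a query with a 500 ms latency goal against an operator estimated at 2000 ms on the full stream would be compiled to a plan with a 0.25 sampling rate.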

    SACHER Project: A Cloud Platform and Integrated Services for Cultural Heritage and for Restoration

    The SACHER project provides a distributed, open source and federated cloud platform able to support the life-cycle management of various kinds of data concerning tangible Cultural Heritage. The paper describes the SACHER platform and, in particular, among the various integrated service prototypes, the most important ones to support restoration processes and cultural asset management: (i) 3D Life Cycle Management for Cultural Heritage (SACHER 3D CH), based on 3D digital models of architecture and dedicated to the management of Cultural Heritage and to the storage of the numerous data generated by the team of professionals involved in the restoration process; (ii) Multidimensional Search Engine for Cultural Heritage (SACHER MuSE CH), an advanced multi-level search system designed to manage Heritage data from heterogeneous sources.

    Spatial-aware approximate big data stream processing

    The widespread adoption of ubiquitous IoT edge devices and modern telemetry is spewing out unprecedented avalanches of spatially-tagged datasets which, if they could be explored interactively, would offer deep insights into interesting natural phenomena that might otherwise remain elusive. Online application of spatial queries is expensive, a problem further inflated by the fact that, more often than not, we do not have access to the full dataset population in non-stationary settings. As a way of coping, sampling stands out as a natural solution for approximating estimators such as averages and totals of some interesting correlated parameters. In any sampling design, representativeness remains the main criterion by which a method is judged good or bad. Loosely speaking, in a spatial context, this means fairly sampling quantities in a way that preserves spatial characteristics, so as to provide more accurate approximations for spatial query responses. Current big data management systems either do not offer over-the-counter spatial-aware online sampling solutions or, at best, rely on randomness, which introduces too many imponderables for the overall estimation. We herein have designed a QoS- and spatial-aware online sampling method that outperforms vanilla baselines by statistically significant margins. Our method sits atop Apache Spark Structured Streaming's codebase and has been tested against a benchmark consisting of a spatially-augmented dataset of millions of records.
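The abstract does not disclose the sampling design itself. As a rough sketch of the general idea of spatially-preserving sampling, the following stratifies points by a coarse grid cell and samples each cell proportionally; the grid stratification and proportional allocation are assumptions for illustration, not the paper's method:

```python
import random
from collections import defaultdict

def grid_cell(lon, lat, cell_deg=0.5):
    """Map a point to a coarse grid cell (hypothetical stratification key)."""
    return (int(lon // cell_deg), int(lat // cell_deg))

def spatial_stratified_sample(points, rate, cell_deg=0.5, seed=42):
    """Draw roughly `rate` of the points from EVERY grid cell, so the sample
    preserves the spatial distribution better than uniform random sampling."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for p in points:
        strata[grid_cell(p[0], p[1], cell_deg)].append(p)
    sample = []
    for pts in strata.values():
        k = max(1, round(len(pts) * rate))  # at least one point per cell
        sample.extend(rng.sample(pts, min(k, len(pts))))
    return sample
```

Unlike uniform sampling, sparse cells are guaranteed at least one representative, which is what keeps per-region aggregates answerable from the sample.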

    QoS-Aware Approximate Query Processing for Smart Cities Spatial Data Streams

    Large amounts of georeferenced data streams arrive daily to stream processing systems. This is attributable to the overabundance of affordable IoT devices. In addition, interested practitioners desire to exploit Internet of Things (IoT) data streams for strategic decision-making purposes. However, mobility data are highly skewed and their arrival rates fluctuate. This nature poses an extra challenge on data stream processing systems, which are required to achieve pre-specified latency and accuracy goals. In this paper, we propose ApproxSSPS, a system for approximate processing of geo-referenced mobility data, at scale, with quality of service guarantees. We focus on stateful aggregations (e.g., means, counts) and top-N queries. ApproxSSPS features a controller that interactively learns the latency statistics and calculates proper sampling rates to meet latency and/or accuracy targets. An overarching trait of ApproxSSPS is its ability to strike a plausible balance between latency and accuracy targets. We evaluate ApproxSSPS on Apache Spark Structured Streaming with real mobility data. We also compare ApproxSSPS against a state-of-the-art online adaptive processing system. Our extensive experiments show that ApproxSSPS can fulfill latency and accuracy targets under varying parameter configurations and load intensities (i.e., transient peaks in data loads versus slowly arriving streams). Moreover, our results show that ApproxSSPS outperforms the baseline counterpart by significant magnitudes. In short, ApproxSSPS is a novel spatial data stream processing system that can deliver accurate results in a timely manner by dynamically adjusting the limits on data samples.
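The controller is described only at this level of detail. A minimal proportional-feedback sketch of the core loop, shrinking the sampling rate when observed latency overshoots the target and growing it when there is headroom, is shown below; the gain and rate bounds are illustrative assumptions, not ApproxSSPS's actual control law:

```python
def next_sampling_rate(current_rate, observed_latency_ms, target_latency_ms,
                       gain=0.5, min_rate=0.01, max_rate=1.0):
    """Proportional controller for the sampling rate.

    error > 0 means we are under the latency budget (room to sample more);
    error < 0 means we overshot it (must sample less).
    """
    error = (target_latency_ms - observed_latency_ms) / target_latency_ms
    rate = current_rate * (1.0 + gain * error)
    return max(min_rate, min(max_rate, rate))
```

Running this once per micro-batch with the latency measured on the previous batch gives the interactive "learn, then adjust" behavior the abstract describes.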

    Cost-Effective Strategies for Provisioning NoSQL Storage Services in Support for Industry 4.0

    The advancement of networking and sensor-enabled devices has motivated the emergence of unprecedented initiatives, including Industry 4.0 and smart cities. The two are entwined in a way that makes their operation duly interconnected. Industry 4.0 will soon become the biggest consumer of smart city big data. That data is geo-referenced, and its storage and processing need spatial awareness, which is currently absent from the market's biggest big data management systems. We aim to fill this gap by providing spatial-aware big data management strategies in support of Industry 4.0's main principles. Our experimental results show that our strategies outperform the state of the art by orders of magnitude.

    Efficiently Integrating Mobility and Environment Data for Climate Change Analytics

    Recent research focuses on building Cloud-based solutions for big geospatial data analytics. Avalanches of georeferenced mobility data are collected and processed daily. However, mobility data alone are not enough to unleash the opportunities for insightful analytics that may assist in mitigating the adverse effects of climate change. Consider, for example, answering complex queries such as the following: 'what are the Top-3 neighborhoods in Buenos Aires in terms of vehicle mobility where the index of the PM10 pollutant is greater than 40?'. Similar queries are necessary for emergent health-aware smart city policies. For example, they can provide insights to municipality administrators so that they can design future city infrastructure plans that feature citizen health as a priority, or build mobile maps that inform daily dwellers which routes to avoid during specific hours of the day so as not to be exposed to high levels of PM10. However, answering such a query requires joining real-time mobility and environment data. Stock versions of current Cloud-based geospatial management systems do not include intrinsic solutions for such scenarios. In this paper, we report the design and implementation of a novel system, MeteoMobil, for the combined analytics of mobility and environment data. We have implemented our system atop Apache Spark for efficient operation over the Cloud. Our results show that MeteoMobil can be efficiently exploited for advanced climate change analytics.
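The Top-3 query from the abstract reduces to a join followed by a top-N. A plain-Python sketch of that logic follows; the input schemas (per-neighborhood vehicle counts and PM10 readings as dicts) are assumptions for illustration, not MeteoMobil's actual data model:

```python
def top_n_neighborhoods(mobility, pollution, n=3, pm10_threshold=40.0):
    """Join per-neighborhood vehicle counts with PM10 readings and return
    the n most trafficked neighborhoods whose PM10 exceeds the threshold."""
    polluted = {nb for nb, pm10 in pollution.items() if pm10 > pm10_threshold}
    joined = [(nb, count) for nb, count in mobility.items() if nb in polluted]
    return sorted(joined, key=lambda kv: kv[1], reverse=True)[:n]

# Hypothetical sample data for the Buenos Aires query
mobility = {"Palermo": 9000, "Recoleta": 7000, "Belgrano": 6500, "Caballito": 6000}
pollution = {"Palermo": 55.0, "Recoleta": 30.0, "Belgrano": 48.0, "Caballito": 41.0}
result = top_n_neighborhoods(mobility, pollution)
# Recoleta is filtered out (PM10 = 30 ≤ 40), leaving the other three by traffic.
```

The challenge the paper addresses is doing this continuously at scale over streams, where the join itself is the expensive step; the sketch only fixes the semantics.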

    In-memory Spatial-Aware Framework for Processing Proximity-Alike Queries in Big Spatial Data

    The widespread adoption of sensor-enabled and mobile ubiquitous devices has caused an avalanche of big data that is mostly geospatially tagged. Most cloud-based big data processing systems are designed for general-purpose workloads, neglecting spatial characteristics. However, interesting analytics often seek answers to proximity-alike queries. We fill this gap by providing a custom geospatial service layer atop Apache Spark. More specifically, we leverage Spark to design a custom spatial-aware partitioning method that boosts geospatial query performance. Our results show that our patches outperform state-of-the-art implementations by significant margins.
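The partitioning method itself is not specified in the abstract. One common way to make a partitioner spatial-aware is to key records by a space-filling curve so that nearby points land in the same partition; the Z-order (Morton) sketch below is an illustration of that general technique, not necessarily the paper's scheme, and it assumes `num_partitions` is a power of two:

```python
def interleave_bits(x, y, bits=16):
    """Morton (Z-order) code: interleave the bits of grid coordinates x, y."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x bits on even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # y bits on odd positions
    return z

def spatial_partition(lon, lat, num_partitions=64, bits=16):
    """Map a coordinate to a partition id using the top bits of its Z-order
    code, so spatially close records tend to share a partition.
    Assumes num_partitions is a power of two."""
    # normalise lon/lat into a [0, 2^bits) integer grid
    gx = int((lon + 180.0) / 360.0 * ((1 << bits) - 1))
    gy = int((lat + 90.0) / 180.0 * ((1 << bits) - 1))
    shift = 2 * bits - (num_partitions.bit_length() - 1)
    return interleave_bits(gx, gy, bits) >> shift
```

In a Spark setting this function would serve as the key extractor for a custom partitioner; keeping neighbors colocated is what lets proximity-alike queries prune partitions instead of shuffling the whole dataset.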

    Efficient spark-based framework for big geospatial data query processing and analysis

    The exponential amount of geospatial data accumulating at an accelerating pace has inevitably motivated the scientific community to examine novel parallel technologies for tuning the performance of spatial queries. Managing spatial data for optimized query performance is a particularly challenging task, due to the growing complexity of the geometric computations involved in querying spatial data, where traditional systems have failed to scale beneficially. Moreover, the use of large-scale parallel computing infrastructures based on cost-effective commodity clusters and cloud computing environments introduces new management challenges, such as avoiding the bottlenecks caused by overloading scarce computing resources through unbalanced assignment of parallel tasks. In this paper, we aim to fill those gaps by introducing a generic framework for optimizing the performance of big spatial data queries on top of Apache Spark. Our framework also supports advanced management functions, including a unique self-adaptable load-balancing service for self-tuning framework execution. Our experimental evaluation shows that our framework is scalable and efficient for querying massive amounts of real spatial data.
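The load-balancing service is only named in the abstract. One standard heuristic for the unbalanced-tasks problem it describes is longest-processing-time-first greedy assignment, sketched below as an assumption for illustration, not the framework's actual algorithm:

```python
import heapq

def balance_tasks(task_costs, num_workers):
    """Greedy LPT assignment: hand each task, largest estimated cost first,
    to the currently least-loaded worker."""
    heap = [(0.0, w) for w in range(num_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(num_workers)}
    for task, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
        load, w = heapq.heappop(heap)   # least-loaded worker so far
        assignment[w].append(task)
        heapq.heappush(heap, (load + cost, w))
    return assignment
```

Re-running the assignment whenever observed partition costs drift from their estimates gives the "self-adaptable" flavor the abstract claims; the quality of the balance then hinges entirely on how well per-partition costs are estimated.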