
    Auto-scaling techniques for cloud-based Complex Event Processing

    One key topic in cloud computing is elasticity: the ability of the cloud environment to adapt resource assignment to workload demand in a timely manner. Under the cloud on-demand model, the infrastructure should be able to scale up and down in response to unpredictable workloads, in order to achieve both a guaranteed service level and cost efficiency. This work addresses the cloud elasticity problem, with particular reference to Complex Event Processing (CEP) systems. CEP systems are designed to process large volumes of event-driven data streams and to continuously deliver results with low latency, in real time. They must adapt to changing query and event loads; because of their high computational requirements and varying loads, CEP systems are distributed and typically run on cloud infrastructures. In this work we review cloud computing auto-scaling solutions, study their suitability for the CEP model, implement a selection of them in a CEP prototype, and evaluate the experimental results.
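    To make the reactive techniques discussed above concrete, the following is a minimal Python sketch of a threshold-based auto-scaling policy for a CEP operator. The cluster interface (replicas(), scale_to()), the utilisation thresholds, and the cooldown period are illustrative assumptions, not the paper's prototype.

        class ThresholdAutoScaler:
            """Reactive scale-out/scale-in driven by CPU utilisation."""

            def __init__(self, cluster, high=0.8, low=0.3, cooldown_s=60):
                self.cluster = cluster        # assumed API: replicas(), scale_to(n)
                self.high = high              # utilisation above which we scale out
                self.low = low                # utilisation below which we scale in
                self.cooldown_s = cooldown_s  # damping interval to avoid oscillation
                self.last_action = float("-inf")

            def observe(self, now, utilisation):
                """Feed one monitoring sample; may trigger a scaling action."""
                if now - self.last_action < self.cooldown_s:
                    return  # still cooling down from the previous action
                n = self.cluster.replicas()
                if utilisation > self.high:
                    self.cluster.scale_to(n + 1)  # scale out under load
                    self.last_action = now
                elif utilisation < self.low and n > 1:
                    self.cluster.scale_to(n - 1)  # scale in when underutilised
                    self.last_action = now

    The cooldown guard is one common way to keep a reactive policy from oscillating when the load hovers around a threshold.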

    Big Data and Large-scale Data Analytics: Efficiency of Sustainable Scalability and Security of Centralized Clouds and Edge Deployment Architectures

    One of the significant shifts in next-generation computing technologies will certainly be in the development of Big Data (BD) deployment architectures. Apache Hadoop, the BD landmark, has evolved into a widely deployed BD operating system. Its new features include a federation structure and many associated frameworks, which give Hadoop 3.x the maturity to serve different markets. This dissertation addresses two leading issues in exploiting the BD and large-scale data analytics realm on the Hadoop platform: (i) scalability, which directly affects system performance and overall throughput, addressed using portable Docker containers; and (ii) security, which spreads the adoption of data protection practices among practitioners, addressed using access controls. An Enhanced MapReduce Environment (EME), an OPportunistic and Elastic Resource Allocation (OPERA) scheduler, a BD Federation Access Broker (BDFAB), and a Secure Intelligent Transportation System (SITS) with a multi-tier architecture for data streaming to the cloud are the main contributions of this thesis.

    Low-latency, query-driven analytics over voluminous multidimensional, spatiotemporal datasets

    Ubiquitous data collection from sources such as remote sensing equipment, networked observational devices, location-based services, and sales tracking has led to the accumulation of voluminous datasets; IDC projects that by 2020 we will generate 40 zettabytes of data per year, while Gartner and ABI estimate that 20-35 billion new devices will be connected to the Internet in the same time frame. The storage and processing requirements of these datasets far exceed the capabilities of modern computing hardware, which has led to the development of distributed storage frameworks that can scale out by assimilating more computing resources as necessary. While challenging in its own right, storing and managing voluminous datasets is only the precursor to a broader field of study: extracting knowledge, insights, and relationships from the underlying datasets. The basic building block of this knowledge discovery process is the analytic query, encompassing both query instrumentation and evaluation. This dissertation is centered on query-driven exploratory and predictive analytics over voluminous, multidimensional datasets. Both of these types of analysis represent a higher-level abstraction over classical query models; rather than indexing every discrete value for subsequent retrieval, our framework autonomously learns the relationships and interactions between dimensions in the dataset (including time-series and geospatial aspects) and makes the information readily available to users. This functionality includes statistical synopses, correlation analysis, hypothesis testing, probabilistic structures, and predictive models that not only enable the discovery of nuanced relationships between dimensions, but also allow future events and trends to be predicted. This requires specialized data structures and partitioning algorithms, along with adaptive reductions in the search space and management of the inherent trade-off between timeliness and accuracy. The algorithms presented in this dissertation were evaluated empirically on real-world geospatial time-series datasets in a production environment, and are broadly applicable across other storage frameworks.
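    As an illustration of the statistical synopses mentioned above, here is a minimal Python sketch that maintains single-pass mean/variance summaries (Welford's algorithm) per spatial partition, so simple aggregate queries can be answered without touching raw records. The geohash-prefix partitioning scheme and the feature name are illustrative assumptions, not the dissertation's actual design.

        from collections import defaultdict

        class Synopsis:
            """Single-pass mean/variance for one feature in one partition (Welford)."""

            def __init__(self):
                self.n, self.mean, self.m2 = 0, 0.0, 0.0

            def update(self, x):
                self.n += 1
                delta = x - self.mean
                self.mean += delta / self.n
                self.m2 += delta * (x - self.mean)

            def variance(self):
                return self.m2 / (self.n - 1) if self.n > 1 else 0.0

        synopses = defaultdict(Synopsis)  # keyed by (partition, feature)

        def ingest(record):
            # e.g. record = {"geohash": "9q8yykv", "temperature": 21.4}
            partition = record["geohash"][:4]  # coarse spatial bucket (assumed scheme)
            synopses[(partition, "temperature")].update(record["temperature"])

    Because each synopsis is only a few numbers rather than the raw observations, summaries stay cheap to store and to ship between nodes, which reflects the timeliness-versus-accuracy trade-off the dissertation manages.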

    High-Performance Modelling and Simulation for Big Data Applications

    This open access book was prepared as the final publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications” (cHiPSet) project. Long considered important pillars of the scientific method, modelling and simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predicting and analysing natural and complex systems in science and engineering. As their level of abstraction rises to afford a better discernment of the domain at hand, their representation becomes increasingly demanding of computational and data resources. High Performance Computing, on the other hand, typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication, and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. A seamless interaction of High Performance Computing with Modelling and Simulation is therefore arguably required in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.

    Quality of service differentiation for multimedia delivery in wireless LANs

    Delivering multimedia content to heterogeneous devices over a variable networking environment while maintaining high quality levels involves many technical challenges. The research reported in this thesis presents a solution for Quality of Service (QoS)-based service differentiation when delivering multimedia content over wireless LANs. The thesis makes three major contributions, outlined below:
    1. A Model-based Bandwidth Estimation algorithm (MBE), which estimates the available bandwidth based on novel TCP and UDP throughput models over IEEE 802.11 WLANs. MBE has been modelled, implemented, and tested through simulations and real-life testing. In comparison with other bandwidth estimation techniques, MBE shows better performance in terms of error rate, overhead, and loss.
    2. An intelligent Prioritized Adaptive Scheme (iPAS), which provides QoS service differentiation for multimedia delivery in wireless networks. iPAS assigns dynamic priorities to the various streams and determines their bandwidth share by employing a probabilistic approach that makes use of stereotypes (a sketch of priority-proportional sharing follows this list). The total bandwidth to be allocated is estimated using MBE. The priority level of an individual stream is variable and depends on stream-related characteristics and delivery QoS parameters. iPAS can be deployed seamlessly over the original IEEE 802.11 protocols and can be included in the IEEE 802.21 framework in order to optimize control signal communication. iPAS has been modelled, implemented, and evaluated via simulations; the results demonstrate that it achieves better performance than the equal channel access mechanism over IEEE 802.11 DCF and a service differentiation scheme on top of IEEE 802.11e EDCA, in terms of fairness, throughput, delay, loss, and estimated PSNR. Additionally, both objective and subjective video quality assessments have been performed using a prototype system.
    3. A QoS-based Downlink/Uplink Fairness Scheme, which uses the stereotype-based structure to balance the QoS parameters (i.e. throughput, delay, and loss) between downlink and uplink VoIP traffic. The proposed scheme has been modelled and tested through simulations; the results show that, in comparison with other downlink/uplink fairness-oriented solutions, it performs better in terms of VoIP capacity and fairness level between downlink and uplink traffic.
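    As flagged in contribution 2, here is a minimal Python sketch of priority-proportional bandwidth sharing in the spirit of iPAS: each stream receives a share of the MBE-estimated total bandwidth proportional to a dynamic priority weight. The weighting heuristic and the stream attributes are illustrative assumptions, not the thesis's stereotype model.

        def priority(stream):
            # Assumed heuristic: favour delay-sensitive, loss-affected streams.
            return (stream["base_priority"] * (1 + stream["loss_rate"])
                    / max(stream["delay_budget_ms"], 1))

        def allocate(streams, total_bw_bps):
            """Split the estimated bandwidth proportionally to priority weights."""
            weights = {s["id"]: priority(s) for s in streams}
            total = sum(weights.values())
            return {sid: total_bw_bps * w / total for sid, w in weights.items()}

        streams = [
            {"id": "voip-1", "base_priority": 3, "loss_rate": 0.02, "delay_budget_ms": 50},
            {"id": "video-1", "base_priority": 2, "loss_rate": 0.01, "delay_budget_ms": 200},
        ]
        print(allocate(streams, 10_000_000))  # shares of a 10 Mbps MBE estimate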

    A Survey on Automatic Parameter Tuning for Big Data Processing Systems

    Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of configuration parameters controlling parallelism, I/O behavior, memory settings, and compression. Improper parameter settings can cause significant performance degradation and stability issues, yet regular users and even expert administrators grapple with understanding and tuning them to achieve good performance. We investigate existing approaches to parameter tuning for both batch and stream data processing systems and classify them into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We summarize the pros and cons of each approach and raise some open research problems for automatic parameter tuning.
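    As a concrete instance of the experiment-driven category, the following Python sketch runs the same job under every configuration in a small grid, measures completion time, and keeps the best setting. The run_job hook and the two Spark parameters in the grid are illustrative assumptions; a real harness would submit the job (e.g. via spark-submit) and time it.

        import itertools

        def run_job(config):
            """Launch the target job with `config` and return its runtime in
            seconds. Left as a stub: replace with a real job submission."""
            raise NotImplementedError

        grid = {
            "spark.executor.memory": ["2g", "4g"],
            "spark.sql.shuffle.partitions": [64, 200],
        }

        def tune(grid):
            best_cfg, best_time = None, float("inf")
            for values in itertools.product(*grid.values()):
                cfg = dict(zip(grid.keys(), values))
                runtime = run_job(cfg)  # one experiment per candidate config
                if runtime < best_time:
                    best_cfg, best_time = cfg, runtime
            return best_cfg, best_time

    Exhaustive grids get expensive quickly, which is precisely why the survey also covers cost-modeling, machine-learning, and adaptive alternatives.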