894 research outputs found
Auto-scaling techniques for cloud-based Complex Event Processing
One key topic in cloud computing is elasticity, which is the ability of the cloud environment to timely adapt the resource assignment along with the workload demand. According
to cloud on-demand model, the infrastructure should be able to scale up and down to unpredictable workloads, in order to achieve both a guaranteed service level and cost efficiency.
This work addresses the cloud elasticity problem, with particular reference to the Complex
Event Processing (CEP) systems.
CEP systems are designed to process large volumes of event-driven data streams and
continuously provide results with a low latency and in real-time. CEP systems need to
adapt to changing query and events loads. Because of the high computational requirements
and varying loads, CEP are distributed system and running on cloud infrastructures.
In this work we review the cloud computing auto-scaling solutions, and study their suit-
ability in the CEP model. We implement some solutions in a CEP prototype and evaluate
the experimental results
Big Data and Large-scale Data Analytics: Efficiency of Sustainable Scalability and Security of Centralized Clouds and Edge Deployment Architectures
One of the significant shifts of the next-generation computing technologies will certainly be in
the development of Big Data (BD) deployment architectures. Apache Hadoop, the BD
landmark, evolved as a widely deployed BD operating system. Its new features include
federation structure and many associated frameworks, which provide Hadoop 3.x with the
maturity to serve different markets. This dissertation addresses two leading issues involved in
exploiting BD and large-scale data analytics realm using the Hadoop platform. Namely,
(i)Scalability that directly affects the system performance and overall throughput using
portable Docker containers. (ii) Security that spread the adoption of data protection practices
among practitioners using access controls. An Enhanced Mapreduce Environment (EME),
OPportunistic and Elastic Resource Allocation (OPERA) scheduler, BD Federation Access Broker
(BDFAB), and a Secure Intelligent Transportation System (SITS) of multi-tiers architecture for
data streaming to the cloud computing are the main contribution of this thesis study
Low-latency, query-driven analytics over voluminous multidimensional, spatiotemporal datasets
2017 Summer.Includes bibliographical references.Ubiquitous data collection from sources such as remote sensing equipment, networked observational devices, location-based services, and sales tracking has led to the accumulation of voluminous datasets; IDC projects that by 2020 we will generate 40 zettabytes of data per year, while Gartner and ABI estimate 20-35 billion new devices will be connected to the Internet in the same time frame. The storage and processing requirements of these datasets far exceed the capabilities of modern computing hardware, which has led to the development of distributed storage frameworks that can scale out by assimilating more computing resources as necessary. While challenging in its own right, storing and managing voluminous datasets is only the precursor to a broader field of study: extracting knowledge, insights, and relationships from the underlying datasets. The basic building block of this knowledge discovery process is analytic queries, encompassing both query instrumentation and evaluation. This dissertation is centered around query-driven exploratory and predictive analytics over voluminous, multidimensional datasets. Both of these types of analysis represent a higher-level abstraction over classical query models; rather than indexing every discrete value for subsequent retrieval, our framework autonomously learns the relationships and interactions between dimensions in the dataset (including time series and geospatial aspects), and makes the information readily available to users. This functionality includes statistical synopses, correlation analysis, hypothesis testing, probabilistic structures, and predictive models that not only enable the discovery of nuanced relationships between dimensions, but also allow future events and trends to be predicted. This requires specialized data structures and partitioning algorithms, along with adaptive reductions in the search space and management of the inherent trade-off between timeliness and accuracy. The algorithms presented in this dissertation were evaluated empirically on real-world geospatial time-series datasets in a production environment, and are broadly applicable across other storage frameworks
High-Performance Modelling and Simulation for Big Data Applications
This open access book was prepared as a Final Publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)“ project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. When their level of abstraction raises to have a better discernment of the domain at hand, their representation gets increasingly demanding for computational and data resources. On the other hand, High Performance Computing typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. It is then arguably required to have a seamless interaction of High Performance Computing with Modelling and Simulation in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for their members and distinguished guests to openly discuss novel perspectives and topics of interests for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications
Quality of service differentiation for multimedia delivery in wireless LANs
Delivering multimedia content to heterogeneous devices over a variable networking environment while maintaining high quality levels involves many technical challenges. The research reported in this thesis presents a solution for Quality of Service (QoS)-based service differentiation when delivering multimedia content over the wireless LANs. This thesis has three major contributions outlined below:
1. A Model-based Bandwidth Estimation algorithm (MBE), which estimates the available bandwidth based on novel TCP and UDP throughput models over IEEE 802.11 WLANs. MBE has been modelled, implemented, and tested through simulations and real life testing. In comparison with other bandwidth estimation techniques, MBE shows better performance in terms of error rate, overhead, and loss.
2. An intelligent Prioritized Adaptive Scheme (iPAS), which provides QoS service differentiation for multimedia delivery in wireless networks. iPAS assigns dynamic priorities to various streams and determines their bandwidth share by employing a probabilistic approach-which makes use of stereotypes. The total bandwidth to be allocated is estimated using MBE. The priority level of individual stream is variable and dependent on stream-related characteristics and delivery QoS parameters. iPAS can be deployed seamlessly over the original IEEE 802.11 protocols and can be included in the IEEE 802.21 framework in order to optimize the control signal communication. iPAS has been modelled, implemented, and evaluated via simulations. The results demonstrate that iPAS achieves better performance than the equal channel access mechanism over IEEE 802.11 DCF and a service differentiation scheme on top of IEEE 802.11e EDCA, in terms of fairness, throughput, delay, loss, and estimated PSNR. Additionally, both objective and subjective video quality assessment have been performed using a prototype system.
3. A QoS-based Downlink/Uplink Fairness Scheme, which uses the stereotypes-based structure to balance the QoS parameters (i.e. throughput, delay, and loss) between downlink and uplink VoIP traffic. The proposed scheme has been modelled and tested through simulations. The results show that, in comparison with other downlink/uplink fairness-oriented solutions, the proposed scheme performs better in terms of VoIP capacity and fairness level between downlink and uplink traffic
A Survey on Automatic Parameter Tuning for Big Data Processing Systems
Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of configuration parameters controlling parallelism, I/O behavior, memory settings, and compression. Improper parameter settings can cause significant performance degradation and stability issues. However, regular users and even expert administrators grapple with understanding and tuning them to achieve good performance. We investigate existing approaches on parameter tuning for both batch and stream data processing systems and classify them into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We summarize the pros and cons of each approach and raise some open research problems for automatic parameter tuning.Peer reviewe
- …