706 research outputs found

    Complex Event Processing as a Service in Multi-Cloud Environments

    Get PDF
    The rise of mobile technologies and the Internet of Things, combined with advances in Web technologies, have created a new Big Data world in which the volume and velocity of data generation have achieved an unprecedented scale. As a technology created to process continuous streams of data, Complex Event Processing (CEP) has been often related to Big Data and used as a tool to obtain real-time insights. However, despite this recent surge of interest, the CEP market is still dominated by solutions that are costly and inflexible or too low-level and hard to operate. To address these problems, this research proposes the creation of a CEP system that can be offered as a service and used over the Internet. Such a CEP as a Service (CEPaaS) system would give its users CEP functionalities associated with the advantages of the services model, such as no up-front investment and low maintenance cost. Nevertheless, creating such a service involves challenges that are not addressed by current CEP systems. This research proposes solutions for three open problems that exist in this context. First, to address the problem of understanding and reusing existing CEP management procedures, this research introduces the Attributed Graph Rewriting for Complex Event Processing Management (AGeCEP) formalism as a technology- and language-agnostic representation of queries and their reconfigurations. Second, to address the problem of evaluating CEP query management and processing strategies, this research introduces CEPSim, a simulator of cloud-based CEP systems. Finally, this research also introduces a CEPaaS system based on a multi-cloud architecture, container management systems, and an AGeCEP-based multi-tenant design. To demonstrate its feasibility, AGeCEP was used to design an autonomic manager and a selected set of self-management policies. Moreover, CEPSim was thoroughly evaluated by experiments that showed it can simulate existing systems with accuracy and low execution overhead. Finally, additional experiments validated the CEPaaS system and demonstrated it achieves the goal of offering CEP functionalities as a scalable and fault-tolerant service. In tandem, these results confirm this research significantly advances the CEP state of the art and provides novel tools and methodologies that can be applied to CEP research

    CEPSim: Modelling and Simulation of Complex Event Processing Systems in Cloud Environments

    Get PDF
    The emergence of Big Data has had profound impacts on how data are stored and processed. As technologies created to process continuous streams of data with low latency, Complex Event Processing (CEP) and Stream Processing (SP) have often been related to the Big Data velocity dimension and used in this context. Many modern CEP and SP systems leverage cloud environments to provide the low latency and scalability required by Big Data applications, yet validating these systems at the required scale is a research problem per se. Cloud computing simulators have been used as a tool to facilitate reproducible and repeatable experiments in clouds. Nevertheless, existing simulators are mostly based on simple application and simulation models that are not appropriate for CEP or for SP. This article presents CEPSim, a simulator for CEP and SP systems in cloud environments. CEPSim proposes a query model based on Directed Acyclic Graphs (DAGs) and introduces a simulation algorithm based on a novel abstraction called event sets. CEPSim is highly customizable and can be used to analyze the performance and scalability of user-defined queries and to evaluate the effects of various query processing strategies. Experimental results show that CEPSim can simulate existing systems in large Big Data scenarios with accuracy and precision

    Evaluation of Particle Swarm Optimization Applied to Grid Scheduling

    Get PDF
    The problem of scheduling independent users’ jobs to resources in Grid Computing systems is of paramount importance. This problem is known to be NP-hard, and many techniques have been proposed to solve it, such as heuristics, genetic algorithms (GA), and, more recently, particle swarm optimization (PSO). This article aims to use PSO to solve grid scheduling problems, and compare it with other techniques. It is shown that many often-overlooked implementation details can have a huge impact on the performance of the method. In addition, experiments also show that the PSO has a tendency to stagnate around local minima in high-dimensional input problems. Therefore, this work also proposes a novel hybrid PSO-GA method that aims to increase swarm diversity when a stagnation condition is detected. The method is evaluated and compared with other PSO formulations; the results show that the new method can successfully improve the scheduling solution

    Data management in cloud environments: NoSQL and NewSQL data stores

    Get PDF
    : Advances in Web technology and the proliferation of mobile devices and sensors connected to the Internet have resulted in immense processing and storage requirements. Cloud computing has emerged as a paradigm that promises to meet these requirements. This work focuses on the storage aspect of cloud computing, specifically on data management in cloud environments. Traditional relational databases were designed in a different hardware and software era and are facing challenges in meeting the performance and scale requirements of Big Data. NoSQL and NewSQL data stores present themselves as alternatives that can handle huge volume of data. Because of the large number and diversity of existing NoSQL and NewSQL solutions, it is difficult to comprehend the domain and even more challenging to choose an appropriate solution for a specific task. Therefore, this paper reviews NoSQL and NewSQL solutions with the objective of: (1) providing a perspective in the field, (2) providing guidance to practitioners and researchers to choose the appropriate data store, and (3) identifying challenges and opportunities in the field. Specifically, the most prominent solutions are compared focusing on data models, querying, scaling, and security related capabilities. Features driving the ability to scale read requests and write requests, or scaling data storage are investigated, in particular partitioning, replication, consistency, and concurrency control. Furthermore, use cases and scenarios in which NoSQL and NewSQL data stores have been used are discussed and the suitability of various solutions for different sets of applications is examined. Consequently, this study has identified challenges in the field, including the immense diversity and inconsistency of terminologies, limited documentation, sparse comparison and benchmarking criteria, and nonexistence of standardized query languages

    Service Evolution Patterns

    Get PDF
    Service evolution is the process of maintaining and evolving existing Web services to cater for new requirements and technological changes. In this paper, a service evolution model is proposed to analyze service dependencies, identify changes on services and estimate impact on consumers that will use new versions of these services. Based on the proposed service evolution model, four service evolution patterns are described: compatibility, transition, split-map, and merge-map. These proposed patterns provide reusable templates to encourage well-defined service evolution while minimizing issues that arise otherwise. They can be applied in the service evolution scenario where a single service is used by many, possibly unknown, consumers’ applications. In such a scenario, providers evolve their services independently from consumers, which might cause unexpected errors and incur unpredicted impact on the dependent consumers\u27 applications. Therefore, providers can use these patterns to estimate the impact that changes to be introduced to their services may cause on their consumers, and to allow consumers smoothly migrate to the newest version of the service

    Network and Energy-Aware Resource Selection Model for Opportunistic Grids

    Get PDF
    Due to increasing hardware capacity, computing grids have been handling and processing more data. This has led to higher amount of energy being consumed by grids; hence the necessity for strategies to reduce their energy consumption. Scheduling is a process carried out to define in which node tasks will be executed in the grid. This process can significantly impact the global system performance, including energy consumption. This paper focuses on a scheduling model for opportunistic grids that considers network traffic, distance between input files and execution node as well as the execution node status. The model was tested in a simulated environment created using GreenCloud. The simulation results of this model compared to a usual approach show a total power consumption savings of 7.10%

    Deep Neural Networks With Confidence Sampling For Electrical Anomaly Detection

    Get PDF
    The increase in electrical metering has created tremendous quantities of data and, as a result, possibilities for deep insights into energy usage, better energy management, and new ways of energy conservation. As buildings are responsible for a significant portion of overall energy consumption, conservation efforts targeting buildings can provide tremendous effect on energy savings. Building energy monitoring enables identification of anomalous or unexpected behaviors which, when corrected, can lead to energy savings. Although the available data is large, the limited availability of labels makes anomaly detection difficult. This research proposes a deep semi-supervised convolutional neural network with confidence sampling for electrical anomaly detection. To achieve semi-supervised learning, two sub-networks are used: the first performs reconstruction and uses unlabelled data, while the second performs classification with labelled data. The two sub-networks overlap: the encoder parameters are shared between the two. To quantify anomaly detection confidence, a valuable metric in anomaly detection, the network uses a dropout sampling method. The proposed approach has been evaluated with real-world electrical data from systems such as HVAC, lighting, and heat pumps. The results demonstrated the accuracy of the proposed anomaly detection solution

    A Gamification Framework for Sensor Data Analytics

    Get PDF
    The Internet of Things (IoT) enables connected objects to capture, communicate, and collect information over the network through a multitude of sensors, setting the foundation for applications such as smart grids, smart cars, and smart cities. In this context, large scale analytics is needed to extract knowledge and value from the data produced by these sensors. The ability to perform analytics on these data, however, is highly limited by the difficulties of collecting labels. Indeed, the machine learning techniques used to perform analytics rely upon data labels to learn and to validate results. Historically, crowdsourcing platforms have been used to gather labels, yet they cannot be directly used in the IoT because of poor human readability of sensor data. To overcome these limitations, this paper proposes a framework for sensor data analytics which leverages the power of crowdsourcing through gamification to acquire sensor data labels. The framework uses gamification as a socially engaging vehicle and as a way to motivate users to participate in various labelling tasks. To demonstrate the framework proposed, a case study is also presented. Evaluation results show the framework can successfully translate gamification events into sensor data labels

    From Inception to Execution: Query Management for Complex Event Processing as a Service

    Get PDF
    International audienceComplex Event Processing (CEP) is a set of tools and techniques that can be used to obtain insights from high- volume, high-velocity continuous streams of events. CEP-based systems have been adopted in many situations that require prompt establishment of system diagnostics and execution of reaction plans, such as in monitoring of complex systems. This article describes the Query Analyzer and Manager (QAM) mod- ule, a first effort toward the development of a CEP as a Service (CEPaaS) system. This module is responsible for analyzing user-defined CEP queries and for managing their execution in distributed cloud-based environments. Using a language-agnostic internal query representation, QAM has a modular design that enables its adoption by virtually any CEP system

    Challenges for MapReduce in Big Data

    Get PDF
    In the Big Data community, MapReduce has been seen as one of the key enabling approaches for meeting continuously increasing demands on computing resources imposed by massive data sets. The reason for this is the high scalability of the MapReduce paradigm which allows for massively parallel and distributed execution over a large number of computing nodes. This paper identifies MapReduce issues and challenges in handling Big Data with the objective of providing an overview of the field, facilitating better planning and management of Big Data projects, and identifying opportunities for future research in this field. The identified challenges are grouped into four main categories corresponding to Big Data tasks types: data storage (relational databases and NoSQL stores), Big Data analytics (machine learning and interactive analytics), online processing, and security and privacy. Moreover, current efforts aimed at improving and extending MapReduce to address identified challenges are presented. Consequently, by identifying issues and challenges MapReduce faces when handling Big Data, this study encourages future Big Data research
    • …
    corecore