70,142 research outputs found
GraphMP: An Efficient Semi-External-Memory Big Graph Processing System on a Single Machine
Recent studies showed that single-machine graph processing systems can be as
highly competitive as cluster-based approaches on large-scale problems. While
several out-of-core graph processing systems and computation models have been
proposed, the high disk I/O overhead could significantly reduce performance in
many practical cases. In this paper, we propose GraphMP to tackle big graph
analytics on a single machine. GraphMP achieves low disk I/O overhead with
three techniques. First, we design a vertex-centric sliding window (VSW)
computation model to avoid reading and writing vertices on disk. Second, we
propose a selective scheduling method to skip loading and processing
unnecessary edge shards on disk. Third, we use a compressed edge cache
mechanism to fully utilize the available memory of a machine to reduce the
amount of disk accesses for edges. Extensive evaluations have shown that
GraphMP could outperform state-of-the-art systems such as GraphChi, X-Stream
and GridGraph by 31.6x, 54.5x and 23.1x respectively, when running popular
graph applications on a billion-vertex graph
Dynamic Scheduling for Energy Minimization in Delay-Sensitive Stream Mining
Numerous stream mining applications, such as visual detection, online patient monitoring, and video search and retrieval, are emerging on both mobile and high-performance computing systems. These applications are subject to responsiveness (i.e., delay) constraints for user interactivity and, at the same time, must be optimized for energy efficiency. The increasingly heterogeneous power-versus-performance profile of modern hardware presents new opportunities for energy saving as well as challenges. For example, employing low-performance processing nodes can save energy but may violate delay requirements, whereas employing high-performance processing nodes can deliver a fast response but may unnecessarily waste energy. Existing scheduling algorithms balance energy versus delay assuming constant processing and power requirements throughout the execution of a stream mining task and without exploiting hardware heterogeneity. In this paper, we propose a novel framework for dynamic scheduling for energy minimization (DSE) that leverages this emerging hardware heterogeneity. By optimally determining the processing speeds for hardware executing classifiers, DSE minimizes the average energy consumption while satisfying an average delay constraint. To assess the performance of DSE, we build a face detection application based on the Viola-Jones classifier chain and conduct experimental studies via heterogeneous processor system emulation. The results show that, under the same delay requirement, DSE reduces the average energy consumption by up to 50% in comparison to conventional scheduling that does not exploit hardware heterogeneity. We also demonstrate that DSE is robust against processing node switching overhead and model inaccuracy
Real-Time Stream Processing in Embedded Systems
Modern real-time embedded systems often involve computational-intensive data processing algorithms to meet their application requirements. As a result, there has been an increase in the use of multiprocessor platforms. The stream processing programming model aims to facilitate the construction of concurrent data processing programs to exploit the parallelism available on these architectures. However, most current stream processing frameworks or languages are not designed for use in real-time systems, let alone systems that might also have hard real-time control algorithms. This thesis contends that a generic architecture of a real-time stream processing infrastructure can be created to support predictable processing of both batched and live streaming data sources, and integrated with hard real-time control algorithms.
The thesis first reviews relevant stream processing techniques, and identifies the open issues. Then a real-time stream processing task model, and an architecture for supporting that model is proposed. An approach to the integration of stream processing tasks into a real-time environment that also has hard real-time components is presented. Data is processed in parallel using execution-time servers allocated to each core. An algorithm is presented for selecting the parameters of the servers that maximises their capacities (within an overall deadline) and ensures that hard real-time components remain schedulable. Response-time analysis is derived to guarantee that the real-time requirements (deadlines for batched data processing, and latency for each data item for live data) for the stream processing activity are met. A framework, called SPRY, is implemented to support the proposed real-time stream processing architecture. The framework supports fully-partitioned applications that are scheduled using fixed priority-based scheduling techniques. A case study based on a modified Generic Avionics Platform is given to demonstrate the overall approach. Finally, the evaluation shows that the presented approach provides a better schedulability than alternative approaches
Efficient Online Scheduling in Distributed Stream Data Processing Systems
General-purpose Distributed Stream Data Processing Systems (DSDPSs) have attracted extensive attention from industry and academia in recent years. They are capable of processing unbounded big streams of continuous data in a distributed and real (or near-real) time manner. A fundamental problem in a DSDPS is the scheduling problem, i.e., assigning threads (carrying workload) to workers/machines with the objective of minimizing average end-to-end tuple processing time (or simply tuple processing time). A widely-used solution is to distribute workload over machines in the cluster in a round-robin manner, which is obviously not efficient due to the lack of consideration for communication delay among processes/machines. A scheduling solution makes a significant impact on the average tuple processing time. However, their relationship is very subtle and complicated. It does not even seem possible to have a mathematical programming formulation for the scheduling problem if its objective is to directly minimize the average tuple processing time.
In this dissertation, we first propose a model-based approach that accurately models the correlation between a scheduling solution and its objective value (i.e. average tuple processing time) for a given scheduling solution according to the topology of the application graph and runtime statistics. A predictive scheduling algorithm is then presented, which as- signs tasks (threads) to machines under the guidance of the proposed model. This approach achieves an average of 24.9% improvement over Storm’s default scheduler. However, the model-based approach still has its limitations: the model may not be able to fully capture the features of a DSDPS; prediction may not be accurate enough; and a large amount of high-dimensional data may lead to high overhead.
To address the limitations, we develop a model-free approach that can learn to control a DSDPS from its experience rather than adopting accurate and mathematically solvable system models, just as a human learns a skill (such as cooking, driving, swimming, etc.). Recent breakthrough of Deep Reinforcement Learning (DRL) provides a promising approach for enabling effective model-free control. The proposed DRL-based model-free approach minimizes the average end-to-end tuple processing time by jointly learning the system environment via collecting very limited runtime statistics and making decisions under the guidance of powerful Deep Neural Networks (DNNs). This approach achieves great performance improvement over the current practice and the state-of-the-art model-based approach.
Moreover, there is still room for improvement for the above model-free approach: For the above model-free approach and most existing methods, a user specifies the number of threads for an application in advance without knowing much about runtime needs, which, however, remains unchanged during runtime. This could severely affect the performance of a DSDPS. Therefore, we further develop another model-free approach using DRL, EXTRA, which enables the dynamic use of a variable number of threads at runtime. It has been shown by extensive experimental results, by adding this new feature, EXTRA can achieve further performance improvement and greater flexibility on scheduling
A Deviant Load Shedding System for Data Stream Mining
AbstractLoad shedding is imperative for data stream processing systems in numerous functions as data streams are susceptible to sudden spikes in volume. The proposed system is an attempt to seek and resolve four major problems associated with data stream, which include load shedding and anti-shedding time, number of transactions pruned and selecting predicate; using efficient mining system. The frequent pattern discovered in data stream used in the model exploits the synergy between scheduling and load shedding. This paper also proposes various load shedding strategies which reduce and lighten the workload of the system ensuring an acceptable level of mining accuracy using various parameters like transaction, priority and attributes of data mining. A majority chunk of workload in mining algorithm lies in the innumerable item sets, which are counted and enumerated. The approach is based on the frequent pattern matching principle of stream mining which involves reducing the workload to maintain smaller item sets
Resource Allocation Optimization through Task Based Scheduling Algorithms in Distributed Real Time Embedded Systems
Distributed embedded system is a type of distributed system, which consists of a large number of nodes, each node having lower computational power when compared
to a node of a regular distributed system (like a cluster). A real time system is the one where every task has an associated dead line and the system works with a continuous stream of data supplied in real time.Such systems find wide applications in various fields such as automobile industry as fly-by-wire,brake-by-wire and steer-by-wire systems. Scheduling and efficient allocation of resources is extremely important in such systems because a distributed embedded real time system must deliver its output within a certain time frame, failing which the output becomes useless.In this paper, we have taken up processing unit number as a resource and have optimized the allocation of it to the various tasks.We use techniques such as model-based redundancy,heartbeat monitoring and check-pointing for fault detection and failure recovery.Our fault tolerance framework uses an existing list-based scheduling algorithm for task scheduling.This helps in diagnosis and shutting down of faulty actuators before the system becomes unsafe. The framework is designed and tested using a new simulation model consisting of virtual nodes working on a message passing system
DRS: Dynamic Resource Scheduling for Real-Time Analytics over Fast Streams
In a data stream management system (DSMS), users register continuous queries,
and receive result updates as data arrive and expire. We focus on applications
with real-time constraints, in which the user must receive each result update
within a given period after the update occurs. To handle fast data, the DSMS is
commonly placed on top of a cloud infrastructure. Because stream properties
such as arrival rates can fluctuate unpredictably, cloud resources must be
dynamically provisioned and scheduled accordingly to ensure real-time response.
It is quite essential, for the existing systems or future developments, to
possess the ability of scheduling resources dynamically according to the
current workload, in order to avoid wasting resources, or failing in delivering
correct results on time. Motivated by this, we propose DRS, a novel dynamic
resource scheduler for cloud-based DSMSs. DRS overcomes three fundamental
challenges: (a) how to model the relationship between the provisioned resources
and query response time (b) where to best place resources; and (c) how to
measure system load with minimal overhead. In particular, DRS includes an
accurate performance model based on the theory of \emph{Jackson open queueing
networks} and is capable of handling \emph{arbitrary} operator topologies,
possibly with loops, splits and joins. Extensive experiments with real data
confirm that DRS achieves real-time response with close to optimal resource
consumption.Comment: This is the our latest version with certain modificatio
Recommended from our members
A Platform for Scalable Low-Latency Analytics using MapReduce
Today, the ability to process big data has become crucial to the information needs of many enterprise businesses, scientific applications, and governments. Recently, there have been increasing needs of processing data that is not only big but also fast . Here fast data refers to high-speed real-time and near real-time data streams, such as Twitter feeds, search query streams, click streams, impressions, and system logs. To handle both historical data and real-time data, many companies have to maintain multiple systems. However, recent real-world case studies show that maintaining multiple systems cause not only code duplication, but also intensive manual work to partition the analytics workloads and determine which data is processed by which system. These issues point to the need for a general, unified data processing framework to support analytical queries with different latency requirements.
This thesis takes a further step towards building a general, unified system for big and fast data analytics. In order to build such a system, I propose to build on existing solutions on data parallelism and extend them with two new features: incremental processing and stream processing with latency constraints. This thesis starts with Hadoop, the most popular open-source MapReduce implementation, which provides proven scalability based on data parallelism. I answer the following questions: (1) Is Hadoop able to support incremental processing? (2) What are the necessary architecture changes in order to support incremental processing? (3) What are the additional design features needed to support stream processing with latency constraints? The thesis includes three parts that answer each of the questions.
The first part of the thesis validates whether the existing MapReduce implementations can support incremental processing. Incremental processing means that computation is performed as soon as the relevant data becomes available. My extensive benchmark study of Hadoop-based MapReduce systems shows that the widely-used sort-merge implementation for partitioning and parallel processing poses a fundamental barrier to incremental computation. I further propose a cost model, and optimize the Hadoop system configuration based on the model. The benchmark results over the optimized system verify that the barrier to incremental computation is intrinsic, and cannot be removed by tuning system parameters.
In the second part of the thesis, I employ various purely hash-based techniques to enable fast in-memory incremental processing in MapReduce, and frequent key based techniques to extend such processing to workloads that require memory more than available. I evaluate my Hadoop-based prototype equipped with all proposed techniques. The results show that the hash techniques allow the reduce progress to keep up with the map progress with up to 3 orders of magnitude reduction of internal disk spills, and enable results to be returned early.
The third part of the thesis aims to support stream processing with latency constraints based on the incremental processing platform resulted from the second part. I perform a benchmark study to understand the sources of latency. I then propose a number of necessary architecture changes to support stream processing, and augment the platform with new latency-aware model-driven resource planning and latency-aware runtime scheduling techniques to meet user-specified latency constraints while maximizing throughput. Experiments using real-world workloads show that the techniques reduce the latency from tens or hundreds of seconds to sub-second, with 2x-5x increase in throughput. The new platform offers 1-2 orders of magnitude improvements over Storm, a commercial-grade distributed stream system, and Spark Streaming, a state-of-the-art academic prototype, when considering both latency and throughput
- …