84,766 research outputs found

    Generalizable Resource Allocation in Stream Processing via Deep Reinforcement Learning

    Full text link
    This paper considers the problem of resource allocation in stream processing, where continuous data flows must be processed in real time in a large distributed system. To maximize system throughput, the resource allocation strategy that partitions the computation tasks of a stream processing graph onto computing devices must simultaneously balance workload distribution and minimize communication. Since this problem of graph partitioning is known to be NP-complete yet crucial to practical streaming systems, many heuristic-based algorithms have been developed to find reasonably good solutions. In this paper, we present a graph-aware encoder-decoder framework to learn a generalizable resource allocation strategy that can properly distribute computation tasks of stream processing graphs unobserved from training data. We, for the first time, propose to leverage graph embedding to learn the structural information of the stream processing graphs. Jointly trained with the graph-aware decoder using deep reinforcement learning, our approach can effectively find optimized solutions for unseen graphs. Our experiments show that the proposed model outperforms both METIS, a state-of-the-art graph partitioning algorithm, and an LSTM-based encoder-decoder model, in about 70% of the test cases.Comment: Accepted by AAAI 202

    Distributed Resource Allocation for Stream Data Processing

    Full text link
    Abstract. Data streaming applications are becoming more and more common due to the rapid development in the areas such as sensor net-works, multimedia streaming, and on-line data mining, etc. These ap-plications are often running in a decentralized, distributed environment. The requirements for processing large volumes of streaming data at real time have posed many great design challenges. It is critical to optimize the ongoing resource consumption of multiple, distributed, cooperating, processing units. In this paper, we consider a generic model for the gen-eral stream data processing systems. We address the resource alloca-tion problem for a collection of processing units so as to maximize the weighted sum of the throughput of different streams. Each processing unit may require multiple input data streams simultaneously and pro-duce one or many valuable output streams. Data streams flow through such a system after processing at multiple processing units. Based on this framework, we develop distributed algorithms for finding the best resource allocation schemes in such data stream processing networks. Performance analysis on the optimality and complexity of these algo-rithms are also provided

    Resource Allocation Optimization through Task Based Scheduling Algorithms in Distributed Real Time Embedded Systems

    Get PDF
    Distributed embedded system is a type of distributed system, which consists of a large number of nodes, each node having lower computational power when compared to a node of a regular distributed system (like a cluster). A real time system is the one where every task has an associated dead line and the system works with a continuous stream of data supplied in real time.Such systems find wide applications in various fields such as automobile industry as fly-by-wire,brake-by-wire and steer-by-wire systems. Scheduling and efficient allocation of resources is extremely important in such systems because a distributed embedded real time system must deliver its output within a certain time frame, failing which the output becomes useless.In this paper, we have taken up processing unit number as a resource and have optimized the allocation of it to the various tasks.We use techniques such as model-based redundancy,heartbeat monitoring and check-pointing for fault detection and failure recovery.Our fault tolerance framework uses an existing list-based scheduling algorithm for task scheduling.This helps in diagnosis and shutting down of faulty actuators before the system becomes unsafe. The framework is designed and tested using a new simulation model consisting of virtual nodes working on a message passing system

    Optimality Properties, Distributed Strategies, and Measurement-Based Evaluation of Coordinated Multicell OFDMA Transmission

    Full text link
    The throughput of multicell systems is inherently limited by interference and the available communication resources. Coordinated resource allocation is the key to efficient performance, but the demand on backhaul signaling and computational resources grows rapidly with number of cells, terminals, and subcarriers. To handle this, we propose a novel multicell framework with dynamic cooperation clusters where each terminal is jointly served by a small set of base stations. Each base station coordinates interference to neighboring terminals only, thus limiting backhaul signalling and making the framework scalable. This framework can describe anything from interference channels to ideal joint multicell transmission. The resource allocation (i.e., precoding and scheduling) is formulated as an optimization problem (P1) with performance described by arbitrary monotonic functions of the signal-to-interference-and-noise ratios (SINRs) and arbitrary linear power constraints. Although (P1) is non-convex and difficult to solve optimally, we are able to prove: 1) Optimality of single-stream beamforming; 2) Conditions for full power usage; and 3) A precoding parametrization based on a few parameters between zero and one. These optimality properties are used to propose low-complexity strategies: both a centralized scheme and a distributed version that only requires local channel knowledge and processing. We evaluate the performance on measured multicell channels and observe that the proposed strategies achieve close-to-optimal performance among centralized and distributed solutions, respectively. In addition, we show that multicell interference coordination can give substantial improvements in sum performance, but that joint transmission is very sensitive to synchronization errors and that some terminals can experience performance degradations.Comment: Published in IEEE Transactions on Signal Processing, 15 pages, 7 figures. This version corrects typos related to Eq. (4) and Eq. (28

    DRS: Dynamic Resource Scheduling for Real-Time Analytics over Fast Streams

    Full text link
    In a data stream management system (DSMS), users register continuous queries, and receive result updates as data arrive and expire. We focus on applications with real-time constraints, in which the user must receive each result update within a given period after the update occurs. To handle fast data, the DSMS is commonly placed on top of a cloud infrastructure. Because stream properties such as arrival rates can fluctuate unpredictably, cloud resources must be dynamically provisioned and scheduled accordingly to ensure real-time response. It is quite essential, for the existing systems or future developments, to possess the ability of scheduling resources dynamically according to the current workload, in order to avoid wasting resources, or failing in delivering correct results on time. Motivated by this, we propose DRS, a novel dynamic resource scheduler for cloud-based DSMSs. DRS overcomes three fundamental challenges: (a) how to model the relationship between the provisioned resources and query response time (b) where to best place resources; and (c) how to measure system load with minimal overhead. In particular, DRS includes an accurate performance model based on the theory of \emph{Jackson open queueing networks} and is capable of handling \emph{arbitrary} operator topologies, possibly with loops, splits and joins. Extensive experiments with real data confirm that DRS achieves real-time response with close to optimal resource consumption.Comment: This is the our latest version with certain modificatio

    Model-driven Scheduling for Distributed Stream Processing Systems

    Full text link
    Distributed Stream Processing frameworks are being commonly used with the evolution of Internet of Things(IoT). These frameworks are designed to adapt to the dynamic input message rate by scaling in/out.Apache Storm, originally developed by Twitter is a widely used stream processing engine while others includes Flink, Spark streaming. For running the streaming applications successfully there is need to know the optimal resource requirement, as over-estimation of resources adds extra cost.So we need some strategy to come up with the optimal resource requirement for a given streaming application. In this article, we propose a model-driven approach for scheduling streaming applications that effectively utilizes a priori knowledge of the applications to provide predictable scheduling behavior. Specifically, we use application performance models to offer reliable estimates of the resource allocation required. Further, this intuition also drives resource mapping, and helps narrow the estimated and actual dataflow performance and resource utilization. Together, this model-driven scheduling approach gives a predictable application performance and resource utilization behavior for executing a given DSPS application at a target input stream rate on distributed resources.Comment: 54 page

    SQPR: Stream Query Planning with Reuse

    Get PDF
    When users submit new queries to a distributed stream processing system (DSPS), a query planner must allocate physical resources, such as CPU cores, memory and network bandwidth, from a set of hosts to queries. Allocation decisions must provide the correct mix of resources required by queries, while achieving an efficient overall allocation to scale in the number of admitted queries. By exploiting overlap between queries and reusing partial results, a query planner can conserve resources but has to carry out more complex planning decisions. In this paper, we describe SQPR, a query planner that targets DSPSs in data centre environments with heterogeneous resources. SQPR models query admission, allocation and reuse as a single constrained optimisation problem and solves an approximate version to achieve scalability. It prevents individual resources from becoming bottlenecks by re-planning past allocation decisions and supports different allocation objectives. As our experimental evaluation in comparison with a state-of-the-art planner shows SQPR makes efficient resource allocation decisions, even with a high utilisation of resources, with acceptable overheads

    Using Dedicated and Opportunistic Networks in Synergy for a Cost-effective Distributed Stream Processing Platform

    Full text link
    This paper presents a case for exploiting the synergy of dedicated and opportunistic network resources in a distributed hosting platform for data stream processing applications. Our previous studies have demonstrated the benefits of combining dedicated reliable resources with opportunistic resources in case of high-throughput computing applications, where timely allocation of the processing units is the primary concern. Since distributed stream processing applications demand large volume of data transmission between the processing sites at a consistent rate, adequate control over the network resources is important here to assure a steady flow of processing. In this paper, we propose a system model for the hybrid hosting platform where stream processing servers installed at distributed sites are interconnected with a combination of dedicated links and public Internet. Decentralized algorithms have been developed for allocation of the two classes of network resources among the competing tasks with an objective towards higher task throughput and better utilization of expensive dedicated resources. Results from extensive simulation study show that with proper management, systems exploiting the synergy of dedicated and opportunistic resources yield considerably higher task throughput and thus, higher return on investment over the systems solely using expensive dedicated resources.Comment: 9 page
    corecore