Generalizable Resource Allocation in Stream Processing via Deep Reinforcement Learning
This paper considers the problem of resource allocation in stream processing,
where continuous data flows must be processed in real time in a large
distributed system. To maximize system throughput, the resource allocation
strategy that partitions the computation tasks of a stream processing graph
onto computing devices must simultaneously balance workload distribution and
minimize communication. Since this problem of graph partitioning is known to be
NP-complete yet crucial to practical streaming systems, many heuristic-based
algorithms have been developed to find reasonably good solutions. In this
paper, we present a graph-aware encoder-decoder framework to learn a
generalizable resource allocation strategy that can properly distribute the
computation tasks of stream processing graphs not observed in the training data.
We are the first to propose leveraging graph embedding to learn the
structural information of stream processing graphs. Jointly trained with
the graph-aware decoder using deep reinforcement learning, our approach can
effectively find optimized solutions for unseen graphs. Our experiments show
that the proposed model outperforms both METIS, a state-of-the-art graph
partitioning algorithm, and an LSTM-based encoder-decoder model, in about 70%
of the test cases. Comment: Accepted by AAAI 202
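The abstract states that a good partition must balance workload while minimizing communication. As a minimal sketch of that trade-off, assuming per-task compute costs and per-edge traffic volumes are known, the hypothetical helper `evaluate_partition` below scores an assignment of tasks to devices. `throughput_proxy` is an assumed bottleneck model for illustration only, not the reward used in the paper.

```python
def evaluate_partition(task_load, edges, assignment, num_devices):
    """Score a task-to-device assignment.

    task_load:  dict task -> compute cost
    edges:      list of (upstream, downstream, traffic) data-flow edges
    assignment: dict task -> device index in range(num_devices)
    Returns (heaviest device load, total traffic crossing devices).
    """
    load = [0.0] * num_devices
    for task, cost in task_load.items():
        load[assignment[task]] += cost
    # Only edges whose endpoints sit on different devices use the network.
    cut = sum(w for u, v, w in edges if assignment[u] != assignment[v])
    return max(load), cut


def throughput_proxy(max_load, cut, link_capacity=1.0):
    # Hypothetical model: throughput is limited by the slower of the
    # busiest device and the inter-device network.
    return 1.0 / max(max_load, cut / link_capacity)


tasks = {"a": 2, "b": 2, "c": 1, "d": 1}
flows = [("a", "b", 3), ("b", "c", 1), ("c", "d", 1)]
compact = {"a": 0, "b": 0, "c": 1, "d": 1}   # keeps the heavy a->b edge local
balanced = {"a": 0, "c": 0, "b": 1, "d": 1}  # equal load, more traffic
print(evaluate_partition(tasks, flows, compact, 2))   # (4.0, 1)
print(evaluate_partition(tasks, flows, balanced, 2))  # (3.0, 5)
```

Under this proxy the compact assignment scores 0.25 and the balanced one 0.2, which illustrates why neither load balance nor communication can be optimized in isolation.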
Distributed Resource Allocation for Stream Data Processing
Data streaming applications are becoming increasingly common due to rapid development in areas such as sensor networks, multimedia streaming, and online data mining. These applications often run in a decentralized, distributed environment. The requirement to process large volumes of streaming data in real time poses many significant design challenges, and it is critical to optimize the ongoing resource consumption of multiple distributed, cooperating processing units. In this paper, we consider a generic model for general stream data processing systems. We address the resource allocation problem for a collection of processing units so as to maximize the weighted sum of the throughput of different streams. Each processing unit may require multiple input data streams simultaneously and produce one or more valuable output streams. Data streams flow through such a system after being processed at multiple processing units. Based on this framework, we develop distributed algorithms for finding the best resource allocation schemes in such data stream processing networks. A performance analysis of the optimality and complexity of these algorithms is also provided.
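The generic model above, in which a unit may need several input streams at once, can be made concrete with a small sketch. It assumes (hypothetically) that each unit consumes a fixed number of messages from every input per output message and a fixed amount of CPU per output, so its output rate is capped by its scarcest input and by its CPU share. The brute-force `best_split` over two units sharing one CPU budget is a centralized illustration only, not the distributed algorithms the paper develops.

```python
def unit_output_rate(input_rates, inputs_per_output, cpu_share, cpu_per_output):
    # A unit that needs all inputs simultaneously is limited by its scarcest input...
    input_limit = min(r / k for r, k in zip(input_rates, inputs_per_output))
    # ...and by how much CPU it has been allocated.
    return min(input_limit, cpu_share / cpu_per_output)


def weighted_throughput(output_rates, weights):
    return sum(w * r for w, r in zip(weights, output_rates))


def best_split(budget, units, weights, steps=100):
    """Grid-search how to split a CPU budget between exactly two units.

    units: list of two dicts with keys "inputs", "need", "cpu_cost".
    Returns (best weighted throughput, (share_0, share_1)).
    """
    best = (float("-inf"), None)
    for i in range(steps + 1):
        shares = (budget * i / steps, budget * (steps - i) / steps)
        rates = [
            unit_output_rate(u["inputs"], u["need"], s, u["cpu_cost"])
            for u, s in zip(units, shares)
        ]
        value = weighted_throughput(rates, weights)
        if value > best[0]:
            best = (value, shares)
    return best


join_unit = {"inputs": [10.0, 6.0], "need": [1, 2], "cpu_cost": 1.0}
filter_unit = {"inputs": [8.0], "need": [1], "cpu_cost": 0.5}
print(best_split(4.0, [join_unit, filter_unit], weights=[3.0, 1.0], steps=4))
# (11.0, (3.0, 1.0))
```

Giving the join unit more than 3 CPU units is wasted because its second input caps it at 3 outputs per second, which is the kind of coupling between inputs and resources the model captures.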
Resource Allocation Optimization through Task Based Scheduling Algorithms in Distributed Real Time Embedded Systems
A distributed embedded system is a type of distributed system consisting of a large number of nodes, each with lower computational power than a node of a regular distributed system (such as a cluster). A real-time system is one in which every task has an associated deadline and the system works on a continuous stream of data supplied in real time. Such systems find wide application in various fields, such as the automobile industry, for example in fly-by-wire, brake-by-wire and steer-by-wire systems. Scheduling and efficient resource allocation are extremely important in such systems because a distributed real-time embedded system must deliver its output within a certain time frame; otherwise the output becomes useless. In this paper, we treat the number of processing units as a resource and optimize its allocation to the various tasks. We use techniques such as model-based redundancy, heartbeat monitoring and checkpointing for fault detection and failure recovery. Our fault tolerance framework uses an existing list-based scheduling algorithm for task scheduling. This helps in diagnosing and shutting down faulty actuators before the system becomes unsafe. The framework is designed and tested using a new simulation model consisting of virtual nodes working on a message-passing system.
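The abstract treats the number of processing units as the resource to optimize and relies on an existing list-based scheduler. A minimal sketch of that idea follows, assuming independent tasks with known execution times and deadlines and an earliest-deadline-first priority list. The paper does not say which list scheduler it uses, so the priority rule and the helper names `list_schedule` and `min_units` are assumptions; fault tolerance (heartbeats, checkpointing) is omitted.

```python
import heapq


def list_schedule(tasks, num_units):
    """List scheduling of independent tasks on identical processing units.

    tasks: list of (name, exec_time, deadline); the priority list is
    earliest deadline first (an assumption). Each task goes to the unit
    that becomes free earliest.
    Returns (schedule, missed): schedule maps name -> (unit, start, finish).
    """
    free = [(0.0, u) for u in range(num_units)]  # (time unit is free, unit id)
    heapq.heapify(free)
    schedule, missed = {}, []
    for name, exec_time, deadline in sorted(tasks, key=lambda t: t[2]):
        start, unit = heapq.heappop(free)
        finish = start + exec_time
        schedule[name] = (unit, start, finish)
        if finish > deadline:
            missed.append(name)
        heapq.heappush(free, (finish, unit))
    return schedule, missed


def min_units(tasks, max_units):
    # Smallest number of processing units under which no deadline is missed.
    for n in range(1, max_units + 1):
        if not list_schedule(tasks, n)[1]:
            return n
    return None


tasks = [("brake", 2.0, 2.0), ("steer", 2.0, 3.0), ("log", 3.0, 6.0)]
print(min_units(tasks, 4))  # 2
```

With one unit, "steer" and "log" finish after their deadlines; a second unit is the smallest allocation that meets every deadline under this scheduler.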
Optimality Properties, Distributed Strategies, and Measurement-Based Evaluation of Coordinated Multicell OFDMA Transmission
The throughput of multicell systems is inherently limited by interference and
the available communication resources. Coordinated resource allocation is the
key to efficient performance, but the demand on backhaul signaling and
computational resources grows rapidly with the number of cells, terminals, and
subcarriers. To handle this, we propose a novel multicell framework with
dynamic cooperation clusters where each terminal is jointly served by a small
set of base stations. Each base station coordinates interference to neighboring
terminals only, thus limiting backhaul signaling and making the framework
scalable. This framework can describe anything from interference channels to
ideal joint multicell transmission.
The resource allocation (i.e., precoding and scheduling) is formulated as an
optimization problem (P1) with performance described by arbitrary monotonic
functions of the signal-to-interference-and-noise ratios (SINRs) and arbitrary
linear power constraints. Although (P1) is non-convex and difficult to solve
optimally, we are able to prove: 1) Optimality of single-stream beamforming; 2)
Conditions for full power usage; and 3) A precoding parametrization based on a
few parameters between zero and one. These optimality properties are used to
propose low-complexity strategies: both a centralized scheme and a distributed
version that only requires local channel knowledge and processing. We evaluate
the performance on measured multicell channels and observe that the proposed
strategies achieve close-to-optimal performance among centralized and
distributed solutions, respectively. In addition, we show that multicell
interference coordination can give substantial improvements in sum performance,
but that joint transmission is very sensitive to synchronization errors and
that some terminals can experience performance degradations. Comment: Published in IEEE Transactions on Signal Processing, 15 pages, 7
figures. This version corrects typos related to Eq. (4) and Eq. (28
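Problem (P1) is stated in terms of per-terminal SINRs under linear power constraints. As a hedged illustration of those quantities only, the sketch below computes SINR_k = |h_k^H w_k|^2 / (noise + sum over j != k of |h_k^H w_j|^2) for single-stream beamforming from one transmit array, the sum rate (sum over k of log2(1 + SINR_k)) as one example of a monotonic performance function, and the total transmit power as one example of a linear power constraint. It ignores the dynamic cooperation clusters and is not the paper's precoding scheme; all function names are hypothetical.

```python
import math


def inner(h, w):
    # h^H w for complex vectors stored as lists
    return sum(hi.conjugate() * wi for hi, wi in zip(h, w))


def sinrs(channels, beams, noise_power):
    """channels[k]: channel vector of terminal k; beams[k]: its beamformer."""
    result = []
    for k, h in enumerate(channels):
        signal = abs(inner(h, beams[k])) ** 2
        interference = sum(
            abs(inner(h, w)) ** 2 for j, w in enumerate(beams) if j != k
        )
        result.append(signal / (noise_power + interference))
    return result


def sum_rate(sinr_values):
    # One monotonic function of the SINRs: sum of log2(1 + SINR_k)
    return sum(math.log2(1 + s) for s in sinr_values)


def total_power(beams):
    # Linear in the per-beam powers ||w_k||^2
    return sum(sum(abs(x) ** 2 for x in w) for w in beams)


h = [[1 + 0j, 0j], [0j, 1 + 0j]]
leaky = [[1 + 0j, 1 + 0j], [0j, 1 + 0j]]  # beam 0 also radiates toward terminal 1
print(sinrs(h, leaky, noise_power=1.0))  # [1.0, 0.5]
print(total_power(leaky))  # 3.0
```

The leaking beam halves terminal 1's SINR while also spending extra power, which is the kind of interference that coordinated precoding tries to avoid.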
DRS: Dynamic Resource Scheduling for Real-Time Analytics over Fast Streams
In a data stream management system (DSMS), users register continuous queries,
and receive result updates as data arrive and expire. We focus on applications
with real-time constraints, in which the user must receive each result update
within a given period after the update occurs. To handle fast data, the DSMS is
commonly placed on top of a cloud infrastructure. Because stream properties
such as arrival rates can fluctuate unpredictably, cloud resources must be
dynamically provisioned and scheduled accordingly to ensure real-time response.
It is essential for both existing systems and future developments to be able
to schedule resources dynamically according to the current workload, in order
to avoid wasting resources or failing to deliver correct results on time.
Motivated by this, we propose DRS, a novel dynamic resource scheduler for
cloud-based DSMSs. DRS overcomes three fundamental challenges: (a) how to model
the relationship between the provisioned resources and query response time;
(b) where to best place resources; and (c) how to measure system load with
minimal overhead. In particular, DRS includes an accurate performance model
based on the theory of Jackson open queueing networks and is capable of
handling arbitrary operator topologies,
possibly with loops, splits and joins. Extensive experiments with real data
confirm that DRS achieves real-time response with close to optimal resource
consumption. Comment: This is our latest version with certain modificatio
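DRS's performance model rests on Jackson open queueing networks. A minimal sketch of how such a model links provisioned processors to response time follows, assuming each operator behaves as an M/M/k queue with k processors. It solves the traffic equations for each operator's arrival rate, uses the Erlang C formula for the mean number of tuples at each operator, and divides the network-wide total by the external arrival rate (Little's law) to get the mean sojourn time. The greedy loop that adds one processor at a time to the operator giving the largest improvement is an illustrative assumption and may differ from DRS's actual placement algorithm; all names are hypothetical.

```python
import math


def arrival_rates(external, routing, iters=1000):
    """Solve the traffic equations lam_i = gamma_i + sum_j lam_j * P[j][i]
    by fixed-point iteration (converges for an open network)."""
    n = len(external)
    lam = list(external)
    for _ in range(iters):
        lam = [external[i] + sum(lam[j] * routing[j][i] for j in range(n))
               for i in range(n)]
    return lam


def mmk_in_system(lam, mu, k):
    """Mean number in an M/M/k queue (Erlang C); infinite if unstable."""
    a, rho = lam / mu, lam / (mu * k)
    if rho >= 1:
        return math.inf
    tail = a ** k / math.factorial(k) / (1 - rho)
    p_wait = tail / (sum(a ** n / math.factorial(n) for n in range(k)) + tail)
    return a + p_wait * rho / (1 - rho)


def mean_sojourn(lam, mu, k, external):
    # Little's law over the whole network: T = (sum of L_i) / (total external rate)
    total = sum(mmk_in_system(l, m, c) for l, m, c in zip(lam, mu, k))
    return total / sum(external)


def greedy_allocate(external, routing, mu, target, max_total):
    lam = arrival_rates(external, routing)
    k = [math.floor(l / m) + 1 for l, m in zip(lam, mu)]  # smallest stable counts
    while mean_sojourn(lam, mu, k, external) > target:
        if sum(k) >= max_total:
            return None  # target unreachable within the processor budget
        best = min(range(len(k)), key=lambda i: mean_sojourn(
            lam, mu, k[:i] + [k[i] + 1] + k[i + 1:], external))
        k[best] += 1
    return k


# Two-operator chain: all tuples enter operator 0 and then flow to operator 1.
ext, route, rates = [2.0, 0.0], [[0.0, 1.0], [0.0, 0.0]], [3.0, 3.0]
print(greedy_allocate(ext, route, rates, target=1.0, max_total=10))  # [2, 2]
```

Because the model handles any routing matrix, the same code covers loops, splits and joins by changing `routing`, although fixed-point iteration is the simplest rather than the fastest way to solve the traffic equations.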
Model-driven Scheduling for Distributed Stream Processing Systems
Distributed Stream Processing frameworks have become commonly used with the
growth of the Internet of Things (IoT). These frameworks are designed to adapt
to the dynamic input message rate by scaling in and out. Apache Storm, originally
developed by Twitter, is a widely used stream processing engine; others include
Flink and Spark Streaming. To run streaming applications successfully, one
needs to know the optimal resource requirement, as over-estimating resources
adds extra cost. We therefore need a strategy to determine the optimal resource
requirement for a given streaming application. In
this article, we propose a model-driven approach for scheduling streaming
applications that effectively utilizes a priori knowledge of the applications
to provide predictable scheduling behavior. Specifically, we use application
performance models to offer reliable estimates of the resource allocation
required. Further, this intuition also drives resource mapping, and helps
narrow the gap between the estimated and actual dataflow performance and resource utilization.
Together, this model-driven scheduling approach gives a predictable application
performance and resource utilization behavior for executing a given DSPS
application at a target input stream rate on distributed resources. Comment: 54 page
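The core idea, using application performance models to estimate the resources needed for a target input rate, can be sketched as follows. This assumes a linear model in which each task has a profiled peak rate per thread and a fixed selectivity (output messages per input message), and that VMs offer a fixed number of thread slots. The paper's performance models come from profiling and need not be linear; the function names, task names and numbers here are hypothetical.

```python
import math


def required_threads(topo_order, downstream, selectivity, peak_per_thread, input_rate):
    """Size each task of a streaming DAG for a target input rate.

    topo_order:      tasks in topological order; the first one is the source
    downstream:      dict task -> tasks that each receive its full output
    selectivity:     dict task -> output messages per input message
    peak_per_thread: dict task -> messages/sec one thread sustains (profiled)
    """
    rate = {t: 0.0 for t in topo_order}
    rate[topo_order[0]] = input_rate
    threads = {}
    for t in topo_order:
        threads[t] = math.ceil(rate[t] / peak_per_thread[t])
        for d in downstream.get(t, []):
            rate[d] += rate[t] * selectivity[t]  # propagate expected load
    return threads


def vms_needed(threads, slots_per_vm):
    # Naive packing: any thread can run in any free slot.
    return math.ceil(sum(threads.values()) / slots_per_vm)


plan = required_threads(
    ["source", "parse", "count"],
    {"source": ["parse"], "parse": ["count"]},
    {"source": 1.0, "parse": 2.0, "count": 1.0},
    {"source": 500, "parse": 100, "count": 250},
    input_rate=300,
)
print(plan)                 # {'source': 1, 'parse': 3, 'count': 3}
print(vms_needed(plan, 4))  # 2
```

Rounding up per task keeps each task below its profiled peak, which trades a little over-provisioning for predictable behavior at the target rate.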
SQPR: Stream Query Planning with Reuse
When users submit new queries to a distributed stream processing system (DSPS), a query planner must allocate physical resources, such as CPU cores, memory and network bandwidth, from a set of hosts to queries. Allocation decisions must provide the correct mix of resources required by queries, while achieving an efficient overall allocation so as to scale in the number of admitted queries. By exploiting overlap between queries and reusing partial results, a query planner can conserve resources but has to carry out more complex planning decisions. In this paper, we describe SQPR, a query planner that targets DSPSs in data centre environments with heterogeneous resources. SQPR models query admission, allocation and reuse as a single constrained optimisation problem and solves an approximate version to achieve scalability. It prevents individual resources from becoming bottlenecks by re-planning past allocation decisions and supports different allocation objectives. As our experimental evaluation against a state-of-the-art planner shows, SQPR makes efficient resource allocation decisions, even at high resource utilisation, with acceptable overheads.
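The benefit of reuse can be illustrated with a much simpler planner than SQPR. The sketch below, with hypothetical names, identifies operators by a signature string, charges CPU only for operators that are not already deployed, and places each new operator on the host with the most free cores (worst-fit), rejecting the query if any operator does not fit. SQPR instead solves an approximate constrained optimisation and can re-plan past decisions; neither is modelled here.

```python
class ReusePlanner:
    def __init__(self, host_cores):
        self.free = dict(host_cores)  # host -> free CPU cores
        self.deployed = {}            # operator signature -> host running it

    def admit(self, query):
        """query: list of (signature, cpu_cost). Returns {signature: host} or None."""
        trial, placement = dict(self.free), {}
        for sig, cost in query:
            if sig in self.deployed or sig in placement:
                continue  # reuse an existing partial result at no extra cost
            host = max(trial, key=trial.get)  # worst-fit: most free cores
            if trial[host] < cost:
                return None  # reject without changing any state
            trial[host] -= cost
            placement[sig] = host
        self.free = trial
        self.deployed.update(placement)
        return {sig: self.deployed[sig] for sig, _ in query}


planner = ReusePlanner({"h1": 4, "h2": 2})
print(planner.admit([("src:A", 1), ("filter:x>5", 2)]))
# {'src:A': 'h1', 'filter:x>5': 'h1'}
print(planner.admit([("src:A", 1), ("filter:x>5", 2), ("avg", 2)]))
# {'src:A': 'h1', 'filter:x>5': 'h1', 'avg': 'h2'}
print(planner.admit([("join", 3)]))  # None: no host has 3 free cores
print(planner.free)  # {'h1': 1, 'h2': 0}
```

The second query costs only 2 cores instead of 5 because its first two operators are shared, which is the resource saving that reuse-aware planning targets.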
Using Dedicated and Opportunistic Networks in Synergy for a Cost-effective Distributed Stream Processing Platform
This paper presents a case for exploiting the synergy of dedicated and
opportunistic network resources in a distributed hosting platform for data
stream processing applications. Our previous studies have demonstrated the
benefits of combining dedicated reliable resources with opportunistic resources
in the case of high-throughput computing applications, where timely allocation of
the processing units is the primary concern. Since distributed stream
processing applications demand a large volume of data transmission between the
processing sites at a consistent rate, adequate control over the network
resources is important here to assure a steady flow of processing. In this
paper, we propose a system model for the hybrid hosting platform where stream
processing servers installed at distributed sites are interconnected with a
combination of dedicated links and public Internet. Decentralized algorithms
have been developed for allocation of the two classes of network resources
among the competing tasks with an objective towards higher task throughput and
better utilization of expensive dedicated resources. Results from extensive
simulation study show that with proper management, systems exploiting the
synergy of dedicated and opportunistic resources yield considerably higher task
throughput and thus, higher return on investment over the systems solely using
expensive dedicated resources. Comment: 9 page
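A minimal sketch of splitting tasks between the two classes of network resources follows. It assumes a single dedicated link of known capacity, tasks with a bandwidth demand and a flag marking whether they need a steady rate, and a hypothetical average fraction `internet_share` of the demanded bandwidth that the public Internet delivers. The greedy rule (steady-rate tasks first, then larger demands) is centralized and illustrative; the paper's algorithms are decentralized and may use different criteria.

```python
def allocate_links(tasks, dedicated_capacity, internet_share):
    """tasks: list of (name, required_bw, needs_steady_rate).

    Returns (plan, expected_throughput, dedicated_utilization), where plan
    maps each task to "dedicated" or "internet".
    """
    remaining = dedicated_capacity
    plan, expected = {}, 0.0
    # Steady-rate tasks first, then larger demands first.
    for name, bw, steady in sorted(tasks, key=lambda t: (not t[2], -t[1])):
        if bw <= remaining:
            remaining -= bw
            plan[name] = "dedicated"
            expected += bw
        else:
            # Opportunistic path: assumed to deliver only part of the demand.
            plan[name] = "internet"
            expected += bw * internet_share
    utilization = (dedicated_capacity - remaining) / dedicated_capacity
    return plan, expected, utilization


demands = [("video", 6, True), ("logs", 3, False), ("sensors", 2, True)]
print(allocate_links(demands, dedicated_capacity=8, internet_share=0.5))
# ({'video': 'dedicated', 'sensors': 'dedicated', 'logs': 'internet'}, 9.5, 1.0)
```

Here the dedicated link is fully used and the public Internet still adds throughput for the remaining task, mirroring the hybrid platform's goal of higher task throughput and better use of expensive dedicated capacity.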