Motivated by emerging big streaming data processing paradigms (e.g., Twitter
Storm, Streaming MapReduce), we investigate the problem of scheduling graphs
over a large cluster of servers. Each graph is a job, where nodes represent
compute tasks and edges indicate data-flows between these compute tasks. Jobs
(graphs) arrive randomly over time, and upon completion, leave the system. When
a job arrives, the scheduler needs to partition the graph and distribute it
over the servers to satisfy load balancing and cost considerations.
Specifically, neighboring compute tasks in the graph that are mapped to
different servers incur load on the network; thus a mapping of the jobs among
the servers incurs a cost that is proportional to the number of "broken edges".
We propose a low complexity randomized scheduling algorithm that, without
service preemptions, stabilizes the system with graph arrivals/departures; more
importantly, it allows a smooth trade-off between minimizing average
partitioning cost and average queue lengths. Interestingly, to avoid service
preemptions, our approach does not rely on a Gibbs sampler; instead, we show
that the corresponding limiting invariant measure has an interpretation
stemming from a loss system.Comment: 14 page