
    SELFISHMIGRATE: A Scalable Algorithm for Non-clairvoyantly Scheduling Heterogeneous Processors

    We consider the classical problem of minimizing the total weighted flow-time for unrelated machines in the online \emph{non-clairvoyant} setting. In this problem, a set of jobs $J$ arrive over time to be scheduled on a set of $M$ machines. Each job $j$ has processing length $p_j$, weight $w_j$, and is processed at a rate of $\ell_{ij}$ when scheduled on machine $i$. The online scheduler knows the values of $w_j$ and $\ell_{ij}$ upon arrival of the job, but is not aware of the quantity $p_j$. We present the {\em first} online algorithm that is {\em scalable} ($(1+\epsilon)$-speed $O(\frac{1}{\epsilon^2})$-competitive for any constant $\epsilon > 0$) for the total weighted flow-time objective. No non-trivial results were known for this setting, except for the most basic case of identical machines. Our result resolves a major open problem in online scheduling theory. Moreover, we show that no job needs more than a logarithmic number of migrations. We further extend our result and obtain a scalable algorithm for the objective of minimizing total weighted flow-time plus energy cost on unrelated machines. The key algorithmic idea is to let jobs migrate selfishly until they converge to an equilibrium. Towards this end, we define a game in which each job's utility is closely tied to the instantaneous increase in the objective that the job is responsible for, and each machine declares a policy that assigns priorities and execution speeds to jobs based on when they migrate to it. This has a spirit similar to coordination mechanisms, which attempt to achieve near-optimal welfare in the presence of selfish agents (jobs). To the best of our knowledge, this is the first work that demonstrates the usefulness of ideas from coordination mechanisms and Nash equilibria for designing and analyzing online algorithms.
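    The equilibrium idea above can be illustrated with a toy best-response loop: each job repeatedly moves to the machine that minimizes a proxy for the instantaneous weighted flow-time it is responsible for, until no job wants to deviate. This is only a rough sketch under an assumed cost definition, not the paper's actual SELFISHMIGRATE policies or analysis.

```python
# Toy best-response dynamics loosely inspired by the idea of jobs migrating
# selfishly until an equilibrium assignment is reached. The cost proxy below
# is an illustrative assumption, not the paper's machine policies.

def selfish_migrate(weights, rates, max_rounds=100):
    """weights[j]: weight of job j; rates[i][j]: speed of job j on machine i."""
    n_machines, n_jobs = len(rates), len(weights)
    assign = [0] * n_jobs                     # start all jobs on machine 0

    def cost(j, i):
        # Proxy for the instantaneous increase in weighted flow-time that
        # job j would be "responsible for" if it ran on machine i.
        load = sum(weights[k] for k in range(n_jobs) if assign[k] == i and k != j)
        return weights[j] * (load + weights[j]) / rates[i][j]

    for _ in range(max_rounds):
        moved = False
        for j in range(n_jobs):
            best = min(range(n_machines), key=lambda i: cost(j, i))
            if best != assign[j]:
                assign[j] = best
                moved = True
        if not moved:                         # no job wants to move: equilibrium
            break
    return assign

print(selfish_migrate([1.0, 2.0, 1.0], [[1.0, 0.5, 1.0], [2.0, 2.0, 0.2]]))
```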

    Power Modeling for Heterogeneous Processors

    As power becomes an ever more important design consideration, there is a need for accurate power models at all stages of the design process. While power models are available for CPUs and GPUs, only simple models are available for heterogeneous processors. We present a micro-benchmark-based modeling technique that can be used for chip multiprocessors (CMPs) and accelerated processing units (APUs). We use our approach to model power on an Intel Xeon CPU and an AMD Fusion heterogeneous processor. The resulting error rate for the Xeon’s model is below 3%, and is only 7% for the Fusion. We also present a method to reduce the number of benchmarks required to create these models. Instead of running micro-benchmarks for every combination of factors (e.g. different operations or memory access patterns), we cluster similar micro-benchmarks to avoid unnecessary runs. We show that it is possible to eliminate as many as 93% of the compute micro-benchmarks while still producing power models with less than a 10% error rate.
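    A minimal sketch of the clustering idea described above: group micro-benchmarks by their performance-counter signatures, keep one representative per cluster, and fit a linear power model from those representatives. The counter names, synthetic data, and model form here are illustrative assumptions, not the paper's actual methodology.

```python
# Cluster similar micro-benchmarks and fit a linear power model on the
# representatives only, mimicking the reduction in benchmarks to run.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
counters = rng.random((200, 4))            # e.g. IPC, cache misses, FP ops, mem BW
measured_power = counters @ np.array([10.0, 5.0, 8.0, 20.0]) + 35.0   # watts

# Keep one representative micro-benchmark per cluster.
km = KMeans(n_clusters=15, n_init=10, random_state=0).fit(counters)
reps = [np.where(km.labels_ == c)[0][0] for c in range(15)]

model = LinearRegression().fit(counters[reps], measured_power[reps])
pred = model.predict(counters)
print("mean abs error (W):", np.mean(np.abs(pred - measured_power)))
```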

    Speculative Segmented Sum for Sparse Matrix-Vector Multiplication on Heterogeneous Processors

    Sparse matrix-vector multiplication (SpMV) is a central building block for scientific software and graph applications. Recently, heterogeneous processors composed of different types of cores have attracted much attention because of their flexible core configuration and high energy efficiency. In this paper, we propose a compressed sparse row (CSR) format based SpMV algorithm utilizing both types of cores in a CPU-GPU heterogeneous processor. We first speculatively execute segmented sum operations on the GPU part of a heterogeneous processor and generate possibly incorrect results. Then the CPU part of the same chip is triggered to re-arrange the predicted partial sums into a correct resulting vector. On three heterogeneous processors from Intel, AMD and NVIDIA, using 20 sparse matrices as a benchmark suite, the experimental results show that our method obtains significant performance improvement over the best existing CSR-based SpMV algorithms. The source code of this work is downloadable at https://github.com/bhSPARSE/Benchmark_SpMV_using_CSR. Comment: 22 pages, 8 figures, published at Parallel Computing (PARCO).
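    For orientation, here is a small, sequential sketch of the segmented-sum view of CSR SpMV that the speculative GPU/CPU scheme parallelizes; the speculation and CPU fix-up step are deliberately omitted, so this is only the reference formulation, not the paper's algorithm.

```python
# Segmented-sum formulation of CSR SpMV: multiply each nonzero by the matching
# x entry, then reduce each row's segment of the product array.
import numpy as np

def csr_spmv_segmented(row_ptr, col_idx, vals, x):
    products = vals * x[col_idx]                 # one product per nonzero
    # Segmented sum over row boundaries (trailing empty rows would need care).
    y = np.add.reduceat(products, row_ptr[:-1])
    y[np.diff(row_ptr) == 0] = 0.0               # empty rows contribute zero
    return y

# 3x3 example: [[1,0,2],[0,0,0],[0,3,4]] @ [1,1,1]
row_ptr = np.array([0, 2, 2, 4])
col_idx = np.array([0, 2, 1, 2])
vals = np.array([1.0, 2.0, 3.0, 4.0])
print(csr_spmv_segmented(row_ptr, col_idx, vals, np.ones(3)))  # [3. 0. 7.]
```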

    Scaling CUDA for Distributed Heterogeneous Processors

    The mainstream acceptance of heterogeneous computing and cloud computing is prompting a future of distributed heterogeneous systems. With current software development tools, programming such complex systems is difficult and requires extensive knowledge of network and processor architectures. Providing an abstraction of the underlying network, the message-passing interface (MPI) has been the standard tool for developing distributed applications in the high-performance community. The problem with MPI lies in its message-passing model, which is less expressive than the shared-memory model. Development of heterogeneous programming tools, such as OpenCL, has only begun recently. This thesis presents Phalanx, a framework that extends the virtual architecture of CUDA to distributed heterogeneous systems. Using MPI, Phalanx transparently handles intercommunication among distributed nodes. By using the shared-memory model, Phalanx simplifies the development of distributed applications without sacrificing the advantages of MPI. In one of the case studies, Phalanx achieves a 28x speedup compared with serial execution on a Core i7 processor.
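    The underlying MPI pattern that Phalanx automates can be sketched with mpi4py: the root rank partitions the input, each rank runs a local kernel, and results are gathered back. This is not the Phalanx API, only an illustration of the scatter/compute/gather structure it hides from the programmer.

```python
# Plain MPI data-parallel pattern (run with e.g. `mpiexec -n 4 python script.py`).
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Root partitions the input across ranks; each rank runs its "kernel" locally.
data = [list(range(i * 4, (i + 1) * 4)) for i in range(size)] if rank == 0 else None
chunk = comm.scatter(data, root=0)
partial = [x * x for x in chunk]          # stand-in for a CUDA/OpenCL kernel
result = comm.gather(partial, root=0)

if rank == 0:
    print([x for part in result for x in part])
```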

    Saber: window-based hybrid stream processing for heterogeneous architectures

    Modern servers have become heterogeneous, often combining multicore CPUs with many-core GPGPUs. Such heterogeneous architectures have the potential to improve the performance of data-intensive stream processing applications, but they are not supported by current relational stream processing engines. For an engine to exploit a heterogeneous architecture, it must execute streaming SQL queries with sufficient data-parallelism to fully utilise all available heterogeneous processors, and decide how to use each in the most effective way. It must do this while respecting the semantics of streaming SQL queries, in particular with regard to window handling. We describe SABER, a hybrid high-performance relational stream processing engine for CPUs and GPGPUs. SABER executes window-based streaming SQL queries in a data-parallel fashion using all available CPU and GPGPU cores. Instead of statically assigning query operators to heterogeneous processors, SABER employs a new adaptive heterogeneous lookahead scheduling strategy, which increases the share of queries executing on the processor that yields the highest performance. To hide data movement costs, SABER pipelines the transfer of stream data between different memory types and the CPU/GPGPU. Our experimental comparison against state-of-the-art engines shows that SABER increases processing throughput while maintaining low latency for a wide range of streaming SQL queries with small and large window sizes.
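    The adaptive scheduling idea can be illustrated with a toy router that keeps a running throughput estimate per processor and sends more batches to whichever processor has been performing better. This is a simplification for illustration under assumed update rules, not SABER's actual heterogeneous lookahead scheduling algorithm.

```python
# Route query batches in proportion to observed per-processor throughput.
import random

throughput = {"cpu": 1.0, "gpu": 1.0}     # running estimates (batches/sec)

def pick_processor():
    total = sum(throughput.values())
    r = random.uniform(0, total)
    return "cpu" if r < throughput["cpu"] else "gpu"

def record(proc, batches, seconds, alpha=0.2):
    # Exponentially weighted update of the throughput estimate.
    throughput[proc] = (1 - alpha) * throughput[proc] + alpha * (batches / seconds)

for _ in range(1000):
    proc = pick_processor()
    elapsed = 0.01 if proc == "gpu" else 0.03   # pretend the GPU is 3x faster here
    record(proc, 1, elapsed)

print({p: round(t, 1) for p, t in throughput.items()})
```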

    Scheduling Fork-Join Task Graphs to Heterogeneous Processors

    The scheduling of task graphs with communication delays has been extensively studied. Recently, new results for the common sub-case of fork-join shaped task graphs were published, including an EPTAS and polynomial algorithms for special cases. These new results modelled the target architecture as consisting of homogeneous processors. However, forms of heterogeneity are becoming more and more common in contemporary parallel systems, such as CPU--accelerator systems, with their two types of resources. In this work, we study the scheduling of fork-join task graphs with communication delays, which is representative of highly parallel workloads, onto heterogeneous systems of related processors. We present an EPTAS and some polynomial-time algorithms for special cases, such as equal processing costs or unlimited resources. Lastly, we briefly look at the above-described case of two resource types and its implications. It is interesting to note that all results here also apply to scheduling independent tasks with release times and deadlines. Comment: 14 pages.
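    To make the problem setting concrete, here is a small greedy heuristic for placing the parallel tasks of a fork-join graph onto related processors (each with its own speed), where a task placed away from the fork's processor only becomes ready after a communication delay. This is merely an illustration under assumed parameters; the paper's EPTAS and special-case algorithms are far more involved.

```python
# Greedy largest-first placement of the fork-join graph's parallel tasks.
def schedule_fork_join(work, speeds, comm_delay, home=0):
    """work[t]: cost of task t; speeds[p]: speed of related processor p."""
    finish = [0.0] * len(speeds)                 # per-processor finish time
    placement = {}
    for t in sorted(range(len(work)), key=lambda t: -work[t]):   # largest first
        def done(p):
            ready = 0.0 if p == home else comm_delay   # data from the fork arrives late
            return max(finish[p], ready) + work[t] / speeds[p]
        best = min(range(len(speeds)), key=done)
        finish[best] = done(best)
        placement[t] = best
    return placement, max(finish)                # makespan of the parallel phase

placement, makespan = schedule_fork_join(
    work=[4, 3, 3, 2, 1], speeds=[1.0, 2.0], comm_delay=1.0)
print(placement, makespan)
```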