Optimal processor assignment for pipeline computations

Author: Choudhury Alok N.
Narahari Bhagirath
Nicol David M.
Simha Rahul
Publication date: 01/01/1991

The availability of large-scale multitasked parallel architectures introduces the following processor assignment problem for pipelined computations. Given a set of tasks and their precedence constraints, along with their experimentally determined individual response times for different processor sizes, find an assignment of processors to tasks. Two objectives are of interest: minimal response time given a throughput requirement, and maximal throughput given a response time requirement. These assignment problems differ considerably from the classical mapping problem in which several tasks share a processor; instead, it is assumed that a large number of processors are to be assigned to a relatively small number of tasks. Efficient assignment algorithms were developed for different classes of task structures. For a p-processor system and a series-parallel precedence graph with n constituent tasks, an O(np^2) algorithm is provided that finds the optimal assignment for the response time optimization problem; the assignment optimizing the constrained throughput can be found in O(np^2 log p) time. Special cases of linear, independent, and tree graphs are also considered.

NASA Technical Reports Server

Syracuse University Research Facility and Collaborative Environment
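
The response-time problem described in the abstract above can be illustrated for the linear (chain) special case with a simple dynamic program. This is a minimal sketch under stated assumptions, not the authors' algorithm: the function name `min_response_time`, the table layout `times[i][k-1]`, and the `max_stage_time` bound (standing in for the throughput requirement, since a pipeline's throughput is limited by its slowest stage) are all hypothetical. The triple loop gives O(np^2) work for n tasks and p processors.

```python
import math


def min_response_time(times, p, max_stage_time=math.inf):
    """Assign processors to a linear chain of tasks (illustrative sketch).

    times[i][k-1] is the measured response time of task i on k processors.
    Returns (total_response_time, allocation) or None if no assignment
    meets the per-stage bound with at most p processors.
    """
    n = len(times)
    inf = math.inf
    # best[i][q]: minimal summed response time of the first i tasks
    # when they use at most q processors in total.
    best = [[inf] * (p + 1) for _ in range(n + 1)]
    choice = [[0] * (p + 1) for _ in range(n + 1)]
    best[0] = [0.0] * (p + 1)

    for i in range(1, n + 1):
        row = times[i - 1]
        for q in range(1, p + 1):
            for k in range(1, min(q, len(row)) + 1):
                t = row[k - 1]
                if t > max_stage_time:
                    continue  # this stage would violate the throughput bound
                cand = best[i - 1][q - k] + t
                if cand < best[i][q]:
                    best[i][q] = cand
                    choice[i][q] = k

    if best[n][p] == inf:
        return None

    # Walk the choice table backwards to recover the allocation.
    alloc, q = [], p
    for i in range(n, 0, -1):
        k = choice[i][q]
        alloc.append(k)
        q -= k
    alloc.reverse()
    return best[n][p], alloc


if __name__ == "__main__":
    # Two tasks, response times measured on 1, 2 and 3 processors (made-up data).
    sample = [[8, 4, 3], [6, 3, 2]]
    print(min_response_time(sample, 4))
    print(min_response_time(sample, 5, max_stage_time=3))
```

The maximal-throughput variant could, under the same assumptions, be approached by searching over candidate values of `max_stage_time` and calling this routine for each, which is one plausible reading of where an extra logarithmic factor might come from.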

Partitioning SKA Dataflows for Optimal Graph Execution

Author: Bateni M.
Cameron K.
Cong J.
Fulkerson D. R.
Liou J.-C.
Marcus D.
Towsley D.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 19/05/2018

Optimizing data-intensive workflow execution is essential to many modern scientific projects such as the Square Kilometre Array (SKA), which will be the largest radio telescope in the world, collecting terabytes of data per second for the next few decades. At the core of the SKA Science Data Processor is the graph execution engine, scheduling tens of thousands of algorithmic components to ingest and transform millions of parallel data chunks in order to solve a series of large-scale inverse problems within the power budget. To tackle this challenge, we have developed the Data Activated Liu Graph Engine (DALiuGE) to manage data processing pipelines for several SKA pathfinder projects. In this paper, we discuss the DALiuGE graph scheduling sub-system. By extending previous studies on graph scheduling and partitioning, we lay the foundation on which we can develop polynomial time optimization methods that minimize both workflow execution time and resource footprint while satisfying resource constraints imposed by individual algorithms. We show preliminary results obtained from three radio astronomy data pipelines.

Comment: Accepted in HPDC ScienceCloud 2018 Workshop

arXiv.org e-Print Archive

Crossref

Inherently workload-balanced clustered microarchitecture

Author: Abella Ferrer Jaume
González Colás Antonio María
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005

The performance of clustered microarchitectures relies on steering schemes that try to find the best trade-off between workload balance and inter-cluster communication penalties. In previously proposed clustered processors, reducing communication penalties and balancing the workload are opposing targets, since improving one usually comes at the expense of the other. In this paper we propose a new clustered microarchitecture that can minimize communication penalties without compromising workload balance. The key idea is to arrange the clusters in a ring topology in such a way that the results of one cluster can be forwarded to the neighbor cluster with a very short latency. In this way, minimizing communication penalties is favored when the producer of a value and its consumer are placed in adjacent clusters, which also favors workload balance. The proposed microarchitecture is shown to outperform a state-of-the-art clustered processor. For instance, for an 8-cluster configuration and just one fully pipelined unidirectional bus, a 15% speedup is achieved on average for FP programs.

Peer Reviewed. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC
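
The ring idea in the abstract above can be illustrated with a toy steering rule. This is only a sketch of the general principle, not the steering scheme of the paper: the function `steer`, the instruction format, and the tie-breaking by current load are all assumptions made for illustration. A consumer is placed in the cluster immediately after one of its producers on a unidirectional ring, so a dependence chain naturally moves around the ring instead of piling up in one cluster.

```python
def steer(instructions, n_clusters):
    """Place instructions on a unidirectional ring of clusters (toy model).

    instructions: list of (name, [producer names]) in program order.
    Returns a dict mapping each instruction name to a cluster index.
    """
    placement = {}
    load = [0] * n_clusters  # number of instructions steered to each cluster

    for name, sources in instructions:
        producers = [placement[s] for s in sources if s in placement]
        if producers:
            # Candidate clusters: the ring neighbor right after each producer,
            # where the forwarded result is assumed to arrive quickly.
            candidates = {(c + 1) % n_clusters for c in producers}
        else:
            # No known producer: any cluster is equally close.
            candidates = set(range(n_clusters))
        # Among the candidates, prefer the least loaded (lowest index on ties).
        cluster = min(candidates, key=lambda c: (load[c], c))
        placement[name] = cluster
        load[cluster] += 1

    return placement


if __name__ == "__main__":
    chain = [("a", []), ("b", ["a"]), ("c", ["b"]), ("d", ["c"]), ("e", ["d"])]
    print(steer(chain, 4))
```

In this toy model a five-instruction dependence chain on four clusters visits clusters 0, 1, 2, 3 and then wraps back to 0, which shows how following the producer around the ring also spreads the work.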

Enabling Network Security in HPC Systems Using Heterogeneous CMPs

Author: Balakrishnan S.
Datar S.
Kumar R.
Liu F.
Publication venue: 'Wiley'
Publication date: 01/01/2014

This chapter explores the possibility of using heterogeneous chip multiprocessors (CMPs) for network and system security. It proposes an integer linear programming (ILP)-based methodology to mathematically analyze and provide heterogeneous CMP architectures and task distributions that can reduce the energy consumption of the system. It compares heterogeneous CMPs with their homogeneous counterparts and provides an experimental evaluation of both on network security systems. The heterogeneous NoC (network-on-chip)-based CMP architecture is discussed in detail. The chapter also discusses the design and advantages of the heterogeneous CMP-based network security processor. It summarizes the related work on heterogeneous processors in general and their benefits, and explores the related studies on CMP network security processors. Finally, the chapter indicates that heterogeneous CMPs reduce energy consumption dramatically compared to homogeneous CMPs. © 2014 John Wiley & Sons, Inc.

Crossref

Bilkent University Institutional Repository