Search CORE

2,571 research outputs found

MARACAS: a real-time multicore VCPU scheduling framework

Author: Cheng Zhuoqun
West Richard
Ye Ying
Zhang Jingyi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

This paper describes a multicore scheduling and load-balancing framework called MARACAS, to address shared cache and memory bus contention. It builds upon prior work centered around the concept of virtual CPU (VCPU) scheduling. Threads are associated with VCPUs that have periodically replenished time budgets. VCPUs are guaranteed to receive their periodic budgets even if they are migrated between cores. A load balancing algorithm ensures VCPUs are mapped to cores to fairly distribute surplus CPU cycles, after ensuring VCPU timing guarantees. MARACAS uses surplus cycles to throttle the execution of threads running on specific cores when memory contention exceeds a certain threshold. This enables threads on other cores to make better progress without interference from co-runners. Our scheduling framework features a novel memory-aware scheduling approach that uses performance counters to derive an average memory request latency. We show that latency-based memory throttling is more effective than rate-based memory access control in reducing bus contention. MARACAS also supports cache-aware scheduling and migration using page recoloring to improve performance isolation amongst VCPUs. Experiments show how MARACAS reduces multicore resource contention, leading to improved task progress.http://www.cs.bu.edu/fac/richwest/papers/rtss_2016.pdfAccepted manuscrip

Crossref

Boston University Institutional Repository (OpenBU)

Functional requirements document for the Earth Observing System Data and Information System (EOSDIS) Scientific Computing Facilities (SCF) of the NASA/MSFC Earth Science and Applications Division, 1992

Author: Botts Michael E.
Parker John V.
Phillips Ron J.
Wright Patrick D.
Publication venue
Publication date
Field of study

Five scientists at MSFC/ESAD have EOS SCF investigator status. Each SCF has unique tasks which require the establishment of a computing facility dedicated to accomplishing those tasks. A SCF Working Group was established at ESAD with the charter of defining the computing requirements of the individual SCFs and recommending options for meeting these requirements. The primary goal of the working group was to determine which computing needs can be satisfied using either shared resources or separate but compatible resources, and which needs require unique individual resources. The requirements investigated included CPU-intensive vector and scalar processing, visualization, data storage, connectivity, and I/O peripherals. A review of computer industry directions and a market survey of computing hardware provided information regarding important industry standards and candidate computing platforms. It was determined that the total SCF computing requirements might be most effectively met using a hierarchy consisting of shared and individual resources. This hierarchy is composed of five major system types: (1) a supercomputer class vector processor; (2) a high-end scalar multiprocessor workstation; (3) a file server; (4) a few medium- to high-end visualization workstations; and (5) several low- to medium-range personal graphics workstations. Specific recommendations for meeting the needs of each of these types are presented

NASA Technical Reports Server

Fixed-Priority Memory-Centric Scheduler for COTS-Based Multiprocessors

Author: Bertogna Marko
Caccamo Marco
Gracioli Giovani
Kloda Tomasz
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 32nd Euromicro Conference on Real-Time Systems (ECRTS 2020)
Publication date: 01/01/2020
Field of study

Memory-centric scheduling attempts to guarantee temporal predictability on commercial-off-the-shelf (COTS) multiprocessor systems to exploit their high performance for real-time applications. Several solutions proposed in the real-time literature have hardware requirements that are not easily satisfied by modern COTS platforms, like hardware support for strict memory partitioning or the presence of scratchpads. However, even without said hardware support, it is possible to design an efficient memory-centric scheduler. In this article, we design, implement, and analyze a memory-centric scheduler for deterministic memory management on COTS multiprocessor platforms without any hardware support. Our approach uses fixed-priority scheduling and proposes a global "memory preemption" scheme to boost real-time schedulability. The proposed scheduling protocol is implemented in the Jailhouse hypervisor and Erika real-time kernel. Measurements of the scheduler overhead demonstrate the applicability of the proposed approach, and schedulability experiments show a 20% gain in terms of schedulability when compared to contention-based and static fair-share approaches

Dagstuhl Research Online Publication Server

Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics

Author: Babich Ronald
Clark Michael A.
Joó Bálint
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010
Field of study

Graphics Processing Units (GPUs) are having a transformational effect on numerical lattice quantum chromodynamics (LQCD) calculations of importance in nuclear and particle physics. The QUDA library provides a package of mixed precision sparse matrix linear solvers for LQCD applications, supporting single GPUs based on NVIDIA's Compute Unified Device Architecture (CUDA). This library, interfaced to the QDP++/Chroma framework for LQCD calculations, is currently in production use on the "9g" cluster at the Jefferson Laboratory, enabling unprecedented price/performance for a range of problems in LQCD. Nevertheless, memory constraints on current GPU devices limit the problem sizes that can be tackled. In this contribution we describe the parallelization of the QUDA library onto multiple GPUs using MPI, including strategies for the overlapping of communication and computation. We report on both weak and strong scaling for up to 32 GPUs interconnected by InfiniBand, on which we sustain in excess of 4 Tflops.Comment: 11 pages, 7 figures, to appear in the Proceedings of Supercomputing 2010 (submitted April 12, 2010

arXiv.org e-Print Archive

CiteSeerX

Real-Time Task Migration for Dynamic Resource Management in Many-Core Systems

Author: Pourmohseni Behnaz
Smirnov Fedor
Wildermann Stefan
Publication venue: OASIcs - OpenAccess Series in Informatics. Workshop on Next Generation Real-Time Embedded Systems (NG-RES 2020)
Publication date: 01/01/2020
Field of study

Dagstuhl Research Online Publication Server

Jedule: A Tool for Visualizing Schedules of Parallel Applications

Author: Hoffmann Ralf
Hunold Sascha
Suter Frédéric
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 13/09/2010
Field of study

International audienceTask scheduling is one of the most prominent problems in the era of parallel computing. We find scheduling algorithms in every domain of computer science, e.g., mapping multiprocessor tasks to clusters, mapping jobs to grid resources, or mapping fine-grained tasks to cores of multicore processors. Many tools exist that help understand or debug an application by presenting visual representations of a certain program run, e.g., visualizations of MPI traces. However, often developers want to get a global and abstract view of their schedules first. In this paper we introduce Jedule, a tool dedicated to visualize schedules of parallel applications. We demonstrate the effectiveness of Jedule by showing how it helped analyzing problems in several case studies

HAL-IN2P3

A Parallel Mesh-Adaptive Framework for Hyperbolic Conservation Laws

Author: Berger
Friedel
Fryxell
Godunov
Grauer
Grauer
Groth
Hilbert
Jürgen Dreher
Keppens
Kurganov
Lax
MacNeice
Nessyahu
Powell
Rainer Grauer
Roe
Steiner
Toro
Tóth
Woodward
Ziegler
Zumbusch
Zumbusch
Publication venue: 'Elsevier BV'
Publication date: 01/02/2006
Field of study

We report on the development of a computational framework for the parallel, mesh-adaptive solution of systems of hyperbolic conservation laws like the time-dependent Euler equations in compressible gas dynamics or Magneto-Hydrodynamics (MHD) and similar models in plasma physics. Local mesh refinement is realized by the recursive bisection of grid blocks along each spatial dimension, implemented numerical schemes include standard finite-differences as well as shock-capturing central schemes, both in connection with Runge-Kutta type integrators. Parallel execution is achieved through a configurable hybrid of POSIX-multi-threading and MPI-distribution with dynamic load balancing. One- two- and three-dimensional test computations for the Euler equations have been carried out and show good parallel scaling behavior. The Racoon framework is currently used to study the formation of singularities in plasmas and fluids.Comment: late submissio

arXiv.org e-Print Archive

Crossref

CERN Document Server

An Initial Evaluation of the Tera Multithreaded Architecture and Programming System Using the C3I Parallel Benchmark Suite

Author: Burnett Sharon
Ellenbecker Marrq
Thornley John
Publication venue: IEEE Computer Society.
Publication date: 01/01/1998
Field of study

The Tera Multithreaded Architecture (MTA) is a radical new architecture intended to revolutionize high-performance computing in both the scientific and commercial marketplaces. Each processor supports 128 threads in hardware. Extremely fast thread switching is used to mask latency in a uniform-access memory system without caching. It is claimed that these hardware characteristics allow compilers to easily transform sequential programs into efficient multithreaded programs for the Tera MTA. In this paper, we attempt to provide an objective initial evaluation of the performance of the Tera multithreaded architecture and programming system for general-purpose applications. The basis of our investigation is two programs from the C3I Parallel Benchmark Suite (C3IPBS). Both these programs have previously been shown to have the potential for large-scale parallelization. We compare the performance of these programs on (i) a fast uniprocessor, (ii) two conventional shared-memory multiprocessors, and (iii) the first installed Tera MTA (at the San Diego Supercomputer Center). On these platforms, we compare the effectiveness of both automatic and manual parallelization

Crossref

Caltech Authors