762 research outputs found

Performance Analysis of Modified SRPT in Multiple-Processor Multitask Scheduling

Author: Li Wenxin
Publication date: 15/01/2021

In this paper we study the multiple-processor multitask scheduling problem in both deterministic and stochastic models. We consider and analyze the Modified Shortest Remaining Processing Time (M-SRPT) scheduling algorithm, a simple modification of SRPT that always schedules jobs according to SRPT whenever possible, while processing tasks in an arbitrary order. The M-SRPT algorithm is proved to achieve a competitive ratio of $\Theta(\log \alpha + \beta)$ for minimizing response time, where $\alpha$ denotes the ratio between the maximum and minimum job workload and $\beta$ represents the ratio between the maximum non-preemptive task workload and the minimum job workload. In addition, the competitive ratio achieved is shown to be optimal (up to a constant factor) when there is a constant number of machines. We further consider the problem under Poisson arrivals and a general workload distribution (i.e., an $M/GI/N$ system), and show that M-SRPT achieves asymptotically optimal mean response time as the traffic intensity $\rho$ approaches $1$, if the job size distribution has finite support. Beyond finite job workload, the asymptotic optimality of M-SRPT also holds for infinite job size distributions under certain probabilistic assumptions, for example, an $M/M/N$ system with finite task workload.
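The scheduling rule described in this abstract can be illustrated with a minimal simulation sketch. It is not the paper's algorithm or analysis: the function name `msrpt_response_times`, the input format, and the assumptions that a job occupies at most one processor at a time and that every job has at least one task are all illustrative choices. Whenever a processor is free, the waiting job with the least remaining work gets its next task, and a started task always runs to completion.

```python
import heapq

def msrpt_response_times(jobs, num_procs):
    """Simulate an SRPT-style rule with non-preemptive tasks (illustrative sketch).

    jobs: list of (arrival_time, [task_sizes]); each job needs at least one task.
    Assumption (not from the source): a job runs on at most one processor at a time.
    Returns the response time (completion minus arrival) of each job.
    """
    n = len(jobs)
    pending = [list(tasks) for _, tasks in jobs]    # tasks not yet started
    remaining = [sum(tasks) for _, tasks in jobs]   # work not yet started
    arrivals = sorted((a, j) for j, (a, _) in enumerate(jobs))
    finish = [None] * n
    ready = []      # heap of (remaining work, job id) for waiting jobs
    running = []    # heap of (task end time, job id)
    free = num_procs
    i = 0
    while i < n or ready or running:
        # Jump to the next event: an arrival or a task completion.
        next_arrival = arrivals[i][0] if i < n else float("inf")
        next_end = running[0][0] if running else float("inf")
        now = min(next_arrival, next_end)
        # Tasks finishing now release their processor.
        while running and running[0][0] <= now:
            _, j = heapq.heappop(running)
            free += 1
            if pending[j]:
                heapq.heappush(ready, (remaining[j], j))
            else:
                finish[j] = now
        # Jobs arriving now join the waiting set.
        while i < n and arrivals[i][0] <= now:
            _, j = arrivals[i]
            heapq.heappush(ready, (remaining[j], j))
            i += 1
        # Free processors take the jobs with the least remaining work.
        while ready and free > 0:
            _, j = heapq.heappop(ready)
            size = pending[j].pop()   # task order within a job is arbitrary
            remaining[j] -= size
            heapq.heappush(running, (now + size, j))
            free -= 1
    return [finish[j] - jobs[j][0] for j in range(n)]
```

For example, on one processor with jobs `[(0, [2, 2]), (1, [1])]`, the short job arriving at time 1 must wait for the running task to finish at time 2, then runs ahead of the first job's second task, giving response times `[5, 2]`.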

arXiv.org e-Print Archive

Towards a field configurable non-homogeneous multiprocessors architecture

Author: De Giusti Marisa Raquel
Jaquenod Guillermo A.
Villagarcía Wanza Horacio Alfredo
Publication date: 01/01/2001

Standard microprocessors are generally designed to deal efficiently with different types of tasks; their general purpose architecture can lead to misuse of resources, creating a large gap between the computational efficiency of microprocessors and custom silicon. The ever increasing complexity of Field Programmable Logic devices is driving the industry to look for innovative System on a Chip solutions; using programmable logic, the whole design can be tuned to the application requirements. In this paper, under the acronym MPOC (Multiprocessors On a Chip), we propose some applicable ideas on multiprocessing embedded configurable architectures, targeting cost-effective System on a Programmable Chip (SOPC) designs. Using heterogeneous medium or low performance soft-core processors instead of a single high performance processor, and some standardized communication schemes to link these multiple processors, the “best” core can be chosen for each subtask using a computational efficiency criterion, thereby improving silicon usage. System-level design is also considered: models of tasks and links, parameterized soft-core processors, and the use of a standard HDL for system description can lead to automatic generation of the final design.

Centro de Servicios en Gestión de Información

NPM-BUNDLE: Non-Preemptive Multitask Scheduling for Jobs with BUNDLE-Based Thread-Level Scheduling

Author: Fisher Nathan
Tessler Corey
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 31st Euromicro Conference on Real-Time Systems (ECRTS 2019)
Publication date: 01/01/2019

The BUNDLE and BUNDLEP scheduling algorithms are cache-cognizant thread-level scheduling algorithms and associated worst case execution time and cache overhead (WCETO) techniques for hard real-time multi-threaded tasks. The BUNDLE-based approaches utilize the inter-thread cache benefit to reduce WCETO values for jobs. Currently, the BUNDLE-based approaches are limited to scheduling a single task. This work aims to expand the applicability of BUNDLE-based scheduling to multiple task multi-threaded task sets. BUNDLE-based scheduling leverages knowledge of potential cache conflicts to selectively preempt one thread in favor of another from the same job. This thread-level preemption is a requirement for the run-time behavior and WCETO calculation to receive the benefit of BUNDLE-based approaches. This work proposes scheduling BUNDLE-based jobs non-preemptively according to the earliest deadline first (EDF) policy. Jobs are forbidden from preempting one another, while threads within a job are allowed to preempt other threads. An accompanying schedulability test is provided, named Threads Per Job (TPJ). TPJ is a novel schedulability test whose input is a task set specification that may be transformed (under certain restrictions) by dividing threads among tasks in an effort to find a feasible task set. Enhanced by the flexibility to transform task sets and taking advantage of the inter-thread cache benefit, the evaluation shows TPJ scheduling task sets that fully preemptive EDF cannot.
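The job-level policy named in this abstract, non-preemptive EDF, can be sketched on a single processor as below. This is a generic illustration only: it does not model BUNDLE's thread-level preemption or cache analysis, and it is not the TPJ schedulability test. The function name `np_edf_schedule` and the `(release, execution_time, deadline)` job format are assumptions made for the example.

```python
import heapq

def np_edf_schedule(jobs):
    """Run jobs non-preemptively in earliest-deadline-first order (sketch).

    jobs: list of (release, execution_time, absolute_deadline).
    Returns (order, all_met): the job indices in execution order and whether
    every job finished by its deadline.
    """
    by_release = sorted(range(len(jobs)), key=lambda j: jobs[j][0])
    ready = []          # heap of (deadline, job index) for released jobs
    order = []
    all_met = True
    now = 0
    i = 0
    while i < len(by_release) or ready:
        # If nothing is ready, idle until the next release.
        if not ready:
            now = max(now, jobs[by_release[i]][0])
        # Admit every job released by the current time.
        while i < len(by_release) and jobs[by_release[i]][0] <= now:
            j = by_release[i]
            heapq.heappush(ready, (jobs[j][2], j))
            i += 1
        # Pick the earliest deadline and run the job to completion.
        deadline, j = heapq.heappop(ready)
        now += jobs[j][1]
        order.append(j)
        if now > deadline:
            all_met = False
    return order, all_met
```

With `[(0, 4, 10), (1, 1, 3)]` the long job starts first and cannot be preempted, so the short job finishes at time 5 and misses its deadline of 3; the call returns `([0, 1], False)`. This blocking effect is the usual cost of forbidding job-level preemption.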

Dagstuhl Research Online Publication Server

A scheduling theory framework for GPU tasks efficient execution

Author: A Allahverdi
AJ Lázaro-Muñoz
DS Palmer
E Taillard
J Framinan
J Zhong
JW Tukey
M Nawaz
M Pinedo
MR Garey
R Graham
R Ruiz
Publication date: 01/01/2018

Concurrent execution of tasks in GPUs can reduce the computation time of a workload by overlapping data transfer and execution commands. However, it is difficult to implement an efficient run-time scheduler that minimizes the workload makespan, as many execution orderings should be evaluated. In this paper, we employ scheduling theory to build a model that takes into account the device capabilities, workload characteristics, constraints and objective functions. In our model, GPU task scheduling is reformulated as a flow shop scheduling problem, which allows us to apply and compare well-known methods already developed in the operations research field. In addition, we develop a new heuristic, specifically focused on executing GPU commands, that achieves better scheduling results than previous techniques. Finally, a comprehensive evaluation, showing the suitability and robustness of this new approach, is conducted on three different NVIDIA architectures (Kepler, Maxwell and Pascal). Project TIN2016-0920R, Universidad de Málaga (Campus de Excelencia Internacional Andalucía Tech), and the NVIDIA Corporation donation program.
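To make the flow shop formulation concrete, here is a minimal sketch of the textbook permutation flow shop makespan recurrence, with an exhaustive search over orderings for very small task sets. It is not the paper's model or heuristic: the three stages (for instance host-to-device copy, kernel, device-to-host copy), the function names and the sample durations are assumptions for illustration.

```python
from itertools import permutations

def makespan(order, times):
    """Makespan of a permutation flow shop schedule (illustrative sketch).

    times[j][k] is the duration of task j on stage k; every task visits the
    stages in the same sequence, and each stage handles one task at a time.
    """
    stages = len(times[order[0]])
    done = [0] * stages   # completion time of the latest task on each stage
    for j in order:
        prev = 0          # completion time of task j on the previous stage
        for k in range(stages):
            # A task starts on stage k once the stage is free and its own
            # previous stage has finished.
            prev = max(prev, done[k]) + times[j][k]
            done[k] = prev
    return done[-1]

def best_order(times):
    """Exhaustively find an ordering with minimum makespan (small inputs only)."""
    return min(permutations(range(len(times))), key=lambda o: makespan(o, times))

# Hypothetical per-stage durations for three tasks.
times = [(3, 2, 1), (1, 4, 2), (2, 1, 3)]
print(makespan((0, 1, 2), times))           # 14
print(makespan(best_order(times), times))   # 10
```

Exhaustive search grows factorially with the number of tasks, which is why heuristics are the practical choice for larger workloads.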

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional Universidad de Málaga

The ROSACE Case Study: From Simulink Specification to Multi/Many-Core Execution

Author: Gratia Romain
Noulard Eric
Pagetti Claire
Saussié David
Siron Pierre
Publication date: 01/01/2014

This paper presents a complete case study, named ROSACE (Research Open-Source Avionics and Control Engineering), that goes from a baseline flight controller, developed in MATLAB/SIMULINK, to a multi-periodic controller executing on a multi/many-core target. The interactions between control and computer engineers are highlighted during the development steps, in particular by investigating several multi-periodic configurations. We deduced ways to improve the discussion between engineers in order to ease the integration on the target. The whole case study is made available to the community under an open-source license.

Open Archive Toulouse Archive Ouverte

PolyPublie

The exploitation of parallelism on shared memory multiprocessors

Author: Stoker Michael Allan
Publication venue: Newcastle University
Publication date: 01/01/1990

PhD Thesis. With the arrival of many general purpose shared memory multiple processor (multiprocessor) computers into the commercial arena during the mid-1980s, a rift has opened between the raw processing power offered by the emerging hardware and the relative inability of its operating software to effectively deliver this power to potential users. This rift stems from the fact that, currently, no computational model with the capability to elegantly express parallel activity is mature enough to be universally accepted and used as the basis for programming languages to exploit the parallelism that multiprocessors offer. To add to this, there is a lack of software tools to assist programmers in the processes of designing and debugging parallel programs. Although much research has been done in the field of programming languages, no undisputed candidate for the most appropriate language for programming shared memory multiprocessors has yet been found. This thesis examines why this state of affairs has arisen and proposes programming language constructs, together with a programming methodology and environment, to close the ever widening hardware to software gap. The novel programming constructs described in this thesis are intended for use in imperative languages even though they make use of the synchronisation inherent in the dataflow model by using the semantics of single assignment when operating on shared data, so giving rise to the term shared values. As there are several distinct parallel programming paradigms, matching flavours of shared value are developed to permit the concise expression of these paradigms. Sponsor: The Science and Engineering Research Council

Newcastle University eTheses

Efficient Exploration of Bus-Based System-on-Chip Architectures

Author: Ha Soonhoi
Kim Sungchan
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/07/2006

Separation between computation and communication in system design allows system designers to explore the communication architecture independently after component selection and mapping decisions are made. In this paper, we present an iterative two-step exploration methodology for bus-based on-chip communication architecture for multitask applications. We assume that the memory traces from the processing components are given. The proposed methodology uses a static performance estimation technique extended for multitask applications to reduce the design space quickly and drastically, and applies a trace-driven simulation to the reduced set of design candidates for accurate performance estimation. For the case where local memory traffic as well as shared memory traffic is involved in bus contention, memory allocation is considered as an important axis of the design space in our technique. Experimental results show that the proposed methodology achieves significant performance gain by optimizing on-chip communication only, up to almost 100% compared with an initial single shared bus architecture, in two real-life examples: a four-channel digital video recorder and an equalizer for an OFDM DVB-T receiver. This work was supported by the National Research Laboratory Program under Grant M1-0104-00-0015 and the IT Leading Research and Development Support Project funded by Korean MIC.

SNU Open Repository and Archive

HAL-ASOS accelerator model: evolutive elasticity by design

Author: Cabral Jorge
Cardoso Paulo
Pinto Paulo
Silva Vítor Alberto Teixeira
Tavares Adriano
Publication venue: MDPI AG
Publication date: 01/08/2021

To address the integration of software threads and hardware accelerators into the Linux Operating System (OS) programming models, an accelerator architecture is proposed, based on micro-programmable hardware system calls, which fully export these resources into the Linux OS user-space through a design-specific virtual file system. The proposed HAL-ASOS accelerator model is split into a user-defined Hardware Task and a parameterizable Hardware Kernel with three differentiated transfer channels, aiming to explore distinct BUS technology interfaces and promote the accelerator to a first-class computing unit. This paper focuses on the Hardware Kernel and mainly its microcode control unit, which will leverage the elasticity to naturally evolve with Linux OS through key differentiating capabilities of field programmable gate arrays (FPGAs) when compared to the state of the art. To comply with the evolutive nature of Linux OS, or any Hardware Task incremental features, the proposed model generates page-faults signaling runtime errors that are handled at the kernel level as part of the virtual file system runtime. To evaluate the accelerator model’s programmability and its performance, a client-side application based on the AES 128-bit algorithm was implemented. Experiments demonstrate a flexible design approach in terms of hardware and software reconfiguration and significant performance increases consistent with rising processing demands or clock design frequencies. This work has been supported by FCT-Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020

Multidisciplinary Digital Publishing Institute

Universidade do Minho: RepositoriUM

Directory of Open Access Journals