Algorithms for Hierarchical and Semi-Partitioned Parallel Scheduling

Authors: Bonifaci Vincenzo, Dangelo Gianlorenzo, Marchetti-Spaccamela Alberto
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

We propose a model for scheduling jobs in a parallel machine setting that takes into account the cost of migrations by assuming that the processing time of a job may depend on the specific set of machines among which the job is migrated. For the makespan minimization objective, the model generalizes classical scheduling problems such as unrelated parallel machine scheduling, as well as novel ones such as semi-partitioned and clustered scheduling. In the case of a hierarchical family of machines, we derive a compact integer linear programming formulation of the problem and leverage its fractional relaxation to obtain a polynomial-time 2-approximation algorithm. Extensions that incorporate memory capacity constraints are also discussed.

Crossref

Archivio della ricerca- Università di Roma La Sapienza
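
As a rough illustration of the classical special case named in the abstract above (unrelated parallel machine scheduling, with no migration), the sketch below computes the makespan of an assignment and finds the optimum of a tiny instance by exhaustive search. It is not the paper's ILP formulation or its 2-approximation algorithm; the function names and the sample matrix `p` are hypothetical.

```python
from itertools import product


def makespan(p, assignment):
    """Return the makespan of an assignment on unrelated machines.

    p[j][i] is the processing time of job j on machine i (assumed input layout);
    assignment[j] is the index of the machine that runs job j.
    """
    num_machines = len(p[0])
    loads = [0] * num_machines
    for job, machine in enumerate(assignment):
        loads[machine] += p[job][machine]
    # The makespan is the load of the most heavily loaded machine.
    return max(loads)


def brute_force_optimum(p):
    """Try every assignment of jobs to machines (exponential; tiny instances only)."""
    num_machines = len(p[0])
    candidates = product(range(num_machines), repeat=len(p))
    best = min(candidates, key=lambda a: makespan(p, a))
    return best, makespan(p, best)


# Hypothetical instance: 3 jobs, 2 machines.
p = [[2, 5],
     [4, 1],
     [3, 3]]
best_assignment, best_value = brute_force_optimum(p)
```

Exhaustive search is only workable for a handful of jobs, which is why approximation algorithms such as the one the abstract describes matter in practice.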

Efficient Machine-Independent Programming of High-Performance Multiprocessors

Author: Tseng Chau-Wen
Publication date: 15/10/1998

Parallel computing is regarded by most computer scientists as the most likely approach for significantly improving computing power for scientists and engineers. Advances in programming languages and parallelizing compilers are making parallel computers easier to use by providing a high-level portable programming model that protects software investment. However, experience has shown that simply finding parallelism is not always sufficient for obtaining good performance from today's multiprocessors. The goal of this project is to develop the advanced compiler analysis of data and computation decompositions, thread placement, communication, synchronization, and memory system effects needed to take advantage of performance-critical elements in modern parallel architectures.

Digital Repository at the University of Maryland

High Performance Depthwise and Pointwise Convolutions on Mobile Devices

Authors: Lo Eric, Lu Baotong, Zhang Pengfei
Publication date: 03/01/2020

Lightweight convolutional neural networks (e.g., MobileNets) are specifically designed to carry out inference directly on mobile devices. Among the various lightweight models, depthwise convolution (DWConv) and pointwise convolution (PWConv) are the key operations. In this paper, we observe that existing implementations of DWConv and PWConv do not make good use of the ARM processors in mobile devices: they exhibit many cache misses under multi-core execution and poor data reuse at the register level. We propose techniques to re-optimize the implementations of DWConv and PWConv for the ARM architecture. Experimental results show that our implementation achieves speedups of up to 5.5x on DWConv and up to 2.1x on PWConv over TVM (Chen et al. 2018). Comment: 8 pages, Thirty-Fourth AAAI Conference on Artificial Intelligence

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
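
For readers unfamiliar with the two operations named in the abstract above, here is a minimal reference sketch in plain Python, assuming stride 1, no padding, and nested-list tensors laid out as [channel][row][column]. It only shows what DWConv and PWConv compute; it is not the paper's optimized ARM implementation, and the function names are hypothetical.

```python
def depthwise_conv(x, kernels):
    """Depthwise convolution: each channel is filtered by its own K x K kernel.

    x: [C][H][W] input, kernels: [C][K][K]; stride 1, no padding (assumptions).
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    k = len(kernels[0])
    out_h, out_w = height - k + 1, width - k + 1
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(channels)]
    for c in range(channels):
        for i in range(out_h):
            for j in range(out_w):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        acc += x[c][i + di][j + dj] * kernels[c][di][dj]
                out[c][i][j] = acc
    return out


def pointwise_conv(x, weights):
    """Pointwise (1 x 1) convolution: mixes channels independently at each pixel.

    x: [C_in][H][W] input, weights: [C_out][C_in].
    """
    c_in, height, width = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(len(weights))]
    for o, row in enumerate(weights):
        for i in range(height):
            for j in range(width):
                out[o][i][j] = sum(row[c] * x[c][i][j] for c in range(c_in))
    return out


# Hypothetical 2-channel 3x3 input with 2x2 all-ones depthwise kernels.
x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
     [[1, 1, 1], [1, 1, 1], [1, 1, 1]]]
dw = depthwise_conv(x, [[[1, 1], [1, 1]], [[1, 1], [1, 1]]])
pw = pointwise_conv(dw, [[1, 1], [1, -1]])
```

The nested loops make the data-reuse issue visible: each input element is read several times by neighbouring outputs, which is the kind of access pattern an optimized implementation would restructure.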

CRAUL: Compiler and Run-Time Integration for Adaptation under Load

Publication venue: 'Hindawi Limited'
Publication date: 01/01/1999

Crossref

Un ordonnanceur flexible pour machines multiprocesseurs hiérarchiques (A flexible scheduler for hierarchical multiprocessor machines)

Author: Thibault Samuel
Publication venue: HAL CCSD
Publication date: 06/04/2005

The evolution of multiprocessor machines toward increasingly hierarchical architectures means that, to get the most out of them, threads of execution and data must be distributed with extreme care so as to minimize non-local memory accesses. Current multithreading libraries provide very few features for expressing distribution directives at the application level, which forces programmers to perform this distribution explicitly according to the underlying architecture, and therefore in a non-portable way. In this article we present: (1) a model that lets the program dynamically express the structure of the computation; (2) a scheduler able to interpret this model in order to make judicious hierarchical placement decisions; (3) an implementation within the Marcel user-level thread library. An experiment was carried out with a scientific application running on a Bull NovaScale ccNUMA machine with 16 Intel Itanium II processors; the results show a 50% gain over a conventional scheduler and are comparable to those obtained with manual placement, which is not portable.

INRIA a CCSD electronic archive server

Oskar Bordeaux

A QoS Monitoring System for Dataflow Programs

Authors: Frénot Stéphane, Marquet Kevin, Morel Lionel, Selva Manuel
Publication venue: HAL CCSD
Publication date: 15/01/2013

With the generalization of multi-core processors, dataflow programming is regaining strong interest, especially in the context of compute-intensive multimedia applications such as video decoding. However, most studies focus on static approaches to the compilation and placement problems. We advocate dynamic adaptation of dataflow applications. In this paper, we take the first step towards this goal, namely a monitoring mechanism for observing quality-of-service properties of programs at run-time. We propose a language extension for expressing simple QoS properties over dataflow programs, together with a run-time mechanism for observing the events relevant to establishing QoS. We show that such mechanisms have a limited impact on overall application performance.

INRIA a CCSD electronic archive server