15 research outputs found
Many-Core Scheduling of Data Parallel Applications Using SMT Solvers
Abstract—To program recently developed many-core systems-on-chip, two traditionally separate performance optimization problems must be solved together. The first is parallel scheduling on a shared-memory multi-core system; the second is the co-scheduling of network communication and processor computation, which arises because many-core systems are networks of multi-core clusters. In this paper, we demonstrate the applicability of modern constraint solvers to efficiently schedule parallel applications on many-cores and validate the results by running benchmarks on a real many-core platform. Index Terms—task graph, scheduling, multiprocessor, DMA
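The constraint formulation behind such an encoding can be illustrated without a solver. The sketch below is a hypothetical toy instance (task names, durations, and the two-core platform are invented, and the exhaustive search stands in for the solver): it enumerates mappings and start times for a three-task graph and keeps only assignments satisfying the precedence and non-overlap constraints an SMT solver would receive.

```python
from itertools import product

# Toy task graph: durations and precedence edges (hypothetical values).
dur = {"A": 3, "B": 4, "C": 2}
deps = [("A", "C"), ("B", "C")]  # C may start only after A and B finish
CORES, HORIZON = 2, 10

def feasible(proc, start):
    # Precedence: a task starts only after each predecessor completes.
    for u, v in deps:
        if start[v] < start[u] + dur[u]:
            return False
    # Mutual exclusion: tasks mapped to the same core must not overlap in time.
    for u, v in product(dur, dur):
        if u < v and proc[u] == proc[v]:
            if not (start[u] + dur[u] <= start[v] or start[v] + dur[v] <= start[u]):
                return False
    return True

best = None
for procs in product(range(CORES), repeat=len(dur)):
    proc = dict(zip(dur, procs))
    for starts in product(range(HORIZON), repeat=len(dur)):
        start = dict(zip(dur, starts))
        if feasible(proc, start):
            makespan = max(start[t] + dur[t] for t in dur)
            if best is None or makespan < best[0]:
                best = (makespan, proc, start)

print("optimal makespan:", best[0])  # A and B run in parallel, then C
```

An SMT solver explores the same space symbolically rather than by enumeration, which is what makes realistic problem sizes tractable.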
Localizing FRBs through VLBI with the Algonquin Radio Observatory 10 m Telescope
The Canadian Hydrogen Intensity Mapping Experiment (CHIME)/FRB experiment has detected thousands of fast radio bursts (FRBs) owing to its sensitivity and wide field of view; however, its low angular resolution prevents it from localizing events to their host galaxies. Very long baseline interferometry (VLBI), triggered by FRB detections from CHIME/FRB, will solve the challenge of localization for non-repeating events. Using a refurbished 10 m radio dish at the Algonquin Radio Observatory in Ontario, Canada, we developed a testbed for a VLBI experiment with a theoretical λ/D ≲ 30 mas. We provide an overview of the 10 m system and describe its refurbishment, the data acquisition, and a procedure for fringe fitting that simultaneously estimates the geometric delay used for localization and the dispersive delay from the ionosphere. Using single pulses from the Crab pulsar, we validate the system and localization procedure, and analyze the clock stability between sites, which is critical for coherently delay-referencing an FRB event. We find that a localization of ~200 mas is possible with the performance of the current (single-baseline) system. Furthermore, for sources with insufficient signal or restricted bandwidth to simultaneously measure both geometric and ionospheric delays, we show that the differential ionospheric contribution between the two sites must be measured to a precision of 1 × 10⁻⁸ pc cm⁻³ to provide a reasonable localization from a detection in the 400-800 MHz band. Finally, we show the detection of an FRB observed simultaneously by CHIME and the Algonquin 10 m telescope, the first non-repeating FRB detected on this long baseline. This project serves as a testbed for the forthcoming CHIME/FRB Outriggers project.
A fast radio burst localized at detection to a galactic disk using very long baseline interferometry
Fast radio bursts (FRBs) are millisecond-duration, luminous radio transients
of extragalactic origin. These events have been used to trace the baryonic
structure of the Universe using their dispersion measure (DM) assuming that the
contribution from host galaxies can be reliably estimated. However,
contributions from the immediate environment of an FRB may dominate the
observed DM, thus making redshift estimates challenging without a robust host
galaxy association. Furthermore, while at least one Galactic burst has been
associated with a magnetar, other localized FRBs argue against magnetars as the
sole progenitor model. Precise localization within the host galaxy can
discriminate between progenitor models, a major goal of the field. Until now,
localizations on this spatial scale have only been carried out in follow-up
observations of repeating sources. Here we demonstrate the localization of FRB
20210603A with very long baseline interferometry (VLBI) on two baselines, using
data collected only at the time of detection. We localize the burst to SDSS
J004105.82+211331.9, an edge-on galaxy at , and detect recent
star formation in the kiloparsec-scale vicinity of the burst. The edge-on
inclination of the host galaxy allows for a unique comparison between the line
of sight towards the FRB and lines of sight towards known Galactic pulsars. The
DM, Faraday rotation measure (RM), and scattering suggest a progenitor
coincident with the host galactic plane, strengthening the link between the
environment of FRB 20210603A and the disk of its host galaxy. Single-pulse VLBI
localizations of FRBs to within their host galaxies, following the one
presented here, will further constrain the origins and host environments of
one-off FRBs. Comment: 40 pages, 13 figures, submitted. Fixed typo in abstract.
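The dispersion measure (DM) central to this abstract has a simple observable consequence: a burst arrives later at lower frequencies by an amount proportional to DM. The sketch below applies the standard cold-plasma delay formula with an illustrative DM value (not from this paper):

```python
# Cold-plasma dispersion delay between two observing frequencies.
# Standard dispersion constant: ~4.149 ms GHz^2 / (pc cm^-3).
K_DM_MS = 4.149

def dispersion_delay_ms(dm_pc_cm3: float, f_lo_ghz: float, f_hi_ghz: float) -> float:
    """Extra arrival delay at f_lo relative to f_hi for a given DM."""
    return K_DM_MS * dm_pc_cm3 * (f_lo_ghz**-2 - f_hi_ghz**-2)

# Example: a hypothetical DM of 500 pc cm^-3 swept across the 400-800 MHz band.
delay = dispersion_delay_ms(500.0, 0.4, 0.8)
print(f"sweep across the band: {delay / 1000:.1f} s")
```

This is why the observed DM, if dominated by the burst's local environment, biases any redshift estimate derived from it.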
CHIME/FRB Discovery of 25 Repeating Fast Radio Burst Sources
We present the discovery of 25 new repeating fast radio burst (FRB) sources
found among CHIME/FRB events detected between 2019 September 30 and 2021 May 1.
The sources were found using a new clustering algorithm that looks for multiple
events co-located on the sky having similar dispersion measures (DMs). The new
repeaters have DMs ranging from 220 pc cm⁻³ to 1700 pc
cm⁻³, and include sources having exhibited as few as two bursts to as many
as twelve. We report a statistically significant difference in both the DM and
extragalactic DM (eDM) distributions between repeating and apparently
nonrepeating sources, with repeaters having lower mean DM and eDM, and we
discuss the implications. We find no clear bimodality between the repetition
rates of repeaters and upper limits on repetition from apparently nonrepeating
sources after correcting for sensitivity and exposure effects, although some
active repeating sources stand out as anomalous. We measure the repeater
fraction and find that it tends to an equilibrium of % over
our exposure thus far. We also report on 14 more sources which are promising
repeating FRB candidates and which merit follow-up observations for
confirmation. Comment: Submitted to ApJ. Comments are welcome and follow-up observations are
encouraged
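The clustering idea described above — group events that are close on the sky and similar in DM — can be sketched with a toy single-linkage pass. All event values and thresholds below are invented for illustration; the survey's actual algorithm is more involved:

```python
import math

# Toy events: (ra_deg, dec_deg, dm_pc_cm3) -- invented values.
events = [
    (10.00, 41.20, 350.0),
    (10.01, 41.21, 352.0),   # near event 0 in both sky position and DM
    (150.50, -5.00, 870.0),
    (150.49, -5.01, 869.0),  # near event 2
    (200.00, 30.00, 351.0),  # similar DM to event 0 but far away on the sky
]
MAX_SEP_DEG, MAX_DDM = 0.1, 5.0

def angular_sep_deg(e1, e2):
    # Small-angle approximation, with the cos(dec) factor on the RA offset.
    dra = (e1[0] - e2[0]) * math.cos(math.radians(e1[1]))
    ddec = e1[1] - e2[1]
    return math.hypot(dra, ddec)

# Single-linkage clustering via union-find: link pairs passing both cuts.
parent = list(range(len(events)))
def find(i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

for i in range(len(events)):
    for j in range(i + 1, len(events)):
        if (angular_sep_deg(events[i], events[j]) < MAX_SEP_DEG
                and abs(events[i][2] - events[j][2]) < MAX_DDM):
            parent[find(i)] = find(j)

clusters = {}
for i in range(len(events)):
    clusters.setdefault(find(i), []).append(i)
print(sorted(sorted(c) for c in clusters.values()))
```

Requiring agreement in both coordinates and DM is what separates a true repeater from chance coincidences of unrelated bursts along nearby sightlines.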
Sub-second periodicity in a fast radio burst
Fast radio bursts (FRBs) are millisecond-duration flashes of radio waves that
are visible at distances of billions of light-years. The nature of their
progenitors and their emission mechanism remain open astrophysical questions.
Here we report the detection of the multi-component FRB 20191221A and the
identification of a periodic separation of 216.8(1) ms between its components
with a significance of 6.5σ. The long (~3 s) duration and nine or more
components forming the pulse profile make this source an outlier in the FRB
population. Such short periodicity provides strong evidence for a neutron-star
origin of the event. Moreover, our detection favours emission arising from the
neutron-star magnetosphere, as opposed to emission regions located further away
from the star, as predicted by some models.Comment: Updated to conform to the accepted versio
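A periodic component separation like the one reported can be found by scanning candidate periods and asking how tightly the component arrival times cluster in phase. The sketch below uses synthetic arrival times (not the paper's data) and a simple phase-concentration statistic:

```python
import cmath

# Synthetic component arrival times (ms), spaced by a true period of 216.8 ms.
TRUE_P = 216.8
times = [n * TRUE_P for n in range(9)]

def phase_concentration(period_ms):
    """Resultant length of the arrival phases; 1.0 means perfectly periodic."""
    z = sum(cmath.exp(2j * cmath.pi * (t / period_ms)) for t in times)
    return abs(z) / len(times)

# Grid search over candidate periods around the expected range.
candidates = [200.0 + 0.1 * i for i in range(301)]  # 200.0 .. 230.0 ms
best_p = max(candidates, key=phase_concentration)
print(f"best period: {best_p:.1f} ms")
```

In practice the significance of the recovered period must be assessed against trials over many candidate periods and the chance alignment of noisy component times.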
Mapping and scheduling on multi-core processors using an SMT solver
In order to achieve performance gains in software, computers have evolved to multi-core and many-core platforms abounding with multiple processor cores. However, the problem of finding efficient ways to execute parallel software on these platforms is hard. With a large number of processor cores available, the software must orchestrate the communication and synchronization along with the execution of the code. Communication corresponds to the transport of data between different processors, which can either be handled transparently by the hardware or explicitly managed by the software. Synchronization is the requirement of properly selecting the start times of computations, e.g., the condition that software tasks begin execution only after all their dependencies are satisfied.

Models which represent the algorithms in a structured and formal way expose the available parallelism. Deployment of the software algorithms represented by such models needs a specification of which processor to execute the tasks on (mapping) and when to execute them (scheduling). Mapping and scheduling is a hard combinatorial problem with a huge design space containing an exponential number of solutions. In addition, the solutions are evaluated according to different costs that need to be optimized, such as memory consumption, execution time, static power consumption, resources used, etc. Such a problem with multiple costs is called a multi-criteria optimization problem. The solution to this problem is not a unique single solution, but a set of incomparable solutions called Pareto solutions. To tackle multi-criteria problems, special algorithms are needed which can approximate the Pareto solutions in the design space.

In this thesis we target a class of applications called streaming applications, which process a continuous stream of data. These applications typically apply similar computation to different data items. A common class of models called dataflow models conveniently expresses such applications. In this thesis, we deal with mapping and scheduling of dataflow applications on many-core platforms. We encode this problem in the form of logical constraints and present it to satisfiability modulo theories (SMT) solvers. SMT solvers solve the encoded problem by using a combination of search techniques and constraint propagation to find an assignment to the problem variables satisfying the given cost constraints.

In dataflow applications, the design space explodes with an increased number of tasks and processors. In this thesis, we tackle this problem by introducing symmetry reduction techniques and demonstrate that symmetry breaking accelerates search in SMT solvers, increasing the size of the problem that can be solved. Our design-space exploration algorithm approximates the Pareto front of the problem and produces solutions with different cost trade-offs. We validate these solutions by executing them on a real multi-core platform. Further, we extend the scheduling problem to many-core platforms, which are assembled from multi-core clusters connected by a network-on-chip. We provide a design flow which performs mapping of the applications on such platforms and automatic insertion of additional elements to model the communication. We demonstrate how communication with bounded memory can be performed by correctly modeling the flow control. We provide experimental results obtained on the 256-processor Kalray MPPA-256 platform.

Multi-core processors typically have a small amount of memory close to the processor. Generally, application data does not fit in the local memory. We study a class of parallel applications having a regular data access pattern, with a large amount of data to be processed by a uniform computation. Such applications are commonly found in image processing. The data must be brought from main memory to local memory, processed, and then the results written back to main memory, all in batches.
Selecting the proper granularity of the data that is brought into local memory is an optimization problem. We formalize this problem and provide a way to determine the optimal transfer granularity depending on the characteristics of the application and the hardware platform. Further, we provide a technique to analyze different data exchange mechanisms for the case where some data is shared between different computations.

Applications in modern embedded systems can start and stop dynamically. In order to execute all these applications efficiently and to optimize global costs such as power consumption, execution time, etc., the applications must be reconfigured at runtime. We present a predictable and composable (executing independently without affecting others) way of migrating tasks according to the reconfiguration decision.
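The transfer-granularity problem described in this abstract can be illustrated with a toy double-buffering cost model. All constants below are invented, and the model is a simplification of what a thesis-level formulation would capture (DMA setup cost per block, per-byte transfer and compute rates, overlap of transfer and compute):

```python
# Toy cost model for batched DMA + compute with double buffering.
#   transfer(g) = SETUP + T_BYTE * g   (per-block DMA time)
#   compute(g)  = C_BYTE * g           (per-block processing time)
# With double buffering, the steady-state cost per block is the slower of the
# two; the first transfer and the last computation are not overlapped.
SETUP, T_BYTE, C_BYTE = 100.0, 1.0, 2.0  # hypothetical platform constants
TOTAL = 4800                             # total bytes to process

def total_time(g):
    blocks = TOTAL // g
    transfer = SETUP + T_BYTE * g
    compute = C_BYTE * g
    return transfer + blocks * max(transfer, compute) + compute

# Search over block sizes that evenly divide the data.
sizes = [g for g in range(1, TOTAL + 1) if TOTAL % g == 0]
best_g = min(sizes, key=total_time)
print("optimal block size:", best_g)
```

The optimum sits near the crossover where per-block transfer and compute times balance: too-small blocks pay the DMA setup cost too often, while too-large blocks leave the DMA idle behind the computation.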
Mapping and scheduling on multi-core processors using SMT solvers
In order to achieve performance gains, computers have evolved to multi-core and many-core platforms abounding with multiple processor cores. However, the problem of finding efficient ways to execute parallel software on them is hard. With a large number of processor cores available, the software must orchestrate the communication and synchronization along with the code execution. Communication corresponds to the transport of data between different processors, handled transparently by the hardware or explicitly by the software.

Models which represent the algorithms in a structured and formal way expose the available parallelism. Deployment of the software algorithms represented by such models needs a specification of which processor to execute the tasks on (mapping) and when to execute them (scheduling). Mapping and scheduling is a hard combinatorial problem with an exponential number of solutions. In addition, the solutions have multiple costs that need to be optimized, such as memory consumption, execution time, resources used, etc. Such a problem with multiple costs is called a multi-criteria optimization problem. The solution to this problem is a set of incomparable solutions called Pareto solutions, which need special algorithms to approximate them.

We target a class of applications called streaming applications, which process a continuous stream of data. These applications apply similar computation to different data items and can be conveniently expressed by a class of models called dataflow models. We encode the mapping and scheduling problem in the form of logical constraints and present it to satisfiability modulo theories (SMT) solvers. SMT solvers solve the encoded problem by using a combination of search techniques and constraint propagation to find an assignment to the problem variables satisfying the given cost constraints.

In dataflow applications, the design space explodes with an increased number of tasks and processors. In this thesis, we tackle this problem by introducing symmetry reduction techniques and demonstrate that symmetry breaking accelerates search in SMT solvers, increasing the size of the problem that can be solved. Our design-space exploration algorithm approximates the Pareto front of the problem and produces solutions with different cost trade-offs. Further, we extend the scheduling problem to many-core platforms, which are a group of multi-core platforms connected by a network-on-chip. We provide a design flow which performs mapping of the applications on such platforms and automatic insertion of additional elements to model the communication using bounded memory. We provide experimental results obtained on the 256-processor Kalray and the Tilera TILE-64 platforms.

Multi-core processors typically have a small amount of memory close to the processor, generally insufficient for all application data to fit. We study a class of parallel applications having a regular data access pattern and a large amount of data to be processed by a uniform computation. The data must be brought from main memory to local memory, processed, and then the results written back to main memory, all in batches. Selecting the proper granularity of the data that is brought into local memory is an optimization problem. We formalize this problem and provide a way to determine the optimal transfer granularity depending on the characteristics of the application and the hardware platform.

In addition to the scheduling problems and local memory management, we study a part of the problem of runtime management of the applications. Applications in modern embedded systems can start and stop dynamically. In order to execute all the applications efficiently and to optimize global costs such as power consumption, execution time, etc., the applications must be reconfigured dynamically at runtime.
We present a predictable and composable (executing independently without affecting others) way of migrating tasks according to the reconfiguration decision.
Allocation and scheduling on multi-core processors with SMT solvers
In order to achieve performance gains, computers have evolved to multi-core and many-core platforms abounding with multiple processor cores. However, the problem of finding efficient ways to execute parallel software on them is hard. With a large number of processor cores available, the software must orchestrate the communication and synchronization along with the code execution.
Communication corresponds to the transport of data between different processors, handled transparently by the hardware or explicitly by the software.

Models which represent the algorithms in a structured and formal way expose the available parallelism. Deployment of the software algorithms represented by such models needs a specification of which processor to execute the tasks on (mapping) and when to execute them (scheduling). Mapping and scheduling is a hard combinatorial problem with an exponential number of solutions. In addition, the solutions have multiple costs that need to be optimized, such as memory consumption, execution time, resources used, etc. Such a problem with multiple costs is called a multi-criteria optimization problem. The solution to this problem is a set of incomparable solutions called Pareto solutions, which need special algorithms to approximate them.

We target a class of applications called streaming applications, which process a continuous stream of data. These applications apply similar computation to different data items and can be conveniently expressed by a class of models called dataflow models. We encode the mapping and scheduling problem in the form of logical constraints and present it to satisfiability modulo theories (SMT) solvers. SMT solvers solve the encoded problem by using a combination of search techniques and constraint propagation to find an assignment to the problem variables satisfying the given cost constraints. In dataflow applications, the design space explodes with an increased number of tasks and processors. In this thesis, we tackle this problem by introducing symmetry reduction techniques and demonstrate that symmetry breaking accelerates search in SMT solvers, increasing the size of the problem that can be solved. Our design-space exploration algorithm approximates the Pareto front of the problem and produces solutions with different cost trade-offs.

Further, we extend the scheduling problem to many-core platforms, which are a group of multi-core platforms connected by a network-on-chip. We provide a design flow which performs mapping of the applications on such platforms and automatic insertion of additional elements to model the communication using bounded memory. We provide experimental results obtained on the 256-processor Kalray and the Tilera TILE-64 platforms.

Multi-core processors typically have a small amount of memory close to the processor, generally insufficient for all application data to fit. We study a class of parallel applications having a regular data access pattern and a large amount of data to be processed by a uniform computation. The data must be brought from main memory to local memory, processed, and then the results written back to main memory, all in batches. Selecting the proper granularity of the data that is brought into local memory is an optimization problem. We formalize this problem and provide a way to determine the optimal transfer granularity depending on the characteristics of the application and the hardware platform.

In addition to the scheduling problems and local memory management, we study a part of the problem of runtime management of the applications. Applications in modern embedded systems can start and stop dynamically. In order to execute all the applications efficiently and to optimize global costs such as power consumption, execution time, etc., the applications must be reconfigured dynamically at runtime. We present a predictable and composable (executing independently without affecting others) way of migrating tasks according to the reconfiguration decision.
A Case Study into Predictable and Composable MPSoC Reconfiguration
Abstract—The number of applications running concurrently on an MPSoC is ever increasing. Moreover, the set of running applications is often unknown at design time. Part of the resource allocation decisions must therefore be deferred to run-time. This requires a run-time manager to optimize the resource usage of the system to preserve energy and allow as many applications as possible to use the resources simultaneously. An effective resource manager should therefore be able to reconfigure the resource assignment of running applications. To this end, a run-time task migration mechanism is needed. A user should, however, not notice the reconfiguration, as this would impact the perceived quality of the system. Hence, the reconfiguration mechanism should provide timing guarantees on its operation and it should not interfere with other applications running on the same system (i.e., it should be predictable and composable). In this paper, we present a practical implementation of such a predictable and composable MPSoC reconfiguration mechanism. We demonstrate the use of this mechanism on a JPEG decoder whose tasks are migrated at run-time while running on a state-of-the-art MPSoC platform. Index Terms—Task migration, real time systems, timing guarantees
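One key idea behind such a migration mechanism — moving a task only at a well-defined point where its state is small and explicit — can be sketched schematically. The toy task below is invented for illustration and is not the paper's implementation; it shows that a checkpoint/resume at an iteration boundary leaves the task's results unchanged:

```python
# Schematic task migration at iteration boundaries: the task's whole state is
# an explicit, serializable record, so it can be checkpointed on one "tile"
# and resumed on another with identical results.

def run_task(state, steps):
    """Run `steps` iterations of a toy streaming task; all state is explicit."""
    for _ in range(steps):
        state["acc"] = (state["acc"] + state["next_input"]) % 997
        state["next_input"] += 1
    return state

# Uninterrupted run on a single tile.
ref = run_task({"acc": 0, "next_input": 1}, 10)

# Same workload, but migrated after 4 iterations: checkpoint, move, resume.
state = run_task({"acc": 0, "next_input": 1}, 4)
checkpoint = dict(state)            # serialize at the iteration boundary
migrated = run_task(checkpoint, 6)  # resume on the destination tile

print(migrated == ref)  # migration is transparent to the task's result
```

The timing guarantees the paper targets come from bounding how long the checkpoint-and-move step can take and from ensuring it uses no resources shared with other applications.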