15 research outputs found

    Many-Core Scheduling of Data Parallel Applications Using SMT Solvers

    Full text link
    Abstract—To program recently developed many-core systems-on-chip, two traditionally separate performance optimization problems have to be solved together. The first is parallel scheduling on a shared-memory multi-core system. The second is the co-scheduling of network communication and processor computation, because many-core systems are networks of multi-core clusters. In this paper, we demonstrate the applicability of modern constraint solvers to efficiently schedule parallel applications on many-cores and validate the results by running benchmarks on a real many-core platform. Index Terms—task graph, scheduling, multiprocessor, DMA
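    The abstract above describes encoding the scheduling problem for a constraint (SMT) solver. As a rough illustration of that idea, the hypothetical sketch below (not the authors' encoding; the task graph, durations, and two-processor platform are invented) uses the Z3 Python API to assign each task a processor and a start time under precedence and non-overlap constraints, and asks the solver to minimize the makespan.

        # Minimal SMT scheduling sketch using Z3 (pip install z3-solver).
        # Illustrative only: the task set, durations and platform are made up.
        from z3 import Int, Optimize, Implies, Or, sat

        tasks = {"A": 2, "B": 3, "C": 2, "D": 4}                  # task -> duration
        edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]  # precedence edges
        num_procs = 2

        opt = Optimize()
        start = {t: Int("start_" + t) for t in tasks}
        proc = {t: Int("proc_" + t) for t in tasks}
        makespan = Int("makespan")

        for t, d in tasks.items():
            opt.add(start[t] >= 0, proc[t] >= 0, proc[t] < num_procs)
            opt.add(makespan >= start[t] + d)

        # A task may start only after all of its predecessors have finished.
        for u, v in edges:
            opt.add(start[v] >= start[u] + tasks[u])

        # Tasks mapped to the same processor must not overlap in time.
        names = list(tasks)
        for i in range(len(names)):
            for j in range(i + 1, len(names)):
                u, v = names[i], names[j]
                opt.add(Implies(proc[u] == proc[v],
                                Or(start[u] + tasks[u] <= start[v],
                                   start[v] + tasks[v] <= start[u])))

        opt.minimize(makespan)
        if opt.check() == sat:
            m = opt.model()
            for t in tasks:
                print(t, "-> processor", m[proc[t]], "start", m[start[t]])
            print("makespan:", m[makespan])

    Co-scheduling of DMA transfers, as in the paper, would add communication tasks and network resources to the same kind of encoding; they are omitted here for brevity.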

    Localizing FRBs through VLBI with the Algonquin Radio Observatory 10 m Telescope

    Get PDF
    The Canadian Hydrogen Intensity Mapping Experiment (CHIME)/FRB experiment has detected thousands of fast radio bursts (FRBs) due to its sensitivity and wide field of view; however, its low angular resolution prevents it from localizing events to their host galaxies. Very long baseline interferometry (VLBI), triggered by FRB detections from CHIME/FRB, will solve the challenge of localization for non-repeating events. Using a refurbished 10 m radio dish at the Algonquin Radio Observatory located in Ontario, Canada, we developed a testbed for a VLBI experiment with a theoretical λ/D ≲ 30 mas. We provide an overview of the 10 m system and describe its refurbishment, the data acquisition, and a procedure for fringe fitting that simultaneously estimates the geometric delay used for localization and the dispersive delay from the ionosphere. Using single pulses from the Crab pulsar, we validate the system and localization procedure, and analyze the clock stability between sites, which is critical for coherently delay-referencing an FRB event. We find a localization of ∌200 mas is possible with the performance of the current system (single baseline). Furthermore, for sources with insufficient signal or restricted bandwidth to simultaneously measure both geometric and ionospheric delays, we show that the differential ionospheric contribution between the two sites must be measured to a precision of 1 × 10⁻⁞ pc cm⁻³ to provide a reasonable localization from a detection in the 400-800 MHz band. Finally, we show the detection of an FRB observed simultaneously with CHIME and the Algonquin 10 m telescope, the first non-repeating FRB detected on this long baseline. This project serves as a testbed for the forthcoming CHIME/FRB Outriggers project.
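    As a rough numerical companion to the figures quoted above, the sketch below evaluates the diffraction-limited resolution λ/D at 600 MHz and the differential dispersive delay implied by a DM uncertainty of 1 × 10⁻⁞ pc cm⁻³, using the standard dispersion constant (≈4.149 ms GHzÂČ per pc cm⁻³). The ~3,000 km baseline length is an assumed round number for a CHIME-Algonquin-scale baseline, not a value taken from the paper.

        # Back-of-envelope check of the scales quoted in the abstract.
        # The baseline length is an assumption; the dispersion constant is standard.
        import math

        C = 2.998e8              # speed of light, m/s
        K_DM_MS = 4.149          # dispersion constant, ms GHz^2 per (pc cm^-3)

        freq_hz = 600e6          # mid-band of the 400-800 MHz band
        baseline_m = 3.0e6       # ~3,000 km baseline (assumed round number)

        # Diffraction-limited resolution lambda / D, converted to milliarcseconds.
        lam_m = C / freq_hz
        res_mas = math.degrees(lam_m / baseline_m) * 3.6e6
        print("lambda/D ~ %.0f mas" % res_mas)          # a few tens of mas

        # Differential dispersive delay between the band edges caused by a DM
        # uncertainty of 1e-8 pc cm^-3, in nanoseconds.
        d_dm = 1e-8
        delay_ns = K_DM_MS * d_dm * (1 / 0.4 ** 2 - 1 / 0.8 ** 2) * 1e6
        print("delay error across the band: %.2f ns" % delay_ns)   # ~0.2 ns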

    A fast radio burst localized at detection to a galactic disk using very long baseline interferometry

    Full text link
    Fast radio bursts (FRBs) are millisecond-duration, luminous radio transients of extragalactic origin. These events have been used to trace the baryonic structure of the Universe using their dispersion measure (DM), assuming that the contribution from host galaxies can be reliably estimated. However, contributions from the immediate environment of an FRB may dominate the observed DM, thus making redshift estimates challenging without a robust host galaxy association. Furthermore, while at least one Galactic burst has been associated with a magnetar, other localized FRBs argue against magnetars as the sole progenitor model. Precise localization within the host galaxy can discriminate between progenitor models, a major goal of the field. Until now, localizations on this spatial scale have only been carried out in follow-up observations of repeating sources. Here we demonstrate the localization of FRB 20210603A with very long baseline interferometry (VLBI) on two baselines, using data collected only at the time of detection. We localize the burst to SDSS J004105.82+211331.9, an edge-on galaxy at z ≈ 0.177, and detect recent star formation in the kiloparsec-scale vicinity of the burst. The edge-on inclination of the host galaxy allows for a unique comparison between the line of sight towards the FRB and lines of sight towards known Galactic pulsars. The DM, Faraday rotation measure (RM), and scattering suggest a progenitor coincident with the host galactic plane, strengthening the link between the environment of FRB 20210603A and the disk of its host galaxy. Single-pulse VLBI localizations of FRBs to within their host galaxies, following the one presented here, will further constrain the origins and host environments of one-off FRBs. Comment: 40 pages, 13 figures, submitted. Fixed typo in abstract.
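    To put milliarcsecond localization and the "kiloparsec-scale vicinity" on the same footing, the sketch below computes the projected physical scale per milliarcsecond at z ≈ 0.177 under an assumed flat ΛCDM cosmology (H0 = 70 km/s/Mpc, Ωm = 0.3, neither value taken from the paper).

        # Rough angular-diameter-distance calculation at z = 0.177.
        # Cosmological parameters are assumptions, not values from the paper.
        import math

        H0 = 70.0          # km/s/Mpc (assumption)
        OM, OL = 0.3, 0.7  # flat LambdaCDM (assumption)
        C_KM_S = 299792.458
        z = 0.177

        # Comoving distance by trapezoidal integration of c / H(z).
        n = 1000
        dz = z / n
        integral = 0.0
        for i in range(n + 1):
            zi = i * dz
            e = math.sqrt(OM * (1 + zi) ** 3 + OL)
            w = 0.5 if i in (0, n) else 1.0
            integral += w / e * dz
        d_c = C_KM_S / H0 * integral        # comoving distance, Mpc
        d_a = d_c / (1 + z)                 # angular-diameter distance, Mpc

        mas_in_rad = math.radians(1 / 3.6e6)
        pc_per_mas = d_a * 1e6 * mas_in_rad
        print("D_A ~ %.0f Mpc" % d_a)
        print("~%.1f pc per mas, ~%.1f kpc per arcsecond" % (pc_per_mas, pc_per_mas))

    Under these assumptions, one milliarcsecond corresponds to roughly 3 pc in projection, so milliarcsecond-level VLBI localization comfortably resolves kiloparsec-scale structure in the host.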

    CHIME/FRB Discovery of 25 Repeating Fast Radio Burst Sources

    Full text link
    We present the discovery of 25 new repeating fast radio burst (FRB) sources found among CHIME/FRB events detected between 2019 September 30 and 2021 May 1. The sources were found using a new clustering algorithm that looks for multiple events co-located on the sky having similar dispersion measures (DMs). The new repeaters have DMs ranging from ∌220 pc cm⁻³ to ∌1700 pc cm⁻³, and include sources having exhibited as few as two bursts to as many as twelve. We report a statistically significant difference in both the DM and extragalactic DM (eDM) distributions between repeating and apparently nonrepeating sources, with repeaters having lower mean DM and eDM, and we discuss the implications. We find no clear bimodality between the repetition rates of repeaters and upper limits on repetition from apparently nonrepeating sources after correcting for sensitivity and exposure effects, although some active repeating sources stand out as anomalous. We measure the repeater fraction and find that it tends to an equilibrium of 2.6^{+2.9}_{-2.6}% over our exposure thus far. We also report on 14 more sources which are promising repeating FRB candidates and which merit follow-up observations for confirmation. Comment: Submitted to ApJ. Comments are welcome and follow-up observations are encouraged.
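    The clustering idea described above, grouping events that are close both on the sky and in DM, can be illustrated with a generic density-based clustering over (RA, Dec, DM). This is not the CHIME/FRB algorithm; the tolerance scales and example bursts below are invented.

        # Illustrative grouping of bursts by sky position and DM.
        # Not the CHIME/FRB clustering algorithm; scales and data are invented.
        import numpy as np
        from sklearn.cluster import DBSCAN

        # Columns: RA (deg), Dec (deg), DM (pc cm^-3) for a handful of fake bursts.
        # Spherical geometry (cos Dec) is ignored here for brevity.
        bursts = np.array([
            [123.40, 45.10, 410.2],
            [123.42, 45.11, 410.9],   # close to the first -> likely same source
            [123.41, 45.09, 411.5],
            [200.00, -3.00, 870.0],   # isolated event
        ])

        # Scale each axis by a tolerance so Euclidean distance is meaningful:
        # ~0.1 deg on the sky, ~5 pc cm^-3 in DM (both invented numbers).
        scales = np.array([0.1, 0.1, 5.0])
        labels = DBSCAN(eps=1.0, min_samples=2).fit_predict(bursts / scales)
        print(labels)   # e.g. [0 0 0 -1]: first three group, last is unclustered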

    Sub-second periodicity in a fast radio burst

    Full text link
    Fast radio bursts (FRBs) are millisecond-duration flashes of radio waves that are visible at distances of billions of light-years. The nature of their progenitors and their emission mechanism remain open astrophysical questions. Here we report the detection of the multi-component FRB 20191221A and the identification of a periodic separation of 216.8(1) ms between its components with a significance of 6.5σ. The long (~3 s) duration and nine or more components forming the pulse profile make this source an outlier in the FRB population. Such short periodicity provides strong evidence for a neutron-star origin of the event. Moreover, our detection favours emission arising from the neutron-star magnetosphere, as opposed to emission regions located further away from the star, as predicted by some models. Comment: Updated to conform to the accepted version.
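    A simple way to see how a periodic component separation can be quantified is to fold the component arrival times on trial periods and score how strongly the folded phases cluster (a Rayleigh-style statistic). The component times below are invented values spaced roughly 217 ms apart, and this is not the significance analysis used in the paper.

        # Toy periodicity check: fold component times on trial periods and
        # score phase clustering with a Rayleigh-style statistic.
        # Component times are invented; this is not the paper's method.
        import math

        times_ms = [0.0, 216.5, 434.0, 650.9, 867.8, 1085.0]  # fake components

        def rayleigh_power(times, period):
            phases = [2 * math.pi * (t % period) / period for t in times]
            c = sum(math.cos(p) for p in phases)
            s = sum(math.sin(p) for p in phases)
            return (c * c + s * s) / len(times)

        # Scan trial periods between 150 and 300 ms in 0.1 ms steps.
        best = max((rayleigh_power(times_ms, p / 10.0), p / 10.0)
                   for p in range(1500, 3000))
        print("best period ~ %.1f ms (power %.2f)" % (best[1], best[0]))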

    Mapping and scheduling on multi-core processors using an SMT solver

    No full text
    In order to achieve performance gains in software, computers have evolved into multi-core and many-core platforms abounding with multiple processor cores. However, the problem of finding efficient ways to execute parallel software on these platforms is hard. With a large number of processor cores available, the software must orchestrate communication and synchronization along with the execution of the code. Communication corresponds to the transport of data between different processors, which can either be handled transparently by the hardware or explicitly managed by the software. Synchronization concerns the proper selection of the start times of computations, e.g., the requirement that a software task begins execution only after all of its dependencies are satisfied.
    Models which represent the algorithms in a structured and formal way expose the available parallelism. Deployment of the software algorithms represented by such models needs a specification of which processor to execute the tasks on (mapping) and when to execute them (scheduling). Mapping and scheduling is a hard combinatorial problem with a huge design space containing an exponential number of solutions. In addition, the solutions are evaluated according to different costs that need to be optimized, such as memory consumption, execution time, static power consumption, resources used, etc. Such a problem with multiple costs is called a multi-criteria optimization problem. Its solution is not a single unique solution, but a set of incomparable solutions called Pareto solutions. To tackle multi-criteria problems, special algorithms are needed which can approximate the Pareto solutions in the design space.
    In this thesis we target a class of applications called streaming applications, which process a continuous stream of data. These applications typically apply similar computations to different data items. A common class of models called dataflow models conveniently expresses such applications. We deal with mapping and scheduling of dataflow applications on many-core platforms. We encode this problem in the form of logical constraints and present it to satisfiability modulo theories (SMT) solvers. SMT solvers solve the encoded problem by using a combination of search techniques and constraint propagation to find an assignment to the problem variables satisfying the given cost constraints.
    In dataflow applications, the design space explodes with an increased number of tasks and processors. We tackle this problem by introducing symmetry reduction techniques and demonstrate that symmetry breaking accelerates search in SMT solvers, increasing the size of the problem that can be solved. Our design-space exploration algorithm approximates the Pareto front of the problem and produces solutions with different cost trade-offs. We validate these solutions by executing them on a real multi-core platform. Further, we extend the scheduling problem to many-core platforms, which are assembled from multi-core clusters connected by a network-on-chip. We provide a design flow which performs mapping of the applications on such platforms and automatic insertion of additional elements to model the communication. We demonstrate how communication with bounded memory can be performed by correctly modeling the flow control. We provide experimental results obtained on the 256-processor Kalray MPPA-256 platform.
    Multi-core processors typically have a small amount of memory close to the processor. Generally, application data does not fit in this local memory. We study a class of parallel applications having a regular data access pattern, with a large amount of data to be processed by a uniform computation. Such applications are commonly found in image processing. The data must be brought from main memory to local memory, processed, and then the results written back to main memory, all in batches. Selecting the proper granularity of the data that is brought into local memory is an optimization problem. We formalize this problem and provide a way to determine the optimal transfer granularity depending on the characteristics of the application and the hardware platform. Further, we provide a technique to analyze different data exchange mechanisms for the case where some data is shared between different computations.
    Applications in modern embedded systems can start and stop dynamically. In order to execute all these applications efficiently and to optimize global costs such as power consumption and execution time, the applications must be reconfigured at runtime. We present a predictable and composable (executing independently without affecting others) way of migrating tasks according to the reconfiguration decision.
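    The symmetry-breaking idea mentioned in the abstract can be illustrated on the same kind of processor-assignment encoding: identical processors can be permuted without changing a schedule, so one may restrict the solver to canonical assignments. The constraints below are a generic sketch of such symmetry breaking in Z3, not the exact encoding used in the thesis; the task and processor counts are invented.

        # Generic symmetry-breaking sketch for identical processors (Z3).
        # Not the thesis's exact constraints; task/processor counts are invented.
        from z3 import Int, Solver, And, If, sat

        num_tasks, num_procs = 5, 3
        proc = [Int("proc_%d" % i) for i in range(num_tasks)]

        s = Solver()
        s.add([And(p >= 0, p < num_procs) for p in proc])

        # Canonical numbering: task 0 uses processor 0, and each later task may
        # only "open" a processor index at most one larger than the largest
        # index used so far, removing permutations of identical processors.
        s.add(proc[0] == 0)
        running_max = proc[0]
        for i in range(1, num_tasks):
            s.add(proc[i] <= running_max + 1)
            running_max = If(proc[i] > running_max, proc[i], running_max)

        if s.check() == sat:
            print(s.model())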

    Mapping and scheduling on multi-core processors using SMT solvers

    No full text
    In order to achieve performance gains, computers have evolved into multi-core and many-core platforms abounding with multiple processor cores. However, the problem of finding efficient ways to execute parallel software on them is hard. With a large number of processor cores available, the software must orchestrate communication and synchronization along with the code execution. Communication corresponds to the transport of data between different processors, handled transparently by the hardware or explicitly by the software. Models which represent the algorithms in a structured and formal way expose the available parallelism. Deployment of the software algorithms represented by such models needs a specification of which processor to execute the tasks on (mapping) and when to execute them (scheduling). Mapping and scheduling is a hard combinatorial problem with an exponential number of solutions. In addition, the solutions have multiple costs that need to be optimized, such as memory consumption, execution time, resources used, etc. Such a problem with multiple costs is called a multi-criteria optimization problem. Its solution is a set of incomparable solutions called Pareto solutions, which need special algorithms to approximate them. We target a class of applications called streaming applications, which process a continuous stream of data. These applications apply similar computations to different data items and can be conveniently expressed by a class of models called dataflow models. We encode the mapping and scheduling problem in the form of logical constraints and present it to satisfiability modulo theories (SMT) solvers. SMT solvers solve the encoded problem by using a combination of search techniques and constraint propagation to find an assignment to the problem variables satisfying the given cost constraints. In dataflow applications, the design space explodes with an increased number of tasks and processors. In this thesis, we tackle this problem by introducing symmetry reduction techniques and demonstrate that symmetry breaking accelerates search in SMT solvers, increasing the size of the problem that can be solved. Our design-space exploration algorithm approximates the Pareto front of the problem and produces solutions with different cost trade-offs. Further, we extend the scheduling problem to many-core platforms, which are groups of multi-core platforms connected by a network-on-chip. We provide a design flow which performs mapping of the applications on such platforms and automatic insertion of additional elements to model the communication using bounded memory. We provide experimental results obtained on the 256-processor Kalray and the Tilera TILE-64 platforms.
    Multi-core processors typically have a small amount of memory close to the processor, generally insufficient for all application data to fit. We study a class of parallel applications having a regular data access pattern and a large amount of data to be processed by a uniform computation. The data must be brought from main memory to local memory, processed, and then the results written back to main memory, all in batches. Selecting the proper granularity of the data that is brought into local memory is an optimization problem. We formalize this problem and provide a way to determine the optimal transfer granularity depending on the characteristics of the application and the hardware platform. In addition to the scheduling problems and local memory management, we study a part of the problem of runtime management of the applications. Applications in modern embedded systems can start and stop dynamically. In order to execute all the applications efficiently and to optimize global costs such as power consumption and execution time, the applications must be reconfigured dynamically at runtime. We present a predictable and composable (executing independently without affecting others) way of migrating tasks according to the reconfiguration decision.
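    The transfer-granularity question raised in the abstract (how much data to move per transfer into local memory) can be illustrated with a simple double-buffering cost model: with a fixed per-transfer overhead and per-byte transfer and compute costs, each block costs roughly the slower of transfer and compute, and one can scan block sizes under a local-memory budget. All parameters below are invented, not measurements from the Kalray or Tilera platforms.

        # Toy double-buffering model for choosing a DMA block size.
        # All parameters are invented; nothing here is measured on real hardware.
        TOTAL_BYTES = 1 << 20        # data to process
        LOCAL_MEM = 64 * 1024        # local memory budget (two buffers must fit)
        DMA_SETUP_US = 5.0           # fixed cost per transfer
        DMA_US_PER_KB = 0.8          # transfer cost per KiB
        COMPUTE_US_PER_KB = 1.2      # compute cost per KiB

        def total_time_us(block_bytes):
            blocks = -(-TOTAL_BYTES // block_bytes)          # ceiling division
            kb = block_bytes / 1024
            transfer = DMA_SETUP_US + DMA_US_PER_KB * kb     # per block
            compute = COMPUTE_US_PER_KB * kb                 # per block
            # With double buffering, transfer and compute overlap, so each block
            # costs roughly the slower of the two (plus one pipeline-fill transfer).
            return blocks * max(transfer, compute) + transfer

        candidates = [1 << k for k in range(8, 16) if 2 * (1 << k) <= LOCAL_MEM]
        best = min(candidates, key=total_time_us)
        print("best block size:", best, "bytes,",
              round(total_time_us(best), 1), "us total")

    Small blocks pay the fixed setup cost too often, while very large blocks stop hiding the transfer behind computation, which is why an intermediate granularity wins in this model.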

    Allocation and scheduling on multi-core processors with SMT solvers

    No full text
    In order to achieve performance gains, computers have evolved into multi-core and many-core platforms abounding with multiple processor cores. However, the problem of finding efficient ways to execute parallel software on them is hard. With a large number of processor cores available, the software must orchestrate communication and synchronization along with the code execution. Communication corresponds to the transport of data between different processors, handled transparently by the hardware or explicitly by the software. Models which represent the algorithms in a structured and formal way expose the available parallelism. Deployment of the software algorithms represented by such models needs a specification of which processor to execute the tasks on (mapping) and when to execute them (scheduling). Mapping and scheduling is a hard combinatorial problem with an exponential number of solutions. In addition, the solutions have multiple costs that need to be optimized, such as memory consumption, execution time, resources used, etc. Such a problem with multiple costs is called a multi-criteria optimization problem. Its solution is a set of incomparable solutions called Pareto solutions, which need special algorithms to approximate them. We target a class of applications called streaming applications, which process a continuous stream of data. These applications apply similar computations to different data items and can be conveniently expressed by a class of models called dataflow models. We encode the mapping and scheduling problem in the form of logical constraints and present it to satisfiability modulo theories (SMT) solvers. SMT solvers solve the encoded problem by using a combination of search techniques and constraint propagation to find an assignment to the problem variables satisfying the given cost constraints. In dataflow applications, the design space explodes with an increased number of tasks and processors. In this thesis, we tackle this problem by introducing symmetry reduction techniques and demonstrate that symmetry breaking accelerates search in SMT solvers, increasing the size of the problem that can be solved. Our design-space exploration algorithm approximates the Pareto front of the problem and produces solutions with different cost trade-offs. Further, we extend the scheduling problem to many-core platforms, which are groups of multi-core platforms connected by a network-on-chip. We provide a design flow which performs mapping of the applications on such platforms and automatic insertion of additional elements to model the communication using bounded memory. We provide experimental results obtained on the 256-processor Kalray and the Tilera TILE-64 platforms.
    Multi-core processors typically have a small amount of memory close to the processor, generally insufficient for all application data to fit. We study a class of parallel applications having a regular data access pattern and a large amount of data to be processed by a uniform computation. The data must be brought from main memory to local memory, processed, and then the results written back to main memory, all in batches. Selecting the proper granularity of the data that is brought into local memory is an optimization problem. We formalize this problem and provide a way to determine the optimal transfer granularity depending on the characteristics of the application and the hardware platform. In addition to the scheduling problems and local memory management, we study a part of the problem of runtime management of the applications. Applications in modern embedded systems can start and stop dynamically. In order to execute all the applications efficiently and to optimize global costs such as power consumption and execution time, the applications must be reconfigured dynamically at runtime. We present a predictable and composable (executing independently without affecting others) way of migrating tasks according to the reconfiguration decision.
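    The design-space exploration described in the abstract can be sketched as a loop that repeatedly asks the solver whether a solution exists under given cost bounds and keeps the non-dominated cost points. The tiny problem below (independent tasks, trading the number of processors used against the makespan) and the sweep strategy are illustrative assumptions, not the thesis algorithm.

        # Sketch of Pareto-front approximation by repeated bounded SMT queries.
        # Tiny made-up problem: map 4 independent tasks to up to 3 processors and
        # trade off processors used against makespan. Not the thesis algorithm.
        from z3 import Int, Solver, If, Sum, And, sat

        durations = [2, 2, 3, 3]
        max_procs = 3

        def feasible(proc_bound, makespan_bound):
            s = Solver()
            proc = [Int("p%d" % i) for i in range(len(durations))]
            s.add([And(p >= 0, p < proc_bound) for p in proc])
            # With independent tasks the makespan is the largest per-processor load.
            for k in range(proc_bound):
                load = Sum([If(proc[i] == k, durations[i], 0)
                            for i in range(len(durations))])
                s.add(load <= makespan_bound)
            return s.check() == sat

        points = []
        for procs in range(1, max_procs + 1):
            # Smallest feasible makespan for this processor budget.
            best = next(t for t in range(1, sum(durations) + 1)
                        if feasible(procs, t))
            points.append((procs, best))

        # Keep only non-dominated (processors, makespan) pairs.
        pareto = [p for p in points
                  if not any(q[0] <= p[0] and q[1] < p[1] for q in points)]
        print(pareto)   # e.g. [(1, 10), (2, 5), (3, 4)]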

    A Case Study into Predictable and Composable MPSoC Reconfiguration

    No full text
    Abstract—The number of applications running concurrently on an MPSoC is ever increasing. Moreover, the set of running applications is often unknown at design-time. Part of the resource allocation decisions must therefore be deferred to run-time. This requires a run-time manager to optimize the resource usage of the system to preserve energy and allow as many applications as possible to use the resources simultaneously. An effective resource manager should therefore be able to reconfigure the resource assignment of running applications. To this end, a run-time task migration mechanism is needed. A user should however not notice the reconfiguration, as this would impact the perceived quality of the system. Hence, the reconfiguration mechanism should provide timing guarantees on its operation and it should not interfere with other applications running on the same system (i.e., it should be predictable and composable). In this paper, we present a practical implementation of such a predictable and composable MPSoC reconfiguration mechanism. We demonstrate the use of this mechanism on a JPEG decoder whose tasks are migrated at run-time while running on a state-of-the-art MPSoC platform. Index Terms—Task migration, real-time systems, timing guarantees
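    The migration mechanism described above moves a task only at well-defined points so that the operation stays predictable and does not disturb other applications. The toy sketch below captures just that idea, checkpointing and resuming a task at an iteration boundary; it is only a conceptual illustration, not the mechanism implemented in the paper.

        # Toy illustration of migrating a task at an iteration boundary, so the
        # move happens at a well-defined, bounded point in the task's execution.
        # Conceptual sketch only; not the paper's implementation.
        def run_task(state, on_tile, migrate_to=None, migrate_at=None):
            for i in range(state["iteration"], 10):
                state["acc"] += i                 # the task's actual work
                state["iteration"] = i + 1        # checkpoint after each iteration
                if migrate_at is not None and state["iteration"] == migrate_at:
                    print("migrating after iteration", i,
                          "from", on_tile, "to", migrate_to)
                    return run_task(state, migrate_to)   # resume elsewhere
            print("finished on", on_tile, "with acc =", state["acc"])

        run_task({"iteration": 0, "acc": 0}, "tile0",
                 migrate_to="tile1", migrate_at=4)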