Quantitative Models of Parallel Algorithms
This thesis presents ANDES, a technique for the quantitative modelling of parallel algorithms and programs. An ANDES model is a weighted directed acyclic graph (DAG) of computation nodes, in which arcs model precedence and input/output logics model data flow. ANDES can also represent some non-deterministic characteristics of algorithms, such as branching, and it supports hierarchical and regular model descriptions. Several example models are given. ANDES is introduced after a review of other modelling techniques available in the literature (e.g., GMB). Models are described with the ANDES-C library, so an ANDES model is itself a C program. One advantage of this textual representation is that models with thousands of computation nodes can be described compactly. ANDES can be used in different performance-evaluation contexts, mainly to describe a workload. Such a workload model can be fed, for example, to a simulator or to an analytical model (e.g., a queueing system). In this work, ANDES models are used to generate synthetic loads that are executed on a real parallel machine; this generation and execution environment is called ANDES-Synth. Besides the workload, ANDES-Synth can also model a parallel machine that is "emulated" on the real multiprocessor. ANDES-Synth is used to evaluate static mapping strategies (four greedy heuristics and two iterative algorithms). A clustering technique (the one used in the Pyrros tool) is applied so that the mapping strategies can be used on ANDES models.
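To make the workload model concrete, here is a minimal Python sketch of a weighted DAG (node costs for computation, arc costs for communication) together with one generic greedy static-mapping heuristic: list scheduling by bottom level, placing each ready node on the processor where it can start earliest. The dictionary layout and the names `bottom_levels` and `list_schedule` are hypothetical; this is not the ANDES-C API, and the heuristic is a textbook one, not necessarily one of the four greedy heuristics evaluated in the thesis. It also assumes that an arc's communication cost vanishes when both endpoints run on the same processor.

```python
from collections import defaultdict


def bottom_levels(cost, arcs):
    """Length of the longest path (node costs plus arc costs) from each node
    to an exit node; used here as the scheduling priority."""
    succ = defaultdict(list)
    for (u, v), c in arcs.items():
        succ[u].append((v, c))
    memo = {}

    def bl(n):
        if n not in memo:
            memo[n] = cost[n] + max((c + bl(v) for v, c in succ[n]), default=0)
        return memo[n]

    return {n: bl(n) for n in cost}


def list_schedule(cost, arcs, nprocs):
    """Greedy static mapping: repeatedly take the ready node with the highest
    bottom level and place it on the processor where it can start earliest."""
    pred = defaultdict(list)
    for (u, v), c in arcs.items():
        pred[v].append((u, c))
    prio = bottom_levels(cost, arcs)
    proc_free = [0] * nprocs   # time at which each processor becomes idle
    placed = {}                # node -> (processor, finish time)
    while len(placed) < len(cost):
        ready = [n for n in cost if n not in placed
                 and all(u in placed for u, _ in pred[n])]
        n = max(ready, key=prio.get)
        best_p, best_start = None, None
        for p in range(nprocs):
            # An arc's communication cost is paid only across processors.
            data_ready = max((placed[u][1] + (0 if placed[u][0] == p else c)
                              for u, c in pred[n]), default=0)
            start = max(proc_free[p], data_ready)
            if best_start is None or start < best_start:
                best_p, best_start = p, start
        placed[n] = (best_p, best_start + cost[n])
        proc_free[best_p] = best_start + cost[n]
    return placed, max(f for _, f in placed.values())


# Hypothetical fork-join workload: a -> {b, c} -> d.
cost = {"a": 2, "b": 3, "c": 3, "d": 1}
arcs = {("a", "b"): 1, ("a", "c"): 1, ("b", "d"): 1, ("c", "d"): 1}
placed, span = list_schedule(cost, arcs, nprocs=2)
print(span)  # 7
```

On this graph, two processors give a makespan of 7 against 9 on a single processor. The bottom level of the entry node (8) includes communication costs that the schedule avoids by co-locating tasks, so it serves as a priority, not as a lower bound on the makespan.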
Communication Times in Multiprocessors (Tempos de comunicação em multiprocessadores)
In the search for greater computer processing power, parallelism is considered a viable alternative. Replicating processors, however, does not by itself represent progress: new problems, absent from the sequential paradigm, have appeared, including parallelization of the solution, mapping onto the target architecture, load balancing of the parallel machine, communication, and synchronization. In particular, interprocess communication in loosely coupled multiprocessors is a crucial factor that affects the performance of the system as a whole. Four interprocessor communication strategies are analyzed in this work: message switching, virtual cut-through, rendezvous, and wormhole routing. For each one, analytical (queueing-theory) and discrete-event simulation models are developed and applied to determine which strategy performs best, and in which contexts. Cut-through and message switching (the latter does not require specific hardware) are the best policies for systems with heavy communication loads; the models of these two strategies used here were previously developed in the literature by Kermani and Kleinrock. Wormhole routing, which has blocking and reservation characteristics, may be more suitable for systems with little message exchange. Rendezvous does not require special hardware, but it yields longer communication times than the other strategies. The models were built following a step-by-step, modular methodology, which is also presented and provides the line of reasoning followed throughout the chapters of this dissertation.
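As a rough illustration of how the switching strategies differ in latency, the first-order textbook formulas can be coded directly. With D hops, message length L, header length h, and link bandwidth B, a store-and-forward (message-switched) transfer takes D * L / B with no contention, while a cut-through transfer takes (D - 1) * h / B + L / B; wormhole routing has the same zero-load latency as cut-through and differs only under contention. For the loaded case, the sketch assumes Kleinrock's independence assumption, modelling each link as an M/M/1 queue with arrival rate lambda and service rate mu, so that a message-switched path has mean delay D / (mu - lambda). These are simplified models with hypothetical function names, not the models developed in the dissertation, and rendezvous is not covered.

```python
def store_and_forward_latency(hops, msg_bits, bandwidth):
    """Message switching, no contention: every intermediate node receives the
    whole message before forwarding it, so the full transfer time is paid per hop."""
    return hops * msg_bits / bandwidth


def cut_through_latency(hops, msg_bits, header_bits, bandwidth):
    """Cut-through (and wormhole) routing, no contention: only the header is
    delayed at each intermediate node and the body follows in a pipeline."""
    return (hops - 1) * header_bits / bandwidth + msg_bits / bandwidth


def mm1_path_delay(hops, arrival_rate, service_rate):
    """Mean message-switched delay over `hops` links, each modelled as an
    independent M/M/1 queue (Kleinrock's independence assumption)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival_rate must be below service_rate")
    return hops / (service_rate - arrival_rate)


# Illustrative numbers: 4 hops, 1000-bit messages, 10-bit header, 1000 bits per time unit.
print(store_and_forward_latency(4, 1000, 1000))  # 4.0
print(cut_through_latency(4, 1000, 10, 1000))
print(mm1_path_delay(4, arrival_rate=0.5, service_rate=1.0))  # 8.0
```

With these numbers, cut-through takes about 1.03 time units against 4.0 for store-and-forward; the gap grows with the number of hops and the message length.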
Performance Evaluation of a Parallel Tabu Search Task Scheduling Algorithm
This paper analyzes the solution quality of a parallel tabu search algorithm for scheduling tasks with precedence constraints on heterogeneous processors. We evaluate the makespan reduction achieved for different parallel applications relative to the results of the best greedy algorithm in the literature, as a function of parameters such as problem size, system heterogeneity, and number of processors. Our results show that tabu search outperforms the greedy algorithm in many cases where the latter cannot exploit the inherent parallelism of the application and the heterogeneity of the system.
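A minimal Python sketch of tabu search for this problem, under simplifying assumptions: heterogeneity is expressed by a per-task, per-processor execution time table; every precedence arc costs a single constant `comm` when its endpoints are on different processors; a solution is a task-to-processor assignment evaluated by dispatching tasks in a fixed topological order; a move reassigns one task; and moving a task straight back to its previous processor is tabu for `tenure` iterations unless it beats the best solution found so far (aspiration). The starting point is a naive greedy assignment (each task on its fastest processor), not the greedy baseline used in the paper, and the search here is sequential rather than parallel. All names are hypothetical.

```python
def makespan(assign, order, exec_time, preds, comm, nprocs):
    """Makespan of a task-to-processor assignment. Tasks are dispatched in the
    fixed topological `order`; a task starts once its processor is idle and all
    its inputs have arrived (an arc costs `comm` only across processors)."""
    proc_free = [0] * nprocs
    finish = {}
    for t in order:
        p = assign[t]
        ready = max((finish[u] + (0 if assign[u] == p else comm) for u in preds[t]),
                    default=0)
        start = max(proc_free[p], ready)
        finish[t] = start + exec_time[t][p]
        proc_free[p] = finish[t]
    return max(finish.values())


def tabu_search(initial, order, exec_time, preds, comm, nprocs, iters=50, tenure=3):
    """Tabu search over single-task reassignment moves."""
    def cost(a):
        return makespan(a, order, exec_time, preds, comm, nprocs)

    current = dict(initial)
    best, best_cost = dict(current), cost(current)
    tabu_until = {}  # (task, processor) -> last iteration at which that move is tabu
    for it in range(iters):
        chosen = None
        for t in order:
            for p in range(nprocs):
                if p == current[t]:
                    continue
                trial = dict(current)
                trial[t] = p
                c = cost(trial)
                # Aspiration: a tabu move is still allowed if it beats the best so far.
                if tabu_until.get((t, p), -1) >= it and c >= best_cost:
                    continue
                if chosen is None or c < chosen[0]:
                    chosen = (c, t, p)
        if chosen is None:  # every move is tabu
            break
        c, t, p = chosen
        tabu_until[(t, current[t])] = it + tenure  # forbid moving t straight back
        current[t] = p  # accept the best neighbour, even if it is worse
        if c < best_cost:
            best, best_cost = dict(current), c
    return best, best_cost


# Hypothetical fork-join application on two processors (processor 1 is faster).
order = ["a", "b", "c", "d"]  # a topological order
preds = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
exec_time = {"a": [4, 2], "b": [6, 4], "c": [5, 4], "d": [4, 2]}
greedy = {t: min(range(2), key=lambda p: exec_time[t][p]) for t in order}
g_span = makespan(greedy, order, exec_time, preds, comm=1, nprocs=2)
best, t_span = tabu_search(greedy, order, exec_time, preds, comm=1, nprocs=2)
print(g_span, t_span)  # 12 11
```

On this toy instance the greedy start places every task on the faster processor, giving a makespan of 12. Tabu search reaches 11 by moving `c` to the slower processor so that it runs in parallel with `b`. The greedy rule cannot discover this placement because it never trades per-task speed for parallelism.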
Experimental Analysis of a Parallel Quicksort-Based Algorithm for Suffix Array Generation
This paper presents experiments performed with an implementation of a quicksort-based parallel indexing algorithm. Besides the expected reduction in execution time, we observed that the word frequency distribution of the input textual database strongly influences performance. Communication and computational load balance are achieved by processing the same amount of text on each processor. This works because texts are auto-similar (self-similar), a property verified experimentally in this work. The experiments also show that, owing to this auto-similarity, the word frequency distribution is independent of the text size. In terms of implementation, a priori knowledge of this word frequency distribution may reduce indexing time by eliminating certain parts of the algorithm.
Keywords: Parallel Processing, Information Retrieval, Index Generation, Auto-Similarity, Message Passing.
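A minimal Python sketch of the general idea, under stated assumptions: word-level suffixes (one per word start) are routed to one bucket per processor using splitter strings, each bucket is sorted independently with a plain quicksort, and concatenating the buckets yields the suffix array. The splitters, the function names, and this sample-sort-style partitioning are assumptions for illustration; the paper balances load by giving each processor the same amount of text. In a scheme like this, bucket sizes depend on the word frequency distribution, which is one way that distribution affects load balance. Comparing `text[i:]` slices copies strings, so the sketch only suits small inputs.

```python
import bisect


def word_suffixes(text):
    """Offsets where a word starts: a word-level index keeps one suffix per word."""
    starts, prev_space = [], True
    for i, ch in enumerate(text):
        if prev_space and not ch.isspace():
            starts.append(i)
        prev_space = ch.isspace()
    return starts


def quicksort_suffixes(text, idx):
    """Sequential quicksort of suffix offsets, comparing the suffixes themselves."""
    if len(idx) <= 1:
        return list(idx)
    pivot = text[idx[len(idx) // 2]:]
    less = [i for i in idx if text[i:] < pivot]
    equal = [i for i in idx if text[i:] == pivot]
    greater = [i for i in idx if text[i:] > pivot]
    return quicksort_suffixes(text, less) + equal + quicksort_suffixes(text, greater)


def bucketed_suffix_array(text, splitters):
    """Simulated parallel step: bucket k (one per processor) receives the suffixes
    s with splitters[k-1] <= s < splitters[k]; buckets are sorted independently
    and concatenated. `splitters` must be sorted."""
    buckets = [[] for _ in range(len(splitters) + 1)]
    for i in word_suffixes(text):
        buckets[bisect.bisect_right(splitters, text[i:])].append(i)
    sizes = [len(b) for b in buckets]  # per-processor load
    return [i for b in buckets for i in quicksort_suffixes(text, b)], sizes


text = "to be or not to be"
sa, sizes = bucketed_suffix_array(text, splitters=["o"])
print(sa, sizes)  # [16, 3, 9, 6, 13, 0] [3, 3]
```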