14 research outputs found

    Dynamic load balancing of parallel road traffic simulation

    Get PDF
    The objective of this research was to investigate, develop and evaluate dynamic load-balancing strategies for parallel execution of microscopic road traffic simulations. Urban road traffic simulation presents irregular, and dynamically varying distributed computational load for a parallel processor system. The dynamic nature of road traffic simulation systems lead to uneven load distribution during simulation, even for a system that starts off with even load distributions. Load balancing is a potential way of achieving improved performance by reallocating work from highly loaded processors to lightly loaded processors leading to a reduction in the overall computational time. In dynamic load balancing, workloads are adjusted continually or periodically throughout the computation. In this thesis load balancing strategies were evaluated and some load balancing policies developed. A load index and a profitability determination algorithms were developed. These were used to enhance two load balancing algorithms. One of the algorithms exhibits local communications and distributed load evaluation between the neighbour partitions (diffusion algorithm) and the other algorithm exhibits both local and global communications while the decision making is centralized (MaS algorithm). The enhanced algorithms were implemented and synthesized with a research parallel traffic simulation. The performance of the research parallel traffic simulator, optimized with the two modified dynamic load balancing strategies were studied

    Decomposition of unstructured meshes for efficient parallel computation

    Get PDF

    Dynamic load balancing of parallel road traffic simulation

    Get PDF
    The objective of this research was to investigate, develop and evaluate dynamic load-balancing strategies for parallel execution of microscopic road traffic simulations. Urban road traffic simulation presents irregular, and dynamically varying distributed computational load for a parallel processor system. The dynamic nature of road traffic simulation systems lead to uneven load distribution during simulation, even for a system that starts off with even load distributions. Load balancing is a potential way of achieving improved performance by reallocating work from highly loaded processors to lightly loaded processors leading to a reduction in the overall computational time. In dynamic load balancing, workloads are adjusted continually or periodically throughout the computation. In this thesis load balancing strategies were evaluated and some load balancing policies developed. A load index and a profitability determination algorithms were developed. These were used to enhance two load balancing algorithms. One of the algorithms exhibits local communications and distributed load evaluation between the neighbour partitions (diffusion algorithm) and the other algorithm exhibits both local and global communications while the decision making is centralized (MaS algorithm). The enhanced algorithms were implemented and synthesized with a research parallel traffic simulation. The performance of the research parallel traffic simulator, optimized with the two modified dynamic load balancing strategies were studied.EThOS - Electronic Theses Online ServiceGBUnited Kingdo

    Code Generation and Global Optimization Techniques for a Reconfigurable PRAM-NUMA Multicore Architecture

    Full text link

    PT-Scotch and libScotch 5.1 User's Guide

    Get PDF
    76 pagesUser's manualThis document describes the capabilities and operations of PT-Scotch and libScotch, a software package and a software library which compute parallel static mappings and parallel sparse matrix block orderings of graphs. It gives brief descriptions of the algorithms, details the input/output formats, instructions for use, installation procedures, and provides a number of examples. PT-Scotch is distributed as free/libre software, and has been designed such that new partitioning or ordering methods can be added in a straightforward manner. It can therefore be used as a testbed for the easy and quick coding and testing of such new methods, and may also be redistributed, as a library, along with third-party software that makes use of it, either in its original or in updated forms

    Energy-aware scheduling in heterogeneous computing systems

    Get PDF
    In the last decade, the grid computing systems emerged as useful provider of the computing power required for solving complex problems. The classic formulation of the scheduling problem in heterogeneous computing systems is NP-hard, thus approximation techniques are required for solving real-world scenarios of this problem. This thesis tackles the problem of scheduling tasks in a heterogeneous computing environment in reduced execution times, considering the schedule length and the total energy consumption as the optimization objectives. An efficient multithreading local search algorithm for solving the multi-objective scheduling problem in heterogeneous computing systems, named MEMLS, is presented. The proposed method follows a fully multi-objective approach, applying a Pareto-based dominance search that is executed in parallel by using several threads. The experimental analysis demonstrates that the new multithreading algorithm outperforms a set of fast and accurate two-phase deterministic heuristics based on the traditional MinMin. The new ME-MLS method is able to achieve significant improvements in both makespan and energy consumption objectives in reduced execution times for a large set of testbed instances, while exhibiting very good scalability. The ME-MLS was evaluated solving instances comprised of up to 2048 tasks and 64 machines. In order to scale the dimension of the problem instances even further and tackle large-sized problem instances, the Graphical Processing Unit (GPU) architecture is considered. This line of future work has been initially tackled with the gPALS: a hybrid CPU/GPU local search algorithm for efficiently tackling a single-objective heterogeneous computing scheduling problem. The gPALS shows very promising results, being able to tackle instances of up to 32768 tasks and 1024 machines in reasonable execution times.En la última década, los sistemas de computación grid se han convertido en útiles proveedores de la capacidad de cálculo necesaria para la resolución de problemas complejos. En su formulación clásica, el problema de la planificación de tareas en sistemas heterogéneos es un problema NP difícil, por lo que se requieren técnicas de resolución aproximadas para atacar instancias de tamaño realista de este problema. Esta tesis aborda el problema de la planificación de tareas en sistemas heterogéneos, considerando el largo de la planificación y el consumo energético como objetivos a optimizar. Para la resolución de este problema se propone un algoritmo de búsqueda local eficiente y multihilo. El método propuesto se trata de un enfoque plenamente multiobjetivo que consiste en la aplicación de una búsqueda basada en dominancia de Pareto que se ejecuta en paralelo mediante el uso de varios hilos de ejecución. El análisis experimental demuestra que el algoritmo multithilado propuesto supera a un conjunto de heurísticas deterministas rápidas y e caces basadas en el algoritmo MinMin tradicional. El nuevo método, ME-MLS, es capaz de lograr mejoras significativas tanto en el largo de la planificación y como en consumo energético, en tiempos de ejecución reducidos para un gran número de casos de prueba, mientras que exhibe una escalabilidad muy promisoria. El ME-MLS fue evaluado abordando instancias de hasta 2048 tareas y 64 máquinas. Con el n de aumentar la dimensión de las instancias abordadas y hacer frente a instancias de gran tamaño, se consideró la utilización de la arquitectura provista por las unidades de procesamiento gráfico (GPU). Esta línea de trabajo futuro ha sido abordada inicialmente con el algoritmo gPALS: un algoritmo híbrido CPU/GPU de búsqueda local para la planificación de tareas en en sistemas heterogéneos considerando el largo de la planificación como único objetivo. La evaluación del algoritmo gPALS ha mostrado resultados muy prometedores, siendo capaz de abordar instancias de hasta 32768 tareas y 1024 máquinas en tiempos de ejecución razonables

    Resistance Distance, Kirchhoff Index, Foster's Theorems, and Generalizations

    Get PDF
    The emerging area of network science studies structural characteristics of networks and dynamical processes on networks such as spread of epidemics, vulnerability of power grids to cascading failures etc. In this area, several measures of network performance have been introduced and studied. In this dissertation, we study two measures, namely, resistance distance and Kirchhoff index. Treating each element of a graph as a resistance, resistance distance between two nodes u and v is the effective resistance across u and v. Kirchhoff index defined by the chemistry community is the sum of the effective resistances across all pairs of nodes of the graph. Kirchhoff index, also called network criticality, has been studied by the communication network community. Kirchhoff index has been studied using the graph Laplacian matrix which is the same as the indefinite admittance matrix of a resistance network. Our research is on reducing the computational effort in calculating the Kirchhoff index in networks. First a simpler formula for Kirchhoff index based on the properties of node-to-datum resistance matrix is presented. To avoid computational complexity and extraneous efforts of Moore-Penrose pseudoinverse, Kirchhoff index is calculated in terms of the inverse of the reduced Laplacian matrix. The notion of Laplacian matrix is then generalized using the fundamental cutset matrix of a graph. Two approaches to compute Kirchhoff index are presented: The first approach is based on a matrix transformation, and the second approach uses the concept of Kirchhoff polynomial of a graph. Kirchhoff polynomial of a graph introduced in this work is defined for each spanning tree of the graph. In 1949 and 1961 Foster established two theorems that give identities involving resistance distances. We introduce the concept of Weighted Kirchhoff index of a graph and study its relationship to Foster’s theorems. We present a generalization of Foster’s theorems that retains the circuit-theoretic flavor and elegance of Foster’s theorems, and develop a dual form of this theorem. Kirchhoff index captures the effect of topological structure on the performance of networks. It also captures the path diversity between nodes in a network. Kirchhoff index can be used to determine node betweenness in networks that are of interest in network vulnerability studies. In view of this, an efficient methodology to compute Kirchhoff index is required. For this purpose, we propose sequential and parallel algorithms. In addition, we introduce a novel 3-step approximation algorithm for calculation of resistance distance and Kirchhoff index

    Scotch and libScotch 5.1 User's Guide

    Get PDF
    127 pagesUser's manualThis document describes the capabilities and operations of Scotch and libScotch, a software package and a software library devoted to static mapping, partitioning, and sparse matrix block ordering of graphs and meshes/hypergraphs. It gives brief descriptions of the algorithms, details the input/output formats, instructions for use, installation procedures, and provides a number of examples. Scotch is distributed as free/libre software, and has been designed such that new partitioning or ordering methods can be added in a straightforward manner. It can therefore be used as a testbed for the easy and quick coding and testing of such new methods, and may also be redistributed, as a library, along with third-party software that makes use of it, either in its original or in updated forms

    Procesamiento paralelo : Balance de carga dinámico en algoritmo de sorting

    Get PDF
    Algunas técnicas de sorting intentan balancear la carga mediante un muestreo inicial de los datos a ordenar y una distribución de los mismos de acuerdo a pivots. Otras redistribuyen listas parcialmente ordenadas de modo que cada procesador almacene un número aproximadamente igual de claves, y todos tomen parte del proceso de merge durante la ejecución. Esta Tesis presenta un nuevo método que balancea dinámicamente la carga basado en un enfoque diferente, buscando realizar una distribución del trabajo utilizando un estimador que permita predecir la carga de trabajo pendiente. El método propuesto es una variante de Sorting by Merging Paralelo, esto es, una técnica basada en comparación. Las ordenaciones en los bloques se realizan mediante el método de Burbuja o Bubble Sort con centinela. En este caso, el trabajo a realizar -en términos de comparaciones e intercambios- se encuentra afectada por el grado de desorden de los datos. Se estudió la evolución de la cantidad de trabajo en cada iteración del algoritmo para diferentes tipos de secuencias de entrada, n datos con valores de a n sin repetición, datos al azar con distribución normal, observándose que el trabajo disminuye en cada iteración. Esto se utilizó para obtener una estimación del trabajo restante esperado a partir de una iteración determinada, y basarse en el mismo para corregir la distribución de la carga. Con esta idea, el métoEs revisado por: http://sedici.unlp.edu.ar/handle/10915/9500Facultad de Ciencias Exacta

    Structured parallelism discovery with hybrid static-dynamic analysis and evaluation technique

    Get PDF
    Parallel computer architectures have dominated the computing landscape for the past two decades; a trend that is only expected to continue and intensify, with increasing specialization and heterogeneity. This creates huge pressure across the software stack to produce programming languages, libraries, frameworks and tools which will efficiently exploit the capabilities of parallel computers, not only for new software, but also revitalizing existing sequential code. Automatic parallelization, despite decades of research, has had limited success in transforming sequential software to take advantage of efficient parallel execution. This thesis investigates three approaches that use commutativity analysis as the enabler for parallelization. This has the potential to overcome limitations of traditional techniques. We introduce the concept of liveness-based commutativity for sequential loops. We examine the use of a practical analysis utilizing liveness-based commutativity in a symbolic execution framework. Symbolic execution represents input values as groups of constraints, consequently deriving the output as a function of the input and enabling the identification of further program properties. We employ this feature to develop an analysis and discern commutativity properties between loop iterations. We study the application of this approach on loops taken from real-world programs in the OLDEN and NAS Parallel Benchmark (NPB) suites, and identify its limitations and related overheads. Informed by these findings, we develop Dynamic Commutativity Analysis (DCA), a new technique that leverages profiling information from program execution with specific input sets. Using profiling information, we track liveness information and detect loop commutativity by examining the code’s live-out values. We evaluate DCA against almost 1400 loops of the NPB suite, discovering 86% of them as parallelizable. Comparing our results against dependence-based methods, we match the detection efficacy of two dynamic and outperform three static approaches, respectively. Additionally, DCA is able to automatically detect parallelism in loops which iterate over Pointer-Linked Data Structures (PLDSs), taken from wide range of benchmarks used in the literature, where all other techniques we considered failed. Parallelizing the discovered loops, our methodology achieves an average speedup of 3.6× across NPB (and up to 55×) and up to 36.9× for the PLDS-based loops on a 72-core host. We also demonstrate that our methodology, despite relying on specific input values for profiling each program, is able to correctly identify parallelism that is valid for all potential input sets. Lastly, we develop a methodology to utilize liveness-based commutativity, as implemented in DCA, to detect latent loop parallelism in the shape of patterns. Our approach applies a series of transformations which subsequently enable multiple applications of DCA over the generated multi-loop code section and match its loop commutativity outcomes against the expected criteria for each pattern. Applying our methodology on sets of sequential loops, we are able to identify well-known parallel patterns (i.e., maps, reduction and scans). This extends the scope of parallelism detection to loops, such as those performing scan operations, which cannot be determined as parallelizable by simply evaluating liveness-based commutativity conditions on their original form
    corecore