165 research outputs found

    Supernode Transformation On Parallel Systems With Distributed Memory – An Analytical Approach

    Supernode transformation, or tiling, is a technique that partitions algorithms to improve data locality and parallelism, balancing computation against inter-processor communication costs to minimize the running time. It groups multiple iterations of nested loops into supernodes, which are assigned to processors and processed in parallel. A supernode transformation is described by the supernode size and shape. This research focuses on supernode transformation on multi-processor architectures with distributed memory, including computer cluster systems and General-Purpose Graphics Processing Units (GPGPUs). The research covers supernode scheduling, the mapping of supernodes to processors, and the search for the optimal supernode size, all with the goal of achieving the shortest total running time. The algorithms considered are doubly nested loops with regular data dependencies. The Longest Common Subsequence problem is used as an illustration. A novel mathematical model for the total running time is established as a function of the supernode size, algorithm parameters such as the problem size and the data dependence, the computation time of each loop iteration, architecture parameters such as the number of processors, and the communication cost. The optimal supernode size is derived from this closed-form model. The model and the optimal supernode size provide better results than previous research and are verified by simulations on multi-processor systems including computer cluster systems and GPGPUs.
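
To make the idea of grouping loop iterations into supernodes concrete, here is a minimal sketch of a tiled Longest Common Subsequence computation. It is an illustration only, not the model or implementation from the abstract: the function name `lcs_length_tiled`, the square tile shape, and the `tile` parameter are assumptions. Tiles on the same anti-diagonal have no dependences on one another, so in a parallel setting each wave could be spread across processors; here the waves simply run sequentially.

```python
def lcs_length_tiled(a: str, b: str, tile: int = 4) -> int:
    """LCS length computed supernode by supernode (square tiles, assumed shape)."""
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of a[:i] and b[:j]; row 0 and column 0 stay 0.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    tiles_i = (n + tile - 1) // tile  # number of tile rows
    tiles_j = (m + tile - 1) // tile  # number of tile columns

    # Tile (ti, tj) depends only on tiles (ti-1, tj), (ti, tj-1), (ti-1, tj-1),
    # so all tiles with ti + tj == wave are mutually independent.
    for wave in range(tiles_i + tiles_j - 1):
        for ti in range(max(0, wave - tiles_j + 1), min(tiles_i, wave + 1)):
            tj = wave - ti
            # Iterations inside one supernode run in the original loop order.
            for i in range(ti * tile + 1, min((ti + 1) * tile, n) + 1):
                for j in range(tj * tile + 1, min((tj + 1) * tile, m) + 1):
                    if a[i - 1] == b[j - 1]:
                        dp[i][j] = dp[i - 1][j - 1] + 1
                    else:
                        dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


if __name__ == "__main__":
    print(lcs_length_tiled("AGGTAB", "GXTXAYB", tile=2))
```

A larger `tile` means fewer, bigger supernodes (less communication between neighbours, but fewer independent tiles per wave); a smaller `tile` means the opposite. That is the trade-off the abstract's running-time model is meant to resolve.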

    HIERARCHICAL MAPPING TECHNIQUES FOR SIGNAL PROCESSING SYSTEMS ON PARALLEL PLATFORMS

    Dataflow models are widely used for expressing the functionality of digital signal processing (DSP) applications due to their useful features, such as providing formal mechanisms for description of application functionality, imposing minimal data-dependency constraints in specifications, and exposing task and data level parallelism effectively. Due to the increased complexity of dynamics in modern DSP applications, dataflow-based design methodologies require significant enhancements in modeling and scheduling techniques to provide for efficient and flexible handling of dynamic behavior. To address this problem, in this thesis, we propose an innovative framework for mode- and dynamic-parameter-based modeling and scheduling. We apply, in a systematically integrated way, the structured mode-based dataflow modeling capability of dynamic behavior together with the features of dynamic parameter reconfiguration and quasi-static scheduling. Moreover, in our proposed framework, we present a new design method called parameterized multidimensional design hierarchy mapping (PMDHM), which is targeted to the flexible, multi-level reconfigurability, and intensive real-time processing requirements of emerging dynamic DSP systems. The proposed approach allows designers to systematically represent and transform multi-level specifications of signal processing applications from a common, dataflow-based application-level model. In addition, we propose a new technique for mapping optimization that helps designers derive efficient, platform-specific parameters for application-to-architecture mapping. These parameters help to maximize system performance on state-of-the-art parallel platforms for embedded signal processing. To further enhance the scalability of our design representations and implementation techniques, we present a formal method for analysis and mapping of parameterized DSP flowgraph structures, called topological patterns, into efficient implementations. 
The approach handles an important class of parameterized schedule structures in a form that is intuitive for representation and efficient for implementation. We demonstrate our methods with case studies in the fields of wireless communication and computer vision. Experimental results from these case studies show that our approaches can be used to derive optimized implementations on parallel platforms and to enhance trade-off analysis during design space exploration. Furthermore, their basis in formal modeling and analysis techniques promotes the applicability of our proposed approaches to diverse signal processing applications and architectures.

    Beyond Reuse Distance Analysis: Dynamic Analysis for Characterization of Data Locality Potential

    Emerging computer architectures will feature drastically decreased flops/byte (ratio of peak processing rate to memory bandwidth) as highlighted by recent studies on Exascale architectural trends. Further, flops are getting cheaper while the energy cost of data movement is increasingly dominant. The understanding and characterization of data locality properties of computations is critical in order to guide efforts to enhance data locality. Reuse distance analysis of memory address traces is a valuable tool to perform data locality characterization of programs. A single reuse distance analysis can be used to estimate the number of cache misses in a fully associative LRU cache of any size, thereby providing estimates on the minimum bandwidth requirements at different levels of the memory hierarchy to avoid being bandwidth bound. However, such an analysis only holds for the particular execution order that produced the trace. It cannot estimate potential improvement in data locality through dependence preserving transformations that change the execution schedule of the operations in the computation. In this article, we develop a novel dynamic analysis approach to characterize the inherent locality properties of a computation and thereby assess the potential for data locality enhancement via dependence preserving transformations. The execution trace of a code is analyzed to extract a computational directed acyclic graph (CDAG) of the data dependences. The CDAG is then partitioned into convex subsets, and the convex partitioning is used to reorder the operations in the execution trace to enhance data locality. The approach enables us to go beyond reuse distance analysis of a single specific order of execution of the operations of a computation in characterization of its data locality properties. 
It can serve a valuable role in identifying promising code regions for manual transformation, as well as assessing the effectiveness of compiler transformations for data locality enhancement. We demonstrate the effectiveness of the approach using a number of benchmarks, including case studies where the potential shown by the analysis is exploited to achieve lower data movement costs and better performance. Comment: Transactions on Architecture and Code Optimization (2014)
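
As background for the reuse distance analysis that this abstract builds on, the following is a minimal sketch of the classic computation on an address trace. It does not implement the CDAG-based approach of the article; the function names `reuse_distances` and `lru_misses` are assumptions, and the linear scan per access is chosen for clarity rather than speed. The reuse distance of an access is the number of distinct addresses touched since the previous access to the same address, and an access misses in a fully associative LRU cache of capacity C exactly when it is a first access or its distance is at least C.

```python
from collections import OrderedDict


def reuse_distances(trace):
    """Return the reuse distance of each access (None for a first access)."""
    stack = OrderedDict()  # LRU stack: most recently used address is last
    distances = []
    for addr in trace:
        if addr in stack:
            keys = list(stack)
            # Distinct addresses touched since the previous access to addr.
            distances.append(len(keys) - 1 - keys.index(addr))
            stack.move_to_end(addr)
        else:
            distances.append(None)
            stack[addr] = True
    return distances


def lru_misses(distances, cache_size):
    """Misses in a fully associative LRU cache holding cache_size addresses."""
    return sum(1 for d in distances if d is None or d >= cache_size)


if __name__ == "__main__":
    d = reuse_distances(["a", "b", "c", "a", "b", "b"])
    print(d, lru_misses(d, 2), lru_misses(d, 3))
```

One pass over the trace yields miss counts for every cache size, which is the property the abstract refers to. The result still describes only the execution order that produced the trace.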

    Automatic parallelisation for a class of URE problems

    PhD Thesis. This thesis deals with the methodology and software of automatic parallelisation for numerical supercomputing and supercomputers. We focus on the problem of Uniform Recurrence Equations (URE), which arise widely in numerical computations. We propose a complete methodology for the automatic generation of parallel programs for regular array designs. The methodology starts with the introduction of a set of canonical dependencies, which yields a general model of the various URE problems. Based on these canonical dependencies, partitioning and mapping methods are developed, which give the foundation of the universal design process. Using the theoretical results, we propose the structures of parallel programs and eventually generate parallel code automatically that runs correctly and efficiently on a transputer array. The achievements presented in this thesis can be regarded as significant progress in the area of automatic generation of parallel code and regular (systolic) array design. This methodology is integrated and self-contained, and may be the only practical working package in this area. The Research Committee of University of Newcastle upon Tyne: CVCP Overseas Research Students Awards Scheme

    Hybrid mapping for static and non-static indoor environments

    International Mention in the doctoral degree. Indoor environments populated by humans, such as houses, offices or universities, involve great complexity due to the diversity of geometries and situations that they may present. Apart from the size of the environment, they can contain multiple rooms distributed into floors and corridors, repetitive structures and loops, and they can get as complicated as one can imagine. In addition, the structure and situations that the environment presents may vary over time, as objects can be moved, doors can be frequently opened or closed, and places can be used for different purposes. Mobile robots need to handle these challenging situations in order to operate successfully in the environment. The main tools that a mobile robot has for dealing with these situations relate to navigation and perception and comprise mapping, localization, path planning and map adaptation. In this thesis, we address some of the open problems in robot navigation in non-static indoor environments. We focus on house-like environments, as the work is framed within the HEROITEA research project, which aims to help elderly people with their everyday activities at home. This thesis contributes to HEROITEA with a complete robotic mapping and map adaptation system that provides safe navigation and understanding of the environment. Moreover, we provide localization and path planning strategies within the resulting map to further operate in the environment. The first problem tackled in this thesis is robot mapping in static indoor environments. We propose a hybrid mapping method that structures the information gathered from the environment into several maps. The hybrid map contains diverse knowledge of the environment such as its structure, the navigable and blocked paths, and semantic knowledge, such as the objects or scenes in the environment. 
All this information is separated into different components of the hybrid map that are interconnected so the system can, at any time, benefit from the information contained in every component. In addition to the conceptual design of the hybrid map, we have also developed building procedures and an exploration algorithm to build the hybrid map autonomously. However, indoor environments populated by humans are far from static, as the environment may change over time. For this reason, the second problem tackled in this thesis is the adaptation of the map to non-static environments. We propose an object-based probabilistic map adaptation that calculates, for each object in the environment, the likelihood that it has moved or remained in its place. Finally, a map is just a description of the environment whose importance is mostly related to how the map is used. Map representations are therefore more valuable the wider the range of applications they offer. Accordingly, the third problem that we approach in this thesis is exploiting the intrinsic characteristics of the hybrid map in order to enhance the performance of localization and path planning methods. The particular objectives of these approaches are precision for robot localization and efficiency for path planning in terms of execution time and traveled distance. We evaluate our proposed methods in a variety of simulated and real-world indoor environments. In this extensive evaluation, we show that hybrid maps can be efficiently built and maintained over time and that they open up new possibilities for localization and path planning. We show an increase in localization precision and robustness and an improvement in path planning performance. In sum, this thesis makes several contributions in the context of robot navigation in indoor environments, and especially in hybrid mapping. 
Hybrid maps offer higher efficiency during map building and in other applications such as localization and path planning. In addition, we highlight the necessity of dealing with the dynamics of indoor environments and the benefits of combining topological, semantic and metric information for the autonomy of a mobile robot. Doctoral Program in Electrical, Electronic and Automation Engineering, Universidad Carlos III de Madrid. President: Carlos Balaguer Bernaldo de Quirós; Secretary: Javier González Jiménez; Member: Nancy Marie Amat

    Optimization techniques for fine-grained communication in PGAS environments

    Partitioned Global Address Space (PGAS) languages promise to deliver improved programmer productivity and good performance on large-scale parallel machines. However, adequate performance for applications that rely on fine-grained communication is difficult to achieve without compromising their programmability. Manual or compiler-assisted code optimization is required to avoid fine-grained accesses. The downside of manually applying code transformations is increased program complexity and reduced programmer productivity. On the other hand, compiler optimizations of fine-grained accesses require knowledge of the physical data mapping and the use of parallel loop constructs. This thesis presents optimizations for solving the three main challenges of fine-grained communication: (i) low network communication efficiency; (ii) a large number of runtime calls; and (iii) the creation of network hotspots due to non-uniform distribution of network communication. To solve these problems, the dissertation presents three approaches. First, it presents an improved inspector-executor transformation to improve network efficiency through runtime aggregation. Second, it presents incremental optimizations to the inspector-executor loop transformation to automatically remove the runtime calls. Finally, the thesis presents a loop scheduling transformation for avoiding network hotspots and the oversubscription of nodes. In contrast to previous work that uses static coalescing, prefetching, limited privatization, and caching, the solutions presented in this thesis cover all aspects of fine-grained communication, including reducing the number of calls generated by the compiler and minimizing the overhead of the inspector-executor optimization. 
A performance evaluation with various microbenchmarks and benchmarks, aiming at predicting scaling and absolute performance numbers of a Power 775 machine, indicates that applications with regular accesses can achieve up to 180% of the performance of hand-optimized versions, while in applications with irregular accesses the transformations are expected to yield from 1.12X up to 6.3X speedup. The loop scheduling shows performance gains of 3% to 25% for the NAS FT and bucket-sort benchmarks, and up to 3.4X speedup for the microbenchmarks.
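
The inspector-executor idea with runtime aggregation can be sketched in a few lines. This is a toy model and not the thesis's compiler transformation or any real PGAS runtime API: the class `BlockDistributedArray`, its `get` and `get_many` methods, and the message counter are all hypothetical, and every access is treated as remote for simplicity. The inspector pass records which indices the loop will read and groups them by owning node; one aggregated message per owner then replaces one message per element; the executor pass runs the original loop against the locally buffered values.

```python
from collections import defaultdict


class BlockDistributedArray:
    """Hypothetical stand-in for a shared array; counts simulated messages."""

    def __init__(self, data, num_nodes):
        self.data = list(data)
        self.block = -(-len(self.data) // num_nodes)  # ceiling division
        self.messages = 0

    def owner(self, i):
        return i // self.block  # block distribution: node that holds index i

    def get(self, i):
        self.messages += 1  # one fine-grained message per element
        return self.data[i]

    def get_many(self, indices):
        self.messages += 1  # one aggregated message for the whole batch
        return {i: self.data[i] for i in indices}


def gather_fine_grained(arr, idx):
    return [arr.get(i) for i in idx]


def gather_inspector_executor(arr, idx):
    # Inspector: record the accesses and group them by owning node.
    by_owner = defaultdict(set)
    for i in idx:
        by_owner[arr.owner(i)].add(i)
    # Communication: a single aggregated request per owner.
    local = {}
    for needed in by_owner.values():
        local.update(arr.get_many(sorted(needed)))
    # Executor: run the original loop against the local buffer.
    return [local[i] for i in idx]


if __name__ == "__main__":
    idx = [5, 80, 6, 30, 81, 5]
    a, b = BlockDistributedArray(range(100), 4), BlockDistributedArray(range(100), 4)
    print(gather_fine_grained(a, idx), a.messages)
    print(gather_inspector_executor(b, idx), b.messages)
```

In this toy setting the message count drops from one per access to one per owning node. The inspector pass itself adds overhead, which is the cost the abstract says its later optimizations aim to reduce.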