
    Beyond shared memory loop parallelism in the polyhedral model

    Spring 2013. Includes bibliographical references. With the introduction of multi-core processors, motivated by power and energy concerns, parallel processing has become mainstream. Parallel programming is much more difficult than sequential programming because of its non-deterministic nature and the concurrency bugs that arise from it. One solution is automatic parallelization, where the compiler alone is responsible for efficiently parallelizing sequential programs. However, automatic parallelization is very difficult, and only a handful of successful techniques are available even after decades of research. Automatic parallelization for distributed memory architectures is more problematic still, since it requires explicit handling of data partitioning and communication: data must be partitioned among multiple nodes that do not share memory, so the original memory allocation of the sequential program cannot be used directly. One of the main contributions of this dissertation is a set of techniques for generating distributed memory parallel code with parametric tiling. Our approach builds on important contributions to the polyhedral model, a mathematical framework for reasoning about program transformations. We show that many affine control programs can be uniformized using only simple techniques. Being able to assume uniform dependences significantly simplifies distributed memory code generation and also enables parametric tiling. Our approach is implemented in the AlphaZ system, a system for prototyping analyses, transformations, and code generators in the polyhedral model. The key features of AlphaZ are memory re-allocation and an explicit representation of reductions. We evaluate our approach on a collection of polyhedral kernels from the PolyBench suite and show that it scales as well as PLuTo, a state-of-the-art shared memory automatic parallelizer based on the polyhedral model. Automatic parallelization is only one way of dealing with the non-determinism of parallel programming, and it leaves the difficulty entirely to the compiler. Another approach is to develop novel parallel programming languages. These languages, such as X10, aim to provide a highly productive parallel programming environment by building parallelism into the language design. Even in these languages, however, parallel bugs remain an important issue that hinders programmer productivity. Another contribution of this dissertation is an extension of array dataflow analysis to a subset of X10 programs. We apply the result of the dataflow analysis to statically guarantee determinism. Static guarantees can significantly increase programmer productivity by catching questionable implementations at compile time, or even while programming.
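
    To make the notion of parametric tiling concrete, the sketch below tiles a small uniform-dependence loop nest with tile sizes that stay runtime parameters rather than compile-time constants. The kernel and parameter names are invented for illustration; this is not AlphaZ or PLuTo output.

```python
# Minimal sketch of parametric tiling (illustrative only, not AlphaZ output).
# A 2D loop nest with uniform dependences is executed tile by tile; the
# tile sizes (ti, tj) remain runtime parameters.

def tiled_stencil(a, n, ti, tj):
    """Apply the uniform-dependence update a[i][j] += a[i-1][j] + a[i][j-1]
    over a tiled iteration space. Because both dependences ((1,0) and (0,1))
    have non-negative components, the lexicographic tile order is legal for
    any tile sizes ti, tj >= 1."""
    for it in range(1, n, ti):          # tile origins along i
        for jt in range(1, n, tj):      # tile origins along j
            for i in range(it, min(it + ti, n)):
                for j in range(jt, min(jt + tj, n)):
                    a[i][j] += a[i - 1][j] + a[i][j - 1]
    return a

if __name__ == "__main__":
    n = 8
    grid = [[1] * n for _ in range(n)]
    tiled_stencil(grid, n, ti=3, tj=2)  # tile sizes chosen at run time
```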

    HDeepRM: Deep Reinforcement Learning for Workload Management in Heterogeneous Clusters

    ABSTRACT: High Performance Computing (HPC) environments offer users computational capability as a service. They are constituted by computing clusters: groups of resources available for processing jobs sent by the users. Heterogeneous configurations of these clusters provide resources suited to a wider spectrum of workloads than traditional homogeneous approaches, which in turn improves the computational and energy efficiency of the service. Scheduling of resources for incoming jobs is undertaken by a workload manager following an established policy. Classic policies were developed for homogeneous environments, and the literature has focused on improving job selection policies; in heterogeneous configurations, however, resource selection is just as relevant for optimizing the service offered. The complexity of scheduling policies grows with the number of resources and the degree of heterogeneity in the system. Deep Reinforcement Learning (DRL) has recently been evaluated in homogeneous workload management scenarios as an alternative for dealing with such complex patterns. It introduces an artificial agent that learns to estimate the optimal scheduling policy for a given system. In this thesis, HDeepRM, a novel framework for the study of DRL agents in heterogeneous clusters, is designed, implemented, tested, and distributed. It leverages a state-of-the-art simulator and offers users a clean interface for developing their own bespoke agents, as well as for evaluating them before going into production. Evaluations demonstrate the validity of the framework: two agents based on well-known reinforcement learning algorithms are implemented on top of HDeepRM, and their comparison with classic policies shows the research potential of this area for the scientific community.
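
    To make the agent/environment interaction concrete, below is a minimal tabular stand-in for a learning scheduling agent: it learns an estimated value for placing a job class on a resource class from rewards the simulator would report. All class and method names here are illustrative assumptions, not HDeepRM's actual API, and a real DRL agent would replace the table with a neural network.

```python
import random

class EpsilonGreedyAgent:
    """Tabular stand-in for a DRL scheduling agent (hypothetical API):
    estimates the value of scheduling a job type onto a resource type."""

    def __init__(self, n_job_types, n_resource_types, eps=0.1, lr=0.5):
        self.q = [[0.0] * n_resource_types for _ in range(n_job_types)]
        self.eps, self.lr = eps, lr

    def act(self, job_type):
        if random.random() < self.eps:                 # explore
            return random.randrange(len(self.q[job_type]))
        row = self.q[job_type]                         # exploit best estimate
        return row.index(max(row))

    def learn(self, job_type, resource, reward):
        # One-step update toward the observed reward (e.g., negative
        # slowdown or energy cost reported by the simulator).
        self.q[job_type][resource] += self.lr * (reward - self.q[job_type][resource])

# Toy interaction: the reward favors resource 1 for job type 0.
agent = EpsilonGreedyAgent(n_job_types=2, n_resource_types=3)
for _ in range(200):
    r = agent.act(0)
    agent.learn(0, r, reward=1.0 if r == 1 else 0.0)
```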

    Visualization of unsteady computational fluid dynamics

    A brief summary of the computing environment used for calculating three-dimensional unsteady Computational Fluid Dynamics (CFD) results is presented. This environment requires a supercomputer; massively parallel processors (MPPs) and clusters of workstations acting as a single MPP (by working concurrently on the same task) provide the required computational bandwidth for CFD calculations of transient problems. The cluster of reduced instruction set computer (RISC) workstations is a recent development enabled by the low cost and high performance that workstation vendors provide. With the proper software, such a cluster can act as a multiple instruction/multiple data (MIMD) machine. A new set of software tools is being designed specifically to address visualizing 3D unsteady CFD results in these environments. Three user manuals for the parallel version of Visual3, pV3, revision 1.00, make up the bulk of this report.

    Specialised global methods for binocular and trinocular stereo matching

    The problem of estimating depth from two or more images is a fundamental problem in computer vision, commonly referred to as stereo matching. Its applications range from 3D reconstruction to autonomous robot navigation. Stereo matching is particularly attractive for real-life applications because of its simplicity and low cost, especially compared to costly laser range finders/scanners, as in the case of 3D reconstruction. However, stereo matching has its own unique problems, such as convergence issues in the optimisation methods and the difficulty of finding matches accurately under changing lighting conditions, occluded areas, noisy images, etc. It is precisely because of these challenges that stereo matching continues to be a very active field of research. In this thesis we develop a binocular stereo matching algorithm that works with rectified images (i.e. scan lines in the two images are aligned) to find the real-valued displacement (i.e. disparity) that best matches two pixels. To accomplish this, our research has developed techniques to efficiently explore a 3D space and compare potential matches, together with an inference algorithm to assign the optimal disparity to each pixel in the image. The proposed approach is also extended to the trinocular case. In particular, the trinocular extension deals with a binocular pair of images captured at the same time and a third image displaced in time. This approach is referred to as t+1 trinocular stereo matching, and it poses the challenge of recovering camera motion, which is addressed by a novel technique we call baseline recovery. We have extensively validated our binocular and trinocular algorithms using the well-known KITTI and Middlebury data sets. Their performance is consistent across data sets and ranks among the top performers on the KITTI and Middlebury benchmarks. The time-stamped results of our algorithms as reported in this thesis can be found at:
    • LCU on Middlebury V2 (https://web.archive.org/web/20150106200339/http://vision.middlebury.edu/stereo/eval/)
    • LCU on Middlebury V3 (https://web.archive.org/web/20150510133811/http://vision.middlebury.edu/stereo/eval3/)
    • LPU on Middlebury V3 (https://web.archive.org/web/20161210064827/http://vision.middlebury.edu/stereo/eval3/)
    • LPU on KITTI 2012 (https://web.archive.org/web/20161106202908/http://cvlibs.net/datasets/kitti/eval_stereo_flow.php?benchmark=stereo)
    • LPU on KITTI 2015 (https://web.archive.org/web/20161010184245/http://cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=stereo)
    • TBR on KITTI 2012 (https://web.archive.org/web/20161230052942/http://cvlibs.net/datasets/kitti/eval_stereo_flow.php?benchmark=stereo)
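
    As a baseline illustration of the matching-cost idea on rectified images, here is a minimal local block-matching sketch (sum of absolute differences over integer disparities). It is not the thesis's LCU/LPU algorithms, which are global methods producing real-valued disparities; it only shows why rectification reduces matching to a 1D search along each scan line.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sad_disparity(left, right, max_disp=16, win=5):
    """Integer-disparity SAD block matching on rectified grayscale images.
    For each pixel, pick the disparity minimising the window-aggregated
    absolute difference between left(x) and right(x - d)."""
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int32)
    best = np.full((h, w), np.inf)
    for d in range(max_disp):
        # Rectification aligns scan lines, so the search is 1D: compare
        # left(x) against the right image shifted by d pixels.
        shifted = np.zeros_like(right, dtype=np.float64)
        shifted[:, d:] = right[:, : w - d]
        cost = np.abs(left.astype(np.float64) - shifted)
        agg = uniform_filter(cost, size=win)  # aggregate cost over a window
        better = agg < best
        disp[better] = d
        best[better] = agg[better]
    return disp

# Toy usage: a right view synthesised by shifting the left view 4 pixels.
rng = np.random.default_rng(0)
left = rng.random((60, 80))
right = np.roll(left, -4, axis=1)
d = sad_disparity(left, right, max_disp=8)  # interior pixels -> 4
```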

    Energy-aware task scheduling on heterogeneous computing systems with time constraint

    As a technique for achieving high performance in parallel and distributed heterogeneous computing systems, task scheduling has attracted considerable interest. In this paper, we propose an effective Cuckoo Search algorithm based on a Gaussian random walk and an Adaptive discovery probability, combined with a cost-to-time ratio Modification strategy (GACSM), to address task scheduling on heterogeneous multiprocessor systems using Dynamic Voltage and Frequency Scaling (DVFS). First, to overcome the cuckoo search algorithm's poor exploitation performance, we use chaos variables to initialize the population and maintain its diversity, a Gaussian random walk strategy to balance the exploration and exploitation capabilities of the algorithm, and an adaptive discovery probability strategy to further improve population diversity. Then, we apply the improved Cuckoo Search (CS) algorithm to assign tasks to resources, and a widely used downward-rank heuristic to find the corresponding scheduling sequence. Finally, we apply a cost-to-time ratio improvement strategy to further improve the performance of the improved CS algorithm. Extensive experiments are conducted to evaluate the effectiveness and efficiency of our method. The results validate our approach and show its superiority in comparison with state-of-the-art methods.
    Zexi Deng, Zihan Yan, Huimin Huang, Hong Shen, et al.
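
    The skeleton below marks where the three modifications described above plug into a plain cuckoo search: chaotic initialisation, a Gaussian random walk in place of a pure Lévy flight, and a discovery probability that adapts over iterations. It is an illustrative sketch with invented parameter values, not the GACSM implementation, and it optimises a generic fitness function rather than a DVFS schedule.

```python
import random

def cuckoo_search(fitness, dim, n_nests=20, iters=200,
                  pa=0.25, sigma=0.1, lo=0.0, hi=1.0):
    # (1) Chaotic (logistic-map) initialisation spreads the population.
    x = random.uniform(0.01, 0.99)
    nests = []
    for _ in range(n_nests):
        row = []
        for _ in range(dim):
            x = 4.0 * x * (1.0 - x)          # logistic map, r = 4
            row.append(lo + (hi - lo) * x)
        nests.append(row)
    scores = [fitness(n) for n in nests]

    for t in range(iters):
        # (2) Gaussian random walk around each nest instead of a pure
        # Levy flight, trading raw exploration for better exploitation.
        for i in range(n_nests):
            cand = [min(hi, max(lo, v + random.gauss(0.0, sigma)))
                    for v in nests[i]]
            s = fitness(cand)
            if s < scores[i]:
                nests[i], scores[i] = cand, s
        # (3) Adaptive discovery probability: abandon more nests early
        # (diversity), fewer late (convergence).
        pa_t = pa * (1.0 - t / iters)
        for i in range(n_nests):
            if random.random() < pa_t:
                nests[i] = [random.uniform(lo, hi) for _ in range(dim)]
                scores[i] = fitness(nests[i])

    best = min(range(n_nests), key=lambda i: scores[i])
    return nests[best], scores[best]

# Example: minimise the sphere function in 5 dimensions.
sol, val = cuckoo_search(lambda v: sum(c * c for c in v), dim=5)
```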

    Computer-aided analysis and design of the shape rolling process for producing turbine engine airfoils

    Mild steel (AISI 1018) was selected as the model cold rolling material, and Ti-6Al-4V and Inconel 718 were selected as typical hot rolling and cold rolling alloys, respectively. The flow stress and workability of these alloys were characterized, and the friction factor at the roll/workpiece interface was determined at their respective working conditions by conducting ring tests. Computer-aided mathematical models for predicting metal flow and stresses, and for simulating the shape rolling process, were developed. These models utilized the upper bound and slab methods of analysis, and were capable of predicting the lateral spread, roll separating force, roll torque, and local stresses, strains, and strain rates. This computer-aided design system was also capable of simulating the actual rolling process, and thereby of designing the roll pass schedule for rolling an airfoil or a similar shape.
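
    For orientation, the sketch below computes the first-order flat-rolling estimate of roll separating force that slab-method analyses refine: projected contact length L = sqrt(R * Δh) and force F ≈ w * L * σ̄. This is a standard textbook approximation for flat rolling, not the dissertation's shape-rolling model, and the pass numbers are purely illustrative.

```python
import math

def roll_separating_force(roll_radius_mm, h0_mm, h1_mm,
                          width_mm, mean_flow_stress_mpa):
    """First-order flat-rolling estimate: F ~= w * L * sigma_bar,
    with projected contact length L = sqrt(R * (h0 - h1)).
    Returns the roll separating force in kN."""
    contact_len = math.sqrt(roll_radius_mm * (h0_mm - h1_mm))
    force_n = width_mm * contact_len * mean_flow_stress_mpa  # MPa * mm^2 = N
    return force_n / 1000.0

# Illustrative pass: 150 mm roll radius, 5 mm -> 4 mm reduction,
# 50 mm strip width, 400 MPa mean flow stress.
print(roll_separating_force(150, 5.0, 4.0, 50, 400))  # ~245 kN
```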

    Formal approaches to number in Slavic and beyond (Volume 5)

    The goal of this collective monograph is to explore the relationship between the cognitive notion of number and the various grammatical devices expressing this concept in natural language, with a special focus on Slavic. The book aims to investigate different morphosyntactic and semantic categories, including plurality and number-marking, individuation and countability, cumulativity, distributivity and collectivity, numerals, numeral modifiers and classifiers, as well as other quantifiers. It gathers 19 contributions tackling the main themes from different theoretical and methodological perspectives in order to contribute to our understanding of cross-linguistic patterns in both Slavic and non-Slavic languages.