78 research outputs found

    Efficient multicore implementation of NAS benchmarks with FastFlow

    The thesis describes an efficient implementation of a subset of the NPB algorithms for multicore architectures with the FastFlow framework. The NPB is a specification of numeric benchmarks used to compare different environments and implementations. FastFlow is a framework, targeted at shared-memory systems, that supports parallel algorithms based on structured parallel programming. Starting from the NPB specification, the thesis selects a subset of the NPB algorithms and derives an efficient implementation of both the sequential and parallel algorithms through FastFlow. Finally, experiments on state-of-the-art multicore architectures compare the derived code with the reference implementation provided by the NPB authors.

    Support for Independent Octave Computations on Shared-Memory Multiprocessors

    The objective of this project is to develop a library for Octave capable of parallelizing functions on shared-memory systems; the main problem the library addresses is optimization algorithms for functions without derivatives. Octave is open source, which means anyone can create a library for any problem they may have, so several already exist, including libraries that run Octave functions in parallel, such as the 'parallel' package. However, that package was developed mainly to parallelize across different machines in distributed systems, resulting in a very simple shared-memory parallelization scheme in which the processes are created, the function is run once, and the results are returned. Therefore, for this project I created a library from scratch. It differs from the existing one because it works more like a worker pool: it creates the required number of workers, which wait indefinitely to receive new jobs. In the context of this project, a job is a new set of input values that the workers receive in order to run the function. The library supports functions implemented in Octave (interpreted functions) and functions implemented in C/C++ (compiled functions) built into dynamic libraries, with two different interfaces that are used in the same way.
    To implement the library, I used octfiles, which are pieces of C++ code compiled against the Octave API that can be used as regular functions in Octave. In addition, I used standard C++ facilities and Linux system calls for the parallelization: C++11 threads in the interface for compiled functions and the Linux fork system call in the interface for interpreted functions. To verify the effectiveness and correctness of the library, I modified an optimization algorithm named BoostDMS so that the step that evaluates the function on the various generated input values runs concurrently through my library instead of sequentially. I ran the modified algorithm on machines with multicore processors and obtained positive results: the computation time of the algorithm tends to decrease as the number of workers increases, with the greatest gains when the number of workers is equal to or greater than the number of generated values to evaluate.

    Software for Exascale Computing - SPPEXA 2016-2019

    This open access book summarizes the research done and the results obtained in the second funding phase of the Priority Program 1648 "Software for Exascale Computing" (SPPEXA) of the German Research Foundation (DFG), presented at the SPPEXA Symposium in Dresden during October 21-23, 2019. In that respect, it both represents a continuation of Vol. 113 in Springer's series Lecture Notes in Computational Science and Engineering, the corresponding report of SPPEXA's first funding phase, and provides an overview of SPPEXA's contributions towards exascale computing in today's supercomputer technology. The individual chapters address one or more of the research directions (1) computational algorithms, (2) system software, (3) application software, (4) data management and exploration, (5) programming, and (6) software tools. The book has an interdisciplinary appeal: scholars from computational sub-fields in computer science, mathematics, physics, or engineering will find it of particular interest.

    A data dependency recovery system for a heterogeneous multicore processor

    Deeper pipelining has long been used to increase processor performance, but it has proven increasingly difficult to improve further. In an attempt to deliver enhanced performance at lower power requirements, semiconductor microprocessor manufacturers have progressively adopted chip-multicore processors. Existing research has relied on a common technique known as thread-level speculation, which attempts to compute results before the actual inputs are known. However, thread-level speculation impacts operation latency and circuit timing, and confounds data-cache behaviour and code generation in the compiler. We describe a software framework, codenamed Lyuba, that handles low-level data hazards and automatically recovers the application from them, without programmer or speculation intervention, on an asymmetric chip-multicore processor. Determining correct execution of multiple threads when data hazards occur on conventional symmetric chip-multicore processors is a significant and ongoing challenge, yet there has been very little focus on asymmetric (heterogeneous) processors running applications with complex data dependencies. The purpose of this thesis is to: (i) describe the development of a software framework for an asymmetric (heterogeneous) chip-multicore processor; (ii) present an optimal software control of hardware for distributed processing and recovery from violations; (iii) provide performance results for five applications using three datasets. Applications with a small dataset showed an improvement of 17%, and a larger dataset showed an improvement of 16%, giving an overall 11% improvement in performance.

    Optimization of Pattern Matching Algorithms for Multi- and Many-Core Platforms

    Image and video compression play a major role in the world today, allowing the storage and transmission of large volumes of multimedia content. However, processing this information requires substantial computational resources, so improving the computational performance of these compression algorithms is very important. The Multidimensional Multiscale Parser (MMP) is a pattern-matching-based compression algorithm for multimedia content, namely images, achieving high compression ratios while maintaining good image quality (Rodrigues et al. [2008]). However, compared with other existing algorithms, MMP takes considerable time to execute. Two parallel GPU implementations were therefore proposed by Ribeiro [2016] and Silva [2015], in CUDA and OpenCL, respectively. In this dissertation, to complement that work, we propose two parallel versions that run the MMP algorithm on the CPU: one resorting to OpenMP and another that converts the existing OpenCL-GPU code into OpenCL-CPU. The proposed solutions improve the computational performance of MMP by 3× and 2.7×, respectively. High Efficiency Video Coding (HEVC/H.265) is the most recent standard for image and video compression. Its impressive compression performance makes it a target for many adaptations, particularly for holoscopic (light field) image/video processing. Some of the proposed modifications to encode this new multimedia content are based on geometry-based disparity compensation (SS), developed by Conti et al. [2014], and a Geometric Transformations (GT) module, proposed by Monteiro et al. [2015]. These HEVC-based compression algorithms for holoscopic images implement a specific search for similar micro-images that is more efficient than the one performed by HEVC, but considerably slower to run. To enable better execution times, we chose the OpenCL API as the GPU-enabling language to increase the module's performance. With its most costly setting, we are able to reduce the GT module's execution time from 6.9 days to less than 4 hours, effectively attaining a speedup of 45×.

    Concurrency Platforms for Real-Time and Cyber-Physical Systems

    Parallel processing is an important way to satisfy the increasingly demanding computational needs of modern real-time and cyber-physical systems, but existing parallel computing technologies primarily emphasize high-throughput and average-case performance metrics, which are largely unsuitable for direct application to real-time, safety-critical contexts. This work contrasts two concurrency platforms designed to achieve predictable worst case parallel performance for soft real-time workloads with millisecond periods and higher. One of these is then the basis for the CyberMech platform, which enables parallel real-time computing for a novel yet representative application called Real-Time Hybrid Simulation (RTHS). RTHS combines demanding parallel real-time computation with real-time simulation and control in an earthquake engineering laboratory environment, and results concerning RTHS characterize a reasonably comprehensive survey of parallel real-time computing in the static context, where the size, shape, timing constraints, and computational requirements of workloads are fixed prior to system runtime. Collectively, these contributions constitute the first published implementations and evaluations of general-purpose concurrency platforms for real-time and cyber-physical systems, explore two fundamentally different design spaces for such systems, and successfully demonstrate the utility and tradeoffs of parallel computing for statically determined real-time and cyber-physical systems

    Development of a PC-Based Object-Oriented Real-Time Robotics Controller

    The industrial world of robotics requires leading-edge controllers to match the speed of new manipulators. At the University of Waterloo, a three-degree-of-freedom ultra-high-speed cable-based robot called Deltabot was created. In order to improve the performance of the Deltabot, a new controller called the QNX Multi-Axis Robotic Controller (QMARC) was developed. QMARC is a PC-based controller built to replace the existing commercial controller, PMAC, manufactured by Delta Tau Data Systems. Although the PMAC has its own real-time processor, its rigid and complex internal structure makes it difficult to apply advanced control algorithms and interpolation methods. Adding unconventional hardware to the PMAC, such as a camera and vision system, is also quite challenging. With the development of QMARC, the flexibility issue of the controller is resolved. QMARC's open-source, object-oriented software structure allows the addition of new control and interpolation techniques as required. In addition, the main Controller process is decoupled from the hardware, so that any hardware change does not affect the main controller, only the hardware drivers. QMARC is also equipped with a user-friendly graphical user interface and many safety protocols, making it a safe and easy-to-use system. Experimental tests have proven QMARC to be a safe and reliable controller. The stable software foundation created by QMARC will allow future development of the controller as research on the Deltabot progresses.

    Embedded System Design

    A unique feature of this open access textbook is to provide a comprehensive introduction to the fundamental knowledge in embedded systems, with applications in cyber-physical systems and the Internet of things. It starts with an introduction to the field and a survey of specification models and languages for embedded and cyber-physical systems. It provides a brief overview of hardware devices used for such systems and presents the essentials of system software for embedded systems, including real-time operating systems. The author also discusses evaluation and validation techniques for embedded systems and provides an overview of techniques for mapping applications to execution platforms, including multi-core platforms. Embedded systems have to operate under tight constraints and, hence, the book also contains a selected set of optimization techniques, including software optimization techniques. The book closes with a brief survey on testing. This fourth edition has been updated and revised to reflect new trends and technologies, such as the importance of cyber-physical systems (CPS) and the Internet of things (IoT), the evolution of single-core processors to multi-core processors, and the increased importance of energy efficiency and thermal issues