35 research outputs found

    Scalable Applications on Heterogeneous System Architectures: A Systematic Performance Analysis Framework

    Get PDF
    The efficient parallel execution of scientific applications is a key challenge in high-performance computing (HPC). With growing parallelism and heterogeneity of compute resources as well as increasingly complex software, performance analysis has become an indispensable tool in the development and optimization of parallel programs. This thesis presents a framework for systematic performance analysis of scalable, heterogeneous applications. Based on event traces, it automatically detects the critical path and inefficiencies that result in waiting or idle time, e.g. due to load imbalances between parallel execution streams. As a prerequisite for the analysis of heterogeneous programs, this thesis specifies inefficiency patterns for computation offloading. Furthermore, an essential contribution was made to the development of tool interfaces for OpenACC and OpenMP, which enable a portable data acquisition and a subsequent analysis for programs with offload directives. At present, these interfaces are already part of the latest OpenACC and OpenMP API specification. The aforementioned work, existing preliminary work, and established analysis methods are combined into a generic analysis process, which can be applied across programming models. Based on the detection of wait or idle states, which can propagate over several levels of parallelism, the analysis identifies wasted computing resources and their root cause as well as the critical-path share for each program region. Thus, it determines the influence of program regions on the load balancing between execution streams and the program runtime. The analysis results include a summary of the detected inefficiency patterns and a program trace, enhanced with information about wait states, their cause, and the critical path. In addition, a ranking, based on the amount of waiting time a program region caused on the critical path, highlights program regions that are relevant for program optimization. The scalability of the proposed performance analysis and its implementation is demonstrated using High-Performance Linpack (HPL), while the analysis results are validated with synthetic programs. A scientific application that uses MPI, OpenMP, and CUDA simultaneously is investigated in order to show the applicability of the analysis

    Development of a Night Vision Goggle Heads Up Display For Paratrooper Guidance

    Get PDF
    This thesis provides the proof of concept for the development and implementation of a Global Positioning System (GPS) display via Night Vision Goggles (NVG) Heads-Up Display (HUD) for paratroopers. The system has been designed for soldiers who will be able to utilize the technology in the form of a processing system worn in an ammo pouch and displayed via NVG HUD as a tunnel in the sky. The tunnel in the sky display design is essentially a series of boxes displayed within the goggle\u27s HUD leading the paratrooper to the desired Landing Zone (LZ). The algorithm developed receives GPS and inertial sensor data (both position and attitude), and displays the guidance information in the paratrooper\u27s NVG HUD as the tunnel in the sky. The primary goal of the project is to provide a product which allows military personnel to reach a desired LZ in obscured visibility conditions such as darkness, clouds, smoke, and other unforeseen situations. This allows missions to be carried out around the clock, even in adverse visibility conditions which would normally halt operations

    Reusable Reentry Satellite (RRS) system design study

    Get PDF
    The Reusable Reentry Satellite (RRS) is intended to provide investigators in several biological disciplines with a relatively inexpensive method to access space for up to 60 days with eventual recovery on Earth. The RRS will permit totally intact, relatively soft, recovery of the vehicle, system refurbishment, and reflight with new and varied payloads. The RRS is to be capable of three reflights per year over a 10-year program lifetime. The RRS vehicle will have a large and readily accessible volume near the vehicle center of gravity for the Payload Module (PM) containing the experiment hardware. The vehicle is configured to permit the experimenter late access to the PM prior to launch and rapid access following recovery. The RRS will operate in one of two modes: (1) as a free-flying spacecraft in orbit, and will be allowed to drift in attitude to provide an acceleration environment of less than 10(exp -5) g. the acceleration environment during orbital trim maneuvers will be less than 10(exp -3) g; and (2) as an artificial gravity system which spins at controlled rates to provide an artificial gravity of up to 1.5 Earth g. The RRS system will be designed to be rugged, easily maintained, and economically refurbishable for the next flight. Some systems may be designed to be replaced rather than refurbished, if cost effective and capable of meeting the specified turnaround time. The minimum time between recovery and reflight will be approximately 60 days. The PMs will be designed to be relatively autonomous, with experiments that require few commands and limited telemetry. Mass data storage will be accommodated in the PM. The hardware development and implementation phase is currently expected to start in 1991 with a first launch in late 1993

    System-level power management using online machine learning for prediction and adaptation

    No full text
    Nowadays embedded devices have the need to be portable, battery powered and high performance. This need for high performance makes power management a matter of critical priority. Power management algorithms exist, but most of the approaches focus on an energy-performance trade-off oblivious to the applications running on the system. Others are application-specific and their solution cannot be applied to other applications.This work proposes Shepherd, a cross-layer runtime management system for reduction of energy consumption whilst offering soft real-time performance. It is cross-layer because it takes the performance requirements from the application, and learns to adjust the power management knobs to provide the expected performance at the minimum cost of energy. Shepherd is implemented as a Linux governor running at OS level, this layer offers a low-overhead interface to change the CPU voltage and frequency dynamically.As opposed to the reactive behaviour of Linux Governors, Shepherd adapts to the application-specific performance requirements dynamically, and proactively selects the power state that fulfils these requirements while consuming the least power. Proactiveness is achieved by using AEWMA for adapting to the upcoming workload. These adaptations are facilitated using a model-free reinforcement learning algorithm, that once it learns the optimal decisions it starts exploiting them. This work enables Shepherd to work with different applications. A programming framework was designed to allow programmers to develop their applications to be power-aware, by enabling them to send their performance requirements and annotations to Shepherd and provide the cross-layer soft real-time performance desired.Shepherd is implemented within the Linux Kernel 3.7.10, interfacing with the application and hardware to select an appropriate voltage-frequency control for the executing application. The performance of Shepherd is demonstrated on an ARM Cortex-A8 processor. Experiments conducted with multimedia applications demonstrate that Shepherd minimises energy consumption by up to 30% against existing Governors. Also, the framework has been used to adapt example applications to work with Shepherd, achieving 60% energy savings compared to the existing approaches

    Run-time management for future MPSoC platforms

    Get PDF
    In recent years, we are witnessing the dawning of the Multi-Processor Systemon- Chip (MPSoC) era. In essence, this era is triggered by the need to handle more complex applications, while reducing overall cost of embedded (handheld) devices. This cost will mainly be determined by the cost of the hardware platform and the cost of designing applications for that platform. The cost of a hardware platform will partly depend on its production volume. In turn, this means that ??exible, (easily) programmable multi-purpose platforms will exhibit a lower cost. A multi-purpose platform not only requires ??exibility, but should also combine a high performance with a low power consumption. To this end, MPSoC devices integrate computer architectural properties of various computing domains. Just like large-scale parallel and distributed systems, they contain multiple heterogeneous processing elements interconnected by a scalable, network-like structure. This helps in achieving scalable high performance. As in most mobile or portable embedded systems, there is a need for low-power operation and real-time behavior. The cost of designing applications is equally important. Indeed, the actual value of future MPSoC devices is not contained within the embedded multiprocessor IC, but in their capability to provide the user of the device with an amount of services or experiences. So from an application viewpoint, MPSoCs are designed to ef??ciently process multimedia content in applications like video players, video conferencing, 3D gaming, augmented reality, etc. Such applications typically require a lot of processing power and a signi??cant amount of memory. To keep up with ever evolving user needs and with new application standards appearing at a fast pace, MPSoC platforms need to be be easily programmable. Application scalability, i.e. the ability to use just enough platform resources according to the user requirements and with respect to the device capabilities is also an important factor. Hence scalability, ??exibility, real-time behavior, a high performance, a low power consumption and, ??nally, programmability are key components in realizing the success of MPSoC platforms. The run-time manager is logically located between the application layer en the platform layer. It has a crucial role in realizing these MPSoC requirements. As it abstracts the platform hardware, it improves platform programmability. By deciding on resource assignment at run-time and based on the performance requirements of the user, the needs of the application and the capabilities of the platform, it contributes to ??exibility, scalability and to low power operation. As it has an arbiter function between different applications, it enables real-time behavior. This thesis details the key components of such an MPSoC run-time manager and provides a proof-of-concept implementation. These key components include application quality management algorithms linked to MPSoC resource management mechanisms and policies, adapted to the provided MPSoC platform services. First, we describe the role, the responsibilities and the boundary conditions of an MPSoC run-time manager in a generic way. This includes a de??nition of the multiprocessor run-time management design space, a description of the run-time manager design trade-offs and a brief discussion on how these trade-offs affect the key MPSoC requirements. This design space de??nition and the trade-offs are illustrated based on ongoing research and on existing commercial and academic multiprocessor run-time management solutions. Consequently, we introduce a fast and ef??cient resource allocation heuristic that considers FPGA fabric properties such as fragmentation. In addition, this thesis introduces a novel task assignment algorithm for handling soft IP cores denoted as hierarchical con??guration. Hierarchical con??guration managed by the run-time manager enables easier application design and increases the run-time spatial mapping freedom. In turn, this improves the performance of the resource assignment algorithm. Furthermore, we introduce run-time task migration components. We detail a new run-time task migration policy closely coupled to the run-time resource assignment algorithm. In addition to detailing a design-environment supported mechanism that enables moving tasks between an ISP and ??ne-grained recon??gurable hardware, we also propose two novel task migration mechanisms tailored to the Network-on-Chip environment. Finally, we propose a novel mechanism for task migration initiation, based on reusing debug registers in modern embedded microprocessors. We propose a reactive on-chip communication management mechanism. We show that by exploiting an injection rate control mechanism it is possible to provide a communication management system capable of providing a soft (reactive) QoS in a NoC. We introduce a novel, platform independent run-time algorithm to perform quality management, i.e. to select an application quality operating point at run-time based on the user requirements and the available platform resources, as reported by the resource manager. This contribution also proposes a novel way to manage the interaction between the quality manager and the resource manager. In order to have a the realistic, reproducible and ??exible run-time manager testbench with respect to applications with multiple quality levels and implementation tradev offs, we have created an input data generation tool denoted Pareto Surfaces For Free (PSFF). The the PSFF tool is, to the best of our knowledge, the ??rst tool that generates multiple realistic application operating points either based on pro??ling information of a real-life application or based on a designer-controlled random generator. Finally, we provide a proof-of-concept demonstrator that combines these concepts and shows how these mechanisms and policies can operate for real-life situations. In addition, we show that the proposed solutions can be integrated into existing platform operating systems

    System-level modeling and analysis of multimedia-soc platforms

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    EKKO: an open-source RISC-V soft-core microcontroller

    Get PDF
    Dissertação de mestrado em Engenharia Eletrónica Industrial e Computadores (especialização em Sistemas Embebidos e Computadores)Com o surgimento da Internet das Coisas (IoT em inglês) nos últimos anos, o número de “coisas” conectadas está a crescer a um ritmo bastante rápido. Estes dispositivos tornaram-se rapidamente parte do nosso dia a dia e já podem ser encontrados nos mais diversos domínios de aplicação, tais como, telecomunicações, saúde, agricultura, e automação industrial. Devido a este crescimento exponencial, a demanda por sistemas embebidos é cada vez maior, trazendo assim diversos desafios no seu desenvolvimento. De todos os desafios, o time-to-market e os custos de desenvolvimento são de inegável importância, logo, a escolha de uma plataforma de desenvolvimento adequada é essencial no desenho destes sistemas. Devido a este novo paradigma, o grupo de investigação da Universidade do Minho onde esta dissertação se insere tem desenvolvido aplicações neste domínio. No entanto, as atuais plataformas de desenvolvimento utilizadas são complexas, têm custos associados e são de código fechado. Por estas razões, o grupo de investigação tem interesse em ter a sua própria plataforma de desenvolvimento. De modo a solucionar os problemas enumerados acima, esta dissertação tem como objetivo desenvolver uma plataforma de desenvolvimento tanto para hardware como para software. A plataforma deve ser simples de utilizar e open-source, reduzindo assim os custos e a tornando a gestão de licenças mais simples. Para além disto, o facto de o sistema ser de código aberto faz também com que este possa ser facilmente estendido e customizado de acordo com os requisitos da aplicação. Neste sentido, esta dissertação apresenta um soft-core microcontroller, o qual contem um processador RISC-V, uma RAM, uma unidade de depuração, um temporizador, um periférico I2C e um barra mento AXI. Em adição, este contem também um kit de desenvolvimento de software (SDK em inglês), o qual inclui um depurador, a opção de utilizar o sistema operativo Azure RTOS ThreadX, e outras ferramentas importantes, tornando o ciclo de desenvolvimento mais fácil, rápido e seguro.With the advent of the Internet of Things (IoT) in most recent years, the number of connected “things” is increasing quickly. These devices rapidly became part of our daily lives and can be found in the most different applications domains, such as telecommunications, health care, agriculture and industrial automation. With this exponential growth, the demand for embedded devices is increasing, bringing several challenges to the development of these systems. From these challenges, the time-to-market and development costs are undeniable extremely important. Thus, choosing a suitable development platform is essential when designing an embedded system. Due to this new paradigm, the University of Minho research group where this dissertation fits has been developing applications in this domain. However, the current development platforms are complex, have associated costs and are closed-source. For these reasons, the research group has interesting in having its development platform. To solve these problems, this dissertation aims to build a development platform for both hardware and software. The platform must be simple and open-source, reducing development costs and simplifying license management. Besides, due to its open nature, it will also be easier to extend and modify the system according to the application’s needs. In this context, this dissertation presents EKKO, an open-source soft-core microcontroller that contains a RISC-V core, an on-chip RAM, a debug unit, a timer and an I2C peripheral, and an AXI bus. In addition, it also contains a Software Development Kit (SDK), which includes a debugger, the option to use Azure RTOS ThreadX, and other crucial tools, turning the development cycle more accessible, faster and safer
    corecore