164 research outputs found

    HAL-ASOS accelerator model: evolutive elasticity by design

    Get PDF
    To address the integration of software threads and hardware accelerators into the Linux Operating System (OS) programming models, an accelerator architecture is proposed, based on micro-programmable hardware system calls, which fully export these resources into the Linux OS user-space through a design-specific virtual file system. The proposed HAL-ASOS accelerator model is split into a user-defined Hardware Task and a parameterizable Hardware Kernel with three differentiated transfer channels, aiming to explore distinct BUS technology interfaces and promote the accelerator to a first-class computing unit. This paper focuses on the Hardware Kernel and mainly its microcode control unit, which will leverage the elasticity to naturally evolve with Linux OS through key differentiating capabilities of field programmable gate arrays (FPGAs) when compared to the state of the art. To comply with the evolutive nature of Linux OS, or any Hardware Task incremental features, the proposed model generates page-faults signaling runtime errors that are handled at the kernel level as part of the virtual file system runtime. To evaluate the accelerator model’s programmability and its performance, a client-side application based on the AES 128-bit algorithm was implemented. Experiments demonstrate a flexible design approach in terms of hardware and software reconfiguration and significant performance increases consistent with rising processing demands or clock design frequencies.This work has been supported by FCT-Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020

    The FEM-2 design method

    Get PDF
    The FEM-2 parallel computer is designed using methods differing from those ordinarily employed in parallel computer design. The major distinguishing aspects are: (1) a top-down rather than bottom-up design process; (2) the design considers the entire system structure in terms of layers of virtual machines; and (3) each layer of virtual machine is defined formally during the design process. The result is a complete hardware/software system design. The basic design method is discussed and the advantages of the method are considered. A status report on the FEM-2 design is included

    Enabling preemptive multiprogramming on GPUs

    Get PDF
    GPUs are being increasingly adopted as compute accelerators in many domains, spanning environments from mobile systems to cloud computing. These systems are usually running multiple applications, from one or several users. However GPUs do not provide the support for resource sharing traditionally expected in these scenarios. Thus, such systems are unable to provide key multiprogrammed workload requirements, such as responsiveness, fairness or quality of service. In this paper, we propose a set of hardware extensions that allow GPUs to efficiently support multiprogrammed GPU workloads. We argue for preemptive multitasking and design two preemption mechanisms that can be used to implement GPU scheduling policies. We extend the architecture to allow concurrent execution of GPU kernels from different user processes and implement a scheduling policy that dynamically distributes the GPU cores among concurrently running kernels, according to their priorities. We extend the NVIDIA GK110 (Kepler) like GPU architecture with our proposals and evaluate them on a set of multiprogrammed workloads with up to eight concurrent processes. Our proposals improve execution time of high-priority processes by 15.6x, the average application turnaround time between 1.5x to 2x, and system fairness up to 3.4x.We would like to thank the anonymous reviewers, Alexan- der Veidenbaum, Carlos Villavieja, Lluis Vilanova, Lluc Al- varez, and Marc Jorda on their comments and help improving our work and this paper. This work is supported by Euro- pean Commission through TERAFLUX (FP7-249013), Mont- Blanc (FP7-288777), and RoMoL (GA-321253) projects, NVIDIA through the CUDA Center of Excellence program, Spanish Government through Programa Severo Ochoa (SEV-2011-0067) and Spanish Ministry of Science and Technology through TIN2007-60625 and TIN2012-34557 projects.Peer ReviewedPostprint (author’s final draft

    Developing a support for FPGAs in the Controller parallel programming model

    Get PDF
    La computación heterogénea se presenta como la solución para conseguir supercomputadores cada vez más rápidos capaces de resolver problemas más grandes y complejos en diferentes áreas de conocimiento. Para ello, integra aceleradores con distintas arquitecturas capaces de explotar las características de los problemas desde distintos enfoques obteniendo, de este modo, un mayor rendimiento. Las FPGAs son hardware reconfigurable, i.e., es posible modificarlas después de su fabricación. Esto permite una gran flexibilidad y una máxima adaptación al problema en cuestión. Además, tienen un consumo energético muy bajo. Todas estas ventajas tienen el gran inconveniente de una más difícil programaci ón mediante los propensos a errores HDLs (Hardware Description Language), tales como Verilog o VHDL, y requisitos de conocimientos avanzados de electrónica digital. En los últimos años los principales fabricantes de FPGAs han enfocado sus esfuerzos en desarrollar herramientas HLS (High Level Synthesis) que permiten programarlas a través de lenguajes de programación de alto nivel estilo C. Esto ha favorecido su adopción por la comunidad HPC y su integración en los nuevos supercomputadores. Sin embargo, el programador aún tiene que ocuparse de aspectos como la gestión de colas de comandos, parámetros de lanzamiento o transferencias de datos. El modelo Controller es una librería que facilita la gestión de la coordinación, comunicación y los detalles de lanzamiento de los kernels en aceleradores hardware. Explota de forma transparente sus modelos de programación nativos, en concreto OpenCL y CUDA, y, por tanto, consigue un alto rendimiento independientemente del compilador. Permite al programador utilizar los distintos recursos hardware disponibles de forma combinada en entornos heterogéneos. Este trabajo extiende el modelo Controller mediante el desarrollo de un backend que permite la integración de FPGAs, manteniendo los cambios sobre la interfaz de usuario al mínimo. A través de los resultados experimentales se comprueba que se consigue una disminución del esfuerzo de programación significativa en comparación con la implementación nativa en OpenCL. Del mismo modo, se consigue un elevado solapamiento entre computación y comunicación y un sobrecoste por el uso de la librería despreciable.Heterogeneous computing appears to be the solution to achieve ever faster computers capable of solving bigger and more complex problems in difierent fields of knowledge. To that end, it integrates accelerators with difierent architectures capable of exploiting the features of problems from difierent perspectives thus achieving higher performance. FPGAs are reconfigurable hardware, i.e., it is possible to modify them after manufacture. This allows great flexibility and maximum adaptability to the given problem. In addition, they have low power consumption. All these advantages have the great objection of more dificult programming with the errorprone HDLs (Hardware Description Language), such as Verilog or VHDL, and the requirement of advanced knowledge of digital electronics. The main FPGA vendors have concentrated on developing HLS (High Level Synthesis) tools that allow to program them with C-like high level programming languages. This favoured their adoption by the HPC community and their integration in new supercomputers. However, the programmer still has to take care of aspects such as management of command queues, launching parameters or data transfers. The Controller model is a library to easily manage the coordination, communication and kernel launching details on hardware accelerators. It transparently exploits their native or vendor specific programming models, namely OpenCL and CUDA, thus enabling the potential performance obtained by using them in a compiler agnostic way. It is intended to enable the programmer to make use of the diferent available hardware resources in combination in heterogeneous environments. This work extends the Controller model through the development of a backend that allows the integration of FPGAs, keeping the changes over the user-facing interface to the minimum. The experimental results validate that a significant decrease in programming effort compared to the native OpenCL implementation is achieved. Similarly, high overlap of computation and communication and a negligible overhead due to the use of the library are attained.Grado en Ingeniería Informátic

    Timing Architecture for ESS

    Get PDF
    Programa Oficial de Doutoramento en Investigación en Tecnoloxías da Información. 5023V01[Resumo] O sistema de temporización é unha compoñente fundamental para o control e sincronización de instalacións industriais e científicas, coma aceleradores de partículas. Nesta tese traballamos na especificación e desenvolvemento do sistema de temporización para a European Spallation Source (ESS), a maior fonte de neutróns actualmente en construción. Abordamos este tra­ ballo a dous niveis: a especificación do sistema de temporización, e a imple­ mentación física de sistemas de control empregando circuítos reconfigurables. Con respecto á especificación do sistema de temporización, deseñamos e implementamos a configuración do protocolo de temporización para cumprir cos requirimentos do ESS e ideamos un modo de operación e unha aplicación para a configuración e control do sistema de temporización. Tamén presentamos unha ferramenta e unha metodoloxía para imple­ mentar sistemas de control empregando FPGAs, coma os nodos do sistema de temporización. ámbalas <lúas están baseadas en statecharts, unha repre­ sentación gráfica de sistemas que expande o concepto de máquinas de estados finitos, orientada a sistemas que necesitan ser reconfigurados rápidamente en múltiples localizacións minimizando a posibilidade de erros. A ferramenta crea automaticamente código VHDL sintetizable a partir do statechart do sistema. A metodoloxía explica o procedemento para implementar o state­ chart como unha arquitectura microprogramada en FPGAs.[Resumen] El sistema de temporización es un componente fundamental para el control y sincronización de instalaciones industriales y científicas, como aceleradores e partículas. En esta tesis trabajamos en la especificación y desarrollo el sistema de temporización para la European Spallation Source (ESS), la mayor fuente de neutrones actualmente en construcción. Abordamos este trabajo en dos niveles: la especificación del sistema de temporización, y la mplementación física de sistemas de control empleando circuitos reconfig­ rables. Con respecto a la especificación del sistema de temporización, diseñamos e implementamos la configuración del protocolo de temporización para cumplir on los requisitos de ESS e ideamos un modo de operación y una aplicación ara la configuración y control del sistema de temporización. También presentamos una herramienta y una metodología para imple­ entar sistemas de control empleando FPGAs, como los nodos del sistema e temporización. Ambas están basadas en statecharts) una representación gráfica de sistemas que expande el concepto de máquinas de estados fini­ os, orientada a sistemas que necesitan ser reconfigurados rápidamente en últiples localizaciones minimizando la posibilidad de errores. La her­ramienta crea automáticamente código VHDL sintetizable a partir del state­chart del sistema. La metodología explica el procedimiento para implemen­tar el statechart como una arquitectura microprogramada en FPGAs.[Abstract] The timing system is a key component for the control and synchronization of industrial and scientific facilities, such as particle accelerators. In this thesis we tackle the specification and development of the timing system for the European Spallation Source (ESS), the largest neutron source currently in construction. We approach this work at two levels: the specification of the timing system and the physical implementation of control systems using reconfigurable hardware. Regarding the specification of the timing system, we designed and imple­ mented the configuration of the timing protocol to fulfil the requirements of ESS and devised an operation mode andan application for the configuration and control of the timing system. We also present one too! and one methodology to implement control systems using FPGAs, such as the nodes of the timing system. Both are based on statecharts, a graphical representation of systems that expand the concepts of Finite State Machines, targeted at systems that need to be re­ configured quickly in multiple locations minimizing the chance of errors. The too! automatically creates synthesizable VHDL code from a statechart of the system. The methodology explains the procedure to implement the statechart as a microprogrammed architecture in FPGAs

    The mission oriented terminal area simulation facility

    Get PDF
    The Mission Oriented Terminal Area Simulation (MOTAS) was developed to provide an ATC environment in which flight management and flight operations research studies can be conducted with a high degree of realism. This facility provides a flexible and comprehensive simulation of the airborne, ground-based and communication aspects of the airport terminal area environment. Major elements of the simulation are: an airport terminal area environment model, two air traffic controller stations, several aircraft models and simulator cockpits, four pseudo pilot stations, and a realistic air-ground communications network. MOTAS has been used for one study with the DC-9 simulator and a series of data link studies are planned in the near future

    Survey of currently available high-resolution raster graphics systems

    Get PDF
    Presented are data obtained on high-resolution raster graphics engines currently available on the market. The data were obtained through survey responses received from various vendors and also from product literature. The questionnaire developed for this survey was basically a list of characteristics desired in a high performance color raster graphics system which could perform real-time aircraft simulations. Several vendors responded to the survey, with most reporting on their most advanced high-performance, high-resolution raster graphics engine

    A Multi-Precision Bit-Serial Hardware Accelerator IP for Deep Learning Enabled Internet-of-Things

    Get PDF
    Deep Neural Networks (DNNs) computation-hungry algorithms demand hardware platforms capable of meeting rigid power and timing requirements. We introduce the Serial-MAC-engine (SMAC-engine), a fully-digital hardware accelerator for inference of quantized DNNs suitable for integration in a heterogeneous System-on-Chip (SoC). The accelerator is completely embedded in the form of a Hardware Processing Engine (HWPE) in the PULPissimo platform, a RISCV-based programmable architecture that targets the computational requirements of IoT applications. The SMAC-engine supports configurable precision for both weights (8/6/4 bits) and activations (8/4 bits), with scalable performance. Results in 65 nm technology demonstrate that the serial-MAC approach enables the accelerator to achieve a maximum throughput of 14.28 GMAC/s, consuming 0.58 pJ/MAC @ 1.0 V when operating at a precision of 4 bits for weights and 8 bits for activations

    The Second Hungarian Workshop on Image Analysis : Budapest, June 7-9, 1988.

    Get PDF
    corecore