A microprogrammed operating system kernel
The subject of the thesis is the design and implementation of an operating system kernel for the Cambridge Capability Computer (CAP). The kernel of an operating system is its most primitive level of facilities and forms the foundation stone around which the rest of the system is structured.
The CAP kernel places particular emphasis on protection: the control of access to information. The kernel uses the notion of capabilities to provide a flexible and controlled mechanism for the sharing of information within a computer system. The protection mechanisms include provision for the efficient control of access to memory as well as facilities for handling abstract resources like files and virtual peripherals. The kernel allows the introduction of new types of resources in addition to the basic set of hardware resources to permit user extension of the system. Attention is given to the problem of recall of privilege, or revocation, in capability systems, and the kernel includes operations for both permanent and temporary revocation of particular access rights to information in a selective manner.
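The abstract does not say how CAP implements selective, temporary, and permanent revocation. One textbook technique that provides these properties is revocation through indirection: each grantee receives a capability that points at a revoker cell rather than at the object itself, so the grantor can disable or destroy that cell without affecting other holders. The Python sketch below illustrates that general technique under those assumptions; the names `Segment`, `Revoker`, `Capability`, `suspend`, `restore`, and `destroy` are hypothetical and are not CAP's primitives, and Python provides none of the unforgeability a real capability system enforces.

```python
from dataclasses import dataclass, field


class Revoked(Exception):
    """Raised when a capability is used after its access was withdrawn."""


@dataclass
class Segment:
    """A toy protected object: a named block of words (hypothetical)."""
    name: str
    data: list = field(default_factory=list)


class Revoker:
    """Indirection cell between capability holders and the object (hypothetical)."""

    def __init__(self, target: Segment):
        self._target = target
        self._suspended = False

    def suspend(self) -> None:
        """Temporary revocation: access fails until restore() is called."""
        self._suspended = True

    def restore(self) -> None:
        """Undo a temporary revocation; impossible after destroy()."""
        if self._target is None:
            raise Revoked("access was permanently revoked")
        self._suspended = False

    def destroy(self) -> None:
        """Permanent revocation: drop the only path to the object."""
        self._target = None

    def resolve(self) -> Segment:
        """Follow the indirection, failing if access has been withdrawn."""
        if self._target is None or self._suspended:
            raise Revoked("access withdrawn")
        return self._target


@dataclass(frozen=True)
class Capability:
    """A reference to an object via a revoker, plus the rights it confers."""
    revoker: Revoker
    rights: frozenset

    def read(self, index: int):
        if "read" not in self.rights:
            raise PermissionError("capability lacks the read right")
        return self.revoker.resolve().data[index]

    def write(self, index: int, value) -> None:
        if "write" not in self.rights:
            raise PermissionError("capability lacks the write right")
        self.revoker.resolve().data[index] = value

    def restrict(self, rights) -> "Capability":
        """Derive a weaker capability that shares this one's revoker."""
        return Capability(self.revoker, self.rights & frozenset(rights))
```

Revocation is selective because each grantee holds its own revoker: suspending one grantee's revoker leaves the others working, whereas a single shared revoker would withdraw access from everyone at once. Capabilities derived with `restrict` share their parent's revoker, so destroying it also invalidates every derived copy.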
In the past, many of these functions have only been found in kernels implemented in user-level software, which are frequently cumbersome and inefficient. An examination is made of why this should be and how efficiency and simplicity can be gained by a microprogrammed implementation. The thesis draws on the experience of a number of software kernels to discover the various design decisions that have to be made and the techniques that may be used to implement a successful kernel.
The feasibility of the design arrived at by considering these issues is demonstrated by describing its implementation on the Cambridge Capability Computer in terms of the primitives provided and the internal organisation of the proposed kernel. In an evaluation, the kernel is examined in the light of the analysis of other kernels to point out its strengths and weaknesses and to gain insights into the utility of the design as a practical operating system kernel.
Digitisation of this thesis was sponsored by Arcadia Fund, a charitable fund of Lisbet Rausing and Peter Baldwin.
HAL-ASOS accelerator model: evolutive elasticity by design
To address the integration of software threads and hardware accelerators into the Linux Operating System (OS) programming models, an accelerator architecture is proposed, based on micro-programmable hardware system calls, which fully exports these resources into the Linux OS user space through a design-specific virtual file system. The proposed HAL-ASOS accelerator model is split into a user-defined Hardware Task and a parameterizable Hardware Kernel with three differentiated transfer channels, aiming to explore distinct bus technology interfaces and promote the accelerator to a first-class computing unit. This paper focuses on the Hardware Kernel, and mainly on its microcode control unit, which leverages the elasticity of field programmable gate arrays (FPGAs) to evolve naturally with Linux OS through key differentiating capabilities when compared to the state of the art. To comply with the evolutive nature of Linux OS, or with any incremental features of the Hardware Task, the proposed model generates page faults that signal runtime errors, which are handled at the kernel level as part of the virtual file system runtime. To evaluate the accelerator model's programmability and performance, a client-side application based on the AES 128-bit algorithm was implemented. Experiments demonstrate a flexible design approach in terms of hardware and software reconfiguration, and significant performance increases consistent with rising processing demands or design clock frequencies.
This work has been supported by FCT-Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020.
The FEM-2 design method
The FEM-2 parallel computer is designed using methods differing from those ordinarily employed in parallel computer design. The major distinguishing aspects are: (1) a top-down rather than bottom-up design process; (2) the design considers the entire system structure in terms of layers of virtual machines; and (3) each layer of virtual machine is defined formally during the design process. The result is a complete hardware/software system design. The basic design method is discussed and the advantages of the method are considered. A status report on the FEM-2 design is included.
Enabling preemptive multiprogramming on GPUs
GPUs are being increasingly adopted as compute accelerators in many domains, spanning environments from mobile systems to cloud computing. These systems usually run multiple applications, from one or several users. However, GPUs do not provide the support for resource sharing traditionally expected in these scenarios, so such systems are unable to meet key requirements of multiprogrammed workloads, such as responsiveness, fairness, or quality of service. In this paper, we propose a set of hardware extensions that allow GPUs to efficiently support multiprogrammed GPU workloads. We argue for preemptive multitasking and design two preemption mechanisms that can be used to implement GPU scheduling policies. We extend the architecture to allow concurrent execution of GPU kernels from different user processes and implement a scheduling policy that dynamically distributes the GPU cores among concurrently running kernels according to their priorities. We extend an NVIDIA GK110 (Kepler)-like GPU architecture with our proposals and evaluate them on a set of multiprogrammed workloads with up to eight concurrent processes. Our proposals improve the execution time of high-priority processes by 15.6x, the average application turnaround time by 1.5x to 2x, and system fairness by up to 3.4x.
We would like to thank the anonymous reviewers, Alexander Veidenbaum, Carlos Villavieja, Lluis Vilanova, Lluc Alvarez, and Marc Jorda for their comments and help improving our work and this paper. This work is supported by the European Commission through the TERAFLUX (FP7-249013), Mont-Blanc (FP7-288777), and RoMoL (GA-321253) projects, NVIDIA through the CUDA Center of Excellence program, the Spanish Government through Programa Severo Ochoa (SEV-2011-0067), and the Spanish Ministry of Science and Technology through the TIN2007-60625 and TIN2012-34557 projects.
Peer Reviewed. Postprint (author's final draft).
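The GPU abstract mentions a policy that distributes GPU cores among concurrent kernels according to their priorities, but does not give its rule. As a rough illustration only, the Python sketch below splits a fixed number of streaming multiprocessors (SMs) among kernels in proportion to their priorities using the largest-remainder method, and computes how many SMs each kernel would have to release (and so be preempted on) when the kernel mix changes. The function names, the proportional rule, and the SM count used in the example are assumptions, not the paper's policy.

```python
def allocate_sms(priorities, num_sms):
    """Split num_sms among kernels in proportion to their priorities.

    priorities: dict mapping kernel id -> positive priority weight.
    Uses the largest-remainder method so the shares sum exactly to num_sms.
    Hypothetical policy, for illustration only.
    """
    if not priorities:
        return {}
    total = sum(priorities.values())
    quotas = {k: num_sms * p / total for k, p in priorities.items()}
    alloc = {k: int(q) for k, q in quotas.items()}  # floor of each quota
    leftover = num_sms - sum(alloc.values())
    # Give the remaining SMs to the largest fractional parts;
    # ties are broken in favour of the higher-priority kernel.
    order = sorted(quotas,
                   key=lambda k: (quotas[k] - alloc[k], priorities[k]),
                   reverse=True)
    for k in order[:leftover]:
        alloc[k] += 1
    return alloc


def sms_to_preempt(old, new):
    """SMs each kernel must give up when moving from allocation old to new."""
    return {k: old[k] - new.get(k, 0)
            for k in old if old[k] > new.get(k, 0)}
```

For example, with an assumed 15 SMs and priorities A=4, B=2, C=1, the split is 9/4/2; when a kernel D with priority 8 arrives, the new split is 4/2/1/8, so A, B, and C must release 5, 2, and 1 SMs respectively. A purely proportional rule can leave a very low-priority kernel with zero SMs, so a practical policy would likely add a minimum share.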
Developing a support for FPGAs in the Controller parallel programming model
Heterogeneous computing appears to be the solution to achieve ever faster computers capable of solving bigger and more complex problems in different fields of knowledge. To that end, it integrates accelerators with different architectures capable of exploiting the features of problems from different perspectives, thus achieving higher performance.
FPGAs are reconfigurable hardware, i.e., it is possible to modify them after manufacture. This allows great flexibility and maximum adaptability to the given problem. In addition, they have low power consumption. All these advantages come with the major drawback of more difficult programming with error-prone HDLs (Hardware Description Languages), such as Verilog or VHDL, and the need for advanced knowledge of digital electronics. In recent years, the main FPGA vendors have concentrated on developing HLS (High Level Synthesis) tools that allow FPGAs to be programmed in C-like high-level programming languages. This has favoured their adoption by the HPC community and their integration in new supercomputers. However, the programmer still has to take care of aspects such as command-queue management, launch parameters, and data transfers.
The Controller model is a library that eases the management of coordination, communication, and kernel launching details on hardware accelerators. It transparently exploits their native or vendor-specific programming models, namely OpenCL and CUDA, thus achieving high performance independently of the compiler. It is intended to let the programmer combine the different hardware resources available in heterogeneous environments.
This work extends the Controller model by developing a backend that allows the integration of FPGAs, keeping changes to the user-facing interface to a minimum. The experimental results show a significant decrease in programming effort compared with a native OpenCL implementation. In addition, a high overlap of computation and communication is achieved, and the overhead of using the library is negligible.
Bachelor's degree in Computer Engineering (Grado en Ingeniería Informática).
Timing Architecture for ESS
Official Doctoral Programme in Information Technology Research (Programa Oficial de Doutoramento en Investigación en Tecnoloxías da Información), 5023V01.
The timing system is a key component for the control and synchronization of industrial and scientific facilities, such as particle accelerators. In this thesis we tackle the specification and development of the timing system for the European Spallation Source (ESS), the largest neutron source currently under construction. We approach this work at two levels: the specification of the timing system and the physical implementation of control systems using reconfigurable hardware.
Regarding the specification of the timing system, we designed and implemented the configuration of the timing protocol to fulfil the requirements of ESS, and devised an operation mode and an application for the configuration and control of the timing system.
We also present a tool and a methodology to implement control systems using FPGAs, such as the nodes of the timing system. Both are based on statecharts, a graphical representation of systems that extends the concept of finite state machines, targeted at systems that need to be reconfigured quickly in multiple locations while minimizing the chance of errors. The tool automatically creates synthesizable VHDL code from a statechart of the system. The methodology explains the procedure to implement the statechart as a microprogrammed architecture in FPGAs.
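To make the idea of a microprogrammed state machine concrete, the Python sketch below models a flattened state machine as a microcode ROM: each word holds the outputs for one state, the input it tests, and the two candidate next addresses, and a small sequencer steps through the table once per clock. The states (`IDLE`, `ARMED`, `FIRE`), the signals (`arm`, `trigger`), and the ROM layout are hypothetical and are not taken from the thesis; an FPGA implementation would hold the table in on-chip memory and describe the sequencer in VHDL.

```python
from typing import NamedTuple, Optional


class MicroInstr(NamedTuple):
    """One microcode word: outputs for a state plus its branch fields."""
    outputs: int          # output bits driven while this word is active
    cond: Optional[str]   # input signal tested; None means unconditional
    next_true: int        # next microaddress if cond is asserted (or unconditional)
    next_false: int       # next microaddress if cond is deasserted


# Hypothetical three-state controller, flattened from a statechart.
IDLE, ARMED, FIRE = 0, 1, 2

ROM = [
    MicroInstr(outputs=0b00, cond="arm", next_true=ARMED, next_false=IDLE),
    MicroInstr(outputs=0b01, cond="trigger", next_true=FIRE, next_false=ARMED),
    MicroInstr(outputs=0b10, cond=None, next_true=IDLE, next_false=IDLE),
]


def run(rom, inputs_per_cycle, start=IDLE):
    """Clock the sequencer once per input sample and return the output trace."""
    upc = start  # microprogram counter, i.e. the current state's address
    trace = []
    for inputs in inputs_per_cycle:
        word = rom[upc]
        trace.append(word.outputs)
        taken = word.cond is None or inputs.get(word.cond, False)
        upc = word.next_true if taken else word.next_false
    return trace
```

Running `run(ROM, [{}, {"arm": True}, {}, {"trigger": True}, {}, {}])` yields the output trace `[0, 0, 1, 1, 2, 0]`. The appeal of this structure for frequently reconfigured systems is that changing the behavior means rewriting the ROM contents while the sequencer datapath stays fixed.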
The mission oriented terminal area simulation facility
The Mission Oriented Terminal Area Simulation (MOTAS) was developed to provide an air traffic control (ATC) environment in which flight management and flight operations research studies can be conducted with a high degree of realism. This facility provides a flexible and comprehensive simulation of the airborne, ground-based, and communication aspects of the airport terminal area environment. Major elements of the simulation are: an airport terminal area environment model, two air traffic controller stations, several aircraft models and simulator cockpits, four pseudo pilot stations, and a realistic air-ground communications network. MOTAS has been used for one study with the DC-9 simulator, and a series of data link studies are planned in the near future.
Survey of currently available high-resolution raster graphics systems
Presented are data obtained on high-resolution raster graphics engines currently available on the market. The data were obtained through survey responses received from various vendors and also from product literature. The questionnaire developed for this survey was basically a list of characteristics desired in a high-performance color raster graphics system that could perform real-time aircraft simulations. Several vendors responded to the survey, with most reporting on their most advanced high-performance, high-resolution raster graphics engine.
A Multi-Precision Bit-Serial Hardware Accelerator IP for Deep Learning Enabled Internet-of-Things
Deep Neural Networks (DNNs) are computation-hungry algorithms that demand hardware platforms capable of meeting rigid power and timing requirements.
We introduce the Serial-MAC-engine (SMAC-engine), a fully digital hardware accelerator for inference of quantized DNNs suitable for integration in a heterogeneous System-on-Chip (SoC). The accelerator is completely embedded in the form of a Hardware Processing Engine (HWPE) in the PULPissimo platform, a RISC-V-based programmable architecture that targets the computational requirements of IoT applications. The SMAC-engine supports configurable precision for both weights (8/6/4 bits) and activations (8/4 bits), with scalable performance. Results in 65 nm technology demonstrate that the serial-MAC approach enables the accelerator to achieve a maximum throughput of 14.28 GMAC/s, consuming 0.58 pJ/MAC at 1.0 V when operating at a precision of 4 bits for weights and 8 bits for activations.
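The abstract does not describe the SMAC-engine's datapath or which operand it serializes. Assuming serialization over the weight bits (one bit-plane per cycle, with two's-complement weights and unsigned activations), a bit-serial multiply-accumulate can be sketched in Python as follows; the function names and the choice of serialized operand are assumptions. The sketch shows why lower weight precision raises throughput: the number of cycles per dot product equals the weight bit width.

```python
def to_bits(value: int, n_bits: int) -> list:
    """Two's-complement bits of a signed n_bits integer, LSB first."""
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {n_bits} signed bits")
    u = value & ((1 << n_bits) - 1)
    return [(u >> i) & 1 for i in range(n_bits)]


def bit_serial_dot(weights, activations, w_bits=4, a_bits=8):
    """Dot product computed one weight bit-plane per cycle.

    Returns (result, cycles). Cycle i adds the activations whose weight
    bit i is set, shifted left by i; the MSB plane is subtracted because
    it carries weight -2**(w_bits - 1) in two's complement.
    """
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have equal length")
    for a in activations:
        if not 0 <= a < (1 << a_bits):
            raise ValueError(f"activation {a} does not fit in {a_bits} unsigned bits")
    planes = [to_bits(w, w_bits) for w in weights]
    acc = 0
    for i in range(w_bits):  # one cycle per weight bit
        partial = sum(a for bits, a in zip(planes, activations) if bits[i])
        if i == w_bits - 1:
            acc -= partial << i  # sign bit has negative weight
        else:
            acc += partial << i
    return acc, w_bits
```

For weights `[3, -2, 7, -8]` and activations `[10, 20, 30, 40]` with 4-bit weights, `bit_serial_dot` returns `(-120, 4)`: the same value as the direct dot product, obtained in 4 cycles instead of the 8 an 8-bit weight would need.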