246 research outputs found

    Just In Time Assembly (JITA) - A Run Time Interpretation Approach for Achieving Productivity of Creating Custom Accelerators in FPGAs

    Get PDF
    The reconfigurable computing community has yet to be successful in allowing programmers to access FPGAs through traditional software development flows. Existing barriers that prevent programmers from using FPGAs include: 1) knowledge of hardware programming models, 2) the need to work within the vendor specific CAD tools and hardware synthesis. This thesis presents a series of published papers that explore different aspects of a new approach being developed to remove the barriers and enable programmers to compile accelerators on next generation reconfigurable manycore architectures. The approach is entitled Just In Time Assembly (JITA) of hardware accelerators. The approach has been defined to allow hardware accelerators to be built and run through software compilation and run time interpretation outside of CAD tools and without requiring each new accelerator to be synthesized. The approach advocates the use of libraries of pre-synthesized components that can be referenced through symbolic links in a similar fashion to dynamically linked software libraries. Synthesis still must occur but is moved out of the application programmers software flow and into the initial coding process that occurs when programming patterns that define a Domain Specific Language (DSL) are first coded. Programmers see no difference between creating software or hardware functionality when using the DSL. A new run time interpreter is introduced to assemble the individual pre-synthesized hardware accelerators that comprise the accelerator functionality within a configurable tile array of partially reconfigurable slots at run time. Quantitative results are presented that compares utilization, performance, and productivity of the approach to what would be achieved by full custom accelerators created through traditional CAD flows using hardware programming models and passing through synthesis

    Achieving a better balance between productivity and performance on FPGAs through Heterogeneous Extensible Multiprocessor Systems

    Get PDF
    Field Programmable Gate Arrays (FPGAs) were first introduced circa 1980, and they held the promise of delivering performance levels associated with customized circuits, but with productivity levels more closely associated with software development. Achieving both performance and productivity objectives has been a long standing challenge problem for the reconfigurable computing community and remains unsolved today. On one hand, Vendor supplied design flows have tended towards achieving the high levels of performance through gate level customization, but at the cost of very low productivity. On the other hand, FPGA densities are following Moore\u27s law and and can now support complete multiprocessor system architectures. Thus FPGAs can be turned into an architecture with programmable processors which brings productivity but sacrifices the peak performance advantages of custom circuits. In this thesis we explore how the two use cases can be combined to achieve the best from both. The flexibility of the FPGAs to host a heterogeneous multiprocessor system with different types of programmable processors and custom accelerators allows the software developers to design a platform that matches the unique performance needs of their application. However, currently no automated approaches are publicly available to create such heterogeneous architectures as well as the software support for these platforms. Creating base architectures, configuring multiple tool chains, and repetitive engineering design efforts can and should be automated. This thesis introduces Heterogeneous Extensible Multiprocessor System (HEMPS) template approach which allows an FPGA to be programmed with productivity levels close to those associated with parallel processing, and with performance levels close to those associated with customized circuits. The work in this thesis introduces an ArchGen script to automate the generation of HEMPS systems as well as a library of portable and self tuning polymorphic functions. These tools will abstract away the HW/SW co-design details and provide a transparent programming language to capture different levels of parallelisms, without sacrificing productivity or portability

    Pervasive handheld computing systems

    Get PDF
    The technological role of handheld devices is fundamentally changing. Portable computers were traditionally application specific. They were designed and optimised to deliver a specific task. However, it is now commonly acknowledged that future handheld devices need to be multi-functional and need to be capable of executing a range of high-performance applications. This thesis has coined the term pervasive handheld computing systems to refer to this type of mobile device. Portable computers are faced with a number of constraints in trying to meet these objectives. They are physically constrained by their size, their computational power, their memory resources, their power usage, and their networking ability. These constraints challenge pervasive handheld computing systems in achieving their multi-functional and high-performance requirements. This thesis proposes a two-pronged methodology to enable pervasive handheld computing systems meet their future objectives. The methodology is a fusion of two independent and yet complementary concepts. The first step utilises reconfigurable technology to enhance the physical hardware resources within the environment of a handheld device. This approach recognises that reconfigurable computing has the potential to dynamically increase the system functionality and versatility of a handheld device without major loss in performance. The second step of the methodology incorporates agent-based middleware protocols to support handheld devices to effectively manage and utilise these reconfigurable hardware resources within their environment. The thesis asserts the combined characteristics of reconfigurable computing and agent technology can meet the objectives of pervasive handheld computing systems

    Deploying RIOT operating system on a reconfigurable Internet of Things end-device

    Get PDF
    Dissertação de mestrado integrado em Engenharia Eletrónica Industrial e ComputadoresThe Internet of Everything (IoE) is enabling the connection of an infinity of physical objects to the Internet, and has the potential to connect every single existing object in the world. This empowers a market with endless opportunities where the big players are forecasting, by 2020, more than 50 billion connected devices, representing an 8 trillion USD market. The IoE is a broad concept that comprises several technological areas and will certainly, include more in the future. Some of those already existing fields are the Internet of Energy related with the connectivity of electrical power grids, Internet of Medical Things (IoMT), for instance, enables patient monitoring, Internet of Industrial Things (IoIT), which is dedicated to industrial plants, and the Internet of Things (IoT) that focus on the connection of everyday objects (e.g. home appliances, wearables, transports, buildings, etc.) to the Internet. The diversity of scenarios where IoT can be deployed, and consequently the different constraints associated to each device, leads to a heterogeneous network composed by several communication technologies and protocols co-existing on the same physical space. Therefore, the key requirements of an IoT network are the connectivity and the interoperability between devices. Such requirement is achieved by the adoption of standard protocols and a well-defined lightweight network stack. Due to the adoption of a standard network stack, the data processed and transmitted between devices tends to increase. Because most of the devices connected are resource constrained, i.e., low memory, low processing capabilities, available energy, the communication can severally decrease the device’s performance. Hereupon, to tackle such issues without sacrificing other important requirements, this dissertation aims to deploy an operating system (OS) for IoT, the RIOT-OS, while providing a study on how network-related tasks can benefit from hardware accelerators (deployed on reconfigurable technology), specially designed to process and filter packets received by an IoT device.O conceito Internet of Everything (IoE) permite a conexão de uma infinidade de objetos à Internet e tem o potencial de conectar todos os objetos existentes no mundo. Favorecendo assim o aparecimento de novos mercados e infinitas possibilidades, em que os grandes intervenientes destes mercados preveem até 2020 a conexão de mais de 50 mil milhões de dispositivos, representando um mercado de 8 mil milhões de dólares. IoE é um amplo conceito que inclui várias áreas tecnológicas e irá certamente incluir mais no futuro. Algumas das áreas já existentes são: a Internet of Energy relacionada com a conexão de redes de transporte e distribuição de energia à Internet; Internet of Medical Things (IoMT), que possibilita a monotorização de pacientes; Internet of Industrial Things (IoIT), dedicada a instalações industriais e a Internet of Things (IoT), que foca na conexão de objetos do dia-a-dia (e.g. eletrodomésticos, wearables, transportes, edifícios, etc.) à Internet. A diversidade de cenários à qual IoT pode ser aplicado, e consequentemente, as diferentes restrições aplicadas a cada dispositivo, levam à criação de uma rede heterogénea composto por diversas tecnologias de comunicação e protocolos a coexistir no mesmo espaço físico. Desta forma, os requisitos chave aplicados às redes IoT são a conectividade e interoperabilidade entre dispositivos. Estes requisitos são atingidos com a adoção de protocolos standard e pilhas de comunicação bem definidas. Com a adoção de pilhas de comunicação standard, a informação processada e transmitida entre dispostos tende a aumentar. Visto que a maioria dos dispositivos conectados possuem escaços recursos, i.e., memória reduzida, baixa capacidade de processamento, pouca energia disponível, o aumento da capacidade de comunicação pode degradar o desempenho destes dispositivos. Posto isto, para lidar com estes problemas e sem sacrificar outros requisitos importantes, esta dissertação pretende fazer o porting de um sistema operativo IoT, o RIOT, para uma solução reconfigurável, o CUTE mote. O principal objetivo consiste na realização de um estudo sobre os benefícios que as tarefas relacionadas com as camadas de rede podem ter ao serem executadas em hardware via aceleradores dedicados. Estes aceleradores são especialmente projetados para processar e filtrar pacotes de dados provenientes de uma interface radio em redes IoT periféricas

    Integrating Reconfigurable Hardware-Based Grid for High Performance Computing

    Get PDF
    FPGAs have shown several characteristics that make them very attractive for high performance computing (HPC). The impressive speed-up factors that they are able to achieve, the reduced power consumption, and the easiness and flexibility of the design process with fast iterations between consecutive versions are examples of benefits obtained with their use. However, there are still some difficulties when using reconfigurable platforms as accelerator that need to be addressed: the need of an in-depth application study to identify potential acceleration, the lack of tools for the deployment of computational problems in distributed hardware platforms, and the low portability of components, among others. This work proposes a complete grid infrastructure for distributed high performance computing based on dynamically reconfigurable FPGAs. Besides, a set of services designed to facilitate the application deployment is described. An example application and a comparison with other hardware and software implementations are shown. Experimental results show that the proposed architecture offers encouraging advantages for deployment of high performance distributed applications simplifying development process

    Cross-vendor programming abstraction for diverse heterogeneous platforms

    Get PDF
    Hardware specialization is a well-known means to significantly improve the performance and energy efficiency of various application domains. Modern computing systems consist of multiple specialized processing devices which need to collaborate with each other to execute common tasks. New heterogeneous programming abstractions have been created to program heterogeneous systems. Even though many of these abstractions are open vendor-independent standards, cross-vendor interoperability between different implementations is limited since the vendors typically do not have commercial motivations to invest in it. Therefore, getting good performance from vendor-independent heterogeneous programming standards has proven difficult for systems with multiple different device types. In order to help unify the field of heterogeneous programming APIs for platforms with hardware accelerators from multiple vendors, we propose a new software abstraction for hardware-accelerated tasks based on the open OpenCL programming standard. In the proposed abstraction, we rely on the built-in kernel feature of the OpenCL specification to define a portability layer that stores enough information for automated accelerator utilization. This enables the portability of high-level applications to a diverse set of accelerator devices with minimal programmer effort. The abstraction enables a layered software architecture that provides for an efficient combination of application phases to a single asynchronous application description from multiple domain-specific input languages. As proofs of the abstraction layer serving its purpose for the layers above and below it, we show how a domain-specific input description ONNX can be implemented on top of this portability abstraction, and how it also allows driving fixed function and FPGA-based hardware accelerators below in the hardware-specific backend. We also provide an example implementation of the abstraction to show that the abstraction layer does not seem to incur significant execution time overhead.publishedVersionPeer reviewe

    PYDAC: A DISTRIBUTED RUNTIME SYSTEM AND PROGRAMMING MODEL FOR A HETEROGENEOUS MANY-CORE ARCHITECTURE

    Get PDF
    Heterogeneous many-core architectures that consist of big, fast cores and small, energy-efficient cores are very promising for future high-performance computing (HPC) systems. These architectures offer a good balance between single-threaded perfor- mance and multithreaded throughput. Such systems impose challenges on the design of programming model and runtime system. Specifically, these challenges include (a) how to fully utilize the chip’s performance, (b) how to manage heterogeneous, un- reliable hardware resources, and (c) how to generate and manage a large amount of parallel tasks. This dissertation proposes and evaluates a Python-based programming framework called PyDac. PyDac supports a two-level programming model. At the high level, a programmer creates a very large number of tasks, using the divide-and-conquer strategy. At the low level, tasks are written in imperative programming style. The runtime system seamlessly manages the parallel tasks, system resilience, and inter- task communication with architecture support. PyDac has been implemented on both an field-programmable gate array (FPGA) emulation of an unconventional het- erogeneous architecture and a conventional multicore microprocessor. To evaluate the performance, resilience, and programmability of the proposed system, several micro-benchmarks were developed. We found that (a) the PyDac abstracts away task communication and achieves programmability, (b) the micro-benchmarks are scalable on the hardware prototype, but (predictably) serial operation limits some micro-benchmarks, and (c) the degree of protection versus speed could be varied in redundant threading that is transparent to programmers

    Methodology for complex dataflow application development

    Get PDF
    This thesis addresses problems inherent to the development of complex applications for reconfig- urable systems. Many projects fail to complete or take much longer than originally estimated by relying on traditional iterative software development processes typically used with conventional computers. Even though designer productivity can be increased by abstract programming and execution models, e.g., dataflow, development methodologies considering the specific properties of reconfigurable systems do not exist. The first contribution of this thesis is a design methodology to facilitate systematic develop- ment of complex applications using reconfigurable hardware in the context of High-Performance Computing (HPC). The proposed methodology is built upon a careful analysis of the original application, a software model of the intended hardware system, an analytical prediction of performance and on-chip area usage, and an iterative architectural refinement to resolve identi- fied bottlenecks before writing a single line of code targeting the reconfigurable hardware. It is successfully validated using two real applications and both achieve state-of-the-art performance. The second contribution extends this methodology to provide portability between devices in two steps. First, additional tool support for contemporary multi-die Field-Programmable Gate Arrays (FPGAs) is developed. An algorithm to automatically map logical memories to hetero- geneous physical memories with special attention to die boundaries is proposed. As a result, only the proposed algorithm managed to successfully place and route all designs used in the evaluation while the second-best algorithm failed on one third of all large applications. Second, best practices for performance portability between different FPGA devices are collected and evaluated on a financial use case, showing efficient resource usage on five different platforms. The third contribution applies the extended methodology to a real, highly demanding emerging application from the radiotherapy domain. A Monte-Carlo based simulation of dose accumu- lation in human tissue is accelerated using the proposed methodology to meet the real time requirements of adaptive radiotherapy.Open Acces
    corecore