15 research outputs found
Modeling, Synthesis, and Validation of Heterogeneous Biomedical Embedded Systems
Abstract-The increasing performance and availability of embedded systems increases their attractiveness for biomedical applications. With advances in sensor processing and classification algorithms, real-time decision support in patient monitoring becomes feasible. However, the gap between algorithm design and their embedded realization is growing. This paper overviews an approach for development of biomedical devices at an abstract algorithm level with automatic generation of an embedded implementation. Based on a case study of a Brain Computer Interface (BCI), this paper demonstrates capturing, modeling and synthesis of such applications
Flexible Function-Level Acceleration of Embedded Vision Applications using the Pipelined Vision Processor
Abstract-The emerging massive embedded vision market is driving demanding and ever-increasing computationally complex high-performance and low-power MPSoC requirements. To satisfy these requirements innovative solutions are required to deliver high performance pixel processing combined with low energy per pixel execution. These solutions must combine the power efficiency of ASIC style IP while incorporating elements of Instruction-Level Processors flexibility and software ecosystem. This paper introduces Analog Devices BF609's Pipelined Vision Processor (PVP) as a state-of-the-art industrial solution achieving both efficiency and flexibility. The PVP incorporates over 10 function level blocks enabling dozens of programmable functions that can be allocated to implement many algorithms and applications. Additionally, the pipelined style connectivity is programmable enabling many temporal function permutations. Overall, the PVP offers greater than 25 billion operations per second (GOPs) and very low memory bandwidth. These capabilities enable the PVP to execute multiple concurrent ADAS, Industrial, or general vision applications. This paper focuses on the key architecture concepts of the PVP from individual functionblock construction to the allocation and chaining of functional blocks to build function based application implementations. The paper also addresses the benefits and challenges of architecting and programming at the function-level granularity and abstractions
A Fast and Accurate Cost Model for FPGA Design Space Exploration in HPC Applications
Heterogeneous High-Performance Computing
(HPC) platforms present a significant programming challenge,
especially because the key users of HPC resources are scientists,
not parallel programmers. We contend that compiler technology
has to evolve to automatically create the best program variant
by transforming a given original program. We have developed a
novel methodology based on type transformations for generating
correct-by-construction design variants, and an associated
light-weight cost model for evaluating these variants for
implementation on FPGAs. In this paper we present a key
enabler of our approach, the cost model. We discuss how we
are able to quickly derive accurate estimates of performance
and resource-utilization from the design’s representation in our
intermediate language. We show results confirming the accuracy
of our cost model by testing it on three different scientific
kernels. We conclude with a case-study that compares a solution
generated by our framework with one from a conventional
high-level synthesis tool, showing better performance and
power-efficiency using our cost model based approach
Type-driven automated program transformations and cost modelling for optimising streaming programs on FPGAs
In this paper we present a novel approach to program optimisation based on compiler-based type-driven program transformations and a fast and accurate cost/performance model for the target architecture. We target streaming programs for the problem domain of scientific computing, such as numerical weather prediction. We present our theoretical framework for type-driven program transformation, our target high-level language and intermediate representation languages and the cost model and demonstrate the effectiveness of our approach by comparison with a commercial toolchain
Review of System Design Frameworks
In the last decade, the enormous development of the semiconductor industry with ever-increasing complexities of digital embedded systems and strong market competition with fast time-to-market and low design cost demands have imposed serious difficulty to a conventional design method. Therefore, there emerges a new design flow named model-based system design, which is based on high-level abstraction models, heavy design automation, and extensive component reuse to increase productivity and satisfy the market pressure.
This thesis presents reviews of ten high level academic system design frameworks and tools that have been proposed and implemented recently to support the model based design flow, namely System-on-Chip Environment (SCE), Embedded System Environment (ESE), Metropolis, Daedalus, SystemCoDesigner (SCD), xPilot, GAUT, No-Instruction-Set Computer (NISC), Formal System Design (ForSyDe), and Ptolemy II. These tools are then compared to each other in various aspects comprising objective, technique, implementation and capability. Following that, three design flow frameworks, including ESE, Daedalus, and SystemCoDesigner, are experimented for their real usage, performance and practicality.
The frameworks and tools implementing the model-based design flow all show promising results. Modelling tools (ForSyDe, and Ptolemy II) can sufficiently capture a wide range of complicated modern systems, while high-level synthesis tools (xPilot, GAUT, and NISC) produce better design qualities in terms of area, power, and cost in comparison to traditional works. Study cases of design flow frameworks (SCE, ESE, Metropolis, Daedalus, and SCD) show the model-based method significantly reduces developing time as well as facilitates the system design process. However, most of these tools and frameworks are being incomplete, and still under the experimental stage. There still be a lot of works needed until the method can be put into practice
Projeto unificado de componentes em hardware e software para sistemas embarcados
Dissertação (mestrado) - Universidade Federal de Santa Catarina, Centro TecnolĂłgico, Programa de PĂłs-Graduação em CiĂŞncia da Computação, FlorianĂłpolis, 2013.O crescente aumento na complexidade dos sistemas embarcados está ocasionando uma migração para tĂ©cnicas de projeto em nĂveis mais altos de abstração, o que tem levado a uma convergĂŞncia entre as metodologias de desenvolvimento de hardware e software. Este trabalho tem como objetivo principal contribuir nesse cenário propondo uma estratĂ©gia de desenvolvimento unificada que possibilita a implementação de componentes em hardware e software a partir de uma Ăşnica descrição na linguagem C++. As tĂ©cnicas propostas se baseiam em conceitos de programação orientada a objetos (do inglĂŞs Object-oriented Programming - OOP) e programação orientada a aspectos (do inglĂŞs Aspect-oriented Programming - AOP) para guiar uma estratĂ©gia de engenharia de domĂnio que facilita a clara separação entre a estrutura e comportamento-base de um componente das caracterĂsticas que sĂŁo especĂficas de implementações em hardware ou software.Certos aspectos de um componente, como, por exemplo, alocação de recursos e a interface de comunicação, sĂŁo modelados de maneiras distintas dependendo da implementação-alvo (hardware ou software). Este trabalho mostra como tais aspectos podem ser fatorados e encapsulados em programas de aspecto que sĂŁo aplicados Ă s descrições iniciais apenas quando o particionamento final entre hardware e software Ă© definido. Os mecanismos de aplicação de aspectos sĂŁo definidos via metaprogramação estática utilizando os templates do C++. Dessa forma, a extração de implementações em hardware ou software a partir de uma implementação unificada em C++ Ă© direta e se dá atravĂ©s de transformações no nĂvel da linguagem suportadas por uma grande gama de compiladores e ferramentas de sĂntese de alto-nĂvel (do inglĂŞs High-level Synthesis - HLS). Para avaliar a abordagem proposta, foi desenvolvida uma plataforma flexĂvel para implementação de System-on-Chips (SoCs) em dispositivos lĂłgico programáveis. A infraestrutura de hardware/software desenvolvida utiliza uma arquitetura baseadas em Network-on-Chips (NoCs) para prover um mecanismo de comunicação transparente entre hardware e software. A avaliação dos mecanismos propostos foi feita atravĂ©s da implementação de um SoC para aplicações PABX. Os resultados mostraram que a estratĂ©gia proposta resulta em componentes flexĂveis e reusáveis com uma eficiĂŞncia muito prĂłxima a de componentes implementados especificamente para software ou hardware.Abstract : The increasing complexity of current embedded systems is pushing their design to higher levels of abstraction, leading to a convergence between hardware and software design methodologies. In this work we aim at narrowing the gap between hardware and software design by introducing a strategy that handles both domains in a unified fashion. We leverage on Aspect-oriented Programming (AOP) and Object-oriented Programming (OOP) techniques in order to provide unified C++ descriptions of embedded system components. Such unified descriptions can be obtained through a careful design process focused on isolating aspects that are specific of hardware and software scenarios. Aspects that differ significantly in each domain, such as resource allocation and communication interface, were isolated in aspect programs that are applied to the unified descriptions before they are compiled to software binaries or synthesized to dedicated hardware using High-level Synthesis (HLS) tools. Furthermore, we propose a flexible FPGA-based SoC platform for the deployment of SoCs in a HLS-capable environment. The proposed hardware/software infrastructure relies on a Network-on-Chip-based architecture to provide transparent communication mechanisms for hardware and software components. The proposed unified design approach and its transparent communication mechanisms are evaluated through the implementation of a SoC for digital PABX systems. The results show that our strategy leads to reusable and flexible components at the cost of an acceptable overhead when compared to software-only C/C++ and hardware-only C++ implementations
Synthèse et description de circuits numériques au niveau des transferts synchronisés par les données
RÉSUMÉ Au-delà des processeurs d’instructions multi-coeurs, le monde du traitement numérique haute performance moderne est également caractérisé par l’utilisation de circuits spécifiques à un domaine d’application implémentés au moyen de circuits programmables FPGA (réseau de portes programmables in situ). Les FPGA représentent des candidats intéressants à la réalisation de calculs haute-performances pour différentes raisons. D’une part, le nombre importants
de blocs de propriétés intellectuelles gravés en dur sur ces puces (processeurs, mémoires, unités de traitement de signal numérique) réduit l’écart qui les sépare des circuits intégrés dédiés en termes de ressources disponibles. Un écart qui s’explique par le haut niveau de configurabilité offert par le circuit programmable, une capacité pour laquelle un grand nombre de ressources doit être dédié sans être utilisé par le circuit programmé. Néanmoins dans un contexte où souvent plus de transistors sont disponibles qu’on puisse en utiliser, le coût
associé à la configurabilité s’en trouve d’autant réduit.
De par leur capacité à être reconfigurés complètement ou partiellement, les FPGAs modernes, tout comme les processeurs d’instructions, offrent la flexibilité requise pour supporter un grand nombre d’applications. Néanmoins, contrairement aux processeurs d’instructions qui
peuvent être programmés avec différents langages de programmation haut-niveau (Java, C#, C/C++, MPI, OpenMP, OpenCL), la programmation d’un FPGA requiert la spécification d’un circuit numérique, ce qui représente un obstacle majeur à leur plus grande adoption. La description de circuits numériques est généralement exprimée au moyen d’un langage concurrent pour lequel le niveau d’abstraction se situe au niveau des transferts entre registres
(RTL), tels les langages VHDL et Verilog. Pour une application donnée, la réalisation d’un circuit numérique spécialisé requiert typiquement un effort de conception significativement plus grand qu’une réalisation logicielle. Il existe aujourd’hui différents outils académiques et commerciaux permettant la synthèse haut-niveau de circuits numériques en partant de descriptions C/C++/SystemC, et plus récemment OpenCL. Cependant, selon l’application
considérée, ces outils ne permettent pas toujours d’obtenir des performances comparables à celles qui peuvent être obtenues avec une description RTL produite manuellement.
On s’intéresse dans ce travail à un outil de synthèse de niveau intermédiaire offrant un compromis entre les performances atteignables au moyen d’une méthode de conception RTL, ainsi que les temps de conception que permet la synthèse à haut-niveau.----------ABSTRACT
Beyond modern multi/many-cores processors, the world of computing is also caracterized by the use of dedicated circuits implemented on Field-Programmable Gate-Arrays (FPGAs). For many reasons, modern FPGAs have become interesting targets for high-performance computing applications. On one hand, their integration of considerable amounts of IP blocks (processors, memories, DSPs) has contributed to reduce the resource/performance gap that exist with Application Specific Integrated Devices (ASICs). A gap that is easily explained by the high-level of reconfigurability that these devices provide, a feature for which a considerable amount of resources (transistors) must be dedicated. Nevertheless, in a context where often
more transistors are often available than it is needed or required, the impact of such a cost is less important.
The ability to reconfigure completely or partially modern FPGAs further offer the flexibility required to support multiple different applications over time, similarly to instruction processors. However, while instruction processors can be programmed with different high abstraction
level software programming languages (Java, C#, C/C++, MPI, OpenMP, OpenCL), FPGA programming typically requires the specification of a hardware design, which is a major
obstacle to their widespread use. The description of a hardware design is generally done at the register-transfer level (RTL), using hardware description languages (HDLs) such as VHDL and Verilog. For a given application, the design and verification of a dedicated circuit requires a significantly more important effort than a software implementation. Nowadays, numerous commercial and academic tools allow the high-level synthesis of hardware designs
starting from a software description using programming languages such as C/C++/SystemC, and more recently OpenCL. Nevertheless, depending on the application considered, at current state of the art, these tools do not allow performances that matches those which can be obtained through hand-made RTL designs. In this work, we consider an intermediate-level synthesis methodology offering a compromise between the performances and design times that can be obtained with RTL and high-level synthesis methodologies. We consider an input hardware description language that allows the description of algorithmic state machines (ASMs) handling connections between sources
and sinks with predefined streaming interfaces. These interfaces are similar AXI4-Streaming and Avalon-Streaming interfaces, featuring ready-to-send/ready-to-receive synchronisation signals