Search CORE

11 research outputs found

Design and development from single core reconfigurable accelerators to a heterogeneous accelerator-rich platform

Author: Hussain M. Waqar
Publication venue: Tampere University of Technology
Publication date: 01/01/2014
Field of study

The performance of a platform is evaluated based on its ability to deal with the processing of multiple applications of different nature. In this context, the platform under evaluation can be of homogeneous, heterogeneous or of hybrid architecture. The selection of an architecture type is generally based on the set of different target applications and performance parameters, where the applications can be of serial or parallel nature. The evaluation is normally based on different performance metrics, e.g., resource/area utilization, execution time, power and energy consumption. This process can also include high-level performance metrics, e.g., Operations Per Second (OPS), OPS/Watt, OPS/Hz, Watt/Area etc. An example of architecture selection can be related to a wireless communication system where the processing of computationally-intensive signal-processing algorithms has strict execution-time constraints and in this case, a platform with special-purpose accelerators is relatively more suitable than a typical homogeneous platform. A couple of decades ago, it was expensive to plant many special-purpose accelerators on a chip as the cost per unit area was relatively higher than today. The utilization wall is also becoming a limiting factor in homogeneous multicore scaling which means that all the cores on a platform cannot be operated at their maximum frequency due to a possible thermal meltdown. In this case, some of the processing cores have to be turned-off or to be operated at very low frequencies making most of the part of the chip to stay underutilized. A possible solution lies in the use of heterogeneous multicore platforms where many application-specific cores operate at lower frequencies, therefore reducing power dissipation density and increasing other performance parameters. However, to achieve maximum flexibility in processing, a general-purpose flavor can also be introduced by adding a few Reduced Instruction-Set Computing (RISC) cores. A power class of heterogeneous multicore platforms is an accelerator-rich platform where many application-specific accelerators are loosely connected with each other for work load distribution or to execute the tasks independently. This research work spans from the design and development of three different types of template-based Coarse-Grain Reconfigurable Arrays (CGRAs), i.e., CREMA, AVATAR and SCREMA to a Heterogeneous Accelerator-Rich Platform (HARP). The accelerators generated from the three CGRAs could perform different lengths and types of Fast Fourier Transform (FFT), real and complex Matrix-Vector Multiplication (MVM) algorithms. CREMA and AVATAR were fixed CGRAs with eight and sixteen number of Processing Element (PE) columns, respectively. SCREMA could flex between four, eight, sixteen and thirty two number of PE columns. Many case studies were conducted to evaluate the performance of the reconfigurable accelerators generated from these CGRA templates. All of these CGRAs work in a processor/coprocessor model tightly integrated with a Direct Memory Access (DMA) device. Apart from these platforms, a reconfigurable Application-Specific Instruction-set Processor (rASIP) is also designed, tested for FFT execution under IEEE-802.11n timing constraints and evaluated against a processor/coprocessor model. It was designed by integrating AVATAR generated radix-(2, 4) FFT accelerator into the datapath of a RISC processor. The instruction set of the RISC processor was extended to perform additional operations related to AVATAR. As mentioned earlier, the underutilized part of the chip, now-a-days called Dark Silicon is posing many challenges for the designers. Apart from software optimizations, clock gating, dynamic voltage/frequency scaling and other high-level techniques, one way of dealing with this problem is to use many application-specific cores. In an effort to maximize the number of reconfigurable processing resources on a platform, the accelerator-rich architecture HARP was designed and evaluated in terms of different performance metrics. HARP is constructed on a Network-on-Chip (NoC) of 3x3 nodes where with every node, a CGRA of application-specific size is integrated other than the central node which is attached to a RISC processor. The RISC establishes synchronization between the nodes for data transfer and also performs the supervisory control. While using the NoC as the backbone of communication between the cores, it becomes possible for all the cores to address each other and also perform execution simultaneously and independently of each other. The performance of accelerators generated from CREMA, AVATAR and SCREMA templates were evaluated individually and also when attached to HARP's NoC nodes. The individual CGRAs show promising results in their own capacity but when integrated all together in the framework of HARP, interesting comparisons were established in terms of overall execution times, resource utilization, operating frequencies, power and energy consumption. In evaluating HARP, estimates and measurements were also made in some advanced performance metrics, e.g., in MOPS/mW and MOPS/MHz. The overall research work promotes the idea of heterogeneous accelerator-rich platform as a solution to current problems and future needs of industry and academia

Trepo - Institutional Repository of Tampere University

A Compiler Framework for a Coarse-Grained Reconfigurable Array

Author: Valderas Rodríguez Leticia Trinidad
Publication venue
Publication date: 12/08/2015
Field of study

The number of transistors on a chip is increasing with time giving rise to multiple design challenges. In this context, reconfigurable architectures have emerged to provide high flexibility, less power/energy consumption yet while delivering high performance levels. The success of an embedded architecture depends on powerful compiler support. Current studies are focused on developing compilers to reduce the designer’s effort by introducing many automation related features. In this thesis work, a compiler framework is presented for a scalable Coarse-Grained Reconfigurable Array (CGRA) called SCREMA. The compiler framework presented in this thesis replaces the exiting GUI compiler with an added feature of automatic placement and routing. The compiler receives a Reverse Polish Notation (RPN) description of the target algorithm by the user. It extracts the computational information from the RPN description and performs placement and routing over the CGRA template. The first configuration stream generated by the compiler is the main processing context. Furthermore, if additional configuration patterns have to be designed, the compiler framework gives the possibility to implement them in two different design paradigms: a preprocessing context and a canonical context. Pre-processing context is used to align the data into a CGRA to facilitate post-processing. Canonical context allows the user to perform additions in sum-of-products related algorithms. The compiler framework has been tested by implementing real integer Matrix-Vector Multiplication (MVM) algorithms. Specifically, the tested MVM orders are 4th, 8th, 16th and 32nd on the CGRA sizes of 4x4, 4x8, 4x16 and 4x32 PEs, respectively. All the implementation are based on the RPN description of 4th-order MVM. Other than implementing 4th-order MVM, the rest of tested MVM algorithms need preprocessing and canonical contexts to be designed and implemented. The user effort which was needed to Place and Route (P&R) an algorithm manually on SCREMA is now reduced by using this compiler framework as it provides an automatic P&R mechanism

Trepo - Institutional Repository of Tampere University

TUT DPub

On the design of multimedia architectures : proceedings of a one-day workshop, Eindhoven, December 18, 2003

Author
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2003
Field of study

Pure OAI Repository

On the design of multimedia architectures : proceedings of a one-day workshop, Eindhoven, December 18, 2003

Author
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2003
Field of study

Pure OAI Repository

FDSOI Design using Automated Standard-Cell-Grained Body Biasing

Author: Kühn Johannes Maximilian
Publication venue: Universität Tübingen
Publication date: 01/01/2017
Field of study

With the introduction of FDSOI processes at competitive technology nodes, body biasing on an unprecedented scale was made possible. Body biasing influences one of the central transistor characteristics, the threshold voltage. By being able to heighten or lower threshold voltage by more than 100mV, the very physics of transistor switching can be manipulated at run time. Furthermore, as body biasing does not lead to different signal levels, it can be applied much more fine-grained than, e.g., DVFS. With the state of the art mainly focused on combinations of body biasing with DVFS, it has thus ignored granularities unfeasible for DVFS. This thesis fills this gap by proposing body bias domain partitioning techniques and for body bias domain partitionings thereby generated, algorithms that search for body bias assignments. Several different granularities ranging from entire cores to small groups of standard cells were examined using two principal approaches: Designer aided pre-partitioning based determination of body bias domains and a first-time, fully automatized, netlist based approach called domain candidate exploration. Both approaches operate along the lines of activation and timing of standard cell groups. These approaches were evaluated using the example of a Dynamically Reconfigurable Processor (DRP), a highly efficient category of reconfigurable architectures which consists of an array of processing elements and thus offers many opportunities for generalization towards many-core architectures. Finally, the proposed methods were validated by manufacturing a test-chip. Extensive simulation runs as well as the test-chip evaluation showed the validity of the proposed methods and indicated substantial improvements in energy efficiency compared to the state of the art. These improvements were accomplished by the fine-grained partitioning of the DRP design. This method allowed reducing dynamic power through supply voltage levels yielding higher clock frequencies using forward body biasing, while simultaneously reducing static power consumption in unused parts.Die Einführung von FDSOI Prozessen in gegenwärtigen Prozessgrößen ermöglichte die Nutzung von Substratvorspannung in nie zuvor dagewesenem Umfang. Substratvorspannung beeinflusst unter anderem eine zentrale Eigenschaft von Transistoren, die Schwellspannung. Mittels Substratvorspannung kann diese um mehr als 100mV erhöht oder gesenkt werden, was es ermöglicht, die schiere Physik des Schaltvorgangs zu manipulieren. Da weiterhin hiervon der Signalpegel der digitalen Signale unberührt bleibt, kann diese Technik auch in feineren Granularitäten angewendet werden, als z.B. Dynamische Spannungs- und Frequenz Anpassung (Engl. Dynamic Voltage and Frequency Scaling, Abk. DVFS). Da jedoch der Stand der Technik Substratvorspannung hauptsächlich in Kombinationen mit DVFS anwendet, werden feinere Granularitäten, welche für DVFS nicht mehr wirtschaftlich realisierbar sind, nicht berücksichtigt. Die vorliegende Arbeit schließt diese Lücke, indem sie Partitionierungsalgorithmen zur Unterteilung eines Entwurfs in Substratvorspannungsdomänen vorschlägt und für diese hierdurch unterteilten Domänen entsprechende Substratvorspannungen berechnet. Hierzu wurden verschiedene Granularitäten berücksichtigt, von ganzen Prozessorkernen bis hin zu kleinen Gruppen von Standardzellen. Diese Entwürfe wurden dann mit zwei verschiedenen Herangehensweisen unterteilt: Chipdesigner unterstützte, vorpartitionierungsbasierte Bestimmung von Substratvorspannungsdomänen, sowie ein erstmals vollautomatisierter, Netzlisten basierter Ansatz, in dieser Arbeit Domänen Kandidaten Exploration genannt. Beide Ansätze funktionieren nach dem Prinzip der Aktivierung, d.h. zu welchem Zeitpunkt welcher Teil des Entwurfs aktiv ist, sowie der Signallaufzeit durch die entsprechenden Entwurfsteile. Diese Ansätze wurden anhand des Beispiels Dynamisch Rekonfigurierbarer Prozessoren (DRP) evaluiert. DRPs stellen eine Klasse hocheffizienter rekonfigurierbarer Architekturen dar, welche hauptsächlich aus einem Feld von Rechenelementen besteht und dadurch auch zahlreiche Möglichkeiten zur Verallgemeinerung hinsichtlich Many-Core Architekturen zulässt. Schließlich wurden die vorgeschlagenen Methoden in einem Testchip validiert. Alle ermittelten Ergebnisse zeigen im Vergleich zum Stand der Technik drastische Verbesserungen der Energieeffizienz, welche durch die feingranulare Unterteilung in Substratvorspannungsdomänen erzielt wurde. Hierdurch konnten durch die Anwendung von Substratvorspannung höhere Taktfrequenzen bei gleicher Versorgungsspannung erzielt werden, während zeitgleich in zeitlich unkritischen oder ungenutzten Entwurfsteilen die statische Leistungsaufnahme minimiert wurde

Publikationsserver der Universität Tübingen

Reconfigurable Computing Based on Commercial FPGAs. Solutions for the Design and Implementation of Partially Reconfigurable Systems = Computación reconfigurable basada en FPGAs comerciales. Soluciones para el diseño e implementación de sistemas parcialmente reconfigurables.

Author: Esteves Krasteva Yana
Publication venue: E.T.S.I. Industriales (UPM)
Publication date: 01/01/2009
Field of study

Esta tesis doctoral está enmarcada en el campo de investigación de la computación reconfigurable. Este campo ha experimentado un crecimiento abrumador en los últimos años como resultado de la evolución de los dispositivos reconfigurables, donde las Field Programmable Gate Arrays (FPGAs) son el máximo exponente desde el punto de vista comercial. De forma tradicional las empresas de electrónica han seleccionado las FPGAs como prototipos iníciales de productos de altas prestaciones. Luego el sistema final es integrado en Application Specific Integrated Circuits (ASICs) que se producen en grandes volúmenes perimiendo amortiza su alto coste de diseño y producción y aprovechando la ventaja del bajo coste por unidad. Por otro lado, los DSPs (Digital Signal Processing) y los microprocesadores han sido preferidos por su bajo coste ante las FPGAs el campo de los dispositivos con menores requisitos de cómputo. En los últimos años, este panorama está sufriendo una serie de cambios. Ahora el mercado busca mas soluciones “reconfigurables” ya que permiten reducir el tiempo de salida del producto al mercado (time-to-market), aumentar el tiempo del producto en el mercado (time-in-market) y además cubren los amplios requisitos de cómputo. El cambio que se observa, se debe a que los dispositivos programables han evolucionado de simples estructuras programables a complejas plataformas reconfigurables. Las FPGAs del estado de la técnica han alcanzado un grado de integración muy alto y además ahora contienen, dentro de su arquitectura programable, microprocesadores y lógica específica de procesamiento digital de señal. Otro factor sumamente importante para el cambio es que las FPGAs permiten el diseño de dispositivos cuyo hardware pueden ser adaptado, o actualizado, una vez que el producto ya esta entregado e instalado, obteniendo así una flexibilidad en el hardware comparable con la del software, donde la actualización postventa de los sistemas es una práctica muy explotada de cara a la reducción de costes y la salida rápida al mercado. Por otro lado, y sobre todo en el ámbito académico, existen dispositivos reconfigurables con distinta granularidad que permites alcanzar altas prestaciones en comparación con las FPGAs comerciales de grano fino (comparable con la de los ASICs), pero están restringidas a una aplicación o grupo de aplicaciones. A pesar de que los dispositivos reconfigurables propietarios ofrecen muchas ventajas, esta opción ha sido descartada en la presente tesis debido a que, desde el punto de vista industrial requieren, aparte del diseño del ASIC reconfigurable, el desarrollo de un entorno de diseño completo. Todo esto conlleva a un elevado coste de recursos, además del alejamiento de las propuestas de la industria. La presente tesis se ha centrado en proporcionar soluciones para dispositivos comerciales, FPGAs de grano fino, con la finalidad de aprovechar las herramientas existentes y mantener las soluciones propuestas lo más cerca posible de la industria. Los dispositivos reconfigurables proporcionan diversos métodos de reconfiguración, siendo el más atractivo la reconfiguración parcial y dinámica, ya que permite adaptar el dispositivo sin interrumpir su funcionamiento y crear dispositivos auto-adaptables. Este tipo de reconfiguración será el objeto de estudio de la tesis doctoral. La reconfiguración parcial permite tener una serie de tareas hardware (módulos que se ubican en la estructura reconfigurable) ejecutándose paralelamente en la FPGA y sustituir un bloque por otro, dependiendo de las necesidades del sistema, sin alterar el funcionamiento del resto de bloques. Esta idea básica en teoría brinda la flexibilidad del software al hardware, que combinado con su paralelismo implícito hace del sistema reconfigurable una potente herramienta que puede dar pie a la creación de sistemas adaptables o incluso autoadaptativos, supercomputadores reconfigurables y hardware bio-inspirado entre otros. Por otro lado, a pesar que algunos proveedores de FPGAs permiten la reconfiguración parcial, el uso de esta técnica aún está restringido al ámbito académico y a sistemas muy básicos. El trabajo de investigación descrito dentro de la presente tesis doctoral ha tenido por objeto el estudio de diversos aspectos de los sistemas parcialmente reconfigurables, la identificación de las principales deficiencias de las soluciones existentes y la propuesta de nuevas soluciones originales. Como resultado del estudio del estado del arte se ha visto que las soluciones existentes son poco flexibles y la escalabilidad de los sistemas que se pueden diseñar es reducida. Por ello las propuestas originales de esta tesis tienen como objetivo permitir el diseño e implementación de sistemas parcialmente reconfigurables con alta escalabilidad y flexibilidad. La tesis principal del trabajo de investigación ha sido basada en la idea que para obtener una mayor flexibilidad de los sistemas se debe desligar el diseño del sistema reconfigurable del diseño de los cores que serán consumidos por dicho sistema. La tesis doctoral ha contribuido proponiendo mejores soluciones a nivel de arquitectura, flujos de diseño y herramientas que han permitido el diseño e implementación de diversos sistemas parcialmente reconfigurables con distinto grado de flexibilidad y escalabilidad. La flexibilidad y la escalabilidad son términos que en los sistemas reconfigurables se pueden asociar a diversos aspectos. Dentro de esta tesis la flexibilidad está asociada principalmente a la diversidad de cores o tareas hardware que pueden ser consumidos o integrados en un sistema ya definido, mientras que la escalabilidad está referida al número de cores que pueden coexistir en el sistema y ser reconfigurados independientemente. Para poder diseñar sistemas flexibles y escalables, estas características deben estar cubiertas en distintos niveles. Más en detalle dentro de la presente tesis, desde el punto de vista de la arquitectura, la flexibilidad está cubierta por la posibilidad de posicionar libremente cores en una arquitectura escalable predefinida. Desde el punto de vista del sistema, la flexibilidad está reflejada por la posibilidad de no sólo de modificar o reconfigurar un core del sistema hardware, sino también de modificar las comunicaciones internas del mismo. Desde el punto de vista del dispositivo, la flexibilidad está garantizada por la transparencia en el proceso de reconfiguración. Por último, la flexibilidad en el proceso de diseño está cubierta por la definición de herramientas y flujos de diseño que permiten por un lado desligar el diseño del sistema reconfigurable del diseño de los cores para el sistema, y por otro lado que diseñadores sin conocimientos detallados de reconfiguración parcial puedan diseñar cores. Dentro de la tesis doctoral se presentan cuatro dispositivos reconfigurable integrados en distintos entornos y con distinto grado de flexibilidad que corresponde al grado de aprovechamiento de las aportaciones originales de la tesis. Las principales aportaciones de la tesis doctoral, relacionadas a cada uno de los aspectos mencionados en el párrafo anterior, y tratados en distintas partes de la tesis se resumen a continuación destacando en la medida de lo posible las diferencias con respecto al estado del arte: Se ha definido una metodología de diseño de Arquitecturas Virtuales (abstracción de la arquitectura física de la FPGA que incluye la distribución de los recursos programables en slots y la forma de interconexión de los slots). La metodología, propuesta originalmente en esta tesis, permite el diseño de sistemas reconfigurables con alta flexibilidad y escalabilidad comparadas con el estado del arte. Una solución a la adaptación de las comunicaciones internas en los sistemas reconfigurables llamada DRNoC (Dynamic Reconfigurable NoC). La solución original abarca diversos aspectos e incluye la definición de una arquitectura de interconexión para los sistemas reconfigurables basada en redes en chip (Network on Chip - NoC), la definición de métodos de reconfiguración y el direccionamiento interno del sistema, y de forma más específica para las comunicaciones basadas en redes, la definición de un formato de tramas y la arquitectura de los enrutadores. La principal diferencia de la solución propuesta con el estado del arte es que DRNoC no restringe la comunicación únicamente a NoCs y permite la definición de cualquier tipo de esquema de comunicación (NoC, punto a punto, punto a multipunto, bus, o una combinación de las anteriores) y además, permite que varios esquemas de comunicación coexistan en el mismo sistema y que funcionen de forma independiente. De esta forma la solución propuesta brinda una mayor flexibilidad que las ya existentes. Se ha propuesto una solución para la manipulación de los ficheros de configuración para las FPGA del tipo Virtex II/Pro que es la más completa comparada con el estado del arte. Asimismo, una serie de herramientas que permiten la generación y extracción de cores para sistemas reconfigurables basados en FPGA Virtex II que ha sido la primera solución existente para estas FPGA. Un flujo de diseño para cores basado en plantillas que permite el diseño de cores hardware sin ser un experto en reconfiguración parcial y sin conocer los detalles del sistema final en el que se implementará el core. El diseño, implementación y prueba de un sistema parcialmente reconfigurable basado en FPGAs comerciales de grano fino para redes de sensores. La primera aproximación existente en el estado del arte al uso de los sistemas parcialmente reconfigurables en las redes de sensores. La integración de un sistema reconfigurable en un entorno cliente-servidor que incluye un original sistema de control de la reconfiguración. Una solución para la depuración de los sistemas reconfigurables. Un sistema de emulación y prototipado rápido de las comunicaciones dentro de un chip basado originalmente en la idea de la reutilización de cores hardware por medio de la técnica de reconfiguración parcial. Como conclusión global del trabajo de investigación realizado cabe destacar que la presente tesis ha dado lugar a la creación y consolidación de una línea de investigación en el grupo de electrónica digital del Centro de Electrónica Industrial que actualmente se encuentra entre las más activas y de mayor importancia. Además, el trabajo de investigación y la divulgación de las aportaciones originales han permitido que el centro de investigación pase a formar parte del estado del arte de los sistemas parcialmente reconfigurables. The thesis is enclosed in the research area of reconfigurable computing which, in the last years, has experienced a remarkable growth as a result of the impressive evolution of reconfigurable devices. In this area, Field Programmable Gate Arrays (FPGAs) are the most outstanding representative from the commercial point of view. Traditionally FPGAs have been used for prototyping, in previous to the final Application Specific Integrated Circuit (ASIC) design stages. However, the interest in the integration of FPGAs in final products has been growing in the last years. FPGAs are preferred for small production volumes, where the ASIC masks high cost is unaffordable and also in products where time-to-market is a priority, and waiting for a complete ASIC design cycle is not desirable. State of the art FPGAs are highly integrated electronic circuits, composed of tens of millions of system gates, with competitive speed, performance and configurability. These devices have evolved from simple gate arrays to complex platforms that include embedded memory, multipliers and even microprocessors and digital signal processing elements. Additionally, the fine grain nature of the reconfigurable arrays, make FPGAs suitable for a broad set of application domains. On the other side, and mostly in the academic community, there are custom reconfigurable devices with different granularity levels that permit to achieve higher performance, compared to commercial FPGAs, but for a certain application domain. Although there are very good solutions in the academic state of the art, their main drawback from the industry point of view is that they require specific design environments and also, that the efforts and resources needed for designing such solutions are very high. This thesis work is focused on providing solutions that target commercial fine grain reconfigurable devices, FPGAs, in order to take advantage of existing tools and to keep the proposed solutions closer to the industry. Today FPGAs provide different reconfiguration options. Among them, the most challenging one is partial reconfiguration. This feature has special interest, as it permits system updates on the fly once the device is deployed, without the need of stopping it and without theoretical loss of performance. Partial reconfiguration is also an attractive feature because it permits to allocate different tasks/cores running in parallel in the device and change them on the fly as needed without disturbing other tasks/cores. This basic idea, brings software-like flexibility to hardware which, in combination with its inherited parallelism, opens the door for a broad amount of possibilities and applications, like runtime adaptive super-computing, adaptive embedded software ii accelerators, bio-inspired, self-reconfigurable and self-arrangeable systems. However, even though some commercial FPGAs provide partial reconfiguration features, its utilization is still in its early stages and it is not well supported by FPGA vendors, making its exploitation in real electronic systems very difficult. Therefore, there are several academic groups working to provide alternative solutions for the design and implementation of partially reconfigurable systems based on commercial FPGAs that intend to stimulate their integration and use in the industry. This research work intents to study different aspects of partially reconfigurable system on-chip and contribute with flexibility improvements. The main idea that will be followed along the thesis is that the design of reconfigurable systems will be considered an independent process from the design of cores that will be consumed by the system. This approach involves the design of flexible and scalable partial runtime reconfigurable systems, where most of the thesis contribution will be focused. More in detail, this thesis will contribute to improve architecture solutions, design tools and design flows of partially reconfigurable systems for commercial FPGAs and provide systems with higher flexibility and scalability. Flexibility and scalability in a reconfigurable system are terms that can be related to several aspects. In this thesis flexibility is mainly related to the diversity of tasks or cores a system can consume, while scalability is connected to the number of cores that can run in parallel and be independently reconfigured. Flexibility and scalability have to be covered by the system at different levels and the work presented in this thesis will contribute in all the specific levels. More in detail, from an architectural point of view, flexibility is reflected by the possibility of freely loading tasks or cores in a defined, scalable architecture. From the system point of view, flexibility is related to the possibility of modifying not only the system functionality by loading different tasks, but also to adapt the on-chip communications. From the device point of view, flexibility is reflected by the reconfiguration process transparency and, from the design point of view, it is oriented to the definition of design tools and flows that will permit, as far as possible, non specialized designers to design cores for a partially reconfigurable system and without knowing the system details. All the original proposed solutions, in each individual aspect, will be compared with the state of the art and complete systems solutions will be designed and will be integrated in different application domains in order to validate the thesis proposals. In order to achieve better understanding of the thesis and to facilitate the comparison with some, selected, related work, the thesis structure is not traditional. Instead of including a state of the art and a result Chapter, each Chapter is focused on a specific aspect of partially reconfigurable system design and includes state of the art and result sections. The first chapter, Chapter 1, introduces the main concepts to be used in the thesis. Chapter 2 is focused on reconfigurable systems architectures, contributing with architecture solutions and a design method. Chapter 3 proposes a solution that enhances the features of the architectures defined in Chapter 2, and provides more flexibility to the entire system by extending reconfiguration to the on-chip communication. Chapter 4 is related to the design flows and tools, where contributions are made in both aspects and the proposed solutions are compared with the state of the Abstract and Thesis Organization iii art. Complete systems, with different independency levels, are presented in Chapter 5 in order to validate the thesis contributions. Conclusions, a summary of contributions and the future work are included in Chapter 6. A more detailed description of each Chapter content is presented below: Chapter 1 provides an introduction to the reconfigurable systems based on FPGAs topic, by first defining the place of FPGAs in the electronic industry and afterwards, introducing the main concepts to be used along the thesis. Although the focus is put on commercial reconfigurable devices, some custom reconfigurable systems are also described in order to have a complete view of the options in reconfigurable devices. The Chapter discuses the thesis main topic, related to partial runtime reconfigurable systems, highlighting its main advantages and disadvantages and, introducing some of the approaches to be followed in this thesis. The main term introduced in this Chapter, associated to reconfigurable systems architectures, is ”Virtual Architecture”. The term defines the architecture of the partially reconfigurable system and how the different regions it is composed of are interconnected. A brief summary of the thesis main goals is included at the end of the Chapter. The main topic of Chapter 2 is related to reconfigurable systems architectures design. The Chapter includes a specific state of the art section that reviews some existing architecture solutions. After that, a general method for virtual architectures design, an original thesis contribution, is presented in detail. Afterwards, the method is applied to the design of general one dimensional (1D) and two dimensional (2D) architectures for Xilinx Virtex II/Pro FPGAs and, as an example, following the specific steps of the method, two 1D, bus based, architectures are designed. The architecture buses are compared with two state of the art solutions in terms of area and performance in the results section of the same Chapter. Chapter 3 is focused on reconfigurable systems on-chip communication issues, where the need of adaptability is the main topic. Again, a state of the art description of some 2D reconfigurable systems is presented at the beginning of the Chapter and, afterwards, an original solution, called Dynamic Reconfigurable NoC (DRNoC), is proposed. This solution covers different aspects. First, an architecture oriented to support adaptability in the on-chip communications is originally proposed. The architecture is mapped to a Virtex II FPGA by modifying a virtual architecture from the general ones presented in Chapter 2. Second, two types of reconfigurations that span through different levels of the OSI communication model are originally proposed. Third, a set of Network on Chip models, focused on the communication adaptability are designed and/or adapted and presented in the Chapter, along with an original NoC packet format and router architecture. These models are mapped to the DRNoC architecture and implementation cost parameters are defined and used to evaluate different implementation options. Regarding the architecture reconfigurability, it is important to remark that along the entire Chapter, intermediate test of possible partial reconfigurations and test results are included. At the end of the Chapter, the proposed architecture is compared with the state of the art using a set of structural parameters taken from a reference work and complemented with others defined in the Chapter. Chapter 4, focuses on the design tools and flows for partially reconfigurable systems. Again, an overview of the state of the art is included at the beginning of the Chapter. Abstract and Thesis Organization iv Afterwards, an original software solution for Virtex II configuration files (bitstreams) manipulations is presented. The first part of the solution is a study of the Virtex II/Pro FPGA bitstream format, used to define a set of equations for accessing a specific bitstream resource (at register or block level). Based on these equations, a set of tools for bitstream manipulation that target resource restricted devices are originally presented. Also, a design flow, based on systems and virtual architectures templates, which permits a straightforward core design by non partial reconfiguration experts and without knowing the system details, is originally proposed. In Chapter 5 four reconfigurable systems with different flexibility level, which corresponds to the level of the thesis proposals exploitation, are presented. The selected application domains attempt to demonstrate different advantages of the use of partial runtime reconfigurable systems and therefore are mainly a proof of concept. The first domain belongs to the wireless sensor networks, where t

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Power and Energy Aware Heterogeneous Computing Platform

Author: Nouri Sajjad
Publication venue: Tampere University of Technology
Publication date: 01/01/2018
Field of study

During the last decade, wireless technologies have experienced signiﬁcant development, most notably in the form of mobile cellular radio evolution from GSM to UMTS/HSPA and thereon to Long-Term Evolution (LTE) for increasing the capacity and speed of wireless data networks. Considering the real-time constraints of the new wireless standards and their demands for parallel processing, reconﬁgurable architectures and in particular, multicore platforms are part of the most successful platforms due to providing high computational parallelism and throughput. In addition to that, by moving toward Internet-of-Things (IoT), the number of wireless sensors and IP-based high throughput network routers is growing at a rapid pace. Despite all the progression in IoT, due to power and energy consumption, a single chip platform for providing multiple communication standards and a large processing bandwidth is still missing.The strong demand for performing different sets of operations by the embedded systems and increasing the computational performance has led to the use of heterogeneous multicore architectures with the help of accelerators for computationally-intensive data-parallel tasks acting as coprocessors. Currently, highly heterogeneous systems are the most power-area efﬁcient solution for performing complex signal processing systems. Additionally, the importance of IoT has increased signiﬁcantly the need for heterogeneous and reconﬁgurable platforms.On the other hand, subsequent to the breakdown of the Dennardian scaling and due to the enormous heat dissipation, the performance of a single chip was obstructed by the utilization wall since all cores cannot be clocked at their maximum operating frequency. Therefore, a thermal melt-down might be happened as a result of high instantaneous power dissipation. In this context, a large fraction of the chip, which is switched-off (Dark) or operated at a very low frequency (Dim) is called Dark Silicon. The Dark Silicon issue is a constraint for the performance of computers, especially when the up-coming IoT scenario will demand a very high performance level with high energy efﬁciency. Among the suggested solution to combat the problem of Dark-Silicon, the use of application-speciﬁc accelerators and in particular Coarse-Grained Reconﬁgurable Arrays (CGRAs) are the main motivation of this thesis work.This thesis deals with design and implementation of Software Deﬁned Radio (SDR) as well as High Efﬁciency Video Coding (HEVC) application-speciﬁc accelerators for computationally intensive kernels and data-parallel tasks. One of the most important data transmission schemes in SDR due to its ability of providing high data rates is Orthogonal Frequency Division Multiplexing (OFDM). This research work focuses on the evaluation of Heterogeneous Accelerator-Rich Platform (HARP) by implementing OFDM receiver blocks as designs for proof-of-concept. The HARP template allows the designer to instantiate a heterogeneous reconﬁgurable platform with a very large amount of custom-tailored computational resources while delivering a high performance in terms of many high-level metrics. The availability of this platform lays an excellent foundation to investigate techniques and methods to replace the Dark or Dim part of chip with high-performance silicon dissipating very low power and energy. Furthermore, this research work is also addressing the power and energy issues of the embedded computing systems by tailoring the HARP for self-aware and energy-aware computing models. In this context, the instantaneous power dissipation and therefore the heat dissipation of HARP are mitigated on FPGA/ASIC by using Dynamic Voltage and Frequency Scaling (DVFS) to minimize the dark/dim part of the chip. Upgraded HARP for self-aware and energy-aware computing can be utilized as an energy-efﬁcient general-purpose transceiver platform that is cognitive to many radio standards and can provide high throughput while consuming as little energy as possible. The evaluation of HARP has shown promising results, which makes it a suitable platform for avoiding Dark Silicon in embedded computing platforms and also for diverse needs of IoT communications.In this thesis, the author designed the blocks of OFDM receiver by crafting templatebased CGRA devices and then attached them to HARP’s Network-on-Chip (NoC) nodes. The performance of application-speciﬁc accelerators generated from templatebased CGRAs, the performance of the entire platform subsequent to integrating the CGRA nodes on HARP and the NoC trafﬁc are recorded in terms of several highlevel performance metrics. In evaluating HARP on FPGA prototype, it delivers a performance of 0.012 GOPS/mW. Because of the scalability and regularity in HARP, the author considered its value as architectural constant. In addition to showing the gain and the beneﬁts of maximizing the number of reconﬁgurable processing resources on a platform in comparison to the scaled performance of several state-of-the-art platforms, HARP’s architectural constant ensures application-independent ﬁgure of merit. HARP is further evaluated by implementing various sizes of Discrete Cosine transform (DCT) and Discrete Sine Transform (DST) dedicated for HEVC standard, which showed its ability to sustain Full HD 1080p format at 30 fps on FPGA. The author also integrated self-aware computing model in HARP to mitigate the power dissipation of an OFDM receiver. In the case of FPGA implementation, the total power dissipation of the platform showed 16.8% reduction due to employing the Feedback Control System (FCS) technique with Dynamic Frequency Scaling (DFS). Furthermore, by moving to ASIC technology and scaling both frequency and voltage simultaneously, signiﬁcant dynamic power reduction (up to 82.98%) was achieved, which proved the DFS/DVFS techniques as one step forward to mitigate the Dark Silicon issue

Trepo - Institutional Repository of Tampere University

Runtime Hardware Reconfiguration in Wireless Sensor Networks for Condition Monitoring

Author: Philipp François
Publication venue: Universitäts- und Landesbibliothek Darmstadt
Publication date: 24/09/2014
Field of study

The integration of miniaturized heterogeneous electronic components has enabled the deployment of tiny sensing platforms empowered by wireless connectivity known as wireless sensor networks. Thanks to an optimized duty-cycled activity, the energy consumption of these battery-powered devices can be reduced to a level where several years of operation is possible. However, the processing capability of currently available wireless sensor nodes does not scale well with the observation of phenomena requiring a high sampling resolution. The large amount of data generated by the sensors cannot be handled efficiently by low-power wireless communication protocols without a preliminary filtering of the information relevant for the application. For this purpose, energy-efficient, flexible, fast and accurate processing units are required to extract important features from the sensor data and relieve the operating system from computationally demanding tasks. Reconfigurable hardware is identified as a suitable technology to fulfill these requirements, balancing implementation flexibility with performance and energy-efficiency. While both static and dynamic power consumption of field programmable gate arrays has often been pointed out as prohibitive for very-low-power applications, recent programmable logic chips based on non-volatile memory appear as a potential solution overcoming this constraint. This thesis first verifies this assumption with the help of a modular sensor node built around a field programmable gate array based on Flash technology. Short and autonomous duty-cycled operation combined with hardware acceleration efficiently drop the energy consumption of the device in the considered context. However, Flash-based devices suffer from restrictions such as long configuration times and limited resources, which reduce their suitability for complex processing tasks. A template of a dynamically reconfigurable architecture built around coarse-grained reconfigurable function units is proposed in a second part of this work to overcome these issues. The module is conceived as an overlay of the sensor node FPGA increasing the implementation flexibility and introducing a standardized programming model. Mechanisms for virtual reconfiguration tailored for resource-constrained systems are introduced to minimize the overhead induced by this genericity. The definition of this template architecture leaves room for design space exploration and application- specific customization. Nevertheless, this aspect must be supported by appropriate design tools which facilitate and automate the generation of low-level design files. For this purpose, a software tool is introduced to graphically configure the architecture and operation of the hardware accelerator. A middleware service is further integrated into the wireless sensor network operating system to bridge the gap between the hardware and the design tools, enabling remote reprogramming and scheduling of the hardware functionality at runtime. At last, this hardware and software toolchain is applied to real-world wireless sensor network deployments in the domain of condition monitoring. This category of applications often require the complex analysis of signals in the considered range of sampling frequencies such as vibrations or electrical currents, making the proposed system ideally suited for the implementation. The flexibility of the approach is demonstrated by taking examples with heterogeneous algorithmic specifications. Different data processing tasks executed by the sensor node hardware accelerator are modified at runtime according to application requests

TUbiblio

tuprints

Towards the development of flexible, reliable, reconfigurable, and high-performance imaging systems

Author: Khalifat Jalal Mohamed
Publication venue: The University of Edinburgh
Publication date: 29/11/2016
Field of study

Current FPGAs can implement large systems because of the high density of reconfigurable logic resources in a single chip. FPGAs are comprehensive devices that combine flexibility and high performance in the same platform compared to other platform such as General-Purpose Processors (GPPs) and Application Specific Integrated Circuits (ASICs). The flexibility of modern FPGAs is further enhanced by introducing Dynamic Partial Reconfiguration (DPR) feature, which allows for changing the functionality of part of the system while other parts are functioning. FPGAs became an important platform for digital image processing applications because of the aforementioned features. They can fulfil the need of efficient and flexible platforms that execute imaging tasks efficiently as well as the reliably with low power, high performance and high flexibility. The use of FPGAs as accelerators for image processing outperforms most of the current solutions. Current FPGA solutions can to load part of the imaging application that needs high computational power on dedicated reconfigurable hardware accelerators while other parts are working on the traditional solution to increase the system performance. Moreover, the use of the DPR feature enhances the flexibility of image processing further by swapping accelerators in and out at run-time. The use of fault mitigation techniques in FPGAs enables imaging applications to operate in harsh environments following the fact that FPGAs are sensitive to radiation and extreme conditions. The aim of this thesis is to present a platform for efficient implementations of imaging tasks. The research uses FPGAs as the key component of this platform and uses the concept of DPR to increase the performance, flexibility, to reduce the power dissipation and to expand the cycle of possible imaging applications. In this context, it proposes the use of FPGAs to accelerate the Image Processing Pipeline (IPP) stages, the core part of most imaging devices. The thesis has a number of novel concepts. The first novel concept is the use of FPGA hardware environment and DPR feature to increase the parallelism and achieve high flexibility. The concept also increases the performance and reduces the power consumption and area utilisation. Based on this concept, the following implementations are presented in this thesis: An implementation of Adams Hamilton Demosaicing algorithm for camera colour interpolation, which exploits the FPGA parallelism to outperform other equivalents. In addition, an implementation of Automatic White Balance (AWB), another IPP stage that employs DPR feature to prove the mentioned novelty aspects. Another novel concept in this thesis is presented in chapter 6, which uses DPR feature to develop a novel flexible imaging system that requires less logic and can be implemented in small FPGAs. The system can be employed as a template for any imaging application with no limitation. Moreover, discussed in this thesis is a novel reliable version of the imaging system that adopts novel techniques including scrubbing, Built-In Self Test (BIST), and Triple Modular Redundancy (TMR) to detect and correct errors using the Internal Configuration Access Port (ICAP) primitive. These techniques exploit the datapath-based nature of the implemented imaging system to improve the system's overall reliability. The thesis presents a proposal for integrating the imaging system with the Robust Reliable Reconfigurable Real-Time Heterogeneous Operating System (R4THOS) to get the best out of the system. The proposal shows the suitability of the proposed DPR imaging system to be used as part of the core system of autonomous cars because of its unbounded flexibility. These novel works are presented in a number of publications as shown in section 1.3 later in this thesis

Edinburgh Research Archive

A coarse-grain reconfigurable architecture for multimedia applications featuring subword computation capabilities

Author: Claudio Brunelli
Fabio Garzia
Jari Nurmi
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref