12 research outputs found

    Lookup table partial reconfiguration for an evolvable hardware classifier system

    Full text link
    Abstract—The evolvable hardware (EHW) paradigm relies on continuous run-time reconfiguration of hardware. When applied on modern FPGAs, the technically challenging reconfiguration process becomes an issue and can be approached at multiple levels. In related work, virtual reconfigurable circuits (VRC), partial reconfiguration, and lookup table (LUT) reconfiguration approaches have been investigated. In this paper, we show how fine-grained partial reconfiguration of 6-input LUTs of modern Xilinx FPGAs can lead to significantly more efficient resource utilization in an EHW application. Neither manual placement nor any proprietary bitstream manipulation is required in the simplest form of the employed method. We specify the goal archi-tecture in VHDL and read out the locations of the automatically placed LUTs for use in an online reconfiguration setting. This allows for an easy and flexible architecture specification, as well as possible implementation improvements over a hand-placed design. For demonstration, we rely on a hardware signal classifier application. Our results show that the proposed approach can fit a classification circuit 4 times larger than an equivalent VRC-based approach, and 6 times larger than a shift register-based approach, in a Xilinx Virtex-5 device. To verify the reconfiguration process, a MicroBlaze-based embedded system is implemented, and reconfiguration is carried out via the Xilinx Internal Configuration Access Port (ICAP) and driver software. I

    Accelerating FPGA-based evolution of wavelet transform filters by optimized task scheduling

    Get PDF
    Adaptive embedded systems are required in various applications. This work addresses these needs in the area of adaptive image compression in FPGA devices. A simplified version of an evolution strategy is utilized to optimize wavelet filters of a Discrete Wavelet Transform algorithm. We propose an adaptive image compression system in FPGA where optimized memory architecture, parallel processing and optimized task scheduling allow reducing the time of evolution. The proposed solution has been extensively evaluated in terms of the quality of compression as well as the processing time. The proposed architecture reduces the time of evolution by 44% compared to our previous reports while maintaining the quality of compression unchanged with respect to existing implementations. The system is able to find an optimized set of wavelet filters in less than 2 min whenever the input type of data changes

    Hardware evolution of a digital circuit using a custom VLSI architecture

    Get PDF
    This research investigates three solutions to overcoming portability and scalability concerns in the Evolutionary Hardware (EHW) field. Firstly, the study explores if the V-FPGA—a new, portable Virtual-Reconfigurable-Circuit architecture—is a practical and viable evolution platform. Secondly, the research looks into two possible ways of making EHW systems more scalable: by optimising the system’s genetic algorithm; and by decomposing the solution circuit into smaller, evolvable sub-circuits or modules. GA optimisation is done is by: omitting a canonical GA’s crossover operator (i.e. by using an algorithm); applying evolution constraints; and optimising the fitness function. The circuit decomposition is done in order to demonstrate modular evolution. Three two-bit multiplier circuits and two sub-circuits of a simple, but real-world control circuit are evolved. The results show that the evolved multiplier circuits, when compared to a conventional multiplier, are either equal or more efficient. All the evolved circuits improve two of the four critical paths, and all are unique. Thus, it is experimentally shown that the V-FPGA is a viable hardware-platform on which hardware evolution can be implemented; and how hardware evolution is able to synthesise novel, optimised versions of conventional circuits. By comparing the and canonical GAs, the results verify that optimised GAs can find solutions quicker, and with fewer attempts. Part of the optimisation also includes a comprehensive critical-path analysis, where the findings show that the identification of dependent critical paths is vital in enhancing a GA’s efficiency. Finally, by demonstrating the modular evolution of a finite-state machine’s control circuit, it is found that although the control circuit as a whole makes use of more than double the available hardware resources on the V-FPGA and is therefore not evolvable, the evolution of each state’s sub-circuit is possible. Thus, modular evolution is shown to be a successful tool when dealing with scalability

    Hexarray: A Novel Self-Reconfigurable Hardware System

    Get PDF
    Evolvable hardware (EHW) is a powerful autonomous system for adapting and finding solutions within a changing environment. EHW consists of two main components: a reconfigurable hardware core and an evolutionary algorithm. The majority of prior research focuses on improving either the reconfigurable hardware or the evolutionary algorithm in place, but not both. Thus, current implementations suffer from being application oriented and having slow reconfiguration times, low efficiencies, and less routing flexibility. In this work, a novel evolvable hardware platform is proposed that combines a novel reconfigurable hardware core and a novel evolutionary algorithm. The proposed reconfigurable hardware core is a systolic array, which is called HexArray. HexArray was constructed using processing elements with a redesigned architecture, called HexCells, which provide routing flexibility and support for hybrid reconfiguration schemes. The improved evolutionary algorithm is a genome-aware genetic algorithm (GAGA) that accelerates evolution. Guided by a fitness function the GAGA utilizes context-aware genetic operators to evolve solutions. The operators are genome-aware constrained (GAC) selection, genome-aware mutation (GAM), and genome-aware crossover (GAX). The GAC selection operator improves parallelism and reduces the redundant evaluations. The GAM operator restricts the mutation to the part of the genome that affects the selected output. The GAX operator cascades, interleaves, or parallel-recombines genomes at the cell level to generate better genomes. These operators improve evolution while not limiting the algorithm from exploring all areas of a solution space. The system was implemented on a SoC that includes a programmable logic (i.e., field-programmable gate array) to realize the HexArray and a processing system to execute the GAGA. A computationally intensive application that evolves adaptive filters for image processing was chosen as a case study and used to conduct a set of experiments to prove the developed system robustness. Through an iterative process using the genetic operators and a fitness function, the EHW system configures and adapts itself to evolve fitter solutions. In a relatively short time (e.g., seconds), HexArray is able to evolve autonomously to the desired filter. By exploiting the routing flexibility in the HexArray architecture, the EHW has a simple yet effective mechanism to detect and tolerate faulty cells, which improves system reliability. Finally, a mechanism that accelerates the evolution process by hiding the reconfiguration time in an “evolve-while-reconfigure” process is presented. In this process, the GAGA utilizes the array routing flexibility to bypass cells that are being configured and evaluates several genomes in parallel

    Towards the development of a reliable reconfigurable real-time operating system on FPGAs

    Get PDF
    In the last two decades, Field Programmable Gate Arrays (FPGAs) have been rapidly developed from simple “glue-logic” to a powerful platform capable of implementing a System on Chip (SoC). Modern FPGAs achieve not only the high performance compared with General Purpose Processors (GPPs), thanks to hardware parallelism and dedication, but also better programming flexibility, in comparison to Application Specific Integrated Circuits (ASICs). Moreover, the hardware programming flexibility of FPGAs is further harnessed for both performance and manipulability, which makes Dynamic Partial Reconfiguration (DPR) possible. DPR allows a part or parts of a circuit to be reconfigured at run-time, without interrupting the rest of the chip’s operation. As a result, hardware resources can be more efficiently exploited since the chip resources can be reused by swapping in or out hardware tasks to or from the chip in a time-multiplexed fashion. In addition, DPR improves fault tolerance against transient errors and permanent damage, such as Single Event Upsets (SEUs) can be mitigated by reconfiguring the FPGA to avoid error accumulation. Furthermore, power and heat can be reduced by removing finished or idle tasks from the chip. For all these reasons above, DPR has significantly promoted Reconfigurable Computing (RC) and has become a very hot topic. However, since hardware integration is increasing at an exponential rate, and applications are becoming more complex with the growth of user demands, highlevel application design and low-level hardware implementation are increasingly separated and layered. As a consequence, users can obtain little advantage from DPR without the support of system-level middleware. To bridge the gap between the high-level application and the low-level hardware implementation, this thesis presents the important contributions towards a Reliable, Reconfigurable and Real-Time Operating System (R3TOS), which facilitates the user exploitation of DPR from the application level, by managing the complex hardware in the background. In R3TOS, hardware tasks behave just like software tasks, which can be created, scheduled, and mapped to different computing resources on the fly. The novel contributions of this work are: 1) a novel implementation of an efficient task scheduler and allocator; 2) implementation of a novel real-time scheduling algorithm (FAEDF) and two efficacious allocating algorithms (EAC and EVC), which schedule tasks in real-time and circumvent emerging faults while maintaining more compact empty areas. 3) Design and implementation of a faulttolerant microprocessor by harnessing the existing FPGA resources, such as Error Correction Code (ECC) and configuration primitives. 4) A novel symmetric multiprocessing (SMP)-based architectures that supports shared memory programing interface. 5) Two demonstrations of the integrated system, including a) the K-Nearest Neighbour classifier, which is a non-parametric classification algorithm widely used in various fields of data mining; and b) pairwise sequence alignment, namely the Smith Waterman algorithm, used for identifying similarities between two biological sequences. R3TOS gives considerably higher flexibility to support scalable multi-user, multitasking applications, whereby resources can be dynamically managed in respect of user requirements and hardware availability. Benefiting from this, not only the hardware resources can be more efficiently used, but also the system performance can be significantly increased. Results show that the scheduling and allocating efficiencies have been improved up to 2x, and the overall system performance is further improved by ~2.5x. Future work includes the development of Network on Chip (NoC), which is expected to further increase the communication throughput; as well as the standardization and automation of our system design, which will be carried out in line with the enablement of other high-level synthesis tools, to allow application developers to benefit from the system in a more efficient manner

    A Practical Hardware Implementation of Systemic Computation

    Get PDF
    It is widely accepted that natural computation, such as brain computation, is far superior to typical computational approaches addressing tasks such as learning and parallel processing. As conventional silicon-based technologies are about to reach their physical limits, researchers have drawn inspiration from nature to found new computational paradigms. Such a newly-conceived paradigm is Systemic Computation (SC). SC is a bio-inspired model of computation. It incorporates natural characteristics and defines a massively parallel non-von Neumann computer architecture that can model natural systems efficiently. This thesis investigates the viability and utility of a Systemic Computation hardware implementation, since prior software-based approaches have proved inadequate in terms of performance and flexibility. This is achieved by addressing three main research challenges regarding the level of support for the natural properties of SC, the design of its implied architecture and methods to make the implementation practical and efficient. Various hardware-based approaches to Natural Computation are reviewed and their compatibility and suitability, with respect to the SC paradigm, is investigated. FPGAs are identified as the most appropriate implementation platform through critical evaluation and the first prototype Hardware Architecture of Systemic computation (HAoS) is presented. HAoS is a novel custom digital design, which takes advantage of the inbuilt parallelism of an FPGA and the highly efficient matching capability of a Ternary Content Addressable Memory. It provides basic processing capabilities in order to minimize time-demanding data transfers, while the optional use of a CPU provides high-level processing support. It is optimized and extended to a practical hardware platform accompanied by a software framework to provide an efficient SC programming solution. The suggested platform is evaluated using three bio-inspired models and analysis shows that it satisfies the research challenges and provides an effective solution in terms of efficiency versus flexibility trade-off

    Dynamic partial reconfiguration management for high performance and reliability in FPGAs

    Get PDF
    Modern Field-Programmable Gate Arrays (FPGAs) are no longer used to implement small “glue logic” circuitries. The high-density of reconfigurable logic resources in today’s FPGAs enable the implementation of large systems in a single chip. FPGAs are highly flexible devices; their functionality can be altered by simply loading a new binary file in their configuration memory. While the flexibility of FPGAs is comparable to General-Purpose Processors (GPPs), in the sense that different functions can be performed using the same hardware, the performance gain that can be achieved using FPGAs can be orders of magnitudes higher as FPGAs offer the ability for customisation of parallel computational architectures. Dynamic Partial Reconfiguration (DPR) allows for changing the functionality of certain blocks on the chip while the rest of the FPGA is operational. DPR has sparked the interest of researchers to explore new computational platforms where computational tasks are off-loaded from a main CPU to be executed using dedicated reconfigurable hardware accelerators configured on demand at run-time. By having a battery of custom accelerators which can be swapped in and out of the FPGA at runtime, a higher computational density can be achieved compared to static systems where the accelerators are bound to fixed locations within the chip. Furthermore, the ability of relocating these accelerators across several locations on the chip allows for the implementation of adaptive systems which can mitigate emerging faults in the FPGA chip when operating in harsh environments. By porting the appropriate fault mitigation techniques in such computational platforms, the advantages of FPGAs can be harnessed in different applications in space and military electronics where FPGAs are usually seen as unreliable devices due to their sensitivity to radiation and extreme environmental conditions. In light of the above, this thesis investigates the deployment of DPR as: 1) a method for enhancing performance by efficient exploitation of the FPGA resources, and 2) a method for enhancing the reliability of systems intended to operate in harsh environments. Achieving optimal performance in such systems requires an efficient internal configuration management system to manage the reconfiguration and execution of the reconfigurable modules in the FPGA. In addition, the system needs to support “fault-resilience” features by integrating parameterisable fault detection and recovery capabilities to meet the reliability standard of fault-tolerant applications. This thesis addresses all the design and implementation aspects of an Internal Configuration Manger (ICM) which supports a novel bitstream relocation model to enable the placement of relocatable accelerators across several locations on the FPGA chip. In addition to supporting all the configuration capabilities required to implement a Reconfigurable Operating System (ROS), the proposed ICM also supports the novel multiple-clone configuration technique which allows for cloning several instances of the same hardware accelerator at the same time resulting in much shorter configuration time compared to traditional configuration techniques. A faulttolerant (FT) version of the proposed ICM which supports a comprehensive faultrecovery scheme is also introduced in this thesis. The proposed FT-ICM is designed with a much smaller area footprint compared to Triple Modular Redundancy (TMR) hardening techniques while keeping a comparable level of fault-resilience. The capabilities of the proposed ICM system are demonstrated with two novel applications. The first application demonstrates a proof-of-concept reliable FPGA server solution used for executing encryption/decryption queries. The proposed server deploys bitstream relocation and modular redundancy to mitigate both permanent and transient faults in the device. It also deploys a novel Built-In Self- Test (BIST) diagnosis scheme, specifically designed to detect emerging permanent faults in the system at run-time. The second application is a data mining application where DPR is used to increase the computational density of a system used to implement the Frequent Itemset Mining (FIM) problem

    Low-power adaptive control scheme using switching activity measurement method for reconfigurable analog-to-digital converters

    Get PDF
    Power consumption is a critical issue for portable devices. The ever-increasing demand for multimode wireless applications and the growing concerns towards power-aware green technology make dynamically reconfigurable hardware an attractive solution for overcoming the power issue. This is due to its advantages of flexibility, reusability, and adaptability. During the last decade, reconfigurable analog-to-digital converters (ReADCs) have been used to support multimode wireless applications. With the ability to adaptively scale the power consumption according to different operation modes, reconfigurable devices utilise the power supply efficiently. This can prolong battery life and reduce unnecessary heat emission to the environment. However, current adaptive mechanisms for ReADCs rely upon external control signals generated using digital signal processors (DSPs) in the baseband. This thesis aims to provide a single-chip solution for real-time and low-power ReADC implementations that can adaptively change the converter resolution according to signal variations without the need of the baseband processing. Specifically, the thesis focuses on the analysis, design and implementation of a low-power digital controller unit for ReADCs. In this study, the following two important reconfigurability issues are investigated: i) the detection mechanism for an adaptive implementation, and ii) the measure of power and area overheads that are introduced by the adaptive control modules. This thesis outlines four main achievements to address these issues. The first achievement is the development of the switching activity measurement (SWAM) method to detect different signal components based upon the observation of the output of an ADC. The second achievement is a proposed adaptive algorithm for ReADCs to dynamically adjust the resolution depending upon the variations in the input signal. The third achievement is an ASIC implementation of the adaptive control module for ReADCs. The module achieves low reconfiguration overheads in terms of area and power compared with the main analog part of a ReADC. The fourth achievement is the development of a low-power noise detection module using a conventional ADC for signal improvement. Taken together, the findings from this study demonstrate the potential use of switching activity information of an ADC to adaptively control the circuits, and simultaneously expanding the functionality of the ADC in electronic systems

    Proposal and development of a highly modular and scalable self-adaptive hardware architecture with parallel processing capability

    Get PDF
    This dissertation describes a novel unconventional self-adaptive hardware architecture with capacity for parallel processing. For scalability issues, this bioinspired architecture is based on a regular array of homogeneous cells. The proposed programmable architecture implements in a distributed way self-adaptive capabilities including self-placement and self-routing which, due to its intrinsic design, enable the development of systems with runtime reconfiguration, self-repair and/or fault tolerance capabilities. The physical implementation of this architecture is composed of two-layers, interconnected cells in the first level and interconnected switch and pin matrices in the second level. The cell is the basic element of the proposed self-adaptive architecture. Any application scheduled to the system has to be organized in components, where each component is composed by one or more interconnected cells. The interconnection of cells inside a component is made at cell level (first layer), while the physical interconnections of components are made in the second layer. Additionally, two layers are defined as conceptual organization for the implementation of general purpose applications: the SANE and the SANE assembly. The Self-Adaptive Networked Entity (SANE) is composed by a group of components. This is the basic self-adaptive computing system. It has the ability to monitor its local environment and its internal computation process. The SANE-Assembly (SANE-ASM) is composed by a group of interconnected SANEs. The processing capabilities of the cell are included in its Functional Unit (FU), which can be described as a four-core configurable multicomputer. The FU includes twelve programmable configuration modes, i.e., each cell permits to select from one to four processors working in parallel, with different size of program and data memories. The self-adaptive capabilities of the cell are executed mainly by the Cell Configuration Unit (CCU). The self-placement algorithm is responsible for finding out the most suitable position in the cell array to insert the new cell of a component. The self-routing algorithm permits interconnecting the ports of the FU of two cells through the cell ports. The self-placement and self-routing processes allow for performing complex functionality changes in real time, these processes endow the system with enhanced functionality, enabling the system to change itself, this allows for the implementation of run-time self-configuration, without the need for any configuration manager. The architecture proposed includes two mechanisms of fault tolerance. One of these is the Dynamic Fault Tolerance Scaling Technique, that has the ability to create and eliminate the redundant copies of the functional section of a specific application. The other mechanism of fault tolerance is a dedicated or static Fault Tolerance System. It provides redundant processing capabilities that are working continuously. When a failure in the execution of a program is detected, the processors of the cell are stopped and the self-elimination and self-replication processes start for the cell (or cells) involved in the failure. An FPGA-based prototype and a software tool have been built for demonstration purposes. The prototype includes all the self-adaptive capabilities described in this dissertation. With the purpose of having a complete development system, the software tool SANE Project Developer (SPD) has been implemented. The SPD is an Integrated Development Environment (IDE) that allows generating the memory initialization data for the control microprocessor inside the prototype.Esta tesis doctoral describe una arquitectura de hardware auto-adaptable novedosa y no convencional con capacidad de procesamiento en paralelo. Por razones de escalabilidad, esta arquitectura bioinspirada está basada en una matriz regular de células homogéneas. La arquitectura propuesta es programable, e implementa de manera distribuida diversas capacidades auto-adaptables incluyendo el auto-emplazamiento y auto-enrutamiento, los cuales debido a su diseño intrínseco, permiten el desarrollo de sistemas reconfigurables en tiempo de ejecución, así como de sistemas autoreparables y/o con capacidades de tolerancia a fallos. La implementación física de esta arquitectura esta compuesta de dos capas, que incluyen células interconectadas en el primer nivel y matrices de conmutación y pines en el segundo nivel. La célula es el elemento básico de la arquitectura propuesta. Cualquier aplicación que se quiera programar en el sistema debe estar organizada en componentes, donde cada componente está compuesto por una o más células interconectadas. La interconexión de células dentro de un componente es realizado en el mismo nivel de la matriz de células, mientras que la interconexión de componentes es realizada en la segunda capa. Adicionalmente, se definen dos capas conceptuales que son usadas con propósitos organizativos en aplicaciones de propósito general, estas son: el SANE y el SANE-assembly (o conjunto de SANEs). La entidad auto-adaptable interconectada o SANE está compuesta por un grupo de componentes. Este es el sistema de computación auto-adaptable básico, el cual tiene la habilidad de monitorizar su entorno local y su proceso de computación interno. Las capacidades de procesamiento de la célula están incluidas en su unidad funcional (FU). Esta puede ser definida como un multicomputador configurable con cuatro núcleos, los cuales son agrupados o no dependiendo del modo de configuración. La FU tiene doce modos de configuración programables, por lo que cada célula permite seleccionar entre uno y cuatro procesadores trabajando en paralelo con diversas capacidades en las memorias de programa y datos. Las capacidades auto-adaptables de la célula son ejecutadas principalmente por la unidad de configuración de la célula (CCU). El algoritmo de auto-emplazamiento es el encargado de encontrar la posición mas adecuada dentro de la matriz de células para insertar la nueva célula de un componente. El algoritmo de auto-enrutamiento permite interconectar los puertos de las FU de dos células. Los procesos de auto-emplazamiento y auto-enrutamiento permiten realizar en tiempo real cambios funcionales complejos; estos procesos dotan al sistema de una mayor funcionalidad, permitiendo que el sistema cambie por si mismo, lo que permite la implementación de la auto-configuración en tiempo real, sin la necesidad de ningún gestor de configuración. La arquitectura propuesta incluye dos mecanismos de tolerancia a fallos. Uno de estos es una técnica escalonada y dinámica de tolerancia a fallos, que tiene la habilidad de crear y eliminar copias redundantes de la unidad funcional (o de cómputo) de una aplicación específica. El otro mecanismo de tolerancia a fallos es el Sistema de Tolerancia a Fallos dedicado o estático. Este provee capacidades de procesamiento redundante que están en funcionamiento continuamente. Cuando un fallo en la ejecución de un programa es detectado, los procesadores de la célula son detenidos y los procesos de auto-eliminación y auto-replicación se inician para la célula (o células) implicada en el fallo. Se desarrolló un prototipo basado en FPGAs y una herramienta de software para comprobar la funcionalidad del sistema. El prototipo incluye todas las características de los sistemas auto-adaptable descritas en este trabajo. El SANE Project developer (SPD) es un ambiente integrado de desarrollo (IDE) que permite generar y descargar la memoria de inicialización de datos para el Microprocesador de Control dentro del prototipo
    corecore