499 research outputs found

    Automatic synthesis of application-specific processors

    Get PDF
    Thesis (D. Tech. (Engineering: Electrical)) -- Central University of technology, Free State, 2012This thesis describes a method for the automatic generation of appli- cation speci_c processors. The thesis was organized into three sepa- rate but interrelated studies, which together provide: a justi_cation for the method used, a theory that supports the method, and a soft- ware application that realizes the method. The _rst study looked at how modern day microprocessors utilize their hardware resources and it proposed a metric, called core density, for measuring the utilization rate. The core density is a function of the microprocessor's instruction set and the application scheduled to run on that microprocessor. This study concluded that modern day microprocessors use their resources very ine_ciently and proposed the use of subset processors to exe- cute the same applications more e_ciently. The second study sought to provide a theoretical framework for the use of subset processors by developing a generic formal model of computer architecture. To demonstrate the model's versatility, it was used to describe a number of computer architecture components and entire computing systems. The third study describes the development of a set of software tools that enable the automatic generation of application speci_c proces- sors. The FiT toolkit automatically generates a unique Hardware Description Language (HDL) description of a processor based on an application binary _le and a parameterizable template of a generic mi- croprocessor. Area-optimized and performance-optimized custom soft processors were generated using the FiT toolkit and the utilization of the hardware resources by the custom soft processors was character- ized. The FiT toolkit was combined with an ANSI C compiler and a third-party tool for programming _eld-programmable gate arrays (FPGAs) to create an unconstrained C-to-silicon compiler

    Study and development of a software implemented fault injection plug-in for the Xception tool/powerPC 750

    Get PDF
    EstĂĄgio realizado na Critical Software e orientado pelo Eng.Âș Ricardo BarbosaTese de mestrado integrado. Engenharia InformĂĄtica e Computação. Faculdade de Engenharia. Universidade do Porto. 200

    A Multi-layer Fpga Framework Supporting Autonomous Runtime Partial Reconfiguration

    Get PDF
    Partial reconfiguration is a unique capability provided by several Field Programmable Gate Array (FPGA) vendors recently, which involves altering part of the programmed design within an SRAM-based FPGA at run-time. In this dissertation, a Multilayer Runtime Reconfiguration Architecture (MRRA) is developed, evaluated, and refined for Autonomous Runtime Partial Reconfiguration of FPGA devices. Under the proposed MRRA paradigm, FPGA configurations can be manipulated at runtime using on-chip resources. Operations are partitioned into Logic, Translation, and Reconfiguration layers along with a standardized set of Application Programming Interfaces (APIs). At each level, resource details are encapsulated and managed for efficiency and portability during operation. An MRRA mapping theory is developed to link the general logic function and area allocation information to the device related physical configuration level data by using mathematical data structure and physical constraints. In certain scenarios, configuration bit stream data can be read and modified directly for fast operations, relying on the use of similar logic functions and common interconnection resources for communication. A corresponding logic control flow is also developed to make the entire process autonomous. Several prototype MRRA systems are developed on a Xilinx Virtex II Pro platform. The Virtex II Pro on-chip PowerPC core and block RAM are employed to manage control operations while multiple physical interfaces establish and supplement autonomous reconfiguration capabilities. Area, speed and power optimization techniques are developed based on the developed Xilinx prototype. Evaluations and analysis of these prototype and techniques are performed on a number of benchmark and hashing algorithm case studies. The results indicate that based on a variety of test benches, up to 70% reduction in the resource utilization, up to 50% improvement in power consumption, and up to 10 times increase in run-time performance are achieved using the developed architecture and approaches compared with Xilinx baseline reconfiguration flow. Finally, a Genetic Algorithm (GA) for a FPGA fault tolerance case study is evaluated as a ultimate high-level application running on this architecture. It demonstrated that this is a hardware and software infrastructure that enables an FPGA to dynamically reconfigure itself efficiently under the control of a soft microprocessor core that is instantiated within the FPGA fabric. Such a system contributes to the observed benefits of intelligent control, fast reconfiguration, and low overhead

    Using Relocatable Bitstreams for Fault Tolerance

    Get PDF
    This research develops a method for relocating reconfigurable modules on the Virtex-II (Pro) family of Field Programmable Gate Arrays (FPGAs). A bitstream translation program is developed which correctly changes the location of a partial bitstream that implements a module on the FPGA. To take advantage of relocatable modules, three fault-tolerance circuit designs are developed and tested. This circuit can operate through a fault by efficiently removing the faulty module and replacing it with a relocated module without faults. The FPGA can recover from faults at a known location, without the need for external intervention using an embedded fault recovery system. The recovery system uses an internal PowerPC to relocate the modules and reprogram the FPGA. Due to the limited architecture of the target FPGA and Xilinx tool errors, an FPGA with automatic fault recovery could not be demonstrated. However, the various components needed to do this type of recovery have been implemented and demonstrated individually

    An architecture and technology for Ambient Intelligence Node

    Get PDF
    The era of separate networks is over. The existing technology leaders are preparing a big change in recreation of environment around us. There are several faces for this change. Names like Ambient Intelligence, Ambient Network, IP Multimedia Subsystem and others were created all over the Globe. Regardless of which name is used the new network will combine three main functional principles---it will be: contextual aware, ubiquitous access and intelligent interfaces unified network. Within this thesis two major aspects are defined. First, the definition of the Ambient Intelligence Environment concept is presented. Secondly the architecture vectors for the technology are named. A short overview of the existing technology is followed by details for the chosen technology---FPGA. The overall specifications are incorporated in the design and demonstration of a basic Ambient Intelligence Node created in the System on the Chip (SoC) FPGA technology

    Efficient Implementation on Low-Cost SoC-FPGAs of TLSv1.2 Protocol with ECC_AES Support for Secure IoT Coordinators

    Get PDF
    Security management for IoT applications is a critical research field, especially when taking into account the performance variation over the very different IoT devices. In this paper, we present high-performance client/server coordinators on low-cost SoC-FPGA devices for secure IoT data collection. Security is ensured by using the Transport Layer Security (TLS) protocol based on the TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256 cipher suite. The hardware architecture of the proposed coordinators is based on SW/HW co-design, implementing within the hardware accelerator core Elliptic Curve Scalar Multiplication (ECSM), which is the core operation of Elliptic Curve Cryptosystems (ECC). Meanwhile, the control of the overall TLS scheme is performed in software by an ARM Cortex-A9 microprocessor. In fact, the implementation of the ECC accelerator core around an ARM microprocessor allows not only the improvement of ECSM execution but also the performance enhancement of the overall cryptosystem. The integration of the ARM processor enables to exploit the possibility of embedded Linux features for high system flexibility. As a result, the proposed ECC accelerator requires limited area, with only 3395 LUTs on the Zynq device used to perform high-speed, 233-bit ECSMs in 413 ”s, with a 50 MHz clock. Moreover, the generation of a 384-bit TLS handshake secret key between client and server coordinators requires 67.5 ms on a low cost Zynq 7Z007S device

    An efficient graph representation for arithmetic circuit verification

    Full text link

    Software-Based Self-Test of Set-Associative Cache Memories

    Get PDF
    Embedded microprocessor cache memories suffer from limited observability and controllability creating problems during in-system tests. This paper presents a procedure to transform traditional march tests into software-based self-test programs for set-associative cache memories with LRU replacement. Among all the different cache blocks in a microprocessor, testing instruction caches represents a major challenge due to limitations in two areas: 1) test patterns which must be composed of valid instruction opcodes and 2) test result observability: the results can only be observed through the results of executed instructions. For these reasons, the proposed methodology will concentrate on the implementation of test programs for instruction caches. The main contribution of this work lies in the possibility of applying state-of-the-art memory test algorithms to embedded cache memories without introducing any hardware or performance overheads and guaranteeing the detection of typical faults arising in nanometer CMOS technologie
    • 

    corecore