
    MLCAD: A Survey of Research in Machine Learning for CAD (Keynote Paper)


    Cross-Layer Automated Hardware Design for Accuracy-Configurable Approximate Computing

    Approximate Computing trades off computation accuracy against performance or energy efficiency. It is a design paradigm that arose in the last decade as an answer to diminishing returns from Dennard's scaling and a shift in the prominent workloads. A range of modern workloads, categorized mainly as recognition, mining, and synthesis, features an inherent tolerance to approximations. Their characteristics, such as redundancies in their input data and robust-to-noise algorithms, allow them to produce outputs of acceptable quality despite an approximation in some of their computations. Approximate Computing leverages this application tolerance by relaxing the exactness in computation towards the primary design goals of increasing performance or improving energy efficiency. Existing techniques span the abstraction layers of computer systems, where cross-layer techniques are shown to offer a larger design space and yield higher savings. Currently, the majority of the existing work aims at meeting a single accuracy target. The extent of approximation tolerance, however, varies significantly with changes in input characteristics and applications. In this dissertation, methods and implementations are presented for cross-layer and automated design of accuracy-configurable Approximate Computing to maximally exploit the performance and energy benefits. In particular, this dissertation addresses the following challenges and introduces novel contributions.

    A main Approximate Computing category in hardware is to scale either voltage or frequency beyond the safe limits for power or performance benefits, respectively. The rationale is that timing errors would be gradual and, for an initial range, tolerable. This scaling enables fine-grain accuracy-configurability by varying the timing error occurrence. However, conventional synthesis tools aim at meeting a single delay for all paths within the circuit. Subsequently, with voltage or frequency scaling, either all paths succeed, or a large number of paths fail simultaneously, with a steep increase in error rate and magnitude. This dissertation presents an automated method for minimizing path delays by individually constraining the primary outputs of combinational circuits. As a result, it reduces the number of failing paths and makes the timing errors significantly more gradual, as well as rarer and smaller on average. Additionally, it reveals that delays can be significantly reduced towards the least significant bit (LSB) and allows operating at a higher frequency when small operands are computed.

    Precision scaling, i.e., reducing the representation of data and its accuracy, is widely used in multiple abstraction layers of Approximate Computing. Reducing data precision also reduces transistor toggles, and therefore the dynamic power consumption. Application- and architecture-level precision scaling results in using only the LSBs of the circuit. Arithmetic circuits often have less complexity and logic depth in the LSBs compared to the most significant bits (MSBs). To take advantage of this circuit property, a delay-altering synthesis methodology is proposed. The method finds energy-optimal delay values under configurable precision usage and assigns them to the primary outputs used for different precisions. Thereby, it enables dynamic frequency-precision scalable circuits for energy efficiency.

    Within the hardware architecture, it is possible to instantiate multiple units with the same functionality but different fixed approximation levels, where each block benefits from having fewer transistors and also from synthesis relaxations. These blocks can be selected dynamically and thus allow configuring the accuracy at runtime. Instantiating such approximate blocks can be an alternative to current state-of-the-art gating mechanisms, which switch off groups of paths in the circuit to reduce toggling activity, offering lower dynamic power at the cost of higher area and leakage. Jointly, instantiating multiple blocks and applying gating mechanisms produces a large design space of accuracy-configurable hardware, where energy-optimal solutions require a cross-layer search at the architecture and circuit levels. To that end, an approximate hardware synthesis methodology is proposed with joint optimizations at the architecture and circuit levels for dynamic accuracy scaling, thereby enabling energy vs. area trade-offs.
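    The precision-frequency argument above (paths toward the LSBs are shorter, so low-precision operation can tolerate a faster clock and toggles fewer bits) can be illustrated with a toy model. The sketch below is not the dissertation's synthesis method; the linear per-output delay model, the gate delay, and the energy-per-bit figure are invented placeholders used only to show the shape of the trade-off.

```python
# Illustrative sketch (not the dissertation's tool): models the idea that an
# arithmetic circuit's critical path grows toward the MSBs, so a circuit used
# at reduced precision (LSBs only) can run at a higher clock and toggle fewer
# bits. All delay and energy numbers are made-up placeholders.

def output_delay_ns(bit_position, gate_delay_ns=0.05):
    """Assumed delay model: a ripple-carry-like chain where the delay of
    output bit i grows roughly linearly with i."""
    return (bit_position + 1) * gate_delay_ns

def max_frequency_mhz(precision_bits):
    """Highest clock at which all outputs up to the requested precision
    still meet timing, under the assumed delay model."""
    critical = output_delay_ns(precision_bits - 1)
    return 1000.0 / critical  # ns -> MHz

def dynamic_energy_pj(precision_bits, energy_per_bit_pj=0.8):
    """Toy estimate: dynamic energy grows with the number of active bits,
    mirroring the reduced-toggling argument for precision scaling."""
    return precision_bits * energy_per_bit_pj

if __name__ == "__main__":
    for bits in (8, 16, 32):
        print(f"{bits:2d}-bit precision: "
              f"f_max ~ {max_frequency_mhz(bits):7.1f} MHz, "
              f"energy/op ~ {dynamic_energy_pj(bits):5.1f} pJ")
```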

    Embedded electronic systems driven by run-time reconfigurable hardware

    This doctoral thesis addresses the design of embedded electronic systems based on run-time reconfigurable hardware technology (available through SRAM-based FPGA/SoC devices), aimed at contributing to enhancing people's quality of life. The work investigates the system architecture and the reconfiguration engine that give the FPGA the capability of dynamic partial reconfiguration, so that a given application, partitioned into processing tasks that are multiplexed in time and space, can be synthesized through hardware/software co-design. This optimizes its physical implementation (silicon area, processing time, complexity, flexibility, functional density, cost, and power consumption) in comparison with alternatives based on static hardware (MCU, DSP, GPU, ASSP, ASIC, etc.). The design flow of this technology is evaluated by prototyping several engineering applications (control systems, mathematical coprocessors, complex image processors, etc.), showing a level of maturity sufficient for industrial exploitation.
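    To make the area/time trade-off behind run-time reconfiguration concrete, the toy calculation below compares a static design, which keeps every task circuit resident, against a time-multiplexed design that reuses one region through dynamic partial reconfiguration. It is a sketch under assumed numbers (task area, execution time, reconfiguration time), not a result from the thesis; "functional density" here simply means useful operations per (area x time).

```python
# Toy comparison, not from the thesis: a static design keeps all task circuits
# resident even though mutually exclusive tasks run one after another, while a
# run-time reconfigurable design reuses one region and pays reconfiguration
# time per task swap. All numbers are placeholders.

def static_design(n_tasks, area_per_task, exec_time):
    """All circuits resident at once; mutually exclusive tasks still serialize."""
    return n_tasks * area_per_task, n_tasks * exec_time

def reconfigurable_design(n_tasks, area_per_task, exec_time, reconf_time):
    """One shared region, reused via partial reconfiguration between tasks."""
    return area_per_task, n_tasks * (exec_time + reconf_time)

def functional_density(n_ops, area, time):
    """Useful operations per (area x time); higher is better."""
    return n_ops / (area * time)

if __name__ == "__main__":
    n, area, t_exec, ops = 4, 1000.0, 10e-3, 1e6   # assumed values
    for label, (a, t) in (
        ("static", static_design(n, area, t_exec)),
        ("reconfigurable, fast swap (4 ms)", reconfigurable_design(n, area, t_exec, 4e-3)),
        ("reconfigurable, slow swap (40 ms)", reconfigurable_design(n, area, t_exec, 40e-3)),
    ):
        print(f"{label:34s} density = {functional_density(ops, a, t):10.1f}")
    # Reconfiguration pays off only while swap time stays small versus task time.
```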

    RTRLib: a high-level modeling tool for dynamically partially reconfigurable systems

    Master's dissertation, Universidade de Brasília, Faculdade de Tecnologia, Departamento de Engenharia Mecânica, 2020.
    Partial dynamic reconfiguration is considered an interesting technique to increase flexibility in FPGA designs, since hardware modules can be replaced dynamically while the remainder of the circuit stays in operation. It is used in systems with hard requirements such as adaptability, robustness, power consumption, cost, and fault tolerance. However, the complexity of developing dynamically partially reconfigurable systems is considerably higher than that of static designs. Therefore, new design methodologies and tools are required to reduce the design complexity of such systems. In this context, this work presents RTRLib, a high-level modeling tool for the development of dynamically reconfigurable systems on Xilinx Zynq devices through a simple system specification and the parametrization of a few blocks. Under specific conditions, RTRLib automatically generates the hardware and software scripts to implement the solution using Vivado and the SDK. These scripts comprise the sequential design steps from hardware project creation to boot image elaboration. Since RTRLib is composed of pre-characterized IP cores, the tool can also be used to analyze the system during the design process through early estimation of essential system characteristics such as resource consumption and latency. The present work also includes new functionality implemented in RTRLib for both hardware and software design: generalization of the hardware script, IO mapping, floorplanning through a GUI, a software script generator, a generator of standalone application templates that use the partial reconfiguration controller (PRC), and a library for FreeRTOS applications. Finally, four case studies were implemented to demonstrate the tool's capabilities: a system for terrain classification based on neural networks, a linear regressor system used to control a myokinetic prosthetic hand, and a hypothetical application with real-time requirements.
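    The early-estimation capability described above can be pictured as summing pre-characterized block figures against a device budget. The sketch below is a minimal, hypothetical model of that idea: the block library, resource counts, latencies, and the Zynq-like budget are invented values, and the actual RTRLib flow (Vivado/SDK script generation) is not reproduced here.

```python
# Hypothetical sketch of model-level estimation from pre-characterized blocks.
# All names and numbers below are placeholders, not RTRLib's real library.

from dataclasses import dataclass

@dataclass
class BlockSpec:
    name: str
    luts: int
    ffs: int
    brams: int
    latency_cycles: int

LIBRARY = {
    "fir_filter": BlockSpec("fir_filter", 1200, 900, 2, 64),
    "fft_256":    BlockSpec("fft_256",    4500, 3800, 8, 310),
    "classifier": BlockSpec("classifier", 2100, 1700, 4, 128),
}

DEVICE_BUDGET = {"luts": 53200, "ffs": 106400, "brams": 140}  # assumed Zynq-7020-like budget

def estimate(pipeline):
    """Aggregate resources and latency of a chain of library blocks and check
    whether the chain fits the assumed device budget."""
    blocks = [LIBRARY[b] for b in pipeline]
    totals = {
        "luts": sum(b.luts for b in blocks),
        "ffs": sum(b.ffs for b in blocks),
        "brams": sum(b.brams for b in blocks),
        "latency_cycles": sum(b.latency_cycles for b in blocks),
    }
    fits = all(totals[k] <= DEVICE_BUDGET[k] for k in DEVICE_BUDGET)
    return totals, fits

if __name__ == "__main__":
    totals, fits = estimate(["fir_filter", "fft_256", "classifier"])
    print(totals, "fits on device:", fits)
```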

    Dependable Embedded Systems

    This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. It presents the most prominent reliability concerns from today's point of view and roughly recapitulates the progress made by the community so far. Unlike other books that focus on a single abstraction level, such as the circuit level or the system level alone, this book addresses reliability challenges across different levels, starting from the physical level all the way up to the system level (cross-layer approaches). The book aims at demonstrating how new hardware/software co-design solutions can be proposed to effectively mitigate reliability degradation such as transistor aging, process variation, temperature effects, and soft errors. It provides readers with the latest insights into novel, cross-layer methods and models with respect to the dependability of embedded systems; describes cross-layer approaches that can leverage reliability through techniques that are proactively designed with respect to techniques at other layers; and explains run-time adaptation and concepts and means of self-organization, in order to achieve error resiliency in complex, future many-core systems.

    Cost-Efficient Soft-Error Resiliency for ASIP-based Embedded Systems

    Recent decades have witnessed the rapid growth of embedded systems. At present, embedded systems are widely applied in a broad range of critical applications, including automotive electronics, telecommunication, healthcare, industrial electronics, consumer electronics, and military and aerospace systems. Human society will continue to be greatly transformed by the pervasive deployment of embedded systems. Consequently, substantial effort from both industry and academia has gone into the research and development of embedded systems. The application-specific instruction-set processor (ASIP) is one of the key advances in embedded processor technology and a crucial component in some embedded systems. Soft errors have been directly observed since the 1970s. As devices scale, the integration of computing systems increases exponentially, which leads to a corresponding decrease in their reliability. Today, major research forums state that soft errors are one of the major design technology challenges at and beyond the 22 nm technology node. Therefore, a large number of soft-error solutions, including error detection and recovery, have been proposed from differing perspectives. Nonetheless, most of the existing solutions are designed for general-purpose or high-performance systems, which differ from embedded systems. For embedded systems, soft-error solutions must be cost-efficient, which requires tailoring the processor architecture to the features of the target application. This thesis embodies a series of explorations of cost-efficient soft-error solutions for ASIP-based embedded systems, in which five major solutions are proposed.
    The first solution realizes checkpoint recovery in ASIPs. By generating customized instructions, ASIP-implemented checkpoint recovery can operate at a finer granularity than was previously possible. The fault-free performance overhead of this solution is only 1.45% on average, the recovery delay is only 62 cycles in the worst case, and the area and leakage power overheads are 44.4% and 45.6% on average. The second solution explores using two primitive error recovery techniques jointly and includes three application-specific optimization methodologies. It generates optimized error-resilient ASIPs based on the characteristics of the primitive error recovery techniques, static reliability analysis, and design constraints. The resultant ASIP can be configured at runtime according to the optimized recovery scheme, strategically enhancing the cost-efficiency of error recovery. In order to guarantee cost-efficiency in unpredictable runtime situations, the third solution explores runtime adaptation for error recovery. It budgets and adapts the error recovery operations so as to spend resources intelligently and to tolerate the adverse influence of runtime variations. The resultant ASIP makes runtime decisions to determine the activation of spatial and temporal redundancies according to the runtime situation. In the best case, this solution achieves an almost 50x reliability gain over state-of-the-art solutions. Given the increasing demand for multi-core computing systems, the last two solutions target error recovery in multi-core ASIPs. The first of these explores ASIP-implemented fine-grained process migration, a key infrastructure that enables cost-efficient task management for soft-error recovery in multi-core ASIPs; the average cost of a process migration is only 289 machine cycles. The last solution explores dynamic and adaptive mapping to assign heterogeneous recovery operations to tasks in the multi-core context. It allows each individual ASIP-based processing core to dynamically adapt its error recovery functionality to the corresponding task's characteristics, in terms of soft-error vulnerability and execution time deadline. This solution improves the reliability of the system by almost two times, with a graceful constraint penalty, in comparison to state-of-the-art counterparts.
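    The dynamic mapping idea in the last solution (pick a recovery mechanism per task from its soft-error vulnerability and its deadline slack) can be sketched as a small policy function. The overhead figures, thresholds, and the policy itself below are illustrative assumptions, not the dissertation's algorithm.

```python
# Hedged sketch: map a recovery mechanism to each task from its soft-error
# vulnerability and its deadline slack. Costs and thresholds are assumptions.

# Assumed relative time overheads of each recovery option.
OVERHEAD = {"none": 0.00, "checkpoint": 0.05, "spatial_redundancy": 0.40}

# Assumed minimum vulnerability that makes each option worthwhile.
THRESHOLD = {"spatial_redundancy": 0.7, "checkpoint": 0.2, "none": 0.0}

def pick_recovery(vulnerability, slack_ratio):
    """vulnerability in [0,1]; slack_ratio = (deadline - exec_time) / exec_time.
    Prefer the strongest recovery mode that both fits the slack and is
    justified by the task's vulnerability."""
    for mode in ("spatial_redundancy", "checkpoint", "none"):
        affordable = OVERHEAD[mode] <= slack_ratio
        worthwhile = vulnerability >= THRESHOLD[mode]
        if affordable and worthwhile:
            return mode
    return "none"

if __name__ == "__main__":
    tasks = {"fft": (0.85, 0.6), "uart_log": (0.10, 0.5), "control_loop": (0.75, 0.1)}
    for name, (vuln, slack) in tasks.items():
        print(f"{name:12s} -> {pick_recovery(vuln, slack)}")
```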

    An Adaptive Modular Redundancy Technique to Self-regulate Availability, Area, and Energy Consumption in Mission-critical Applications

    As reconfigurable devices' capacities and the complexity of applications that use them increase, the need for self-reliance of deployed systems becomes increasingly prominent. A Sustainable Modular Adaptive Redundancy Technique (SMART) composed of a dual-layered organic system is proposed, analyzed, implemented, and experimentally evaluated. SMART relies upon a variety of self-regulating properties to control availability, energy consumption, and area used in dynamically changing environments that require a high degree of adaptation. The hardware layer is implemented on a Xilinx Virtex-4 Field Programmable Gate Array (FPGA) to provide self-repair using a novel approach called a Reconfigurable Adaptive Redundancy System (RARS). The software layer supervises the organic activities within the FPGA and extends the self-healing capabilities through application-independent, intrinsic, evolutionary repair techniques that leverage the benefits of dynamic Partial Reconfiguration (PR). A SMART prototype is evaluated using a Sobel edge detection application and is shown to provide sustainability under stressful transient and permanent fault injection procedures while still reducing energy consumption and area requirements. An Organic Genetic Algorithm (OGA) technique is shown to be capable of consistently repairing hard faults while maintaining correct edge detector outputs, by exploiting spatial redundancy in the reconfigurable hardware. A Monte Carlo driven Continuous-Time Markov Chain (CTMC) simulation is conducted to compare SMART's availability to industry-standard Triple Modular Redundancy (TMR) techniques. Based on nine use cases, parameterized with realistic fault and repair rates acquired from publicly available sources, the results indicate that availability is significantly enhanced by the adoption of fast repair techniques targeting aging-related hard faults. Under harsh environments, SMART is shown to improve system availability from 36.02% with lengthy repair techniques to 98.84% with fast ones. This value increases to five nines (99.9998%) under relatively more favorable conditions. Lastly, SMART is compared to twenty-eight standard TMR benchmarks generated by the widely accepted BL-TMR tools. The results show that in seven out of nine use cases, SMART is the recommended technique, with power savings ranging from 22% to 29% and area savings ranging from 17% to 24%, while still maintaining the same level of availability.
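    The availability gap reported above (lengthy versus fast repair) follows the behavior of a simple two-state up/down Markov model, whose steady-state availability is MTTF / (MTTF + MTTR). The sketch below uses placeholder fault and repair times, not the rates from the dissertation's CTMC simulations, to show why repair speed dominates availability.

```python
# Back-of-the-envelope illustration of why repair speed dominates availability.
# For a single two-state (up/down) continuous-time Markov model, steady-state
# availability is MTTF / (MTTF + MTTR). The numbers below are placeholders.

def availability(mttf_hours, mttr_hours):
    """Steady-state availability of an up/down model."""
    return mttf_hours / (mttf_hours + mttr_hours)

if __name__ == "__main__":
    mttf = 100.0  # assumed mean time to failure in a harsh environment
    for label, mttr in (("slow (manual) repair", 50.0),
                        ("fast (partial-reconfiguration) repair", 0.05)):
        a = availability(mttf, mttr)
        print(f"{label:40s} A = {a:.6f} ({a * 100:.4f}%)")
```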

    Computer Security by Hardware-Intrinsic Authentication (Segurança de computadores por meio de autenticação intrínseca de hardware)

    Advisors: Guido Costa Souza de Araújo, Mario Lúcio Côrtes, and Diego de Freitas Aranha. Doctoral thesis, Universidade Estadual de Campinas, Instituto de Computação.
    This work presents Computer Security by Hardware-Intrinsic Authentication (CSHIA), a secure computer architecture for embedded systems that aims at providing authenticity and integrity for code and data. The work encompassed three phases: Design, Implementation, and Security Evaluation. In the design phase, we laid out the basic ideas behind CSHIA, namely how integrity and authenticity are enforced through the use of Physical Unclonable Functions (PUFs), and we proposed an algorithm to extract cryptographic keys from the intrinsic memories of processors. In the implementation phase, we made CSHIA's design more flexible, allowing different configurations without compromising security. We then evaluated CSHIA's performance and overheads, such as area, energy, and memory, for multiple configurations. Finally, we evaluated the security of PUFs, which led us to develop a new side-channel attack that enabled us to circumvent the PUFs' uniqueness property through their architectural elements.
    Doutorado em Ciência da Computação. Funding: FAPESP grants 2015/06829-2 and 2016/25532-3; CNPq grant 147614/2014-7.
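    The thesis's key-extraction algorithm is not detailed in this abstract, so the sketch below only illustrates the generic idea behind memory-based PUF keying: stabilize noisy power-up values by temporal majority voting, then hash the fingerprint into key material. The bias model, noise level, and majority-vote scheme are assumptions for illustration, not the CSHIA extraction algorithm.

```python
# Generic illustration of deriving a device fingerprint from noisy SRAM-style
# power-up values by temporal majority voting, then hashing it into a key.
# NOT the CSHIA algorithm; the bias/noise model is a simulation placeholder.

import hashlib
import random

def read_powerup(bias, noise, rng):
    """Simulate one power-up readout: each cell follows its device-unique
    tendency, flipped with probability `noise` (measurement noise)."""
    return [b if rng.random() > noise else 1 - b for b in bias]

def majority_vote(readouts):
    """Per-bit majority over several power-ups to suppress noisy cells."""
    half = len(readouts) / 2
    return [1 if sum(col) > half else 0 for col in zip(*readouts)]

def derive_key(bits):
    """Hash the stabilized fingerprint into fixed-size key material."""
    return hashlib.sha256(bytes(bits)).hexdigest()

if __name__ == "__main__":
    rng = random.Random(1)
    bias = [rng.randint(0, 1) for _ in range(256)]            # device-unique cells
    readouts = [read_powerup(bias, 0.05, rng) for _ in range(15)]
    print("key:", derive_key(majority_vote(readouts)))
```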

    Implementation of an AMIDAR-based Java Processor

    This thesis presents a Java processor based on the Adaptive Microinstruction Driven Architecture (AMIDAR). This processor is intended as a research platform for investigating adaptive processor architectures. Combined with a configurable accelerator, it is able to detect and speed up hot spots of arbitrary applications dynamically. In contrast to classical RISC processors, an AMIDAR-based processor consists of four main types of components: a token machine, functional units (FUs), a token distribution network and an FU interconnect structure. The token machine is a specialized functional unit and controls the other FUs by means of tokens. These tokens are delivered to the FUs over the token distribution network. The tokens inform the FUs about what to do with input data and where to send the results. Data is exchanged among the FUs over the FU interconnect structure. Based on the virtual machine architecture defined by the Java bytecode, a total of six FUs have been developed for the Java processor, namely a frame stack, a heap manager, a thread scheduler, a debugger, an integer ALU and a floating-point unit. Using these FUs, the processor can already execute the SPEC JVM98 benchmark suite properly. This indicates that it can be employed to run a broad variety of applications rather than embedded software only. Besides bytecode execution, several enhanced features have also been implemented in the processor to improve its performance and usability. First, the processor includes an object cache using a novel cache index generation scheme that provides a better average hit rate than the classical XOR-based scheme. Second, a hardware garbage collector has been integrated into the heap manager, which greatly reduces the overhead caused by the garbage collection process. Third, thread scheduling has been realized in hardware as well, which allows it to be performed concurrently with the running application. Furthermore, a complete debugging framework has been developed for the processor, which provides powerful debugging functionalities at both the software and hardware levels.
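    The abstract contrasts the object cache's novel index generation with the classical XOR-based scheme; only that classical baseline is sketched below, since the novel scheme itself is not described here. The handle/offset values, field widths, and index size are arbitrary assumptions.

```python
# Sketch of the classical XOR-based cache index generation used as the baseline
# in the abstract's comparison: fold an object handle and a field offset into a
# small set index. The novel scheme from the thesis is not reproduced here.

def xor_based_index(object_handle, field_offset, index_bits=6):
    """Classical scheme: XOR-fold the handle with itself and the field offset,
    then keep the low `index_bits` bits as the cache set index."""
    mask = (1 << index_bits) - 1
    folded = object_handle ^ (object_handle >> index_bits) ^ field_offset
    return folded & mask

if __name__ == "__main__":
    for handle, offset in ((0x1A2B, 0), (0x1A2B, 4), (0x3C4D, 0)):
        print(f"handle=0x{handle:04X} offset={offset} -> set {xor_based_index(handle, offset)}")
```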