
    A time-to-digital converter for wideband time-based analog-to-digital converters

    Modern deeply scaled semiconductor processes make the design of voltage-domain circuits increasingly challenging. In contrast, the area and power consumption of digital circuits improve with every new process node. Consequently, digital solutions are designed in place of their purely analog counterparts in applications such as analog-to-digital (A/D) conversion. Time-based analog-to-digital converters (ADCs) employ digital-intensive architectures by processing analog quantities in the time domain. The quantization step of time-based A/D conversion is carried out by a time-to-digital converter (TDC). A free-running ring-oscillator-based TDC design is presented for use in wideband time-based ADCs. The proposed architecture aims to maximize time resolution and full-scale range, and to achieve error-resilient conversion performance with minimized power and area consumption. The time resolution is maximized by employing a high-frequency multipath ring oscillator, and the full-scale range is extended using a high-speed Gray counter. Error resilience is achieved by custom sense-amplifier-based sampling flip-flops, a Gray-coded counter, and a digital error-correction algorithm that corrects counter sampling errors. The implemented design achieves up to 9-bit effective resolution at 250 MS/s with 4.3 mW power consumption.
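
    As a reading aid (a minimal Python sketch, not the implemented RTL), the counter-related part of the error-correction scheme rests on a property of Gray codes: consecutive codewords differ in exactly one bit, so a sample taken while the free-running counter toggles decodes to one of the two neighboring counts, an error of at most one LSB.

        def binary_to_gray(n: int) -> int:
            # Consecutive outputs differ in exactly one bit.
            return n ^ (n >> 1)

        def gray_to_binary(g: int) -> int:
            # Invert the XOR chain: b = g ^ (g >> 1) ^ (g >> 2) ^ ...
            b = 0
            while g:
                b ^= g
                g >>= 1
            return b

        for n in range(255):
            g0, g1 = binary_to_gray(n), binary_to_gray(n + 1)
            assert bin(g0 ^ g1).count("1") == 1   # single-bit transition
            assert gray_to_binary(g0) == n

    A plain binary counter lacks this property: sampling it mid-transition (e.g. 0111 -> 1000) can yield a code far from either count, which is what the Gray coding and the digital correction algorithm guard against.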

    Radiation Hardened by Design Methodologies for Soft-Error Mitigated Digital Architectures

    Digital architectures for data encryption, processing, clock synthesis, data transfer, etc. are susceptible to radiation-induced soft errors due to charge collection in complementary metal oxide semiconductor (CMOS) integrated circuits (ICs). Radiation hardening by design (RHBD) techniques such as double modular redundancy (DMR) and triple modular redundancy (TMR) are used for error detection and correction, respectively, in such architectures. Multiple-node charge collection (MNCC) causes domain crossing errors (DCE), which can render the redundancy ineffectual. This dissertation describes techniques to ensure DCE mitigation with statistical confidence for various designs. Both sequential and combinational logic are separated using custom and computer-aided design (CAD) methodologies. Radiation vulnerability and design overhead are studied on VLSI sub-systems, including an advanced encryption standard (AES) design that is DCE-mitigated using module-level coarse separation on a 90-nm process with 99.999% DCE mitigation. A radiation-hardened microprocessor (HERMES2) is implemented in both 90-nm and 55-nm technologies with an interleaved separation methodology achieving 99.99% DCE mitigation, 4.9% increased cell density, 28.5% reduced routing, and 5.6% reduced power dissipation over the module-fences implementation. A DMR register file (RF) is implemented in a 55-nm process and used in the HERMES2 microprocessor. The custom-designed RF array and the APR-designed decoders are explored with a focus on design cycle time. Quality of results (QOR) is studied from a power, performance, area and reliability (PPAR) perspective to ascertain the improvement over other design techniques. A radiation-hardened all-digital multiplying pulsed digital delay line (DDL) is designed for double data rate (DDR2/3) applications, for data-eye centering during high-speed off-chip data transfer. The effects of noise, radiation particle strikes, and statistical variation on the designed DDL are studied in detail. The design achieves a best-in-class 22.4 ps peak-to-peak jitter over a 100-850 MHz range at 14 pJ/cycle energy consumption. The vulnerability of the non-hardened design is characterized, and portions of the redundant DDL are separated in custom and auto-place-and-route (APR) flows. Thus, a range of designs for mission-critical applications are implemented using the methodologies proposed in this work, and their potential PPAR benefits are explored in detail.
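
    As a reading aid (our illustrative Python sketch, not from the dissertation), the redundancy schemes reduce to simple voting logic: TMR corrects a single corrupted copy by bitwise majority, while DMR can only detect a mismatch.

        def tmr_vote(a: int, b: int, c: int) -> int:
            # Bitwise majority: any single upset copy is outvoted.
            return (a & b) | (b & c) | (a & c)

        def dmr_detect(a: int, b: int) -> bool:
            # DMR flags disagreement but cannot tell which copy is wrong.
            return a != b

        word = 0b1011_0010
        assert tmr_vote(word, word ^ 0b0000_0100, word) == word  # one copy hit, corrected

    A domain crossing error defeats this logic: if one particle strike corrupts two of the three copies at once, the majority votes for the wrong value, which is why the separation methodologies above keep redundant domains physically apart.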

    Novel Area-Efficient and Flexible Architectures for Optimal Ate Pairing on FPGA

    While FPGA is a suitable platform for implementing cryptographic algorithms, there are several challenges associated with implementing Optimal Ate pairing on FPGA, such as security, limited computing resources, and high power consumption. To overcome these issues, this study introduces three approaches that can execute the optimal Ate pairing on Barreto-Naehrig curves using Jacobian coordinates, with the goal of reaching 128-bit security on the Genesys board. The first approach is a pure software implementation utilizing the MicroBlaze processor. The second involves a combination of software and hardware, with key operations in $F_p$ and $F_{p^2}$ being transformed into IP cores for the MicroBlaze. The third approach builds on the second by incorporating parallelism to improve the pairing process. The utilization of multiple MicroBlaze processors within a single system offers both versatility and parallelism to speed up pairing calculations. A variety of methods and parameters are used to optimize the pairing computation, including Montgomery modular multiplication, the Karatsuba method, Jacobian coordinates, the complex squaring method, sparse multiplication, squaring in $G_{\phi_6}(F_{p^{12}})$, and the addition-chain method. The proposed systems are designed to efficiently utilize limited resources in restricted environments, while still completing tasks in a timely manner.
    Comment: 13 pages, 8 figures, and 5 tables
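
    The complex squaring method named above has a compact illustration (a Python sketch under the common assumption that $F_{p^2}$ is built as $F_p(i)$ with $i^2 = -1$, valid when $p \equiv 3 \pmod 4$; real BN primes are 254+ bits, not this toy value): it squares an element with two $F_p$ multiplications instead of three.

        p = 103  # toy prime with p % 4 == 3 (illustrative only)

        def fp2_square(a: int, b: int) -> tuple[int, int]:
            # (a + b*i)^2 = (a+b)(a-b) + (2ab)i when i^2 = -1:
            # two multiplications replace the schoolbook three.
            return ((a + b) * (a - b) % p, 2 * a * b % p)

        def fp2_square_schoolbook(a: int, b: int) -> tuple[int, int]:
            return ((a * a - b * b) % p, 2 * a * b % p)

        assert fp2_square(17, 42) == fp2_square_schoolbook(17, 42)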

    An Implementation of a Three Dimensional Computational Pipeline with Minimal Latency and Maximum Throughput for LU Factorization Using Field Programmable Gate Arrays

    Traditionally, computationally intense algebraic functions such as LU factorization are solved using complex systems such as supercomputers, parallel processing systems, and non-dedicated computing clusters. While these solutions are adequate for some problems, they typically suffer from classic parallel processing issues such as communication overhead, complex scheduling algorithms, and cost. Moreover, they are not feasible for embedded applications. Extremely high-performance solutions are sometimes implemented using costly, custom hardware such as Application Specific Integrated Circuits (ASICs). Unfortunately, the design, implementation, and verification of ASICs has become cost prohibitive, and such solutions are only feasible if the end design is to be manufactured in very high volumes. As a result, many proposed architectures to solve specific problems lie dormant because they are simply too expensive to realize. In recent years, advancements in Field Programmable Gate Array (FPGA) technology allow engineers to map complex algorithms to logic gates while achieving performance similar to ASIC technology. This thesis demonstrates the feasibility of implementing a three-dimensional pipeline that solves LU factorization using FPGAs, based on an architecture proposed nearly 10 years ago, when a technology to implement such an architecture either did not exist or was too costly to use.
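
    For context, the computation the pipeline implements is standard LU factorization; a minimal Doolittle-style software version (Python, without pivoting, purely to show the data dependencies the hardware pipeline must respect) is:

        def lu_factor(A):
            # Doolittle LU without pivoting: A = L * U, unit diagonal on L.
            n = len(A)
            L = [[float(i == j) for j in range(n)] for i in range(n)]
            U = [[0.0] * n for _ in range(n)]
            for i in range(n):
                for j in range(i, n):        # row i of U
                    U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
                for j in range(i + 1, n):    # column i of L
                    L[j][i] = (A[j][i]
                               - sum(L[j][k] * U[k][i] for k in range(i))) / U[i][i]
            return L, U

    Each inner sum is a multiply-accumulate chain, which is exactly the kind of operation that maps naturally onto a deep FPGA pipeline.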

    Dependable Embedded Systems

    This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. It introduces the most prominent reliability concerns from today's point of view and roughly recapitulates the progress in the community so far. Unlike other books that focus on a single abstraction level, such as the circuit level or the system level alone, the focus of this book is to deal with reliability challenges across different levels, starting from the physical level all the way to the system level (cross-layer approaches). The book aims at demonstrating how new hardware/software co-design solutions can be proposed to effectively mitigate reliability degradation such as transistor aging, process variation, temperature effects, soft errors, etc. It provides readers with the latest insights into novel, cross-layer methods and models with respect to the dependability of embedded systems; describes cross-layer approaches that can leverage reliability through techniques pro-actively designed with respect to techniques at other layers; and explains run-time adaptation and concepts/means of self-organization, in order to achieve error resiliency in complex, future many-core systems.

    Hardware Considerations for Signal Processing Systems: A Step Toward the Unconventional.

    As we progress into the future, signal processing algorithms are becoming more computationally intensive and power hungry, while the desire for mobile products and low-power devices is also increasing. An integrated ASIC solution is one of the primary ways chip developers can improve performance and add functionality while keeping the power budget low. This work discusses ASIC hardware for both conventional and unconventional signal processing systems, and how integration, error resilience, emerging devices, and new algorithms can be leveraged by signal processing systems to further improve performance and enable new applications. Specifically, this work presents three case studies: 1) a conventional and highly parallel mixed-signal cross-correlator ASIC for a weather satellite performing real-time synthetic aperture imaging, 2) an unconventional native stochastic computing architecture enabled by memristors, and 3) two unconventional sparse neural network ASICs for feature extraction and object classification. As improvements from technology scaling alone slow down, and the demand for energy-efficient mobile electronics increases, such optimization techniques at the device, circuit, and system level will become more critical to advancing signal processing capabilities in the future.
    PhD, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/116685/1/knagphil_1.pd
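
    The stochastic computing architecture in case study 2 rests on a simple encoding that is easy to sketch (Python, illustrative only; the actual work realizes it with memristors): a value in [0, 1] becomes the probability of a 1 in a random bitstream, so a single AND gate multiplies two independent streams.

        import random

        def to_stream(x: float, n: int = 100_000) -> list[bool]:
            # Encode x in [0, 1] as a bitstream with P(bit = 1) = x.
            return [random.random() < x for _ in range(n)]

        a, b = to_stream(0.6), to_stream(0.5)
        product = [x and y for x, y in zip(a, b)]   # one AND gate = multiplier
        print(sum(product) / len(product))          # ~0.30, within stochastic noise

    Precision grows only with stream length, so the approach trades latency for extremely cheap, error-tolerant arithmetic.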

    Approximate logic circuits: Theory and applications

    CMOS technology scaling, the process of shrinking transistor dimensions based on Moore's law, has been the thrust behind increasingly powerful integrated circuits for over half a century. As dimensions are scaled to a few tens of nanometers, process and environmental variations can significantly alter transistor characteristics, degrading reliability and reducing the performance gains of CMOS designs with technology scaling. Although the design solutions proposed in recent years to improve the reliability of CMOS designs are power-efficient, the performance penalty associated with them further reduces the performance gains from technology scaling, and hence these solutions are not well suited for high-performance designs. This thesis proposes approximate logic circuits as a new logic synthesis paradigm for reliable, high-performance computing systems. Given a specification, an approximate logic circuit is functionally equivalent to the given specification for a "significant" portion of the input space, but has smaller delay and power than a circuit implementation of the original specification. The contributions of this thesis include (i) a general theory of approximation and efficient algorithms for automated synthesis of approximations for unrestricted random logic circuits, (ii) logic design solutions based on approximate circuits to improve the reliability of designs with negligible performance penalty, and (iii) efficient decomposition algorithms based on approximate circuits to improve the performance of designs during logic synthesis. The thesis concludes with other potential applications of approximate circuits and identifies open problems in logic decomposition and approximate circuit synthesis.
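
    One well-known concrete instance of this idea (our example; the thesis develops a general synthesis theory rather than this specific adder) is the lower-part-OR adder: the low k bits are combined with plain OR gates, removing the carry chain from the critical path, while the high bits are added exactly.

        def loa_add(a: int, b: int, k: int, width: int = 16) -> int:
            # Lower-part-OR adder: approximate low k bits, exact high bits.
            mask = (1 << k) - 1
            low = (a | b) & mask                  # no carry chain in the low part
            high = ((a >> k) + (b >> k)) << k     # exact addition above bit k
            return (high | low) & ((1 << width) - 1)

        a, b = 0x1234, 0x0FF7
        print(hex(a + b), hex(loa_add(a, b, 4)))  # 0x222b vs 0x2227: small low-bit error

    The result matches the exact sum on a large fraction of inputs and errs only in the low bits otherwise, matching the "functionally equivalent for a significant portion of the input space" definition above.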

    Extensions of Task-based Runtime for High Performance Dense Linear Algebra Applications

    On the road to exascale computing, the gap between hardware peak performance and application performance is increasing as the system scale, chip density and inherent complexity of modern supercomputers expand. Even if we put aside the difficulty of expressing algorithmic parallelism and of efficiently executing applications at large scale, other open questions remain. The ever-growing scale of modern supercomputers induces a fast decline of the Mean Time To Failure. A generic, low-overhead, resilient extension becomes a desired aptitude for any programming paradigm. This dissertation addresses these two critical issues: designing an efficient unified linear algebra development environment using a task-based runtime, and extending a task-based runtime with fault-tolerant capabilities to build a generic framework providing both soft- and hard-error resilience to the task-based programming paradigm. To bridge the gap between hardware peak performance and application performance, a unified programming model is designed to take advantage of a lightweight task-based runtime to manage the resource-specific workload, and to control the data flow and parallel execution of tasks. Under this unified development, linear algebra tasks are abstracted across different underlying heterogeneous resources, including multicore CPUs, GPUs and Intel Xeon Phi coprocessors. Performance portability is guaranteed, and this programming model is adapted to a wide range of accelerators, supporting both shared- and distributed-memory environments. To address resilience on large-scale systems, fault-tolerant mechanisms are designed for a task-based runtime to protect applications against both soft and hard errors. For soft errors, three additions to a task-based runtime are explored. The first recovers the application by re-executing the minimum number of tasks, the second logs intermediary data between tasks to minimize the necessary re-execution, while the last one takes advantage of algorithmic properties to recover the data without re-execution. For hard errors, we propose two generic approaches, which augment the data-logging mechanism for soft errors. The first utilizes a non-volatile storage device to save logged data, while the second saves local logged data on a remote node to protect against node failure. Experimental results have confirmed that our soft- and hard-error fault-tolerant mechanisms exhibit the expected correctness and efficiency.
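
    A toy sketch of the first two soft-error additions (Python; the function names and the checksum-style detector are our assumptions for illustration, not the runtime's API) shows the idea: logging a task's inputs lets the runtime replay just that task instead of restarting the whole application.

        def run_with_reexecution(task, inputs, detect, retries=3):
            # Log the task's inputs, then re-execute on a detected soft
            # error instead of restarting the whole task graph.
            log = tuple(inputs)              # intermediary-data log
            for _ in range(retries):
                out = task(*log)
                if detect(out):              # e.g. a checksum over the output
                    return out
            raise RuntimeError("unrecoverable soft error")

        result = run_with_reexecution(lambda x, y: x + y, (2, 3),
                                      detect=lambda v: v == 5)

    The third addition replaces the replay with an algorithm-based correction step, and the hard-error variants move the log to non-volatile or remote storage so it survives a node failure.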

    Hardware implementation of a low-power two-dimensional discrete cosine transform

    Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002. Includes bibliographical references (p. 143-144). This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
    In this project, a JPEG-compliant, low-power, dedicated two-dimensional Discrete Cosine Transform (DCT) core meeting all IBM Softcore requirements is developed. Power is optimized at the algorithmic, architectural, and logic levels. The architecture uses row-column decomposition of a fast 1-D algorithm implemented with distributed arithmetic. It features clock-gating schemes as well as power-aware schemes that exploit input correlations to dynamically scale down power consumption. This is done by eliminating glitching in the ROM-Accumulate (RAC) units to effectively stop unnecessary computation. The core is approximately 180K transistors, runs at a maximum of 100 MHz, is synthesized to a 0.18 µm double-well CMOS technology with a 1.8 V power supply, and consumes between 63 and 87 mW of power at 100 MHz, depending on the image data. The thesis covers the algorithmic evaluations, architectural design, development of the C and VHDL models, verification methods, synthesis operations, static timing analysis, design-for-test compliance, power analysis, and performance comparisons for the development of the core. The work was completed in the ASIC Digital Cores I department of the IBM Microelectronics Division in Burlington, Vermont, as part of the third assignment in the MIT VI-A program.
    by Rajul R. Shah. M.Eng.
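
    The row-column decomposition mentioned above is simply the separability of the 2-D DCT: a 1-D DCT-II applied to every row, then to every column of the result. A direct, unscaled Python sketch (the core instead uses a fast fixed-point 1-D algorithm with distributed arithmetic):

        import math

        def dct1d(v):
            # Unscaled DCT-II of one row or column.
            n = len(v)
            return [sum(v[j] * math.cos(math.pi * (j + 0.5) * k / n)
                        for j in range(n)) for k in range(n)]

        def dct2d(block):
            rows = [dct1d(r) for r in block]              # 1-D DCT on every row
            cols = [dct1d(list(c)) for c in zip(*rows)]   # then on every column
            return [list(r) for r in zip(*cols)]          # transpose back

    For a JPEG 8x8 block this costs sixteen 1-D transforms plus a transpose, which is the workload the ROM-Accumulate units evaluate using precomputed cosine tables.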

    Cost-Efficient Soft-Error Resiliency for ASIP-based Embedded Systems

    Recent decades have witnessed the rapid growth of embedded systems. At present, embedded systems are widely applied in a broad range of critical applications, including automotive electronics, telecommunication, healthcare, industrial electronics, consumer electronics, military and aerospace. Human society will continue to be greatly transformed by the pervasive deployment of embedded systems. Consequently, substantial effort from both the industrial and academic communities has contributed to the research and development of embedded systems. The application-specific instruction-set processor (ASIP) is one of the key advances in embedded processor technology, and a crucial component in some embedded systems. Soft errors have been directly observed since the 1970s. As devices scale, the integration density of computing systems increases exponentially, which leads to a corresponding decrease in their reliability. Today, major research forums state that soft errors are one of the major design technology challenges at and beyond the 22 nm technology node. Therefore, a large number of soft-error solutions, including error detection and recovery, have been proposed from differing perspectives. Nonetheless, most of the existing solutions are designed for general-purpose or high-performance systems, which differ from embedded systems. For embedded systems, soft-error solutions must be cost-efficient, which requires tailoring the processor architecture to the features of the target application. This thesis embodies a series of explorations of cost-efficient soft-error solutions for ASIP-based embedded systems. In this exploration, five major solutions are proposed. The first proposed solution realizes checkpoint recovery in ASIPs. By generating customized instructions, ASIP-implemented checkpoint recovery can operate at a finer granularity than was previously possible. The fault-free performance overhead of this solution is only 1.45% on average. The recovery delay is only 62 cycles in the worst case. The area and leakage power overheads are 44.4% and 45.6% on average. The second solution explores utilizing two primitive error recovery techniques jointly. It includes three application-specific optimization methodologies and generates optimized error-resilient ASIPs based on the characteristics of the primitive error recovery techniques, static reliability analysis and design constraints. The resultant ASIP can be configured to perform at runtime according to the optimized recovery scheme. This solution strategically enhances the cost-efficiency of error recovery. In order to guarantee cost-efficiency in unpredictable runtime situations, the third solution explores runtime adaptation for error recovery. It aims to budget and adapt the error recovery operations so as to spend resources intelligently and to tolerate the adverse influences of runtime variations. The resultant ASIP can make runtime decisions to determine the activation of spatial and temporal redundancies according to the runtime situation. In the best case, this solution achieves an almost 50x reliability gain over state-of-the-art solutions. Given the increasing demand for multi-core computing systems, the last two proposed solutions target error recovery in multi-core ASIPs. The first of these explores ASIP-implemented fine-grained process migration, a key piece of infrastructure that enables cost-efficient task management for realizing cost-efficient soft-error recovery in multi-core ASIPs. The average time cost of performing process migration is only 289 machine cycles. The last solution explores dynamic and adaptive mapping to assign heterogeneous recovery operations to tasks in the multi-core context. It allows each individual ASIP-based processing core to dynamically adapt its specific error recovery functionality according to the corresponding task's characteristics, in terms of soft-error vulnerability and execution-time deadline. This solution can significantly improve the reliability of the system, by almost two times, with graceful constraint penalty, in comparison to state-of-the-art counterparts.
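
    The checkpoint-recovery idea behind the first solution can be abstracted in a few lines (Python; a deliberately schematic model, since the thesis realizes it with customized ASIP instructions rather than software copies):

        import copy

        class CheckpointingCore:
            def __init__(self, state: dict):
                self.state = state
                self.shadow = copy.deepcopy(state)   # last known-good snapshot

            def checkpoint(self) -> None:
                self.shadow = copy.deepcopy(self.state)

            def rollback(self) -> None:
                # On a detected soft error, restore the snapshot and re-run.
                self.state = copy.deepcopy(self.shadow)

        core = CheckpointingCore({"r0": 0, "r1": 7})
        core.checkpoint()
        core.state["r0"] ^= 1 << 13                  # a bit flip corrupts a register
        core.rollback()
        assert core.state["r0"] == 0

    Loosely speaking, the fault-free overhead quoted above is the cost of periodic checkpointing, while the recovery delay is the cost of a rollback.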