
Innovative Techniques for Testing and Diagnosing SoCs

Author: DE CARVALHO Mauricio
Publication venue: Politecnico di Torino
Publication date: 01/01/2015

We rely on the continued functioning of many electronic devices for our everyday welfare. These devices usually embed integrated circuits that are becoming ever cheaper and smaller while offering improved features. Nowadays, microelectronics can integrate a working computer with CPU, memories, and even GPUs on a single die, namely a System-on-Chip (SoC). SoCs are also employed in automotive safety-critical applications, but they must be tested thoroughly to comply with reliability standards, in particular ISO 26262, the functional safety standard for road vehicles. The goal of this PhD thesis is to improve SoC reliability by proposing innovative techniques for testing and diagnosing its internal modules: CPUs, memories, peripherals, and GPUs. The proposed approaches, in the order in which they appear in the thesis, are as follows:

1. Embedded Memory Diagnosis: Memories are dense and complex circuits that are susceptible to design and manufacturing errors. Hence, it is important to understand fault occurrence in the memory array. In practice, the logical and physical representations of the array differ because an optimized design adds enhancements to the device, namely scrambling. This part proposes accurate memory diagnosis through a software tool able to analyze test results, unscramble the memory array, map failing syndromes to cell locations, perform cumulative analysis, and formulate a final fault model hypothesis. Several SRAM failing syndromes gathered on an industrial automotive 32-bit SoC developed by STMicroelectronics were analyzed as case studies. The tool displayed the defects virtually, and the results were confirmed by real photos taken with a microscope.

2. Functional Test Pattern Generation: The key to a successful test is the pattern applied to the device. Patterns can be structural or functional; the former usually benefit from embedded test modules targeting manufacturing errors and are only effective before the component is shipped to the client.
The latter, on the other hand, can be applied during mission with minimal impact on performance, but are penalized by a long generation time. However, functional test patterns may benefit from having different goals in functional mission mode. Part III of this PhD thesis proposes three functional test pattern generation methods for CPU cores embedded in SoCs, each targeting a different test purpose:

a. Functional Stress Patterns: suitable for optimizing functional stress during Operational-life Tests and Burn-in Screening, for an optimal characterization of device reliability.
b. Functional Power Hungry Patterns: suitable for determining functional peak power in order to strictly limit the power of structural patterns during manufacturing tests, thus reducing premature device over-kill while delivering high test coverage.
c. Software-Based Self-Test (SBST) Patterns: combine the potential of structural patterns with that of functional ones, allowing periodic execution during mission. In addition, external hardware communicating with a specially devised SBST program was proposed. It increases fault coverage by 3% by testing critical Hardly Functionally Testable Faults not covered by conventional SBST patterns.

An automatic functional test pattern generator exploiting an evolutionary algorithm, which maximizes metrics related to stress, power, and fault coverage, was employed in the above approaches to generate the desired patterns quickly. The approaches were evaluated on two industrial case studies developed by STMicroelectronics: an 8051-based SoC and a 32-bit Power Architecture SoC. Results show that generation time was reduced by up to 75% compared with older methodologies, while the desired metrics increased significantly.

3. Fault Injection in GPGPU: Fault injection mechanisms in semiconductor devices are suitable for generating structural patterns, testing and activating mitigation techniques, and validating robust hardware and software applications.
GPGPUs are known for fast parallel computation and are used in high-performance computing and advanced driver assistance, where reliability is key. Moreover, GPGPU manufacturers do not provide design description code for reasons of confidentiality. Therefore, commercial fault injectors that rely on a GPGPU model are unfeasible, leaving costly radiation tests as the only available resource. In the last part of this thesis, we propose a software-implemented fault injector able to inject bit-flips into memory elements of a real GPGPU. It exploits a software debugger tool combined with the C-CUDA grammar to select fault locations judiciously and apply bit-flip operations to program variables. The goal is to validate robust parallel algorithms by studying fault propagation or by activating the redundancy mechanisms they may embed. The effectiveness of the tool was evaluated on two robust applications: redundant parallel matrix multiplication and floating-point Fast Fourier Transform.
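The bit-flip operation at the heart of such a fault injector can be illustrated on the host side. The sketch below is a minimal Python illustration, not the thesis tool: it assumes IEEE 754 single-precision program variables, and the helpers `flip_bit_float32` and `inject_and_count` are hypothetical names. The real injector works on a GPGPU through a debugger, which is not modeled here.

```python
import struct


def flip_bit_float32(value: float, bit: int) -> float:
    """Return `value` with one bit of its IEEE 754 single-precision encoding inverted."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 31]")
    # Reinterpret the float's 4 bytes as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    raw ^= 1 << bit  # the injected single-bit upset
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw))
    return flipped


def matmul(a, b):
    """Plain (non-redundant) matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def inject_and_count(a, b, row, col, bit):
    """Flip one bit of a[row][col] and count how many outputs of a @ b change."""
    golden = matmul(a, b)
    faulty_a = [r[:] for r in a]
    faulty_a[row][col] = flip_bit_float32(faulty_a[row][col], bit)
    faulty = matmul(faulty_a, b)
    return sum(g != f for gr, fr in zip(golden, faulty) for g, f in zip(gr, fr))


if __name__ == "__main__":
    print(flip_bit_float32(1.0, 31))  # sign bit: -1.0
    print(flip_bit_float32(1.0, 23))  # lowest exponent bit: 0.5
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    print(inject_and_count(a, b, 0, 0, 23))  # 2
```

A fault in `a[0][0]` can only corrupt row 0 of the product, which is the kind of propagation pattern that a redundant matrix multiplication is expected to detect.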

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)


Technical guidance and analytic services in support of SEASAT-A

Author: Brooks W. L.
Dooley R. P.

The design of a high-resolution radar for altimetry and ocean wave height estimation was studied. From basic principles, it is shown that a short-pulse, wide-beam radar is the most appropriate technique for measuring both altitude and ocean wave height, and it is recommended. To achieve a topographic resolution of ±10 cm RMS at 5.0 m RMS wave heights, as required for SEASAT-A, it is recommended that the altimeter design include an onboard adaptive processor. The resulting design, which assumes a maximum likelihood estimation (MLE) processor, is shown to satisfy all performance requirements. A design summary is given for the recommended radar altimeter, which includes a full-deramp STRETCH pulse compression technique followed by an analog filter bank to separate range returns, as well as the assumed MLE processor. The feedback-loop implementation of the MLE on a digital computer was examined in detail, and the computer size, estimation accuracies, and bias due to range sidelobes are given for the MLE with typical SEASAT-A parameters. The standard deviation of the altitude estimate was derived and evaluated for several adaptive and nonadaptive split-gate trackers. Split-gate tracker biases due to range sidelobes and transmitter noise are examined. An approximate closed-form solution for the altimeter power return is derived and evaluated. The feasibility of using the basic radar altimeter design to measure ocean wave spectra was examined.
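In a full-deramp (STRETCH) receiver, the echo is mixed with a replica of the transmitted linear FM chirp, so each range offset ΔR from the reference range becomes a constant beat frequency f_b = (B/T)(2ΔR/c). This is why an analog filter bank can separate range returns. The Python sketch below only evaluates these textbook relations; the chirp bandwidth B and duration T used in the example are illustrative assumptions, not SEASAT-A design values from the report, and the function names are hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def beat_frequency(bandwidth_hz: float, chirp_s: float, range_offset_m: float) -> float:
    """Deramped beat frequency for a target range_offset_m beyond the reference range."""
    chirp_rate = bandwidth_hz / chirp_s            # Hz per second of the linear FM sweep
    delay = 2.0 * range_offset_m / SPEED_OF_LIGHT  # extra two-way delay
    return chirp_rate * delay


def range_from_beat(bandwidth_hz: float, chirp_s: float, f_beat_hz: float) -> float:
    """Invert beat_frequency: the range offset seen by a filter at f_beat_hz."""
    return f_beat_hz * chirp_s * SPEED_OF_LIGHT / (2.0 * bandwidth_hz)


def range_resolution(bandwidth_hz: float) -> float:
    """Compressed-pulse range resolution c / (2B)."""
    return SPEED_OF_LIGHT / (2.0 * bandwidth_hz)


if __name__ == "__main__":
    B, T = 320e6, 3.2e-6  # illustrative values only
    print(f"{beat_frequency(B, T, 1.5) / 1e3:.1f} kHz for a 1.5 m range offset")
    print(f"{range_resolution(B) * 100:.1f} cm range resolution")
```

Since a filter of bandwidth roughly 1/T resolves beat frequencies about 1/T apart, adjacent filter-bank channels are spaced (1/T)·T·c/(2B) = c/(2B) apart in range, matching the compressed-pulse resolution.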

NASA Technical Reports Server

An asynchronous Forth microprocessor.

Author: Ping-Ki Tsang
Publication date: 01/01/2000

Ping-Ki Tsang. Thesis (M.Phil.), Chinese University of Hong Kong, 2000. Includes bibliographical references (leaves 87-95). Abstracts in English and Chinese.

Contents:
Abstract
Acknowledgments
Chapter 1: Introduction
  1.1 Motivation and Aims
  1.2 Contributions
  1.3 Overview of the Thesis
Chapter 2: Asynchronous Logic
  2.1 Motivation
  2.2 Timing Models
    2.2.1 Fundamental-Mode Model
    2.2.2 Delay-Insensitive Model
    2.2.3 QDI and Speed-Independent Models
  2.3 Asynchronous Signalling Protocols
    2.3.1 2-phase Handshaking Protocol
    2.3.2 4-phase Handshaking Protocol
  2.4 Data Representations
    2.4.1 Dual Rail Coded Data
    2.4.2 Bundled Data
  2.5 Previous Asynchronous Processors
  2.6 Summary
Chapter 3: The MSL16 Architecture
  3.1 RISC Machines
  3.2 Stack Machines
  3.3 Forth and its Applications
  3.4 MSL16
    3.4.1 Architecture
    3.4.2 Instruction Set
    3.4.3 The Datapath
    3.4.4 Interrupts and Exceptions
    3.4.5 Implementing Forth primitives
    3.4.6 Code Density Estimation
  3.5 Summary
Chapter 4: Design Methodology
  4.1 Basic Notation
  4.2 Specification of MSL16A
  4.3 Decomposition into Concurrent Processes
  4.4 Separation of Control and Datapath
  4.5 Handshaking Expansion
    4.5.1 4-Phase Handshaking Protocol
  4.6 Production-rule Expansion
  4.7 Summary
Chapter 5: Implementation
  5.1 C-element
  5.2 Mutual Exclusion Elements
  5.3 Caltech Asynchronous Synthesis Tools
  5.4 Stack Design
    5.4.1 Eager Stack Control
    5.4.2 Lazy Stack Control
    5.4.3 Eager/Lazy Stack Datapath
    5.4.4 Pointer Stack Control
    5.4.5 Pointer Stack Datapath
  5.5 ALU Design
    5.5.1 The Addition Operation
    5.5.2 Zero-Checker
  5.6 Memory Interface and Tri-state Buffers
  5.7 MSL16A
  5.8 Summary
Chapter 6: Results
  6.1 FPGA based implementation of MSL16
  6.2 MSL16A
    6.2.1 A Comparison of 3 Stack Designs
    6.2.2 Evaluation of the ALU
    6.2.3 Evaluation of MSL16A
  6.3 Summary
Chapter 7: Conclusions
  7.1 Future Work
Bibliography
Publications
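Section 5.1 of the outline covers the Muller C-element, the basic state-holding gate of asynchronous handshake circuits: its output copies the inputs when they agree and holds its previous value when they differ. The Python model below is a behavioral sketch of that textbook gate used as a join in a 4-phase handshake; the class and function names are hypothetical, and the sketch is not taken from the thesis, whose circuits were built with the Caltech asynchronous synthesis tools.

```python
from typing import List, Tuple


class CElement:
    """Behavioral model of a two-input Muller C-element."""

    def __init__(self, initial: int = 0) -> None:
        self.out = initial  # state held while the inputs disagree

    def update(self, a: int, b: int) -> int:
        if a == b:       # inputs agree: the output follows them
            self.out = a
        return self.out  # inputs disagree: the output keeps its old value


def four_phase_join(events: List[Tuple[int, int]]) -> List[int]:
    """Drive a C-element with (req_a, req_b) pairs and record its output.

    In a 4-phase handshake both requests rise, the joined output rises,
    then both requests fall and the output returns to zero.
    """
    c = CElement()
    return [c.update(a, b) for a, b in events]


if __name__ == "__main__":
    # req_a rises first; the output waits for req_b, then waits for both to fall.
    trace = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
    print(four_phase_join(trace))  # [0, 0, 1, 1, 0]
```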

CUHK Digital Repository

Advanced flight control system study

Author: Klafin J. F.
Mcgough J.
Moses K.

The architecture, requirements, and system elements of an ultrareliable advanced flight control system are described. The basic criteria are a functional failure probability of 10^-10 per flight hour and scheduled maintenance only every 6 months. A distributed system architecture is described, including a multiplexed communication system, a reliable bus controller, skewed sensor arrays, and actuator interfaces. A test bed and a flight evaluation program are proposed.

NASA Technical Reports Server

Determining application-specific peak power and energy requirements for ultra-low-power processors

Author: Ye Weidong
Publication date: 01/05/2017

Many emerging applications, such as IoT, wearables, implantables, and sensor networks, are power- and energy-constrained. These applications rely on ultra-low-power processors, which have rapidly become the most abundant type of processor manufactured today. In the ultra-low-power embedded systems used by these applications, peak power and energy requirements are the primary factors that determine critical system characteristics such as size, weight, cost, and lifetime. While the power and energy requirements of these systems tend to be application-specific, conventional techniques for rating peak power and energy cannot accurately bound the power and energy requirements of an application running on a processor, leading to overprovisioning that increases system size and weight. In this thesis, we present an automated technique that performs hardware-software co-analysis of the application and the ultra-low-power processor in an embedded system to determine application-specific peak power and energy requirements. Our technique provides more accurate, tighter bounds than conventional techniques, reporting on average 15% lower peak power and 17% lower peak energy than a conventional approach based on profiling and guardbanding. Compared to an aggressive stressmark-based approach, our technique reports power and energy bounds that are both 26% lower on average. Also, unlike conventional approaches, our technique reports guaranteed bounds on peak power and energy that are independent of an application's input set. Tighter bounds on peak power and energy can be exploited to reduce system size, weight, and cost.
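The abstract does not detail how the co-analysis obtains input-independent bounds. One plausible way, assumed here purely for illustration, is to treat input-dependent signals as unknown (X) during gate-level simulation and to charge every gate whose value is unknown in either of two consecutive cycles as if it toggled. The Python sketch below applies that worst-case rule to a hypothetical per-cycle gate trace with hypothetical per-toggle energies; it is not the thesis's tool.

```python
from typing import Dict, List, Optional

# Gate value per cycle: 0, 1, or None for an unknown (input-dependent) "X" value.
Trace = List[Dict[str, Optional[int]]]


def may_toggle(prev: Optional[int], cur: Optional[int]) -> bool:
    """Conservatively assume a toggle whenever either value is unknown."""
    if prev is None or cur is None:
        return True
    return prev != cur


def peak_power_bound(trace: Trace, toggle_energy_j: Dict[str, float], cycle_s: float) -> float:
    """Upper bound on per-cycle dynamic power over a symbolic gate-level trace."""
    peak = 0.0
    for prev, cur in zip(trace, trace[1:]):
        energy = sum(toggle_energy_j[g] for g in cur if may_toggle(prev[g], cur[g]))
        peak = max(peak, energy / cycle_s)
    return peak


if __name__ == "__main__":
    energy = {"alu": 2e-12, "mux": 1e-12, "reg": 0.5e-12}  # hypothetical J per toggle
    trace = [
        {"alu": 0, "mux": 0, "reg": 0},
        {"alu": None, "mux": 1, "reg": 0},  # ALU output depends on an unknown input
        {"alu": 1, "mux": 1, "reg": 1},
    ]
    print(peak_power_bound(trace, energy, cycle_s=1e-6))  # about 3e-06 W
```

Because every X is charged as a toggle, the bound holds for any concrete input, at the price of some pessimism compared with a single profiled run.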

Illinois Digital Environment for Access to Learning and Scholarship Repository

Power reduction techniques for memory elements

Author: Katrue Srikanth
Publication venue: RIT Scholar Works
Publication date: 14/12/2007

The high performance and computational capability of current-generation processors are made possible by small feature sizes and high device density. To maintain the current drive strength and control dynamic power in these processors, supply and threshold voltages are scaled down simultaneously. High device density and low threshold voltages, however, increase leakage current dissipation. Current-generation processors integrate large on-chip caches, which are becoming a major contributor to total leakage power. In this work, a novel methodology is proposed to minimize both leakage power and dynamic power. The proposed static power reduction technique, GALEOR (GAted LEakage transistOR), introduces transistor stacks by placing high-threshold-voltage transistors and has inherent control logic. The proposed dynamic power reduction technique, the adaptive phase tag cache, achieves power savings by varying the tag size within a design window. The proposed techniques are tested and verified on a two-level cache system, with the power-delay-squared product used as the metric of their effectiveness. The GALEOR technique achieves a 30% reduction when implemented on CMOS benchmark circuits and an overall leakage saving of 9% when implemented on the two-level cache system. The proposed dynamic power reduction technique achieves 10% savings when implemented on individual modules of the two-level cache and an overall saving of 3% when implemented on the entire two-level cache system.
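The power-delay-squared product weights delay quadratically, so a technique that cuts power but slows the circuit is penalized more than under power alone. The Python sketch below computes PD² and the relative saving of a proposed design over a baseline; the function names are hypothetical, and the power and delay numbers are illustrative, not results from the thesis.

```python
def pd2(power_w: float, delay_s: float) -> float:
    """Power-delay-squared product P * D^2 (units: W*s^2)."""
    return power_w * delay_s ** 2


def pd2_saving(base_power_w: float, base_delay_s: float,
               new_power_w: float, new_delay_s: float) -> float:
    """Fractional PD^2 reduction of a proposed design relative to a baseline."""
    return 1.0 - pd2(new_power_w, new_delay_s) / pd2(base_power_w, base_delay_s)


if __name__ == "__main__":
    # Hypothetical cache module: 30% less power, but 5% more delay.
    saving = pd2_saving(10e-6, 1.0e-9, 7e-6, 1.05e-9)
    print(f"PD^2 saving: {saving:.1%}")  # PD^2 saving: 22.8%
```

In this example a 30% power cut combined with 5% extra delay yields only about a 22.8% PD² improvement, which is why the metric is stricter than power alone.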

RIT Scholar Works

Advanced information processing system: The Army fault tolerant architecture conceptual study. Volume 2: Army fault tolerant architecture design and analysis

Author: Alger L. S.
Babikyan C. A.
Butler B. P.
Friend S. A.
Ganska R. J.
Harper R. E.
Lala J. H.
Masotto T. K.
Meyer A. J.
Morton D. P.

The hardware architecture, components, and operating system of the Army Fault Tolerant Architecture (AFTA) are described. The architectural and operational theory of the AFTA Fault Tolerant Data Bus is discussed. The test and maintenance strategy developed for fielded AFTA installations is presented. An approach for reducing the probability of AFTA failure due to common-mode faults is described. Analytical models for AFTA performance, reliability, availability, life-cycle cost, weight, power, and volume are developed. An approach is presented for using the VHSIC Hardware Description Language (VHDL) to describe and design AFTA's developmental hardware. A plan is described for verifying and validating key AFTA concepts during the Dem/Val phase. Analytical models and partial mission requirements are used to generate AFTA configurations for the TF/TA/NOE and Ground Vehicle missions.

NASA Technical Reports Server

Integrated Application of Active Controls (IAAC) technology to an advanced subsonic transport project: Current and advanced ACT control system definition study. Volume 2: Appendices

Author: Blight J. D.
Buchan S. M.
Crumb C. B.
Dethman H. A.
Dorwart R. J.
Gangsaas D.
Gratzer L. B.
Hanks G. W.
Maeshiro A.
Shomber H. A.

The current status of Active Controls Technology (ACT) for the advanced subsonic transport project is investigated through analysis of the systems' technical data. The control system technologies examined include computerized reliability analysis, a pitch-axis fly-by-wire actuator, a flaperon actuation system design trade study, control law synthesis and analysis, flutter mode control and gust load alleviation analysis, and the implementation of alternative ACT systems. An extensive analysis of the computer techniques involved in each system is included.

NASA Technical Reports Server

An efficient design space exploration framework to optimize power-efficient heterogeneous many-core multi-threading embedded processor architectures

Author: Datta Kushal
Publication venue: NC DOCKS at The University of North Carolina at Charlotte
Publication date: 01/01/2011

By the mid-2000s, uniprocessor performance had hit a roadblock due to a combination of factors, such as excessive power dissipation due to high operating frequencies, growing memory access latencies, diminishing returns on deeper instruction pipelines, and the saturation of available instruction-level parallelism in applications. An attractive and viable alternative, embraced by all the processor vendors, was the multi-core architecture, in which throughput is improved by micro-architectural features such as multiple processor cores, interconnects, and low-latency shared caches integrated on a single chip. The individual cores are often simpler than their uniprocessor counterparts, use hardware multi-threading to exploit thread-level parallelism and hide latency, and typically achieve better performance-power figures. The overwhelming success of multi-core microprocessors in both high-performance and embedded computing platforms motivated chip architects to scale multi-core processors dramatically into many-cores, which will include hundreds of cores on chip to further improve throughput. With such complex large-scale architectures, however, several key design issues must be addressed. First, a wide range of micro-architectural parameters, such as L1 caches, load/store queues, shared cache structures, and interconnection topologies, together with the non-linear interactions between them, define a vast non-linear multivariate design space for many-core processors; the traditional method of using extensive in-loop simulation to explore this space is simply not practical. Second, to accurately evaluate the performance (measured in cycles per instruction, CPI) of a candidate design, contention at the shared cache must be accounted for in addition to the cycle-by-cycle behavior of the large number of cores, which superlinearly increases the number of simulation cycles per iteration of the design exploration.
Third, single-thread performance does not scale linearly with the number of hardware threads per core and the number of cores because of the memory wall. This means that at every step of the design process, designers must ensure that single-thread performance is not unacceptably degraded while overall throughput increases. While all these factors affect design decisions in both high-performance and embedded many-core processors, the design of embedded processors for complex embedded applications such as networking, smart power grids, battlefield decision-making, consumer electronics, and biomedical devices, to name a few, is fundamentally different from that of their high-performance counterparts because of the need to consider (i) low power and (ii) real-time operation. This implies that the design objective for embedded many-core processors cannot simply be to maximize performance, but to improve it in such a way that overall power dissipation is minimized and all real-time constraints are met. This in turn requires power estimation models right at the design stage to accurately measure the cost and reliability of all candidate designs during the exploration phase. In this dissertation, a statistical machine learning (SML) based design exploration framework is presented, which employs an execution-driven, cycle-accurate simulator to accurately measure the power and performance of embedded many-core processors. The target embedded many-core processor domain is Network Processors (NePs), used to process network IP packets. Future-generation NePs, which must operate at network speeds of terabits per second, capture all aspects of a complex embedded application: shared data structures, a large volume of compute-intensive and data-intensive real-time-bound tasks, and a high level of task (packet) level parallelism.
Statistical machine learning (SML) is used to efficiently model the performance and power of candidate designs over wide ranges of micro-architectural parameters. The method inherently minimizes the number of in-loop simulations in the exploration framework and also efficiently captures the non-linear interactions between the micro-architectural design parameters. To ensure scalability, the design space is partitioned into (i) core-level micro-architectural parameters, to optimize single-core architectures subject to the real-time constraints, and (ii) shared-memory-level micro-architectural parameters, to explore the shared interconnection network and shared cache memory architectures and achieve overall optimality. The cost function of the exploration algorithm is the total power dissipation, which is minimized subject to the real-time throughput constraint (determined from the terabit optical network router line speed) required by the IP packet processing application.
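The exploration loop described above can be sketched as follows: run a few simulations, fit surrogate models of power and throughput as functions of the micro-architectural parameters, then search the whole design space on the cheap surrogates and keep the lowest-power design that meets the throughput constraint. In this minimal Python sketch, an ordinary least-squares model with one interaction term stands in for the dissertation's SML models, and `simulate` is a hypothetical analytic stand-in for the cycle-accurate simulator; the parameter ranges and the 10 Gb/s target are illustrative assumptions.

```python
import itertools

import numpy as np

CORES = [2, 4, 8, 16, 32]    # hypothetical core counts
L2_KB = [64, 128, 256, 512]  # hypothetical shared-cache sizes


def simulate(cores: int, l2_kb: int):
    """Hypothetical stand-in for a cycle-accurate run: (power in W, throughput in Gb/s)."""
    power = 0.5 + 0.3 * cores + 0.002 * l2_kb + 0.0005 * cores * l2_kb
    throughput = 0.8 * cores + 0.004 * l2_kb + 0.0001 * cores * l2_kb
    return power, throughput


def features(cores, l2_kb):
    # Intercept, main effects, and one interaction term for parameter coupling.
    return [1.0, cores, l2_kb, cores * l2_kb]


def fit(samples, targets):
    """Least-squares surrogate; returns a cheap predictor f(cores, l2_kb)."""
    x = np.array([features(c, k) for c, k in samples], dtype=float)
    coef, *_ = np.linalg.lstsq(x, np.array(targets, dtype=float), rcond=None)
    return lambda c, k: float(np.dot(features(c, k), coef))


def explore(train_points, min_throughput):
    runs = [simulate(c, k) for c, k in train_points]  # the only "expensive" calls
    power_model = fit(train_points, [p for p, _ in runs])
    tput_model = fit(train_points, [t for _, t in runs])
    feasible = [d for d in itertools.product(CORES, L2_KB)
                if tput_model(*d) >= min_throughput]
    return min(feasible, key=lambda d: power_model(*d)) if feasible else None


if __name__ == "__main__":
    train = [(2, 64), (32, 512), (8, 128), (16, 256), (4, 512), (32, 64)]
    print(explore(train, min_throughput=10.0))  # (16, 64)
```

Because the toy `simulate` lies exactly in the span of the chosen features, the surrogate recovers it and the result matches brute force. With a real simulator the surrogate is only an approximation, which is where more flexible SML models and the partitioning into core-level and shared-memory-level parameters come in.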

The University of North Carolina at Greensboro