29 research outputs found
New Techniques for On-line Testing and Fault Mitigation in GPUs
L'abstract è presente nell'allegato / the abstract is in the attachmen
Methodology to accelerate diagnostic coverage assessment: MADC
Tese (doutorado) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia Elétrica, Florianópolis, 2016.Os veÃculos da atualidade vêm integrando um número crescente de eletrônica embarcada, com o objetivo de permitir uma experiência mais segura aos motoristas. Logo, a garantia da segurança fÃsica é um requisito que precisa ser observada por completo durante o processo de desenvolvimento. O padrão ISO 26262 provê medidas para garantir que esses requisitos não sejam negligenciados. Injeção de falhas é fortemente recomendada quando da verificação do funcionamento dos mecanismos de segurança implementados, assim como sua capacidade de cobertura associada ao diagnóstico de falhas existentes. A análise exaustiva não é obrigatória, mas evidências de que o máximo esforço foi feito para acurar a cobertura de diagnóstico precisam ser apresentadas, principalmente durante a avalição dos nÃveis de segurança associados a arquitetura implementada em hardware. Estes nÃveis dão suporte à s alegações de que o projeto obedece à s métricas de segurança da integridade fÃsica exigida em aplicações automotivas. Os nÃveis de integridade variam de A à D, sendo este último o mais rigoroso. Essa Tese explora o estado-da-arte em soluções de verificação, e tem por objetivo construir uma metodologia que permita acelerar a verificação da cobertura de diagnóstico alcançado. Diferentemente de outras técnicas voltadas à aceleração de injeção de falhas, a metodologia proposta utiliza uma plataforma de hardware dedicada à verificação, com o intuito de maximizar o desempenho relativo a simulação de falhas. Muitos aspectos relativos a ISO 26262 são observados de forma que a presente contribuição possa ser apreciada no segmento automotivo. Por fim, uma arquitetura OpenRISC é utilizada para confirmar os resultados alcançados com essa solução proposta pertencente ao estado-da-arte.Abstract : Modern vehicles are integrating a growing number of electronics to provide a safer experience for the driver. Therefore, safety is a non-negotiable requirement that must be considered through the vehicle development process. The ISO 26262 standard provides guidance to ensure that such requirements are implemented. Fault injection is highly recommended for the functional verification of safety mechanisms or to evaluate their diagnostic coverage capability. An exhaustive analysis is not required, but evidence of best effort through the diagnostic coverage assessment needs to be provided when performing quantitative evaluation of hardware architectural metrics. These metrics support that the automotive safety integrity level ? ranging from A (lowest) to D (strictest) levels ? was obeyed. This thesis explores the most advanced verification solutions in order to build a methodology to accelerate the diagnostic coverage assessment. Different from similar techniques for fault injection acceleration, the proposed methodology does not require any modification of the design model to enable acceleration. Many functional safety requisites in the ISO 26262 are considered thus allowing the contribution presented to be a suitable solution for the automotive segment. An OpenRISC architecture is used to confirm the results achieved by this state-of-the-art solution
AI/ML Algorithms and Applications in VLSI Design and Technology
An evident challenge ahead for the integrated circuit (IC) industry in the
nanometer regime is the investigation and development of methods that can
reduce the design complexity ensuing from growing process variations and
curtail the turnaround time of chip manufacturing. Conventional methodologies
employed for such tasks are largely manual; thus, time-consuming and
resource-intensive. In contrast, the unique learning strategies of artificial
intelligence (AI) provide numerous exciting automated approaches for handling
complex and data-intensive tasks in very-large-scale integration (VLSI) design
and testing. Employing AI and machine learning (ML) algorithms in VLSI design
and manufacturing reduces the time and effort for understanding and processing
the data within and across different abstraction levels via automated learning
algorithms. It, in turn, improves the IC yield and reduces the manufacturing
turnaround time. This paper thoroughly reviews the AI/ML automated approaches
introduced in the past towards VLSI design and manufacturing. Moreover, we
discuss the scope of AI/ML applications in the future at various abstraction
levels to revolutionize the field of VLSI design, aiming for high-speed, highly
intelligent, and efficient implementations
The Use of Parallel Processing in VLSI Computer-Aided Design Application
Coordinated Science Laboratory was formerly known as Control Systems LaboratorySemiconductor Research Corporation / 87-DP-10
Toward Reliable, Secure, and Energy-Efficient Multi-Core System Design
Computer hardware researchers have perennially focussed on improving the performance of computers while stipulating the energy consumption under a strict budget. While several innovations over the years have led to high performance and energy efficient computers, more challenges have also emerged as a fallout. For example, smaller transistor devices in modern multi-core systems are afflicted with several reliability and security concerns, which were inconceivable even a decade ago. Tackling these bottlenecks happens to negatively impact the power and performance of the computers. This dissertation explores novel techniques to gracefully solve some of the pressing challenges of the modern computer design. Specifically, the proposed techniques improve the reliability of on-chip communication fabric under a high power supply noise, increase the energy-efficiency of low-power graphics processing units, and demonstrate an unprecedented security loophole of the low-power computing paradigm through rigorous hardware-based experiments
Concurrent Online Testing for Many Core Systems-on-Chips
Shrinking transistor sizes have introduced new challenges and opportunities for system-on-chip (SoC) design and reliability. Smaller transistors are more susceptible to early lifetime failure and electronic wear-out, greatly reducing their reliable lifetimes. However, smaller transistors will also allow SoC to contain hundreds of processing cores and other infrastructure components with the potential for increased reliability through massive structural redundancy. Concurrent online testing (COLT) can provide sufficient reliability and availability to systems with this redundancy. COLT manages the process of testing a subset of processing cores while the rest of the system remains operational. This can be considered a temporary, graceful degradation of system performance that increases reliability while maintaining availability.
In this dissertation, techniques to assist COLT are proposed and analyzed. The techniques described in this dissertation focus on two major aspects of COLT feasibility: recovery time and test delivery costs. To reduce the time between failure and recovery, and thereby increase system availability, an anomaly-based test triggering unit (ATTU) is proposed to initiate COLT when anomalous network behavior is detected. Previous COLT techniques have relied on initiating tests periodically. However, determining the testing period is based on a device's mean time between failures (MTBF), and calculating MTBF is exceedingly difficult and imprecise.
To address the test delivery costs associated with COLT, a distributed test vector storage (DTVS) technique is proposed to eliminate the dependency of test delivery costs on core location. Previous COLT techniques have relied on a single location to store test vectors, and it has been demonstrated that centralized storage of tests scales poorly as the number of cores per SoC grows. Assuming that the SoC organizes its processing cores with a regular topology, DTVS uses an interleaving technique to optimally distribute the test vectors across the entire chip. DTVS is analyzed both empirically and analytically, and a testing protocol using DTVS is described.
COLT is only feasible if the applications running concurrently are largely unaffected. The effect of COLT on application execution time is also measured in this dissertation, and an application-aware COLT protocol is proposed and analyzed. Application interference is greatly reduced through this technique
Cross-Layer Optimization for Power-Efficient and Robust Digital Circuits and Systems
With the increasing digital services demand, performance and power-efficiency
become vital requirements for digital circuits and systems. However, the
enabling CMOS technology scaling has been facing significant challenges of
device uncertainties, such as process, voltage, and temperature variations. To
ensure system reliability, worst-case corner assumptions are usually made in
each design level. However, the over-pessimistic worst-case margin leads to
unnecessary power waste and performance loss as high as 2.2x. Since
optimizations are traditionally confined to each specific level, those safe
margins can hardly be properly exploited.
To tackle the challenge, it is therefore advised in this Ph.D. thesis to
perform a cross-layer optimization for digital signal processing circuits and
systems, to achieve a global balance of power consumption and output quality.
To conclude, the traditional over-pessimistic worst-case approach leads to
huge power waste. In contrast, the adaptive voltage scaling approach saves
power (25% for the CORDIC application) by providing a just-needed supply
voltage. The power saving is maximized (46% for CORDIC) when a more aggressive
voltage over-scaling scheme is applied. These sparsely occurred circuit errors
produced by aggressive voltage over-scaling are mitigated by higher level error
resilient designs. For functions like FFT and CORDIC, smart error mitigation
schemes were proposed to enhance reliability (soft-errors and timing-errors,
respectively). Applications like Massive MIMO systems are robust against lower
level errors, thanks to the intrinsically redundant antennas. This property
makes it applicable to embrace digital hardware that trades quality for power
savings.Comment: 190 page