Search CORE

1,074 research outputs found

Memory consistency directed cache coherence protocols for scalable multiprocessors

Author: Elver Marco
Nagarajan Vijay
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/02/2014
Field of study

The memory consistency model, which formally specifies the behavior of the memory system, is used by programmers to reason about parallel programs. From a hardware design perspective, weaker consistency models permit various optimizations in a multiprocessor system: this thesis focuses on designing and optimizing the cache coherence protocol for a given target memory consistency model. Traditional directory coherence protocols are designed to be compatible with the strictest memory consistency model, sequential consistency (SC). When they are used for chip multiprocessors (CMPs) that provide more relaxed memory consistency models, such protocols turn out to be unnecessarily strict. Usually, this comes at the cost of scalability, in terms of per-core storage due to sharer tracking, which poses a problem with increasing number of cores in today’s CMPs, most of which no longer are sequentially consistent. The recent convergence towards programming language based relaxed memory consistency models has sparked renewed interest in lazy cache coherence protocols. These protocols exploit synchronization information by enforcing coherence only at synchronization boundaries via self-invalidation. As a result, such protocols do not require sharer tracking which benefits scalability. On the downside, such protocols are only readily applicable to a restricted set of consistency models, such as Release Consistency (RC), which expose synchronization information explicitly. In particular, existing architectures with stricter consistency models (such as x86) cannot readily make use of lazy coherence protocols without either: adapting the protocol to satisfy the stricter consistency model; or changing the architecture’s consistency model to (a variant of) RC, typically at the expense of backward compatibility. The first part of this thesis explores both these options, with a focus on a practical approach satisfying backward compatibility. Because of the wide adoption of Total Store Order (TSO) and its variants in x86 and SPARC processors, and existing parallel programs written for these architectures, we first propose TSO-CC, a lazy cache coherence protocol for the TSO memory consistency model. TSO-CC does not track sharers and instead relies on self-invalidation and detection of potential acquires (in the absence of explicit synchronization) using per cache line timestamps to efficiently and lazily satisfy the TSO memory consistency model. Our results show that TSO-CC achieves, on average, performance comparable to a MESI directory protocol, while TSO-CC’s storage overhead per cache line scales logarithmically with increasing core count. Next, we propose an approach for the x86-64 architecture, which is a compromise between retaining the original consistency model and using a more storage efficient lazy coherence protocol. First, we propose a mechanism to convey synchronization information via a simple ISA extension, while retaining backward compatibility with legacy codes and older microarchitectures. Second, we propose RC3 (based on TSOCC), a scalable cache coherence protocol for RCtso, the resulting memory consistency model. RC3 does not track sharers and relies on self-invalidation on acquires. To satisfy RCtso efficiently, the protocol reduces self-invalidations transitively using per-L1 timestamps only. RC3 outperforms a conventional lazy RC protocol by 12%, achieving performance comparable to a MESI directory protocol for RC optimized programs. RC3’s storage overhead per cache line scales logarithmically with increasing core count and reduces on-chip coherence storage overheads by 45% compared to TSO-CC. Finally, it is imperative that hardware adheres to the promised memory consistency model. Indeed, consistency directed coherence protocols cannot use conventional coherence definitions (e.g. SWMR) to be verified against, and few existing verification methodologies apply. Furthermore, as the full consistency model is used as a specification, their interaction with other components (e.g. pipeline) of a system must not be neglected in the verification process. Therefore, verifying a system with such protocols in the context of interacting components is even more important than before. One common way to do this is via executing tests, where specific threads of instruction sequences are generated and their executions are checked for adherence to the consistency model. It would be extremely beneficial to execute such tests under simulation, i.e. when the functional design implementation of the hardware is being prototyped. Most prior verification methodologies, however, target post-silicon environments, which when used for simulation-based memory consistency verification would be too slow. We propose McVerSi, a test generation framework for fast memory consistency verification of a full-system design implementation under simulation. Our primary contribution is a Genetic Programming (GP) based approach to memory consistency test generation, which relies on a novel crossover function that prioritizes memory operations contributing to non-determinism, thereby increasing the probability of uncovering memory consistency bugs. To guide tests towards exercising as much logic as possible, the simulator’s reported coverage is used as the fitness function. Furthermore, we increase test throughput by making the test workload simulation-aware. We evaluate our proposed framework using the Gem5 cycle accurate simulator in full-system mode with Ruby (with configurations that use Gem5’s MESI protocol, and our proposed TSO-CC together with an out-of-order pipeline). We discover 2 new bugs in the MESI protocol due to the faulty interaction of the pipeline and the cache coherence protocol, highlighting that even conventional protocols should be verified rigorously in the context of a full-system. Crucially, these bugs would not have been discovered through individual verification of the pipeline or the coherence protocol. We study 11 bugs in total. Our GP-based test generation approach finds all bugs consistently, therefore providing much higher guarantees compared to alternative approaches (pseudo-random test generation and litmus tests)

Crossref

Edinburgh Research Archive

Edinburgh Research Explorer

McVerSi: A Test Generation Framework for Fast Memory Consistency Verification in Simulation

Author: Elver Marco
Nagarajan Vijayanand
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/03/2016
Field of study

Edinburgh Research Explorer

Formal methods for functional verification of cache-coherent systems-on-chip

Author: Kriouile Abderahman
Publication venue: HAL CCSD
Publication date: 17/09/2015
Field of study

State-of-the-art System-on-Chip (SoC) architectures integrate many different components, such as processors, accelerators, memories, and I/O blocks. Some of those components, but not all, may have caches. Because the effort of validation with simulation-based techniques, currently used in industry, grows exponentially with the complexity of the SoC, this thesis investigates the use of formal verification techniques in this context. More precisely, we use the CADP toolbox to develop and validate a generic formal model of a heterogeneous cache-coherent SoC compliant with the recent AMBA 4 ACE specification proposed by ARM. We use a constraint-oriented specification style to model the general requirements of the specification. We verify system properties on both the constrained and unconstrained model to detect the cache coherency corner cases. We take advantage of the parametrization of the proposed model to produce a comprehensive set of counterexamples of non-satisfied properties in the unconstrained model. The results of formal verification are then used to improve the industrial simulation-based verification techniques in two aspects. On the one hand, we suggest using the formal model to assess the sanity of an interface verification unit. On the other hand, in order to generate clever semi-directed test cases from temporal logic properties, we propose a two-step approach. One step consists in generating system-level abstract test cases using model-based testing tools of the CADP toolbox. The other step consists in refining those tests into interface-level concrete test cases that can be executed at RTL level with a commercial Coverage-Directed Test Generation tool. We found that our approach helps in the transition between interface-level and system-level verification, facilitates the validation of system-level properties, and enables early detection of bugs in both the SoC and the commercial test-bench.Les architectures des systèmes sur puce (System-on-Chip, SoC) actuelles intègrent de nombreux composants différents tels que les processeurs, les accélérateurs, les mémoires et les blocs d'entrée/sortie, certains pouvant contenir des caches. Vu que l'effort de validation basée sur la simulation, actuellement utilisée dans l'industrie, croît de façon exponentielle avec la complexité des SoCs, nous nous intéressons à des techniques de vérification formelle. Nous utilisons la boîte à outils CADP pour développer et valider un modèle formel d'un SoC générique conforme à la spécification AMBA 4 ACE récemment proposée par ARM dans le but de mettre en œuvre la cohérence de cache au niveau système. Nous utilisons une spécification orientée contraintes pour modéliser les exigences générales de cette spécification. Les propriétés du système sont vérifié à la fois sur le modèle avec contraintes et le modèle sans contraintes pour détecter les cas intéressants pour la cohérence de cache. La paramétrisation du modèle proposé a permis de produire l'ensemble complet des contre-exemples qui ne satisfont pas une certaine propriété dans le modèle non contraint. Notre approche améliore les techniques industrielles de vérification basées sur la simulation en deux aspects. D'une part, nous suggérons l'utilisation du modèle formel pour évaluer la bonne construction d'une unité de vérification d'interface. D'autre part, dans l'objectif de générer des cas de test semi-dirigés intelligents à partir des propriétés de logique temporelle, nous proposons une approche en deux étapes. La première étape consiste à générer des cas de tests abstraits au niveau système en utilisant des outils de test basé sur modèle de la boîte à outils CADP. La seconde étape consiste à affiner ces tests en cas de tests concrets au niveau de l'interface qui peuvent être exécutés en RTL grâce aux services d'un outil commercial de génération de tests dirigés par les mesures de couverture. Nous avons constaté que notre approche participe dans la transition entre la vérification du niveau interface, classiquement pratiquée dans l'industrie du matériel, et la vérification au niveau système. Notre approche facilite aussi la validation des propriétés globales du système, et permet une détection précoce des bugs, tant dans le SoC que dans les bancs de test commerciales

Thèses en Ligne

Hal - Université Grenoble Alpes

On How to Identify Cache Coherence: Case of the NXP QorIQ T4240

Author: Brunel Julien
Pagetti Claire
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 32nd Euromicro Conference on Real-Time Systems (ECRTS 2020)
Publication date: 01/01/2020
Field of study

Architectures used in safety critical systems have to pass certain certification standards, which require sufficient proof that they will behave as expected. Multi-core processors make this challenging by featuring complex interactions between the tasks they run. A lot of these interactions are made without explicit instructions from the program designers. Furthermore, they can have strong negative impacts on performance (and potentially affect correctness). One important such source of interactions is cache coherence, which speeds up operations in most cases, but can also lead to unexpected variations in execution time if not fully understood. Architecture documentations often lack details on the implementation of cache coherence. We thus propose a strategy to ascertain that the platform does indeed implement the cache coherence protocol its user believes it to. We also apply this strategy to the NXP QorIQ T4240, resulting in the identification of a protocol (MESIF) other than the one this architecture’s documentation led us to believe it was using (MESI)

HAL Descartes

Dagstuhl Research Online Publication Server

Oskar Bordeaux

An Effective Verification Solution for Modern Microprocessors.

Author: Wagner Ilya
Publication venue
Publication date
Field of study

Over the past four decades microprocessors have come to be a vital and inseparable part of the modern world, becoming the digital brain of numerous electronic devices and gadgets that make today's lifestyle possible. Processors are capable of performing computation at astonishingly high speeds and are extremely integrated, occupying only a few square centimeters of silicon die. However, this computational power comes at a price: the task of verifying a modern microprocessor and guaranteeing correctness of its operation is increasingly challenging, even for most established processor vendors. Always attempting to deliver higher performance to end-users, processor manufacturers are forced to design progressively more complex circuits and employ immense verification teams to eliminate critical design bugs in a timely manner. Unfortunately, too often size doesn't seem to matter in verification, as schedules continue to slip and microprocessors find their way to the marketplace with design errors. This work describes a novel verification framework targeting specifically today's complex microprocessors. The scope of the work spans many levels of verification and different phases of the processor life-cycle, from validation of individual sub-modules to complete multi-core system, and from pre-silicon design verification to in-the-field hardware patching. In particular, our StressTest and MCjammer approaches enable efficient generation of high-quality tests at the pre-silicon level for individual cores and multi-core systems, respectively, using machine learning techniques and making the process as automatic as possible. On the other hand, Reversi and Dacota enable low cost validation in post-silicon, while delivering even higher coverage than pre-silicon techniques. Finally, the Field-repairable control logic (FRCL) and Caspar techniques allow designers to patch different classes of escaped errors in processors that are deployed in the field. The integrated set of solutions that we introduce with this thesis empowers processor vendors to drastically shorten their development timeline and, at the same time, to deliver more reliable and correct systems to their customers at a lower cost. Altogether, this work has the potential to solve the long-standing challenge of guaranteeing the complete functional correctness of modern microprocessors.Ph.D.Computer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/61656/1/ivagner_1.pd

Deep Blue Documents at the University of Michigan

Decompose and Conquer: Addressing Evasive Errors in Systems on Chip

Author: Lee Doowon
Publication venue
Publication date
Field of study

Modern computer chips comprise many components, including microprocessor cores, memory modules, on-chip networks, and accelerators. Such system-on-chip (SoC) designs are deployed in a variety of computing devices: from internet-of-things, to smartphones, to personal computers, to data centers. In this dissertation, we discuss evasive errors in SoC designs and how these errors can be addressed efficiently. In particular, we focus on two types of errors: design bugs and permanent faults. Design bugs originate from the limited amount of time allowed for design verification and validation. Thus, they are often found in functional features that are rarely activated. Complete functional verification, which can eliminate design bugs, is extremely time-consuming, thus impractical in modern complex SoC designs. Permanent faults are caused by failures of fragile transistors in nano-scale semiconductor manufacturing processes. Indeed, weak transistors may wear out unexpectedly within the lifespan of the design. Hardware structures that reduce the occurrence of permanent faults incur significant silicon area or performance overheads, thus they are infeasible for most cost-sensitive SoC designs. To tackle and overcome these evasive errors efficiently, we propose to leverage the principle of decomposition to lower the complexity of the software analysis or the hardware structures involved. To this end, we present several decomposition techniques, specific to major SoC components. We first focus on microprocessor cores, by presenting a lightweight bug-masking analysis that decomposes a program into individual instructions to identify if a design bug would be masked by the program's execution. We then move to memory subsystems: there, we offer an efficient memory consistency testing framework to detect buggy memory-ordering behaviors, which decomposes the memory-ordering graph into small components based on incremental differences. We also propose a microarchitectural patching solution for memory subsystem bugs, which augments each core node with a small distributed programmable logic, instead of including a global patching module. In the context of on-chip networks, we propose two routing reconfiguration algorithms that bypass faulty network resources. The first computes short-term routes in a distributed fashion, localized to the fault region. The second decomposes application-aware routing computation into simple routing rules so to quickly find deadlock-free, application-optimized routes in a fault-ridden network. Finally, we consider general accelerator modules in SoC designs. When a system includes many accelerators, there are a variety of interactions among them that must be verified to catch buggy interactions. To this end, we decompose such inter-module communication into basic interaction elements, which can be reassembled into new, interesting tests. Overall, we show that the decomposition of complex software algorithms and hardware structures can significantly reduce overheads: up to three orders of magnitude in the bug-masking analysis and the application-aware routing, approximately 50 times in the routing reconfiguration latency, and 5 times on average in the memory-ordering graph checking. These overhead reductions come with losses in error coverage: 23% undetected bug-masking incidents, 39% non-patchable memory bugs, and occasionally we overlook rare patterns of multiple faults. In this dissertation, we discuss the ideas and their trade-offs, and present future research directions.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/147637/1/doowon_1.pd

Deep Blue Documents at the University of Michigan

Observation mechanisms for in-field software-based self-test

Author: Pérez Acle Julio
Publication venue: Udelar.FI
Publication date: 01/01/2019
Field of study

When electronic systems are used in safety critical applications, as in the space, avionic, automotive or biomedical areas, it is required to maintain a very low probability of failures due to faults of any kind. Standards and regulations play a significant role, forcing companies to devise and adopt solutions able to achieve predefined targets in terms of dependability. Different techniques can be used to reduce fault occurrence or to minimize the probability that those faults produce critical failures (e.g., by introducing redundancy). Unfortunately, most of these techniques have a severe impact on the cost of the resulting product and, in some cases, the probability of failures is too large anyway. Hence, a solution commonly used in several scenarios lies on periodically performing a test able to detect the occurrence of any fault before it produces a failure (in-field test). This solution is normally based on forcing the processor inside the Device Under Test to execute a properly written test program, which is able to activate possible faults and to make their effects visible in some observable locations. This approach is also called Software-Based Self-Test, or SBST. If compared with testing in an end of manufacturing scenario, in-field testing has strong limitations in terms of access to the system inputs and outputs because Design for Testability structures and testing equipment are usually not available. As a consequence there are reduced possibilities to activate the faults and to observe their effects. This reduced observability particularly affects the ability to detect performance faults, i.e. faults that modify the timing but not the final value of computations. This kind of faults are hard to detect by only observing the final content of predefined memory locations, that is the usual test result observation method used in-field. Initially, the present work was focused on fault tolerance techniques against transient faults induced by ionizing radiation, the so called Single Event Upsets (SEUs). The main contribution of this early stage of the thesis lies in the experimental validation of the feasibility of achieving a safe system by using an architecture that combines task-level redundancy with already available IP cores, thus minimizing the development time. Task execution is replicated and Memory Protection is used to guarantee that any SEU may affect one and only one of the replicas. A proof of concept implementation was developed and validated using fault injection. Results outline the effectiveness of the architecture, and the overhead analysis shows that the proposed architecture is effective in reducing the resource occupation with respect to N-modular redundancy, at an affordable cost in terms of application execution time. The main part of the thesis is focused on in-field software-based self-test of permanent faults. A set of observation methods exploiting existing or ad-hoc hardware is proposed, aimed at obtaining a better coverage, in particular of performance faults. An extensive quantitative evaluation of the proposed methods is presented, including a comparison with the observation methods traditionally used in end of manufacturing and in-field testing. Results show that the proposed methods are a good complement to the traditionally used final memory content observation. Moreover, they show that an adequate combination of these complementary methods allows for achieving nearly the same fault coverage achieved when continuously observing all the processor outputs, which is an observation method commonly used for production test but usually not available in-field. A very interesting by-product of what is described above is a detailed description of how to compute the fault coverage achieved by functional in-field tests using a conventional fault simulator, a tool that is usually applied in an end of manufacturing testing scenario. Finally, another relevant result in the testing area is a method to detect permanent faults inside the cache coherence logic integrated in each cache controller of a multi-core system, based on the concurrent execution of a test program by the different cores in a coordinated manner. By construction, the method achieves full fault coverage of the static faults in the addressed logic.Cuando se utilizan sistemas electrónicos en aplicaciones críticas como en las áreas biomédica, aeroespacial o automotriz, se requiere mantener una muy baja probabilidad de malfuncionamientos debidos a cualquier tipo de fallas. Los estándares y normas juegan un papel importante, forzando a los desarrolladores a diseñar y adoptar soluciones que sean capaces de alcanzar objetivos predefinidos en cuanto a seguridad y confiabilidad. Pueden utilizarse diferentes técnicas para reducir la ocurrencia de fallas o para minimizar la probabilidad de que esas fallas produzcan mal funcionamientos críticos, por ejemplo a través de la incorporación de redundancia. Lamentablemente, muchas de esas técnicas afectan en gran medida el costo de los productos y, en algunos casos, la probabilidad de malfuncionamiento sigue siendo demasiado alta. En consecuencia, una solución usada a menudo en varios escenarios consiste en realizar periódicamente un test que sea capaz de detectar la ocurrencia de una falla antes de que esta produzca un mal funcionamiento (test en campo). En general, esta solución se basa en forzar a un procesador existente dentro del dispositivo bajo prueba a ejecutar un programa de test que sea capaz de activar las posibles fallas y de hacer que sus efectos sean visibles en puntos observables. A esta metodología también se la llama auto-test basado en software, o en inglés Software-Based Self-Test (SBST). Si se lo compara con un escenario de test de fin de fabricación, el test en campo tiene fuertes limitaciones en términos de posibilidad de acceso a las entradas y salidas del sistema, porque usualmente no se dispone de equipamiento de test ni de la infraestructura de Design for Testability. En consecuencia se tiene menos posibilidades de activar las fallas y de observar sus efectos. Esta observabilidad reducida afecta particularmente la habilidad para detectar fallas de performance, es decir fallas que modifican la temporización pero no el resultado final de los cálculos. Este tipo de fallas es difícil de detectar por la sola observación del contenido final de lugares de memoria, que es el método usual que se utiliza para observar los resultados de un test en campo. Inicialmente, el presente trabajo estuvo enfocado en técnicas para tolerar fallas transitorias inducidas por radiación ionizante, llamadas en inglés Single Event Upsets (SEUs). La principal contribución de esa etapa inicial de la tesis reside en la validación experimental de la viabilidad de obtener un sistema seguro, utilizando una arquitectura que combina redundancia a nivel de tareas con el uso de módulos hardware (IP cores) ya disponibles, que minimiza en consecuencia el tiempo de desarrollo. Se replica la ejecución de las tareas y se utiliza protección de memoria para garantizar que un SEU pueda afectar a lo sumo a una sola de las réplicas. Se desarrolló una implementación para prueba de concepto que fue validada mediante inyección de fallas. Los resultados muestran la efectividad de la arquitectura, y el análisis de los recursos utilizados muestra que la arquitectura propuesta es efectiva en reducir la ocupación con respecto a la redundancia modular con N réplicas, a un costo accesible en términos de tiempo de ejecución. La parte principal de esta tesis se enfoca en el área de auto-test en campo basado en software para la detección de fallas permanentes. Se propone un conjunto de métodos de observación utilizando hardware existente o ad-hoc, con el fin de obtener una mejor cobertura, en particular de las fallas de performance. Se presenta una extensa evaluación cuantitativa de los métodos propuestos, que incluye una comparación con los métodos tradicionalmente utilizados en tests de fin de fabricación y en campo. Los resultados muestran que los métodos propuestos son un buen complemento del método tradicionalmente usado que consiste en observar el valor final del contenido de memoria. Además muestran que una adecuada combinación de estos métodos complementarios permite alcanzar casi los mismos valores de cobertura de fallas que se obtienen mediante la observación continua de todas las salidas del procesador, método comúnmente usado en tests de fin de fabricación, pero que usualmente no está disponible en campo. Un subproducto muy interesante de lo arriba expuesto es la descripción detallada del procedimiento para calcular la cobertura de fallas lograda mediante tests funcionales en campo por medio de un simulador de fallas convencional, una herramienta que usualmente se aplica en escenarios de test de fin de fabricación. Finalmente, otro resultado relevante en el área de test es un método para detectar fallas permanentes dentro de la lógica de coherencia de cache que está integrada en el controlador de cache de cada procesador en un sistema multi procesador. El método está basado en la ejecución de un programa de test en forma coordinada por parte de los diferentes procesadores. Por construcción, el método cubre completamente las fallas de la lógica mencionad

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Exploiting canonical dependence chains and address biasing constraints to improve random test generation for shared-memory veridication

Author: Andrade Gabriel Arthur Gerber
Publication venue
Publication date: 01/01/2017
Field of study

Dissertação (mestrado) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação, Florianópolis, 2017.Introdução A verificação funcional do projeto de um sistema com multiprocessamento em chip (CMP) vem se tornando cada vez mais desafiadora por causa da crescente complexidade para suportar a abstração de memória compartilhada coerente, a qual provavelmente manterá seu papel crucial para multiprocessamento em chip, mesmo na escala de centenas de processadores. A verificação funcional baseia-se principalmente na geração de programas de teste aleatórios.Trabalhos Correlatos e Gerador Proposto Embora frameworks de verificação funcional que se baseiam na solução de problemas de satisfação de restrições possuam a vantagem de oferecer uma abordagem unificada para gerar estímulos aleatórios capazes de verificar todo o sistema, eles não são projetados para explorar não-determinismo, que é um importante mecanismo para expôr erros de memória compartilhada. Esta dissertação reporta novas técnicas que se baseiam em lições aprendidas de ambos? os frameworks de verificação de propósitos gerais e as abordagens especializadas em verificar o modelo de memória. Elas exploram restrições sobre endereços e cadeias canônicas de dependência para melhorar a geração de testes aleatórios enquanto mantêm o papel crucial do não-determinismo como um mecanismo-chave para a exposição de erros. Geração de Sequências Ao invés de selecionar instruções aleatoriamente, como faz uma técnica convencional, o gerador proposto seleciona instruções de acordo com cadeias de dependências pré-definidas que são comprovadamente significativas para preservar o modelo de memória sob verificação. Esta dissertação explora cadeias canônicas, definidas por Gharachorloo, para evitar a indução de instruções que, sendo desnecessárias para preservar o modelo de memória sob verificação, resultem na geração de testes ineficazes. Assinalamento de Endereços Em vez de selecionar aleatoriamente padrões binários para servir de endereços efetivos de memória, como faz um gerador convencional, o gerador proposto aceita restrições à formação desses endereços de forma a forçar o alinhamento de objetos em memória, evitar falso compartilhamento entre variáveis e especificar o grau de competição de endereços por uma mesma linha de cache. Avaliação Experimental Um novo gerador, construído com as técnicas propostas, foi comparado com um gerador convencional de testes aleatórios. Ambos foram avaliados em arquiteturas de 8, 16, e 32 núcleos, ao sintetizar 1200 programas de testes distintos para verificar 5 projetos derivados, cada um contendo um diferente tipo de erro (6000 casos de uso por arquitetura). Os testes sintetizados exploraram uma ampla variedade de parâmetros de geração (5 tamanhos de programas, 4 quantidades de posições compartilhadas de memória, 4 mixes de instruções, e 15 sementes aleatórias). Os resultados experimentais mostram que, em comparação com um convencional, o novo gerador tende a expor erros para um maior número de configurações dos parâmetros: ele aumentou em 38% o potencial de expor erros de projeto. Pela análise dos resultados da verificação sobre todo o espectro de parâmetros, descobriu-se que os geradores requerem um número bastante distinto de posições de memória para alcançar sua melhor exposição. Os geradores foram comparados quando cada um explorou a quantidade de posições de memória correspondente à sua melhor exposição. Nestas condições, quando destinados a projetos com 32 núcleos através da exploração de todo o espectro de tamanhos de testes, o novo gerador expôs um tipo de erro tão frequentemente quanto a técnica convencional, dois tipos com 11% mais frequência, um tipo duas vezes, e um tipo 4 vezes mais frequentemente. Com os testes mais longos (64000 operações) ambos os geradores foram capazes de expor todos os tipos de erros, mas o novo gerador precisou de 1,5 a 15 vezes menor esforço para expor cada erro, exceto por um (para o qual uma degradação de 19% foi observada). Conclusões e Perspectivas Com base na avaliação realizada, conclui-se que, quando se escolhe um número suficientemente grande de variáveis compartilhadas como parâmetro, o gerador proposto requer programas de teste mais curtos para expor erros de projeto e, portanto, resulta em menor esforço, quando comparado a um gerador convencional.Abstract : Albeit general functional processor verification frameworks relying on the solution of constraint satisfaction problems have the advantage of offering a unified approach for generating random stimuli to verify the whole system, they are not designed to exploit non-determinism, which is an important mechanism to expose shared-memory errors. This dissertation reports new techniques that build upon the lessons learned from both - the general verification frameworks and the approaches specifically targeting memory-model verification. They exploit address biasing constraints and canonical dependence chains to improve random test generation while keeping the crucial role of non-determinism as a key mechanism to error exposure. A new generator, built with the proposed techniques, was compared to a conventional random test generator. Both were evaluated for 8, 16, and 32-core architectures, when synthesizing 1200 distinct test programs for verifying 5 derivative designs containing each a different type of error (6000 use cases per architecture). The synthesized tests explored a wide variety of generation parameters (5 program sizes, 4 shared-location counts, 4 instruction mixes, and 15 random seeds). The experimental results show that, as compared to a conventional one, the new generator tends to expose errors for a larger number of parameter settings: it increased by 38% the potential for exposing design errors. By analyzing the verification out-comes over the full parameter ranges, we found out that the generators require quite distinct numbers of shared locations to reach best exposure. We compared them when each generator exploited the location count leading to its best exposure. In such conditions, when targeting32-core designs by exploring the whole range of test lengths, the new generator exposed one type of error as often as the conventional technique, two types 11% more often, one type twice as often, and one type4 times as often. With the longest tests (64000 operations) both generators were able to expose all types of errors, but the new generator required from 1.5 to 15 times less effort to expose each error, except for one (for which a degradation of 19% was observed)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositório Institucional da UFSC

Protocol-directed trace signal selection for post-silicon validation

Author: Sharma Abhishek
Publication venue
Publication date: 01/12/2016
Field of study

Due to the increasing complexity of modern digital designs using NoC (network-on-chip) communication, post-silicon validation has become an arduous task that consumes much of the development time of the product. The process of finding the root cause of bugs during post-silicon validation is very difficult because of the lack of observability of all signals on the chip. To increase observability for post-silicon validation, an effective silicon debug technique is to use an on-chip trace buffer to monitor and capture the circuit response of certain selected signals during its post-silicon operation. However, because of area limitations for debug structures on chip and routing concerns, the signals that are selected to be traced are a very small subset of all available signals. Traditionally, these trace signals were chosen manually by system designers who determined what signals may be needed for debug once the design reaches post-silicon. However, because modern digital designs have become very complex with many concurrent processes, this method is no longer reliable. Recent work has concentrated on automating the selection of low-level signals from a gate-level analysis. But none of them has ever been able to interpret the trace signals as high-level meaningful debugging information. In this work, we present an automated protocol-directed trace selection where the guiding force is the set of system-level protocols. We use a probabilistic formulation to select messages for tracing and then further analyze these solutions. This method produces traces that allow a debugger to observe when behavior has deviated from the correct path of execution and localize this incorrect behavior for further analysis. Most importantly, unlike the previous gate-level analysis based methods, this method can be applied during the chip design phase when most of the debug features are also designed. In addition, this method drastically reduces the time needed to select signals, as we automate a currently manual process

Illinois Digital Environment for Access to Learning and Scholarship Repository