8 research outputs found

Verificação de consistência de memória para sistemas integrados multiprocessados

Author: Rambo Eberle Andrey
Publication venue: Florianópolis, SC
Publication date: 01/01/2011

Dissertation (master's) - Universidade Federal de Santa Catarina, Centro Tecnológico. Programa de Pós-Graduação em Ciência da Computação. Chip multiprocessing (CMP) changed the architectural landscape of servers and personal computers and is now changing the way personal mobile devices are designed. CMP requires access to shared variables in sophisticated multilevel hierarchies where private and shared caches coexist. It relies on hardware support to implicitly manage relaxed program order and write atomicity so as to provide, at the hardware-software interface, a well-defined shared-memory semantics, which is captured by the axioms of a memory consistency model (MCM). This dissertation addresses the problem of checking whether an executable representation of the memory system complies with a specified MCM. Conventional verification techniques encode the axioms as edges of a single directed graph, infer extra edges from memory traces, and report an error when a cycle is detected. In contrast, this dissertation proposes a novel technique that decomposes the verification problem into multiple instances of an extended bipartite graph matching problem. Since the decomposition was judiciously designed to induce independent instances, the target problem can be solved by a parallel verification algorithm. To stimulate the memory system under verification, the dissertation also proposes a generator of random instruction sequences distributed over multiple threads. Because the generator is independent of the MCM under verification, it can be used by most checkers. The proposed technique, which is proven to be complete for several MCMs, outperformed a conventional checker on a suite of 2400 randomly generated use cases. On average, it found a higher percentage of faults (90%) than that checker (69%) and was 272 times faster.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositório Institucional da UFSC

RCAAP - Repositório Científico de Acesso Aberto de Portugal
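
The conventional approach mentioned in the abstract above (ordering axioms encoded as edges of one directed graph, with an error reported when a cycle appears) can be illustrated with a minimal Python sketch. This is not the dissertation's checker. The function name `has_cycle`, the operation numbering, and the choice of a store-buffering test under sequential consistency are all assumptions made for illustration.

```python
from collections import defaultdict

def has_cycle(num_nodes, edges):
    """Return True if the directed graph has a cycle (Kahn's algorithm)."""
    successors = defaultdict(list)
    indegree = [0] * num_nodes
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    # Repeatedly remove nodes that have no remaining predecessors.
    ready = [n for n in range(num_nodes) if indegree[n] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Any node left over lies on, or downstream of, a cycle.
    return removed != num_nodes

# Hypothetical store-buffering trace (operation ids are illustrative):
#   thread 0: op0 = write x, op1 = read y
#   thread 1: op2 = write y, op3 = read x
# Under sequential consistency, program order is kept within each thread.
PROGRAM_ORDER = [(0, 1), (2, 3)]

# Both reads observed the initial value, so each read must precede the
# write to the same address issued by the other thread.
BOTH_READ_INITIAL = [(1, 2), (3, 0)]

# op1 observed the value written by op2; op3 still observed the initial value.
ONE_READ_NEW = [(2, 1), (3, 0)]

if __name__ == "__main__":
    print(has_cycle(4, PROGRAM_ORDER + BOTH_READ_INITIAL))
    print(has_cycle(4, PROGRAM_ORDER + ONE_READ_NEW))
```

A cycle means no single total order of the four operations satisfies every edge, so the first trace would be flagged as a violation of the assumed model, while the second would pass.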

Improving the Simulation Environment for Computer Architecture

Author: Naranjo Carmona Alberto Javier
Publication date: 28/04/2015

This work presents efforts to improve the simulation environment for computer architecture research through two major contributions: the addition of a three-level cache hierarchy and the implementation of a statistical sampling simulation framework. Full-system and micro-architectural simulation are the primary and most reliable research tools available to the computer architecture community. However, keeping the simulator up to date with the latest industry products is a challenging task, causing a growing time gap between the release of new commercial products and the implementation of their models in the simulators. Another problem architects have to deal with is the performance gap: the time spent simulating one instruction is several orders of magnitude greater than the time the real hardware takes to execute the same instruction. This leads to prohibitively long simulation times, which are expected to grow further given the industry's constant focus on efficiency. As processors get more complex, so do the simulators. The performance improvement achieved by real hardware changes is small compared to the overhead induced in the simulator when replicating those same changes. Although a third-level (L3) cache is a common feature in current processors and its performance benefits have been known for decades, it is currently not supported in most full-system simulators. A modern full-system simulator was extended to include a third-level cache, and experiments show that, for the PARSEC benchmarks, the performance of the system with L3 is ≈ 30% better than the baseline. In addition, the implementation of statistical sampling simulation allows a greater improvement in simulation performance, while statistical theory guarantees that the subset of instructions executed is a representative sample of the benchmark behaviour. The experiments show a measured CPI error of less than 2.5% while achieving simulation time speed-ups of around 3X.

Texas A&M Repository
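
To make the sampling idea in the abstract above concrete, here is a minimal Python sketch of how a CPI estimate and its relative error bound might be computed from a handful of simulated sampling units. It is a generic textbook estimate (sample mean with a normal-approximation confidence interval), not the framework implemented in the thesis. The function name `estimate_cpi`, the default `z = 1.96` (roughly 95% confidence), and the sample values are assumptions.

```python
import math
import statistics

def estimate_cpi(unit_cpis, z=1.96):
    """Estimate mean CPI and the relative half-width of its confidence interval.

    unit_cpis: CPI measured in each simulated sampling unit (at least two).
    z: normal quantile for the desired confidence level (assumed 1.96).
    """
    n = len(unit_cpis)
    mean = statistics.fmean(unit_cpis)
    stdev = statistics.stdev(unit_cpis)       # sample standard deviation
    half_width = z * stdev / math.sqrt(n)     # absolute error bound
    return mean, half_width / mean            # relative error bound

if __name__ == "__main__":
    # Hypothetical per-unit CPI measurements, for illustration only.
    sample = [1.0, 1.2, 0.8, 1.0]
    mean_cpi, rel_error = estimate_cpi(sample)
    print(f"estimated CPI = {mean_cpi:.3f}, relative error bound = {rel_error:.1%}")
```

Under these assumptions, simulating more sampling units shrinks the bound roughly with the square root of the sample size, which is the usual trade-off between simulation time and CPI accuracy.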

Scalably Verifiable Cache Coherence

Author: Zhang Meng

The correctness of a cache coherence protocol is crucial to the system, since a subtle bug in the protocol may lead to disastrous consequences. However, verifying a cache coherence protocol is never an easy task because of the protocol's complexity. Moreover, as more and more cores are packed onto a single chip, there is pressure for the cache coherence protocol to deliver higher performance, lower power consumption, and less storage overhead. People perform various optimizations to meet these goals, which, unfortunately, further exacerbate the verification problem. Currently, there are no efficient and universal methods for verifying a realistic cache coherence protocol for a many-core system. We, as architects, believe that we can alleviate the verification problem by changing the traditional design paradigm. We suggest taking verifiability as a first-class design constraint, just as we do with other traditional metrics, such as performance, power consumption, and area overhead. To do this, we need to incorporate verification effort in the early design stage of a cache coherence protocol and make wise design decisions regarding verifiability. Such a protocol will be amenable to verification and easier to verify at a later stage. Specifically, we propose two methods in this thesis for designing scalably verifiable cache coherence protocols. The first method is Fractal Coherence, targeting verifiable hierarchical protocols. Fractal Coherence leverages the idea of fractals to design a cache coherence protocol. The self-similarity of the fractal enables inductive verification of the protocol. Such a verification process is independent of the number of nodes and thus is scalable. We also design example protocols to show that Fractal Coherence protocols can attain performance comparable to that of a traditional snooping or directory protocol.
As a system scales hierarchically, Fractal Coherence can solve the verification problem of the implemented cache coherence protocol. However, Fractal Coherence cannot help if the system scales horizontally. Therefore, we propose the second method, PVCoherence, targeting verifiable flat protocols. PVCoherence is based on parametric verification, a widely used method for verifying the coherence of a flat protocol with an arbitrary number of nodes. PVCoherence captures the fundamental requirements and limitations of parametric verification and proposes a set of guidelines for designing cache coherence protocols that are compatible with parametric verification. As long as designers follow these guidelines, their protocols can be easily verified. We further show that Fractal Coherence and PVCoherence can also facilitate the verification of memory consistency, another extremely challenging problem. One piece of previous work proves that the verification of memory consistency can be decomposed into three steps. The most complex and least scalable step is the verification of the cache coherence protocol. If we design the protocol following the design methodology of Fractal Coherence or PVCoherence, we can easily verify the cache coherence protocol and overcome the biggest obstacle in the verification of memory consistency. As systems expand and cache coherence protocols get more complex, the verification problem of the protocol becomes more prominent. We believe it is time to reconsider the traditional design flow, in which verification is totally separated from the design stage. We show that by incorporating verifiability in the early design stage and designing protocols to be scalably verifiable in the first place, we can greatly reduce the burden of verification. Meanwhile, we perform various experiments and show that we do not lose performance or other benefits when we obtain the correctness guarantee.

Dissertation

DukeSpace

Reining in the Functional Verification of Complex Processor Designs with Automation, Prioritization, and Approximation

Author: Mammo Biruk Wendimagegn
Publication date: 01/01/2017

Our quest for faster and more efficient computing devices has led us to processor designs with enormous complexity. As a result, functional verification, which is the process of ascertaining the correctness of a processor design, takes up a lion's share of the time and cost spent on making processors. Unfortunately, functional verification is only a best-effort process that cannot completely guarantee the correctness of a design, often resulting in defective products that may have devastating consequences. Functional verification, as practiced today, is unable to cope with the complexity of current and future processor designs. In this dissertation, we identify extensive automation as the essential step towards scalable functional verification of complex processor designs. Moreover, recognizing that a complete guarantee of design correctness is impossible, we argue for systematic prioritization and prudent approximation to realize fast and far-reaching functional verification solutions. We partition the functional verification effort into three major activities: planning and test generation, test execution and bug detection, and bug diagnosis. Employing a perspective we refer to as the automation, prioritization, and approximation (APA) approach, we develop solutions that tackle challenges across these three major activities. In pursuit of efficient planning and test generation for modern systems-on-chips, we develop an automated process for identifying high-priority design aspects for verification. In addition, we enable the creation of compact test programs, which, in our experiments, were up to 11 times smaller than what would otherwise be available at the beginning of the verification effort. To tackle challenges in test execution and bug detection, we develop a group of solutions that enable the deployment of automatic and robust mechanisms for catching design flaws during high-speed functional verification.
By trading accuracy for speed, these solutions allow us to unleash functional verification platforms that are over three orders of magnitude faster than traditional platforms, unearthing design flaws that are otherwise impossible to reach. Finally, we address challenges in bug diagnosis through a solution that fully automates the process of pinpointing flawed design components after detecting an error. Our solution, which identifies flawed design units with over 70% accuracy, eliminates weeks of diagnosis effort for every detected error.

PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/137057/1/birukw_1.pd

Deep Blue Documents at the University of Michigan

Decompose and Conquer: Addressing Evasive Errors in Systems on Chip

Author: Lee Doowon

Modern computer chips comprise many components, including microprocessor cores, memory modules, on-chip networks, and accelerators. Such system-on-chip (SoC) designs are deployed in a variety of computing devices: from internet-of-things, to smartphones, to personal computers, to data centers. In this dissertation, we discuss evasive errors in SoC designs and how these errors can be addressed efficiently. In particular, we focus on two types of errors: design bugs and permanent faults. Design bugs originate from the limited amount of time allowed for design verification and validation. Thus, they are often found in functional features that are rarely activated. Complete functional verification, which can eliminate design bugs, is extremely time-consuming, thus impractical in modern complex SoC designs. Permanent faults are caused by failures of fragile transistors in nano-scale semiconductor manufacturing processes. Indeed, weak transistors may wear out unexpectedly within the lifespan of the design. Hardware structures that reduce the occurrence of permanent faults incur significant silicon area or performance overheads, thus they are infeasible for most cost-sensitive SoC designs. To tackle and overcome these evasive errors efficiently, we propose to leverage the principle of decomposition to lower the complexity of the software analysis or the hardware structures involved. To this end, we present several decomposition techniques, specific to major SoC components. We first focus on microprocessor cores, by presenting a lightweight bug-masking analysis that decomposes a program into individual instructions to identify if a design bug would be masked by the program's execution. We then move to memory subsystems: there, we offer an efficient memory consistency testing framework to detect buggy memory-ordering behaviors, which decomposes the memory-ordering graph into small components based on incremental differences. 
We also propose a microarchitectural patching solution for memory subsystem bugs, which augments each core node with a small distributed programmable logic block instead of including a global patching module. In the context of on-chip networks, we propose two routing reconfiguration algorithms that bypass faulty network resources. The first computes short-term routes in a distributed fashion, localized to the fault region. The second decomposes application-aware routing computation into simple routing rules so as to quickly find deadlock-free, application-optimized routes in a fault-ridden network. Finally, we consider general accelerator modules in SoC designs. When a system includes many accelerators, there are a variety of interactions among them that must be verified to catch buggy interactions. To this end, we decompose such inter-module communication into basic interaction elements, which can be reassembled into new, interesting tests. Overall, we show that the decomposition of complex software algorithms and hardware structures can significantly reduce overheads: up to three orders of magnitude in the bug-masking analysis and the application-aware routing, approximately 50 times in the routing reconfiguration latency, and 5 times on average in the memory-ordering graph checking. These overhead reductions come with losses in error coverage: 23% undetected bug-masking incidents, 39% non-patchable memory bugs, and occasionally overlooked rare patterns of multiple faults. In this dissertation, we discuss the ideas and their trade-offs, and present future research directions.

PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/147637/1/doowon_1.pd

Deep Blue Documents at the University of Michigan

An Effective Verification Solution for Modern Microprocessors.

Author: Wagner Ilya

Over the past four decades, microprocessors have come to be a vital and inseparable part of the modern world, becoming the digital brain of numerous electronic devices and gadgets that make today's lifestyle possible. Processors are capable of performing computation at astonishingly high speeds and are extremely integrated, occupying only a few square centimeters of silicon die. However, this computational power comes at a price: the task of verifying a modern microprocessor and guaranteeing correctness of its operation is increasingly challenging, even for the most established processor vendors. Always attempting to deliver higher performance to end users, processor manufacturers are forced to design progressively more complex circuits and employ immense verification teams to eliminate critical design bugs in a timely manner. Unfortunately, too often size doesn't seem to matter in verification, as schedules continue to slip and microprocessors find their way to the marketplace with design errors. This work describes a novel verification framework specifically targeting today's complex microprocessors. The scope of the work spans many levels of verification and different phases of the processor life-cycle, from validation of individual sub-modules to complete multi-core systems, and from pre-silicon design verification to in-the-field hardware patching. In particular, our StressTest and MCjammer approaches enable efficient generation of high-quality tests at the pre-silicon level for individual cores and multi-core systems, respectively, using machine learning techniques and making the process as automatic as possible. In turn, Reversi and Dacota enable low-cost validation in post-silicon, while delivering even higher coverage than pre-silicon techniques. Finally, the Field-repairable control logic (FRCL) and Caspar techniques allow designers to patch different classes of escaped errors in processors that are deployed in the field.
The integrated set of solutions that we introduce with this thesis empowers processor vendors to drastically shorten their development timeline and, at the same time, to deliver more reliable and correct systems to their customers at a lower cost. Altogether, this work has the potential to solve the long-standing challenge of guaranteeing the complete functional correctness of modern microprocessors.

Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/61656/1/ivagner_1.pd

Deep Blue Documents at the University of Michigan