Search CORE

1,240 research outputs found

An interface aware guided search method for error-trace justification in large protocols

Author: Chen Xiaofang
Gopalakrishnan Ganesh
Publication venue: University of Utah
Publication date: 01/01/2008
Field of study

technical reportMany complex concurrent protocols that cannot be formally verified due to state explosion can often be formally verified by initially creating a collection of abstractions (overapproximations), and subsequently refining the overapproximated protocol in response to spurious counterexample traces. Such an approach crucially depends on the ability to check whether a given error trace in the abstract protocol corresponds to a concrete trace in the original protocol. Unfortunately, this checking step alone can be as as hard verifying the original protocol directly without abstractions, which is infeasible. Our approach tracks the interface behavior at the interfaces erected by our abstractions, and employs a few heuristic search methods based on a classification of the abstract system generating these traces. This collection of heuristic search methods form a tailor-made guided search strategy that works very efficiently in practice on three realistic multicore hierarchical cache coherence protocols. It could correctly analyze ?? ?? spurious error traces and genuine error scenarios, each within seconds. Also, on ?? of the ?? ?? of the spurious errors, our approach can precisely report which transition in the abstract protocol is overly approximated that leads to the spurious error

The University of Utah: J. Willard Marriott Digital Library

A general compositional approach to verifying hierarchical cache coherence protocols

Author: Chen Xiaofang
Gopalakrishnan Ganesh
Publication venue: University of Utah
Publication date: 01/01/2006
Field of study

technical reportModern chip multiprocessor (CMP) cache coherence protocols are extremely complex and error prone to design. Modern symbolic methods are unable to provide much leverage for this class of examples. In [1], we presented a method to verify hierarchical and inclusive versions of these protocols using explicit state enumeration tools. We circumvented state explosion by employing a meta-circular assume/guarantee technique in which a designer can model check abstracted versions of the original protocol and claim that the real protocol is correct. The abstractions were justified in the same framework (hence the meta-circular approach). In this paper, we present how our work can be extended to hierarchical non-inclusive protocols which are inherently much harder to verify, both from the point of having more corner cases, and having insufficient information in higher levels of the protocol hierarchy to imply the sharing states of cache lines at lower levels. Two methods are proposed. The first requires more manual effort, but allows our technique in [1] to be applied unchanged, barring a guard strengthening expression that is computed based on state residing outside the cluster being abstracted. The second requires less manual effort, can scale to deeper hierarchies of protocol implementations, and uses history variables which are computed much more modularly. This method also relies on the meta-circular definition framework. A non-inclusive protocol that could not be completely model checked even after visiting 1.5 billion states was verified using two model checks of roughly 0.25 billion states each

CiteSeerX

The University of Utah: J. Willard Marriott Digital Library

Formal Verification of a MESI-based Cache Implementation

Author: Kottapalli Venkateshwar
Publication venue
Publication date: 05/02/2018
Field of study

Cache coherency is crucial to multi-core systems with a shared memory programming model. Coherency protocols have been formally verified at the architectural level with relative ease. However, several subtle issues creep into the hardware realization of cache in a multi-processor environment. The assumption, made in the abstract model, that state transitions are atomic, is invalid for the HDL implementation. Each transition is composed of many concurrent multi-core operations. As a result, even with a blocking bus, several transient states come into existence. Most modern processors optimize communication with a split-transaction bus, this results in further transient states and race conditions. Therefore, the design and verification of cache coherency is increasingly complex and challenging. Simulation techniques are insufficient to ensure memory consistency and the absence of deadlock, livelock, and starvation. At best, it is tediously complex and time consuming to reach confidence in functionality with simulation. Formal methods are ideally suited to identify the numerous race conditions and subtle failures. In this study, we perform formal property verification on the RTL of a multi-core level-1 cache design based on snooping MESI protocol. We demonstrate full-proof verification of the coherence module in JasperGold using complexity reduction techniques through parameterization. We verify that the assumptions needed to constrain inputs of the stand-alone cache coherence module are satisfied as valid assertions in the instantiation environment. We compare results obtained from formal property verification against a state-of-the-art UVM environment. We highlight the benefits of a synergistic collaboration between simulation and formal techniques. We present formal analysis as a generic toolkit with numerous usage models in the digital design process

Texas A&M Repository

Scalably Verifiable Cache Coherence

Author: Zhang Meng
Publication venue
Publication date
Field of study

The correctness of a cache coherence protocol is crucial to the system since a subtle bug in the protocol may lead to disastrous consequences. However, the verification of a cache coherence protocol is never an easy task due to the complexity of the protocol. Moreover, as more and more cores are compressed into a single chip, there is an urge for the cache coherence protocol to have higher performance, lower power consumption, and less storage overhead. People perform various optimizations to meet these goals, which unfortunately, further exacerbate the verification problem. The current situation is that there are no efficient and universal methods for verifying a realistic cache coherence protocol for a many-core system. We, as architects, believe that we can alleviate the verification problem by changing the traditional design paradigm. We suggest taking verifiability as a first-class design constraint, just as we do with other traditional metrics, such as performance, power consumption, and area overhead. To do this, we need to incorporate verification effort in the early design stage of a cache coherence protocol and make wise design decisions regarding the verifiability. Such a protocol will be amenable to verification and easier to be verified in a later stage. Specifically, we propose two methods in this thesis for designing scalably verifiable cache coherence protocols. The first method is Fractal Coherence, targeting verifiable hierarchical protocols. Fractal Coherence leverages the fractal idea to design a cache coherence protocol. The self-similarity of the fractal enables the inductive verification of the protocol. Such a verification process is independent of the number of nodes and thus is scalable. We also design example protocols to show that Fractal Coherence protocols can attain comparable performance compared to a traditional snooping or directory protocol. As a system scales hierarchically, Fractal Coherence can perfectly solve the verification problem of the implemented cache coherence protocol. However, Fractal Coherence cannot help if the system scales horizontally. Therefore, we propose the second method, PVCoherence, targeting verifiable flat protocols. PVCoherence is based on parametric verification, a widely used method for verifying the coherence of a flat protocol with infinite number of nodes. PVCoherence captures the fundamental requirements and limitations of parametric verification and proposes a set of guidelines for designing cache coherence protocols that are compatible with parametric verification. As long as designers follow these guidelines, their protocols can be easily verified. We further show that Fractal Coherence and PVCoherence can also facilitate the verification of memory consistency, another extremely challenging problem. One piece of previous work proves that the verification of memory consistency can be decomposed into three steps. The most complex and non-scalable step is the verification of the cache coherence protocol. If we design the protocol following the design methodology of Fractal Coherence or PVCoherence, we can easily verify the cache coherence protocol and overcome the biggest obstacle in the verification of memory consistency. As system expands and cache coherence protocols get more complex, the verification problem of the protocol becomes more prominent. We believe it is time to reconsider the traditional design flow in which verification is totally separated from the design stage. We show that by incorporating the verifiability in the early design stage and designing protocols to be scalably verifiable in the first place, we can greatly reduce the burden of verification. Meanwhile, we perform various experiments and show that we do not lose benefits in performance as well as in other metrics when we obtain the correctness guarantee.Dissertatio

DukeSpace

Smart hardware designs for probabilistically-analyzable processor architectures

Author: Benedicte Illescas Pedro
Publication venue: Universitat Politècnica de Catalunya
Publication date: 07/04/2022
Field of study

Future Critical Real-Time Embedded Systems (CRTES), like those is planes, cars or trains, require more and more guaranteed performance in order to satisfy the increasing performance demands of advanced complex software features. While increased performance can be achieved by deploying processor techniques currently used in High-Performance Computing (HPC) and mainstream domains, their use challenges the software timing analysis, a necessary step in CRTES' verification and validation. Cache memories are known to have high impact in performance, and in fact, current CRTES include multicores usually with several levels of cache. In this line, this Thesis aims at increasing the guaranteed performance of CRTES by using techniques for caches building upon time randomization and providing probabilistic guarantees of tasks' execution time. In this Thesis, we first focus on on improving cache placement and replacement to improve guaranteed performance. For placement, different existing policies are explored in a multi-level cache setup, and a solution is reached in which different of those policies are combined. For cache replacement, we analyze a pathological scenario that no cache policy so far accounts and propose several policies that fix this pathological scenario. For shared caches in multicore we observe that contention is mainly caused by private writes that go through to the shared cache, yet using a pure write-back policy also has its drawbacks. We propose a hybrid approach to mitigate this contention. Building on this solution, the next contribution tackles a problem caused by the need of some reliability mechanisms in CRTES. Implementing reliability close to the processor's core has a significant impact in performance. A look-ahead error detection solution is proposed to greatly mitigate the performance impact. The next contribution proposes the first hardware prefetcher for CRTES with arbitrary cache hierarchies. Given its speculative nature, prefetchers that have a guaranteed positive impact on performance are difficult to design. We present a framework that provides execution time guarantees and obtains a performance benefit. Finally, we focus on the impact of timing anomalies in CRTES with caches. For the first time, a definition and taxonomy of timing anomalies is given for Measurement-Based Timing Analysis. Then, we focus on a specific timing anomaly that can happen with caches and provide a solution to account for it in the execution time estimates.Los Sistemas Empotrados de Tiempo-Real Crítico (SETRC), como los de los aviones, coches o trenes, requieren más y más rendimiento garantizado para satisfacer la demanda al alza de rendimiento para funciones complejas y avanzadas de software. Aunque el incremento en rendimiento puede ser adquirido utilizando técnicas de arquitectura de procesadores actualmente utilizadas en la Computación de Altas Prestaciones (CAP) i en los dominios convencionales, este uso presenta retos para el análisis del tiempo de software, un paso necesario en la verificación y validación de SETRC. Las memorias caches son conocidas por su gran impacto en rendimiento y, de hecho, los actuales SETRC incluyen multicores normalmente con diversos niveles de cache. En esta línea, esta Tesis tiene como objetivo mejorar el rendimiento garantizado de los SETRC utilizando técnicas para caches y utilizando métodos como la randomización del tiempo y proveyendo garantías probabilísticas de tiempo de ejecución de las tareas. En esta Tesis, primero nos centramos en mejorar la colocación y el reemplazo de caches para mejorar el rendimiento garantizado. Para la colocación, diferentes políticas son exploradas en un sistema cache multi-nivel, y se llega a una solución donde diversas de estas políticas son combinadas. Para el reemplazo, analizamos un escenario patológico que ninguna política actual tiene en cuenta, y proponemos varias políticas que solucionan este escenario patológico. Para caches compartidas en multicores, observamos que la contención es causada principalmente por escrituras privadas que van a través de la cache compartida, pero usar una política de escritura retardada pura también tiene sus consecuencias. Proponemos un enfoque híbrido para mitigar la contención. Sobre esta solución, la siguiente contribución ataca un problema causado por la necesidad de mecanismos de fiabilidad en SETRC. Implementar fiabilidad cerca del núcleo del procesador tiene un impacto significativo en rendimiento. Una solución basada en anticipación se propone para mitigar el impacto en rendimiento. La siguiente contribución propone el primer prefetcher hardware para SETRC con una jerarquía de caches arbitraria. Por primera vez, se da una definición y taxonomía de anomalías temporales para Análisis Temporal Basado en Medidas. Después, nos centramos en una anomalía temporal concreta que puede pasar con caches y ofrecemos una solución que la tiene en cuenta en las estimaciones del tiempo de ejecución.Postprint (published version

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Automatic generation of highly concurrent, hierarchical and heterogeneous cache coherence protocols from atomic specifications

Author: Oswald Nicolai Alexander
Publication venue: The University of Edinburgh
Publication date: 24/05/2023
Field of study

Cache coherence protocols are often specified using only stable states and atomic transactions for a single cache hierarchy level. Designing highly-concurrent, hierarchical and heterogeneous directory cache coherence protocols from these atomic specifications for modern multicore architectures is a complicated task. To overcome these design challenges we have developed the novel *Gen algorithms (ProtoGen, HieraGen and HeteroGen). Using the *Gen algorithms highly-concurrent, hierarchical and heterogeneous cache coherence protocols can be automatically generated for a wide range of atomic input stable state protocol (SSP) speci fications - including the MOESI variants, as well as for protocols that are targeted towards Total Store Order and Release Consistency. In addition, for each *Gen algorithm we have developed and published an eponymous tool. The ProtoGen tool takes as input a single SSP (i.e., no concurrency) generating the corresponding protocol for a multicore architecture with non-atomic transactions. The ProtoGen algorithm automatically enforces the correct interleaving of conflicting coherence transactions for a given atomic coherence protocol specification. HieraGen is a tool for automatically generating hierarchical cache coherence protocols. Its inputs are SSPs for each level of the hierarchy and its output is a highly concurrent hierarchical protocol. HieraGen thus reduces the complexity that architects face by offloading the challenging task of composing protocols and managing concurrency. HeteroGen is a tool for automatically generating heterogeneous protocols that adhere to precise consistency models. As input, HeteroGen takes SSPs of the per-cluster coherence protocols, each of which satisfies its own per-cluster consistency model. The output is a concurrent (i.e., with transient states) heterogeneous protocol that satisfies a precisely defined consistency model that we refer to as a compound consistency model. To validate the correctness of the *Gen algorithms, the generated output protocols were verified for safety and deadlock freedom using a model checker. To verify the correctness of protocols that need to adhere to a specific compound consistency model generated by HeteroGen, novel litmus tests for multiple compound consistency models were developed. The protocols automatically generated using the *Gen tools have a comparable or better performance than manually generated cache coherence protocols, often discovering opportunities to reduce stalls. Thus, the *Gen tools reduce the complexity that architects face by offloading the challenging tasks of composing protocols and managing concurrency

Edinburgh Research Archive

SIMD based multicore processor for image and video processing

Author: He Xun
Publication venue
Publication date: 01/01/2012
Field of study

制度:新 ; 報告番号:甲3602号 ; 学位の種類:博士(工学) ; 授与年月日:2012/3/15 ; 早大学位記番号:新595

Waseda University Repository