1,873 research outputs found

    Formal Verification of a MESI-based Cache Implementation

    Get PDF
    Cache coherency is crucial to multi-core systems with a shared memory programming model. Coherency protocols have been formally verified at the architectural level with relative ease. However, several subtle issues creep into the hardware realization of cache in a multi-processor environment. The assumption, made in the abstract model, that state transitions are atomic, is invalid for the HDL implementation. Each transition is composed of many concurrent multi-core operations. As a result, even with a blocking bus, several transient states come into existence. Most modern processors optimize communication with a split-transaction bus, this results in further transient states and race conditions. Therefore, the design and verification of cache coherency is increasingly complex and challenging. Simulation techniques are insufficient to ensure memory consistency and the absence of deadlock, livelock, and starvation. At best, it is tediously complex and time consuming to reach confidence in functionality with simulation. Formal methods are ideally suited to identify the numerous race conditions and subtle failures. In this study, we perform formal property verification on the RTL of a multi-core level-1 cache design based on snooping MESI protocol. We demonstrate full-proof verification of the coherence module in JasperGold using complexity reduction techniques through parameterization. We verify that the assumptions needed to constrain inputs of the stand-alone cache coherence module are satisfied as valid assertions in the instantiation environment. We compare results obtained from formal property verification against a state-of-the-art UVM environment. We highlight the benefits of a synergistic collaboration between simulation and formal techniques. We present formal analysis as a generic toolkit with numerous usage models in the digital design process

    CROSS-LAYER CUSTOMIZATION FOR LOW POWER AND HIGH PERFORMANCE EMBEDDED MULTI-CORE PROCESSORS

    Get PDF
    Due to physical limitations and design difficulties, computer processor architecture has shifted to multi-core and even many-core based approaches in recent years. Such architectures provide potentials for sustainable performance scaling into future peta-scale/exa-scale computing platforms, at affordable power budget, design complexity, and verification efforts. To date, multi-core processor products have been replacing uni-core processors in almost every market segment, including embedded systems, general-purpose desktops and laptops, and super computers. However, many issues still remain with multi-core processor architectures that need to be addressed before their potentials could be fully realized. People in both academia and industry research community are still seeking proper ways to make efficient and effective use of these processors. The issues involve hardware architecture trade-offs, the system software service, the run-time management, and user application design, which demand more research effort into this field. Due to the architectural specialties with multi-core based computers, a Cross-Layer Customization framework is proposed in this work, which combines application specific information and system platform features, along with necessary operating system service support, to achieve exceptional power and performance efficiency for targeted multi-core platforms. Several topics are covered with specific optimization goals, including snoop cache coherence protocol, inter-core communication for producer-consumer applications, synchronization mechanisms, and off-chip memory bandwidth limitations. Analysis of benchmark program execution with conventional mechanisms is made to reveal the overheads in terms of power and performance. Specific customizations are proposed to eliminate such overheads with support from hardware, system software, compiler, and user applications. Experiments show significant improvement on system performance and power efficiency

    SIMD based multicore processor for image and video processing

    Get PDF
    制度:新 ; 報告番号:甲3602号 ; 学位の種類:博士(工学) ; 授与年月日:2012/3/15 ; 早大学位記番号:新595

    On How to Identify Cache Coherence: Case of the NXP QorIQ T4240

    Get PDF
    Architectures used in safety critical systems have to pass certain certification standards, which require sufficient proof that they will behave as expected. Multi-core processors make this challenging by featuring complex interactions between the tasks they run. A lot of these interactions are made without explicit instructions from the program designers. Furthermore, they can have strong negative impacts on performance (and potentially affect correctness). One important such source of interactions is cache coherence, which speeds up operations in most cases, but can also lead to unexpected variations in execution time if not fully understood. Architecture documentations often lack details on the implementation of cache coherence. We thus propose a strategy to ascertain that the platform does indeed implement the cache coherence protocol its user believes it to. We also apply this strategy to the NXP QorIQ T4240, resulting in the identification of a protocol (MESIF) other than the one this architecture’s documentation led us to believe it was using (MESI)

    Prototyping Methodologies and Design of Communication-centric Heterogeneous Many-core Architectures

    Get PDF

    Asynchronous Validity Resolution in Sequentially Consistent Shared Virtual Memory

    Get PDF
    Shared Virtual Memory (SVM) is an effort to provide a mechanism for a distributed system, such as a cluster, to execute shared memory parallel programs. Unfortunately, SVM has performance problems due to its underlying distributed architecture. Recent developments have increased performance of SVM by reducing communication. Unfortunately this performance gain was only possible by increasing programming complexity and by restricting the types of programs allowed to execute in the system. Validity resolution is the process of resolving the validity of a memory object such as a page. Current SVM systems use synchronous or deferred validity resolution techniques in which user processing is blocked during the validity resolution process. This is the case even when resolving validity of false shared variables. False-sharing occurs when two or more processes access unrelated variables stored within the same shared block of memory and at least one of the processes is writing. False sharing unnecessarily reduces overall performance of SVM systems?because user processing is blocked during validity resolution although no actual data dependencies exist. This thesis presents Asynchronous Validity Resolution (AVR), a new approach to SVM which reduces the performance losses associated with false sharing while maintaining the ease of programming found with regular shared memory parallel programming methodology. Asynchronous validity resolution allows concurrent user process execution and data validity resolution. AVR is evaluated by com-paring performance of an application suite using both an AVR sequentially con-sistent SVM system and a traditional sequentially consistent (SC) SVM system. The results show that AVR can increase performance over traditional sequentially consistent SVM for programs which exhibit false sharing. Although AVR outperforms regular SC by as much as 26%, performance of AVR is dependent on the number of false-sharing vs. true-sharing accesses, the number of pages in the program’s working set, the amount of user computation that completes per page request, and the internodal round-trip message time in the system. Overall, the results show that AVR could be an important member of the arsenal of tools available to parallel programmers
    corecore