Search CORE

95 research outputs found

Rebasing Microarchitectural Research with Industry Traces

Author: Feliu Josue
Jiménez Daniel A,
Perais Arthur
Ros Alberto
Publication venue
Publication date: 01/10/2023
Field of study

Microarchitecture research relies on performance models with various degrees of accuracy and speed. In the past few years, one such model, ChampSim, has started to gain significant traction by coupling ease of use with a reasonable level of detail and simulation speed. At the same time, datacenter class workloads, which are not trivial to set up and benchmark, have become easier to study via the release of hundreds of industry traces following the first Championship Value Prediction (CVP-1) in 2018. A tool was quickly created to port the CVP-1 traces to the ChampSim format, which, as a result, have been used in many recent works. In this paper, we revisit this conversion tool and find that several key aspects of the CVP-1 traces are not preserved by the conversion. We therefore propose an improved converter that addresses most conversion issues as well as patches known limitations of the CVP-1 traces themselves. We evaluate the impact of our changes on two commits of ChampSim, with one used for the first Instruction Championship Prefetching (IPC-1) in 2020. We find that the performance variation stemming from higher accuracy conversion is significant

DIGITUM Universidad de Murcia (España)

POOR MAN’S TRACE CACHE: A VARIABLE DELAY SLOT ARCHITECTURE

Author: Moore Tino C.
Publication venue: Digital Commons @ Michigan Tech
Publication date: 01/01/2022
Field of study

We introduce a novel fetch architecture called Poor Man’s Trace Cache (PMTC). PMTC constructs taken-path instruction traces via instruction replication in static code and inserts them after unconditional direct and select conditional direct control transfer instructions. These traces extend to the end of the cache line. Since available space for trace insertion may vary by the position of the control transfer instruction within the line, we refer to these fetch slots as variable delay slots. This approach ensures traces are fetched along with the control transfer instruction that initiated the trace. Branch, jump and return instruction semantics as well as the fetch unit are modified to utilize traces in delay slots. PMTC yields the following benefits: 1. Average fetch bandwidth increases as the front end can fetch across taken control transfer instructions in a single cycle. 2. The dynamic number of instruction cache lines fetched by the processor is reduced as multiple non contiguous basic blocks along a given path are encountered in one fetch cycle. 3. Replication of a branch instruction along multiple paths provides path separability for branches, which positively impacts branch prediction accuracy. PMTC mechanism requires minimal modifications to the processor’s fetch unit and the trace insertion algorithm can easily be implemented within the assembler without compiler support

Michigan Technological University

The Effect of Instruction Padding on SFI Overhead

Author: Emamdoost Navid
McCamant Stephen
Publication venue: 'Internet Society'
Publication date: 01/01/2018
Field of study

Software-based fault isolation (SFI) is a technique to isolate a potentially faulty or malicious software module from the rest of a system using instruction-level rewriting. SFI implementations on CISC architectures, including Google Native Client, use instruction padding to enforce an address layout invariant and restrict control flow. However this padding decreases code density and imposes runtime overhead. We analyze this overhead, and show that it can be reduced by allowing some execution of overlapping instructions, as long as those overlapping instructions are still safe according to the original per-instruction policy. We implemented this change for both 32-bit and 64-bit x86 versions of Native Client, and analyzed why the performance benefit is higher on 32-bit. The optimization leads to a consistent decrease in the number of instructions executed and savings averaging 8.6% in execution time (over compatible benchmarks from SPECint2006) for x86-32. We describe how to modify the validation algorithm to check the more permissive policy, and extend a machine-checked Coq proof to confirm that the system's security is preserved.Comment: NDSS Workshop on Binary Analysis Research, February 201

arXiv.org e-Print Archive

Crossref

Control-Flow Security.

Author: Arthur William Patrick
Publication venue
Publication date: 01/01/2016
Field of study

Computer security is a topic of paramount importance in computing today. Though enormous effort has been expended to reduce the software attack surface, vulnerabilities remain. In contemporary attacks, subverting the control-flow of an application is often the cornerstone to a successful attempt to compromise a system. This subversion, known as a control-flow attack, remains as an essential building block of many software exploits. This dissertation proposes a multi-pronged approach to securing software control-flow to harden the software attack surface. The primary domain of this dissertation is the elimination of the basic mechanism in software enabling control-flow attacks. I address the prevalence of such attacks by going to the heart of the problem, removing all of the operations that inject runtime data into program control. This novel approach, Control-Data Isolation, provides protection by subtracting the root of the problem; indirect control-flow. Previous works have attempted to address control-flow attacks by layering additional complexity in an effort to shield software from attack. In this work, I take a subtractive approach; subtracting the primary cause of both contemporary and classic control-flow attacks. This novel approach to security advances the state of the art in control-flow security by ensuring the integrity of the programmer-intended control-flow graph of an application at runtime. Further, this dissertation provides methodologies to eliminate the barriers to adoption of control-data isolation while simultaneously moving ahead to reduce future attacks. The secondary domain of this dissertation is technique which leverages the process by which software is engineered, tested, and executed to pinpoint the statements in software which are most likely to be exploited by an attacker, defined as the Dynamic Control Frontier. Rather than reacting to successful attacks by patching software, the approach in this dissertation will move ahead of the attacker and identify the susceptible code regions before they are compromised. In total, this dissertation combines software and hardware design techniques to eliminate contemporary control-flow attacks. Further, it demonstrates the efficacy and viability of a subtractive approach to software security, eliminating the elements underlying security vulnerabilities.PhDComputer Science and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/133304/1/warthur_1.pd

Deep Blue Documents at the University of Michigan

A case for (partially) tagged geometric history length branch prediction

Author: Michaud Pierre
Seznec André
Publication venue: North Carolina State University
Publication date: 01/02/2006
Field of study

International audienceIt is now widely admitted that in order to provide state-of-the-art accuracy, a conditional branch predictor must combine several predictions. Recent research has shown that an adder tree is a very effective approach for the prediction combination function. In this paper, we present a more cost effective solution for this prediction combination function for predictors relying on several predictor components indexed with different history lengths. Using geometric history length as the O-GEHL predictor, the TAGE predictor uses (partially) tagged components as the PPM-like predictor. TAGE relies on (partial) hit-miss detection as the prediction computation function. TAGE provides state-of-the-art prediction accuracy on conditional branches. In particular, at equivalent storage budgets, the TAGE predictor significantly outperforms all the predictors that were presented at the Championship Branch Prediction in december 2004. The accuracy of the prediction of the targets of indirect branches is a major issue on some applications. We show that the principles of the TAGE predictor can be directly applied to the prediction of indirect branches. The ITTAGE predictor (Indirect Target TAgged GEometric history length) significantly outperforms previous state-of-the-art indirect target branch predictors. Both TAGE and ITTAGE predictors feature tagged predictor components indexed with distinct history lengths forming a geometric series. They can be associated in a single cost-effective predictor, sharing tables and predictor logic, the COTTAGE predictor (COnditional and indirect Target TAgged GEometric history length)

INRIA a CCSD electronic archive server

Architectural Verification of Four-instruction Superscalar Processor for MIPS I Instruction Set

Author: Anand Anshuman
Publication venue: 'Oklahoma State University Library'
Publication date: 01/12/2003
Field of study

The study undertaken in this thesis tries to tackle this inefficiency by having extra register locations other than the architectural registers called pseudo-registers, and a pointer scheme is followed to reference both architectural and pseudo registers. This scheme renames each logical destination register of an incoming instruction, to a pseudo register referenced by pointers called pseudo-pointers. Two separate lists of these pointers are maintained, one for all types of instructions and the other for only unspeculated instructions. When a branch instruction preceding the speculated instruction is evaluated and it is established that the prediction was correct, the machine state is altered by updating the pointer lists instead of moving the data. As the pointes are only 6-bits, the inefficiency is considerably reduced. This processor scheme is implemented using the Verilog hardware description language (HDL). The following study provides architectural details of each component used in the processor, stressing issues involved in the implementation and methods used to overcome these issues. This study also discusses verification methodology, documenting steps involved in compiling a 'c' program and loading it onto the simulated instructions cache and data cache for simulation. Finally, simulation results are presented for a sample 'c' program verifying the design

SHAREOK repository