1,026 research outputs found

    Cost-effective compiler directed memory prefetching and bypassing

    Ever-increasing memory latencies and deeper pipelines push memory farther from the processor. Prefetching techniques aim to bridge these two gaps by fetching data in advance into both the L1 cache and the register file. Our main contribution in this paper is a hybrid approach to the prefetching problem that combines software and hardware prefetching in a cost-effective way: it needs very little hardware support and has minimal impact on the design of the processor pipeline. The prefetcher is built on top of static memory instruction bypassing, which is in charge of bringing prefetched values into the register file. We also present a thorough analysis of the limits of both prefetching and memory instruction bypassing. Finally, we compare our prefetching technique with a prior speculative proposal that attacked the same problem, and we show that, at much lower cost, our hybrid solution outperforms a realistic implementation of speculative prefetching and bypassing. On a subset of numerical applications, our hybrid implementation achieves an average speed-up of 13% over a version with software prefetching and an average of 43% over a version without software prefetching (up to 102% on specific benchmarks).
    Peer Reviewed. Postprint (published version).
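    The abstract does not say how far ahead the compiler schedules prefetches. A common rule of thumb from the software-prefetching literature is to prefetch d = ⌈L / s⌉ iterations ahead, where L is the memory latency in cycles and s is the cycle count of one loop iteration. The Python sketch below illustrates only that rule; `prefetch_distance`, `loop_with_prefetch`, and the `prefetch` callback are hypothetical stand-ins, not the paper's hybrid prefetching and bypassing mechanism.

```python
import math
from typing import Callable, Sequence


def prefetch_distance(memory_latency_cycles: int, cycles_per_iteration: int) -> int:
    """Iterations ahead to prefetch so data arrives before it is used (rule of thumb)."""
    if memory_latency_cycles < 0 or cycles_per_iteration <= 0:
        raise ValueError("latency must be >= 0 and iteration time must be > 0")
    return math.ceil(memory_latency_cycles / cycles_per_iteration)


def loop_with_prefetch(a: Sequence[int], prefetch: Callable[[int], None], distance: int) -> int:
    """Sum a[i] while issuing a (hypothetical) prefetch for element i + distance."""
    total = 0
    for i in range(len(a)):
        if i + distance < len(a):
            # Stands in for a non-binding prefetch instruction; never prefetch past the end.
            prefetch(i + distance)
        total += a[i]
    return total
```

    For example, with a 200-cycle latency and a 30-cycle loop body, the rule suggests prefetching 7 iterations ahead.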

    Low-complexity distributed issue queue

    As technology evolves, power density increases significantly and cooling systems become more complex and expensive. The issue logic is one of the processor hotspots and, at the same time, its latency is crucial for processor performance. We present a low-complexity FP issue logic (MB_distr) that achieves high performance with small energy requirements. The MB_distr scheme classifies instructions and dispatches them into a set of queues according to their data dependences. Instructions are then selected for issue based on an estimate of when their operands will be available, so the conventional wakeup activity is not required. Additionally, the functional units are distributed across the different queues. The energy required by the proposed scheme is substantially lower than that of a conventional issue design, even one that can wake up only unready operands. With respect to a state-of-the-art approach, MB_distr reduces the energy-delay product by 35% and the energy-delay² product by 18%.
    Peer Reviewed. Postprint (published version).
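    One way to see why timing estimates remove the need for wakeup: if each queue issues in order and operand latencies are known at dispatch, an instruction's issue cycle can be computed at the moment it is dispatched. The sketch below assumes one dispatch per cycle, fixed latencies, no functional-unit contention, and a simple steering rule (follow the producer at a queue's tail, otherwise pick the queue that frees up earliest). `Instr`, `dispatch_and_schedule`, and these policies are illustrative assumptions, not the MB_distr design itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Instr:
    name: str
    dest: Optional[str]
    srcs: Tuple[str, ...]
    latency: int


def dispatch_and_schedule(instrs: List[Instr], num_queues: int = 2) -> Dict[str, int]:
    """Return each instruction's issue cycle, computed entirely at dispatch time."""
    reg_ready: Dict[str, int] = {}                # estimated cycle each register value is available
    queue_last_issue = [-1] * num_queues          # issue cycle of the youngest entry in each queue
    queue_tail_dest: List[Optional[str]] = [None] * num_queues
    issue: Dict[str, int] = {}
    for dispatch_cycle, ins in enumerate(instrs):
        operands_ready = max((reg_ready.get(r, 0) for r in ins.srcs), default=0)
        # Steer to a queue whose tail produces one of our operands (keeps chains together).
        q = next((i for i, d in enumerate(queue_tail_dest)
                  if d is not None and d in ins.srcs), None)
        if q is None:
            q = min(range(num_queues), key=lambda i: queue_last_issue[i])
        # In-order queue: wait for operands, for dispatch, and for the previous entry.
        t = max(operands_ready, dispatch_cycle + 1, queue_last_issue[q] + 1)
        issue[ins.name] = t
        queue_last_issue[q] = t
        queue_tail_dest[q] = ins.dest
        if ins.dest is not None:
            reg_ready[ins.dest] = t + ins.latency
    return issue
```

    With `a` writing `f1`, `b` reading `f1`, and an independent `c` (all with latency 4), `c` issues in cycle 3, ahead of `b` in cycle 5, because it sits in a different queue. A real design must also cope with latency mispredictions such as cache misses, which this sketch ignores.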

    Instruction history management for high-performance microprocessors

    History-driven dynamic optimization is an important factor in improving instruction throughput in future high-performance microprocessors. History-based techniques can improve instruction-level parallelism by breaking program dependencies, eliminating long-latency microarchitectural operations, and improving prioritization within the microarchitecture. However, a combination of factors, such as wider issue widths, smaller transistors, larger die area, and increasing clock frequency, has led to microprocessors that are sensitive to both wire delays and energy consumption. In this environment, the global structures and long-distance communication that characterize current history data management limit instruction throughput. This dissertation proposes the ScatterFlow Framework for Instruction History Management. Execution history management tasks, such as history data storage, access, distribution, collection, and modification, are partitioned and dispersed throughout the instruction execution pipeline. History data packets are associated with active instructions and flow with them as they execute, encountering the history management tasks along the way. Between dynamic instances of the instructions, the history data packets reside in trace-based history storage that is synchronized with the instruction trace cache. Compared to traditional history data management, the ScatterFlow method improves instruction coverage, increases history data access bandwidth, shortens communication distances, improves history data accuracy in many cases, and decreases the effective history data access time. A comparison of general history management effectiveness between the ScatterFlow Framework and traditional hardware tables shows that the ScatterFlow Framework provides superior history maturity and instruction coverage.
    The unique properties that arise from trace-based history storage and partitioned history management are analyzed, and novel design enhancements are presented to increase the usefulness of instruction history data within the ScatterFlow Framework. To demonstrate the potential of the proposed framework, specific dynamic optimization techniques are implemented using it. These illustrative examples combine the history capture advantages with the access latency improvements while exhibiting desirable dynamic energy consumption properties. Compared to a traditional table-based predictor, ScatterFlow value prediction improves execution time and reduces dynamic energy consumption. In other detailed examples, ScatterFlow-enabled cluster assignment improves execution time over previous cluster assignment schemes, and ScatterFlow instruction-level profiling detects more useful execution traits than traditional fixed-size and infinite-size hardware tables.
    Electrical and Computer Engineering
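    As a rough illustration of trace-based history storage, the sketch below keeps one history packet per instruction slot of each cached trace, hands the packets out together with the trace, and discards them when the trace is evicted. The packet contents (a last value plus a stride, with a small saturating confidence counter) and every name here (`HistoryPacket`, `TraceHistoryStore`, `predict`, `update`) are assumptions made for illustration; the dissertation's actual packet format and value predictor may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class HistoryPacket:
    last_value: Optional[int] = None
    stride: int = 0
    confidence: int = 0          # saturating counter, 0..3


class TraceHistoryStore:
    """Hypothetical history storage kept in step with a trace cache."""

    def __init__(self) -> None:
        self.packets: Dict[int, List[HistoryPacket]] = {}

    def fetch_trace(self, trace_id: int, length: int) -> List[HistoryPacket]:
        # Packets are read out alongside the trace and travel with its instructions.
        return self.packets.setdefault(trace_id, [HistoryPacket() for _ in range(length)])

    def evict_trace(self, trace_id: int) -> None:
        # Synchronized with the trace cache: evicting a trace discards its history.
        self.packets.pop(trace_id, None)


def predict(p: HistoryPacket, threshold: int = 2) -> Optional[int]:
    """Stride value prediction; returns None when confidence is too low."""
    if p.last_value is None or p.confidence < threshold:
        return None
    return p.last_value + p.stride


def update(p: HistoryPacket, actual: int) -> None:
    """Train the packet with the committed result as it flows back with the instruction."""
    if p.last_value is not None:
        new_stride = actual - p.last_value
        if new_stride == p.stride:
            p.confidence = min(p.confidence + 1, 3)
        else:
            p.stride, p.confidence = new_stride, 0
    p.last_value = actual
```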

    Future value based single assignment program representations and optimizations

    The internal representation of an optimizing compiler fundamentally affects the clarity, efficiency, and feasibility of the optimization algorithms it employs. Static Single Assignment (SSA), a state-of-the-art program representation, has great advantages but can still be improved. This dissertation explores the domain of single assignment beyond SSA and presents two novel program representations: Future Gated Single Assignment (FGSA) and Recursive Future Predicated Form (RFPF). Both FGSA and RFPF embed control flow and data flow information, enabling efficient traversal of program information and thus leading to better and simpler optimizations. We introduce the future value concept, the design basis of both FGSA and RFPF, which permits a consumer instruction to be encountered before the producer of its source operand(s) in a control flow setting. We show that FGSA is efficiently computable using a series of T1/T2/TR transformations, yielding an expected linear-time algorithm that combines the construction of the pruned single assignment form with liveness analysis for both reducible and irreducible graphs. Compared to the pruned SSA form on the SPEC2000 benchmark suite, the approach reduces the number of gating functions by 7.7% on average and by up to 67%. We present a solid, near-optimal framework for the inverse transformation from single assignment programs. We demonstrate the importance of unrestricted code motion and present RFPF. We develop algorithms that enable instruction movement in acyclic as well as cyclic regions, and show how easily optimizations such as Partial Redundancy Elimination can be performed on RFPF.
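    The T1/T2 transformations named above are the classic reducibility transformations: T1 removes a self-loop, and T2 merges a non-entry node that has exactly one predecessor into that predecessor. A flow graph is reducible if and only if repeated T1/T2 steps collapse it to a single node. The Python sketch below implements only this classic check; the dissertation's TR transformation and its handling of irreducible graphs are not reproduced, and all nodes are assumed reachable from the entry. `is_reducible` is a hypothetical name.

```python
from typing import Dict, Set


def is_reducible(graph: Dict[str, Set[str]], entry: str) -> bool:
    """Apply T1/T2 until neither applies; reducible iff one node remains."""
    # Work on a copy; make sure every successor also appears as a key.
    succ = {n: set(s) for n, s in graph.items()}
    for targets in list(succ.values()):
        for m in targets:
            succ.setdefault(m, set())
    pred: Dict[str, Set[str]] = {n: set() for n in succ}
    for n, targets in succ.items():
        for m in targets:
            pred[m].add(n)

    changed = True
    while changed:
        changed = False
        # T1: remove every self-loop.
        for n in succ:
            if n in succ[n]:
                succ[n].discard(n)
                pred[n].discard(n)
                changed = True
        # T2: fold one non-entry node with a unique predecessor into it.
        for n in list(succ):
            if n != entry and len(pred[n]) == 1:
                p = next(iter(pred[n]))
                for m in succ[n]:
                    pred[m].discard(n)
                    pred[m].add(p)       # may create a self-loop on p; T1 removes it next round
                    succ[p].add(m)
                succ[p].discard(n)
                del succ[n], pred[n]
                changed = True
                break                    # graph changed; restart with T1
    return len(succ) == 1
```

    A natural loop such as A → B → C → B, C → D reduces to a single node, while the classic two-entry cycle A → B, A → C, B ↔ C does not.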

    Assessing Code Obfuscation of Metamorphic JavaScript

    Metamorphic malware is one of the biggest and most ubiquitous threats in the digital world. It can morph the structure of the target code without changing its underlying functionality, which makes it very difficult to detect with signature-based detection and heuristic analysis. The focus of this project is to analyze metamorphic JavaScript malware and the techniques that can be used to mutate JavaScript code. To assess the capabilities of the metamorphic engine, we performed experiments to visualize the degree of code morphing. This project also discusses methods that have been used to detect metamorphic malware and their limitations. In our experiments, support vector machines (SVMs) showed promise for detecting and classifying metamorphic code with high accuracy: an accuracy of 86% is observed when classifying benign, malware, and metamorphic files.
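    The abstract does not say how the degree of morphing was measured. One simple, assumed metric is the Jaccard similarity of token trigrams between an original script and its morphed variant, where lower similarity indicates heavier morphing. The tokenizer below is deliberately crude (for example, it splits `===` into single characters) and every name is hypothetical; the SVM classification step is not shown.

```python
import re
from typing import List, Set, Tuple

# Identifiers (letters, digits, _, $), integer literals, or any other single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_$][\w$]*|\d+|\S")


def tokens(js_source: str) -> List[str]:
    return TOKEN_RE.findall(js_source)


def ngrams(toks: List[str], n: int = 3) -> Set[Tuple[str, ...]]:
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def similarity(original: str, morphed: str, n: int = 3) -> float:
    """Jaccard similarity of token n-gram sets (1.0 means structurally identical)."""
    a, b = ngrams(tokens(original), n), ngrams(tokens(morphed), n)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

    Renaming `total` to `q1` in `var total = a + b;` leaves 3 of the 7 distinct trigrams shared, a similarity of about 0.43.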

    Robust Watermarking using Hidden Markov Models

    Software piracy is the unauthorized copying or distribution of software. It is a growing problem that results in annual losses in the billions of dollars. Prevention is difficult, since digital documents are easy to copy and distribute. Watermarking is a possible defense against software piracy. A software watermark consists of information embedded in the software that allows it to be identified. A watermark can act as a deterrent to unauthorized copying, since it can provide evidence for legal action against those responsible for piracy.
    In this project, we present a novel software watermarking scheme inspired by the success of previous research on detecting metamorphic viruses. We use a trained hidden Markov model (HMM) to detect a specific copy of software. We give experimental results showing that our scheme is robust; that is, we can identify the original software even after it has been extensively modified, as might occur as part of an attack on the watermarking scheme.
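    The abstract leaves the detection step unspecified. A standard way to score how well an observation sequence matches a trained HMM is the forward algorithm; the scaled variant below returns log P(O | λ) without numerical underflow, and dividing by the sequence length gives a per-symbol score that could be compared against a threshold. What the observation symbols are (for example, opcodes extracted from the software) and how the model is trained (typically with Baum-Welch) are assumptions outside this sketch, and `forward_log_likelihood` is a hypothetical name.

```python
import math
from typing import List, Sequence


def forward_log_likelihood(obs: Sequence[int], pi: List[float],
                           A: List[List[float]], B: List[List[float]]) -> float:
    """Scaled forward algorithm.

    pi[i]   : initial probability of state i
    A[j][i] : transition probability from state j to state i
    B[i][k] : probability of emitting symbol k in state i
    Returns log P(obs | model) as the sum of the log scaling factors.
    """
    n = len(pi)
    log_prob = 0.0
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for t in range(len(obs)):
        if t > 0:
            # Propagate the normalized alphas one step and weight by the emission.
            alpha = [sum(alpha[j] * A[j][i] for j in range(n)) * B[i][obs[t]]
                     for i in range(n)]
        c = sum(alpha)               # c_t = P(o_t | o_0 .. o_{t-1})
        log_prob += math.log(c)
        alpha = [a / c for a in alpha]
    return log_prob
```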

    An energy efficient dependence driven scalable dispatch scheme

    This thesis proposes two schemes that target superscalar microarchitectures. The first scheme aims to alleviate the complexity of the dispatch logic by reducing the number of ports to the renamer. The second scheme targets L1 data cache energy reduction with a cache architecture that incorporates IPC-aware dynamic associativity management. The dispatch stage lies on a critical path that limits the scalability of current-generation superscalar processors. The rename map table (RMT) access and dependence check logic (DCL) delays scale unfavorably with the dispatch width (DW) of a superscalar processor. It is a well-known program property that the results of most instructions are consumed within the next 4-6 instructions. This behavior can be exploited to reduce the rename delay by reducing the number of read/write ports in the RMT to significantly below the current 3×DW. We propose an algorithm that dynamically allocates a reduced number of RMT ports to instructions in the current dispatch window, matching dispatch resources to average needs rather than peak needs. This results in shorter RMT access delays as well as lower energy in the dispatch stage. The IPC reduction due to RMT read/write port contention in the proposed scheme stays within 2-4%. The cycle time saved can also be leveraged to support wider dispatch in the same cycle time, offsetting this degradation. Data caches are designed with higher associativities to support data sets corresponding to peak load/store bandwidths. The second part of this thesis explores a more general design space for dynamic associativity (for a 4-way associative cache, it considers 1-way, 2-way, and 4-way associative accesses). We use the actual instruction-level parallelism exhibited by the instructions surrounding a given load to classify it as belonging to a particular IPC packet. The lookup schedule is fixed in advance for each IPC classifier.
    The energy savings over the SPEC2000 CPU benchmarks average 28.6% for a 32KB, 4-way L1 data cache. The resulting performance (IPC) degradation from the dynamic way schedule is restricted to less than 2.25%, mainly because IPC-based placement turns out to be an excellent classifier.
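    The port-allocation idea can be sketched under assumptions the abstract does not state: each instruction needs one RMT read per source that was not produced earlier in the same dispatch group (those are resolved by the dependence check logic) and one RMT write if it has a destination; dispatch is in order; and an instruction that does not fit in the remaining port budget waits, together with all younger instructions, for the next cycle. `Op` and `rename_cycles` are a hypothetical model, not the thesis's algorithm, and the dynamic-associativity scheme is not covered.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Op:
    dest: Optional[str]
    srcs: Tuple[str, ...]


def rename_cycles(ops: List[Op], dispatch_width: int,
                  read_ports: int, write_ports: int) -> List[int]:
    """Return the cycle in which each op is renamed, given a limited RMT port budget."""
    result: List[int] = []
    i, cycle = 0, 0
    while i < len(ops):
        reads_left, writes_left = read_ports, write_ports
        group_dests = set()
        start = i
        while i < len(ops) and i - start < dispatch_width:
            op = ops[i]
            # Sources written earlier in this group come from the DCL, not the RMT.
            reads = sum(1 for s in op.srcs if s not in group_dests)
            writes = 1 if op.dest is not None else 0
            if reads > reads_left or writes > writes_left:
                break                    # in-order: this op and younger ones wait a cycle
            reads_left -= reads
            writes_left -= writes
            if op.dest is not None:
                group_dests.add(op.dest)
            result.append(cycle)
            i += 1
        if i == start:
            raise ValueError("port budget too small for a single instruction")
        cycle += 1
    return result
```

    For example, with a dispatch width of 4, the group `r1 = r2 op r3`, `r4 = r1 op r5`, `r6 = r4 op r1`, `r7 = r8 op r9` needs only 5 RMT reads, because three sources are produced inside the group; with 4 read ports the last instruction slips to the next cycle, while 8 ports rename the whole group at once.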

    Applying Deep Learning Techniques to the Analysis of Android APKs

    Malware targeting mobile devices is a pervasive problem in modern life, and tools to detect and classify it are therefore of great value. This paper seeks to demonstrate the effectiveness of deep learning techniques, specifically convolutional neural networks, in detecting and classifying malware targeting the Android operating system. Unlike many current detection techniques, which rely on relatively rigid features, deep neural networks can automatically learn flexible features that may be more resilient to obfuscation. We present a parser for extracting sequences of API calls that describe a hypothetical execution of a given application. We then show how to use these API call sequences to successfully classify Android malware with a convolutional neural network.
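    The abstract does not give the network architecture. The sketch below shows one plausible front end, assumed for illustration: API call names are mapped to integer IDs (0 reserved for padding and unseen calls), sequences are padded or truncated to a fixed length, and a 1-D convolution with ReLU and global max pooling turns each sequence into a fixed-size feature vector. In a real model the embedding table and filters would be learned; here they are plain lists supplied by the caller, and `build_vocab`, `encode`, and `conv1d_maxpool` are hypothetical names.

```python
from typing import Dict, List, Sequence

PAD = 0  # ID for padding and for API calls not seen during vocabulary building


def build_vocab(sequences: Sequence[Sequence[str]]) -> Dict[str, int]:
    vocab: Dict[str, int] = {}
    for seq in sequences:
        for call in seq:
            if call not in vocab:
                vocab[call] = len(vocab) + 1
    return vocab


def encode(seq: Sequence[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map API calls to IDs, truncating or right-padding to max_len."""
    ids = [vocab.get(call, PAD) for call in seq[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def conv1d_maxpool(ids: Sequence[int], embed: List[List[float]],
                   filters: List[List[List[float]]]) -> List[float]:
    """One feature per filter: slide it over embedded calls, apply ReLU, take the max.

    embed[id] is the embedding vector for an ID; filters[f][j] is the weight
    vector applied to the j-th call of each window.
    """
    features = []
    for f in filters:
        k = len(f)
        best = 0.0  # starting at 0 acts as ReLU followed by max pooling
        for start in range(len(ids) - k + 1):
            s = sum(f[j][d] * embed[ids[start + j]][d]
                    for j in range(k) for d in range(len(f[j])))
            best = max(best, s)
        features.append(float(best))
    return features
```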

    Memory dependence prediction using store sets
