Search CORE

1,328 research outputs found

SEPARATING INSTRUCTION FETCHES FROM MEMORY ACCESSES : ILAR (INSTRUCTION LINE ASSOCIATIVE REGISTERS)

Author: Lim Nien Yi
Publication venue: UKnowledge
Publication date: 01/01/2009
Field of study

Due to the growing mismatch between processor performance and memory latency, many dynamic mechanisms which are “invisible” to the user have been proposed: for example, trace caches and automatic pre-fetch units. However, these dynamic mechanisms have become inadequate due to implicit memory accesses that have become so expensive. On the other hand, compiler-visible mechanisms like SWAR (SIMD Within A Register) and LARs (Line Associative Registers) are potentially more effective at improving data access performance. This thesis investigates applying the same ideas to improve instruction access. ILAR (Instruction LARs) store instructions in wide registers. Instruction blocks are explicitly loaded into ILAR, using block compression to enhance memory bandwidth. The control flow of the program then refers to instructions directly by their position within an ILAR, rather than by lengthy memory addresses. Because instructions are accessed directly from within registers, there is no implicit instruction fetch from memory. This thesis proposes an instruction set architecture for ILAR, investigates a mechanism to load ILAR using the best available block compression algorithm and also develop hardware descriptions for both ILAR and a conventional memory cache model so that performance comparisons could be made on the instruction fetch stage

University of Kentucky

Elliptic Curve Cryptography on Modern Processor Architectures

Author: Costigan Neil
Publication venue: Dublin City University. School of Computing
Publication date: 25/03/2010
Field of study

Abstract Elliptic Curve Cryptography (ECC) has been adopted by the US National Security Agency (NSA) in Suite "B" as part of its "Cryptographic Modernisation Program ". Additionally, it has been favoured by an entire host of mobile devices due to its superior performance characteristics. ECC is also the building block on which the exciting field of pairing/identity based cryptography is based. This widespread use means that there is potentially a lot to be gained by researching efficient implementations on modern processors such as IBM's Cell Broadband Engine and Philip's next generation smart card cores. ECC operations can be thought of as a pyramid of building blocks, from instructions on a core, modular operations on a finite field, point addition & doubling, elliptic curve scalar multiplication to application level protocols. In this thesis we examine an implementation of these components for ECC focusing on a range of optimising techniques for the Cell's SPU and the MIPS smart card. We show significant performance improvements that can be achieved through of adoption of EC

Irish Universities

DCU Online Research Access Service

A Comparative Xeon and CBE Performance Analysis

Author: Fort Randy
Publication venue: SJSU ScholarWorks
Publication date: 01/01/2008
Field of study

The Cell Broadband Engine is a high performance multicore processor with superb performance on certain types of problems. However, it does not perform as well running other algorithms, particularly those with heavy branching. The Intel Xeon processor is a high performance superscalar processor. It utilizes a high clock speed and deep pipelines to help it achieve superior performance. But deep pipelines can perform poorly with frequent memory accesses. This paper is a study and attempt at quantifying the types of programmatic structures that are more suitable to a particular architecture. It focuses on the issues of pipelines, memory access and branching on these two microprocessor architectures

SJSU ScholarWorks

Decision Support Database Management System Acceleration Using Vector Processor

Author: Hayes Timothy
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2011
Field of study

English: This work takes a top-down approach to accelerating decision support systems (DSS) on x86-64 microprocessors using true vector ISA extensions. First, a state of art DSS database management system (DBMS) is pro led and bottlenecks are identi ed. From this, the bottlenecked functions are analysed for data-level parallelism and a discussion is given as to why the existing multimedia SIMD extensions (SSE) are not suitable for capturing this parallelism. A vector ISA is derived from what is found to be necessary in these functions; additionally, a complementary microarchitecture is proposed that draws on prior research done in vector microprocessors but is also optimised for the properties found in the pro led application. Finally, the ISA and microarchitecture are implemented and evaluated using a cycle-accurate x86-64 microarchitecture simulator

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Design of a Five Stage Pipeline CPU with Interruption System

Author: Abdulraqeb Alnabihi
Publication venue: Global Journals Inc. (US)
Publication date: 17/03/2015
Field of study

A central processing unit (CPU), also referred to as a central processor unit, is the hardware within a computer that carries out the instructions of a computer program by performing the basic arithmetical, logical, and input/output operations of the system. The term has been in use in the computer industry at least since the early 1960s.The form, design, and implementation of CPUs have changed over the course of their history, but their fundamental operation remains much the same. A computer can have more than one CPU; this is called multiprocessing. All modern CPUs are microprocessors, meaning contained on a single chip. Some integrated circuits (ICs) can contain multiple CPUs on a single chip; those ICs are called multi-core processors. An IC containing a CPU can also contain peripheral devices, and other components of a computer system; this is called a system on a chip (SoC).Two typical components of a CPU are the arithmetic logic unit (ALU), which performs arithmetic and logical operations, and the control unit (CU), which extracts instructions from memory and decodes and executes them, calling on the ALU when necessary. Not all computational systems rely on a central processing unit. An array processor or vector processor has multiple parallel computing elements, with no one unit considered the "center". In the distributed computing model, problems are solved by a distributed interconnected set of processors

Global Journal of Computer Science and Technology (GJCST)

Future Computer Requirements for Computational Aerodynamics

Author
Publication venue
Publication date
Field of study

Recent advances in computational aerodynamics are discussed as well as motivations for and potential benefits of a National Aerodynamic Simulation Facility having the capability to solve fluid dynamic equations at speeds two to three orders of magnitude faster than presently possible with general computers. Two contracted efforts to define processor architectures for such a facility are summarized

NASA Technical Reports Server

MIRACLE: MIcRo-ArChitectural Leakage Evaluation

Author: Marshall Ben
Page Daniel
Webb James
Publication venue: 'Universitatsbibliothek der Ruhr-Universitat Bochum'
Publication date: 01/11/2021
Field of study

In this paper, we describe an extensible experimental infrastructure for evaluating the micro-architectural leakage, based on power consumption, that stems from a physical device. Building on existing literature, we use it to systematically study 14 different devices, which span 4 different instruction set architectures and 4 different vendors. The study allows a characterisation of each device with respect to any leakage effects stemming from sources within the micro-architectural implementation. We use it, for example, to identify and document several novel leakage effects (e.g., due to speculative instruction execution), and scenarios where an assumption about leakage is non-portable between different yet compatible devices. Ours is the widest study of its kind we are aware of, and highlights a range of challenges with respect to 1) the design, implementation, and evaluation of, e.g., masking schemes, 2) construction of accurate leakage models, and 3) selection of suitable devices for experimental research. For example, in relation to 1), we cast further doubt on whether a given device upholds the assumptions required by a given masking scheme; in relation to 2), we conclude that (statistical or formal) device leakage models must include information about the micro-architecture being modelled; in relation to 3), we claim the near mono-culture of devices that dominates existing literature is insufficient to support general claims regarding leakage. This is particularly important in the context of the FIPS 140-3 standard for non-invasive side-channel evaluation

Directory of Open Access Journals

Explore Bristol Research

Feasibility study for a numerical aerodynamic simulation facility. Volume 1

Author: Bergman R. O.
Bonstrom D. B.
Brinkman T. W.
Chiu S. H. J.
Green S. S.
Hansen S. D.
Klein D. L.
Krohn H. E.
Lincoln N. R.
Prow R. P.
Publication venue
Publication date
Field of study

A Numerical Aerodynamic Simulation Facility (NASF) was designed for the simulation of fluid flow around three-dimensional bodies, both in wind tunnel environments and in free space. The application of numerical simulation to this field of endeavor promised to yield economies in aerodynamic and aircraft body designs. A model for a NASF/FMP (Flow Model Processor) ensemble using a possible approach to meeting NASF goals is presented. The computer hardware and software are presented, along with the entire design and performance analysis and evaluation

NASA Technical Reports Server