
    DIA: A complexity-effective decoding architecture

    Fast instruction decoding is a real challenge in the design of CISC microprocessors that implement variable-length instructions. A well-known solution is to cache decoded instructions in a hardware buffer: fetching already-decoded instructions avoids decoding them again, improving processor performance. However, introducing such special-purpose storage significantly increases the complexity of the fetch architecture. In this paper, we propose a novel decoding architecture that reduces the fetch engine implementation cost. Instead of using a special-purpose hardware buffer, our proposal stores frequently decoded instructions in the memory hierarchy. The address where the decoded instructions are stored is kept in the branch prediction mechanism, enabling it to guide our decoding architecture. This makes it possible for the processor front end to fetch already-decoded instructions from memory instead of the original non-decoded instructions. Our results show that with our decoding architecture, a state-of-the-art superscalar processor achieves competitive performance improvements while requiring less chip area and energy in the fetch architecture than a hardware code caching mechanism.
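    The key mechanism is that a branch-predictor entry carries, alongside the usual target, the memory address of a predecoded copy of the instructions, so the front end can steer fetch to that copy. Below is a minimal sketch of this fetch-path decision; all structure and function names are hypothetical illustrations, not the paper's actual design.

```c
#include <stdint.h>
#include <stdio.h>

#define PRED_ENTRIES 1024

typedef struct {
    uint64_t fetch_pc;      /* original instruction address                */
    uint64_t decoded_addr;  /* address of a decoded copy in memory, 0=none */
} pred_entry_t;

static pred_entry_t predictor[PRED_ENTRIES];

/* Stand-ins for the two fetch paths. */
static void fetch_decoded(uint64_t addr)  { printf("fetch decoded copy @ %llx\n", (unsigned long long)addr); }
static void fetch_and_decode(uint64_t pc) { printf("fetch + decode     @ %llx\n", (unsigned long long)pc);   }

/* Front-end fetch: the predictor, not a dedicated buffer, decides whether
 * a predecoded copy of this fetch block exists in the memory hierarchy.   */
void front_end_fetch(uint64_t pc)
{
    pred_entry_t *e = &predictor[pc % PRED_ENTRIES];
    if (e->fetch_pc == pc && e->decoded_addr != 0) {
        fetch_decoded(e->decoded_addr);  /* bypass the decoders            */
    } else {
        fetch_and_decode(pc);            /* normal decode path; a fill
                                            policy (not shown) would store
                                            the decoded copy and update
                                            the predictor entry            */
    }
}

int main(void)
{
    predictor[0x40 % PRED_ENTRIES] =
        (pred_entry_t){ .fetch_pc = 0x40, .decoded_addr = 0x9000 };
    front_end_fetch(0x40);   /* hit: fetches the decoded copy    */
    front_end_fetch(0x80);   /* miss: takes the full decode path */
    return 0;
}
```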

    The "MIND" Scalable PIM Architecture

    MIND (Memory, Intelligence, and Network Device) is an advanced parallel computer architecture for high-performance computing and scalable embedded processing. It is a Processor-in-Memory (PIM) architecture integrating both DRAM bit cells and CMOS logic devices on the same silicon die. MIND is a multicore design with multiple memory/processor nodes on each chip and supports global shared memory across systems of MIND components. MIND is distinguished from other PIM architectures in that it incorporates mechanisms for efficient support of a global parallel execution model based on the semantics of message-driven multithreaded split-transaction processing. MIND is designed to operate either in conjunction with other conventional microprocessors or in standalone arrays of like devices. It also incorporates mechanisms for fault tolerance, real-time execution, and active power management. This paper describes the major elements and operational methods of the MIND architecture.
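    The execution model described, message-driven multithreaded split-transaction processing, resembles parcel-based dispatch: a message names an action and a destination datum, and the receiving node runs the handler locally without the sender blocking for a reply. A speculative sketch of that dispatch pattern follows; all names are illustrative, not the MIND hardware interface.

```c
#include <stdint.h>
#include <stdio.h>

typedef void (*action_fn)(uint64_t dest, uint64_t arg);

typedef struct {
    action_fn action;  /* handler to run at the memory/processor node */
    uint64_t  dest;    /* global address the action operates on       */
    uint64_t  arg;     /* small payload carried by the message        */
} parcel_t;

static void atomic_add_action(uint64_t dest, uint64_t arg)
{
    /* In a PIM node this would update DRAM-resident state in place. */
    printf("node: add %llu to object at %llx\n",
           (unsigned long long)arg, (unsigned long long)dest);
}

/* A node's event loop: consume incoming parcels and run their handlers.
 * The split-transaction idea is that the sender never stalls on a reply. */
static void node_dispatch(parcel_t *queue, int n)
{
    for (int i = 0; i < n; i++)
        queue[i].action(queue[i].dest, queue[i].arg);
}

int main(void)
{
    parcel_t q[2] = {
        { atomic_add_action, 0x1000, 5 },
        { atomic_add_action, 0x2000, 7 },
    };
    node_dispatch(q, 2);
    return 0;
}
```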

    Simulation-based Fault Injection with QEMU for Speeding-up Dependability Analysis of Embedded Software

    Simulation-based fault injection (SFI) is a valuable solution for early analysis of software dependability and fault-tolerance properties before a physical prototype of the target platform is available. Some SFI approaches base the fault injection strategy on cycle-accurate models implemented in Hardware Description Languages (HDLs). However, cycle-accurate simulation has proven too time-consuming when the objective is to emulate the effect of soft errors on complex microprocessors. To overcome this issue, SFI solutions based on virtual prototypes of the target platform have been proposed. Current approaches still have drawbacks, however: they work only for specific CPU architectures, require code instrumentation, or have a different target (design errors rather than dependability analysis). To address these disadvantages, this paper presents an efficient fault injection approach based on QEMU, one of the most efficient and popular instruction-accurate emulators for several microprocessor architectures. The proposed approach is a non-intrusive technique for simulating hardware faults affecting CPU behaviour. Permanent and transient/intermittent hardware fault models have been abstracted without losing quality for software dependability analysis. The approach minimizes the impact of the fault injection procedure on emulator performance by preserving QEMU's original dynamic binary translation mechanism. Experimental results for both x86 and ARM processors demonstrate the efficiency and effectiveness of the proposed approach.
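    The fault models mentioned can be pictured as bit-level corruptions of CPU state: a transient fault is a one-shot bit flip that later writes repair, while a permanent fault behaves like a stuck-at bit applied on every access. A minimal sketch of these two models, independent of QEMU's internals (the function names are hypothetical):

```c
#include <stdint.h>
#include <stdio.h>

/* Transient fault (single-event upset): a one-shot bit flip; the next
 * write to the register repairs the value.                              */
static uint64_t inject_transient(uint64_t reg_value, unsigned bit)
{
    return reg_value ^ (1ULL << bit);
}

/* Permanent fault: a stuck-at-0 bit that must be reapplied on every
 * access to the faulty location for as long as the fault is active.     */
static uint64_t apply_stuck_at_0(uint64_t reg_value, unsigned bit)
{
    return reg_value & ~(1ULL << bit);
}

int main(void)
{
    uint64_t r = 0xDEADBEEF;
    printf("transient : %llx\n", (unsigned long long)inject_transient(r, 3));
    printf("stuck-at-0: %llx\n", (unsigned long long)apply_stuck_at_0(r, 3));
    return 0;
}
```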

    Instruction prefetch strategies in a pipelined processor

    Thesis (M.S.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1983. Microfiche copy available in Archives and Engineering. Includes bibliographical references. By Hubert Rae McLellan, Jr., M.S.

    A low-power cache system for high-performance processors

    System: new; report number: Ko 3439; degree type: Doctor of Engineering; date conferred: 12-Sep-11; Waseda University degree number: Shin 576

    Decoding billions of integers per second through vectorization

    In many important applications, such as search engines and relational database systems, data is stored in the form of arrays of integers. Encoding and, most importantly, decoding these arrays consumes considerable CPU time. Therefore, substantial effort has been made to reduce the costs associated with compression and decompression. In particular, researchers have exploited the superscalar nature of modern processors and SIMD instructions. In this work, we introduce a novel vectorized scheme called SIMD-BP128 that improves over previously proposed vectorized approaches. It is nearly twice as fast as the previously fastest schemes on desktop processors (varint-G8IU and PFOR). At the same time, SIMD-BP128 saves up to 2 bits per integer. For even better compression, we propose another new vectorized scheme (SIMD-FastPFOR) that has a compression ratio within 10% of a state-of-the-art scheme (Simple-8b) while being twice as fast during decoding. For software, see https://github.com/lemire/FastPFor; for data, see http://boytsov.info/datasets/clueweb09gap
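    The "BP128" part of the name refers to binary packing of blocks of 128 integers, each block stored at the fixed bit width needed by its largest element; the scheme then vectorizes packing and unpacking with SIMD instructions. The scalar sketch below illustrates only the underlying bit layout, not the vectorized kernels in the repository above.

```c
#include <stdint.h>
#include <stdio.h>

/* Pack n values, each fitting in b bits (b <= 32), into out[].
 * out[] must be zero-initialized and have one spare trailing word
 * so that unpack_one() may safely read one word past the last bit. */
static void pack(const uint32_t *in, int n, unsigned b, uint32_t *out)
{
    unsigned bitpos = 0;
    for (int i = 0; i < n; i++) {
        out[bitpos / 32] |= in[i] << (bitpos % 32);
        if (bitpos % 32 + b > 32)  /* value straddles a word boundary */
            out[bitpos / 32 + 1] |= in[i] >> (32 - bitpos % 32);
        bitpos += b;
    }
}

/* Extract the i-th b-bit value from the packed stream. */
static uint32_t unpack_one(const uint32_t *packed, int i, unsigned b)
{
    unsigned bitpos = (unsigned)i * b;
    uint64_t w = packed[bitpos / 32] |
                 ((uint64_t)packed[bitpos / 32 + 1] << 32);
    return (uint32_t)((w >> (bitpos % 32)) & ((1ULL << b) - 1));
}

int main(void)
{
    uint32_t in[8] = {3, 7, 1, 5, 6, 2, 0, 4};  /* all fit in b = 3 bits */
    uint32_t packed[3] = {0};                   /* spare trailing words  */
    pack(in, 8, 3, packed);
    for (int i = 0; i < 8; i++)
        printf("%u ", unpack_one(packed, i, 3)); /* prints the original block */
    printf("\n");
    return 0;
}
```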

    An Investigation of thread scheduling heuristics for a simultaneous multithreaded processor

    Over the years, the von Neumann model of computing has undergone many enhancements, including an improved memory hierarchy, multiple instruction issue, and branch prediction. Since the model's introduction, processor performance has increased at a much greater rate than memory performance. Several techniques to hide this ever-widening gap are being examined in current research. A very promising one is the simultaneous multithreaded processor. This architecture strives to further reduce the effects of long-latency instructions, such as memory accesses, by allowing multiple threads of execution to be active in the processor at the same time. With multiple active threads in a single processor, several new aspects of processor operation can have a sizeable effect on performance. One such aspect is how to choose which thread to fetch instructions from during the next cycle. For this project, three different classes of fetch scheduling mechanisms were defined, and examples of each were either studied or proposed. The proposed mechanisms were then tested on a set of four sample programs by adding them to a simultaneous multithreading simulator based on the SimpleScalar tool set from the University of Wisconsin-Madison. With the proper configuration, each of the proposed mechanisms improved the performance of the simulated architecture. However, the best increase in performance was produced by the Event History Table, which achieved an IPC of 2.0995 for two threads while overriding the primary scheduling mechanism only 0.070% of the time.
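    A classic example of the kind of fetch scheduling heuristic compared here is ICOUNT, which each cycle fetches from the thread with the fewest instructions in flight; an override mechanism such as the Event History Table would replace this choice when a recorded long-latency event is predicted to recur. A minimal sketch of the base heuristic (the details are illustrative, not the thesis's exact mechanisms):

```c
#include <stdio.h>

#define NTHREADS 2

/* Per-thread count of instructions between fetch and issue. */
static int inflight[NTHREADS];

/* ICOUNT-style selection: pick the thread with the fewest instructions
 * in flight, on the theory that it is making the fastest progress.     */
int choose_fetch_thread(void)
{
    int best = 0;
    for (int t = 1; t < NTHREADS; t++)
        if (inflight[t] < inflight[best])
            best = t;
    return best;
}

int main(void)
{
    inflight[0] = 12;
    inflight[1] = 5;
    printf("fetch from thread %d\n", choose_fetch_thread()); /* thread 1 */
    return 0;
}
```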