Instruction fetch architectures and code layout optimizations

Author: Larriba Pey Josep
Ramírez Bellido Alejandro
Valero Cortés Mateo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2001

The design of higher performance processors has been following two major trends: increasing the pipeline depth to allow faster clock rates, and widening the pipeline to allow parallel execution of more instructions. Designing a higher performance processor implies balancing all the pipeline stages to ensure that overall performance is not dominated by any of them. This means that a faster execution engine also requires a faster fetch engine, to ensure that it is possible to read and decode enough instructions to keep the pipeline full and the functional units busy. This paper explores the challenges faced by the instruction fetch stage for a variety of processor designs, from early pipelined processors to the more aggressive wide-issue superscalars. We describe the different fetch engines proposed in the literature, the performance issues involved, and some of the proposed improvements. We also show how compiler techniques that optimize the layout of the code in memory can be used to improve the fetch performance of the different engines described. Overall, we show how instruction fetch has evolved from fetching one instruction every few cycles, to fetching one instruction per cycle, to fetching a full basic block per cycle, to fetching several basic blocks per cycle; we trace the evolution of the mechanisms surrounding the instruction cache and the different compiler optimizations used to better employ these mechanisms.

Peer Reviewed. Postprint (published version)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Asymptotic Analysis of Plausible Tree Hash Modes for SHA-3

Author: Atighehchi Kevin
Bonnecaze Alexis
Publication date: 01/01/2017

Discussions about the choice of a tree hash mode of operation for standardization have recently been undertaken. It appears that a single tree mode cannot adequately address all possible uses and specifications of a system. In this paper, we review the tree modes which have been proposed, discuss their problems, and propose remedies. We make the reasonable assumption that communicating systems have different specifications and that software applications are of different types (securing stored content or live-streamed content). Finally, we propose new modes of operation that address the resource usage problem for the three most representative categories of devices, and we analyse their asymptotic behavior.

arXiv.org e-Print Archive

HAL - Normandie Université

HAL AMU

Directory of Open Access Journals

Ruhr-Universität Bochum (RUB): Open Journal Systems

Cryptology ePrint Archive

Software trace cache

Author: Larriba Pey Josep
Ramírez Bellido Alejandro
Valero Cortés Mateo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005

We explore the use of compiler optimizations that optimize the layout of instructions in memory. The goal is to enable the code to make better use of the underlying hardware resources, regardless of the specific details of the processor or architecture, in order to increase fetch performance. The Software Trace Cache (STC) is a code layout algorithm with a broader target than previous layout optimizations. We target not only an improvement in the instruction cache hit rate, but also an increase in the effective fetch width of the fetch engine. The STC algorithm organizes basic blocks into chains, trying to make sequentially executed basic blocks reside in consecutive memory positions, then maps the basic block chains in memory to minimize conflict misses in the important sections of the program. We evaluate and analyze in detail the impact of the STC, and of code layout optimizations in general, on the three main aspects of fetch performance: the instruction cache hit rate, the effective fetch width, and the branch prediction accuracy. Our results show that layout-optimized codes have some special characteristics that make them more amenable to high-performance instruction fetch. They have a very high rate of not-taken branches and execute long chains of sequential instructions; they also make very effective use of instruction cache lines, mapping only useful instructions which will execute close in time, increasing both spatial and temporal locality.

Peer Reviewed. Postprint (published version)
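The chaining step mentioned in this abstract can be illustrated with a minimal sketch. This is a generic greedy chaining over profile edge counts, not the STC algorithm itself (the abstract gives no detail on how STC selects or maps chains). The function name `build_chains` and the `edges` input format are hypothetical, chosen only for illustration.

```python
def build_chains(edges):
    """Greedily chain basic blocks so that hot control-flow edges become
    fall-through (sequential) paths.

    edges: dict mapping (src, dst) -> execution count (hypothetical profile format).
    Returns a list of chains; each chain is a list of block names.
    """
    next_of = {}  # block -> block placed immediately after it
    prev_of = {}  # block -> block placed immediately before it
    blocks = set()
    for src, dst in edges:
        blocks.update((src, dst))

    def head(block):
        # Walk backwards to the first block of the chain containing `block`.
        while block in prev_of:
            block = prev_of[block]
        return block

    # Visit edges from hottest to coldest; ties are broken by name for determinism.
    for (src, dst), _count in sorted(edges.items(), key=lambda kv: (-kv[1], kv[0])):
        if src == dst:
            continue  # a self-loop cannot become a fall-through
        if src in next_of or dst in prev_of:
            continue  # src already has a successor, or dst already has a predecessor
        if head(src) == dst:
            continue  # linking would close the chain into a cycle
        next_of[src] = dst
        prev_of[dst] = src

    # Emit each chain starting from its head block.
    chains = []
    for block in sorted(blocks):
        if block not in prev_of:
            chain = [block]
            while chain[-1] in next_of:
                chain.append(next_of[chain[-1]])
            chains.append(chain)
    return chains


profile = {("A", "B"): 90, ("A", "C"): 10, ("B", "D"): 90,
           ("C", "D"): 10, ("D", "A"): 50}
print(build_chains(profile))  # [['A', 'B', 'D'], ['C']]
```

In this toy profile the hot path A, B, D ends up contiguous, so the branches along it would be not-taken, which matches the property the abstract reports for layout-optimized code. Placing the chains in memory to limit cache conflicts is a separate step that this sketch does not attempt.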

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

CoCoNUT: an efficient system for the comparison and analysis of genomes

Author: A Darling
A Kasprzyk
B Haas
B Ma
B Mau
B Morgenstern
B Raphael
C Wawra
DR Bentley
E Mardis
E Ohlebusch
E Passarge
E Sonnhammer
Enno Ohlebusch
G Bourque
G Gremme
I Ovcharenko
J Krumsiek
J Peterson
J Thompson
L Florea
M Abouelhoda
M Blanchette
M Brudno
M Clamp
M Höhl
M Kellis
M Margulies
Mohamed I Abouelhoda
P Chain
R Staden
S Altschul
S Karlin
S Kurtz
S Ranganathan
S Schwartz
S Shibuya
Stefan Kurtz
T Treangen
T Vision
T Wu
The Arabidopsis Genome Initiative
W Kent
Publication venue: BioMed Central
Publication date: 01/01/2008

Crossref

Springer - Publisher Connector

PubMed Central

Achieving Marton's Region for Broadcast Channels Using Polar Codes

Author: Hassani S. Hamed
Mondelli Marco
Sason Igal
Urbanke Rüdiger
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 10/10/2014

This paper presents polar coding schemes for the 2-user discrete memoryless broadcast channel (DM-BC) which achieve Marton's region with both common and private messages. This is the best achievable rate region known to date, and it is tight for all classes of 2-user DM-BCs whose capacity regions are known. To accomplish this task, we first construct polar codes for both the superposition and the binning strategy. By combining these two schemes, we obtain Marton's region with private messages only. Finally, we show how to handle the case of common information. The proposed coding schemes possess the usual advantages of polar codes, i.e., they have low encoding and decoding complexity and a super-polynomial decay rate of the error probability. We follow the lead of Goela, Abbe, and Gastpar, who recently introduced polar codes emulating the superposition and binning schemes. In order to align the polar indices, for both schemes, their solution involves some degradedness constraints that are assumed to hold between the auxiliary random variables and the channel outputs. To remove these constraints, we consider the transmission of k blocks and employ a chaining construction that guarantees the proper alignment of the polarized indices. The techniques described in this work are quite general, and they can be adapted to many other multi-terminal scenarios whenever polar indices need to be aligned.

Comment: 26 pages, 11 figures, accepted to IEEE Trans. Inform. Theory and presented in part at ISIT'1

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Exact genome alignment

Author: Ghosh Nandini
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/2015

The increase in the volume of genomic data due to the decrease in the cost of whole genome sequencing techniques has opened up new avenues of research in the field of bioinformatics, such as comparative genomics and evolutionary dynamics. The fundamental task in these studies is to align the genome sequences accurately. Sequence alignment helps to identify regions of similarity between the sequences in order to establish their functional, evolutionary and structural relationships. The thesis investigates the performance of two sequence alignment programs on whole genome sequences from primates and mammals: LASTZ, a faster method based on hash tables, and SSEARCH, a slower but more rigorous approach based on Smith-Waterman. An exact genome alignment technique is used, in which the entire genome is broken into fragments and these fragments are aligned with the reference genome using the Smith-Waterman based method. A comparison of the two methods reveals that the second approach performs better for genomes from closely related species.
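To make the fragment-and-align idea concrete, here is a minimal sketch of Smith-Waterman local alignment scoring applied to fixed-size fragments. It is a simplification under stated assumptions: a linear gap penalty and a flat match/mismatch score (tools such as SSEARCH use substitution matrices and affine gaps), and non-overlapping fragments of a hypothetical size. The names `smith_waterman_score` and `align_fragments` and all parameter values are illustrative and do not come from the thesis.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b
    (linear gap penalty; illustrative scoring values)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def align_fragments(genome, reference, size):
    """Split `genome` into non-overlapping fragments of length `size`
    (hypothetical fragmenting rule) and score each against `reference`."""
    fragments = [genome[i:i + size] for i in range(0, len(genome), size)]
    return [smith_waterman_score(frag, reference) for frag in fragments]


print(smith_waterman_score("XXACGTXX", "ACGT"))  # 8
print(align_fragments("ACGTACGT", "ACGT", 4))    # [8, 8]
```

The table is quadratic in the sequence lengths, which is why the exact method is slower than a seed-based tool like LASTZ and why fragmenting the genome helps keep each alignment tractable.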

Digital Commons @ New Jersey Institute of Technology (NJIT)

Significant speedup of database searches with HMMs by search space reduction with PSSM family models

Author: Beckstette Michael
Giegerich Robert
Homann Robert
Kurtz Stefan
Publication venue: Oxford University Press
Publication date: 01/01/2009

Motivation: Profile hidden Markov models (pHMMs) are currently the most popular modeling concept for protein families. They provide sensitive family descriptors, and sequence database searching with pHMMs has become a standard task in today's genome annotation pipelines. On the downside, searching with pHMMs is computationally expensive.

CiteSeerX

PubMed Central

Publications at Bielefeld University

Optimization of Tree Modes for Parallel Hash Functions: A Case Study

Author: Atighehchi Kevin
Rolland Robert
Publication date: 11/06/2017

This paper focuses on parallel hash functions based on tree modes of operation for an inner Variable-Input-Length function. This inner function can be either a single-block-length (SBL) and prefix-free MD hash function, or a sponge-based hash function. We discuss the various forms of optimality that can be obtained when designing parallel hash functions based on trees where all leaves have the same depth. The first result is a scheme which optimizes the tree topology in order to decrease the running time. Then, without affecting the optimal running time, we show that we can slightly change the corresponding tree topology so as to minimize the number of required processors as well. Consequently, the resulting scheme minimizes first the running time and second the number of required processors.

Comment: Preprint version. Added citations, IEEE Transactions on Computers, 201
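A tree mode in which all leaves have the same depth can be sketched as follows. This is an illustrative toy, not one of the modes proposed in the paper and not a vetted or standardized construction: the one-byte leaf/inner prefixes, the default `leaf_size` and `arity`, and the name `tree_hash` are assumptions made here, and the sketch omits the padding and length encoding a secure mode would need. It uses SHA3-256 from Python's standard `hashlib` as the inner function.

```python
import hashlib


def tree_hash(message: bytes, leaf_size: int = 64, arity: int = 2) -> bytes:
    """Hash `message` with a tree whose leaves all sit at the same depth.

    Leaves are hashed with prefix 0x00 and inner nodes with prefix 0x01
    (an assumed, simplistic form of domain separation).
    """
    # Split the message into leaf chunks; an empty message still gets one leaf.
    chunks = [message[i:i + leaf_size]
              for i in range(0, len(message), leaf_size)] or [b""]
    level = [hashlib.sha3_256(b"\x00" + chunk).digest() for chunk in chunks]

    # Reduce level by level. Every node of a level is combined in the same
    # pass, so all leaves end up at the same distance from the root.
    while len(level) > 1:
        level = [hashlib.sha3_256(b"\x01" + b"".join(level[i:i + arity])).digest()
                 for i in range(0, len(level), arity)]
    return level[0]


digest = tree_hash(b"x" * 1000)
print(len(digest))  # 32
```

Each level's node hashes are independent of one another, so they could be computed on separate processors. The trade-off between tree height (running time) and level width (processor count) is what the abstract describes optimizing, here governed only by the fixed `arity` and `leaf_size`.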

arXiv.org e-Print Archive

Crossref

HAL AMU