Search CORE

138 research outputs found

Reducing branch delay to zero in pipelined processors

Author: González Colás Antonio María
Llaberia Griñó José M.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1993
Field of study

A mechanism to reduce the cost of branches in pipelined processors is described and evaluated. It is based on the use of multiple prefetch, early computation of the target address, delayed branch, and parallel execution of branches. The implementation of this mechanism using a branch target instruction memory is described. An analytical model of the performance of this implementation makes it possible to measure the efficiency of the mechanism with a very low computational cost. The model is used to determine the size of cache lines that maximizes the processor performance, to compare the performance of the mechanism with that of other schemes, and to analyze the performance of the mechanism with two alternative cache organizations.Peer ReviewedPostprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

User microprogrammable processors for high data rate telemetry preprocessing

Author: Ogrady E. P.
Pugsley J. H.
Publication venue
Publication date
Field of study

The use of microprogrammable processors for the preprocessing of high data rate satellite telemetry is investigated. The following topics are discussed along with supporting studies: (1) evaluation of commercial microprogrammable minicomputers for telemetry preprocessing tasks; (2) microinstruction sets for telemetry preprocessing; and (3) the use of multiple minicomputers to achieve high data processing. The simulation of small microprogrammed processors is discussed along with examples of microprogrammed processors

NASA Technical Reports Server

Computer aided design of microprograms

Author: Wood William Graham
Publication venue: The University of Edinburgh
Publication date: 01/01/1979
Field of study

Edinburgh Research Archive

Designing HMO, an Integrated Hardware Microcode Optimizer

Author: Bondi James O.
Stigall Paul D.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 30/09/1974
Field of study

This Paper Discusses an Algorithm for Optimizing the Density and Parallelism of Micro coded Routines in Micro programmable Machines. Besides the Algorithm itself, the Algorithm\u27s Uses, Design Integration Problems, Architectural Requirements, and Adaptability to Conventional Machine Characteristics Are Also Discussed and Analyzed. Even Though the Paper Proposes a Hardware Implementation of the Algorithm, the Algorithm is Viewed as an Integral Part of the Entire Microcode Generation and Usage Process, from Initial High-Level Input into a Software Microcode Compiler Down to Machine-Level Execution of the Resultant Microcode on the Host Machine. It is Believed that, by Removing Much of the Traditionally Time-Consuming and Machine-Dependent Microcode Optimization from the Software Portion of This Process, the Algorithm Can Improve the overall Process

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine

A mathematical formulation of the loop pipelining problem

Author: Badia Sala Rosa Maria
Cortadella Jordi
Sánchez Carracedo Fermín
Publication venue: Universitat Politècnica de Catalunya (UPC)
Publication date: 01/01/1996
Field of study

This paper presents a mathematical model for the loop pipelining problem that considers several parameters for optimization and supports any combination of resource and timing constraints. The unrolling degree of the loop is one of the variables explored by the model. By using Farey’s series, an optimal exploration of the unrolling degree is performed and optimal solutions not considered by other methods are obtained. Finding an optimal schedule that minimizes resource and register requirements is solved by using an Integer linear programming (ILP) model. A novel paradigm called branch and prune is proposed to eficiently converge towards the optimal schedule and prune the search tree for integer solutions, thus drastically reducing the running time. This is the first formulation that combines the unrolling degree of the loop with timing and resource constraints in a mathematical model that guarantees optimal solutions.Peer ReviewedPostprint (author's final draft

UPCommons. Portal del coneixement obert de la UPC

Towards a design of HMO, an integrated hardware microcode optimizer

Author: Bondi James Oliver
Publication venue: Scholars\u27 Mine
Publication date: 01/01/1974
Field of study

This paper discusses an algorithm for optimizing the density and parallelism of microcoded routines in micro-programmable machines. Besides presenting the algorithm itself, this research also analyzes the algorithm\u27s uses, design integration problems, architectural requirements, and adaptability to conventional machine characteristics. Even though the paper proposes a hardware implementation of the algorithm, the algorithm is viewed as an integral part of the entire microcode generation and usage process, from initial high-level input into a software microcode compiler down to machine-level execution of the resultant microcode on the host machine. It is believed that, by removing much of the traditionally time-consuming and machine-dependent microcode optimization from the software portion of this process, the algorithm can improve the overall process --Abstract, page ii

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine

Modulo scheduling for a fully-distributed clustered VLIW architecture

Author: González Colás Antonio María
Sánchez Navarro F. Jesús
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2000
Field of study

Clustering is an approach that many microprocessors are adopting in recent times in order to mitigate the increasing penalties of wire delays. We propose a novel clustered VLIW architecture which has all its resources partitioned among clusters, including the cache memory. A modulo scheduling scheme for this architecture is also proposed. This algorithm takes into account both register and memory inter-cluster communications so that the final schedule results in a cluster assignment that favors cluster locality in cache references and register accesses. It has been evaluated for both 2- and 4-cluster configurations and for differing numbers and latencies of inter-cluster buses. The proposed algorithm produces schedules with very low communication requirements and outperforms previous cluster-oriented schedulers.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

Recommended from our members

An efficient global resource constrained technique for exploiting instruction level parallelism

Author: Nicolau Alexandru
Novack Steven
Publication venue: eScholarship, University of California
Publication date: 21/01/1992
Field of study

A new Global Resource-constrained Percolation (GRiP) scheduling technique is presented for exploiting instruction level parallelism. Other techniques that have been proposed either have been prohibitively expensive in terms of computation or have limited parallelism. The GRiP technique has been implemented and simulation results are presented

eScholarship - University of California

Modulo scheduling with reduced register pressure

Author: Ayguadé Parra Eduard
González Colás Antonio María
Llosa Espuny José Francisco
Valero Cortés Mateo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1998
Field of study

Software pipelining is a scheduling technique that is used by some product compilers in order to expose more instruction level parallelism out of innermost loops. Module scheduling refers to a class of algorithms for software pipelining. Most previous research on module scheduling has focused on reducing the number of cycles between the initiation of consecutive iterations (which is termed II) but has not considered the effect of the register pressure of the produced schedules. The register pressure increases as the instruction level parallelism increases. When the register requirements of a schedule are higher than the available number of registers, the loop must be rescheduled perhaps with a higher II. Therefore, the register pressure has an important impact on the performance of a schedule. This paper presents a novel heuristic module scheduling strategy that tries to generate schedules with the lowest II, and, from all the possible schedules with such II, it tries to select that with the lowest register requirements. The proposed method has been implemented in an experimental compiler and has been tested for the Perfect Club benchmarks. The results show that the proposed method achieves an optimal II for at least 97.5 percent of the loops and its compilation time is comparable to a conventional top-down approach, whereas the register requirements are lower. In addition, the proposed method is compared with some other existing methods. The results indicate that the proposed method performs better than other heuristic methods and almost as well as linear programming methods, which obtain optimal solutions but are impractical for product compilers because their computing cost grows exponentially with the number of operations in the loop body.Peer ReviewedPostprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Design and implementation of generalized adaptive neural filters

Author: Tam Chun Yip
Publication venue: Digital Commons @ NJIT
Publication date: 31/01/1998
Field of study

Generalized Adaptive Neural Filters (GANF) are a class of adaptive non-linear filters. This thesis presents a hardware implementation of GANF. Two designs are considered: the single neuron implementation and the multi-neuron implementation. The GANF design includes the window generator, threshold decomposer, training and filtering unit. The designs are verified through a logic design/simulation tool, Logic Works

Digital Commons @ New Jersey Institute of Technology (NJIT)