18 research outputs found

Instruction set extensions for software defined radio on a multithreaded processor

Authors: Daniel Iancu, Emily R. Blem, John Glossner, Mayan Moudgill, Michael J. Schulte, Sanjay Jinturkar, Suman Mamidi
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2005

Software defined radios, which provide a programmable solution for implementing the physical layer processing of multiple communication standards, are widely recognized as one of the most important new technologies for wireless communication systems. Emerging communication standards, however, require tremendous processing capabilities to perform high-bandwidth physical-layer processing in real time. In this paper, we present instruction set extensions for several important communication algorithms including convolutional encoding, Viterbi decoding, turbo decoding, and Reed-Solomon encoding and decoding. The performance benefits of these extensions are evaluated using a supercomputer class vectorizing compiler and the Sandblaster low-power multithreaded processor for software defined radio. The proposed instruction set extensions provide significant performance improvements, while maintaining a high degree of programmability. Categories and Subject Descriptors: C.3 [Computer Systems Organization]: Special-purpose and Application-based Systems - Real-time and embedded sys..

An Aggressive Approach to Loop Unrolling

Authors: Jack W. Davidson, Sanjay Jinturkar

A well-known code transformation for improving the execution performance of a program is loop unrolling. The most obvious benefit of unrolling a loop is that the transformed loop usually, but not always, requires fewer instruction executions than the original loop. The reduction in instruction executions comes from two sources: the number of branch instructions executed is reduced, and the index variable is modified fewer times. In addition, for architectures with features designed to exploit instruction-level parallelism, loop unrolling can expose greater levels of instruction-level parallelism. Loop unrolling is an effective code transformation, often improving the execution performance of programs that spend much of their execution time in loops by ten to thirty percent. Possibly because of the effectiveness of a simple application of loop unrolling, it has not been studied as extensively as other code improvements such as register allocation or common subexpression elimination. The r..
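
The two savings named in the abstract above (fewer branches, fewer index updates) can be seen in a minimal sketch. Python is used only to show the shape of the transformation; a compiler would apply it to machine-level loops. The function names and the unroll factor of 4 are illustrative assumptions, not taken from the paper.

```python
def sum_rolled(values):
    """Reference loop: one loop test and one index update per element.

    Returns (total, loop_iterations).
    """
    total = 0
    iterations = 0
    i = 0
    n = len(values)
    while i < n:
        total += values[i]
        i += 1
        iterations += 1
    return total, iterations


def sum_unrolled_by_4(values):
    """Same sum with the loop body replicated four times (assumed factor).

    Returns (total, loop_iterations), counting main and cleanup iterations.
    """
    total = 0
    iterations = 0
    i = 0
    n = len(values)
    main_end = n - (n % 4)  # largest multiple of 4 that fits
    while i < main_end:
        # Four copies of the body share one loop test and one index update.
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
        i += 4
        iterations += 1
    # Cleanup loop handles the 0 to 3 leftover elements.
    while i < n:
        total += values[i]
        i += 1
        iterations += 1
    return total, iterations
```

For ten elements, the rolled loop runs ten iterations, while the unrolled version runs two main iterations plus two cleanup iterations. The cleanup loop is one reason the transformed code "usually, but not always" executes fewer instructions.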

Improving instruction-level parallelism by loop unrolling and dynamic memory disambiguation

Authors: Jack W. Davidson, Sanjay Jinturkar
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1995

Exploitation of instruction-level parallelism is an effective mechanism for improving the performance of modern superscalar/VLIW processors. Various software techniques can be applied to increase instruction-level parallelism. This paper describes and evaluates a software technique, dynamic memory disambiguation, that permits loops containing loads and stores to be scheduled more aggressively, thereby exposing more instruction-level parallelism. The results of our evaluation show that when dynamic memory disambiguation is applied in conjunction with loop unrolling, register renaming, and static memory disambiguation, the ILP of memory-intensive benchmarks can be increased by as much as 300 percent over loops where dynamic memory disambiguation is not performed. Our measurements also indicate that for the programs that benefit the most from these optimizations, the register usage does not exceed the number of registers on most high-performance processors. Keywords: loop unrolling, dyn..
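
The general idea behind run-time disambiguation can be sketched as follows. This illustrates the concept only, not the algorithm evaluated in the paper: a run-time overlap test selects between a version of the loop whose loads are hoisted above its stores and a version that keeps the original order. Memory is modeled as a Python list, and every name here is a hypothetical chosen for the example.

```python
def ranges_overlap(a, b, n):
    """True if [a, a+n) and [b, b+n) share any index (conservative test)."""
    return a < b + n and b < a + n


def add_one_in_order(mem, dst, src, n):
    """Safe version: each load follows the previous iteration's store."""
    for i in range(n):
        mem[dst + i] = mem[src + i] + 1


def add_one_reordered(mem, dst, src, n):
    """Aggressive version: all loads are hoisted above all stores.

    Only equivalent to the in-order loop when no store feeds a later load.
    """
    loaded = [mem[src + i] for i in range(n)]
    for i in range(n):
        mem[dst + i] = loaded[i] + 1


def add_one_disambiguated(mem, dst, src, n):
    """Run-time check picks the aggressive version only when it is safe."""
    if ranges_overlap(dst, src, n):
        add_one_in_order(mem, dst, src, n)
    else:
        add_one_reordered(mem, dst, src, n)
```

With `mem = [10, 0, 0, 0, 0, 0]`, `dst = 1`, `src = 0` and `n = 4`, each store feeds the next load, so the check falls back to the in-order loop. When the two ranges are disjoint, the reordered version gives the same result and leaves a scheduler free to overlap the loads.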

Memory Access Coalescing: A Technique for Eliminating Redundant Memory Accesses

Authors: Jack W. Davidson, Sanjay Jinturkar

As microprocessor speeds increase, memory bandwidth is increasingly the performance bottleneck for microprocessors. This has occurred because innovation and technological improvements in processor design have outpaced advances in memory design. Most attempts at addressing this problem have involved hardware solutions. Unfortunately, these solutions do little to help the situation with respect to current microprocessors. In previous work, we developed, implemented, and evaluated an algorithm that exploited the ability of newer machines with wide buses to load/store multiple floating-point operands in a single memory reference. This paper describes a general code improvement algorithm that transforms code to better exploit the available memory bandwidth on existing microprocessors as well as wide-bus machines. Where possible and advantageous, the algorithm coalesces narrow memory references into wide ones. An interesting characteristic of the algorithm is that some decisions about the ap..
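
Coalescing narrow references into wide ones can be illustrated with a small sketch. It is not the paper's algorithm: it replaces two adjacent 16-bit loads with one 32-bit load and then extracts the halves with a mask and a shift. The byte buffer, the little-endian layout and the function names are assumptions made for the example; the standard `struct` module stands in for load instructions.

```python
import struct


def sum_pairs_narrow(buf, pair_count):
    """Reads each pair of adjacent 16-bit values with two narrow loads.

    Returns (total, number_of_loads).
    """
    total = 0
    loads = 0
    for i in range(pair_count):
        (low,) = struct.unpack_from("<H", buf, 4 * i)
        (high,) = struct.unpack_from("<H", buf, 4 * i + 2)
        loads += 2
        total += low + high
    return total, loads


def sum_pairs_coalesced(buf, pair_count):
    """Reads each pair with one 32-bit load, then splits it in registers.

    Returns (total, number_of_loads).
    """
    total = 0
    loads = 0
    for i in range(pair_count):
        (word,) = struct.unpack_from("<I", buf, 4 * i)
        loads += 1
        low = word & 0xFFFF   # first 16-bit value (little-endian layout)
        high = word >> 16     # second 16-bit value
        total += low + high
    return total, loads
```

The coalesced loop halves the number of memory references at the cost of extra shift and mask operations, which is why such a transformation pays off only "where possible and advantageous". On real hardware the wide access typically also needs suitable alignment.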

IMPLEMENTATION OF H.264 DECODER ON SANDBLASTER DSP

Authors: John Glossner, Mayan Moudgill, Sanjay Jinturkar, Vaidyanathan Ramadurai
Publication date: 05/03/2020

This paper presents the optimization techniques and results of implementing the H.264/AVC baseline profile decoder in software on the Sandblaster digital signal processor. It has been implemented in ANSI C and optimized to exploit the architectural features of the processor. The software implementation enables the reusability of the processor and lowers the development costs

Programming The Sandbridge Multithreaded Processor

Authors: Erdem Hokenek, John Glossner, Mayan Moudgill, Sanjay Jinturkar

Programmer productivity is a major concern in the development of complex DSP and SDR applications. As most classical DSPs are programmed in assembly language, it takes a large software effort to develop an application. For modern speech coders it may take nine months or more before the application performance is known. Then, an intensive period of design verification ensues. This extended period of development and verification can be minimized if a user-friendly software tool chain capable of generating efficient code for the DSP processor [4] from applications written in C is available. The Sandbridge Technologies software tool chain is a very user-friendly tool chain, capable of generating highly efficient object code for out-of-the-box C code and of simulating the code in an ultra-fast simulation environment. This tool chain provides significant advantages in software productivity