Search CORE

727 research outputs found

Design of application specific instruction set processors for the EFT and FHT algorithms

Author: Atak Oğuzhan
Publication venue: Bilkent University
Publication date: 01/01/2006
Field of study

Cataloged from PDF version of article.Orthogonal Frequency Division Multiplexing (OFDM) is a multicarrier transmission technique which is used in many digital communication systems. In this technique, Fast Fourier Transformation (FFT) and inverse FFT (IFFT) are kernel processing blocks which are used for data modulation and demodulation respectively. Another algorithm which can be used for multi-carrier transmission is the Fast Hartley Transform algorithm. The FHT is a real valued transformation and can give significantly better results than FFT algorithm in terms of energy efficiency, speed and die area. This thesis presents Application Specific Instruction Set Processors (ASIP) for the FFT and FHT algorithms. ASIPs combine the flexibility of general purpose processors and efficiency of application specific integrated circuits (ASIC). Programmability makes the processor flexible and special instructions, memory architecture and pipeline makes the processor efficient. In order to design a low power processor we have selected the recently proposed cached FFT algorithm which outperforms standard FFT. For the cached FFT algorithm we have designed two ASIPs one having a single execution unitAtak, OğuzhanM.S

Bilkent University Institutional Repository

Cache Equalizer: A Cache Pressure Aware Block Placement Scheme for Large-Scale Chip Multiprocessors

Author: Cho Sangyeun
Hammoud Mohammad
Melhem Rami
Publication venue: Department of Computer Science, University of Pittsburgh
Publication date: 01/01/2009
Field of study

This paper describes Cache Equalizer (CE), a novel distributed cache management scheme for large scale chip multiprocessors (CMPs). Our work is motivated by large asymmetry in cache sets usages. CE decouples the physical locations of cache blocks from their addresses for the sake of reducing misses caused by destructive interferences. Temporal pressure at the on-chip last-level cache, is continuously collected at a group (comprised of cache sets) granularity, and periodically recorded at the memory controller to guide the placement process. An incoming block is consequently placed at a cache group that exhibits the minimum pressure. CE provides Quality of Service (QoS) by robustly offering better performance than the baseline shared NUCA cache. Simulation results using a full-system simulator demonstrate that CE outperforms shared NUCA caches by an average of 15.5% and by as much as 28.5% for the benchmark programs we examined. Furthermore, evaluations manifested the outperformance of CE versus related CMP cache designs

D-Scholarship@Pitt

Design of High Speed Memory-Based FFT Processor Using 90nm Technology

Author: T. Prasada Babu
Publication venue: Auricle Global Society of Education and Research
Publication date: 04/05/2024
Field of study

In order to enhance performance, the Fast Fourier Transformation is a important operation in Digital Signal Processing (DSP) systems had been extensively studied. State-of-the-art transmission technology uses Orthogonal frequency division multiplexing (OFDM), which primary operation is the Fast fourier transform (FFT). This analysis presents the design of a high-speed memory-based FFT processor using 90nm technology. The novel hybrid multiplier and hybrid adder is used in this analysis. The main objective of this method is to develop an efficient, memory-efficient FFT processor that requires less area.  Using 90nm CMOS (Complementary Metal Oxide Semiconductor) technology, the proposed FFT processor was created and implemented in process. With reduced processing time, this means that the proposed FFT processor performs better than the prior memory-based FFT processors in terms of performance and the number of LUTs required which reduces area and memory utilization

International Journal on Recent and Innovation Trends in Computing and Communication

Overview of Parallel Platforms for Common High Performance Computing

Author: Adamec Filip
Fryza Tomas
Marsalek Roman
Prokopec Jan
Svobodova Jitka
Publication venue: Společnost pro radioelektronické inženýrství
Publication date: 01/04/2012
Field of study

The paper deals with various parallel platforms used for high performance computing in the signal processing domain. More precisely, the methods exploiting the multicores central processing units such as message passing interface and OpenMP are taken into account. The properties of the programming methods are experimentally proved in the application of a fast Fourier transform and a discrete cosine transform and they are compared with the possibilities of MATLAB's built-in functions and Texas Instruments digital signal processors with very long instruction word architectures. New FFT and DCT implementations were proposed and tested. The implementation phase was compared with CPU based computing methods and with possibilities of the Texas Instruments digital signal processing library on C6747 floating-point DSPs. The optimal combination of computing methods in the signal processing domain and new, fast routines' implementation is proposed as well

Directory of Open Access Journals

Digital library of Brno University of Technology

Asynchronous Validity Resolution in Sequentially Consistent Shared Virtual Memory

Author: Thomas Jonathan
Publication venue: DigitalCommons@UMaine
Publication date: 01/01/2001
Field of study

Shared Virtual Memory (SVM) is an effort to provide a mechanism for a distributed system, such as a cluster, to execute shared memory parallel programs. Unfortunately, SVM has performance problems due to its underlying distributed architecture. Recent developments have increased performance of SVM by reducing communication. Unfortunately this performance gain was only possible by increasing programming complexity and by restricting the types of programs allowed to execute in the system. Validity resolution is the process of resolving the validity of a memory object such as a page. Current SVM systems use synchronous or deferred validity resolution techniques in which user processing is blocked during the validity resolution process. This is the case even when resolving validity of false shared variables. False-sharing occurs when two or more processes access unrelated variables stored within the same shared block of memory and at least one of the processes is writing. False sharing unnecessarily reduces overall performance of SVM systems?because user processing is blocked during validity resolution although no actual data dependencies exist. This thesis presents Asynchronous Validity Resolution (AVR), a new approach to SVM which reduces the performance losses associated with false sharing while maintaining the ease of programming found with regular shared memory parallel programming methodology. Asynchronous validity resolution allows concurrent user process execution and data validity resolution. AVR is evaluated by com-paring performance of an application suite using both an AVR sequentially con-sistent SVM system and a traditional sequentially consistent (SC) SVM system. The results show that AVR can increase performance over traditional sequentially consistent SVM for programs which exhibit false sharing. Although AVR outperforms regular SC by as much as 26%, performance of AVR is dependent on the number of false-sharing vs. true-sharing accesses, the number of pages in the program’s working set, the amount of user computation that completes per page request, and the internodal round-trip message time in the system. Overall, the results show that AVR could be an important member of the arsenal of tools available to parallel programmers

University of Maine

Fast Fourier Transform Processors: Implementing FFT and IFFT Cores for OFDM Communication Systems

Author: A. Cortes
I. Velez
J. F. Sevillano
M. Turrillas
Publication venue: 'IntechOpen'
Publication date: 11/04/2012
Field of study

IntechOpen

A tuneable software cache coherence protocol for heterogeneous MPSoCs

Author: Ophelders F.E.B.
Publication venue
Publication date: 01/01/2009
Field of study

Repository TU/e

Pure OAI Repository

Software Coherence in Multiprocessor Memory Systems

Author: Bolosky William Joseph
Publication venue
Publication date
Field of study

Processors are becoming faster and multiprocessor memory interconnection systems are not keeping up. Therefore, it is necessary to have threads and the memory they access as near one another as possible. Typically, this involves putting memory or caches with the processors, which gives rise to the problem of coherence: if one processor writes an address, any other processor reading that address must see the new value. This coherence can be maintained by the hardware or with software intervention. Systems of both types have been built in the past; the hardware-based systems tended to outperform the software ones. However, the ratio of processor to interconnect speed is now so high that the extra overhead of the software systems may no longer be significant. This issue is explored both by implementing a software maintained system and by introducing and using the technique of offline optimal analysis of memory reference traces. It finds that in properly built systems, software maintained coherence can perform comparably to or even better than hardware maintained coherence. The architectural features necessary for efficient software coherence to be profitable include a small page size, a fast trap mechanism, and the ability to execute instructions while remote memory references are outstanding

NASA Technical Reports Server

Deadlock-free fine-grained thread migration

Author: Keun Sup Shim
Mieszko Lis
Myong Hyon Cho
Omer Khan
Srinivas Devadas
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2011
Field of study

Several recent studies have proposed fine-grained, hardware-level thread migration in multicores as a solution to power, reliability, and memory coherence problems. The need for fast thread migration has been well documented, however, a fast, deadlock-free migration protocol is sorely lacking: existing solutions either deadlock or are too slow and cumbersome to ensure performance with frequent, fine-grained thread migrations. In this study, we introduce the Exclusive Native Context (ENC) protocol, a general, provably deadlock-free migration protocol for instruction-level thread migration architectures. Simple to implement, ENC does not require additional hardware beyond common migration-based architectures. Our evaluation using synthetic migrations and the SPLASH-2 application suite shows that ENC offers performance within 11.7% of an idealized deadlock-free migration protocol with infinite resources

CiteSeerX

DSpace@MIT

Crossref