Search CORE

462 research outputs found

LUXOR: An FPGA Logic Cell Architecture for Efficient Compressor Tree Implementations

Author: Ajay
Dadda Luigi
Earle J. G.
Ercegovac Milos D
Hubara Itay
IBM ILOG CPLEX.
Kumm Martin
Umuroglu Yaman
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 06/03/2020
Field of study

We propose two tiers of modifications to FPGA logic cell architecture to deliver a variety of performance and utilization benefits with only minor area overheads. In the irst tier, we augment existing commercial logic cell datapaths with a 6-input XOR gate in order to improve the expressiveness of each element, while maintaining backward compatibility. This new architecture is vendor-agnostic, and we refer to it as LUXOR. We also consider a secondary tier of vendor-speciic modifications to both Xilinx and Intel FPGAs, which we refer to as X-LUXOR+ and I-LUXOR+ respectively. We demonstrate that compressor tree synthesis using generalized parallel counters (GPCs) is further improved with the proposed modifications. Using both the Intel adaptive logic module and the Xilinx slice at the 65nm technology node for a comparative study, it is shown that the silicon area overhead is less than 0.5% for LUXOR and 5-6% for LUXOR+, while the delay increments are 1-6% and 3-9% respectively. We demonstrate that LUXOR can deliver an average reduction of 13-19% in logic utilization on micro-benchmarks from a variety of domains.BNN benchmarks benefit the most with an average reduction of 37-47% in logic utilization, which is due to the highly-efficient mapping of the XnorPopcount operation on our proposed LUXOR+ logic cells.Comment: In Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA'20), February 23-25, 2020, Seaside, CA, US

arXiv.org e-Print Archive

Crossref

A low cost reconfigurable soft processor for multimedia applications: design synthesis and programming model

Author: Chalamalasetti S.R.
Margala M.
Purohit S.
Vanderbauwhede W.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009
Field of study

This paper presents an FPGA implementation of a low cost 8 bit reconfigurable processor core for media processing applications. The core is optimized to provide all basic arithmetic and logic functions required by the media processing and other domains, as well as to make it easily integrable into a 2D array. This paper presents an investigation of the feasibility of the core as a potential soft processing architecture for FPGA platforms. The core was synthesized on the entire Virtex FPGA family to evaluate its overall performance, scalability and portability. A special feature of the proposed architecture is its simple programming model which allows low level programming. Throughput results for popular benchmarks coded using the programming model and cycle accurate simulator are presented

Crossref

Enlighten

Arithmetic core generation using bit heaps

Author: Brunie Nicolas
de Dinechin Florent
Illyes Kinga
Istoan Matei
Popa Bogdan
Sergent Guillaume
Publication venue: HAL CCSD
Publication date: 02/09/2013
Field of study

International audienceA bit heap is a data structure that holds the unevaluated sum of an arbitrary number of bits, each weighted by some power of two. Most advanced arithmetic cores can be viewed as involving one or several bit heaps. We claim here that this point of view leads to better global optimization at the algebraic level, at the circuit level, and in terms of software engineering. To demonstrate it, a generic software framework is introduced for the definition and optimization of bit heaps. This framework, targeting DSP-enabled FPGAs, is developed within the open-source FloPoCo arithmetic core generator. Its versatility is demonstrated on several examples: multipliers, complex multipliers, polynomials, and discrete cosine transform

HAL-ENS-LYON

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

A Scalable Correlator Architecture Based on Modular FPGA Hardware, Reuseable Gateware, and Data Packetization

Author: Aaron Parsons
Andrew Siemion
Arash Parsa
Blackman R.
Bradley R.
Dan Werthimer
David MacMahon
Demorest P.
Donald Backer
Heiles C.
Henry Chen
Jason Manley
Melvyn Wright
Peter McMahon
Pierre Droz
Terry Filiba
Weinreb S.
Yen J. L.
Publication venue: 'University of Chicago Press'
Publication date: 17/03/2009
Field of study

A new generation of radio telescopes is achieving unprecedented levels of sensitivity and resolution, as well as increased agility and field-of-view, by employing high-performance digital signal processing hardware to phase and correlate large numbers of antennas. The computational demands of these imaging systems scale in proportion to BMN^2, where B is the signal bandwidth, M is the number of independent beams, and N is the number of antennas. The specifications of many new arrays lead to demands in excess of tens of PetaOps per second. To meet this challenge, we have developed a general purpose correlator architecture using standard 10-Gbit Ethernet switches to pass data between flexible hardware modules containing Field Programmable Gate Array (FPGA) chips. These chips are programmed using open-source signal processing libraries we have developed to be flexible, scalable, and chip-independent. This work reduces the time and cost of implementing a wide range of signal processing systems, with correlators foremost among them,and facilitates upgrading to new generations of processing technology. We present several correlator deployments, including a 16-antenna, 200-MHz bandwidth, 4-bit, full Stokes parameter application deployed on the Precision Array for Probing the Epoch of Reionization.Comment: Accepted to Publications of the Astronomy Society of the Pacific. 31 pages. v2: corrected typo, v3: corrected Fig. 1

arXiv.org e-Print Archive

Crossref

Block-synchronous Harmonic Control for Scalable Trajectory Planning

Author: Amine Boumaza
Bernard Girau
Bruno Scherrer
Cesar Torres-Huitzil
Publication venue: 'IntechOpen'
Publication date: 01/01/2008
Field of study

ISBN : 978-953-7619-20-6Trajectory planning consists in finding a way to get from a starting position to a goal position while avoiding obstacles within a given environment or navigation space. Harmonic functions may be used as potential fields for trajectory planning. Such functions do not have local extrema, so that control algorithms may reduce to locally descend the potential field until reaching a minimum, when obstacles correspond to maxima of the potential and goals correspond to minima. This chapter presents a parallel hardware implementation of this navigation method on reconfigurable digital circuits. Trajectories are estimated after the iterated computation of the harmonic function, given the goal and obstacle positions of the navigation problem. The proposed massively distributed implementation locally computes the direction to choose to get to the goal position at any point of the environment. Changes in this environment may be immediately taken into account, for example when obstacles are discovered during an on-line exploration. To fit real-world applications, our implementation has been designed to deal with very large navigation environments while optimizing computation time

IntechOpen

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Energy-Efficient FPGA-Based Parallel Quasi-Stochastic Computing

Author: Choi Minsu
Metku Prashanthi
Seva Ramu
Publication venue: Scholars\u27 Mine
Publication date: 01/11/2017
Field of study

The high performance of FPGA (Field Programmable Gate Array) in image processing applications is justified by its flexible reconfigurability, its inherent parallel nature and the availability of a large amount of internal memories. Lately, the Stochastic Computing (SC) paradigm has been found to be significantly advantageous in certain application domains including image processing because of its lower hardware complexity and power consumption. However, its viability is deemed to be limited due to its serial bitstream processing and excessive run-time requirement for convergence. To address these issues, a novel approach is proposed in this work where an energy-efficient implementation of SC is accomplished by introducing fast-converging Quasi-Stochastic Number Generators (QSNGs) and parallel stochastic bitstream processing, which are well suited to leverage FPGA\u27s reconfigurability and abundant internal memory resources. The proposed approach has been tested on the Virtex-4 FPGA, and results have been compared with the serial and parallel implementations of conventional stochastic computation using the well-known SC edge detection and multiplication circuits. Results prove that by using this approach, execution time, as well as the power consumption are decreased by a factor of 3.5 and 4.5 for the edge detection circuit and multiplication circuit, respectively

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine