
Hardware and software status of QCDOC

Author: A. Gara
A. Yamaguchi
B. Joó
Bodin
Boyle
C. Cristian
C. Jung
C. Kim
Chen
D. Chen
G. Liu
K. Petrov
L. Levkova
M. Clark
N.H. Christ
P.A. Boyle
R.D. Mawhinney
S. Ohta
S.D. Cohen
T. Wettig
X. Liao
Z. Dong
Publication venue: 'Elsevier BV'
Publication date: 15/09/2003

QCDOC is a massively parallel supercomputer whose processing nodes are based on an application-specific integrated circuit (ASIC). This ASIC was custom-designed so that crucial lattice QCD kernels achieve an overall sustained performance of 50% on machines with several tens of thousands of nodes. This strong scalability, together with low power consumption and a price/performance ratio of $1 per sustained MFlops, enables QCDOC to attack the most demanding lattice QCD problems. The first ASICs became available in June of 2003, and the testing performed so far has shown all systems functioning according to specification. We review the hardware and software status of QCDOC and present performance figures obtained in real hardware as well as in simulation. Comment: Lattice2003(machine), 6 pages, 5 figures

arXiv.org e-Print Archive

Crossref

A High performance and low cost hardware architecture for H.264 transform and quantization algorithms

Author: Hamzaoglu Ilker
Hamzaoğlu İlker
Tasdizen Ozgur
Taşdizen Özgür
Publication date: 01/09/2005

In this paper, we present a high performance and low cost hardware architecture for real-time implementation of the forward transform and quantization and the inverse transform and quantization algorithms used in the H.264 / MPEG4 Part 10 video coding standard. The hardware architecture is based on a reconfigurable datapath with only one multiplier. This hardware is designed to be used as part of a complete low power H.264 video coding system for portable applications. The proposed architecture is implemented in Verilog HDL. The Verilog RTL code is verified to work at 81 MHz in a Xilinx Virtex II FPGA and at 210 MHz in a 0.18 µm ASIC implementation. The FPGA and ASIC implementations can code 27 and 70 VGA frames (640x480) per second, respectively.
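For orientation, the forward transform named in this abstract can be illustrated in software. The following is a minimal sketch of the standard H.264 4x4 forward integer core transform, Y = Cf · X · Cf^T. It is not the authors' single-multiplier datapath, and it omits the quantization stage. The function names are hypothetical and chosen only for this illustration.

```python
# Forward 4x4 integer core transform matrix used by H.264.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul4(a, b):
    """Multiply two 4x4 integer matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


def transpose4(m):
    """Return the transpose of a 4x4 matrix."""
    return [list(row) for row in zip(*m)]


def forward_core_transform(block):
    """Compute Y = CF * X * CF^T for a 4x4 residual block X (illustrative helper)."""
    return matmul4(matmul4(CF, block), transpose4(CF))


if __name__ == "__main__":
    flat_block = [[1, 1, 1, 1] for _ in range(4)]
    for row in forward_core_transform(flat_block):
        print(row)
```

A constant block puts all of its energy in the top-left (DC) coefficient, which is why a flat input is a convenient sanity check. The matrix contains only the values ±1 and ±2, so the core transform itself needs only additions and shifts.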

CiteSeerX

Sabanci University Research Database

A High performance and low power hardware architecture for H.264 CAVLC algorithm

Author: Hamzaoğlu İlker
Şahin Esra
Publication date: 01/09/2005

In this paper, we present a high performance and low power hardware architecture for real-time implementation of the Context Adaptive Variable Length Coding (CAVLC) algorithm used in the H.264 / MPEG4 Part 10 video coding standard. This hardware is designed to be used as part of a complete low power H.264 video coding system for portable applications. The proposed architecture is implemented in Verilog HDL. The Verilog RTL code is verified to work at 76 MHz in a Xilinx Virtex II FPGA and at 233 MHz in a 0.18 µm ASIC implementation. The FPGA and ASIC implementations can code 22 and 67 VGA frames (640x480) per second, respectively.

Sabanci University Research Database

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

A Micro Power Hardware Fabric for Embedded Computing

Author: Mehta Gayatri
Publication date: 25/09/2009

Field Programmable Gate Arrays (FPGAs) mitigate many of the problems encountered with the development of ASICs by offering flexibility, faster time-to-market, and amortized NRE costs, among other benefits. While FPGAs are increasingly being used for complex computational applications such as signal and image processing, networking, and cryptology, they are far from ideal for these tasks due to relatively high power consumption and silicon usage overheads compared to direct ASIC implementation. A reconfigurable device that exhibits ASIC-like power characteristics and FPGA-like costs and tool support is desirable to fill this void. In this research, a parameterized, reconfigurable fabric model named domain specific fabric (DSF) is developed that exhibits ASIC-like power characteristics for Digital Signal Processing (DSP) style applications. Using this model, the impact of varying different design parameters on power and performance has been studied. Optimization techniques such as local search and simulated annealing are used to determine the appropriate interconnect for a specific set of applications. A design space exploration tool has been developed to automatically generate a tailored architectural instance of the fabric. The fabric has been synthesized on a 160 nm cell-based ASIC fabrication process from OKI and a 130 nm process from IBM. A detailed power-performance analysis has been completed using signal and image processing benchmarks from the MediaBench benchmark suite and elsewhere, with comparisons to other hardware and software implementations. The optimized fabric implemented using the 130 nm process yields energy within 3X of a direct ASIC implementation, 330X better than a Virtex-II Pro FPGA, and 2016X better than an Intel XScale processor.

D-Scholarship@Pitt

Protein alignment HW/SW optimizations

Author: Awais Muhammad
Frache Stefano
Graziano Mariagrazia
Urgese Gianvito
Vacca Marco
Zamboni Maurizio
Publication venue: IEEE - INST ELECTRICAL ELECTRONICS ENGINEERS INC
Publication date: 01/01/2012

Biosequence alignment has recently received remarkable support from both commodity and dedicated hardware platforms. The limitless requirements of this application motivate the search for improved implementations to boost processing speed and capabilities. We propose an unprecedented hardware improvement to the classic Smith-Waterman (S-W) algorithm based on a twofold approach: i) an on-the-fly gap-open/gap-extension selection that reduces the hardware implementation complexity; ii) a pre-selection filter that uses reduced amino-acid alphabets to screen out non-significant sequences and to shorten the S-W iterations on huge reference databases. We demonstrated the improvements with respect to a classic approach, both from the point of view of algorithm efficiency and of HW performance (FPGA and ASIC post-synthesis analysis).
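As background for the gap-open/gap-extension distinction mentioned in this abstract, here is a minimal software sketch of the classic Smith-Waterman local alignment score with affine gap penalties (the Gotoh formulation). It represents the baseline, not the authors' on-the-fly selection scheme or their pre-selection filter. The scoring values and the function name are assumptions chosen for illustration; real protein alignment would use a substitution matrix such as BLOSUM instead of a flat match/mismatch score.

```python
def smith_waterman_affine(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Return the best local alignment score of sequences a and b.

    gap_open is the score of the first residue in a gap and gap_extend the
    score of each further residue (assumed illustrative values).
    """
    neg_inf = float("-inf")
    rows, cols = len(a) + 1, len(b) + 1
    # H: best score ending at (i, j); E / F: best score ending in a gap.
    H = [[0] * cols for _ in range(rows)]
    E = [[neg_inf] * cols for _ in range(rows)]
    F = [[neg_inf] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            # Either open a new gap from H or extend an existing one.
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            score = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores never drop below zero.
            H[i][j] = max(0, H[i - 1][j - 1] + score, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_affine("ACGT", "ACGT"))
```

Each cell takes the maximum of a gap-open term and a gap-extend term in two separate recurrences; this per-cell choice is the part of the classic algorithm that the abstract's first contribution targets in hardware.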

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Auto-Generation of Pipelined Hardware Designs for Polar Encoder

Author: You Xiaohu
Zhang Chuan
Zhong Zhiwei
Publication date: 01/01/2018

This paper presents a general framework for auto-generation of pipelined polar encoder architectures. The proposed framework could be well represented by a general formula. Given arbitrary code length N and the level of parallelism M, the formula could specify the corresponding hardware architecture. We have written a compiler which could read the formula and then automatically generate its register-transfer level (RTL) description suitable for FPGA or ASIC implementation. With this hardware generation system, one could explore the design space and make a trade-off between cost and performance. Our experimental results have demonstrated the efficiency of this auto-generator for polar encoder architectures.
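The abstract does not reproduce the framework's formula, so the following is only a software reference for the computation a polar encoder of length N performs: x = u · F^(⊗n) over GF(2), with F = [[1, 0], [1, 1]] and N = 2^n. It is a minimal sketch that omits the bit-reversal permutation and says nothing about pipelining or the parallelism level M. The function name is hypothetical.

```python
def polar_encode(u):
    """Encode bit list u (length a power of two) as x = u * F^(kron n) over GF(2).

    Illustrative reference model only; the bit-reversal permutation is omitted.
    """
    n = len(u)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("code length must be a power of two")
    x = list(u)
    step = 1
    # Each pass is one butterfly stage: the upper branch takes the XOR of both
    # inputs, the lower branch passes through unchanged.
    while step < n:
        for start in range(0, n, 2 * step):
            for j in range(start, start + step):
                x[j] ^= x[j + step]
        step *= 2
    return x


if __name__ == "__main__":
    print(polar_encode([1, 1, 0, 1]))
```

The log2(N) butterfly stages of XOR operations in this loop are the structure that a hardware generator would map onto pipeline stages. Such a model can also serve as a golden reference when checking generated RTL in simulation.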

arXiv.org e-Print Archive

Crossref

Real Time 3-D Graphics Processing Hardware Design using Field-Programmable Gate Arrays.

Author: Warner James Ryan
Publication date: 28/01/2009

Three dimensional graphics processing requires many complex algebraic and matrix based operations to be performed in real time. In the early stages of graphics processing, such tasks were delegated to a Central Processing Unit (CPU). Over time, as more complex graphics rendering was demanded, CPU solutions became inadequate. To meet this demand, custom hardware solutions that take advantage of pipelining and massive parallelism became preferable to CPU software based solutions. This fact has led to the many custom hardware solutions that are available today. Since real time graphics processing requires extremely high performance, hardware solutions using Application Specific Integrated Circuits (ASICs) are the standard within the industry. While ASICs are a more than adequate solution for implementing high performance custom hardware, the design, implementation and testing of ASIC based designs are becoming cost prohibitive due to the massive up front verification effort needed as well as the cost of fixing design defects. Field Programmable Gate Arrays (FPGAs) provide an alternative to the ASIC design flow. More importantly, in recent years FPGA technology has begun to improve in performance to the point where ASIC and FPGA performance has become comparable. In addition, FPGAs address many of the issues of the ASIC design flow. The ability to reconfigure FPGAs reduces the upfront verification effort and allows design defects to be fixed easily. This thesis demonstrates that a 3-D graphics processor implementation on an FPGA is feasible by implementing both a two dimensional and a three dimensional graphics processor prototype. Using a Xilinx Virtex 5 ML506 FPGA development kit, a fully functional wireframe graphics rendering engine is implemented using VHDL and Xilinx's development tools. A VHDL testbench was designed to verify that the graphics engine works functionally. This is followed by synthesizing the design on real hardware and developing test applications to verify the functionality and performance of the design. This thesis provides the groundwork for pushing forward the use of FPGA technology in graphics processing applications.

D-Scholarship@Pitt