Search CORE

22,227 research outputs found

Mapping DSP algorithms to a reconfigurable architecture Adaptive Wireless Networking (AWGN)

Author: Rauwerda Gerard
Publication venue
Publication date: 01/01/2003
Field of study

This report will discuss the Adaptive Wireless Networking project. The vision of the Adaptive Wireless Networking project will be given. The strategy of the project will be the implementation of multiple communication systems in dynamically reconfigurable heterogeneous hardware. An overview of a wireless LAN communication system, namely HiperLAN/2, and a Bluetooth communication system will be given. Possible implementations of these systems in a dynamically reconfigurable architecture are discussed. Suggestions for future activities in the Adaptive Wireless Networking project are also given

University of Twente Research Information

Efficient Irregular Wavefront Propagation Algorithms on Hybrid CPU-GPU Machines

Author: Cooper Lee
Kong Jun
Kurc Tahsin
Pan Tony
Saltz Joel
Teodoro George
Publication venue
Publication date: 14/09/2012
Field of study

In this paper, we address the problem of efficient execution of a computation pattern, referred to here as the irregular wavefront propagation pattern (IWPP), on hybrid systems with multiple CPUs and GPUs. The IWPP is common in several image processing operations. In the IWPP, data elements in the wavefront propagate waves to their neighboring elements on a grid if a propagation condition is satisfied. Elements receiving the propagated waves become part of the wavefront. This pattern results in irregular data accesses and computations. We develop and evaluate strategies for efficient computation and propagation of wavefronts using a multi-level queue structure. This queue structure improves the utilization of fast memories in a GPU and reduces synchronization overheads. We also develop a tile-based parallelization strategy to support execution on multiple CPUs and GPUs. We evaluate our approaches on a state-of-the-art GPU accelerated machine (equipped with 3 GPUs and 2 multicore CPUs) using the IWPP implementations of two widely used image processing operations: morphological reconstruction and euclidean distance transform. Our results show significant performance improvements on GPUs. The use of multiple CPUs and GPUs cooperatively attains speedups of 50x and 85x with respect to single core CPU executions for morphological reconstruction and euclidean distance transform, respectively.Comment: 37 pages, 16 figure

arXiv.org e-Print Archive

CiteSeerX

Realizing arbitrary-precision modular multiplication with a fixed-precision multiplier datapath

Author: Grossschaedl Johann
Savas Erkay
Savaş Erkay
Yumbul Kazım
Yumbul Kazim
Publication venue: IEEE (Institute of Electrical and Electronics Engineers)
Publication date: 30/09/2009
Field of study

Within the context of cryptographic hardware, the term scalability refers to the ability to process operands of any size, regardless of the precision of the underlying data path or registers. In this paper we present a simple yet effective technique for increasing the scalability of a fixed-precision Montgomery multiplier. Our idea is to extend the datapath of a Montgomery multiplier in such a way that it can also perform an ordinary multiplication of two n-bit operands (without modular reduction), yielding a 2n-bit result. This conventional (nxn->2n)-bit multiplication is then used as a “sub-routine” to realize arbitrary-precision Montgomery multiplication according to standard software algorithms such as Coarsely Integrated Operand Scanning (CIOS). We show that performing a 2n-bit modular multiplication on an n-bit multiplier can be done in 5n clock cycles, whereby we assume that the n-bit modular multiplication takes n cycles. Extending a Montgomery multiplier for this extra functionality requires just some minor modifications of the datapath and entails a slight increase in silicon area

Crossref

Sabanci University Research Database

Open Repository and Bibliography - Luxembourg

Crypto-test-lab for security validation of ECC co-processor test infrastructure

Author: Lupón Roses Emilio
Manich Bou Salvador
Rodríguez Montañés Rosa
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

© 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting /republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other worksElliptic Curve Cryptography (ECC) is a technology for public-key cryptography that is becoming increasingly popular because it provides greater speed and implementation compactness than other public-key technologies. Calculations, however, may not be executed by software, since it would be so time consuming, thus an ECC co-processor is commonly included to accelerate the speed. Test infrastructure in crypto co-processors is often avoided because it poses serious security holes against adversaries. However, ECC co-processors include complex modules for which only functional test methodologies are unsuitable, because they would take an unacceptably long time during the production test. Therefore, some internal test infrastructure is always included to permit the application of structural test techniques. Designing a secure test infrastructure is quite a complex task that relies on the designer's experience and on trial & error iterations over a series of different types of attacks. Most of the severe attacks cannot be simulated because of the demanding computational effort and the lack of proper attack models. Therefore, prototypes are prepared using FPGAs. In this paper, a Crypto-Test-Lab is presented that includes an ECC co-processor with flexible test infrastructure. Its purpose is to facilitate the design and validation of secure strategies for testing in this type of co-processor.Postprint (author's final draft

UPCommons. Portal del coneixement obert de la UPC

QCDOC: A 10-teraflops scale computer for lattice QCD

Author: Chen D.
Christ N. H.
Cristian C.
Dong Z.
Gara A.
Garg K.
Joo B.
Kim C.
Levkova L.
Liao X.
Mawhinney R. D.
Ohta S.
Wettig T.
Publication venue: 'Elsevier BV'
Publication date: 17/08/2000
Field of study

The architecture of a new class of computers, optimized for lattice QCD calculations, is described. An individual node is based on a single integrated circuit containing a PowerPC 32-bit integer processor with a 1 Gflops 64-bit IEEE floating point unit, 4 Mbyte of memory, 8 Gbit/sec nearest-neighbor communications and additional control and diagnostic circuitry. The machine's name, QCDOC, derives from ``QCD On a Chip''.Comment: Lattice 2000 (machines) 8 pages, 4 figure

arXiv.org e-Print Archive

UNT Digital Library

CERN Document Server