Search CORE

1,458 research outputs found

Constructing cluster of simple FPGA boards for cryptologic computations

Author: Doroz Yarkin
Doröz Yarkın
Savas Erkay
Savaş Erkay
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012
Field of study

In this paper, we propose an FPGA cluster infrastructure, which can be utilized in implementing cryptanalytic attacks and accelerating cryptographic operations. The cluster can be formed using simple and inexpensive, off-the-shelf FPGA boards featuring an FPGA device, local storage, CPLD, and network connection. Forming the cluster is simple and no effort for the hardware development is needed except for the hardware design for the actual computation. Using a softcore processor on FPGA, we are able to configure FPGA devices dynamically and change their configuration on the fly from a remote computer. The softcore on FPGA can execute relatively complicated programs for mundane tasks unworthy of FPGA resources. Finally, we propose and implement a fast and efficient dynamic configuration switch technique that is shown to be useful especially in cryptanalytic applications. Our infrastructure provides a cost-effective alternative for formerly proposed cryptanalytic engines based on FPGA devices

Crossref

Sabanci University Research Database

Performance comparison of single-precision SPICE Model-Evaluation on FPGA, GPU, Cell, and multi-core processors

Author: DeHon André
Kapre Nachiket
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009
Field of study

Automated code generation and performance tuning techniques for concurrent architectures such as GPUs, Cell and FPGAs can provide integer factor speedups over multi-core processor organizations for data-parallel, floating-point computation in SPICE model-evaluation. Our Verilog AMS compiler produces code for parallel evaluation of non-linear circuit models suitable for use in SPICE simulations where the same model is evaluated several times for all the devices in the circuit. Our compiler uses architecture specific parallelization strategies (OpenMP for multi-core, PThreads for Cell, CUDA for GPU, statically scheduled VLIW for FPGA) when producing code for these different architectures. We automatically explore different implementation configurations (e.g. unroll factor, vector length) using our performance-tuner to identify the best possible configuration for each architecture. We demonstrate speedups of 3- 182times for a Xilinx Virtex5 LX 330T, 1.3-33times for an IBM Cell, and 3-131times for an NVIDIA 9600 GT GPU over a 3 GHz Intel Xeon 5160 implementation for a variety of single-precision device models

Crossref

Caltech Authors

DR-NTU (Digital Repository of NTU)

Fuzzy matching template attacks on multivariate cryptography : a case study

Author: Huang Xian
Li Weijian
Lu Fuxiang
Xie Guoliang
Zhao Huimin
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2020
Field of study

Multivariate cryptography is one of the most promising candidates for post-quantum cryptography. Applying machine learning techniques in this paper, we experimentally investigate the side-channel security of the multivariate cryptosystems, which seriously threatens the hardware implementations of cryptographic systems. Generally, registers are required to store values of monomials and polynomials during the encryption of multivariate cryptosystems. Based on maximum-likelihood and fuzzy matching techniques, we propose a template-based least-square technique to efficiently exploit the side-channel leakage of registers. Using QUAD for a case study, which is a typical multivariate cryptosystem with provable security, we perform our attack against both serial and parallel QUAD implementations on field programmable gate array (FPGA). Experimental results show that our attacks on both serial and parallel implementations require only about 30 and 150 power traces, respectively, to successfully reveal the secret key with a success rate close to 100%. Finally, efficient and low-cost strategies are proposed to resist side-channel attacks

University of Strathclyde Institutional Repository

Directory of Open Access Journals

Adaptively Lossy Image Compression for Onboard Processing

Author: Goodwill Justin
Publication venue
Publication date: 29/07/2020
Field of study

More efficient image-compression codecs are an emerging requirement for spacecraft because increasingly complex, onboard image sensors can rapidly saturate downlink bandwidth of communication transceivers. While these codecs reduce transmitted data volume, many are compute-intensive and require rapid processing to sustain sensor data rates. Emerging next-generation small satellite (SmallSat) computers provide compelling computational capability to enable more onboard processing and compression than previously considered. For this research, we apply two compression algorithms for deployment on modern flight hardware: (1) end-to-end, neural-network-based, image compression (CNN-JPEG); and (2) adaptive image compression through feature-point detection (FPD-JPEG). These algorithms rely on intelligent data-processing pipelines that adapt to sensor data to compress it more effectively, ensuring efficient use of limited downlink bandwidths. The first algorithm, CNN-JPEG, employs a hybrid approach adapted from literature combining convolutional neural networks (CNNs) and JPEG; however, we modify and tune the training scheme for satellite imagery to account for observed training instabilities. This hybrid CNN-JPEG approach shows 23.5% better average peak signal-to-noise ratio (PSNR) and 33.5% better average structural similarity index (SSIM) versus standard JPEG on a dataset collected on the Space Test Program – Houston 5 (STP-H5-CSP) mission onboard the International Space Station (ISS). For our second algorithm, we developed a novel adaptive image-compression pipeline based upon JPEG that leverages the Oriented FAST and Rotated BRIEF (ORB) feature-point detection algorithm to adaptively tune the compression ratio to allow for a tradeoff between PSNR/SSIM and combined file size over a batch of STP-H5-CSP images. We achieve a less than 1% drop in average PSNR and SSIM while reducing the combined file size by 29.6% compared to JPEG using a static quality factor (QF) of 90

D-Scholarship@Pitt

Flexible quantum circuits using scalable continuous-variable cluster states

Author: Alexander Rafael N.
Menicucci Nicolas C.
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/2016
Field of study

We show that measurement-based quantum computation on scalable continuous-variable (CV) cluster states admits more quantum-circuit flexibility and compactness than similar protocols for standard square-lattice CV cluster states. This advantage is a direct result of the macronode structure of these states---that is, a lattice structure in which each graph node actually consists of several physical modes. These extra modes provide additional measurement degrees of freedom at each graph location, which can be used to manipulate the flow and processing of quantum information more robustly and with additional flexibility that is not available on an ordinary lattice.Comment: (v2) consistent with published version; (v1) 11 pages (9 figures

arXiv.org e-Print Archive

RMIT Research Repository

ON FPGA BASED ACCELERATION OF IMAGE PROCESSING IN MOBILE ROBOTICS

Author: Cížek Petr
Faigl Jan
Publication venue: 'Czech Technical University in Prague - Central Library'
Publication date: 01/12/2015
Field of study

In visual navigation tasks, a lack of the computational resources is one of the main limitations of micro robotic platforms to be deployed in autonomous missions. It is because the most of nowadays techniques of visual navigation relies on a detection of salient points that is computationally very demanding. In this paper, an FPGA assisted acceleration of image processing is considered to overcome limitations of computational resources available on-board and to enable high processing speeds while it may lower the power consumption of the system. The paper reports on performance evaluation of the CPU–based and FPGA–based implementations of a visual teach-and-repeat navigation system based on detection and tracking of the FAST image salient points. The results indicate that even a computationally efficient FAST algorithm can benefit from a parallel (low–cost) FPGA–based implementation that has a competitive processing time but more importantly it is a more power efficient

Crossref

Directory of Open Access Journals

CTU Open Journal Systems (Czech Technical University, Prague / České vysoké učení technické v Praze)

Throughput Optimized Implementations of QUAD

Author: Jason R. Hamlet
Robert W. Brocato
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 13/05/2013
Field of study

We present several software and hardware implementations of QUAD, a recently introduced stream cipher designed to be provably secure and practical to implement. The software implementations target both a personal computer and an ARM microprocessor. The hardware implementations target field programmable gate arrays. The purpose of our work was to first find the baseline performance of QUAD implementations, then to optimize our implementations for throughput. Our software implementations perform comparably to prior work. Our hardware implementations are the first known implementations to use random coefficients, in agreement with QUAD’s security argument, and achieve much higher throughput than prior implementations

Cryptology ePrint Archive

High Speed and Low Latency ECC Implementation over GF(2m) on FPGA

Author: Benaissa M.
Khan Z.U.A.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017
Field of study

In this paper, a novel high-speed elliptic curve cryptography (ECC) processor implementation for point multiplication (PM) on field-programmable gate array (FPGA) is proposed. A new segmented pipelined full-precision multiplier is used to reduce the latency, and the Lopez-Dahab Montgomery PM algorithm is modified for careful scheduling to avoid data dependency resulting in a drastic reduction in the number of clock cycles (CCs) required. The proposed ECC architecture has been implemented on Xilinx FPGAs' Virtex4, Virtex5, and Virtex7 families. To the best of our knowledge, our single- and three-multiplier-based designs show the fastest performance to date when compared with reported works individually. Our one-multiplier-based ECC processor also achieves the highest reported speed together with the best reported area-time performance on Virtex4 (5.32 μs at 210 MHz), on Virtex5 (4.91 μs at 228 MHz), and on the more advanced Virtex7 (3.18 μs at 352 MHz). Finally, the proposed three-multiplier-based ECC implementation is the first work reporting the lowest number of CCs and the fastest ECC processor design on FPGA (450 CCs to get 2.83 μs on Virtex7)

Crossref

White Rose Research Online