Search CORE


Parallel Integer Polynomial Multiplication

Author: Chen Changbo
Covanov Svyatoslav
Mansouri Farnam
Maza Marc Moreno
Xie Ning
Xie Yuzhen
Publication date: 24/09/2016

We propose a new algorithm for multiplying dense polynomials with integer coefficients in parallel, targeting multi-core processor architectures. Complexity estimates and experimental comparisons demonstrate the advantages of this new approach.

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server
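The integer polynomial multiplication abstract does not describe its algorithm. The Python sketch below is a hedged illustration of two standard ideas that such work builds on. The first reduces dense integer polynomial multiplication to one big-integer product (Kronecker substitution). The second splits one operand into chunks whose partial products are computed concurrently and summed at their degree offsets. Neither is claimed to be the authors' method, and the names `kronecker_mul` and `parallel_poly_mul` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def kronecker_mul(a, b):
    """Multiply integer polynomials given as coefficient lists (lowest degree
    first) by packing each into one big integer (Kronecker substitution)."""
    if not a or not b:
        return []
    # Every product coefficient is bounded by min(len) * max|a_i| * max|b_j|.
    bound = min(len(a), len(b)) * max(map(abs, a)) * max(map(abs, b))
    bits = bound.bit_length() + 2  # slot width, with room for a sign
    # Evaluate at x = 2**bits; Python ints handle negative coefficients.
    pa = sum(c << (bits * i) for i, c in enumerate(a))
    pb = sum(c << (bits * i) for i, c in enumerate(b))
    p = pa * pb
    mask, half = (1 << bits) - 1, 1 << (bits - 1)
    out = []
    for _ in range(len(a) + len(b) - 1):
        r = p & mask  # lowest slot, in [0, 2**bits)
        if r >= half:  # balanced representative: a negative coefficient
            r -= 1 << bits
        out.append(r)
        p = (p - r) >> bits  # exact division, moves to the next slot
    return out


def parallel_poly_mul(a, b, workers=4):
    """Split a into chunks, multiply each chunk by b on a thread pool, and
    add the partial products back at their degree offsets."""
    if not a or not b:
        return []
    size = -(-len(a) // workers)  # ceiling division
    pieces = [(i, a[i:i + size]) for i in range(0, len(a), size)]
    out = [0] * (len(a) + len(b) - 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda p: (p[0], kronecker_mul(p[1], b)), pieces)
        for offset, prod in partials:
            for k, c in enumerate(prod):
                out[offset + k] += c
    return out


print(parallel_poly_mul([1, 2, 3], [4, -5]))  # [4, 3, 2, -15]
```

In CPython, threads do not speed up big-integer arithmetic because of the global interpreter lock. The sketch therefore only shows how the work decomposes. A native implementation would run the chunks on real cores.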

Fast, Dense Feature SDM on an iPhone

Author: Fagg Ashton
Lucey Simon
Sridharan Sridha
Publication date: 15/12/2016

In this paper, we present our method for enabling dense SDM to run at over 90 FPS on a mobile device. Our contributions are two-fold. First, drawing inspiration from the FFT, we propose a Sparse Compositional Regression (SCR) framework, which enables a significant speed-up over classical dense regressors. Second, we propose Binary Approximated SIFT (BASIFT) features, a computationally efficient binary approximation to SIFT, which is a feature commonly used with SDM. We demonstrate the performance of our algorithm on an iPhone 7 and show that we achieve accuracy similar to that of SDM.

arXiv.org e-Print Archive

Crossref

Queensland University of Technology ePrints Archive
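The SDM entry replaces SIFT with a binary approximation (BASIFT) for speed but does not give the construction. The following is only a generic sketch of why binary features are cheap. It thresholds a real-valued descriptor against its own mean, an assumed binarization rule that is not necessarily BASIFT's. It then packs the bits into one integer, so comparing two descriptors reduces to an XOR plus a population count. `binarize` and `hamming` are hypothetical names.

```python
def binarize(descriptor):
    """Map a real-valued descriptor to an int bit-vector: bit i is 1 when
    component i exceeds the descriptor's mean (an assumed rule)."""
    mean = sum(descriptor) / len(descriptor)
    bits = 0
    for i, v in enumerate(descriptor):
        if v > mean:
            bits |= 1 << i
    return bits


def hamming(x, y):
    """Number of differing bits: one XOR and one population count."""
    return bin(x ^ y).count("1")


d1 = [1, 9, 2, 8]
d2 = [2, 7, 6, 1]
print(binarize(d1), binarize(d2), hamming(binarize(d1), binarize(d2)))  # 10 6 2
```

A 128-dimensional SIFT vector would become a single 128-bit integer under this scheme. Distance computation then needs no floating-point arithmetic.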

06271 Abstracts Collection -- Challenges in Symbolic Computation Software

Author: Decker Wolfram
Dewar Mike
Kaltofen Erich
Watt Stephen M.
Publication venue: Dagstuhl Seminar Proceedings. 06271 - Challenges in Symbolic Computation Software
Publication date: 01/01/2006

From 02.07.06 to 07.07.06, the Dagstuhl Seminar 06271 "Challenges in Symbolic Computation Software" was held at the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar, as well as abstracts of seminar results and ideas, are collected in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.

Dagstuhl Research Online Publication Server

Correct and Compositional Hardware Generators

Author: Gabizon Ethan
Lam Edmund
Nigam Rachit
Sampson Adrian
Publication date: 04/01/2024

Hardware generators help designers explore families of concrete designs and their efficiency trade-offs. Both parameterized hardware description languages (HDLs) and higher-level programming models, however, can obstruct composability. Different concrete designs in a family can have dramatically different timing behavior, and high-level hardware generators rarely expose a consistent HDL-level interface. Composition is therefore typically feasible only at the level of individual instances: the user generates concrete designs and then composes them, sacrificing the ability to parameterize the combined design. We design Parafil, a system for correctly composing hardware generators. Parafil builds on Filament, an HDL with strong compile-time guarantees, and lifts those guarantees to generators to prove that all possible instantiations are free of timing bugs. Parafil can integrate with external hardware generators via a novel system of output parameters and a framework for invoking generator tools. We conduct experiments with two other generators, FloPoCo and Google's XLS, and we implement a parameterized FFT generator to show that Parafil ensures correct design space exploration.
Comment: 13 pages

arXiv.org e-Print Archive

TREBUCHET: Fully Homomorphic Encryption Accelerator for Deep Computation

Author: Badawi Ahmad Al
Canida Kellie
Cousins David Bruce
French Matthew
Gamil Homer
Jacob Ajey
Jaiswal Akhilesh
Maniatakos Michail
Mathew Clynn
Neda Negar
Polyakov Yuriy
Reagen Brandon
Reynwar Benedict
Schmidt Andrew
Soni Deepraj
Publication date: 11/04/2023

Secure computation is of critical importance not only to the DoD but also to financial institutions, healthcare, and any setting where personally identifiable information (PII) is accessed. Traditional security techniques require data to be decrypted before any computation is performed. When processed on untrusted systems, the decrypted data is vulnerable to attacks that extract the sensitive information. To address these vulnerabilities, Fully Homomorphic Encryption (FHE) keeps the data encrypted during computation and secures the results, even in these untrusted environments. However, FHE requires far more computation than the equivalent unencrypted operations. To be useful, FHE must close this computation gap to within 10x to make encrypted processing practical. To accomplish this ambitious goal, the TREBUCHET project is leading research and development in FHE processing hardware to accelerate deep computations on encrypted data, as part of the DARPA MTO Data Privacy for Virtual Environments (DPRIVE) program. We accelerate the major secure, standardized FHE schemes (BGV, BFV, CKKS, FHEW, etc.) at >=128-bit security while integrating with the open-source PALISADE and OpenFHE libraries currently used in the DoD and in industry. We use a novel tile-based chip design with highly parallel ALUs optimized for vectorized 128-bit modular arithmetic. The TREBUCHET coprocessor design provides a highly modular, flexible, and extensible FHE accelerator. It can be easily reconfigured, deployed, integrated, and applied on other hardware form factors, such as System-on-Chip or alternate chip areas.
Comment: 6 pages, 5 figures, 2 tables

arXiv.org e-Print Archive
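The TREBUCHET entry mentions ALUs optimized for vectorized 128-bit modular arithmetic, but the abstract does not say which reduction method the hardware uses. As an assumption, the sketch below uses Barrett reduction. This common technique replaces division by a fixed modulus with a multiplication and shifts. The sketch applies it element-wise to two vectors, the way each lane of a vector ALU would. The modulus 2**128 - 159 and all function names are illustrative choices.

```python
def barrett_params(q):
    """Precompute k = bit length of q and mu = floor(4**k / q)."""
    k = q.bit_length()
    return k, (1 << (2 * k)) // q


def barrett_reduce(x, q, k, mu):
    """Return x mod q for 0 <= x < q*q, using a multiply and shifts
    instead of a division by q."""
    t = (x * mu) >> (2 * k)  # estimate of floor(x / q), never too large
    r = x - t * q            # lies in [0, 2q), so one correction suffices
    if r >= q:
        r -= q
    return r


def vec_mod_mul(a, b, q):
    """Element-wise (a[i] * b[i]) mod q, one product per vector lane."""
    k, mu = barrett_params(q)
    return [barrett_reduce(x * y, q, k, mu) for x, y in zip(a, b)]


Q128 = (1 << 128) - 159  # a 128-bit modulus, chosen for illustration
print(vec_mod_mul([Q128 - 1, 2**100], [Q128 - 1, 2**100], Q128))
```

The quotient estimate `t` is at most one below the true quotient. That is why a single conditional subtraction is enough, which suits fixed-latency hardware pipelines.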

A multi-point 2D interface: Audio-rate signals for controlling complex multi-parametric sound synthesis

Author: James Stuart
Publication venue: Edith Cowan University, Research Online, Perth, Western Australia
Publication date: 01/01/2016

This paper documents a method of controlling complex sound synthesis processes, such as granular synthesis, additive synthesis, timbre morphology, swarm-based spatialisation, spectral spatialisation, and timbre spatialisation, via a multi-parametric 2D interface. It evaluates the use of audio-rate control signals for sound synthesis and discusses approaches to de-interleaving, synchronization, and mapping. The paper also outlines several ways of extending the expressivity of such a control interface by coupling it with another 2D multi-parametric nodes interface and with audio-rate 2D table lookup. It then reviews methods of navigating multi-parameter sets via interpolation and transformation, and finally discusses some case studies. The author has used this method to control complex sound synthesis processes that require control data for more than a thousand parameters.

Research Online @ ECU

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY
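The multi-point 2D interface entry discusses de-interleaving audio-rate control signals and navigating parameter sets via interpolation. Its abstract does not specify the scheme, so the sketch below assumes the simplest one. Several control streams are multiplexed round-robin into one audio-rate signal and recovered by taking every N-th sample. Moving between two parameter presets is a linear blend. `deinterleave` and `interpolate_presets` are hypothetical names.

```python
def deinterleave(signal, n_streams):
    """Recover n_streams control streams from a signal whose samples are
    interleaved round-robin: s0[0], s1[0], ..., s0[1], s1[1], ..."""
    return [signal[i::n_streams] for i in range(n_streams)]


def interpolate_presets(preset_a, preset_b, t):
    """Linearly blend two equal-length parameter sets; t=0 gives preset_a,
    t=1 gives preset_b."""
    return [(1 - t) * a + t * b for a, b in zip(preset_a, preset_b)]


frames = [0.1, 10, 0.2, 20, 0.3, 30]  # two interleaved control streams
print(deinterleave(frames, 2))  # [[0.1, 0.2, 0.3], [10, 20, 30]]
print(interpolate_presets([0, 100, 440], [1, 200, 880], 0.5))  # [0.5, 150.0, 660.0]
```

In a 2D interface, `t` would be driven by a cursor position, and each preset could hold values for many synthesis parameters at once.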

SAR processing on the MPP

Author: Batcher K. E.
Eddey E. E.
Faiss R. O.
Gilmore P. A.

The processing of synthetic aperture radar (SAR) signals using the massively parallel processor (MPP) is discussed. The fast Fourier transform convolution procedures employed in the algorithms are described. The MPP architecture comprises an array unit (ARU), which processes arrays of data; an array control unit, which controls the operation of the ARU and performs scalar arithmetic; a program and data management unit, which controls the flow of data; and a unique staging memory (SM), which buffers and permutes data. The ARU contains a 128 by 128 array of bit-serial processing elements (PEs). Two-by-four subarrays of PEs are packaged in a custom VLSI HCMOS chip. The staging memory is a large multidimensional-access memory that buffers and permutes data flowing within the system. Efficient SAR processing is achieved via ARU communication paths and SM data manipulation. Real-time processing capability can be realized with a configuration of multiple ARUs and multiple SMs.

NASA Technical Reports Server
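The SAR entry notes that its algorithms rely on fast Fourier transform convolution. The MPP implementation is bit-serial and array-based. As a portable illustration, not that implementation, the following pure-Python sketch computes a linear convolution in three steps. It zero-pads both inputs to a power of two, multiplies their FFTs point-wise, and applies the inverse transform. In SAR processing, the second input would typically be a matched-filter reference; that role is an assumption here, and the function names are hypothetical.

```python
import cmath


def fft(x, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    With invert=True it computes the unnormalized inverse transform."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], invert)
    odd = fft(x[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def fft_convolve(a, b):
    """Linear convolution of real sequences a and b via the FFT."""
    m = len(a) + len(b) - 1
    n = 1
    while n < m:  # pad to a power of two at least as long as the result
        n *= 2
    fa = fft(list(a) + [0] * (n - len(a)))
    fb = fft(list(b) + [0] * (n - len(b)))
    prod = fft([x * y for x, y in zip(fa, fb)], invert=True)
    # Normalize by n; imaginary parts are only round-off for real inputs.
    return [(v / n).real for v in prod[:m]]


print(fft_convolve([1, 2, 3], [0, 1, 0.5]))
```

Direct convolution costs O(mn) operations, while the FFT route costs O(N log N) for padded length N. That gap is what makes FFT convolution attractive for long SAR reference functions.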

RPU: The Ring Processing Unit

Author: Ahmad Al Badawi
Andrew Schmidt
Benedict Reynwar
Benjamin Heyman
Brandon Reagen
David Bruce Cousins
Deepraj Soni
Franz Franchetti
Homer Gamil
Kellie Canida
Massoud Pedram
Matthew French
Michail Maniatakos
Mohammed Nabeel Thari Moopan
Naifeng Zhang
Negar Neda
Yuriy Polyakov
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 30/03/2023

Ring-Learning-with-Errors (RLWE) has emerged as the foundation of many important techniques for improving security and privacy, including homomorphic encryption and post-quantum cryptography. While promising, these techniques have seen limited use because of the extreme overhead of running them on general-purpose machines. In this paper, we present a novel vector Instruction Set Architecture (ISA) and microarchitecture for accelerating the ring-based computations of RLWE. The ISA, named B512, is developed to meet the needs of ring processing workloads while balancing high performance and general-purpose programming support. Having an ISA rather than fixed hardware facilitates continued software improvement post-fabrication and the ability to support evolving workloads. We then propose the ring processing unit (RPU), a high-performance, modular implementation of B512. The RPU has native large-word modular arithmetic support, capabilities for very wide parallel processing, and a large-capacity, high-bandwidth scratchpad to meet the needs of ring processing. We address the challenges of programming the RPU using a newly developed SPIRAL backend. A configurable simulator is built to characterize design trade-offs and quantify performance. The best-performing design was implemented in RTL and used to validate simulator performance. In addition to our characterization, we show that an RPU using 20.5 mm² of GF 12nm can provide a speedup of 1485x over a CPU running a 64k, 128-bit NTT, a core RLWE workload.

Cryptology ePrint Archive
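The RPU entry benchmarks a 64k-point, 128-bit number theoretic transform (NTT). The NTT is the modular analogue of the FFT and a core RLWE workload. The sketch below is a small iterative radix-2 NTT in Python, used to compute a cyclic convolution. To keep it small and checkable, it uses the well-known 30-bit prime 998244353 = 119 * 2**23 + 1 with primitive root 3 instead of the 128-bit moduli the RPU targets. It also omits the negacyclic twist that RLWE rings Z_q[x]/(x^n + 1) require. `ntt` and `cyclic_convolve` are hypothetical names.

```python
MOD = 998244353  # 119 * 2**23 + 1, a prime; 3 is a primitive root mod MOD
ROOT = 3


def ntt(a, invert=False):
    """Iterative radix-2 NTT over Z_MOD. len(a) must be a power of two
    that divides 2**23. Returns a new list."""
    a = [x % MOD for x in a]
    n = len(a)
    j = 0
    for i in range(1, n):  # reorder input into bit-reversed index order
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        # Primitive length-th root of unity (its inverse for the inverse NTT).
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)  # inverse via Fermat's little theorem
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u, v = a[k], a[k + half] * w % MOD
                a[k], a[k + half] = (u + v) % MOD, (u - v) % MOD  # butterfly
                w = w * w_len % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)  # scale by 1/n
        a = [x * n_inv % MOD for x in a]
    return a


def cyclic_convolve(a, b):
    """Cyclic convolution mod MOD of equal-length, power-of-two sequences:
    forward NTTs, point-wise product, inverse NTT."""
    fa, fb = ntt(a), ntt(b)
    return ntt([x * y % MOD for x, y in zip(fa, fb)], invert=True)


print(cyclic_convolve([1, 2, 3, 4], [5, 6, 7, 8]))  # [66, 68, 66, 60]
```

Every butterfly is a modular multiply-add on wide words. Hardware with native large-word modular arithmetic and very wide parallel lanes targets exactly this pattern.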