Parallel Integer Polynomial Multiplication
We propose a new algorithm for multiplying dense polynomials with integer
coefficients in a parallel fashion, targeting multi-core processor
architectures. Complexity estimates and experimental comparisons demonstrate
the advantages of this new approach.
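For orientation, the operation this abstract refers to can be sketched as follows. This is a minimal schoolbook baseline, not the algorithm the paper proposes; the function names `product_coefficient` and `poly_mul` are hypothetical. Polynomials are assumed to be stored as dense coefficient lists, lowest degree first. Each output coefficient depends only on the inputs, which is one reason the problem lends itself to parallel execution.

```python
def product_coefficient(a, b, k):
    """Return the coefficient of x**k in a(x) * b(x).

    a and b are dense coefficient lists, lowest degree first.
    Each call reads only the inputs, so different k values could be
    computed independently (for example on separate cores).
    """
    lo = max(0, k - len(b) + 1)   # smallest i with k - i a valid index into b
    hi = min(k, len(a) - 1)       # largest i that is a valid index into a
    return sum(a[i] * b[k - i] for i in range(lo, hi + 1))


def poly_mul(a, b):
    """Schoolbook product of two dense integer polynomials (quadratic time)."""
    if not a or not b:
        return []
    return [product_coefficient(a, b, k) for k in range(len(a) + len(b) - 1)]
```

For example, `poly_mul([1, 2], [3, 4])` multiplies 1 + 2x by 3 + 4x. Python integers are arbitrary precision, so large coefficients do not overflow in this sketch.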
Fast, Dense Feature SDM on an iPhone
In this paper, we present our method for enabling dense SDM to run at over 90
FPS on a mobile device. Our contributions are two-fold. First, drawing
inspiration from the FFT, we propose a Sparse Compositional Regression (SCR)
framework, which enables a significant speed-up over classical dense
regressors. Second, we propose Binary Approximated SIFT (BASIFT) features, a
computationally efficient binary approximation to SIFT, a feature commonly
used with SDM. We demonstrate the performance of our
algorithm on an iPhone 7, and show that we achieve similar accuracy to SDM.
06271 Abstracts Collection -- Challenges in Symbolic Computation Software
From 02.07.06 to 07.07.06, the Dagstuhl Seminar 06271 "Challenges in Symbolic Computation Software" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl.
During the seminar, several participants presented their current
research, and ongoing work and open problems were discussed. Abstracts of
the presentations given during the seminar as well as abstracts of
seminar results and ideas are put together in this paper. The first section
describes the seminar topics and goals in general.
Links to extended abstracts or full papers are provided, if available.
Correct and Compositional Hardware Generators
Hardware generators help designers explore families of concrete designs and
their efficiency trade-offs. Both parameterized hardware description languages
(HDLs) and higher-level programming models, however, can obstruct
composability. Different concrete designs in a family can have dramatically
different timing behavior, and high-level hardware generators rarely expose a
consistent HDL-level interface. Composition, therefore, is typically only
feasible at the level of individual instances: the user generates concrete
designs and then composes them, sacrificing the ability to parameterize the
combined design.
We design Parafil, a system for correctly composing hardware generators.
Parafil builds on Filament, an HDL with strong compile-time guarantees, and
lifts those guarantees to generators to prove that all possible instantiations
are free of timing bugs. Parafil can integrate with external hardware
generators via a novel system of output parameters and a framework for invoking
generator tools. We conduct experiments with two other generators, FloPoCo and
Google's XLS, and we implement a parameterized FFT generator to show that
Parafil ensures correct design space exploration. Comment: 13 pages
TREBUCHET: Fully Homomorphic Encryption Accelerator for Deep Computation
Secure computation is of critical importance not only to the DoD, but also to
financial institutions, healthcare, and anywhere personally identifiable
information (PII) is accessed. Traditional security techniques require data to
be decrypted before performing any computation. When processed on untrusted
systems, the decrypted data is vulnerable to attacks that extract the sensitive
information. To address these vulnerabilities, Fully Homomorphic Encryption
(FHE) keeps the data encrypted during computation and secures the results, even
in these untrusted environments. However, FHE requires a significant amount of
computation to perform equivalent unencrypted operations. To be useful, FHE
must significantly close the computation gap (within 10x) to make encrypted
processing practical. To accomplish this ambitious goal the TREBUCHET project
is leading research and development in FHE processing hardware to accelerate
deep computations on encrypted data, as part of the DARPA MTO Data Privacy for
Virtual Environments (DPRIVE) program. We accelerate the major secure
standardized FHE schemes (BGV, BFV, CKKS, FHEW, etc.) at >=128-bit security
while integrating with the open-source PALISADE and OpenFHE libraries currently
used in the DoD and in industry. We utilize a novel tile-based chip design with
highly parallel ALUs optimized for vectorized 128b modulo arithmetic. The
TREBUCHET coprocessor design provides a highly modular, flexible, and
extensible FHE accelerator for easy reconfiguration, deployment, integration
and application on other hardware form factors, such as System-on-Chip or
alternate chip areas. Comment: 6 pages, 5 figures, 2 tables
A multi-point 2D interface: Audio-rate signals for controlling complex multi-parametric sound synthesis
This paper documents a method of controlling complex sound synthesis processes such as granular synthesis, additive synthesis, timbre morphology, swarm-based spatialisation, spectral spatialisation, and timbre spatialisation via a multi-parametric 2D interface. It evaluates the use of audio-rate control signals for sound synthesis and discusses approaches to de-interleaving, synchronization, and mapping. The paper also outlines a number of ways of extending the expressivity of such a control interface by coupling it with another 2D multi-parametric nodes interface and audio-rate 2D table lookup. The paper proceeds to review methods of navigating multi-parameter sets via interpolation and transformation. Finally, some case studies are discussed. The author has used this method to control complex sound synthesis processes that require control data for more than a thousand parameters.
SAR processing on the MPP
The processing of synthetic aperture radar (SAR) signals using the massively parallel processor (MPP) is discussed. The fast Fourier transform convolution procedures employed in the algorithms are described. The MPP architecture comprises an array unit (ARU) which processes arrays of data; an array control unit which controls the operation of the ARU and performs scalar arithmetic; a program and data management unit which controls the flow of data; and a unique staging memory (SM) which buffers and permutes data. The ARU contains a 128 by 128 array of bit-serial processing elements (PEs). Two-by-four subarrays of PEs are packaged in a custom VLSI HCMOS chip. The staging memory is a large multidimensional-access memory which buffers and permutes data flowing through the system. Efficient SAR processing is achieved via ARU communication paths and SM data manipulation. Real-time processing capability can be realized via a multiple ARU, multiple SM configuration.
RPU: The Ring Processing Unit
Ring-Learning-with-Errors (RLWE) has emerged as the foundation of many important techniques for improving security and privacy, including homomorphic encryption and post-quantum cryptography. While promising, these techniques have received limited use due to the extreme overheads of running them on general-purpose machines. In this paper, we present a novel vector Instruction Set Architecture (ISA) and microarchitecture for accelerating the ring-based computations of RLWE. The ISA, named B512, is developed to meet the needs of ring processing workloads while balancing high performance and general-purpose programming support. Having an ISA rather than fixed hardware facilitates continued software improvement post-fabrication and the ability to support evolving workloads. We then propose the ring processing unit (RPU), a high-performance, modular implementation of B512. The RPU has native large-word modular arithmetic support, capabilities for very wide parallel processing, and a large-capacity, high-bandwidth scratchpad to meet the needs of ring processing. We address the challenges of programming the RPU using a newly developed SPIRAL backend. A configurable simulator is built to characterize design tradeoffs and quantify performance. The best-performing design was implemented in RTL and used to validate simulator performance. In addition to our characterization, we show that an RPU using 20.5 mm² of GF 12 nm can provide a speedup of 1485x over a CPU running a 64k, 128-bit NTT, a core RLWE workload.
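To illustrate what the number-theoretic transform (NTT) named in this abstract computes, here is a toy sketch. It is not the RPU's B512 code or the SPIRAL-generated kernel. The parameters `P`, `N`, and `OMEGA` and all function names are assumptions chosen for readability; the abstract's workload uses 64k points and 128-bit moduli, and practical implementations use an O(n log n) butterfly network rather than the direct quadratic sum shown here.

```python
# Toy parameters (assumptions for illustration only).
P = 17      # small prime modulus; real RLWE moduli are far larger
N = 4       # transform length
OMEGA = 4   # primitive N-th root of unity mod P: 4**2 % 17 == 16, 4**4 % 17 == 1


def ntt(values, root=OMEGA):
    """Direct O(n^2) NTT: X[k] = sum_j values[j] * root**(j*k) mod P."""
    n = len(values)
    return [
        sum(v * pow(root, j * k, P) for j, v in enumerate(values)) % P
        for k in range(n)
    ]


def intt(values):
    """Inverse NTT: transform with the inverse root, then scale by n^-1 mod P."""
    inv_n = pow(len(values), -1, P)      # modular inverse (Python 3.8+)
    inv_root = pow(OMEGA, -1, P)
    return [(inv_n * x) % P for x in ntt(values, inv_root)]


def cyclic_convolve(a, b):
    """Cyclic convolution mod P via pointwise products in the NTT domain."""
    fa, fb = ntt(a), ntt(b)
    return intt([(x * y) % P for x, y in zip(fa, fb)])
```

The point of the transform is visible in `cyclic_convolve`: multiplication in the ring becomes N independent pointwise modular products, which is the kind of wide, regular arithmetic a vector unit can exploit.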