
Turvaliste reaalarvuoperatsioonide efektiivsemaks muutmine (Making secure real-number operations more efficient)

Author: Krips Toomas
Publication date: 07/05/2019

Nowadays data and its analysis are ubiquitous and very useful. Due to this popularity, many different ways in which data and computation on it can relate to each other have proliferated. We focus on the cases where the owners of the data and the parties who compute on them coincide only partially or not at all. Examples are medical data, which the owners want to keep secret but on which collective statistics would be useful, and outsourcing computation to a party that has more computing power but is not fully trusted. The discipline that studies these cases is called secure computation. This field has mostly worked on integer and bit data types, as they are easier to work with and, in theory, every other case can be reduced to them: anything computable can be computed using only addition and multiplication of bits. However, applying these reductions bluntly gives inefficient results. Thus this thesis studies secure computation on real numbers and presents three methods for improving its efficiency. The first method concerns fixed-point and floating-point numbers. Fixed-point numbers are simple in construction but can lack precision and flexibility. Floating-point numbers, on the other hand, are precise and flexible but rather complicated in nature, which in the secure setting translates to expensive operations. The first method therefore combines the two number types for greater efficiency. The second method is based on the fact that, in the concrete paradigm we use, it makes little difference in running time whether we perform one operation or a million in parallel. Thus we perform many instances of a fast, simple operation in parallel in order to evaluate a more complicated one. Thirdly, we introduce a new real number type. We use pairs of integers (a, b) to represent the real number a − φb, where φ = 1.618... is the golden ratio. This number type allows us to perform addition and multiplication relatively quickly and also achieves reasonable granularity.
https://www.ester.ee/record=b522708
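The third method represents a real number by an integer pair (a, b) with value a − φb. Because φ² = φ + 1, such pairs are closed under addition and multiplication: (a − φb)(c − φd) = (ac + bd) − φ(ad + bc − bd). The plaintext sketch below only checks this arithmetic; the function names are hypothetical, and it does not implement any secure protocol, secret sharing, or the rescaling a practical scheme would need, so it should not be read as the thesis's construction.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio; satisfies PHI**2 == PHI + 1


def gr_value(p):
    """Real number represented by the integer pair (a, b): a - PHI*b."""
    a, b = p
    return a - PHI * b


def gr_add(p, q):
    # (a - φb) + (c - φd) = (a + c) - φ(b + d): plain componentwise addition
    return (p[0] + q[0], p[1] + q[1])


def gr_mul(p, q):
    # (a - φb)(c - φd) = ac - φ(ad + bc) + φ²bd
    # substitute φ² = φ + 1 to stay within integer pairs:
    #                  = (ac + bd) - φ(ad + bc - bd)
    a, b = p
    c, d = q
    return (a * c + b * d, a * d + b * c - b * d)


x, y = (3, 2), (5, -4)       # 3 - 2φ ≈ -0.236 and 5 + 4φ ≈ 11.472
prod = gr_mul(x, y)          # (7, 6), i.e. 7 - 6φ ≈ -2.708
```

Small integers can still encode values close to zero: consecutive Fibonacci pairs such as (13, 8) give 13 − 8φ ≈ 0.0557, which hints at how this representation reaches useful granularity with integer-only arithmetic.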

DSpace at Tartu University Library

Serial-data computation in VLSI

Author: Smith Stewart Gresty
Publication venue: The University of Edinburgh
Publication date: 01/01/1987

Edinburgh Research Archive

Highly accelerated simulations of glassy dynamics using GPUs: caveats on limited floating-point precision

Author: Peter H. Colberg
Felix Höfling
Publication venue: Elsevier BV
Publication date: 01/01/2011

Modern graphics processing units (GPUs) provide impressive computing resources, which can be accessed conveniently through the CUDA programming interface. We describe how GPUs can be used to considerably speed up molecular dynamics (MD) simulations for system sizes of up to about 1 million particles. Particular emphasis is put on the numerical long-time stability in terms of energy and momentum conservation, and caveats on limited floating-point precision are issued. Strict energy conservation over 10^8 MD steps is obtained by double-single emulation of the floating-point arithmetic in accuracy-critical parts of the algorithm. For the slow dynamics of a supercooled binary Lennard-Jones mixture, we demonstrate that the use of single floating-point precision may yield quantitatively and even physically wrong results. For simulations of a Lennard-Jones fluid, the described implementation shows speedup factors of up to 80 compared to a serial CPU implementation, and a single GPU was found to perform comparably to a parallelised MD simulation using 64 distributed cores.
Comment: 12 pages, 7 figures, to appear in Comp. Phys. Comm., HALMD package licensed under the GPL, see http://research.colberg.org/projects/halm
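In double-single emulation a value is carried as an unevaluated sum hi + lo of two single-precision floats, so the low word keeps bits that a plain float32 result would round away. The sketch below illustrates the idea in pure Python rather than CUDA and is not the HALMD code. It assumes a hypothetical helper `f32` that rounds to binary32 via `struct` (rounding an exact-enough binary64 sum or difference of two binary32 values to binary32 gives the correctly rounded binary32 result, since 53 ≥ 2·24 + 2), and it uses Knuth's TwoSum and Dekker's FastTwoSum, standard error-free building blocks of double-single arithmetic.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def two_sum(a: float, b: float):
    """Knuth's TwoSum in float32: returns (s, err) with s + err == a + b exactly."""
    s = f32(a + b)
    bb = f32(s - a)
    err = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, err


def fast_two_sum(a: float, b: float):
    """Dekker's FastTwoSum in float32; requires |a| >= |b|."""
    s = f32(a + b)
    err = f32(b - f32(s - a))
    return s, err


def ds_add(hi: float, lo: float, b: float):
    """Add a float32 value b to the double-single number hi + lo."""
    s, e = two_sum(hi, b)        # exact sum of the high word and b
    e = f32(e + lo)              # fold in the old low word
    return fast_two_sum(s, e)    # renormalise so that |lo| <= ulp(hi)/2


step = f32(2.0 ** -30)
plain = f32(1.0)
hi, lo = f32(1.0), 0.0
for _ in range(1024):
    plain = f32(plain + step)      # increment is below half an ulp of 1.0: lost
    hi, lo = ds_add(hi, lo, step)  # lost part is carried in the low word
```

After the loop, `plain` is still 1.0, while `hi + lo` equals 1 + 2**-20 exactly, which is the kind of accumulated-round-off difference that matters over many MD steps.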

arXiv.org e-Print Archive

Institute of Transport Research:Publications

Crossref

Computing the fast Fourier transform on SIMD microprocessors

Author: Blake Anthony Martin
Publication venue: University of Waikato
Publication date: 18/06/2012

This thesis describes how to compute the fast Fourier transform (FFT) of a power-of-two length signal on single-instruction, multiple-data (SIMD) microprocessors faster than, or very close to, state-of-the-art libraries such as FFTW (“Fastest Fourier Transform in the West”), SPIRAL and Intel Integrated Performance Primitives (IPP). The conjugate-pair algorithm has advantages in terms of memory bandwidth, and three implementations of this algorithm, which incorporate latency and spatial locality optimizations, are automatically vectorized at the algorithm level of abstraction. Performance results on 2-way, 4-way and 8-way SIMD machines show that the performance scales much better than that of FFTW or SPIRAL. The implementations presented in this thesis are compiled into a high-performance FFT library called SFFT (“Streaming Fast Fourier Transform”) and benchmarked against FFTW, SPIRAL, Intel IPP and Apple Accelerate on sixteen x86 machines and two ARM NEON machines. In many cases they are faster than these state-of-the-art libraries without requiring extensive machine-specific calibration, which demonstrates that there are good heuristics for predicting the performance of the FFT on SIMD microprocessors (i.e., the need for empirical optimization may be overstated).
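The conjugate-pair algorithm is a split-radix variant. With ω_N = e^(−2πi/N), the DFT is split as X_k = E_(k mod N/2) + ω_N^k P_(k mod N/4) + ω_N^(−k) M_(k mod N/4), where E is the half-length DFT of the even samples, P the quarter-length DFT of x[4m+1], and M the quarter-length DFT of x[4m−1] (indices mod N). The two twiddle factors are complex conjugates, so one of them serves both terms. The scalar recursive sketch below shows only this decomposition; `fft_conjugate_pair` is a hypothetical name, not part of SFFT, and none of the thesis's SIMD vectorization, latency or locality optimizations are attempted.

```python
import cmath


def fft_conjugate_pair(x):
    """Recursive conjugate-pair split-radix DFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n == 2:
        return [complex(x[0] + x[1]), complex(x[0] - x[1])]
    half, quarter = n // 2, n // 4
    even = fft_conjugate_pair(x[0::2])                                         # x[2m]
    plus = fft_conjugate_pair([x[(4 * m + 1) % n] for m in range(quarter)])    # x[4m+1]
    minus = fft_conjugate_pair([x[(4 * m - 1) % n] for m in range(quarter)])   # x[4m-1]: x[n-1], x[3], x[7], ...
    out = []
    for k in range(n):
        w = cmath.exp(-2j * cmath.pi * k / n)  # ω_N^k
        # ω_N^(-k) is the complex conjugate of ω_N^k: the "conjugate pair"
        out.append(even[k % half] + w * plus[k % quarter] + w.conjugate() * minus[k % quarter])
    return out


spectrum = fft_conjugate_pair([1, 0, 0, 0])  # impulse: flat spectrum of ones
```

Because each output reuses the sub-transforms periodically, the sketch has the usual O(N log N) structure; a production kernel would instead process the k and k + N/4 outputs together in one butterfly.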

Research Commons@Waikato

Portable high-performance supercomputing: high-level platform-dependent optimization

Author: Brewer Eric A
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1994

Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1994. Includes bibliographical references (p. 163-172). By Eric Allen Brewer. Ph.D.

DSpace@MIT

Acceleration Techniques for Sparse Recovery Based Plane-wave Decomposition of a Sound Field

Author: Samarawickrama Mahendra
Publication venue: Faculty of Engineering and Information Technologies, School of Electrical and Information Engineering
Publication date: 28/02/2017

Sparse recovery is a reliable and accurate technique for plane-wave decomposition of a sound field, which can be used for source localization, beamforming and similar tasks. In this work, we introduce techniques to accelerate plane-wave decomposition by sparse recovery. The method consists of two main algorithms: the spherical Fourier transform (SFT) and sparse recovery. Of the two, sparse recovery is the more computationally intensive. We implement the SFT on an FPGA and the sparse recovery on a multithreaded computing platform, so that the multithreaded platform can be devoted entirely to sparse recovery. In addition, implementing the SFT on an FPGA makes it easier to integrate the microphones flexibly and improves the portability of the microphone array. For implementing the SFT on an FPGA, we develop a scalable FPGA design model that enables quick design of the SFT architecture. Given the number of microphones, the number of SFT channels and the cost of the FPGA, the model outputs a resource-optimized and cost-effective FPGA architecture. We then investigate the performance of the sparse recovery algorithm on various multithreaded computing platforms (i.e., chip multiprocessor, multiprocessor, GPU and manycore). Finally, we investigate how modifying the dictionary size influences the computational performance and the accuracy of the sparse recovery algorithms, and we introduce novel sparse recovery techniques that use non-uniform dictionaries to improve the performance of sparse recovery on a parallel architecture.
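The abstract does not name its sparse recovery algorithm, so the sketch below substitutes plain matching pursuit, a simple greedy method, over a toy real-valued dictionary of unit-norm atoms; actual plane-wave decomposition works with complex dictionaries of plane-wave responses. The function and variable names are hypothetical. It illustrates why the dictionary matters: a redundant atom lets the same signal be represented with fewer nonzero coefficients.

```python
import math


def matching_pursuit(y, atoms, max_iter=10, tol=1e-9):
    """Greedy matching pursuit: find sparse coeffs with y ≈ sum(c_j * atoms[j]).

    atoms must be unit-norm real vectors (the dictionary).
    Returns (coeffs, residual), with coeffs mapping atom index -> coefficient.
    """
    residual = list(y)
    coeffs = {}
    for _ in range(max_iter):
        if math.sqrt(sum(r * r for r in residual)) < tol:
            break  # residual energy is negligible: stop early
        # correlate the residual with every atom and pick the strongest match
        corr = [sum(a * r for a, r in zip(atom, residual)) for atom in atoms]
        j = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        coeffs[j] = coeffs.get(j, 0.0) + corr[j]
        # remove the chosen atom's contribution from the residual
        residual = [r - corr[j] * a for r, a in zip(residual, atoms[j])]
    return coeffs, residual


s = 1 / math.sqrt(2)
atoms = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (s, s, 0.0)]
coeffs, residual = matching_pursuit((1.0, 1.0, 0.0), atoms)
```

With only the three axis atoms, the signal (1, 1, 0) needs two coefficients; with the extra diagonal atom, matching pursuit represents it with a single coefficient of about √2. Larger dictionaries can sharpen such representations but raise the cost of every correlation step.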

Sydney eScholarship

Comparison of logarithmic and floating-point number systems implemented on Xilinx Virtex-II field-programmable gate arrays

Author: Lee Barry Roland

The aim of this thesis is to compare implementations of parameterisable LNS (logarithmic number system) and floating-point high-dynamic-range number systems on FPGAs. The designs are implemented on the Virtex/Virtex-II range of FPGAs from Xilinx, the most popular FPGA technology. The study focuses on using the low-level primitives of the technology efficiently, so the design issues in implementing fixed-point operators are considered first. The four basic operations of addition, multiplication, division and square root are considered: carry-free adders, ripple-carry adders, parallel multipliers, and digit-recurrence division and square root are discussed. The floating-point operators use the word format and exceptions described by IEEE Std 754. A dual-path adder implementation is described in detail, as are floating-point multiplier, divider and square root components. Results and comparisons with other works are given. The efficient implementation of function evaluation methods is considered next: an overview of current FPGA methods is given, and a new piecewise polynomial implementation using the Taylor series is presented and compared with other designs in the literature. In the next section, the LNS word format, accuracy and exceptions are described, and two new approximations of the LNS addition/subtraction functions are presented. The algorithms for performing multiplication, division and powering in the LNS domain are also described and compared with other designs in the open literature. Parameterisable algorithms for converting between the fixed-point domain and the LNS and floating-point domains are described, and implementation results are given. In the next chapter, MATLAB bit-true software models are given that exactly match the functionality of the hardware models. The interfaces of the models are given, and a serial communication system for performing low-speed system tests is described.
A comparison of the LNS and floating-point number systems in terms of area and delay is given. Different functions implemented in LNS and floating-point arithmetic are also compared, and conclusions are drawn. The results show that when the LNS is implemented with a characteristic of 6 bits or fewer, it is superior to floating-point. However, for larger characteristic lengths the floating-point system is more efficient, due to the delay and the exponential area increase of the LNS addition operator. For characteristics longer than 6 bits, the LNS is beneficial only for specialist applications that require a high proportion of division, multiplication, square root and powering operations and few additions.
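The trade-off described above follows from how an LNS stores numbers: a value is kept as a fixed-point base-2 logarithm, so multiplication, division, powering and square root become integer addition, subtraction, scaling and shifting, while addition needs the nonlinear function sb(d) = log2(1 + 2^d). The sketch below is a minimal software model under stated assumptions: it handles positive values only (no sign bit, zero or subtraction, which would need log2(1 − 2^d)), `FRAC_BITS` is a hypothetical fraction width, and sb is computed exactly with `math.log2` rather than with the thesis's hardware approximations.

```python
import math

# Hypothetical format: a positive real x is stored as the integer
# round(log2(x) * 2**FRAC_BITS), i.e. a fixed-point base-2 logarithm.
FRAC_BITS = 10


def to_lns(x: float) -> int:
    if x <= 0:
        raise ValueError("this sketch handles positive values only")
    return round(math.log2(x) * (1 << FRAC_BITS))


def from_lns(e: int) -> float:
    return 2.0 ** (e / (1 << FRAC_BITS))


# Multiplicative operations reduce to cheap integer arithmetic on the logs.
def lns_mul(ea: int, eb: int) -> int:
    return ea + eb        # log2(a*b) = log2(a) + log2(b)


def lns_div(ea: int, eb: int) -> int:
    return ea - eb        # log2(a/b) = log2(a) - log2(b)


def lns_pow(ea: int, k: int) -> int:
    return ea * k         # log2(a**k) = k*log2(a) for integer k


def lns_sqrt(ea: int) -> int:
    return ea >> 1        # halve the log; an odd LSB is floored


# Addition needs sb(d) = log2(1 + 2**d) for d <= 0, which hardware must
# approximate with tables or polynomials; this is the costly operator.
def lns_add(ea: int, eb: int) -> int:
    hi, lo = max(ea, eb), min(ea, eb)
    d = (lo - hi) / (1 << FRAC_BITS)
    sb = math.log2(1.0 + 2.0 ** d)
    return hi + round(sb * (1 << FRAC_BITS))


product = from_lns(lns_mul(to_lns(3.0), to_lns(5.0)))  # approximately 15
total = from_lns(lns_add(to_lns(3.0), to_lns(5.0)))    # approximately 8
```

In this model every multiplicative operator is a single integer operation, whereas `lns_add` needs a transcendental evaluation whose hardware cost grows with the word length, which mirrors why the abstract finds the LNS attractive mainly for multiplication-heavy workloads.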

Online Research @ Cardiff