316 research outputs found

    Turvaliste reaalarvuoperatsioonide efektiivsemaks muutmine

    Get PDF
    TĂ€napĂ€eval on andmed ja nende analĂŒĂŒsimine laialt levinud ja neist on palju kasu. Selle populaarsuse tĂ”ttu on ka rohkem levinud igasugused kombinatsioonid, kuidas andmed ja nende pĂ”hjal arvutamine omavahel suhestuda vĂ”ivad. Meie töö fookuseks on siinkohal need juhtumid, kus andmete omanikud ja need osapooled, kes neid analĂŒĂŒsima peaks, ei lange kas osaliselt vĂ”i tĂ€ielikult kokku. Selle nĂ€iteks vĂ”ib tuua meditsiiniandmed, mida nende omanikud tahaks ĂŒhest kĂŒljest salajas hoida, aga mille kollektiivsel analĂŒĂŒsimine on kasulik. Teiseks nĂ€iteks on arvutuste delegeerimine suurema arvutusvĂ”imsusega, ent mitte tĂ€iesti usaldusvÀÀrsele osapoolele. Valdkond, mis selliseid probleeme uurib, kannab nime turvaline ĂŒhisarvutus. Antud valdkond on eelkĂ”ige keskendunud juhtumile, kus andmed on kas tĂ€isarvulisel vĂ”i bitilisel kujul, kuna neid on lihtsam analĂŒĂŒsida ja teised juhtumid saab nendest tuletada, sest kĂ”ige, mis ĂŒldse arvutatav on, vĂ€ljaarvutamiseks piisab bittide liitmisest ja korrutamisest. See on teoorias tĂ”si, samas, kui kĂ”ike otse bittide vĂ”i tĂ€isarvude tasemel teha, on tulemus ebaefektiivne. SeepĂ€rast vaatleb see doktoritöö turvalist ĂŒhisarvutust reaalarvudel ja meetodeid, kuidas seda efektiivsemaks teha. Esiteks vaatleme ujukoma- ja pĂŒsikomaarve. Ujukomaarvud on vĂ€ga paindlikud ja tĂ€psed, aga on teisalt jĂ€lle ĂŒsna keeruka struktuuriga. PĂŒsikomaarvud on lihtsa olemusega, ent kannatavad tĂ€psuses. Töö esimene meetod vaatlebki nende kombineerimist, et mĂ”lema hĂ€id omadusi Ă€ra kasutada. Teine tehnika baseerub tĂ”igal, et antud paradigmas juhtub, et ei ole erilist ajalist vahet, kas paralleelis teha ĂŒks tehe vĂ”i miljon. Sestap katsume töö teises meetodis teha paralleelselt hĂ€sti palju mingit lihtsat operatsiooni, et vĂ€lja arvutada mĂ”nd keerulisemat. Kolmas tehnika kasutab reaalarvude kujutamiseks tĂ€isarvupaare, (a,b), mis kujutavad reaalarvu a- φb, kus φ=1.618... on kuldlĂ”ige. Osutub, et see vĂ”imaldab meil ĂŒsna efektiivselt liita ja korrutada ja saavutada mĂ”istlik tĂ€psus.Nowadays data and its analysis are ubiquitous and very useful. Due to this popularity, different combinations of how these two can relate to each other proliferate. We focus on the cases where the owners of the data and those who compute on them don't coincide either partially or totally. Examples are medicinal data where the owners want secrecy but where doing statistics on them collectively is useful, or outsourcing computation. The discipline that studies these cases is called secure computation. This field has been mostly working on integer and bit data types, as they are easier to work on, and due to it being possible to reduce the other cases to integer and bit manipulations. However, using these reductions bluntly will give inefficient results. Thus this thesis studies secure computation on real numbers and presents three methods for improving efficiency. The first method concerns with fixed-point and floating-point numbers. Fixed-point numbers are simple in construction, but can lack precision and flexibility. Floating-point numbers, on the other hand, are precise and flexible, but are rather complicated in nature, which in secure setting translates to expensive operations. The first method thus combines those two number types for greater efficiency. The second method is based on the fact that in the concrete paradigm we use, it does not matter timewise whether we perform one or million operations in parallel. Thus we attempt to perform many instances of a fast operation in parallel in order to evaluate a more complicated one. Thirdly we introduce a new real number type. We use pairs of integers (a,b) to represent the real number a- φb where φ=1.618... is the golden ratio. This number type allows us to perform addition and multiplication relatively quicky and also achieves reasonable granularity.https://www.ester.ee/record=b522708

    Serial-data computation in VLSI

    Get PDF

    Highly accelerated simulations of glassy dynamics using GPUs: caveats on limited floating-point precision

    Full text link
    Modern graphics processing units (GPUs) provide impressive computing resources, which can be accessed conveniently through the CUDA programming interface. We describe how GPUs can be used to considerably speed up molecular dynamics (MD) simulations for system sizes ranging up to about 1 million particles. Particular emphasis is put on the numerical long-time stability in terms of energy and momentum conservation, and caveats on limited floating-point precision are issued. Strict energy conservation over 10^8 MD steps is obtained by double-single emulation of the floating-point arithmetic in accuracy-critical parts of the algorithm. For the slow dynamics of a supercooled binary Lennard-Jones mixture, we demonstrate that the use of single-floating point precision may result in quantitatively and even physically wrong results. For simulations of a Lennard-Jones fluid, the described implementation shows speedup factors of up to 80 compared to a serial implementation for the CPU, and a single GPU was found to compare with a parallelised MD simulation using 64 distributed cores.Comment: 12 pages, 7 figures, to appear in Comp. Phys. Comm., HALMD package licensed under the GPL, see http://research.colberg.org/projects/halm

    Computing the fast Fourier transform on SIMD microprocessors

    Get PDF
    This thesis describes how to compute the fast Fourier transform (FFT) of a power-of-two length signal on single-instruction, multiple-data (SIMD) microprocessors faster than or very close to the speed of state of the art libraries such as FFTW (“Fastest Fourier Transform in the West”), SPIRAL and Intel Integrated Performance Primitives (IPP). The conjugate-pair algorithm has advantages in terms of memory bandwidth, and three implementations of this algorithm, which incorporate latency and spatial locality optimizations, are automatically vectorized at the algorithm level of abstraction. Performance results on 2- way, 4-way and 8-way SIMD machines show that the performance scales much better than FFTW or SPIRAL. The implementations presented in this thesis are compiled into a high-performance FFT library called SFFT (“Streaming Fast Fourier Trans- form”), and benchmarked against FFTW, SPIRAL, Intel IPP and Apple Accelerate on sixteen x86 machines and two ARM NEON machines, and shown to be, in many cases, faster than these state of the art libraries, but without having to perform extensive machine specific calibration, thus demonstrating that there are good heuristics for predicting the performance of the FFT on SIMD microprocessors (i.e., the need for empirical optimization may be overstated)

    Portable high-performance superconducting : high-level platform-dependent optimization

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1994.Includes bibliographical references (p. 163-172).by Eric Allen Brewer.Ph.D

    Acceleration Techniques for Sparse Recovery Based Plane-wave Decomposition of a Sound Field

    Get PDF
    Plane-wave decomposition by sparse recovery is a reliable and accurate technique for plane-wave decomposition which can be used for source localization, beamforming, etc. In this work, we introduce techniques to accelerate the plane-wave decomposition by sparse recovery. The method consists of two main algorithms which are spherical Fourier transformation (SFT) and sparse recovery. Comparing the two algorithms, the sparse recovery is the most computationally intensive. We implement the SFT on an FPGA and the sparse recovery on a multithreaded computing platform. Then the multithreaded computing platform could be fully utilized for the sparse recovery. On the other hand, implementing the SFT on an FPGA helps to flexibly integrate the microphones and improve the portability of the microphone array. For implementing the SFT on an FPGA, we develop a scalable FPGA design model that enables the quick design of the SFT architecture on FPGAs. The model considers the number of microphones, the number of SFT channels and the cost of the FPGA and provides the design of a resource optimized and cost-effective FPGA architecture as the output. Then we investigate the performance of the sparse recovery algorithm executed on various multithreaded computing platforms (i.e., chip-multiprocessor, multiprocessor, GPU, manycore). Finally, we investigate the influence of modifying the dictionary size on the computational performance and the accuracy of the sparse recovery algorithms. We introduce novel sparse-recovery techniques which use non-uniform dictionaries to improve the performance of the sparse recovery on a parallel architecture

    Comparison of logarithmic and floating-point number systems implemented on Xilinx Virtex-II field-programmable gate arrays

    Get PDF
    The aim of this thesis is to compare the implementation of parameterisable LNS (logarithmic number system) and floating-point high dynamic range number systems on FPGA. The Virtex/Virtex-II range of FPGAs from Xilinx, which are the most popular FPGA technology, are used to implement the designs. The study focuses on using the low level primitives of the technology in an efficient way and so initially the design issues in implementing fixed-point operators are considered. The four basic operations of addition, multiplication, division and square root are considered. Carry- free adders, ripple-carry adders, parallel multipliers and digit recurrence division and square root are discussed. The floating-point operators use the word format and exceptions as described by the IEEE std-754. A dual-path adder implementation is described in detail, as are floating-point multiplier, divider and square root components. Results and comparisons with other works are given. The efficient implementation of function evaluation methods is considered next. An overview of current FPGA methods is given and a new piecewise polynomial implementation using the Taylor series is presented and compared with other designs in the literature. In the next section the LNS word format, accuracy and exceptions are described and two new LNS addition/subtraction function approximations are described. The algorithms for performing multiplication, division and powering in the LNS domain are also described and are compared with other designs in the open literature. Parameterisable conversion algorithms to convert to/from the fixed-point domain from/to the LNS and floating-point domain are described and implementation results given. In the next chapter MATLAB bit-true software models are given that have the exact functionality as the hardware models. The interfaces of the models are given and a serial communication system to perform low speed system tests is described. A comparison of the LNS and floating-point number systems in terms of area and delay is given. Different functions implemented in LNS and floating-point arithmetic are also compared and conclusions are drawn. The results show that when the LNS is implemented with a 6-bit or less characteristic it is superior to floating-point. However, for larger characteristic lengths the floating-point system is more efficient due to the delay and exponential area increase of the LNS addition operator. The LNS is beneficial for larger characteristics than 6-bits only for specialist applications that require a high portion of division, multiplication, square root, powering operations and few additions
    • 

    corecore