Fast Exact Bayesian Inference for Sparse Signals in the Normal Sequence Model
We consider exact algorithms for Bayesian inference with model selection
priors (including spike-and-slab priors) in the sparse normal sequence model.
Because the best existing exact algorithm becomes numerically unstable for
sample sizes over n=500, much attention has turned to alternative
approaches such as approximate algorithms (Gibbs sampling, variational Bayes,
etc.), shrinkage priors (e.g. the Horseshoe prior and the Spike-and-Slab LASSO),
and empirical Bayesian methods. However, by introducing algorithmic ideas from
online sequential prediction, we show that exact calculations are feasible for
much larger sample sizes: for general model selection priors we reach n=25000,
and for certain spike-and-slab priors we can easily reach n=100000. We further
prove a de Finetti-like result for finite sample sizes that characterizes
exactly which model selection priors can be expressed as spike-and-slab priors.
The computational speed and numerical accuracy of the proposed methods are
demonstrated in experiments on simulated data, on a differential gene
expression data set, and in a comparison of multiple hyper-parameter
settings for the beta-binomial prior. In our experimental evaluation we compute
guaranteed bounds on the numerical accuracy of all new algorithms, which show
that the proposed methods are numerically reliable, whereas an alternative based
on long division is not.
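To make the setting concrete, here is a minimal sketch of exact Bayesian computation in the normal sequence model under an i.i.d. spike-and-slab prior. The hyper-parameter values (`w`, `tau2`) are illustrative assumptions, and the log-sum-exp evaluation shown is a standard stability device, not the paper's algorithm for general model selection priors:

```python
import math

def log_norm_pdf(x, var):
    # log density of N(0, var) at x
    return -0.5 * (math.log(2 * math.pi * var) + x * x / var)

def inclusion_prob(x, w=0.1, tau2=4.0):
    """Posterior probability that theta_i != 0 under the i.i.d.
    spike-and-slab prior in the normal sequence model X_i = theta_i + N(0,1):
    theta_i = 0 with prob. 1-w, theta_i ~ N(0, tau2) with prob. w.
    Marginally, X_i ~ N(0, 1+tau2) under the slab and N(0, 1) under the
    spike. Computed in log space so large |x| stays numerically stable."""
    log_slab = math.log(w) + log_norm_pdf(x, 1.0 + tau2)
    log_spike = math.log(1.0 - w) + log_norm_pdf(x, 1.0)
    m = max(log_slab, log_spike)  # log-sum-exp trick
    log_total = m + math.log(math.exp(log_slab - m) + math.exp(log_spike - m))
    return math.exp(log_slab - log_total)

# A large observation is almost surely signal, a small one almost surely noise.
print(inclusion_prob(8.0) > 0.99)  # → True
print(inclusion_prob(0.1) < 0.2)   # → True
```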
Efficient algorithms for pairing-based cryptosystems
We describe fast new algorithms for implementing recent cryptosystems based on the Tate pairing. In particular, our techniques improve pairing evaluation speed by a factor of about 55 compared to previously known methods in characteristic 3, and attain performance comparable
to that of RSA in larger characteristics. We also propose faster algorithms for scalar multiplication in characteristic 3 and for square root extraction
over F_{p^m}, the latter technique also being useful in contexts other than pairing-based cryptography.
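For intuition on square root extraction in finite fields, here is the textbook special case over a prime field F_p with p ≡ 3 (mod 4); it is a sketch only, and not the paper's F_{p^m} technique. The toy modulus 23 is an assumption for illustration:

```python
def sqrt_mod_p(a, p):
    """Square root in F_p for primes p = 3 (mod 4). If a is a quadratic
    residue then a^((p+1)/4) is a root, because squaring it gives
    a^((p+1)/2) = a * a^((p-1)/2) = a by Euler's criterion."""
    assert p % 4 == 3
    r = pow(a, (p + 1) // 4, p)       # single modular exponentiation
    if (r * r) % p != a % p:
        return None                   # a is a non-residue: no root exists
    return r

print(sqrt_mod_p(4, 23))   # → 2, since 2^2 = 4 (mod 23)
print(sqrt_mod_p(5, 23))   # → None, 5 is a non-residue mod 23
```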
Wavemoth -- Fast spherical harmonic transforms by butterfly matrix compression
We present Wavemoth, an experimental open source code for computing scalar
spherical harmonic transforms (SHTs). Such transforms are ubiquitous in
astronomical data analysis. Our code performs substantially better than
existing publicly available codes due to improvements on two fronts. First, the
computational core is made more efficient by using small amounts of precomputed
data, as well as paying attention to CPU instruction pipelining and cache
usage. Second, Wavemoth makes use of a fast and numerically stable algorithm
based on compressing a set of linear operators in a precomputation step. The
resulting SHT scales as O(L^2 (log L)^2) for the resolution range of practical
interest, where L denotes the spherical harmonic truncation degree. For low and
medium-range resolutions, Wavemoth tends to be twice as fast as libpsht, which
is the current state of the art implementation for the HEALPix grid. At the
resolution of the Planck experiment, L ~ 4000, Wavemoth is between three and
six times faster than libpsht, depending on the computer architecture and the
required precision. Due to the experimental nature of the project, only
spherical harmonic synthesis is currently supported, although adding support for
spherical harmonic analysis should be trivial.
Comment: 13 pages, 6 figures, accepted by ApJ
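The core idea behind compressing a set of linear operators in a precomputation step can be sketched with plain truncated-SVD compression of a numerically low-rank block: factor once, then every application is cheap. This is only the building block of butterfly-style schemes, under an assumed Gaussian test kernel, not Wavemoth's actual compression:

```python
import numpy as np

def compress(block, tol=1e-10):
    """Replace a numerically low-rank matrix block by truncated SVD factors
    (U * s, Vt). Precompute the factors once; each later apply then costs
    O(r(m+n)) instead of O(mn), where r is the numerical rank."""
    U, s, Vt = np.linalg.svd(block, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))          # numerical rank at tolerance tol
    return U[:, :r] * s[:r], Vt[:r, :]

def apply_compressed(factors, x):
    Us, Vt = factors
    return Us @ (Vt @ x)

# A smooth kernel is numerically low-rank, so it compresses well.
t = np.linspace(0.0, 1.0, 200)
A = np.exp(-np.subtract.outer(t, t) ** 2)    # 200x200 Gaussian kernel matrix
factors = compress(A)
x = np.random.default_rng(0).standard_normal(200)
err = np.linalg.norm(A @ x - apply_compressed(factors, x)) / np.linalg.norm(A @ x)
print(factors[0].shape[1], err)  # rank far below 200, tiny relative error
```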
Multiple point compression on curves
Multiple point compression is an important feature for improving implementations of
elliptic curve cryptography. It can be extended to other curves, in particular hyperelliptic curves, with
divisors represented in Mumford form.
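For background, single-point compression stores only the x-coordinate plus one parity bit of y, and decompression solves the curve equation for y. The following sketch uses a toy curve and prime of my own choosing (with p ≡ 3 mod 4 so the square root is a single exponentiation); it is not a construction from the paper:

```python
# Toy curve y^2 = x^3 + a*x + b over F_p (assumed parameters, illustration only)
p, a, b = 23, 1, 4

def compress_point(x, y):
    # keep x plus the parity of y: one bit instead of a full coordinate
    return x, y & 1

def decompress_point(x, parity):
    rhs = (x * x * x + a * x + b) % p
    y = pow(rhs, (p + 1) // 4, p)    # modular sqrt, valid since p = 3 (mod 4)
    assert (y * y) % p == rhs, "x is not the abscissa of a curve point"
    if y & 1 != parity:
        y = p - y                    # the other root has the other parity
    return x, y

pt = (0, 2)                          # on the curve: 2^2 = 0 + 0 + 4 (mod 23)
print(decompress_point(*compress_point(*pt)))  # → (0, 2)
```

Multiple-point compression generalizes this: several points are encoded together so that even the per-point parity overhead can be reduced.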
High-Performance Computing Unit Design for an On-Device Convolutional Neural Network Accelerator
Thesis (Ph.D.) -- Graduate School of Seoul National University: College of Engineering, Department of Electrical and Computer Engineering, August 2020. Advisor: Taewhan Kim.
Optimizing computing units for an on-device neural network accelerator can bring lower energy and latency, higher throughput, and might enable unprecedented new applications. This dissertation studies two specific optimization opportunities of the multiply-accumulate (MAC) unit for on-device neural network accelerators that stem from precision quantization methodology.
Firstly, we propose an enhanced MAC processing unit structure that efficiently processes mixed-precision models whose operations are mostly in low precision. Precisely, the two essential contributions are: (1) a MAC unit structure supporting two precision modes is designed to fully utilize its computation logic when processing lower-precision data, which brings more computation efficiency for mixed-precision models whose major operations are in lower precision; (2) for a set of input CNNs, we formulate the exploration of the size of the single internal multiplier in the MAC unit to derive an economical instance, in terms of computation and energy cost, of the MAC unit structure across all network layers. Experimental results with two well-known CNN models, AlexNet and VGG-16, and two experimental precision settings showed that the proposed units can reduce computational cost per multiplication by 4.68~30.3% and save energy cost by 43.3% on average over conventional units.
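The dual-mode idea can be sketched in software: a narrow internal multiplier serves low-precision operands directly, and is reused on operand slices (recombined with shifts) for high-precision operands. The 8-bit multiplier width and the slicing scheme below are my own illustrative assumptions, not the dissertation's circuit:

```python
MUL_BITS = 8                      # assumed width of the single internal multiplier
MASK = (1 << MUL_BITS) - 1

def narrow_mul(a, b):
    """Stand-in for the hardware's internal 8x8 multiplier."""
    assert 0 <= a <= MASK and 0 <= b <= MASK
    return a * b

def mac_mixed(acc, x, w, bits):
    """Dual-mode MAC sketch: an 8-bit multiply uses the internal multiplier
    once; a 16-bit multiply reuses it four times on 8-bit operand slices and
    recombines the partial products with shifts."""
    if bits <= MUL_BITS:
        return acc + narrow_mul(x, w)
    xl, xh = x & MASK, x >> MUL_BITS      # low/high slices of each operand
    wl, wh = w & MASK, w >> MUL_BITS
    prod = (narrow_mul(xh, wh) << (2 * MUL_BITS)) \
         + ((narrow_mul(xh, wl) + narrow_mul(xl, wh)) << MUL_BITS) \
         + narrow_mul(xl, wl)
    return acc + prod

print(mac_mixed(0, 300, 500, 16) == 300 * 500)  # → True
```

In low-precision mode such a unit issues one multiply per cycle on the same logic that the high-precision mode occupies for four, which is the source of the efficiency gain for mixed-precision models.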
Secondly, we propose an acceleration technique for processing multiplication operations using stochastic computing (SC). MUX-FSM based SC, which employs a MUX controlled by an FSM to generate a bit sequence of a binary number to count up for a MAC operation, considerably reduces the hardware cost of implementing MAC operations compared with traditional stochastic number generator (SNG) based SC. Nevertheless, even though it offers a very economical hardware implementation, the existing MUX-FSM based SC still does not meet the multiplication processing time required for wide adoption of on-device neural networks in practice. Also, conventional enhancements have limitations such as sub-maximal cycle reduction and parameter conversion cost. This work proposes a solution to the problem of further speeding up conventional MUX-FSM based SC. Precisely, we analyze the bit counting pattern produced by the MUX-FSM and replace the counting redundancy with a shift operation, significantly reducing the length of the required bit sequence and theoretically speeding up the worst-case multiplication processing time by 2X or more. Through experiments, it is shown that our enhanced SC technique shortens the average processing time by 38.8% over the conventional MUX-FSM based SC.
1 INTRODUCTION 1
1.1 Neural network accelerator and its optimizations 1
1.2 Necessity of optimizing computational block of neural network accelerator 5
1.3 Contributions of This Dissertation 7
2 MAC Design Considering Mixed Precision 9
2.1 Motivation 9
2.2 Internal Multiplier Size Determination 14
2.3 Proposed hardware structure 16
2.4 Experiments 21
2.4.1 Implementation of Reference MAC units 23
2.4.2 Area, Wirelength, Power, Energy, and Performance of MAC units for AlexNet 24
2.4.3 Area, Wirelength, Power, Energy, and Performance of MAC units for VGG-16 31
2.4.4 Power Saving by Clock Gating 35
3 Speeding up MUX-FSM based Stochastic Computing Unit Design 37
3.1 Motivations 37
3.1.1 MUX-FSM based SC and previous enhancements 42
3.2 The Proposed MUX-FSM based SC 48
3.2.1 Refined Algorithm for Stochastic Computing 48
3.3 The Supporting Hardware Architecture 55
3.3.1 Bit Counter with shift operation 55
3.3.2 Controller 57
3.3.3 Combining with preceding architectures 58
3.4 Experiments 59
3.4.1 Experiments Setup 59
3.4.2 Generating input bit selection pattern 60
3.4.3 Performance Comparison 61
3.4.4 Hardware Area and Energy Comparison 63
4 CONCLUSIONS 67
4.1 MAC Design Considering Mixed Precision 67
4.2 Speeding up MUX-FSM based Stochastic Computing Unit Design 68
Abstract (In Korean) 73
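The counting-versus-shifting trade-off behind the second contribution of the dissertation above can be illustrated with a toy cycle-count model. Both functions below compute the same product x*w; the cycle accounting is my own simplification (one cycle per counted '1' versus one cycle per shift), not the dissertation's FSM:

```python
def unary_count_cycles(x, w, bits=8):
    """Toy model of pure bit-counting: every '1' contributing to the
    product costs one counting cycle, so identical bit patterns are
    re-counted at every binary weight of w."""
    cycles, acc = 0, 0
    for i in range(bits):
        if (w >> i) & 1:
            acc += x << i
            cycles += x << i          # one cycle per counted '1'
    return acc, cycles

def shift_count_cycles(x, w, bits=8):
    """Counting redundancy replaced by shifts (Horner evaluation over the
    bits of w, MSB first): each weight bit costs one shift cycle, and x is
    counted once per set bit regardless of its binary weight."""
    cycles, acc = 0, 0
    for i in range(bits):
        acc <<= 1
        cycles += 1                   # a shift is a single cycle
        if (w >> (bits - 1 - i)) & 1:
            acc += x
            cycles += x               # count x once for this set bit
    return acc, cycles

x, w = 200, 170                       # 8-bit operands, w = 0b10101010
prod_u, cyc_u = unary_count_cycles(x, w)
prod_s, cyc_s = shift_count_cycles(x, w)
print(prod_u == x * w, prod_s == x * w)  # → True True
print(cyc_u, cyc_s)                      # → 34000 808
```

The shift-based variant bounds the cycle count by bits + popcount(w)*x rather than x*w, which mirrors the dissertation's claim of a large worst-case speedup from removing counting redundancy.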
- โฆ