    High-Level Synthesis Hardware Design for FPGA-Based Accelerators: Models, Methodologies, and Frameworks

    Hardware accelerators based on field-programmable gate array (FPGA) and system-on-chip (SoC) devices have gained attention in recent years. One of the main reasons is that these devices contain reconfigurable logic, which makes them well suited to boosting the performance of applications. High-level synthesis (HLS) tools facilitate the creation of FPGA code from a high level of abstraction, using different directives to obtain an optimized hardware design based on performance metrics. However, the complexity of the design space depends on several factors, such as the number of directives used in the source code, the resources available in the device, and the clock frequency. Design space exploration (DSE) techniques evaluate multiple implementations with different combinations of directives to obtain a design with a good compromise between metrics. This paper presents a survey of models, methodologies, and frameworks proposed for metric estimation, FPGA-based DSE, and power consumption estimation on FPGA/SoC devices. The main features, limitations, and trade-offs of these approaches are described. We also present the integration of existing models and frameworks in diverse research areas and identify the challenges that remain to be addressed.
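
    As an illustration of the DSE loop such frameworks automate, the following sketch exhaustively enumerates directive combinations against a toy analytical cost model and keeps the Pareto-optimal designs. The directive names and the cost model are hypothetical placeholders, not taken from any surveyed tool.

        from itertools import product

        # Hypothetical directive knobs an HLS tool might expose.
        DIRECTIVES = {
            "unroll_factor": [1, 2, 4, 8],
            "pipeline": [False, True],
            "array_partition": [1, 2, 4],
        }

        def estimate_metrics(cfg):
            """Toy analytical model: returns (latency in cycles, area in LUTs)."""
            parallelism = cfg["unroll_factor"] * cfg["array_partition"]
            latency = 1024 / parallelism * (0.6 if cfg["pipeline"] else 1.0)
            area = 500 * parallelism + (300 if cfg["pipeline"] else 0)
            return latency, area

        def pareto_front(evaluated):
            """Keep designs not dominated in both latency and area."""
            return [(cfg, m) for cfg, m in evaluated
                    if not any(o[0] <= m[0] and o[1] <= m[1] and o != m
                               for _, o in evaluated)]

        keys = list(DIRECTIVES)
        designs = [dict(zip(keys, vals)) for vals in product(*DIRECTIVES.values())]
        evaluated = [(cfg, estimate_metrics(cfg)) for cfg in designs]
        for cfg, (lat, area) in pareto_front(evaluated):
            print(cfg, f"-> latency={lat:.0f} cycles, area={area} LUTs")

    In a real framework the analytical model is replaced by HLS tool reports or a learned estimator, and heuristic or machine-learning search replaces exhaustive enumeration.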

    Information geometric measurements of generalisation

    Neural networks can be regarded as statistical models and analysed in a Bayesian framework. Generalisation is measured by the performance on independent test data drawn from the same distribution as the training data. Such performance can be quantified by the posterior average of the information divergence between the true and the model distributions. Averaging over the Bayesian posterior guarantees internal coherence; using information divergence guarantees invariance with respect to representation. The theory generalises the least-mean-squares theory for linear Gaussian models to general problems of statistical estimation. The main results are: (1) the ideal optimal estimate is always given by the average over the posterior; (2) the optimal estimate within a computational model is given by the projection of the ideal estimate onto the model. This incidentally shows that some currently popular methods dealing with hyperpriors are in general unnecessary and misleading. The extension of information divergence to positive normalisable measures reveals a remarkable relation between the δ-dual affine geometry of statistical manifolds and the geometry of the dual pair of Banach spaces L_δ and its dual. It therefore offers a conceptual simplification of information geometry. The general conclusion on the issue of evaluating neural network learning rules and other statistical inference methods is that such evaluations are only meaningful under three assumptions: the prior P(p), describing the environment of all the problems; the divergence D_δ, specifying the requirement of the task; and the model Q, specifying the available computing resources.
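
    The two main results admit a compact formal statement; the following is a minimal sketch in our own notation (D for the training data, D(·‖·) for the divergence), not quoted from the thesis:

        % Result (1): the ideal optimal estimate is the posterior average.
        \hat{p} \;=\; \int p \,\mathrm{d}P(p \mid \mathcal{D})
        % Result (2): the optimal estimate within a computational model Q is
        % the divergence projection of the ideal estimate onto Q.
        \hat{q} \;=\; \operatorname*{arg\,min}_{q \in Q} D(\hat{p} \,\|\, q)

    For the Kullback-Leibler case the first identity follows because the posterior expectation of D(p‖q) equals a constant minus Σ_x E[p(x) | D] log q(x), which is minimised exactly by q = p̂.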

    Simulation of the UKQCD computer

    Entanglement-enhanced dual-comb spectroscopy

    Dual-comb interferometry harnesses the interference of two laser frequency combs to provide unprecedented capability in spectroscopy applications. In the past decade, state-of-the-art systems have reached a point where the signal-to-noise ratio per unit acquisition time is fundamentally limited by shot noise from vacuum fluctuations. To address this issue, we propose an entanglement-enhanced dual-comb spectroscopy protocol that leverages quantum resources to significantly improve the signal-to-noise ratio. To analyze the performance of real systems, we develop a quantum model of dual-comb spectroscopy that takes practical noise into consideration. Based on this model, we propose quantum combs with sideband entanglement around each comb line to suppress the shot noise in heterodyne detection. Our results show significant quantum advantages in the µW to mW power range, making this technique particularly attractive for biological and chemical sensing applications. Furthermore, the quantum comb can be engineered using nonlinear optics and promises near-term experimentation.
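
    The scaling at stake can be illustrated with a back-of-the-envelope calculation (our toy numbers, not the paper's model): in shot-noise-limited heterodyne detection the amplitude signal-to-noise ratio grows with the square root of the detected photon number, and squeezing the sideband vacuum by s dB cuts the shot-noise variance by a factor of 10^(-s/10).

        import math

        H = 6.62607015e-34      # Planck constant (J*s)
        NU = 1.934e14           # optical frequency for ~1550 nm light (Hz)

        def shot_noise_limited_snr(power_w, t_acq=1.0, eta=0.9, squeezing_db=0.0):
            """Amplitude SNR of an ideal shot-noise-limited measurement.

            power_w: received power (W); t_acq: acquisition time (s);
            eta: detector quantum efficiency; squeezing_db: sideband squeezing.
            All parameter values here are illustrative assumptions.
            """
            n_photons = eta * power_w * t_acq / (H * NU)
            noise_var = 10 ** (-squeezing_db / 10)   # squeezed vacuum variance
            return math.sqrt(n_photons / noise_var)

        for p in (1e-6, 1e-4, 1e-3):                 # 1 uW to 1 mW
            gain = shot_noise_limited_snr(p, squeezing_db=10) / shot_noise_limited_snr(p)
            print(f"P = {p:.0e} W: SNR gain with 10 dB of squeezing = {gain:.2f}x")

    In this idealised picture the squeezing gain is power-independent; the paper's quantum model additionally includes practical noise sources, which restrict where the advantage survives.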

    High-Throughput Deep Convolutional Neural Networks on Fully Homomorphic Encryption Using Channel-By-Channel Packing

    Secure Machine Learning as a Service is a viable solution where clients seek secure delegation of ML computation while protecting their sensitive data. We propose an efficient method to securely evaluate deep standard convolutional neural networks based on CKKS fully homomorphic encryption, in the manner of batch inference. In this paper, we introduce a packing method called Channel-by-Channel Packing that maximizes slot compactness and the single-instruction-multiple-data (SIMD) capabilities of ciphertexts. Along with further optimizations such as lazy rescaling, lazy Baby-Step Giant-Step, and ciphertext level management, we significantly reduce the computational cost of standard ResNet inference. Simulation results show that our work improves amortized time by 5.04× (from 79.46 s to 15.76 s) and 5.20× (from 455.56 s to 87.60 s) for ResNet-20 and ResNet-110, respectively, compared to the previous best results. We also obtain a dramatic reduction in memory usage for rotation keys, from several hundred gigabytes to 6.91 GB, which is about 38× smaller than the previous result.
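
    A plaintext sketch of the packing idea (layout and names are ours; a real implementation would operate on CKKS ciphertexts via a library such as OpenFHE or Lattigo): placing each channel contiguously keeps the SIMD slots dense, and a single slot rotation then aligns one channel with another.

        import numpy as np

        def channel_by_channel_pack(feature_map, num_slots):
            """Pack a (C, H, W) feature map into a 1-D slot vector, channel by channel."""
            c, h, w = feature_map.shape
            packed = feature_map.reshape(c, h * w).reshape(-1)  # channels back to back
            assert packed.size <= num_slots, "feature map exceeds ciphertext capacity"
            slots = np.zeros(num_slots)
            slots[:packed.size] = packed       # remaining slots stay zero
            return slots

        def rotate(slots, k):
            """Stand-in for the homomorphic slot rotation used to align channels."""
            return np.roll(slots, -k)

        x = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)   # 2 channels of 4x4
        ct = channel_by_channel_pack(x, num_slots=64)
        print(rotate(ct, 4 * 4)[:16])   # one rotation moves channel 1 into channel 0's slots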

    Optimal Prediction for Prefetching in the Worst Case

    This is the published version. Copyright © 1998 Society for Industrial and Applied Mathematics. Response time delays caused by I/O are a major problem in many systems and database applications. Prefetching and cache replacement methods are attracting renewed attention because of their success in avoiding costly I/Os. Prefetching can be looked upon as a type of online sequential prediction, where the predictions must be accurate as well as made in a computationally efficient way. Unlike other online problems, prefetching cannot admit a competitive analysis, since the optimal offline prefetcher incurs no cost when it knows the future page requests. Previous analytical work on prefetching by Vitter and Krishnan [J. Assoc. Comput. Mach., 43 (1996), pp. 771–793] consisted of modeling the user as a probabilistic Markov source. In this paper, we look at the much stronger form of worst-case analysis and derive a randomized algorithm for pure prefetching. We compare our algorithm for every page request sequence with the important class of finite-state prefetchers, making no assumptions as to how the sequence of page requests is generated. We prove analytically that the fault rate of our online prefetching algorithm converges almost surely, for every page request sequence, to the fault rate of the optimal finite-state prefetcher for that sequence. This analysis model can be looked upon as a generalization of the competitive framework, in that it compares an online algorithm in a worst-case manner over all sequences with a powerful yet nonclairvoyant opponent. We simultaneously achieve the computational goal of implementing our prefetcher in optimal constant expected time per prefetched page, using the optimal dynamic discrete random variate generator of Matias, Vitter, and Ni [Proc. 4th Annual ACM-SIAM Symposium on Discrete Algorithms, Austin, TX, January 1993].
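
    As a toy illustration of the setting (deliberately not the paper's randomized algorithm), the sketch below implements a first-order finite-state-style prefetcher that samples its prediction in proportion to observed page-to-page transition counts; the paper's contribution is an online algorithm whose fault rate provably converges to that of the best finite-state prefetcher on every sequence.

        import random
        from collections import defaultdict

        class MarkovPrefetcher:
            """Order-1 toy prefetcher: predicts by sampling past transitions."""

            def __init__(self):
                self.counts = defaultdict(lambda: defaultdict(int))
                self.prev = None

            def predict(self):
                """Pick a page to prefetch, weighted by observed successors."""
                successors = self.counts[self.prev]
                if not successors:
                    return None                    # no history yet: no prefetch
                pages, weights = zip(*successors.items())
                return random.choices(pages, weights=weights)[0]

            def observe(self, page):
                """Record the actual request and update transition counts."""
                if self.prev is not None:
                    self.counts[self.prev][page] += 1
                self.prev = page

        requests = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1]
        pf, faults = MarkovPrefetcher(), 0
        for page in requests:
            faults += (pf.predict() != page)       # page fault if prefetch missed
            pf.observe(page)
        print(f"fault rate: {faults / len(requests):.2f}")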

    REED: Chiplet-Based Scalable Hardware Accelerator for Fully Homomorphic Encryption

    Fully Homomorphic Encryption (FHE) has emerged as a promising technology for processing encrypted data without the need for decryption. Despite its potential, its practical implementation has faced challenges due to substantial computational overhead. To address this issue, we propose the first chiplet-based FHE accelerator design, 'REED', which enables scalability and offers high throughput, thereby enhancing homomorphic encryption deployment in real-world scenarios. The design accounts for the well-known wafer yield issues during fabrication, which significantly impact production costs. In contrast to state-of-the-art approaches, we also address data exchange overhead by proposing a non-blocking inter-chiplet communication strategy. We incorporate novel pipelined Number Theoretic Transform and automorphism techniques, leveraging parallelism to provide high throughput. Experimental results demonstrate that the REED 2.5D integrated circuit consumes 177 mm² of chip area and 82.5 W average power in 7 nm technology, and achieves a speedup of up to 5,982× compared to a CPU (24-core 2× Intel X5690), with 2× better energy efficiency and 50% lower development cost than a state-of-the-art ASIC accelerator. To evaluate its practical impact, we are the first to benchmark encrypted deep neural network training. Overall, this work enhances the practicality and deployability of fully homomorphic encryption in real-world scenarios.
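
    The Number Theoretic Transform mentioned in the abstract is the dominant FHE kernel; the sketch below shows the textbook radix-2 algorithm that such accelerators pipeline, with toy parameters of our choosing (real deployments use much larger ring dimensions and word-sized primes).

        Q = 257   # NTT-friendly toy prime: Q - 1 is divisible by the length N
        N = 8     # transform length, a power of two

        def find_root(n, q):
            """Return a primitive n-th root of unity modulo q (n a power of two)."""
            for g in range(2, q):
                w = pow(g, (q - 1) // n, q)    # order of w divides n
                if pow(w, n // 2, q) != 1:     # order is exactly n
                    return w
            raise ValueError("no primitive root found")

        def ntt(a, w, q):
            """Recursive Cooley-Tukey NTT of a length-2^k coefficient list."""
            n = len(a)
            if n == 1:
                return a[:]
            even = ntt(a[0::2], w * w % q, q)
            odd = ntt(a[1::2], w * w % q, q)
            out, t = [0] * n, 1
            for k in range(n // 2):            # butterfly stage
                out[k] = (even[k] + t * odd[k]) % q
                out[k + n // 2] = (even[k] - t * odd[k]) % q
                t = t * w % q
            return out

        def intt(a, w, q):
            """Inverse NTT: transform with w^-1, then scale by n^-1 mod q."""
            res = ntt(a, pow(w, q - 2, q), q)  # inverse root via Fermat
            inv_n = pow(len(a), q - 2, q)
            return [x * inv_n % q for x in res]

        w = find_root(N, Q)
        coeffs = [1, 2, 3, 4, 0, 0, 0, 0]
        assert intt(ntt(coeffs, w, Q), w, Q) == coeffs   # round trip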