321 research outputs found

    Accelerating Extreme Learning Machine on FPGA by Hardware Implementation of Given Rotation - QRD

    Get PDF
    Currently, Extreme Learning Machine (ELM) is one of the research trends in the machine learning field due to its remarkable performances in terms of complexity and computational speed. However, the big data era and the limitations of general-purpose processor cause the increasing of interest in hardware implementation of ELM in order to reduce the computational time. Hence, this work presents the hardware-software co-design of ELM to improve the overall performances. In the co-design paradigm, one of the important components of ELM, namely Given Rotation-QRD (GR-QRD) is developed as a hardware core. Field Programmable Gate Array (FPGA) is chosen as the platform for ELM implementation due to its reconfigurable capability and high parallelism. Moreover, the learning accuracy and computational time would be used to evaluate the performances of the proposed ELM design. Our experiment has shown that GR-QRD accelerator helps to reduce the computational time of ELM training by 41.75% while maintaining the same training accuracy in comparison to pure software of ELM

    Matrix Diagonalization as a Board Game: Teaching an Eigensolver the Fastest Path to Solution

    Full text link
    Matrix diagonalization is at the cornerstone of numerous fields of scientific computing. Diagonalizing a matrix to solve an eigenvalue problem requires a sequential path of iterations that eventually reaches a sufficiently converged and accurate solution for all the eigenvalues and eigenvectors. This typically translates into a high computational cost. Here we demonstrate how reinforcement learning, using the AlphaZero framework, can accelerate Jacobi matrix diagonalizations by viewing the selection of the fastest path to solution as a board game. To demonstrate the viability of our approach we apply the Jacobi diagonalization algorithm to symmetric Hamiltonian matrices that appear in quantum chemistry calculations. We find that a significant acceleration can often be achieved. Our findings highlight the opportunity to use machine learning as a promising tool to improve the performance of numerical linear algebra.Comment: 14 page

    The Acceleration of Polynomial Methods for Blind Image Deconvolution Using Graphical Processing Units (GPUs)

    Get PDF
    Image processing has become an integral part of many areas of study. Unfortunately, the process of capturing images can often result in undesirable blurring and noise, and thus can make processing the resulting images problematic. Methods are therefore required that attempt to remove blurring. The main body of work in this field is in Bayesian methods for image deblurring, with many algorithms aimed at solving this problem relying on the Fourier transform. The Fourier transform results in the amplification of noise in the image, which can lead to many of the same problems as blurring. Winkler presented a method of blind image deconvolution (BID) without the Fourier transform, which treated the rows and columns of the blurred image as the coefficients of univariate polynomials. By treating the rows and columns of the image in this way, the problem of computing the blurring function becomes a problem of computing the greatest common divisor (GCD) of these polynomials. The computation of the GCD of two polynomials is ill posed, as any noise in the polynomials causes them to be coprime. Thus an approximate GCD (AGCD) must be computed instead. The computation of an AGCD is a computationally expensive process, resulting in the BID algorithm being expensive. The research presented in this thesis investigates the fundamental mathematical processes underpinning such an algorithm, and presents multiple methods through which this algorithm can be accelerated using a GPU. This acceleration results in an implementation that is 30 times faster than a CPU parallel approach. The process of accelerating the BID algorithm in this way required a first of its kind GPU accelerated algorithm for the computation of an AGCD, with multiple novel techniques utilised to achieve this acceleration

    Mixed-Precision Quantization and Parallel Implementation of Multispectral Riemannian Classification for Brain--Machine Interfaces

    Full text link
    With Motor-Imagery (MI) Brain--Machine Interfaces (BMIs) we may control machines by merely thinking of performing a motor action. Practical use cases require a wearable solution where the classification of the brain signals is done locally near the sensor using machine learning models embedded on energy-efficient microcontroller units (MCUs), for assured privacy, user comfort, and long-term usage. In this work, we provide practical insights on the accuracy-cost tradeoff for embedded BMI solutions. Our proposed Multispectral Riemannian Classifier reaches 75.1% accuracy on 4-class MI task. We further scale down the model by quantizing it to mixed-precision representations with a minimal accuracy loss of 1%, which is still 3.2% more accurate than the state-of-the-art embedded convolutional neural network. We implement the model on a low-power MCU with parallel processing units taking only 33.39ms and consuming 1.304mJ per classification

    Towards the First Practical Applications of Quantum Computers

    Full text link
    Noisy intermediate-scale quantum (NISQ) computers are coming online. The lack of error-correction in these devices prevents them from realizing the full potential of fault-tolerant quantum computation, a technology that is known to have significant practical applications, but which is years, if not decades, away. A major open question is whether NISQ devices will have practical applications. In this thesis, we explore and implement proposals for using NISQ devices to achieve practical applications. In particular, we develop and execute variational quantum algorithms for solving problems in combinatorial optimization and quantum chemistry. We also execute a prototype of a protocol for generating certified random numbers. We perform our experiments on a superconducting qubit processor developed at Google. While we do not perform any quantum computations that are beyond the capabilities of classical computers, we address many implementation challenges that must be overcome to succeed in such an endeavor, including optimization, efficient compilation, and error mitigation. In addressing these challenges, we push the limits of what can currently be done with NISQ technology, going beyond previous quantum computing demonstrations in terms of the scale of our experiments and the types of problems we tackle. While our experiments demonstrate progress in the utilization of quantum computers, the limits that we reached underscore the fundamental challenges in scaling up towards the classically intractable regime. Nevertheless, our results are a promising indication that NISQ devices may indeed deliver practical applications.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/163016/1/kevjsung_1.pd
    corecore