15,188 research outputs found

    Computing Matrix Trigonometric Functions with GPUs through Matlab

    Full text link
    This paper presents an implementation of one of the most up-to-date algorithms proposed for computing the matrix trigonometric functions sine and cosine. The method is based on Taylor series approximations and makes intensive use of matrix multiplications. To accelerate these matrix products, our application can use from one to four NVIDIA GPUs through the NVIDIA cublas and cublasXt libraries. The application, implemented in C++, can be called from the Matlab command line thanks to the MEX files provided. We experimentally assess our implementation on modern, very high-performance NVIDIA GPUs. This work has been supported by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund (ERDF) under grants TIN2014-59294-P and TEC2015-67387-C4-1-R.
    Alonso-Jordá, P.; Peinado Pinilla, J.; Ibáñez González, J. J.; Sastre, J.; Defez Candel, E. (2019). Computing Matrix Trigonometric Functions with GPUs through Matlab. The Journal of Supercomputing 75(3):1227–1240. https://doi.org/10.1007/s11227-018-2354-1
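
    The computational core described above lends itself to a short illustration. The following NumPy sketch is not the authors' C++/MEX implementation: it is a naively truncated Taylor series for cos(A) and sin(A), included to show that the cost is dominated by the matrix products the paper offloads to the GPUs (the fixed degree m here is an assumption; the paper selects it adaptively).

```python
import math
import numpy as np
from scipy.linalg import cosm, sinm

def cos_sin_taylor(A, m=20):
    """Approximate cos(A) and sin(A) by Taylor series truncated after m terms."""
    n = A.shape[0]
    A2 = A @ A                     # both series are polynomials in A^2
    T = np.eye(n)                  # running power (A^2)^k
    C = np.eye(n)                  # partial sum for cos(A)
    S = np.eye(n)                  # partial sum for sin(A) / A
    for k in range(1, m + 1):
        T = T @ A2                 # one matrix product per extra term
        C = C + ((-1) ** k / math.factorial(2 * k)) * T
        S = S + ((-1) ** k / math.factorial(2 * k + 1)) * T
    return C, A @ S                # sin(A) = A * sum_k (-1)^k (A^2)^k / (2k+1)!

A = 0.5 * np.random.default_rng(1).standard_normal((5, 5))
C, S = cos_sin_taylor(A)
assert np.allclose(C, cosm(A)) and np.allclose(S, sinm(A))
```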

    Signal Flow Graph Approach to Efficient DST I-IV Algorithms

    Get PDF
    In this paper, fast and efficient discrete sine transform (DST) algorithms are presented based on the factorization of sparse, scaled orthogonal, rotation, rotation-reflection, and butterfly matrices. These algorithms are completely recursive and based solely on DST I–IV. The presented algorithms have lower arithmetic cost than the known fast DST algorithms. Furthermore, the language of signal flow graph representation of digital structures is used to describe these efficient and recursive DST algorithms, yielding an $(n-1)$-point signal flow graph for DST-I and $n$-point signal flow graphs for DST II–IV.
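
    The paper's contribution is the recursive factorization itself, which a short snippet cannot reproduce; as a point of reference, the sketch below (ours, not the paper's algorithm) builds the dense DST-I matrix from its definition and checks it against SciPy's fast implementation. The factor of 2 reflects SciPy's unnormalized DST-I convention.

```python
import numpy as np
from scipy.fft import dst

def dst1_matrix(n):
    """Dense DST-I matrix: entries sin(pi*j*k/(n+1)) for j, k = 1..n."""
    k = np.arange(1, n + 1)
    return np.sin(np.pi * np.outer(k, k) / (n + 1))

x = np.random.default_rng(0).standard_normal(8)
assert np.allclose(dst(x, type=1), 2 * dst1_matrix(8) @ x)
```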

    Discrete Cosine Transforms on Quantum Computers

    Get PDF
    A classical computer cannot calculate a discrete cosine transform on N points in less than linear time. This trivial lower bound is no longer valid for a computer that takes advantage of the quantum mechanical principles of superposition, entanglement, and interference. In fact, we show that it is possible to realize the discrete cosine transforms and the discrete sine transforms of size NxN and types I, II, III, and IV with as few as O(log^2 N) operations on a quantum computer, whereas the known fast algorithms on a classical computer need O(N log N) operations.
    Comment: 5 pages, LaTeX 2e, IEEE ISPA01, Pula, Croatia, 2001
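
    The quantum circuits themselves do not reduce to a short snippet, but the classical O(N log N) baseline quoted above can be made concrete. A sketch (ours, assuming SciPy's unnormalized DCT-II convention): a DCT-II computed through a single length-2N FFT.

```python
import numpy as np
from scipy.fft import fft, dct

def dct2_via_fft(x):
    """DCT-II in O(N log N): FFT of the even-symmetric extension [x, x reversed]."""
    N = len(x)
    y = np.concatenate([x, x[::-1]])
    c = fft(y)[:N] * np.exp(-1j * np.pi * np.arange(N) / (2 * N))
    return c.real          # equals scipy.fft.dct(x, type=2) with default norm

x = np.random.default_rng(0).standard_normal(16)
assert np.allclose(dct2_via_fft(x), dct(x, type=2))
```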

    On the Irresistible Efficiency of Signal Processing Methods in Quantum Computing

    Get PDF
    We show that many well-known signal transforms allow highly efficient realizations on a quantum computer. We explain some elementary quantum circuits and review the construction of the Quantum Fourier Transform. We derive quantum circuits for the Discrete Cosine and Sine Transforms and for the Discrete Hartley Transform. We show that at most O(log^2 N) elementary quantum gates are necessary to implement any of those transforms for input sequences of length N.
    Comment: 15 pages, LaTeX 2e. Expanded version of quant-ph/0111038. SPECLOG 2000, Tampere, Finland
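
    As a companion to the Quantum Fourier Transform construction the abstract reviews, the sketch below (ours, a plain matrix simulation rather than the paper's circuits) assembles the QFT unitary from Hadamard and controlled phase gates plus a final bit-reversal, using n + n(n-1)/2 = O(log^2 N) gates, and checks it against the DFT matrix.

```python
import numpy as np

def qft_circuit_matrix(n):
    """Assemble the n-qubit QFT from Hadamards and controlled phase gates,
    followed by a bit-reversal permutation (the usual final swaps)."""
    N = 2 ** n
    U = np.eye(N, dtype=complex)
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    for j in range(n):                          # qubit 0 = most significant
        Hj = np.kron(np.kron(np.eye(2 ** j), H), np.eye(2 ** (n - j - 1)))
        U = Hj @ U
        for k in range(j + 1, n):               # controlled R_{k-j+1} gates
            theta = 2 * np.pi / 2 ** (k - j + 1)
            phase = np.ones(N, dtype=complex)
            for b in range(N):                  # phase when both bits are 1
                if (b >> (n - 1 - j)) & 1 and (b >> (n - 1 - k)) & 1:
                    phase[b] = np.exp(1j * theta)
            U = np.diag(phase) @ U
    rev = [int(format(b, f"0{n}b")[::-1], 2) for b in range(N)]
    return np.eye(N)[rev] @ U

n, N = 3, 8
dft = np.exp(2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N) / np.sqrt(N)
assert np.allclose(qft_circuit_matrix(n), dft)
```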

    Algebraic Signal Processing Theory: Cooley-Tukey Type Algorithms for Polynomial Transforms Based on Induction

    Full text link
    A polynomial transform is the multiplication of an input vector $x \in \mathbb{C}^n$ by a matrix $\mathcal{P}_{b,\alpha} \in \mathbb{C}^{n\times n}$, whose $(k,\ell)$-th element is defined as $p_\ell(\alpha_k)$ for polynomials $p_\ell(x) \in \mathbb{C}[x]$ from a list $b = \{p_0(x), \dots, p_{n-1}(x)\}$ and sample points $\alpha_k \in \mathbb{C}$ from a list $\alpha = \{\alpha_0, \dots, \alpha_{n-1}\}$. Such transforms find applications in the areas of signal processing, data compression, and function interpolation. Important examples include the discrete Fourier and cosine transforms. In this paper we introduce a novel technique to derive fast algorithms for polynomial transforms. The technique uses the relationship between polynomial transforms and the representation theory of polynomial algebras. Specifically, we derive algorithms by decomposing the regular modules of these algebras as a stepwise induction. As an application, we derive novel $O(n\log n)$ general-radix algorithms for the discrete Fourier transform and the discrete cosine transform of type 4.
    Comment: 19 pages. Submitted to SIAM Journal on Matrix Analysis and Applications
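
    The definition above translates directly into code. A small illustration (ours): instantiating the polynomial transform with $p_\ell(x) = x^\ell$ at the $n$-th roots of unity recovers the DFT matrix, one of the examples named in the abstract.

```python
import numpy as np

def polynomial_transform(polys, alphas):
    """Matrix with (k, l) entry p_l(alpha_k), as defined in the abstract."""
    return np.array([[p(a) for p in polys] for a in alphas])

n = 8
polys = [lambda x, l=l: x ** l for l in range(n)]      # p_l(x) = x^l
alphas = np.exp(-2j * np.pi * np.arange(n) / n)        # sample points
assert np.allclose(polynomial_transform(polys, alphas), np.fft.fft(np.eye(n)))
```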

    Synthesis of Quantum Logic Circuits

    Full text link
    We discuss efficient quantum logic circuits which perform two tasks: (i) implementing generic quantum computations and (ii) initializing quantum registers. In contrast to conventional computing, the latter task is nontrivial because the state-space of an n-qubit register is not finite and contains exponential superpositions of classical bit strings. Our proposed circuits are asymptotically optimal for their respective tasks and improve published results by at least a factor of two. The circuits for generic quantum computation constructed by our algorithms are the most efficient known today in terms of the number of expensive gates (quantum controlled-NOTs). They are based on an analogue of the Shannon decomposition of Boolean functions and a new circuit block, the quantum multiplexor, that generalizes several known constructions. A theoretical lower bound implies that our circuits cannot be improved by more than a factor of two. We additionally show how to accommodate the severe architectural limitation of using only nearest-neighbor gates that is representative of current implementation technologies. This increases the number of gates by almost an order of magnitude but preserves the asymptotic optimality of the gate counts.
    Comment: 18 pages; v5 fixes minor bugs; v4 is a complete rewrite of v3, with 6x more content, a theory of quantum multiplexors and the Quantum Shannon Decomposition. A key result on generic circuit synthesis has been improved to ~23/48 * 4^n CNOTs for n qubits
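
    The quantum multiplexor mentioned above has a compact matrix picture: conditioned on a select qubit, it applies one of two smaller unitaries, i.e., it is block-diagonal. A toy sketch under that standard characterization (the paper's synthesis then recursively decomposes such blocks):

```python
import numpy as np
from scipy.stats import unitary_group

def multiplexor(U0, U1):
    """Apply U0 when the (most significant) select qubit is |0> and U1 when
    it is |1>: a block-diagonal unitary."""
    Z = np.zeros_like(U0)
    return np.block([[U0, Z], [Z, U1]])

U0, U1 = unitary_group.rvs(2), unitary_group.rvs(2)
M = multiplexor(U0, U1)
assert np.allclose(M.conj().T @ M, np.eye(4))   # still unitary
```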

    A New Algorithm for Computing the Actions of Trigonometric and Hyperbolic Matrix Functions

    Full text link
    A new algorithm is derived for computing the actions $f(tA)B$ and $f(tA^{1/2})B$, where $f$ is the cosine, sinc, sine, hyperbolic cosine, hyperbolic sinc, or hyperbolic sine function, $A$ is an $n\times n$ matrix, and $B$ is $n\times n_0$ with $n_0 \ll n$. $A^{1/2}$ denotes any matrix square root of $A$, and it is never required to be computed explicitly. The algorithm offers six independent output options given $t$, $A$, $B$, and a tolerance. For each option, the actions of a pair of trigonometric or hyperbolic matrix functions are computed simultaneously. The algorithm scales the matrix $A$ down by a positive integer $s$, approximates $f(s^{-1}tA)B$ by a truncated Taylor series, and finally uses the recurrences of the Chebyshev polynomials of the first and second kind to recover $f(tA)B$. The scaling parameter and the degree of the Taylor polynomial are selected from a forward error analysis and a sequence of the form $\|A^k\|^{1/k}$ in such a way that the overall computational cost of the algorithm is optimized. Shifting is used where applicable as a preprocessing step to reduce the scaling parameter. The algorithm works for any matrix $A$, and its computational cost is dominated by the formation of products of $A$ with $n\times n_0$ matrices, which can take advantage of level-3 BLAS implementations. Our numerical experiments show that the new algorithm behaves in a forward stable fashion and in most problems outperforms the existing algorithms in terms of CPU time, computational cost, and accuracy.
    Comment: 4 figures, 16 pages
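
    A minimal sketch of the Taylor stage only (our simplification: it omits the scaling, shifting, Chebyshev recovery, and adaptive degree selection that constitute the paper's actual algorithm), showing how the cost reduces to products of $A$ with $n\times n_0$ blocks:

```python
import numpy as np
from math import factorial
from scipy.linalg import cosm, sinm

def cos_sin_action(A, B, t=1.0, m=25):
    """Approximate cos(tA)B and sin(tA)B with truncated Taylor series,
    touching A only through products with n-by-n0 blocks."""
    P = np.array(B, dtype=float)   # running term (tA)^(2k) B
    C = P.copy()                   # partial sum for cos(tA) B
    Q = P.copy()                   # partial sum for sin(tA) B / (tA)
    for k in range(1, m + 1):
        P = (t * t) * (A @ (A @ P))               # two thin products per term
        C += ((-1) ** k / factorial(2 * k)) * P
        Q += ((-1) ** k / factorial(2 * k + 1)) * P
    return C, t * (A @ Q)

rng = np.random.default_rng(0)
A = 0.5 * rng.standard_normal((6, 6))
B = rng.standard_normal((6, 2))
C, S = cos_sin_action(A, B)
assert np.allclose(C, cosm(A) @ B) and np.allclose(S, sinm(A) @ B)
```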