Computing Matrix Trigonometric Functions with GPUs through Matlab
[EN] This paper presents an implementation of one of the most up-to-date algorithms proposed to compute the matrix trigonometric functions sine and cosine. The method is based on Taylor series approximations, which make intensive use of matrix multiplications. To accelerate matrix products, our application can use from one to four NVIDIA GPUs through the NVIDIA cublas and cublasXt libraries. The application, implemented in C++, can be used from the Matlab command line thanks to the mex files provided. We experimentally assess our implementation on modern, very high-performance NVIDIA GPUs.
This work has been supported by the Spanish Ministerio de Economia y Competitividad and the European Regional Development Fund (ERDF), Grants TIN2014-59294-P and TEC2015-67387-C4-1-R.
Alonso-Jordá, P.; Peinado Pinilla, J.; Ibáñez González, JJ.; Sastre, J.; Defez Candel, E. (2019). Computing Matrix Trigonometric Functions with GPUs through Matlab. The Journal of Supercomputing. 75(3):1227-1240. https://doi.org/10.1007/s11227-018-2354-1
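To make the Taylor-based approach in the abstract above concrete, here is a minimal CPU sketch in plain Python. It is not the authors' GPU implementation: the scaling by 2**s, the double-angle recovery cos(2X) = 2 cos(X)^2 - I, the naive term-by-term summation, and the names `cosm_taylor`, `terms`, and `s` are all assumptions chosen for illustration. It does show why matrix products dominate the cost.

```python
import math

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    """Return the n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def cosm_taylor(a, terms=8, s=4):
    """Approximate cos(A): Taylor series on X = A / 2**s, then s double-angle steps.

    Hypothetical helper for illustration; `terms` and `s` are fixed here,
    whereas a production algorithm would choose them from the norm of A.
    """
    n = len(a)
    scale = 2.0 ** s
    x = [[v / scale for v in row] for row in a]
    x2 = matmul(x, x)                      # the cosine series needs only even powers
    result = identity(n)
    power = identity(n)                    # holds X**(2k)
    for k in range(1, terms + 1):
        power = matmul(power, x2)          # one matrix product per Taylor term
        coeff = (-1.0) ** k / math.factorial(2 * k)
        result = [[result[i][j] + coeff * power[i][j] for j in range(n)]
                  for i in range(n)]
    eye = identity(n)
    for _ in range(s):                     # undo the scaling: cos(2X) = 2 cos(X)^2 - I
        sq = matmul(result, result)
        result = [[2.0 * sq[i][j] - eye[i][j] for j in range(n)]
                  for i in range(n)]
    return result
```

For A = [[0, 1], [1, 0]] we have A^2 = I, so cos(A) equals cos(1) times the identity, which gives a quick sanity check. Every step above is a dense matrix product, which is the operation the paper offloads to the GPU.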
Signal Flow Graph Approach to Efficient DST I-IV Algorithms
In this paper, fast and efficient discrete sine transformation (DST)
algorithms are presented based on the factorization of sparse, scaled
orthogonal, rotation, rotation-reflection, and butterfly matrices. These
algorithms are completely recursive and solely based on DST I-IV. The presented
algorithms have low arithmetic cost compared to the known fast DST algorithms.
Furthermore, the language of signal flow graph representation of digital
structures is used to describe these efficient and recursive DST algorithms
having points signal flow graph for DST-I and points signal flow
graphs for DST II-IV.
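As a reference point for what such fast algorithms compute, the sketch below builds the DST-I matrix directly and applies it by O(n^2) summation. It assumes one common convention, the orthonormal scaling sqrt(2/n) * sin(j*k*pi/n) on n-1 points, which may differ from the paper's. The function names are hypothetical, and no fast factorization is attempted.

```python
import math

def dst1_matrix(n):
    """Return the (n-1) x (n-1) matrix sqrt(2/n) * sin(j*k*pi/n), j, k = 1..n-1."""
    scale = math.sqrt(2.0 / n)
    return [[scale * math.sin(j * k * math.pi / n) for k in range(1, n)]
            for j in range(1, n)]

def dst1(x):
    """Apply DST-I to a sequence of n-1 samples by direct summation."""
    n = len(x) + 1
    s = dst1_matrix(n)
    return [sum(row[k] * x[k] for k in range(n - 1)) for row in s]
```

With this scaling the matrix is symmetric and orthogonal, so applying `dst1` twice returns the original input. That makes a convenient correctness check for any faster factorization.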
Discrete Cosine Transforms on Quantum Computers
A classical computer cannot compute a discrete cosine transform
on N points in less than linear time. This trivial lower bound is no longer
valid for a computer that takes advantage of quantum mechanical superposition,
entanglement, and interference principles. In fact, we show that it is possible
to realize the discrete cosine transforms and the discrete sine transforms of
size NxN and types I,II,III, and IV with as little as O(log^2 N) operations on
a quantum computer, whereas the known fast algorithms on a classical computer
need O(N log N) operations.
Comment: 5 pages, LaTeX 2e, IEEE ISPA01, Pula, Croatia, 200
On the Irresistible Efficiency of Signal Processing Methods in Quantum Computing
We show that many well-known signal transforms allow highly efficient
realizations on a quantum computer. We explain some elementary quantum circuits
and review the construction of the Quantum Fourier Transform. We derive quantum
circuits for the Discrete Cosine and Sine Transforms, and for the Discrete
Hartley transform. We show that at most O(log^2 N) elementary quantum gates are
necessary to implement any of those transforms for input sequences of length N.
Comment: 15 pages, LaTeX 2e. Expanded version of quant-ph/0111038. SPECLOG
2000, Tampere, Finland
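The O(log^2 N) figure can be illustrated with the textbook Quantum Fourier Transform circuit, which on n = log2(N) qubits uses one Hadamard gate per qubit and one controlled phase rotation per pair of qubits. The sketch below counts those gates and compares them with N log2 N. The function names are hypothetical, the final qubit-reversal swaps are not counted, and the cosine, sine, and Hartley circuits from the abstract are not modelled.

```python
def qft_gate_count(num_qubits):
    """Gates in the textbook QFT circuit: n Hadamards plus n(n-1)/2 controlled rotations."""
    hadamards = num_qubits
    controlled_rotations = num_qubits * (num_qubits - 1) // 2
    return hadamards + controlled_rotations

def classical_fft_scale(n_points):
    """N * log2(N) for N a power of two, the growth rate of classical fast transforms."""
    log2_n = n_points.bit_length() - 1
    return n_points * log2_n

# Compare growth for a few sizes N = 2**n.
for n in (4, 10, 20):
    print(n, qft_gate_count(n), classical_fft_scale(2 ** n))
```

The gate count grows quadratically in the number of qubits, that is, as log^2 N, while the classical figure grows faster than N itself.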
Algebraic Signal Processing Theory: Cooley-Tukey Type Algorithms for Polynomial Transforms Based on Induction
A polynomial transform is the multiplication of an input vector x\in\C^n by
a matrix \PT_{b,\alpha}\in\C^{n\times n}, whose (k,\ell)-th element is
defined as p_\ell(\alpha_k) for polynomials p_\ell(x)\in\C[x] from a list
b=\{p_0(x),\dots,p_{n-1}(x)\} and sample points \alpha_k\in\C from a list
\alpha=\{\alpha_0,\dots,\alpha_{n-1}\}. Such transforms find applications in
the areas of signal processing, data compression, and function interpolation.
Important examples include the discrete Fourier and cosine transforms. In this
paper we introduce a novel technique to derive fast algorithms for polynomial
transforms. The technique uses the relationship between polynomial transforms
and the representation theory of polynomial algebras. Specifically, we derive
algorithms by decomposing the regular modules of these algebras as a stepwise
induction. As an application, we derive novel general-radix
algorithms for the discrete Fourier transform and the discrete cosine transform
of type 4.
Comment: 19 pages. Submitted to SIAM Journal on Matrix Analysis and
Applications
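A small sketch may help fix the definition: assuming the (k, l) entry of the transform matrix is the l-th polynomial evaluated at the k-th sample point, the discrete Fourier transform is the special case of monomials x**l sampled at the n-th roots of unity. The helper names below are hypothetical. The code builds the dense matrix directly and does not implement the fast algorithms derived in the paper.

```python
import cmath

def polynomial_transform_matrix(polys, alphas):
    """Entry (k, l) is polys[l] evaluated at alphas[k]."""
    return [[p(a) for p in polys] for a in alphas]

def dft_as_polynomial_transform(n):
    """DFT_n as a polynomial transform: p_l(x) = x**l at the n-th roots of unity."""
    polys = [lambda x, l=l: x ** l for l in range(n)]
    alphas = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n)]
    return polynomial_transform_matrix(polys, alphas)

def apply_transform(matrix, x):
    """Multiply the transform matrix by the input vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]
```

Applying `dft_as_polynomial_transform(4)` to the vector (1, 2, 3, 4) reproduces the usual DFT of that vector. Swapping in other polynomial families and sample points yields other transforms of the same family.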
Synthesis of Quantum Logic Circuits
We discuss efficient quantum logic circuits which perform two tasks: (i)
implementing generic quantum computations and (ii) initializing quantum
registers. In contrast to conventional computing, the latter task is nontrivial
because the state-space of an n-qubit register is not finite and contains
exponential superpositions of classical bit strings. Our proposed circuits are
asymptotically optimal for respective tasks and improve published results by at
least a factor of two.
The circuits for generic quantum computation constructed by our algorithms
are the most efficient known today in terms of the number of expensive gates
(quantum controlled-NOTs). They are based on an analogue of the Shannon
decomposition of Boolean functions and a new circuit block, quantum
multiplexor, that generalizes several known constructions. A theoretical lower
bound implies that our circuits cannot be improved by more than a factor of
two. We additionally show how to accommodate the severe architectural
limitation of using only nearest-neighbor gates that is representative of
current implementation technologies. This increases the number of gates by
almost an order of magnitude, but preserves the asymptotic optimality of gate
counts.
Comment: 18 pages; v5 fixes minor bugs; v4 is a complete rewrite of v3, with
6x more content, a theory of quantum multiplexors and Quantum Shannon
Decomposition. A key result on generic circuit synthesis has been improved to
~23/48*4^n CNOTs for n qubits
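The "factor of two" claim can be checked with a little arithmetic. The sketch below compares the leading-order count (23/48) * 4**n quoted above with the lower bound ceil((4**n - 3n - 1) / 4) on CNOTs for generic n-qubit circuits. That lower bound is not stated in this abstract and is brought in here as an assumption from the related literature. The function names are hypothetical, and the leading-order estimate ignores lower-order terms, so it is only meaningful for larger n.

```python
from fractions import Fraction

def qsd_cnot_estimate(n):
    """Leading-order CNOT count (23/48) * 4**n quoted in the abstract's comment."""
    return Fraction(23, 48) * 4 ** n

def cnot_lower_bound(n):
    """Assumed lower bound ceil((4**n - 3*n - 1) / 4) for generic n-qubit circuits."""
    return -((4 ** n - 3 * n - 1) // -4)   # ceiling division on integers

for n in (6, 8, 10):
    ratio = qsd_cnot_estimate(n) / cnot_lower_bound(n)
    print(n, float(ratio))
```

As n grows the ratio approaches 23/12, which is below two and so consistent with the abstract's statement that the circuits cannot be improved by more than a factor of two.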
A New Algorithm for Computing the Actions of Trigonometric and Hyperbolic Matrix Functions
A new algorithm is derived for computing the actions f(tA)B and
f(tA^{1/2})B, where f is cosine, sinc, sine, hyperbolic cosine, hyperbolic
sinc, or hyperbolic sine function. A is an n\times n matrix and B is
n\times n_0 with n_0 \ll n. A^{1/2} denotes any matrix square root of A
and it is never required to be computed. The algorithm offers six independent
output options given t, A, B, and a tolerance. For each option, actions
of a pair of trigonometric or hyperbolic matrix functions are simultaneously
computed. The algorithm scales the matrix A down by a positive integer s,
approximates f(s^{-1}tA)B by a truncated Taylor series, and finally uses the
recurrences of the Chebyshev polynomials of the first and second kind to
recover f(tA)B. The selection of the scaling parameter and the degree of
Taylor polynomial are based on a forward error analysis and a sequence of the
form \|A^k\|^{1/k} in such a way that the overall computational cost of the
algorithm is optimized. Shifting is used where applicable as a preprocessing
step to reduce the scaling parameter. The algorithm works for any matrix A
and its computational cost is dominated by the formation of products of A
with n\times n_0 matrices that could take advantage of the implementation of
level-3 BLAS. Our numerical experiments show that the new algorithm behaves in
a forward stable fashion and in most problems outperforms the existing
algorithms in terms of CPU time, computational cost, and accuracy.
Comment: 4 figures, 16 pages
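The scale, approximate, and recover structure described in this abstract can be sketched for the cosine alone. The code below approximates the action of cos(A/s) on a vector by a truncated Taylor series that uses only matrix-vector products. It then recovers the action of cos(A) through the first-kind Chebyshev recurrence cos((k+1)X) = 2 cos(X) cos(kX) - cos((k-1)X). The fixed values of `s` and `terms`, the single-vector right-hand side, and the function names are assumptions for illustration. The paper's parameter selection, shifting, and the other five functions are not reproduced.

```python
import math

def matvec(a, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]

def cos_scaled_action(a, v, s, terms):
    """Approximate cos(A/s) v by a truncated Taylor series, never forming cos(A/s)."""
    result = list(v)
    term = list(v)
    for k in range(1, terms + 1):
        # Next Taylor term: -(A/s)^2 * term / ((2k-1)(2k)), via two matvecs.
        term = matvec(a, matvec(a, term))
        factor = -1.0 / (s * s * (2 * k - 1) * (2 * k))
        term = [factor * t for t in term]
        result = [r + t for r, t in zip(result, term)]
    return result

def cos_action(a, b, s=8, terms=10):
    """Approximate cos(A) b using the Chebyshev recurrence with X = A/s."""
    prev = list(b)                              # cos(0 * X) b
    curr = cos_scaled_action(a, b, s, terms)    # cos(1 * X) b
    for _ in range(s - 1):
        stepped = cos_scaled_action(a, curr, s, terms)
        nxt = [2.0 * c - p for c, p in zip(stepped, prev)]
        prev, curr = curr, nxt                  # advance to cos((k+1) X) b
    return curr
```

Because only products of A with vectors appear, the cost scales with the number of right-hand-side columns rather than with the cost of a full matrix function. Replacing the vector by a block of columns would turn each product into a level-3 BLAS operation.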