Solving Weighted Least Squares (WLS) problems on ARM-based architectures

Abstract

The Weighted Least Squares (WLS) algorithm is applied to numerous optimization problems, but it requires substantial computational resources, especially when complex arithmetic is involved. This work aims to accelerate the solution of a WLS problem in two ways: by reducing the computational cost (relying on BLAS/LAPACK routines) and by lowering the computational precision from double to single. As a test case, we design an IIR filter for a graphic equalizer, where the numerical errors due to single precision are easy to visualize. In addition, given the importance of low-power architectures for this kind of implementation, we evaluate the performance, scalability, and energy efficiency of each method on two different processors implementing the ARMv7 architecture, which is widely used in current mobile devices with power constraints. Results show that, on architectures of this type, the method with a high theoretical computational cost is more efficient than other methods with lower theoretical cost.

This work started in spring 2016, when Jose A. Belloch was a visiting postdoctoral researcher at Budapest University of Technology and Economics, thanks to the European Network COST Action IC1305 within the Short Term Scientific Mission program (reference COST-SPASM-ECOST-STSM-IC1305-020416-072431). Dr. Jose A. Belloch is supported by GVA contract APOSTD/2016/069. The researchers from Universitat Jaume I are supported by the CICYT projects TIN2014-53495-R of MINECO and FEDER. The authors from the Universitat Politecnica de Valencia are supported by MINECO Projects TEC2015-67387-C4-1-R, PROMETEOII/2014/003 and CAPAP-H5 network TIN2014-53522-REDT. The researcher from UCM is supported by the EU (FEDER) and the Spanish MINECO, under Grants TIN 2015-65277-R and TIN2012-32180.
The work of Balazs Bank was supported by the UNKP-16-4-III New National Excellence Program of the Ministry of Human Capacities, Hungary.

Belloch Rodríguez, JA.; Bank, B.; Igual Peña, FD.; Quintana Ortí, ES.; Vidal Maciá, AM. (2017). Solving Weighted Least Squares (WLS) problems on ARM-based architectures. Journal of Supercomputing. 73(1):530-542. https://doi.org/10.1007/s11227-016-1910-9
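For reference, the WLS problem named in the abstract can be written as below. This is a generic textbook formulation with assumed notation (A, b, W and w_i are not defined in the abstract), and the two solution routes shown are standard options, not necessarily the methods compared in the paper.

```latex
% Assumed notation: A in C^{m x n}, m > n, full column rank;
% b in C^m; W = diag(w_1, ..., w_m) with real weights w_i > 0.
\hat{x} = \arg\min_{x \in \mathbb{C}^{n}} \sum_{i=1}^{m} w_i \,\lvert b_i - (Ax)_i \rvert^{2}
        = \arg\min_{x} \bigl\lVert W^{1/2}(b - Ax) \bigr\rVert_2^{2}

% Route 1: normal equations (Hermitian positive definite, Cholesky-solvable)
A^{H} W A \,\hat{x} = A^{H} W b

% Route 2: thin QR factorization W^{1/2} A = QR, then a triangular solve
R \,\hat{x} = Q^{H} W^{1/2} b

% Conditioning of route 1 relative to the weighted problem
\kappa_2\!\left(A^{H} W A\right) = \kappa_2\!\left(W^{1/2} A\right)^{2}
```

For real m-by-n data, the textbook flop counts are roughly (m + n/3)n² for the normal equations with Cholesky and 2n²(m - n/3) for Householder QR, so the cheaper route squares the condition number. In single precision (unit roundoff 2^-24, about 6e-8) that squaring is one reason precision loss can become visible.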
