16 research outputs found

    Computing the Lambert W function in arbitrary-precision complex interval arithmetic

    Full text link
    We describe an algorithm to evaluate all the complex branches of the Lambert W function with rigorous error bounds in interval arithmetic, which has been implemented in the Arb library. The classic 1996 paper on the Lambert W function by Corless et al. provides a thorough but partly heuristic numerical analysis which needs to be complemented with some explicit inequalities and practical observations about managing precision and branch cuts.Comment: 16 pages, 4 figure

    On Ziv's rounding test

    Get PDF
    International audienceA very simple test, introduced by Ziv, allows one to determine if an approximation to the value f (x) of an elementary function at a given point x suffices to return the floating-point number nearest f(x). The same test may be used when implementing floating-point operations with input and output operands of different formats, using arithmetic operators tailored for manipulating operands of the same format. That test depends on a "magic constant" e. We show how to choose that constant e to make the test reliable and efficient. Various cases are considered, depending on the availability of an fma instruction, and on the range of f (x)

    Floating-point arithmetic in the Coq system

    Get PDF
    International audienceThe process of proving some mathematical theorems can be greatly reduced by relying on numerically-intensive computations with a certified arithmetic. This article presents a formalization of floating-point arithmetic that makes it possible to efficiently compute inside the proofs of the Coq system. This certified library is a multi-radix and multi-precision implementation free from underflow and overflow. It provides the basic arithmetic operators and a few elementary functions

    FloatX: A C++ Library for Customized Floating-Point Arithmetic

    Full text link
    "© ACM, 2019. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Mathematical Software, {45, 4, (2019)} https://dl.acm.org/doi/10.1145/3368086"[EN] We present FloatX (Float eXtended), a C++ framework to investigate the effect of leveraging customized floating-point formats in numerical applications. FloatX formats are based on binary IEEE 754 with smaller significand and exponent bit counts specified by the user. Among other properties, FloatX facilitates an incremental transformation of the code, relies on hardware-supported floating-point types as back-end to preserve efficiency, and incurs no storage overhead. The article discusses in detail the design principles, programming interface, and datatype casting rules behind FloatX. Furthermore, it demonstrates FloatX's usage and benefits via several case studies from well-known numerical dense linear algebra libraries, such as BLAS and LAPACK; the Ginkgo library for sparse linear systems; and two neural network applications related with image processing and text recognition.This work was supported by the CICYT projects TIN2014-53495-R and TIN2017-82972-R of the MINECO and FEDER, and the EU H2020 project 732631 "OPRECOMP. Open Transprecision Computing."Flegar, G.; Scheidegger, F.; Novakovic, V.; Mariani, G.; Tomás Domínguez, AE.; Malossi, C.; Quintana-Ortí, ES. (2019). FloatX: A C++ Library for Customized Floating-Point Arithmetic. ACM Transactions on Mathematical Software. 45(4):1-23. https://doi.org/10.1145/3368086S123454Edward Anderson Zhaojun Bai L. Susan Blackford James Demmesl Jack J. Dongarra Jeremy Du Croz Sven Hammarling Anne Greenbaum Alan McKenney and Danny C. Sorensen. 1999. LAPACK Users’ Guide (3rd ed.). SIAM. Edward Anderson Zhaojun Bai L. Susan Blackford James Demmesl Jack J. Dongarra Jeremy Du Croz Sven Hammarling Anne Greenbaum Alan McKenney and Danny C. Sorensen. 1999. LAPACK Users’ Guide (3rd ed.). SIAM.Bekas, C., Curioni, A., & Fedulova, I. (2011). Low-cost data uncertainty quantification. Concurrency and Computation: Practice and Experience, 24(8), 908-920. doi:10.1002/cpe.1770Boldo, S., & Melquiond, G. (2008). Emulation of a FMA and Correctly Rounded Sums: Proved Algorithms Using Rounding to Odd. IEEE Transactions on Computers, 57(4), 462-471. doi:10.1109/tc.2007.70819Buttari, A., Dongarra, J., Langou, J., Langou, J., Luszczek, P., & Kurzak, J. (2007). Mixed Precision Iterative Refinement Techniques for the Solution of Dense Linear Systems. The International Journal of High Performance Computing Applications, 21(4), 457-466. doi:10.1177/1094342007084026Dongarra, J. J., Du Croz, J., Hammarling, S., & Duff, I. S. (1990). A set of level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software, 16(1), 1-17. doi:10.1145/77626.79170Figueroa, S. A. (1995). When is double rounding innocuous? ACM SIGNUM Newsletter, 30(3), 21-26. doi:10.1145/221332.221334Fousse, L., Hanrot, G., Lefèvre, V., Pélissier, P., & Zimmermann, P. (2007). MPFR. ACM Transactions on Mathematical Software, 33(2), 13. doi:10.1145/1236463.1236468Mark Gates Piotr Luszczek Ahmad Abdelfattah Jakub Kurzak Jack Dongarra Konstantin Arturov Cris Cecka and Chip Freitag. 2017. C++ API for BLAS and LAPACK. Technical Report 2 ICL-UT-17-03. Mark Gates Piotr Luszczek Ahmad Abdelfattah Jakub Kurzak Jack Dongarra Konstantin Arturov Cris Cecka and Chip Freitag. 2017. C++ API for BLAS and LAPACK. Technical Report 2 ICL-UT-17-03.John Hauser. Accessed March 2019. Berkeley SoftFloat project home page. Retrieved from http://www.jhauser.us/arithmetic/SoftFloat.html. John Hauser. Accessed March 2019. Berkeley SoftFloat project home page. Retrieved from http://www.jhauser.us/arithmetic/SoftFloat.html.Nicholas J. Higham. 2002. Accuracy and Stability of Numerical Algorithms (2nd ed.). Society for Industrial and Applied Mathematics Philadelphia PA. Nicholas J. Higham. 2002. Accuracy and Stability of Numerical Algorithms (2nd ed.). Society for Industrial and Applied Mathematics Philadelphia PA.Parker Hill Babak Zamirai Shengshuo Lu Yu-Wei Chao Michael Laurenzano Mehrzad Samadi Marios Papaefthymiou Scott Mahlke Thomas Wenisch Jia Deng Lingjia Tang and Jason Mars. 2018. Rethinking numerical representations for deep neural networks. arXiv e-prints (Aug 2018). arXiv:1808.02513. Retrieved from https://openreview.net/forum?id&equals;BJ_MGwqlg8noteId&equals;BJ_MGwqlg. Parker Hill Babak Zamirai Shengshuo Lu Yu-Wei Chao Michael Laurenzano Mehrzad Samadi Marios Papaefthymiou Scott Mahlke Thomas Wenisch Jia Deng Lingjia Tang and Jason Mars. 2018. Rethinking numerical representations for deep neural networks. arXiv e-prints (Aug 2018). arXiv:1808.02513. Retrieved from https://openreview.net/forum?id&equals;BJ_MGwqlg8noteId&equals;BJ_MGwqlg.Parker Hill Babak Zamirai Shengshuo Lu Yu-Wei Chao Michael Laurenzano Mehrzad Samadi Marios Papaefthymiou Scott Mahlke Thomas Wenisch Jia Deng etal 2018. Rethinking numerical representations for deep neural networks. 2018. Parker Hill Babak Zamirai Shengshuo Lu Yu-Wei Chao Michael Laurenzano Mehrzad Samadi Marios Papaefthymiou Scott Mahlke Thomas Wenisch Jia Deng et al. 2018. Rethinking numerical representations for deep neural networks. 2018.IBM. 2015. Engineering and Scientific Subroutine Library. Retrieved from http://www-03.ibm.com/systems/power/software/essl/. IBM. 2015. Engineering and Scientific Subroutine Library. Retrieved from http://www-03.ibm.com/systems/power/software/essl/.IEEE. 2008. IEEE Standard for Floating-point Arithmetic. IEEE Std 754-2008 (Aug. 2008) 1--70. DOI:https://doi.org/10.1109/IEEESTD.2008.4610935 IEEE. 2008. IEEE Standard for Floating-point Arithmetic. IEEE Std 754-2008 (Aug. 2008) 1--70. DOI:https://doi.org/10.1109/IEEESTD.2008.4610935Intel. 2015. Math Kernel Library. Retrieved from https://software.intel.com/en-us/intel-mkl. Intel. 2015. Math Kernel Library. Retrieved from https://software.intel.com/en-us/intel-mkl.ISO. 2017. ISO International Standard ISO/IEC 14882:2017(E)—Programming Language C++. Retrieved from https://isocpp.org/std/the-standard. Visited June 2018. ISO. 2017. ISO International Standard ISO/IEC 14882:2017(E)—Programming Language C++. Retrieved from https://isocpp.org/std/the-standard. Visited June 2018.Lefevre, V. (2013). SIPE: Small Integer Plus Exponent. 2013 IEEE 21st Symposium on Computer Arithmetic. doi:10.1109/arith.2013.22Liu, Z., Luo, P., Wang, X., & Tang, X. (2015). Deep Learning Face Attributes in the Wild. 2015 IEEE International Conference on Computer Vision (ICCV). doi:10.1109/iccv.2015.425Érik Martin-Dorel Guillaume Melquiond and Jean-Michel Muller. 2013. Some issues related to double rounding. BIT Num. Math. 53 4 (01 Dec. 2013) 897--924. DOI:https://doi.org/10.1007/s10543-013-0436-2 Érik Martin-Dorel Guillaume Melquiond and Jean-Michel Muller. 2013. Some issues related to double rounding. BIT Num. Math. 53 4 (01 Dec. 2013) 897--924. DOI:https://doi.org/10.1007/s10543-013-0436-2Sparsh Mittal. 2016. A survey of techniques for approximate computing. ACM Comput. Surv. 48 4 Article 62 (Mar. 2016) 33 pages. DOI:https://doi.org/10.1145/2893356 Sparsh Mittal. 2016. A survey of techniques for approximate computing. ACM Comput. Surv. 48 4 Article 62 (Mar. 2016) 33 pages. DOI:https://doi.org/10.1145/2893356NVIDIA. 2016. cuBLAS. Retrieved from https://developer.nvidia.com/cublas. NVIDIA. 2016. cuBLAS. Retrieved from https://developer.nvidia.com/cublas.D. O’Leary. 2006. Matrix factorization for information retrieval. Lecture notes for a course on Advanced Numerical Analysis. University of Maryland. Retrieved from https://www.cs.umd.edu/users/oleary/a600/yahoo.pdf. D. O’Leary. 2006. Matrix factorization for information retrieval. Lecture notes for a course on Advanced Numerical Analysis. University of Maryland. Retrieved from https://www.cs.umd.edu/users/oleary/a600/yahoo.pdf.OpenBLAS. 2015. Retrieved from http://www.openblas.net. OpenBLAS. 2015. Retrieved from http://www.openblas.net.Palmer, T. (2015). Modelling: Build imprecise supercomputers. Nature, 526(7571), 32-33. doi:10.1038/526032aAlec Radford Luke Metz and Soumith Chintala. 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. Retrieved from Arxiv Preprint Arxiv:1511.06434 (2015). Alec Radford Luke Metz and Soumith Chintala. 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. Retrieved from Arxiv Preprint Arxiv:1511.06434 (2015).Rubio-González, C., Nguyen, C., Nguyen, H. D., Demmel, J., Kahan, W., Sen, K., … Hough, D. (2013). Precimonious. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC ’13. doi:10.1145/2503210.2503296Rump, S. M. (2017). IEEE754 Precision- k base-β Arithmetic Inherited by Precision- m Base-β Arithmetic for k < m. ACM Transactions on Mathematical Software, 43(3), 1-15. doi:10.1145/2785965Rybalkin, V., Wehn, N., Yousefi, M. R., & Stricker, D. (2017). Hardware architecture of Bidirectional Long Short-Term Memory Neural Network for Optical Character Recognition. Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017. doi:10.23919/date.2017.7927210Giuseppe Tagliavini Stefan Mach Davide Rossi Andrea Marongiu and Luca Benini. 2017. A transprecision floating-point platform for ultra-low power computing. Retrieved from Arxiv Preprint Arxiv:1711.10374 (2017). Giuseppe Tagliavini Stefan Mach Davide Rossi Andrea Marongiu and Luca Benini. 2017. A transprecision floating-point platform for ultra-low power computing. Retrieved from Arxiv Preprint Arxiv:1711.10374 (2017).Tobias Thornes. (2016). Can reducing precision improve accuracy in weather and climate models? Weather 71 6 (02 June 2016) 147--150. DOI:https://doi.org/10.1002/wea.2732 Tobias Thornes. (2016). Can reducing precision improve accuracy in weather and climate models? Weather 71 6 (02 June 2016) 147--150. DOI:https://doi.org/10.1002/wea.2732Van Zee, F. G., & van de Geijn, R. A. (2015). BLIS: A Framework for Rapidly Instantiating BLAS Functionality. ACM Transactions on Mathematical Software, 41(3), 1-33. doi:10.1145/2764454Todd L. Veldhuizen. 2003. C++ Templates are Turing Complete. Technical Report. Todd L. Veldhuizen. 2003. C++ Templates are Turing Complete. Technical Report.Qiang Xu Todd Mytkowicz and Nam Sung Kim. 2015. Approximate computing: A survey. IEEE Des. Test 33 (01 2015) 8--22. Qiang Xu Todd Mytkowicz and Nam Sung Kim. 2015. Approximate computing: A survey. IEEE Des. Test 33 (01 2015) 8--22.Ziv, A. (1991). Fast evaluation of elementary mathematical functions with correctly rounded last bit. ACM Transactions on Mathematical Software, 17(3), 410-423. doi:10.1145/114697.11681
    corecore