Optimizations to the orthogonal matching pursuit algorithm for sparse basis representations of photometric redshift PDFs

Chan, Christopher



Authors: Christopher Chan
Publication date: 1 May 2016

Abstract

In this thesis I investigate potential optimizations to the K-SVD algorithm (using orthogonal matching pursuit) for creating sparse basis representations of probability density functions (PDFs), as implemented by NCSA research affiliate Matias Carrasco Kind and Professor Robert J. Brunner. Their implementation is currently used to compress PDFs of galaxy photometric redshifts (i.e., distance estimates) by about 90%, while allowing end users to reconstruct the original PDF with better than 98% accuracy. As we continue to mine large photometric sky surveys, photometric redshift PDF storage will need to scale accordingly; meaningful advances in this algorithm's implementation will therefore directly benefit our ability to explore the Universe and to expand our cosmological understanding. However, the existing implementation is limited by run time, an issue that grows more pressing as surveys acquire ever larger amounts of data. The existing implementation uses SciPy, a Python scientific computing library. This past semester, I explored alternatives by developing and testing implementations of the core algorithms in C++, beginning with different linear algebra libraries. In my initial tests, I found that limitations in Eigen, a C++ linear algebra library, make it difficult to accurately reproduce both the results and the execution speeds of the existing implementation, owing to the optimizations already built into NumPy, the Python numerical library. I then pivoted to Armadillo, another C++ linear algebra library, with which the primary algorithm runs slightly faster than its Python counterpart. This research is ongoing, and I am excited to continue investigating hardware acceleration, specifically the efficiency of GPU-accelerated computation (NVBLAS).
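Orthogonal matching pursuit (OMP), the core step referred to above, builds a sparse approximation greedily: at each iteration it selects the dictionary atom most correlated with the current residual, then re-fits all selected atoms jointly by least squares so that the residual stays orthogonal to their span. The following is a minimal, dependency-free Python sketch of textbook OMP for a single PDF sampled on a grid. It assumes the dictionary is supplied as a list of (ideally unit-norm) columns, for example hypothetical profiles sampled on the redshift grid, and the names `omp`, `solve` and `dot` are hypothetical. It is an illustration, not the Carrasco Kind and Brunner implementation, which relies on SciPy/NumPy.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve the small square system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def omp(atoms, signal, max_atoms, tol=1e-10):
    """Greedy sparse fit of `signal` using at most `max_atoms` dictionary columns.

    atoms: list of dictionary columns, each the same length as `signal` (hypothetical layout).
    Returns (support, coef): chosen atom indices and their least-squares coefficients.
    """
    residual = list(signal)
    support, coef = [], []
    for _ in range(min(max_atoms, len(atoms))):
        # 1. Select the unused atom most correlated with the current residual.
        best = max((j for j in range(len(atoms)) if j not in support),
                   key=lambda j: abs(dot(atoms[j], residual)))
        support.append(best)
        # 2. Re-fit all selected atoms jointly: solve (D_S^T D_S) c = D_S^T x.
        gram = [[dot(atoms[i], atoms[j]) for j in support] for i in support]
        coef = solve(gram, [dot(atoms[i], signal) for i in support])
        # 3. New residual, orthogonal to the span of the selected atoms.
        approx = [sum(c * atoms[j][t] for j, c in zip(support, coef))
                  for t in range(len(signal))]
        residual = [x - a for x, a in zip(signal, approx)]
        if math.sqrt(dot(residual, residual)) < tol:
            break
    return support, coef
```

Re-solving the full normal equations at every iteration keeps the sketch short but does more work than necessary; optimized implementations typically update a factorization incrementally, and it is exactly this kind of inner-loop linear algebra whose cost depends on the library chosen (NumPy, Eigen, Armadillo).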
Once I have identified an optimization, I look forward to implementing Batch Orthogonal Matching Pursuit, an algorithm better suited to large sets of PDFs sharing a single dictionary, and, if time permits, an algorithm that can be extended to two-dimensional PDF representations.
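Batch OMP targets the situation described above: many PDFs coded against one fixed dictionary D. Because OMP only needs inner products, the Gram matrix G = D^T D can be computed once and shared across the whole batch, each PDF then needs only alpha0 = D^T x, and the least-squares step can be carried by a Cholesky factor that grows by one row per selected atom. The sketch below is a minimal pure-Python illustration of that general scheme (the Gram-matrix formulation described by Rubinstein, Zibulevsky and Elad); the function names `batch_omp`, `omp_from_gram`, `forward_sub` and `backward_sub_t` are hypothetical, and the code is not taken from the thesis.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i, row in enumerate(L):
        y.append((b[i] - sum(row[j] * y[j] for j in range(i))) / row[i])
    return y


def backward_sub_t(L, y):
    """Solve L^T x = y for lower-triangular L (so L^T is upper-triangular)."""
    n = len(L)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(L[j][i] * x[j] for j in range(i + 1, n))) / L[i][i]
    return x


def omp_from_gram(gram, alpha0, k, tol=1e-12):
    """OMP for one signal using only G = D^T D and alpha0 = D^T x."""
    n_atoms = len(gram)
    support, gamma, L = [], [], []
    alpha = list(alpha0)  # alpha = D^T r: correlations of atoms with the residual
    for _ in range(min(k, n_atoms)):
        candidates = [j for j in range(n_atoms) if j not in support]
        best = max(candidates, key=lambda j: abs(alpha[j]))
        if abs(alpha[best]) < tol:
            break  # residual is (numerically) orthogonal to every unused atom
        # Grow the Cholesky factor of G[S, S] by one row instead of refactoring.
        w = forward_sub(L, [gram[i][best] for i in support])
        diag = math.sqrt(max(gram[best][best] - dot(w, w), 1e-15))
        L = [row + [0.0] for row in L] + [w + [diag]]
        support.append(best)
        # Solve (L L^T) gamma = alpha0[S] with two triangular solves.
        gamma = backward_sub_t(L, forward_sub(L, [alpha0[i] for i in support]))
        # Update correlations without forming the residual: alpha = alpha0 - G[:, S] gamma.
        alpha = [alpha0[j] - sum(gram[j][i] * g for i, g in zip(support, gamma))
                 for j in range(n_atoms)]
    return support, gamma


def batch_omp(atoms, signals, k):
    """Sparse-code many signals against one dictionary, sharing G = D^T D."""
    gram = [[dot(a, b) for b in atoms] for a in atoms]  # computed once per dictionary
    return [omp_from_gram(gram, [dot(a, x) for a in atoms], k) for x in signals]
```

After D^T x is formed, each PDF's inner loop works only with G and the small Cholesky factor, so the one-time cost of G is amortized over every PDF that shares the dictionary.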