
    Mixed-Precision Random Projection for RandNLA on Tensor Cores

    Random projection can reduce the dimension of data while capturing its structure, and is a fundamental tool for machine learning, signal processing, and information retrieval, which today deal with large amounts of data. RandNLA (Randomized Numerical Linear Algebra) leverages random projection to reduce the computational complexity of low-rank decompositions of tensors and of solving least-squares problems. While computing the random projection is a simple matrix multiplication, its asymptotic computational complexity is typically larger than that of the other operations in a RandNLA algorithm. Various studies therefore propose methods for reducing its cost. We propose a fast mixed-precision random projection method for single-precision tensors on NVIDIA GPUs using Tensor Cores. We exploit the fact that the random matrix requires less precision, and develop a highly optimized matrix multiplication between FP32 and FP16 matrices, SHGEMM (Single and Half-precision GEMM), on Tensor Cores, where the random matrix is stored in FP16. Our method computes Randomized SVD 1.28 times faster and random-projection high-order SVD 1.75 times faster than baseline single-precision implementations while maintaining accuracy. Comment: PASC'2
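    The core idea, storing the random sketching matrix in FP16 while the data stays in FP32, can be illustrated with a small numpy sketch. This is not the paper's SHGEMM kernel (which runs on Tensor Cores and consumes the FP16 operand directly); the function name and shapes here are illustrative:

    ```python
    import numpy as np

    def mixed_precision_projection(A, k, seed=0):
        """Sketch FP32 matrix A down to k columns with an FP16 random matrix.

        Illustrative CPU analogue of the paper's idea: the Gaussian sketching
        matrix is stored in half precision, the input data in single precision.
        """
        rng = np.random.default_rng(seed)
        m, n = A.shape
        # Random sketching matrix stored in FP16 -- halves its memory traffic.
        omega = rng.standard_normal((n, k)).astype(np.float16)
        # numpy has no FP32 x FP16 GEMM, so we upcast for the multiply;
        # on Tensor Cores, SHGEMM performs this product without the upcast.
        return A.astype(np.float32) @ omega.astype(np.float32)

    # Example: the sketch preserves the column space of a low-rank matrix.
    rng = np.random.default_rng(1)
    A = (rng.standard_normal((200, 10)) @ rng.standard_normal((10, 100))).astype(np.float32)
    Y = mixed_precision_projection(A, k=20)
    Q, _ = np.linalg.qr(Y)  # orthonormal basis of the sketch
    err = np.linalg.norm(A - Q @ (Q.T @ A)) / np.linalg.norm(A)
    ```

    Since A has rank 10 and the sketch keeps k = 20 dimensions, projecting A onto the basis Q loses almost nothing, even though the random matrix was rounded to FP16; this is the low-precision tolerance the paper exploits.
    
    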

    Data aware sparse non-negative signal processing

    Greedy techniques are a well-established framework for reconstructing signals that are sparse in some domain of representation. They are renowned for their relatively low computational cost, which makes them appealing for real-time applications. In the current work we focus on the specific case of sparse non-negative signals, which finds applications in several areas of daily life, e.g., food analysis and hazardous-materials detection. The conventional way of deploying these algorithms does not exploit properties that characterise natural data, such as lower-dimensional representations and underlying structure. Motivated by these properties, we aim to incorporate, within the domain of greedy techniques, methodologies that boost their performance in terms of: 1) computational efficiency and 2) signal-recovery quality (for the remainder of the thesis we use the term acceleration for the first goal and robustness for the second). These benefits can be obtained via data-aware methodologies arising from the Machine Learning and Deep Learning communities. In the current work we aim to establish a link between conventional sparse non-negative signal decomposition frameworks that rely on greedy techniques and data-aware methodologies. We explain the connection between data-aware methodologies and the two challenges associated with sparse non-negative signal decomposition: 1) acceleration and 2) robustness. We also introduce the standard data-aware methodologies relevant to our problem and their theoretical properties. Practical implementations of the proposed frameworks are provided. The main findings of the current work can be summarised as follows:
    • We introduce novel algorithms and theory for the Nearest Neighbor problem.
    • We accelerate a greedy algorithm for sparse non-negative signal decomposition by incorporating these algorithms within its structure.
    • We introduce a novel reformulation of greedy techniques as a Deep Neural Network that boosts their robustness.
    • We introduce a theoretical framework that pinpoints the conditions under which exact recovery of the signal is guaranteed.
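    The greedy decomposition the abstract refers to can be sketched as a non-negative variant of Orthogonal Matching Pursuit: at each step, pick the atom most positively correlated with the residual, then refit the coefficients on the current support with a non-negative least-squares solve. This is a generic illustration of the technique class, not the thesis's specific algorithm; all names are illustrative.

    ```python
    import numpy as np
    from scipy.optimize import nnls

    def nn_omp(D, y, k):
        """Greedy sparse non-negative decomposition (non-negative OMP sketch).

        D : dictionary with unit-norm atoms as columns, y : signal, k : sparsity.
        """
        support, residual = [], y.copy()
        x = np.zeros(D.shape[1])
        for _ in range(k):
            corr = D.T @ residual
            corr[support] = -np.inf           # never reselect an atom
            j = int(np.argmax(corr))
            if corr[j] <= 0:                  # no atom can reduce the residual
                break
            support.append(j)
            coef, _ = nnls(D[:, support], y)  # non-negative refit on the support
            x[:] = 0.0
            x[support] = coef
            residual = y - D @ x
        return x

    # Example: recover a 3-sparse non-negative code from a random dictionary.
    rng = np.random.default_rng(0)
    D = rng.standard_normal((60, 120))
    D /= np.linalg.norm(D, axis=0)            # unit-norm atoms
    x_true = np.zeros(120)
    x_true[[5, 40, 99]] = [1.0, 2.0, 0.5]
    y = D @ x_true
    x_hat = nn_omp(D, y, k=3)
    ```

    Each iteration costs one correlation pass over the dictionary plus a small NNLS solve, which is why the correlation step (a nearest-neighbor search over atoms) is the natural target for the acceleration the thesis describes.
    
    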