Mixed-Precision Random Projection for RandNLA on Tensor Cores
Random projection reduces the dimensionality of data while preserving its
structure, and is a fundamental tool for machine learning, signal processing,
and information retrieval, all of which deal with large amounts of data today.
RandNLA (Randomized Numerical Linear Algebra) leverages random projection to
reduce the computational complexity of low-rank tensor decompositions and to
solve least-squares problems. Although the random projection itself is a
simple matrix multiplication, its asymptotic computational complexity is
typically larger than that of the other operations in a RandNLA algorithm, and
various studies have therefore proposed methods for reducing it. We
propose a fast mixed-precision random projection method on NVIDIA GPUs using
Tensor Cores for single-precision tensors. We exploit the fact that the random
matrix requires less precision, and develop a highly optimized matrix
multiplication between FP32 and FP16 matrices -- SHGEMM (Single and
Half-precision GEMM) -- on Tensor Cores, where the random matrix is stored in
FP16. Our method computes Randomized SVD 1.28 times faster and
random-projection high-order SVD 1.75 times faster than baseline
single-precision implementations while maintaining accuracy.
Comment: PASC'2
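The SHGEMM idea can be mimicked in plain NumPy (a minimal sketch; the function name and shapes are illustrative, and the paper's actual kernel runs on NVIDIA Tensor Cores): the random matrix is generated and stored in FP16, while the product is accumulated in FP32.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixed_precision_projection(A, k):
    """Project the rows of an FP32 matrix A into k dimensions using a
    Gaussian random matrix stored in FP16 (hypothetical sketch of the
    SHGEMM approach; not the paper's Tensor Core kernel)."""
    n = A.shape[1]
    # The random matrix tolerates low precision, so keep it in FP16
    # to halve its memory traffic.
    omega = rng.standard_normal((n, k)).astype(np.float16)
    # Accumulate the product in FP32, as Tensor Cores do internally.
    return A.astype(np.float32) @ omega.astype(np.float32)

A = rng.standard_normal((1000, 512)).astype(np.float32)
Y = mixed_precision_projection(A, 64)
print(Y.shape)  # → (1000, 64)
```

The sketch `Y` is then what a RandNLA algorithm (e.g., randomized SVD) factorizes in place of the full-width `A`.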
Data aware sparse non-negative signal processing
Greedy techniques are a well-established framework for reconstructing signals
that are sparse in some domain of representation. They are renowned for their
relatively low computational cost, which makes them appealing for real-time applications. In the current work we focus on the specific case of sparse non-negative
signals, which finds applications in several aspects of daily life, e.g., food analysis and hazardous-materials detection. The conventional approach to deploying this type of algorithm does not exploit properties that characterise natural data, such
as lower-dimensional representations and underlying structures. Motivated by these properties, we aim to incorporate, within the domain of greedy
techniques, methodologies that boost their performance in terms of: 1) computational efficiency
and 2) signal-recovery quality (for the remainder of the thesis we use the
term acceleration when referring to the first goal and robustness when referring
to the second). These benefits can be obtained via data-aware methodologies
arising from the Machine Learning and Deep Learning communities.
In the current work we aim to establish a link between conventional
sparse non-negative signal-decomposition frameworks that rely on greedy techniques
and data-aware methodologies. We explain the connection between data-aware
methodologies and the two challenges associated with sparse non-negative signal decomposition: 1) acceleration and 2) robustness. We also introduce the standard data-aware methodologies relevant to our problem, together with their theoretical
properties. The practical implementations of the proposed frameworks are provided
here. The main findings of the current work can be summarised as follows:
• We introduce novel algorithms and theory for the Nearest Neighbor problem.
• We accelerate a greedy algorithm for sparse non-negative signal decomposition
by incorporating our algorithms within its structure.
• We introduce a novel reformulation of greedy techniques from the perspective of
a Deep Neural Network that boosts their robustness.
• We introduce a theoretical framework that identifies the conditions under
which exact recovery of the signal is guaranteed.
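As a concrete illustration of the greedy, non-negative setting the points above refer to, here is a minimal non-negative matching-pursuit sketch (hypothetical; the thesis's specific algorithms and data-aware accelerations are not reproduced here):

```python
import numpy as np

def nn_matching_pursuit(D, y, n_iter=50, tol=1e-12):
    """Greedy sparse non-negative decomposition (minimal sketch, not the
    thesis's exact algorithm). D has unit-norm columns (the dictionary),
    y is the signal; all coefficients are constrained to be >= 0."""
    x = np.zeros(D.shape[1])
    r = y.astype(float).copy()
    for _ in range(n_iter):
        corr = D.T @ r               # correlation of each atom with the residual
        j = int(np.argmax(corr))     # greedy pick: largest positive correlation
        if corr[j] <= tol:           # no atom reduces the residual -> stop
            break
        x[j] += corr[j]              # update stays non-negative (corr[j] > 0)
        r -= corr[j] * D[:, j]
    return x

# Toy example with an orthonormal dictionary: y = 2*d0 + 1*d2.
D = np.eye(5)[:, :3]
y = 2 * D[:, 0] + D[:, 2]
print(nn_matching_pursuit(D, y))  # → [2. 0. 1.]
```

A data-aware acceleration, in this framing, would replace the exhaustive `argmax` over all atoms with a learned nearest-neighbor search over the dictionary.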