
7 research outputs found

GiMMiK - Generating Bespoke Matrix Multiplication Kernels for Accelerators: Application to High-Order Computational Fluid Dynamics

Authors: Kelly PHJ; Russell FP; Vincent PE; Witherden FD; Wozniak BD
Publication venue: Elsevier BV
Publication date: 21/12/2015

Spiral - Imperial College Digital Repository
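The title describes generating bespoke multiplication kernels. The sketch below is a minimal illustration, assuming the operator matrix A is small, constant, and known when the kernel is generated. It emits a fully unrolled Python function for C = A B in which every entry of A is written into the source as a literal and multiplications by zero are left out entirely. The names `generate_kernel`, `compile_kernel`, and `reference_matmul` are hypothetical. This is not GiMMiK's generator, which targets accelerator kernels rather than Python.

```python
from typing import Callable, List

Matrix = List[List[float]]


def generate_kernel(a: Matrix) -> str:
    """Emit Python source for C = A @ B with the constant A baked in.

    Zero entries of A contribute no multiply at all; an all-zero row of A
    produces a zero-filled row of C.
    """
    lines = ["def kernel(b):", "    n = len(b[0])", "    c = []"]
    for row in a:
        terms = [f"{v!r} * b[{k}][j]" for k, v in enumerate(row) if v != 0.0]
        expr = " + ".join(terms) if terms else "0.0"
        lines.append(f"    c.append([{expr} for j in range(n)])")
    lines.append("    return c")
    return "\n".join(lines)


def compile_kernel(a: Matrix) -> Callable[[Matrix], Matrix]:
    """Compile the generated source and return the specialised kernel."""
    namespace: dict = {}
    exec(generate_kernel(a), namespace)
    return namespace["kernel"]


def reference_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive dense product, used only to check the generated kernel."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


if __name__ == "__main__":
    a = [[1.0, 0.0, 2.0], [0.0, 0.0, 0.0], [0.0, -3.0, 0.0]]
    b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(generate_kernel(a))  # only three multiplies survive
    print(compile_kernel(a)(b) == reference_matmul(a, b))
```

Specialising on A trades generality for fewer operations. A dense 3 x 3 by 3 x 2 product needs 18 multiplies, while the kernel generated for this sparse A performs 3 per output column.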

Higher-order Count Sketch: Dimensionality Reduction That Retains Efficient Tensor Operations

Authors: Anandkumar Animashree; Shi Yang
Publication date: 04/11/2019

Sketching is a randomized dimensionality-reduction method that aims to preserve relevant information in large-scale datasets. Count sketch is a simple, popular sketch that uses a randomized hash function to achieve compression. In this paper, we propose a novel extension known as Higher-order Count Sketch (HCS). While count sketch uses a single hash function, HCS uses multiple (smaller) hash functions for sketching. HCS reshapes the input (vector) data into a higher-order tensor and employs a tensor product of the random hash functions to compute the sketch. Under certain conditions on the input data, this yields an exponential saving (with respect to the order of the tensor) in the memory required to store the hash functions. Furthermore, when the input data itself has an underlying structure in the form of various tensor representations, such as the Tucker decomposition, we obtain significant advantages. We derive efficient (approximate) computation of various tensor operations, such as tensor products and tensor contractions, directly on the sketched data. Thus, HCS is the first sketch to fully exploit the multi-dimensional nature of higher-order tensors. We apply HCS to tensorized neural networks, replacing fully connected layers with sketched tensor operations, and achieve nearly state-of-the-art accuracy with significant compression on the image classification benchmark.

arXiv.org e-Print Archive

Crossref

Caltech Authors
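The abstract's central construction can be illustrated for an order-2 tensor. The pure-Python sketch below is a minimal illustration, not the authors' code. It contrasts a classic count sketch, which uses one hash h and one sign s over all n inputs, with an order-2 HCS. The HCS views a length n1*n2 vector as an n1 x n2 matrix and hashes each mode separately, so only n1 + n2 bucket/sign pairs are stored instead of n1*n2. The function names, the row-major reshape, and the use of `random.Random` to draw fully random hashes are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple


def make_hash(n: int, m: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw a random bucket in [0, m) and a random sign in {-1, +1} for each of n indices."""
    buckets = [rng.randrange(m) for _ in range(n)]
    signs = [rng.choice((-1, 1)) for _ in range(n)]
    return buckets, signs


def count_sketch(x: Sequence[float], h: Sequence[int], s: Sequence[int], m: int) -> List[float]:
    """Classic count sketch with one hash over all inputs: y[h[i]] += s[i] * x[i]."""
    y = [0.0] * m
    for i, xi in enumerate(x):
        y[h[i]] += s[i] * xi
    return y


def hcs_order2(x: Sequence[float], n1: int, n2: int,
               h1: Sequence[int], s1: Sequence[int],
               h2: Sequence[int], s2: Sequence[int],
               m1: int, m2: int) -> List[List[float]]:
    """Order-2 higher-order count sketch of a length n1*n2 vector.

    x is viewed row-major as an n1 x n2 matrix X, and
    Y[h1[i]][h2[j]] += s1[i] * s2[j] * X[i][j].
    """
    y = [[0.0] * m2 for _ in range(m1)]
    for i in range(n1):
        for j in range(n2):
            y[h1[i]][h2[j]] += s1[i] * s2[j] * x[i * n2 + j]
    return y


def hcs_estimate(y: List[List[float]], i: int, j: int,
                 h1: Sequence[int], s1: Sequence[int],
                 h2: Sequence[int], s2: Sequence[int]) -> float:
    """Point estimate of X[i][j]; unbiased over the random draw of the signs."""
    return s1[i] * s2[j] * y[h1[i]][h2[j]]


if __name__ == "__main__":
    rng = random.Random(0)
    n1, n2, m1, m2 = 4, 5, 3, 3
    h1, s1 = make_hash(n1, m1, rng)
    h2, s2 = make_hash(n2, m2, rng)
    x = [0.0] * (n1 * n2)
    x[2 * n2 + 3] = 7.5  # a single nonzero entry at X[2][3]
    y = hcs_order2(x, n1, n2, h1, s1, h2, s2, m1, m2)
    print(hcs_estimate(y, 2, 3, h1, s1, h2, s2))
```

For n1 = n2 = 1000, the two per-mode hashes store 2,000 entries, whereas a single count-sketch hash over the same 10^6 inputs stores 10^6. The estimate is exact whenever no other nonzero entry collides with (i, j).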

A GEMM interface and implementation on NVIDIA GPUs for multiple small matrices

Authors: Anderson; Bientinesi; Chetan Jhurani; Demkowicz; Deville; Goto; Higham; Karniadakis; Kågström; Nath; Paul Mullowney; Stroustrup; Šolín
Publication venue: Elsevier BV

Crossref
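A GEMM interface for many small matrices applies C_k <- alpha * A_k B_k + beta * C_k independently to every triple k in a batch. The sketch below is a minimal, sequential reference for those semantics only. The signature of `batched_gemm` is hypothetical and is not the paper's interface. A GPU implementation would distribute the independent products across threads or thread blocks instead of looping. Following the BLAS convention, C is not read when beta is zero.

```python
from typing import List

Matrix = List[List[float]]


def gemm(alpha: float, a: Matrix, b: Matrix, beta: float, c: Matrix) -> None:
    """In-place C <- alpha * A @ B + beta * C for one small matrix triple."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of A and B must match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("C must be len(A) x len(B[0])")
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            # With beta == 0 the old value of C is ignored (it may be garbage).
            old = beta * c[i][j] if beta != 0.0 else 0.0
            c[i][j] = alpha * acc + old


def batched_gemm(alpha: float, a_batch: List[Matrix], b_batch: List[Matrix],
                 beta: float, c_batch: List[Matrix]) -> None:
    """Apply gemm to every (A_k, B_k, C_k) triple; the triples are independent."""
    if not (len(a_batch) == len(b_batch) == len(c_batch)):
        raise ValueError("batch lengths must match")
    for a, b, c in zip(a_batch, b_batch, c_batch):
        gemm(alpha, a, b, beta, c)


if __name__ == "__main__":
    a = [[[1.0, 2.0], [3.0, 4.0]], [[2.0]]]
    b = [[[5.0, 6.0], [7.0, 8.0]], [[3.0]]]
    c = [[[1.0, 1.0], [1.0, 1.0]], [[10.0]]]
    batched_gemm(2.0, a, b, 0.5, c)
    print(c)
```

Batch entries may have different sizes in this sketch. A GPU interface might instead require uniform sizes per call to simplify work distribution, but that is a design assumption here.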

Efficiency in Machine Learning with Focus on Deep Learning and Recommender Systems

Author: Nesky Amy
Publication date: 01/01/2020

Machine learning algorithms have opened up countless doors for scientists tackling problems that were previously inaccessible, and the applications of these algorithms are far from exhausted. However, as the complexity of the learning problem grows, so do the computational and memory costs of the appropriate learning algorithm. As a result, training computationally heavy algorithms can take weeks or even months to reach a good result, which can be prohibitively expensive. The general inefficiency of machine learning algorithms is a significant bottleneck slowing progress in the application sciences. This thesis introduces three new methods for improving the efficiency of machine learning algorithms, focusing on expensive algorithms such as neural networks and recommender systems. The first method makes structured reductions of fully connected layers in neural networks, which speeds up training and decreases the amount of storage required. The second is an accelerated gradient descent method, Predictor-Corrector Gradient Descent (PCGD), that combines predictor-corrector techniques with stochastic gradient descent. The final technique generates Artificial Core Users (ACUs) from the Core Users of a recommendation dataset. Core Users condense the number of users in a recommendation dataset without significant loss of information; Artificial Core Users improve on the recommendation accuracy of Core Users while still mimicking real user data.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/162928/1/anesky_1.pd

Deep Blue Documents at the University of Michigan
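The abstract does not give PCGD's update rule, so the sketch below illustrates only the general predictor-corrector idea. It applies Heun's method, an explicit predictor-corrector scheme for ODEs, to the gradient flow dx/dt = -grad f(x). A plain gradient step predicts a point, and the corrector then steps from the current point using the average of the gradients at the current and predicted points. The names `heun_gd_step` and `run_descent` are hypothetical, the sketch uses exact rather than stochastic gradients, and the thesis's PCGD may differ substantially.

```python
from typing import Callable, List

Vector = List[float]
Grad = Callable[[Vector], Vector]


def heun_gd_step(x: Vector, grad: Grad, lr: float) -> Vector:
    """One Heun (predictor-corrector) step on the gradient flow dx/dt = -grad f(x)."""
    g0 = grad(x)
    # Predictor: an ordinary gradient-descent step.
    x_pred = [xi - lr * gi for xi, gi in zip(x, g0)]
    # Corrector: step from x with the average of the gradients at x and x_pred.
    g1 = grad(x_pred)
    return [xi - 0.5 * lr * (a + b) for xi, a, b in zip(x, g0, g1)]


def run_descent(x0: Vector, grad: Grad, lr: float, steps: int) -> Vector:
    """Repeat heun_gd_step for a fixed number of iterations."""
    x = list(x0)
    for _ in range(steps):
        x = heun_gd_step(x, grad, lr)
    return x


if __name__ == "__main__":
    curvatures = [1.0, 10.0]  # f(x) = 0.5 * (x0^2 + 10 * x1^2), minimum at the origin

    def quad_grad(x: Vector) -> Vector:
        return [c * xi for c, xi in zip(curvatures, x)]

    print(run_descent([1.0, 1.0], quad_grad, lr=0.1, steps=200))
```

On a one-dimensional quadratic with curvature a, each step multiplies x by 1 - lr*a + (lr*a)^2/2. This matches the second-order Taylor expansion of the exact gradient-flow solution, whereas plain gradient descent matches it only to first order.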

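STAP Processing in a Heterogeneous Environment: Application to Radar Detection and GPU Implementation (Traitement STAP en environnement hétérogène. Application à la détection radar et implémentation sur GPU)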

Authors: DEGURSE Jean-François; MARCOS Sylvie
Publication date: 01/01/2014

Space-time adaptive processing (STAP) jointly exploits the spatial and temporal dimensions of the signals received by an antenna array to filter them, whereas conventional array processing exploits only the spatial dimension. STAP is particularly effective at removing ground echoes received by airborne radars, for which there is a direct relationship between direction of arrival and Doppler frequency.
However, although the principles of STAP are now well understood, applying them in real environments still runs into unresolved difficulties in the context of operational radar. The first obstacle, addressed in the first phase of the thesis, is theoretical: it consists of defining procedures to estimate the clutter covariance matrix from a selection of representative training data, in a context of both non-homogeneous clutter and sometimes high target density. The second obstacle is technological and lies in the physical implementation of the algorithms, because of their high computational load. This point, crucial in airborne operations, is explored in the second phase of the thesis through an analysis of the feasibility of implementing the heaviest stages of a STAP algorithm on a GPU.

OpenGrey Repository
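The covariance-estimation step described above can be illustrated with the standard sample matrix inversion (SMI) approach. The sketch estimates R = (1/K) sum_k x_k x_k^H from K training snapshots, adds diagonal loading, and forms the minimum-variance distortionless weights w = R^{-1} s / (s^H R^{-1} s) for a steering vector s. This is a generic textbook technique, not the thesis's training-data selection procedure. All snapshots are used here without selection, and the function names, the steering vector, and the loading level are assumptions. The covariance accumulation (O(K N^2) for length-N snapshots) and the linear solve (O(N^3)) are the kind of dense computations that typically dominate the cost.

```python
import cmath
import random
from typing import List, Sequence

CVector = List[complex]
CMatrix = List[List[complex]]


def sample_covariance(snapshots: Sequence[Sequence[complex]], loading: float = 0.0) -> CMatrix:
    """Sample covariance R = (1/K) * sum_k x_k x_k^H plus `loading` on the diagonal."""
    k, n = len(snapshots), len(snapshots[0])
    r = [[0j] * n for _ in range(n)]
    for x in snapshots:
        for i in range(n):
            for j in range(n):
                r[i][j] += x[i] * x[j].conjugate() / k
    for i in range(n):
        r[i][i] += loading
    return r


def solve(a: CMatrix, b: CVector) -> CVector:
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; increase diagonal loading")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            f = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= f * m[col][c]
    y = [0j] * n
    for i in range(n - 1, -1, -1):
        acc = m[i][n] - sum(m[i][j] * y[j] for j in range(i + 1, n))
        y[i] = acc / m[i][i]
    return y


def mvdr_weights(r: CMatrix, steer: CVector) -> CVector:
    """w = R^{-1} s / (s^H R^{-1} s): unit gain on s with minimum output power."""
    r_inv_s = solve(r, steer)
    denom = sum(si.conjugate() * vi for si, vi in zip(steer, r_inv_s))
    return [vi / denom for vi in r_inv_s]


if __name__ == "__main__":
    rng = random.Random(1)
    n, k = 4, 32
    # Hypothetical steering vector with a linear phase progression.
    steer = [cmath.exp(1j * 0.3 * i) for i in range(n)]
    snaps = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)] for _ in range(k)]
    w = mvdr_weights(sample_covariance(snaps, loading=1e-2), steer)
    gain = sum(wi.conjugate() * si for wi, si in zip(w, steer))
    print(abs(gain - 1))  # distortionless constraint w^H s = 1, up to rounding
```

With white noise (R equal to the identity), the weights reduce to s / (s^H s), that is, a conventional matched beamformer. Clutter that is present in the training data but absent from the cell under test is exactly what makes representative training selection matter.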