A Study on the Influence of Caching: Sequences of Dense Linear Algebra Kernels
It is universally known that caching is critical to attain high-performance
implementations: in many situations, data locality (in space and time) plays a
bigger role than optimizing the (number of) arithmetic floating-point
operations. In this paper, we show evidence that, at least for linear algebra
algorithms, caching is also a crucial factor for accurate performance modeling
and performance prediction.
Comment: Submitted to the Ninth International Workshop on Automatic
Performance Tuning (iWAPT2014)
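As a hedged illustration of the locality effect described above (my own toy, not taken from the paper), the following NumPy sketch moves exactly the same number of elements twice; only the access pattern differs, and the strided traversal is typically several times slower despite identical arithmetic cost:

```python
import time
import numpy as np

A = np.random.rand(6000, 6000)   # row-major (C order) by default

t0 = time.perf_counter()
B = A.copy()                     # contiguous reads and writes
t_contig = time.perf_counter() - t0

t0 = time.perf_counter()
C = A.T.copy()                   # same data volume, but strided reads
t_strided = time.perf_counter() - t0

print(f"contiguous copy: {t_contig:.3f}s, transposed copy: {t_strided:.3f}s")
```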
Cache-aware Performance Modeling and Prediction for Dense Linear Algebra
Countless applications cast their computational core in terms of dense linear
algebra operations. These operations can usually be implemented by combining
the routines offered by standard linear algebra libraries such as BLAS and
LAPACK, and typically each operation can be obtained in many alternative ways.
Interestingly, identifying the fastest implementation -- without executing it
-- is a challenging task even for experts. An equally challenging task is that
of tuning each routine to performance-optimal configurations. Indeed, the
problem is so difficult that even the default values provided by the libraries
are often considerably suboptimal; as a solution, normally one has to resort to
executing and timing the routines, driven by some form of parameter search. In
this paper, we discuss a methodology to solve both problems: identifying the
best performing algorithm within a family of alternatives, and tuning
algorithmic parameters for maximum performance; in both cases, we do not
execute the algorithms themselves. Instead, our methodology relies on timing
and modeling the computational kernels underlying the algorithms, and on a
technique for tracking the contents of the CPU cache. In general, our
performance predictions allow us to tune dense linear algebra algorithms to
within a few percent of the best attainable results, thus allowing
computational scientists and code developers alike to efficiently optimize
their linear algebra routines and codes.
Comment: Submitted to PMBS1
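To make the idea concrete, here is a minimal sketch (my own illustration, not the authors' modeling tool) of predicting an algorithm's runtime from the timing of its underlying kernel; the gap between prediction and measurement stems largely from cache effects, which is precisely what a cache-aware model must account for:

```python
import time
import numpy as np

n, b = 4096, 512                 # full problem size and kernel (block) size
A, B = np.random.rand(n, n), np.random.rand(n, n)
Ab, Bb = np.random.rand(b, b), np.random.rand(b, b)

Ab @ Bb                          # warm-up so caches and BLAS threads settle
reps = 10
t0 = time.perf_counter()
for _ in range(reps):
    Ab @ Bb
t_kernel = (time.perf_counter() - t0) / reps

# A blocked n x n product performs (n/b)**3 block-sized GEMMs.
predicted = (n // b) ** 3 * t_kernel

t0 = time.perf_counter()
A @ B
measured = time.perf_counter() - t0
print(f"predicted {predicted:.2f}s vs measured {measured:.2f}s")
```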
Application-tailored Linear Algebra Algorithms: A search-based Approach
In this paper, we tackle the problem of automatically generating algorithms
for linear algebra operations by taking advantage of problem-specific
knowledge. In most situations, users possess much more information about the
problem at hand than what current libraries and computing environments accept;
evidence shows that if properly exploited, such information leads to
uncommon/unexpected speedups. We introduce a knowledge-aware linear algebra
compiler that allows users to input matrix equations together with properties
about the operands and the problem itself; for instance, they can specify that
the equation is part of a sequence, and how successive instances are related to
one another. The compiler exploits all this information to guide the generation
of algorithms, to limit the size of the search space, and to avoid redundant
computations. We applied the compiler to equations arising as part of
sensitivity and genome studies; the algorithms produced exhibit, respectively,
100- and 1000-fold speedups.
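As a hedged example of the kind of operand knowledge such a compiler exploits (illustrative only; the paper's compiler works on full matrix equations), telling SciPy that a coefficient matrix is symmetric positive definite lets it replace the general LU solver with a cheaper Cholesky-based one:

```python
import numpy as np
from scipy.linalg import solve

n = 3000
M = np.random.rand(n, n)
A = M @ M.T + n * np.eye(n)              # a symmetric positive definite matrix
b = np.random.rand(n)

x_generic = solve(A, b)                  # general LU-based solve
x_spd = solve(A, b, assume_a='pos')      # Cholesky-based, roughly half the flops

assert np.allclose(x_generic, x_spd)
```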
Computing Petaflops over Terabytes of Data: The Case of Genome-Wide Association Studies
In many scientific and engineering applications, one has to solve not one but
a sequence of instances of the same problem. Oftentimes, the problems in the
sequence are linked in a way that allows intermediate results to be reused. A
characteristic example for this class of applications is given by the
Genome-Wide Association Studies (GWAS), a widely spread tool in computational
biology. GWAS entails the solution of up to trillions (10^12) of correlated
generalized least-squares problems, posing a daunting challenge: the
performance of petaflops (10^15 floating-point operations) over terabytes
of data.
In this paper, we design an algorithm for performing GWAS on multi-core
architectures. This is accomplished in three steps. First, we show how to
exploit the relation among successive problems, thus reducing the overall
computational complexity. Then, through an analysis of the required data
transfers, we identify how to eliminate any overhead due to input/output
operations. Finally, we study how to decompose computation into tasks to be
distributed among the available cores, to attain high performance and
scalability. With our algorithm, a GWAS that currently requires the use of a
supercomputer may now be performed in a matter of hours on a single multi-core
node.
The discussion centers around the methodology to develop the algorithm rather
than the specific application. We believe the paper contributes valuable
guidelines of general applicability for computational scientists on how to
develop and optimize numerical algorithms.
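A simplified sketch of the reuse idea (my own toy example, not the paper's algorithm): in a sequence of generalized least-squares problems sharing the covariance matrix M, the Cholesky factorization of M can be computed once and reused across all instances instead of being recomputed per problem:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, solve

n, p, num_problems = 1000, 4, 50
rng = np.random.default_rng(0)
M = rng.random((n, n))
M = (M + M.T) / 2 + n * np.eye(n)    # a shared SPD covariance matrix
y = rng.random(n)

c = cho_factor(M)                    # factor the shared M exactly once
Minv_y = cho_solve(c, y)             # this term is also shared across problems

betas = []
for _ in range(num_problems):
    X = rng.random((n, p))           # only this operand changes per instance
    Minv_X = cho_solve(c, X)         # cheap triangular solves, no refactoring
    betas.append(solve(X.T @ Minv_X, X.T @ Minv_y))
```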
Efficiently Mapping Linear Algebra to High-Performance Code
Aware of the role that linear algebra plays in scientific applications, we investigate if/how matrix expressions can be efficiently evaluated with current high-level languages. On the one hand, the numerical linear algebra community has put a lot of effort into developing and optimizing a relatively small set of “universally” useful operations. These are packaged in libraries such as BLAS and LAPACK, and serve as building blocks for more complex computations. On the other hand, the linear algebra expressions that arise in many domains are significantly more complex than those building blocks. We refer to the problem of expressing a linear algebra expression in terms of a set of available building blocks as the “Linear Algebra Mapping Problem” (LAMP).

In practice, users have two alternatives to solve a given LAMP: 1) “manually”, by using C/C++ or FORTRAN in combination with explicit calls to BLAS & LAPACK, or 2) “automatically”, by using one of the high-level languages (or libraries) with an API that directly captures the expressions. In this presentation, we focus only on the latter. Specifically, we consider six languages (or libraries): Matlab, Julia, R, NumPy (Python), Eigen (C++), and Armadillo (C++), and carefully assess how effectively they translate linear algebra expressions to code, i.e., how well they solve LAMPs.

We investigate a number of aspects that are critical for the efficient solution of a LAMP. These range from the most basic mapping problem (“Given the expression A*B, does the language map it to a call to GEMM?”), to the optimal parenthesization, to the exploitation of properties, to the identification and elimination (if advantageous) of common subexpressions, and more. Ultimately, the purpose of this study is to exhibit the core challenges related to the effective computation of linear algebra expressions, and to help the development of languages and libraries.
Texas Advanced Computing Center (TACC)
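As a hedged illustration of one of these aspects, parenthesization (my own example, not from the presentation itself): NumPy leaves the evaluation order of A @ B @ v to the user, while numpy.linalg.multi_dot picks the cheapest ordering automatically:

```python
import numpy as np

n = 2000
A, B = np.random.rand(n, n), np.random.rand(n, n)
v = np.random.rand(n)

r1 = (A @ B) @ v                     # O(n^3): matrix-matrix product first
r2 = A @ (B @ v)                     # O(n^2): two matrix-vector products
r3 = np.linalg.multi_dot([A, B, v])  # picks the cheap ordering automatically

assert np.allclose(r1, r2) and np.allclose(r2, r3)
```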
Extended pipeline for content-based feature engineering in music genre recognition
We present a feature engineering pipeline for the construction of musical
signal characteristics, to be used for the design of a supervised model for
musical genre identification. The key idea is to extend the traditional
two-step process of extraction and classification with additive stand-alone
phases which are no longer organized in a waterfall scheme. The whole system is
realized by traversing backtrack arrows and cycles between various stages. In
order to give a compact and effective representation of the features, the
standard early temporal integration is combined with other selection and
extraction phases: on the one hand, the selection of the most meaningful
characteristics based on information gain, and on the other hand, the inclusion
of the nonlinear correlation between this subset of features, determined by an
autoencoder. The results of the experiments conducted on the GTZAN dataset
reveal a noticeable contribution of this methodology to the model's
performance on the classification task.
Comment: ICASSP 201
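As a hedged sketch of the information-gain selection stage (my own simplified example; the paper's pipeline also feeds the selected subset through an autoencoder), scikit-learn's mutual-information scorer can rank features against the genre labels and keep the most informative subset:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

rng = np.random.default_rng(0)
X = rng.random((1000, 60))           # stand-in for temporally integrated features
y = rng.integers(0, 10, size=1000)   # 10 genre labels, as in GTZAN

selector = SelectKBest(mutual_info_classif, k=20)
X_selected = selector.fit_transform(X, y)   # keep the 20 most informative features
print(X_selected.shape)                     # (1000, 20)
```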