A Study on the Influence of Caching: Sequences of Dense Linear Algebra Kernels
It is universally known that caching is critical to attain high-performance
implementations: in many situations, data locality (in space and time) plays a
bigger role than optimizing the (number of) arithmetic floating-point
operations. In this paper, we show evidence that, at least for linear algebra
algorithms, caching is also a crucial factor for accurate performance modeling
and performance prediction.
Comment: Submitted to the Ninth International Workshop on Automatic
Performance Tuning (iWAPT2014)
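As a hedged illustration of the locality effect described above (my own toy, not taken from the paper), the following NumPy sketch moves exactly the same number of elements twice; only the access pattern differs, and the strided traversal is typically several times slower despite identical arithmetic cost:

```python
import time
import numpy as np

A = np.random.rand(6000, 6000)   # row-major (C order) by default

t0 = time.perf_counter()
B = A.copy()                     # contiguous reads and writes
t_contig = time.perf_counter() - t0

t0 = time.perf_counter()
C = A.T.copy()                   # same data volume, but strided reads
t_strided = time.perf_counter() - t0

print(f"contiguous copy: {t_contig:.3f}s, transposed copy: {t_strided:.3f}s")
```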
Cache-aware Performance Modeling and Prediction for Dense Linear Algebra
Countless applications cast their computational core in terms of dense linear
algebra operations. These operations can usually be implemented by combining
the routines offered by standard linear algebra libraries such as BLAS and
LAPACK, and typically each operation can be obtained in many alternative ways.
Interestingly, identifying the fastest implementation -- without executing it
-- is a challenging task even for experts. An equally challenging task is that
of tuning each routine to performance-optimal configurations. Indeed, the
problem is so difficult that even the default values provided by the libraries
are often considerably suboptimal; as a solution, normally one has to resort to
executing and timing the routines, driven by some form of parameter search. In
this paper, we discuss a methodology to solve both problems: identifying the
best performing algorithm within a family of alternatives, and tuning
algorithmic parameters for maximum performance; in both cases, we do not
execute the algorithms themselves. Instead, our methodology relies on timing
and modeling the computational kernels underlying the algorithms, and on a
technique for tracking the contents of the CPU cache. In general, our
performance predictions allow us to tune dense linear algebra algorithms to
within a few percent of the best attainable results, thus allowing
computational scientists and code developers alike to efficiently optimize
their linear algebra routines and codes.
Comment: Submitted to PMBS1
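To make the idea concrete, here is a minimal sketch (my own illustration, not the authors' modeling tool) of predicting an algorithm's runtime from the timing of its underlying kernel; the gap between prediction and measurement stems largely from cache effects, which is precisely what a cache-aware model must account for:

```python
import time
import numpy as np

n, b = 4096, 512                 # full problem size and kernel (block) size
A, B = np.random.rand(n, n), np.random.rand(n, n)
Ab, Bb = np.random.rand(b, b), np.random.rand(b, b)

Ab @ Bb                          # warm-up so caches and BLAS threads settle
reps = 10
t0 = time.perf_counter()
for _ in range(reps):
    Ab @ Bb
t_kernel = (time.perf_counter() - t0) / reps

# A blocked n x n product performs (n/b)**3 block-sized GEMMs.
predicted = (n // b) ** 3 * t_kernel

t0 = time.perf_counter()
A @ B
measured = time.perf_counter() - t0
print(f"predicted {predicted:.2f}s vs measured {measured:.2f}s")
```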
Application-tailored Linear Algebra Algorithms: A search-based Approach
In this paper, we tackle the problem of automatically generating algorithms
for linear algebra operations by taking advantage of problem-specific
knowledge. In most situations, users possess much more information about the
problem at hand than what current libraries and computing environments accept;
evidence shows that if properly exploited, such information leads to
uncommon/unexpected speedups. We introduce a knowledge-aware linear algebra
compiler that allows users to input matrix equations together with properties
about the operands and the problem itself; for instance, they can specify that
the equation is part of a sequence, and how successive instances are related to
one another. The compiler exploits all this information to guide the generation
of algorithms, to limit the size of the search space, and to avoid redundant
computations. We applied the compiler to equations arising as part of
sensitivity and genome studies; the algorithms produced exhibit, respectively,
100- and 1000-fold speedups.
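As a hedged example of the kind of operand knowledge such a compiler exploits (illustrative only; the paper's compiler works on full matrix equations), telling SciPy that a coefficient matrix is symmetric positive definite lets it replace the general LU solver with a cheaper Cholesky-based one:

```python
import numpy as np
from scipy.linalg import solve

n = 3000
M = np.random.rand(n, n)
A = M @ M.T + n * np.eye(n)              # a symmetric positive definite matrix
b = np.random.rand(n)

x_generic = solve(A, b)                  # general LU-based solve
x_spd = solve(A, b, assume_a='pos')      # Cholesky-based, roughly half the flops

assert np.allclose(x_generic, x_spd)
```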
Computing Petaflops over Terabytes of Data: The Case of Genome-Wide Association Studies
In many scientific and engineering applications, one has to solve not one but
a sequence of instances of the same problem. Oftentimes, the problems in the
sequence are linked in a way that allows intermediate results to be reused. A
characteristic example for this class of applications is given by the
Genome-Wide Association Studies (GWAS), a widely spread tool in computational
biology. GWAS entails the solution of up to trillions (10^12) of correlated
generalized least-squares problems, posing a daunting challenge: the
performance of petaflops (10^15 floating-point operations) over terabytes
of data.
In this paper, we design an algorithm for performing GWAS on multi-core
architectures. This is accomplished in three steps. First, we show how to
exploit the relation among successive problems, thus reducing the overall
computational complexity. Then, through an analysis of the required data
transfers, we identify how to eliminate any overhead due to input/output
operations. Finally, we study how to decompose computation into tasks to be
distributed among the available cores, to attain high performance and
scalability. With our algorithm, a GWAS that currently requires the use of a
supercomputer may now be performed in a matter of hours on a single multi-core
node.
The discussion centers around the methodology to develop the algorithm rather
than the specific application. We believe the paper contributes valuable
guidelines of general applicability for computational scientists on how to
develop and optimize numerical algorithms.
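A simplified sketch of the reuse idea (my own toy example, not the paper's algorithm): in a sequence of generalized least-squares problems sharing the covariance matrix M, the Cholesky factorization of M can be computed once and reused across all instances instead of being recomputed per problem:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, solve

n, p, num_problems = 1000, 4, 50
rng = np.random.default_rng(0)
M = rng.random((n, n))
M = (M + M.T) / 2 + n * np.eye(n)    # a shared SPD covariance matrix
y = rng.random(n)

c = cho_factor(M)                    # factor the shared M exactly once
Minv_y = cho_solve(c, y)             # this term is also shared across problems

betas = []
for _ in range(num_problems):
    X = rng.random((n, p))           # only this operand changes per instance
    Minv_X = cho_solve(c, X)         # cheap triangular solves, no refactoring
    betas.append(solve(X.T @ Minv_X, X.T @ Minv_y))
```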
Efficiently Mapping Linear Algebra to High-Performance Code
Aware of the role that linear algebra plays in scientific applications, we investigate if/how matrix expressions can be efficiently evaluated with current high-level languages. On the one hand, the numerical linear algebra community has put a lot of effort into developing and optimizing a relatively small set of “universally” useful operations. These are packaged in libraries such as BLAS and LAPACK, and serve as building blocks for more complex computations. On the other hand, the linear algebra expressions that arise in many domains are significantly more complex than those building blocks. We refer to the problem of expressing a linear algebra expression in terms of a set of available building blocks as the “Linear Algebra Mapping Problem” (LAMP).

In practice, users have two alternatives to solve a given LAMP: 1) “manually”, by using C/C++ or FORTRAN in combination with explicit calls to BLAS & LAPACK, or 2) “automatically”, by using one of the high-level languages (or libraries) with an API that directly captures the expressions. In this presentation, we focus only on the latter. Specifically, we consider six languages (or libraries): Matlab, Julia, R, NumPy (Python), Eigen (C++), and Armadillo (C++), and carefully assess how effectively they translate linear algebra expressions to code, i.e., how well they solve LAMPs.

We investigate a number of aspects that are critical for the efficient solution of a LAMP. These range from the most basic mapping problem (“Given the expression A*B, does the language map it to a call to GEMM?”), to the optimal parenthesization, to the exploitation of properties, to the identification and elimination (if advantageous) of common subexpressions, and more. Ultimately, the purpose of this study is to exhibit the core challenges related to the effective computation of linear algebra expressions, and to help the development of languages and libraries.
Texas Advanced Computing Center (TACC)
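As a hedged illustration of one of these aspects, parenthesization (my own example, not from the presentation itself): NumPy leaves the evaluation order of A @ B @ v to the user, while numpy.linalg.multi_dot picks the cheapest ordering automatically:

```python
import numpy as np

n = 2000
A, B = np.random.rand(n, n), np.random.rand(n, n)
v = np.random.rand(n)

r1 = (A @ B) @ v                     # O(n^3): matrix-matrix product first
r2 = A @ (B @ v)                     # O(n^2): two matrix-vector products
r3 = np.linalg.multi_dot([A, B, v])  # picks the cheap ordering automatically

assert np.allclose(r1, r2) and np.allclose(r2, r3)
```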
Extended pipeline for content-based feature engineering in music genre recognition
We present a feature engineering pipeline for the construction of musical
signal characteristics, to be used for the design of a supervised model for
musical genre identification. The key idea is to extend the traditional
two-step process of extraction and classification with additive stand-alone
phases which are no longer organized in a waterfall scheme. The whole system is
realized by traversing backtrack arrows and cycles between various stages. In
order to give a compact and effective representation of the features, the
standard early temporal integration is combined with other selection and
extraction phases: on the one hand, the selection of the most meaningful
characteristics based on information gain, and on the other hand, the inclusion
of the nonlinear correlation between this subset of features, determined by an
autoencoder. The results of the experiments conducted on the GTZAN dataset
reveal a noticeable contribution of this methodology to the model's
performance on the classification task.
Comment: ICASSP 201
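As a hedged sketch of the information-gain selection stage (my own simplified example; the paper's pipeline also feeds the selected subset through an autoencoder), scikit-learn's mutual-information scorer can rank features against the genre labels and keep the most informative subset:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

rng = np.random.default_rng(0)
X = rng.random((1000, 60))           # stand-in for temporally integrated features
y = rng.integers(0, 10, size=1000)   # 10 genre labels, as in GTZAN

selector = SelectKBest(mutual_info_classif, k=20)
X_selected = selector.fit_transform(X, y)   # keep the 20 most informative features
print(X_selected.shape)                     # (1000, 20)
```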