411 research outputs found
Orthogonal decompositions for computing the numerical rank of a matrix (Descomposiciones ortogonales para el cálculo del rango numérico matricial)
Computing the numerical rank of a matrix arises in numerous applications in science and engineering. There are currently three basic numerical approaches to this computation: the SVD, the URV decomposition, and rank-revealing QR decompositions (QRRR).
This work experimentally analyzes several sequential algorithms, based on these three approaches, for computing the numerical rank of a matrix. The comparative study uses the authors' own implementation of the URV decomposition and two new routines for the QRRR decomposition, together with the LAPACK routines for the SVD and for the QR decomposition with column pivoting.
The experimental results show that the QRRR decomposition is in practice as reliable as the costly SVD and URV decompositions, while having the fundamental advantage of a low computational cost.
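As an illustration of the rank-revealing QR idea, the following is a minimal sketch in plain Python that estimates the numerical rank with column-pivoted modified Gram-Schmidt. It is not the LAPACK routine or the authors' QRRR routines; the function name `pivoted_qr_rank` and the relative tolerance `rel_tol` are assumptions made for this example.

```python
import math


def pivoted_qr_rank(A, rel_tol=1e-10):
    """Estimate the numerical rank of A (a list of rows).

    Uses column-pivoted modified Gram-Schmidt and counts the pivots that
    are not negligible relative to the first (largest) pivot.
    The name and the tolerance are illustrative assumptions.
    """
    m, n = len(A), len(A[0])
    # Work on columns, since pivoting swaps whole columns.
    cols = [[float(A[i][j]) for i in range(m)] for j in range(n)]
    first_pivot = None
    rank = 0
    for k in range(min(m, n)):
        # Pivot: bring the remaining column of largest norm to position k.
        norms = [math.sqrt(sum(x * x for x in c)) for c in cols[k:]]
        p = k + max(range(len(norms)), key=norms.__getitem__)
        cols[k], cols[p] = cols[p], cols[k]
        r_kk = norms[p - k]
        if first_pivot is None:
            first_pivot = r_kk
        # Stop once the pivot is negligible relative to the first one.
        if first_pivot == 0.0 or r_kk <= rel_tol * first_pivot:
            break
        rank += 1
        q = [x / r_kk for x in cols[k]]
        # Remove the q-component from every remaining column (in place).
        for c in cols[k + 1:]:
            proj = sum(qi * ci for qi, ci in zip(q, c))
            for i in range(m):
                c[i] -= proj * q[i]
    return rank
```

For instance, a 4x3 matrix whose third column is the sum of the first two would be reported as having rank 2. The result depends on the chosen tolerance, which is the usual caveat for any numerical-rank decision.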
Efficient Numerical Algorithms for Balanced Stochastic Truncation
We propose an efficient numerical algorithm for relative error model reduction based on balanced stochastic truncation. The method uses full-rank factors of the Gramians to be balanced against each other and exploits the fact that for large-scale systems these Gramians are often of low numerical rank. We use the easy-to-parallelize sign function method as the major computational tool in determining these full-rank factors and demonstrate the numerical performance of the suggested implementation of balanced stochastic truncation model reduction.
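To make the sign function method mentioned above more concrete, here is a minimal sketch of the classical Newton iteration X_{k+1} = (X_k + X_k^{-1}) / 2 for the matrix sign function, in plain Python. It shows only the basic iteration, not how the paper obtains full-rank Gramian factors from it; the helper names `mat_inv` and `matrix_sign`, the stopping rule, and the tolerance are assumptions for this example.

```python
def mat_inv(M):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[float(x) for x in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[piv][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        d = aug[col][col]
        aug[col] = [x / d for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matrix_sign(A, tol=1e-12, max_iter=100):
    """Newton iteration for sign(A); assumes A has no purely imaginary eigenvalues."""
    X = [[float(x) for x in row] for row in A]
    n = len(X)
    for _ in range(max_iter):
        X_inv = mat_inv(X)
        X_new = [[0.5 * (X[i][j] + X_inv[i][j]) for j in range(n)]
                 for i in range(n)]
        # Largest entrywise change between consecutive iterates.
        diff = max(abs(X_new[i][j] - X[i][j])
                   for i in range(n) for j in range(n))
        X = X_new
        if diff <= tol:
            break
    return X
```

Each step is dominated by a matrix inversion, which is the kind of dense kernel that parallelizes well; that is one plausible reading of why the abstract calls the method easy to parallelize.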
Blocked algorithms for the reduction to Hessenberg-triangular form revisited
We present two variants of Moler and Stewart's algorithm for reducing a matrix pair to Hessenberg-triangular (HT) form with increased data locality in the access to the matrices. In one of these variants, a careful reorganization and accumulation of Givens rotations enables the use of efficient level 3 BLAS. Experimental results on four different architectures, representative of current high performance processors, compare the performance of the new variants with that of the implementation of Moler and Stewart's algorithm in subroutine DGGHRD from LAPACK, Dackland and Kågström's two-stage algorithm for the HT form, and a modified version of the latter which requires considerably fewer flops.
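The basic building block referred to above, a Givens rotation that zeroes one matrix entry, can be sketched as follows in plain Python. This is only the elementary operation, not the blocked reorganization or accumulation described in the abstract; the function names `givens` and `apply_givens_rows` are assumptions for this example.

```python
import math


def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] applied to (a, b) gives (r, 0)."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r


def apply_givens_rows(M, i, k, c, s):
    """Rotate rows i and k of M in place with the rotation (c, s)."""
    for j in range(len(M[0])):
        x, y = M[i][j], M[k][j]
        M[i][j] = c * x + s * y
        M[k][j] = -s * x + c * y
```

Applying rotations one at a time like this touches only two rows per operation, which gives little data reuse; grouping many rotations so they can be applied as matrix-matrix products is what makes level 3 BLAS usable.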
Applying OOC Techniques in the Reduction to Condensed Form for Very Large Symmetric Eigenproblems on GPUs
In this paper we address the reduction of a dense matrix to tridiagonal form for the solution of symmetric eigenvalue problems on a graphics processor (GPU) when the data is too large to fit into the accelerator memory. We apply out-of-core techniques to a three-stage algorithm, carefully redesigning the first stage to reduce the number of data transfers between the CPU and GPU memory spaces, maintain the memory requirements on the GPU within limits, and ensure high performance by featuring a high ratio between computation and communication.
A kernel regression procedure in the 3D shape space with an application to online sales of children's wear
Shape regression is of key importance in many scientific fields. In this paper, we focus on the case where the shape of an object is represented by a configuration matrix of landmarks. It is well known that this shape space has a finite-dimensional Riemannian manifold structure (non-Euclidean) which makes it difficult to work with. Papers about regression on this space are scarce in the literature. The majority of them are restricted to the case of a single explanatory variable, usually time or age, and many of them work in the approximated tangent space. In this paper we adapt the general method for kernel regression analysis in manifold-valued data proposed by Davis et al. (2007) to the three-dimensional case of Kendall's shape space and generalize it to multiple explanatory variables. We also propose bootstrap confidence intervals for prediction. A simulation study is carried out to check the goodness of the procedure, and finally it is applied to a 3D database obtained from an anthropometric survey of the Spanish child population with a potential application to online sales of children's wear.
Relationship between the method used to evaluate work and the students' level of learning (Relación entre el método de evaluación del trabajo y el nivel de aprendizaje de los estudiantes)
The main goal of this paper is to present two methods for evaluating the work students do outside the classroom and to compare the level of learning achieved with each. The first is based on peer assessment, while the second combines self-assessment with an objective test. In both cases, the main objective is to give students fast feedback. Comparing the students' grades leads to the conclusion that the use of objective tests improves their level of learning. The second method required the development of a software tool that grades the students' answers and, at the same time, detects possible problems in the wording of the objective tests.
randUTV: A Blocked Randomized Algorithm for Computing a Rank-Revealing UTV Factorization
A randomized algorithm for computing a so-called UTV factorization efficiently is presented. Given a matrix $A$, the algorithm “randUTV” computes a factorization $A = U T V^{*}$, where $U$ and $V$ have orthonormal columns, and $T$ is triangular (either upper or lower, whichever is preferred). The algorithm randUTV is developed primarily to be a fast and easily parallelized alternative to algorithms for computing the Singular Value Decomposition (SVD). randUTV provides accuracy very close to that of the SVD for problems such as low-rank approximation, solving ill-conditioned linear systems, and determining bases for various subspaces associated with the matrix. Moreover, randUTV produces highly accurate approximations to the singular values of $A$. Unlike the SVD, the randomized algorithm proposed builds a UTV factorization in an incremental, single-stage, and noniterative way, making it possible to halt the factorization process once a specified tolerance has been met. Numerical experiments comparing the accuracy and speed of randUTV to the SVD are presented. Other experiments also demonstrate that in comparison to column-pivoted QR, which is another factorization that is often used as a relatively economic alternative to the SVD, randUTV compares favorably in terms of speed while providing far higher accuracy.
Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors
Asymmetric multicore processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. In addition, given the growing interest in low-power high performance computing, this type of architecture is also being investigated as a means to improve the throughput-per-Watt of complex scientific applications.
In this paper, we design and embed several architecture-aware optimizations into a multi-threaded general matrix multiplication (gemm), a key operation of the BLAS, in order to obtain a high performance implementation for ARM big.LITTLE AMPs. Our solution is based on the reference implementation of gemm in the BLIS library, and integrates a cache-aware configuration as well as asymmetric static and dynamic scheduling strategies that carefully tune and distribute the operation's micro-kernels among the big and LITTLE cores of the target processor. The experimental results on a Samsung Exynos 5422, a system-on-chip with ARM Cortex-A15 and Cortex-A7 clusters that implements the big.LITTLE model, show that our cache-aware versions of gemm with asymmetric scheduling attain important gains in performance with respect to their architecture-oblivious counterparts while exploiting all the resources of the AMP to deliver considerable energy efficiency.
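The idea of an asymmetric static schedule can be sketched as follows: the rows of the output matrix are split unevenly between the two clusters, and each cluster computes its own slice. This is a simplified illustration in plain Python, not the BLIS-based implementation; the function names and the `big_ratio` value of 0.75 are assumptions for this example, and the two slices run one after the other here rather than concurrently.

```python
def split_rows(m, big_ratio):
    """Split row indices 0..m-1 into a slice for the big cluster and one
    for the LITTLE cluster; big_ratio is the assumed share of the work
    given to the big cluster."""
    cut = round(m * big_ratio)
    return range(0, cut), range(cut, m)


def gemm_rows(A, B, C, rows, block=2):
    """Accumulate C[i][:] += A[i][:] @ B for i in rows, walking the inner
    dimension in blocks (a stand-in for cache-aware blocking)."""
    k_dim, n = len(B), len(B[0])
    for k0 in range(0, k_dim, block):
        k1 = min(k0 + block, k_dim)
        for i in rows:
            for j in range(n):
                acc = 0.0
                for k in range(k0, k1):
                    acc += A[i][k] * B[k][j]
                C[i][j] += acc


def asymmetric_gemm(A, B, big_ratio=0.75):
    """Compute C = A @ B with the rows statically divided between clusters."""
    m, n = len(A), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    big_rows, little_rows = split_rows(m, big_ratio)
    # On real hardware these two calls would run at the same time,
    # one on the big cores and one on the LITTLE cores.
    gemm_rows(A, B, C, big_rows)
    gemm_rows(A, B, C, little_rows)
    return C
```

A static split of this kind only balances the load if the ratio matches the relative speed of the two clusters, which is presumably why the abstract also mentions dynamic scheduling strategies.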
A Review of Lightweight Thread Approaches for High Performance Computing
High-level, directive-based solutions are becoming the programming models (PMs) of the multi/many-core architectures. Several solutions relying on operating system (OS) threads work well with a moderate number of cores. However, exascale systems will spawn hundreds of thousands of threads in order to exploit their massive parallel architectures, and thus conventional OS threads are too heavy for that purpose. Several lightweight thread (LWT) libraries have recently appeared offering lighter mechanisms to tackle massive concurrency. In order to examine the suitability of LWTs in high-level runtimes, we develop a set of microbenchmarks consisting of commonly found patterns in current parallel codes. Moreover, we study the semantics offered by some LWT libraries in order to expose the similarities between different LWT application programming interfaces. This study reveals that a reduced set of LWT functions can be sufficient to cover the common parallel code patterns and that those LWT libraries perform better than OS threads-based solutions in cases of task and nested parallelism, which are becoming more popular with new architectures.
The researchers from the Universitat Jaume I de Castelló were supported by project TIN2014-53495-R of the MINECO, the Generalitat Valenciana fellowship programme Vali+d 2015, and FEDER. This work was partially supported by the U.S. Dept. of Energy, Office of Science, Office of Advanced Scientific Computing Research (SC-21), under contract DE-AC02-06CH11357. We gratefully acknowledge the computing resources provided and operated by the Joint Laboratory for System Evaluation (JLSE) at Argonne National Laboratory.