411 research outputs found

    Orthogonal decompositions for computing the numerical rank of a matrix

    Computing the numerical rank of a matrix arises in numerous science and engineering applications. There are currently three basic numerical approaches to this computation: the SVD, the URV decomposition, and rank-revealing QR (RRQR) decompositions. This work experimentally analyzes several sequential algorithms, based on these three approaches, for computing the numerical rank of a matrix. The comparative experimental study uses the authors' own implementation of the URV decomposition and two new routines for computing the RRQR decomposition. In addition, routines from the LAPACK library are used to compute the SVD and the QR decomposition with column pivoting. The experimental results show that the RRQR decomposition is in practice as reliable as the costly SVD and URV decompositions. Moreover, RRQR decompositions have the fundamental advantage of a low computational cost. Peer reviewed.
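
    As a rough illustration of the QR-with-column-pivoting approach mentioned in this abstract, the following is a minimal pure-Python sketch, not the authors' routines and not the LAPACK implementation. It assumes a dense matrix given as a list of rows, uses modified Gram-Schmidt rather than Householder reflections, and the function name `numerical_rank_qrcp` and the relative tolerance `tol` are hypothetical choices made for this example.

```python
import math


def numerical_rank_qrcp(a, tol=1e-10):
    """Estimate the numerical rank of `a` (a list of rows).

    Sketch only: modified Gram-Schmidt QR with column pivoting.
    The estimate counts the diagonal entries of R that exceed
    tol * R[0][0]; `tol` is an assumed, user-chosen threshold.
    """
    m, n = len(a), len(a[0])
    # Store the matrix by columns: cols[j] is column j.
    cols = [[float(a[i][j]) for i in range(m)] for j in range(n)]
    diag = []  # diagonal of R, nonincreasing up to rounding
    for k in range(min(m, n)):
        # Pivot: move the remaining column of largest norm to position k.
        norms = [math.sqrt(sum(x * x for x in cols[j])) for j in range(k, n)]
        p = k + max(range(len(norms)), key=norms.__getitem__)
        cols[k], cols[p] = cols[p], cols[k]
        r_kk = norms[p - k]
        diag.append(r_kk)
        if r_kk == 0.0:
            break  # every remaining column is exactly zero
        q = [x / r_kk for x in cols[k]]
        # Remove the component along q from the remaining columns.
        for j in range(k + 1, n):
            r_kj = sum(qi * x for qi, x in zip(q, cols[j]))
            cols[j] = [x - r_kj * qi for x, qi in zip(cols[j], q)]
    if not diag or diag[0] == 0.0:
        return 0
    return sum(1 for d in diag if d > tol * diag[0])
```

    For example, `numerical_rank_qrcp([[1, 2, 3], [2, 4, 6], [1, 0, 1]])` returns 2, because the second row is twice the first. Plain column pivoting is a heuristic: it usually reveals the rank in practice, but unlike the SVD it offers no guarantee for every matrix.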

    Efficient Numerical Algorithms for Balanced Stochastic Truncation

    We propose an efficient numerical algorithm for relative error model reduction based on balanced stochastic truncation. The method uses full-rank factors of the Gramians to be balanced against each other and exploits the fact that for large-scale systems these Gramians are often of low numerical rank. We use the easy-to-parallelize sign function method as the major computational tool in determining these full-rank factors and demonstrate the numerical performance of the suggested implementation of balanced stochastic truncation model reduction.
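
    The sign function method named in this abstract is commonly built on the Newton iteration X <- (X + X^{-1}) / 2 started at X = A. The sketch below shows only that basic iteration in pure Python, under the assumption that A is a small dense matrix with no eigenvalues on the imaginary axis. It does not reproduce the paper's computation of full-rank Gramian factors, and the helper names `mat_inverse` and `matrix_sign`, the tolerance, and the iteration cap are hypothetical.

```python
def mat_inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment [A | I] and reduce the left half to the identity.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if aug[p][c] == 0.0:
            raise ValueError("matrix is singular")
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def matrix_sign(a, tol=1e-12, max_iter=100):
    """Approximate sign(A) with the Newton iteration X <- (X + X^{-1}) / 2."""
    x = [[float(v) for v in row] for row in a]
    for _ in range(max_iter):
        inv = mat_inverse(x)
        new = [[0.5 * (u + v) for u, v in zip(xr, ir)] for xr, ir in zip(x, inv)]
        # Largest entrywise change between consecutive iterates.
        change = max(abs(u - v) for nr, xr in zip(new, x) for u, v in zip(nr, xr))
        x = new
        if change <= tol * max(abs(v) for row in x for v in row):
            break
    return x
```

    For the triangular matrix `[[2, 1], [0, -3]]`, whose eigenvalues are 2 and -3, the iteration converges to approximately `[[1, 0.4], [0, -1]]`. Each step is dominated by a matrix inversion, which is one reason the method is regarded as easy to parallelize.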

    Blocked algorithms for the reduction to Hessenberg-triangular form revisited

    We present two variants of Moler and Stewart's algorithm for reducing a matrix pair to Hessenberg-triangular (HT) form with increased data locality in the access to the matrices. In one of these variants, a careful reorganization and accumulation of Givens rotations enables the use of efficient level 3 BLAS. Experimental results on four different architectures, representative of current high performance processors, compare the performance of the new variants with that of the implementation of Moler and Stewart's algorithm in subroutine DGGHRD from LAPACK, Dackland and Kågström's two-stage algorithm for the HT form, and a modified version of the latter which requires considerably fewer flops.

    Applying OOC Techniques in the Reduction to Condensed Form for Very Large Symmetric Eigenproblems on GPUs

    In this paper we address the reduction of a dense matrix to tridiagonal form for the solution of symmetric eigenvalue problems on a graphics processor (GPU) when the data is too large to fit into the accelerator memory. We apply out-of-core techniques to a three-stage algorithm, carefully redesigning the first stage to reduce the number of data transfers between the CPU and GPU memory spaces, keep the memory requirements on the GPU within limits, and ensure high performance through a high ratio of computation to communication.

    A kernel regression procedure in the 3D shape space with an application to online sales of children's wear

    Shape regression is of key importance in many scientific fields. In this paper, we focus on the case where the shape of an object is represented by a configuration matrix of landmarks. It is well known that this shape space has a finite-dimensional Riemannian manifold structure (non-Euclidean) which makes it difficult to work with. Papers about regression on this space are scarce in the literature. The majority of them are restricted to the case of a single explanatory variable, usually time or age, and many of them work in the approximated tangent space. In this paper we adapt the general method for kernel regression analysis in manifold-valued data proposed by Davis et al. (2007) to the three-dimensional case of Kendall's shape space and generalize it to multiple explanatory variables. We also propose bootstrap confidence intervals for prediction. A simulation study is carried out to check the goodness of the procedure, and finally it is applied to a 3D database obtained from an anthropometric survey of the Spanish child population with a potential application to online sales of children's wear.

    Relationship between the method used to assess coursework and the students' level of learning

    The main goal of this paper is to present two methods for evaluating the work that students do outside the classroom, and to compare the level of learning achieved with each. The first is based on peer assessment, while the second combines self-assessment with an objective test. In both cases, the main objective is to give students fast feedback. Comparing the students' grades, we conclude that the use of objective tests improves the students' level of learning. The second method required the development of a software tool that evaluates the students' answers and, at the same time, detects possible problems in the wording of the objective tests.

    randUTV: A Blocked Randomized Algorithm for Computing a Rank-Revealing UTV Factorization

    A randomized algorithm for computing a so-called UTV factorization efficiently is presented. Given a matrix A, the algorithm “randUTV” computes a factorization A = U T V*, where U and V have orthonormal columns, and T is triangular (either upper or lower, whichever is preferred). The algorithm randUTV is developed primarily to be a fast and easily parallelized alternative to algorithms for computing the Singular Value Decomposition (SVD). randUTV provides accuracy very close to that of the SVD for problems such as low-rank approximation, solving ill-conditioned linear systems, and determining bases for various subspaces associated with the matrix. Moreover, randUTV produces highly accurate approximations to the singular values of A. Unlike the SVD, the randomized algorithm proposed builds a UTV factorization in an incremental, single-stage, and noniterative way, making it possible to halt the factorization process once a specified tolerance has been met. Numerical experiments comparing the accuracy and speed of randUTV to the SVD are presented. Other experiments also demonstrate that in comparison to column-pivoted QR, which is another factorization that is often used as a relatively economic alternative to the SVD, randUTV compares favorably in terms of speed while providing far higher accuracy.

    Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors

    Asymmetric multicore processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. In addition, given the growing interest in low-power high performance computing, this type of architecture is also being investigated as a means to improve the throughput-per-Watt of complex scientific applications. In this paper, we design and embed several architecture-aware optimizations into a multi-threaded general matrix multiplication (gemm), a key operation of the BLAS, in order to obtain a high performance implementation for ARM big.LITTLE AMPs. Our solution is based on the reference implementation of gemm in the BLIS library, and integrates a cache-aware configuration as well as asymmetric static and dynamic scheduling strategies that carefully tune and distribute the operation's micro-kernels among the big and LITTLE cores of the target processor. The experimental results on a Samsung Exynos 5422, a system-on-chip with ARM Cortex-A15 and Cortex-A7 clusters that implements the big.LITTLE model, show that our cache-aware versions of gemm with asymmetric scheduling attain important gains in performance with respect to their architecture-oblivious counterparts while exploiting all the resources of the AMP to deliver considerable energy efficiency.

    A Review of Lightweight Thread Approaches for High Performance Computing

    High-level, directive-based solutions are becoming the programming models (PMs) of the multi/many-core architectures. Several solutions relying on operating system (OS) threads work perfectly well with a moderate number of cores. However, exascale systems will spawn hundreds of thousands of threads in order to exploit their massive parallel architectures, and thus conventional OS threads are too heavy for that purpose. Several lightweight thread (LWT) libraries have recently appeared offering lighter mechanisms to tackle massive concurrency. In order to examine the suitability of LWTs in high-level runtimes, we develop a set of microbenchmarks consisting of commonly found patterns in current parallel codes. Moreover, we study the semantics offered by some LWT libraries in order to expose the similarities between different LWT application programming interfaces. This study reveals that a reduced set of LWT functions can be sufficient to cover the common parallel code patterns and that those LWT libraries perform better than OS thread-based solutions in cases involving task and nested parallelism, which are becoming more popular with new architectures. The researchers from the Universitat Jaume I de Castelló were supported by project TIN2014-53495-R of the MINECO, the Generalitat Valenciana fellowship programme Vali+d 2015, and FEDER. This work was partially supported by the U.S. Dept. of Energy, Office of Science, Office of Advanced Scientific Computing Research (SC-21), under contract DEAC02-06CH11357. We gratefully acknowledge the computing resources provided and operated by the Joint Laboratory for System Evaluation (JLSE) at Argonne National Laboratory. Peer reviewed. Postprint (author's final draft).