59 research outputs found
Large Scale Parallel Computations in R through Elemental
Even though in recent years the scale of statistical analysis problems has
increased tremendously, many statistical software tools are still limited to
single-node computations. However, statistical analyses are largely based on
dense linear algebra operations, which have been deeply studied, optimized and
parallelized in the high-performance-computing community. To make
high-performance distributed computations available for statistical analysis,
and thus enable large scale statistical computations, we introduce RElem, an
open source package that integrates the distributed dense linear algebra
library Elemental into R. While on the one hand, RElem provides direct wrappers
of Elemental's routines, on the other hand, it overloads various operators and
functions to provide an entirely native R experience for distributed
computations. We showcase how simple it is to port existing R programs to RElem
and demonstrate that RElem indeed scales beyond the single-node
limitation of R with the full performance of Elemental and without any overhead.
Comment: 16 pages, 5 figures
Quantum ESPRESSO: a modular and open-source software project for quantum simulations of materials
Quantum ESPRESSO is an integrated suite of computer codes for
electronic-structure calculations and materials modeling, based on
density-functional theory, plane waves, and pseudopotentials (norm-conserving,
ultrasoft, and projector-augmented wave). Quantum ESPRESSO stands for "opEn
Source Package for Research in Electronic Structure, Simulation, and
Optimization". It is freely available to researchers around the world under the
terms of the GNU General Public License. Quantum ESPRESSO builds upon
newly-restructured electronic-structure codes that have been developed and
tested by some of the original authors of novel electronic-structure algorithms
and applied in the last twenty years by some of the leading materials modeling
groups worldwide. Innovation and efficiency are still its main focus, with
special attention paid to massively-parallel architectures, and a great effort
being devoted to user friendliness. Quantum ESPRESSO is evolving towards a
distribution of independent and inter-operable codes in the spirit of an
open-source project, where researchers active in the field of
electronic-structure calculations are encouraged to participate in the project
by contributing their own codes or by implementing their own ideas into
existing codes.
Comment: 36 pages, 5 figures, resubmitted to J. Phys.: Condens. Matter
LAPACK95 - high performance linear algebra package
LAPACK95 is a set of FORTRAN95 subroutines that interface FORTRAN95 with LAPACK.
All LAPACK driver subroutines (including expert drivers) and some LAPACK computational routines have both generic LAPACK95 interfaces and generic LAPACK77 interfaces. The remaining computational routines have only generic LAPACK77 interfaces. Neither type of interface distinguishes between single and double precision or between real and complex data types.
LAPACK95 - a high-performance package of linear algebra algorithms
Summary
This paper describes the LAPACK 95 package, which consists of a set of FORTRAN95 subroutines. They implement a FORTRAN 95 interface to the standard LAPACK library. The most essential information about the LAPACK, ScaLAPACK, FORTRAN 95 and HPF libraries is presented. The general principles of constructing the interface are then briefly formulated. LAPACK 95 is not a new library of linear algebra algorithms, but only a converter that allows LAPACK algorithms to be used in FORTRAN 95 programs as well. It is shown how the broader capabilities of FORTRAN 95 can be exploited in comparison with standard FORTRAN 77. Examples of interfaces are given.
First Published Online: 14 Oct 201
Time-power-energy balance of BLAS kernels in modern FPGAs
Conference proceedings 2022. High Performance Computing. 9th Latin American Conference, CARLA 2022, Porto Alegre, Brazil, 26-30 Sep 2022, Revised Selected Papers.
Numerical Linear Algebra (NLA) is a research field that in recent decades has been characterized by the use of kernel libraries that are de facto standards. One of the most remarkable examples, particularly in the HPC field, is the Basic Linear Algebra Subroutines (BLAS). Most BLAS operations are fundamental to many scientific algorithms because they generally constitute the most computationally expensive stage. For this reason, numerous efforts have been made to optimize such operations on various hardware platforms. There is growing concern in the high-performance computing world about power consumption, which makes energy efficiency an extremely important quality when evaluating hardware platforms. Due to their greater energy efficiency, Field-Programmable Gate Arrays (FPGAs) are available today as an interesting alternative to other hardware platforms for accelerating this type of operation. Our study focuses on the evaluation of FPGAs for dense NLA operations. Specifically, in this work we explore and evaluate the available options for two of the most representative BLAS kernels, GEMV and GEMM. The experimental evaluation is carried out on an Alveo U50 accelerator card from Xilinx and an Intel Xeon Silver multicore CPU. Our findings show that even for kernels where the CPU reaches better runtimes, the FPGA counterpart is more energy efficient.
The researchers were supported by the Universidad de la República and PEDECIBA. Thanks are due to ANII - MPG Independent Research Groups: "Efficient Heterogeneous Computing" - CSC group
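For readers unfamiliar with the two kernels named in the abstract above, the following is a minimal reference sketch of what GEMV and GEMM compute, using the standard BLAS definitions (y ← αAx + βy and C ← αAB + βC). It is plain, unoptimized Python and is not the code evaluated in the paper. The helper `energy_joules` and the power and runtime figures in the usage lines are hypothetical values chosen only to illustrate how a slower device can still use less energy; they are not measurements from the study.

```python
def gemv(alpha, A, x, beta, y):
    """Reference GEMV: return alpha * A @ x + beta * y for a dense matrix A (list of rows)."""
    return [
        alpha * sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + beta * y_i
        for row, y_i in zip(A, y)
    ]


def gemm(alpha, A, B, beta, C):
    """Reference GEMM: return alpha * A @ B + beta * C for dense matrices (lists of rows)."""
    cols_B = list(zip(*B))  # transpose B once so each column can be paired with a row of A
    return [
        [
            alpha * sum(a * b for a, b in zip(row_A, col_B)) + beta * c_ij
            for col_B, c_ij in zip(cols_B, row_C)
        ]
        for row_A, row_C in zip(A, C)
    ]


def energy_joules(avg_power_watts, runtime_seconds):
    """Energy of a run at constant average power: E = P * t (hypothetical helper)."""
    return avg_power_watts * runtime_seconds


A = [[1.0, 2.0], [3.0, 4.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

y_out = gemv(2.0, A, [1.0, 1.0], 1.0, [10.0, 20.0])
C_out = gemm(1.0, A, identity, 0.5, A)

# Hypothetical numbers, not taken from the paper: a device that is three times
# slower but draws one fifth of the power still consumes less energy overall.
slow_low_power = energy_joules(20.0, 3.0)
fast_high_power = energy_joules(100.0, 1.0)
```

Because energy is the product of average power and runtime, comparing devices on runtime alone can rank them differently from comparing them on energy, which is the trade-off the abstract reports.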
Approaches for MATLAB Applications Acceleration Using High Performance Reconfigurable Computers
Many scientific computing applications and simulations need a lot of raw computing power. MATLAB®† is one of the popular language choices for technical computing. Presented here are approaches for accelerating MATLAB-based applications using High Performance Reconfigurable Computing (HPRC) machines. Typically, these are clusters of Von Neumann architecture based systems with zero or more FPGA reconfigurable boards. As a case study, an Image Correlation Algorithm has been ported to this architecture platform. As a second case study, the recursive training process used to find an optimum Artificial Neural Network (ANN) has been accelerated by porting it to HPC systems. The approaches taken are analyzed with respect to target scenarios, the end user's perspective, programming efficiency and performance. Disclaimer: Some material in this text has been used and reproduced with appropriate references and permissions where required. † MATLAB® is a registered trademark of The Mathworks, Inc. ©1994-2003
A Preemption-Based Meta-Scheduling System for Distributed Computing
This research aims to design and build a scheduling framework for distributed computing systems with the primary objectives of providing fast response times to users, delivering high system throughput and accommodating the maximum number of applications in the system. The author claims that these are the most important scheduling objectives in recent distributed computing systems, especially Grid computing environments.
In order to achieve the objectives of the scheduling framework, the scheduler employs arbitration of application-level schedules and preemption of executing jobs under certain conditions. In application-level scheduling, the user develops a schedule for his application using an execution model that simulates the execution behavior of the application. Since application-level scheduling can seriously impede the performance of the system, the scheduling framework developed in this research arbitrates between different application-level schedules corresponding to different applications to provide fair system usage for all applications and balance the interests of different applications. In this sense, the scheduling framework is not a classical scheduling system, but a meta-scheduling system that interacts with the application-level schedulers.
Due to the large system dynamics involved in Grid computing systems, the ability to preempt executing jobs becomes a necessity. The meta-scheduler described in this dissertation employs well-defined scheduling policies to preempt and migrate executing applications. To give users the capability to make their applications preemptible, a user-level check-pointing library called SRS (Stop-Restart Software) was also developed in this research. The SRS library differs from many user-level check-pointing libraries in that it allows applications to be reconfigured between migrations. This reconfiguration can be achieved by changing the processor configuration and/or the data distribution.
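The idea of reconfiguring an application between migrations can be illustrated with a small sketch. The code below is not the SRS API; the function names `block_distribute`, `checkpoint` and `restart` are hypothetical, and in-memory Python lists stand in for both the per-process data and the stored checkpoint. It assumes a simple contiguous block distribution and shows only the core step: data checkpointed on one processor count is redistributed when the application resumes on another.

```python
def block_distribute(data, num_procs):
    """Split data into num_procs contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(len(data), num_procs)
    blocks, start = [], 0
    for rank in range(num_procs):
        size = base + (1 if rank < extra else 0)  # first `extra` ranks get one more item
        blocks.append(data[start:start + size])
        start += size
    return blocks


def checkpoint(local_blocks):
    """Gather per-process blocks into one snapshot that does not depend on the distribution."""
    return [item for block in local_blocks for item in block]


def restart(snapshot, new_num_procs):
    """Resume from a snapshot on a different number of processes (hypothetical helper)."""
    return block_distribute(snapshot, new_num_procs)


data = list(range(10))
before = block_distribute(data, 4)  # application running on 4 processes
snapshot = checkpoint(before)       # application is stopped and its data saved
after = restart(snapshot, 3)        # application resumes on 3 processes
```

Storing the snapshot in a distribution-independent form is what makes it possible to change the processor count on restart in this sketch; a real library would also have to handle file storage, multidimensional distributions and communication between processes.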
The experimental results provided in this dissertation demonstrate the utility of the metascheduling framework for distributed computing systems. Finally, the metascheduling framework was put to practical use by building a Grid computing system called GradSolve. GradSolve is a flexible system that allows application library writers to upload applications with different capabilities into the system. GradSolve is also unique in maintaining traces of the execution of the applications and using the traces for subsequent executions of the application