GridFTP: Protocol Extensions to FTP for the Grid
High-Performance Solvers for Dense Hermitian Eigenproblems
We introduce a new collection of solvers - subsequently called EleMRRR - for
large-scale dense Hermitian eigenproblems. EleMRRR solves various types of
problems: generalized, standard, and tridiagonal eigenproblems. Among these,
the last is of particular importance as it is a solver in its own right, as
well as the computational kernel for the first two; we present a fast and
scalable tridiagonal solver based on the Algorithm of Multiple Relatively
Robust Representations - referred to as PMRRR. Like the other EleMRRR solvers,
PMRRR is part of the freely available Elemental library, and is designed to
fully support both message-passing (MPI) and multithreading parallelism (SMP).
As a result, the solvers can equally be used in pure MPI or in hybrid MPI-SMP
fashion. We conducted a thorough performance study of EleMRRR and ScaLAPACK's
solvers on two supercomputers. Such a study, performed with up to 8,192 cores,
provides precise guidelines to assemble the fastest solver within the ScaLAPACK
framework; it also indicates that EleMRRR outperforms even the fastest solvers
built from ScaLAPACK's components.
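The tridiagonal kernel described above can be illustrated with a minimal numpy sketch (this is not PMRRR; production MRRR solvers work directly on the diagonal and off-diagonal arrays without forming the dense matrix, and numpy's `eigvalsh` calls a dense LAPACK driver). The 1-D Laplacian stencil gives a symmetric tridiagonal matrix with a known analytic spectrum, which makes the result easy to check:

```python
import numpy as np

# Illustrative sketch (not PMRRR): the symmetric tridiagonal standard
# eigenproblem T x = lambda x for the 1-D Laplacian stencil [-1, 2, -1].
n = 100
T = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

w = np.linalg.eigvalsh(T)   # ascending eigenvalues via a dense LAPACK driver

# Analytic spectrum of this matrix: 2 - 2*cos(k*pi/(n+1)), k = 1..n.
k = np.arange(1, n + 1)
exact = np.sort(2.0 - 2.0 * np.cos(k * np.pi / (n + 1)))
assert np.allclose(w, exact)
```

Because tridiagonal eigenproblems also serve as the computational kernel for the standard and generalized cases (after reduction), speeding up this step accelerates the whole solver stack.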
Recommended from our members
Executing matrix multiply on a process oriented data flow machine
The Process-Oriented Dataflow System (PODS) is an execution model that combines the von Neumann and dataflow models of computation to gain the benefits of each. Central to PODS is the concept of array distribution and its effects on the partitioning and mapping of processes.

In PODS, arrays are partitioned by simply assigning consecutive elements to each processing element (PE) equally. Since PODS uses single assignment, there is only one producer of each element. This producing PE owns that element and performs the necessary computations to assign it. Using this approach, the filling loop is distributed across the PEs. This simple partitioning and mapping scheme provides excellent results for executing scientific code on MIMD machines. In this way, PODS allows MIMD machines to exploit vector and data parallelism easily while still providing the flexibility of MIMD over SIMD for multi-user systems.

In this paper, the classic matrix multiply algorithm, with 1024 data points, is executed on a PODS simulator and the results are presented and discussed. Matrix multiply is a good example because it has several interesting properties: there are multiple code-blocks; a new array must be dynamically allocated and distributed; there is a loop-carried dependency in the innermost loop; the two input arrays have different access patterns; and the sizes of the input arrays are not known at compile time. Matrix multiply also forms the basis for many important scientific algorithms such as LU decomposition, convolution, and the Fast Fourier Transform.

The results show that PODS is comparable to both Iannucci's Hybrid Architecture and MIT's TTDA in terms of overhead and instruction power. They also show that PODS easily distributes the workload evenly across the PEs. The key result is that PODS can scale matrix multiply in a near-linear fashion until there is little or no work to be performed for each PE.
Then overhead and message passing become a major component of the execution time. With larger problems (e.g., ≥16k data points) this limit would be reached at around 256 PEs.
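The consecutive-element partitioning described above can be sketched in a few lines. This is a hypothetical illustration of the scheme (the function names `block_partition` and `owner` are my own, not from the paper): consecutive elements are split into equal-sized blocks, one per PE, and the owning PE is the sole producer of its elements under single assignment:

```python
# Hypothetical sketch of PODS-style array distribution: consecutive
# elements are assigned to PEs in equal-sized blocks; the owning PE
# is the sole producer (single assignment) of its elements.
def block_partition(n_elems, n_pes):
    """Return a list of (start, end) index ranges, one per PE."""
    base, extra = divmod(n_elems, n_pes)
    ranges, start = [], 0
    for pe in range(n_pes):
        size = base + (1 if pe < extra else 0)  # spread the remainder
        ranges.append((start, start + size))
        start += size
    return ranges

def owner(i, n_elems, n_pes):
    """PE that owns (produces) element i."""
    for pe, (lo, hi) in enumerate(block_partition(n_elems, n_pes)):
        if lo <= i < hi:
            return pe

# 1024 elements over 4 PEs -> 256 consecutive elements per PE.
print(block_partition(1024, 4))  # [(0, 256), (256, 512), (512, 768), (768, 1024)]
print(owner(300, 1024, 4))       # 1
```

Because each element has exactly one producer, the filling loop distributes across PEs with no write conflicts, which is what lets the scheme balance the workload evenly.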
Principles in Patterns (PiP): Evaluation of Impact on Business Processes
The innovation and development work conducted under the auspices of the Principles in Patterns (PiP) project is intended to explore and develop new technology-supported approaches to curriculum design, approval and review. An integral component of this innovation is the use of business process analysis and process change techniques - and their instantiation within the C-CAP system (Class and Course Approval Pilot) - in order to improve the efficacy of curriculum approval processes. Improvements to approval process responsiveness and overall process efficacy can assist institutions in better reviewing or updating curriculum designs to enhance pedagogy. Such improvements also assume a greater significance in a globalised HE environment, in which institutions must adapt or create curricula quickly in order to better reflect rapidly changing academic contexts, as well as better responding to the demands of employment marketplaces and the expectations of professional bodies. This is increasingly an issue for disciplines within the sciences and engineering, where new skills or knowledge need to be rapidly embedded in curricula as a response to emerging technological or environmental developments. All of the aforementioned must also be achieved while simultaneously maintaining high standards of academic quality, thus adding a further layer of complexity to the way in which HE institutions engage in "responsive curriculum design" and approval. This strand of the PiP evaluation therefore entails an analysis of the business process techniques used by PiP, their efficacy, and the impact of process changes on the curriculum approval process, as instantiated by C-CAP. More generally the evaluation is a contribution towards a wider understanding of technology-supported process improvement initiatives within curriculum approval and their potential to render such processes more transparent, efficient and effective. 
Partly owing to limitations in the data required to facilitate comparative analyses, this evaluation adopts a mixed approach, making use of qualitative and quantitative methods as well as theoretical techniques. These approaches combined enable a comparative evaluation of the curriculum approval process under the "new state" (i.e. using C-CAP) and under the "previous state". This report summarises the methodology used to enable comparative evaluation and presents an analysis and discussion of the results. As the report will explain, the impact of C-CAP and its ability to support improvements in process and document management has resulted in the resolution of numerous process failings. C-CAP has also demonstrated potential for improvements in approval process cycle time, process reliability, process visibility, process automation, process parallelism and a reduction in transition delays within the approval process, thus contributing to considerable process efficiencies; although it is acknowledged that enhancements and redesign may be required to take advantage of C-CAP's potential. Other aspects pertaining to C-CAP's impact on process change, improvements to document management and the curation of curriculum designs will also be discussed.
Parallel accelerated cyclic reduction preconditioner for three-dimensional elliptic PDEs with variable coefficients
We present a robust and scalable preconditioner for the solution of
large-scale linear systems that arise from the discretization of elliptic PDEs
amenable to rank compression. The preconditioner is based on hierarchical
low-rank approximations and the cyclic reduction method. The setup and
application phases of the preconditioner achieve log-linear complexity in
memory footprint and number of operations, and numerical experiments exhibit
good weak and strong scalability at large processor counts in a distributed
memory environment. Numerical experiments with linear systems that feature
symmetry and nonsymmetry, definiteness and indefiniteness, constant and
variable coefficients demonstrate the preconditioner's applicability and
robustness. Furthermore, it is possible to control the number of iterations via
the accuracy threshold of the hierarchical matrix approximations and their
arithmetic operations, and the tuning of the admissibility condition parameter.
Together, these parameters allow for optimization of the memory requirements
and performance of the preconditioner. Comment: 24 pages, Elsevier Journal of Computational and Applied Mathematics,
Dec 201
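The rank-compression idea underlying hierarchical low-rank preconditioners, and the role of the accuracy threshold, can be illustrated with a truncated SVD of a single well-separated kernel block. This is an illustrative sketch, not the paper's preconditioner (which uses hierarchical formats and cyclic reduction); the function name `truncate_svd` and the test kernel are my own:

```python
import numpy as np

# Illustrative sketch: compress one off-diagonal block to low rank,
# truncating the SVD at a relative accuracy threshold `eps`.
def truncate_svd(A, eps):
    """Return factors U, V with ||A - U @ V||_2 <= eps * ||A||_2."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # keep singular values above the relative threshold
    r = int(np.sum(s > eps * s[0])) if s[0] > 0 else 0
    r = max(r, 1)
    return U[:, :r] * s[:r], Vt[:r, :]

# A smooth kernel between two well-separated point sets is numerically
# low rank: source points in [0, 1], targets shifted by 2.
x = np.linspace(0.0, 1.0, 200)
A = 1.0 / (1.0 + np.abs(x[:, None] - (x[None, :] + 2.0)))

U, V = truncate_svd(A, eps=1e-8)
print(U.shape[1])  # numerical rank, far smaller than 200
```

Tightening `eps` raises the kept rank (more memory, more work per application) but makes the approximation, and hence the preconditioner, more accurate; this is the memory/iteration-count trade-off the abstract describes.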
Combined shared and distributed memory ab-initio computations of molecular-hydrogen systems in the correlated state: process pool solution and two-level parallelism
An efficient computational scheme devised for investigations of ground state
properties of electronically correlated systems is presented. As an
example, a molecular-hydrogen chain is considered, with the long-range
electron-electron interactions taken into account. The implemented procedure
covers: (i) single-particle Wannier wave-function basis construction in the
correlated state, (ii) microscopic parameters calculation, and (iii) ground
state energy optimization. The optimization loop is based on a highly effective
process-pool solution (a specific root-workers approach). The hierarchical,
two-level parallelism was applied: both shared (by use of Open
Multi-Processing) and distributed (by use of Message Passing Interface) memory
models were utilized. We discuss in detail how such an approach results in a
substantial increase of the calculation speed for the fully parallelized
solution. Comment: 14 pages, 10 figures, 1 table
Parallel Algorithm for Frequent Itemset Mining on Intel Many-core Systems
Frequent itemset mining leads to the discovery of associations and
correlations among items in large transactional databases. Apriori is a
classical frequent itemset mining algorithm, which employs iterative passes
over the database, combined with the generation of candidate itemsets based on
frequent itemsets found at the previous iteration, and the pruning of clearly infrequent
itemsets. The Dynamic Itemset Counting (DIC) algorithm is a variation of
Apriori, which tries to reduce the number of passes made over a transactional
database while keeping the number of itemsets counted in a pass relatively low.
In this paper, we address the problem of accelerating DIC on the Intel Xeon Phi
many-core system for the case when the transactional database fits in main
memory. Intel Xeon Phi provides a large number of small compute cores with
vector processing units. The paper presents a parallel implementation of DIC
based on OpenMP technology and thread-level parallelism. We exploit the
bit-based internal layout for transactions and itemsets. This technique reduces
the memory space for storing the transactional database, simplifies the support
count via logical bitwise operations, and allows for vectorization of such a
step. Experimental evaluation on the platforms of the Intel Xeon CPU and the
Intel Xeon Phi coprocessor with large synthetic and real databases showed good
performance and scalability of the proposed algorithm. Comment: Accepted for publication in Journal of Computing and Information
Technology (http://cit.fer.hr)
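The bit-based layout described above can be sketched as follows. This is an illustrative analogue, not the paper's implementation: each item maps to a bitset over transactions (one bit per transaction), and the support of an itemset is the popcount of the AND of its items' bitsets; Python integers stand in for the fixed-width bit vectors a vectorized Xeon Phi kernel would use.

```python
# Sketch of bitwise support counting for itemset mining.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def item_bitsets(transactions):
    """Map each item to a bitset with bit t set iff transaction t contains it."""
    bits = {}
    for t_id, t in enumerate(transactions):
        for item in t:
            bits[item] = bits.get(item, 0) | (1 << t_id)
    return bits

def support(itemset, bits):
    """Transactions containing every item: bitwise AND, then popcount."""
    acc = ~0
    for item in itemset:
        acc &= bits.get(item, 0)
    acc &= (1 << len(transactions)) - 1   # mask to the transaction count
    return bin(acc).count("1")

bits = item_bitsets(transactions)
print(support({"bread", "milk"}, bits))   # 2 (transactions 0 and 2)
```

On real hardware the AND-and-popcount over contiguous machine words is exactly the step that vectorizes well, which is why the bit layout both shrinks the database and speeds up support counting.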