
    Taming computational complexity: efficient and parallel SimRank optimizations on undirected graphs

    SimRank is considered one of the most promising link-based ranking algorithms for evaluating the similarity of web documents in many modern search engines. In this paper, we investigate the optimization problem of SimRank similarity computation on undirected web graphs. We first present a novel algorithm to estimate the SimRank between vertices in O(n³ + Kn²) time, where n is the number of vertices and K is the number of iterations. In comparison, the most efficient implementation of the SimRank algorithm in [1] takes O(Kn³) time in the worst case. To handle large-scale computations efficiently, we also propose a parallel implementation of the SimRank algorithm on multiple processors. Experimental evaluations on both synthetic and real-life data sets demonstrate the improved computational time and parallel efficiency of our proposed techniques.
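    The abstract does not spell out the optimized algorithm, but the standard iterative SimRank formulation it builds on can be sketched. Below is a minimal NumPy sketch of the naive O(Kn³) iteration on an undirected graph (the baseline that the O(n³ + Kn²) method improves upon); the function name simrank, the decay factor C = 0.8, and the toy graph are illustrative assumptions, not the authors' code.

```python
import numpy as np

def simrank(adj, C=0.8, K=10):
    """Naive iterative SimRank on an undirected graph (O(K n^3) baseline).

    adj : (n, n) symmetric 0/1 adjacency matrix
    C   : decay factor in (0, 1)
    K   : number of iterations
    """
    n = adj.shape[0]
    deg = adj.sum(axis=0)
    deg[deg == 0] = 1                 # avoid division by zero for isolated vertices
    W = adj / deg                     # column-normalized adjacency matrix
    S = np.eye(n)
    for _ in range(K):
        S = C * (W.T @ S @ W)
        np.fill_diagonal(S, 1.0)      # s(v, v) = 1 by definition
    return S

# toy 4-vertex path graph
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(simrank(A))
```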

    Out-of-core macromolecular simulations on multithreaded architectures

    We address the solution of large-scale eigenvalue problems that appear in the motion simulation of complex macromolecules on multithreaded platforms, consisting of multicore processors and possibly a graphics processor (GPU). In particular, we compare specialized implementations of several high-performance eigensolvers that, by relying on disk storage and out-of-core (OOC) techniques, can in principle tackle the large memory requirements of these biological problems, which in general do not fit into the main memory of current desktop machines. All these OOC eigensolvers, except for one, are composed of compute-bound (i.e., arithmetically-intensive) operations, which we accelerate by exploiting the performance of current multicore processors and, in some cases, by additionally off-loading certain parts of the computation to a GPU accelerator. One of the eigensolvers is a memory-bound algorithm, which strongly constrains its performance when the data is on disk. However, this method exhibits a much lower arithmetic cost compared with its compute-bound alternatives for this particular application. Experimental results on a desktop platform, representative of current server technology, illustrate the potential of these methods to address the simulation of biological activity.
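    As a rough illustration of the out-of-core idea behind these solvers, the sketch below streams panels of a large matrix from disk with numpy.memmap and contrasts a memory (I/O)-bound kernel (a panel-wise matrix-vector product) with a compute-bound one (a panel-wise Gram/rank-k update). This is a minimal sketch under assumed names (ooc_matvec, ooc_gram), a raw float64 file layout, and panel sizes; it is not one of the eigensolvers compared in the paper.

```python
import numpy as np

def ooc_matvec(path, n, x, panel_rows=1024):
    """Out-of-core y = A @ x for an n-by-n matrix stored row-major on disk.

    Each panel of rows is copied from disk once and used for only
    O(panel_rows * n) flops, so the kernel is memory (I/O)-bound:
    performance is limited by disk bandwidth.
    """
    A = np.memmap(path, dtype=np.float64, mode="r", shape=(n, n))
    y = np.empty(n)
    for r in range(0, n, panel_rows):
        panel = np.array(A[r:r + panel_rows, :])   # copy one panel into memory
        y[r:r + panel_rows] = panel @ x
    return y

def ooc_gram(path, n, panel_rows=1024):
    """Out-of-core B = A.T @ A, a compute-bound OOC kernel.

    The same panel read is reused for O(panel_rows * n^2) flops, so
    arithmetic (not I/O) dominates once panels are reasonably large.
    """
    A = np.memmap(path, dtype=np.float64, mode="r", shape=(n, n))
    B = np.zeros((n, n))
    for r in range(0, n, panel_rows):
        panel = np.array(A[r:r + panel_rows, :])
        B += panel.T @ panel                       # rank-k update per panel
    return B
```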

    Leveraging task-parallelism in message-passing dense matrix factorizations using SMPSs

    In this paper, we investigate how to exploit task-parallelism during the execution of the Cholesky factorization on clusters of multicore processors with the SMPSs programming model. Our analysis reveals that the major difficulties in adapting the code for this operation in ScaLAPACK to SMPSs lie in algorithmic restrictions and the semantics of the SMPSs programming model, but also that both can be overcome with a limited programming effort. The experimental results report considerable gains in the performance and scalability of the routine parallelized with SMPSs when compared with conventional approaches to executing the original ScaLAPACK implementation in parallel, as well as with two recent message-passing routines for this operation. In summary, our study opens the door to reusing message-passing legacy codes/libraries for linear algebra by introducing up-to-date techniques, such as dynamic out-of-order scheduling, that significantly upgrade their performance while avoiding a costly rewrite/reimplementation.

    This research was supported by Project EU INFRA-2010-1.2.2 "TEXT: Towards EXaflop applicaTions". The researcher at BSC-CNS was supported by the HiPEAC-2 Network of Excellence (FP7/ICT 217068), the Spanish Ministry of Education (CICYT TIN2011-23283, TIN2007-60625 and CSD2007-00050), and the Generalitat de Catalunya (2009-SGR-980). The researcher at CIMNE was partially funded by the UPC postdoctoral grants under the programme "BKC5-Atracció i Fidelització de talent al BKC". The researcher at UJI was supported by project CICYT TIN2008-06570-C04-01 and FEDER. We thank Jesus Labarta, from BSC-CNS, for helpful discussions on SMPSs and his help with the performance analysis of the codes with Paraver. We thank Vladimir Marjanovic, also from BSC-CNS, for his help in the set-up and tuning of the MPI/SMPSs tools on JuRoPa. Finally, we thank Rafael Mayo, from UJI, for his support in the preliminary stages of this work. The authors gratefully acknowledge the computing time granted on the supercomputer JuRoPa at the Jülich Supercomputing Centre.
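    To make the task structure concrete, the sketch below shows a right-looking tiled Cholesky factorization in NumPy/SciPy, in which each per-tile kernel (potrf, trsm, and the syrk/gemm trailing update) is the kind of unit that a task-based runtime such as SMPSs would register as a task and schedule dynamically once its data dependencies are satisfied. This is an illustrative serial sketch (the function name tiled_cholesky and the tile size nb are assumptions), not the ScaLAPACK/SMPSs code described in the paper.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def tiled_cholesky(A, nb=256):
    """Right-looking tiled Cholesky (lower), overwriting A and returning L.

    Each tile kernel below (potrf, trsm, syrk/gemm update) is the unit of
    work a task-based runtime would schedule out of order once its data
    dependencies are met.
    """
    n = A.shape[0]
    for k in range(0, n, nb):
        ke = min(k + nb, n)
        # potrf: factor the diagonal tile
        A[k:ke, k:ke] = cholesky(A[k:ke, k:ke], lower=True)
        # trsm: update the tiles below the diagonal tile
        for i in range(ke, n, nb):
            ie = min(i + nb, n)
            A[i:ie, k:ke] = solve_triangular(
                A[k:ke, k:ke], A[i:ie, k:ke].T, lower=True).T
        # syrk/gemm: update the trailing submatrix (lower blocks only)
        for i in range(ke, n, nb):
            ie = min(i + nb, n)
            for j in range(ke, ie, nb):
                je = min(j + nb, ie)
                A[i:ie, j:je] -= A[i:ie, k:ke] @ A[j:je, k:ke].T
    return np.tril(A)

# small correctness check on a random symmetric positive definite matrix
n = 1024
B = np.random.rand(n, n)
A = B @ B.T + n * np.eye(n)
L = tiled_cholesky(A.copy(), nb=128)
assert np.allclose(L @ L.T, A)
```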

    Elemental: A new framework for distributed memory dense matrix computations

    Parallelizing dense matrix computations to distributed memory architectures is a well-studied subject and generally considered to be among the best understood domains of parallel computing. Two packages, developed in the mid 1990s, still enjoy regular use: ScaLAPACK and PLAPACK. With the advent of many-core architectures, which may very well take the shape of distributed memory architectures within a single processor, these packages must be revisited, since it will likely not be practical to use MPI-based implementations. Thus, this is a good time to review what lessons we have learned since the introduction of these two packages and to propose a simple yet effective alternative. Preliminary performance results show the new solution achieves considerably better performance than the previously developed libraries.
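    For context, Elemental's defining feature is an element-wise cyclic ("elemental") data distribution over a two-dimensional process grid, in contrast to the block-cyclic layout used by ScaLAPACK and PLAPACK. The sketch below shows the owner and local-index mappings for both layouts on a small grid; the function names are illustrative helpers, not Elemental's actual API.

```python
def owner_elemental(i, j, pr, pc):
    """Process that owns global entry (i, j) under an element-wise cyclic
    distribution over a pr-by-pc process grid."""
    return (i % pr, j % pc)

def owner_block_cyclic(i, j, pr, pc, nb):
    """Owner under a ScaLAPACK-style block-cyclic distribution, block size nb."""
    return ((i // nb) % pr, (j // nb) % pc)

def local_index_elemental(i, j, pr, pc):
    """Local (row, col) index of global entry (i, j) on its owning process
    under the element-wise cyclic distribution."""
    return (i // pr, j // pc)

# Compare the two layouts for an 8x8 matrix on a 2x2 grid (nb = 2 for block-cyclic)
for name, owner in [("elemental", lambda i, j: owner_elemental(i, j, 2, 2)),
                    ("block-cyclic", lambda i, j: owner_block_cyclic(i, j, 2, 2, 2))]:
    print(name)
    for i in range(8):
        print(" ".join(f"{r}{c}" for r, c in (owner(i, j) for j in range(8))))
```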

    Implementation of Parallel Least-Squares Algorithms for Gravity Field Estimation

    This report was prepared by Jing Xie, a graduate research associate in the Department of Civil and Environmental Engineering and Geodetic Science at the Ohio State University, under the supervision of Professor C. K. Shum. This research was partially supported by grants from the NSF Earth Sciences program (EAR-0327633) and the NASA Office of Earth Science program (NNG04GF01G and NNG04GN19G). This report was also submitted to the Graduate School of the Ohio State University as a thesis in partial fulfillment of the requirements for the Master of Science degree.

    NASA/GFZ's Gravity Recovery and Climate Experiment (GRACE) twin-satellite mission, launched in 2002 for a five-year nominal mission, has provided accurate scientific products that help scientists gain new insights into climate signals that manifest as temporal variations of the Earth's gravity field. This satellite mission also presents a significant computational challenge: analyzing the large amount of data collected to solve a massive geophysical inverse problem every month. This paper focuses on applying parallel (primarily distributed) computing techniques capable of rigorously inverting monthly geopotential coefficients using GRACE data. The gravity solution is based on the energy conservation approach, which establishes a linear relationship between the in-situ geopotential difference of the two satellites and the position and velocity vectors, using the high-low (GPS to GRACE) and low-low (GRACE spacecraft) satellite-to-satellite tracking data and the accelerometer data from both GRACE satellites. Both the direct (rigorous) inversion and the iterative (conjugate gradient) methods are studied. Our goal is to develop numerical algorithms and a portable distributed-computing code, potentially "scalable" (i.e., keeping constant efficiency with increased problem size and number of processors), capable of efficiently solving the GRACE problem and also applicable to other generalized large geophysical inverse problems. Typical monthly GRACE gravity solutions require solving for spherical harmonic coefficients complete to degree 120 (14,637 parameters) plus other nuisance parameters. Accumulating the 259,200 monthly low-low GRACE observations (at a 0.1 Hz sampling rate) into the normal equations matrix needs more than 55 trillion floating point operations (FLOPs) and ~1.7 GB of central memory to store it; its inversion adds ~1 trillion FLOPs. To meet this huge computational challenge, we use a 16-node SGI 750 cluster system with 32 733-MHz Itanium processors to test our algorithm. We choose the object-oriented Parallel Linear Algebra Package (PLAPACK) as the main tool and the Message Passing Interface (MPI) as the underlying communication layer to build the parallel code. MPI parallel I/O is also used to increase the speed of transferring data between disk and memory. Furthermore, we optimize both the serial and parallel codes by carefully analyzing the cost of the numerical operations, fully exploiting the power of the Itanium architecture, and utilizing highly optimized numerical libraries.
    For direct inversion, we tested implementations of the Normal equations Matrix Accumulation (NMA) method, which computes the design and normal equations matrices locally and accumulates them into global objects afterwards, and the Design Matrix Accumulation (DMA) approach, which first forms small design matrices locally and then transfers them to global scale by matrix-matrix multiplication to obtain a global normal equations matrix. The creation of the normal equations matrix takes the majority of the entire wall-clock time. Our preliminary results indicate that the NMA method is very fast but at present cannot be used to estimate extremely high degree and order coefficients due to the lack of central memory. The DMA method can solve for all geopotential coefficients complete to spherical harmonic degree 120 in roughly 30 minutes using 24 CPUs, while the serial implementation of the direct inverse method takes about 7.5 hours for the same inversion problem on a single processor of the same type. In the realization of the conjugate gradient method on the distributed platform, the preconditioner is chosen as the block-diagonal part of the normal equations matrix. An approximate computation of the variance-covariance matrix of the solution is also implemented. With significantly fewer arithmetic operations and less memory usage, the conjugate gradient method takes only about 8 minutes of wall-clock time to solve for the gravity field coefficients up to degree 120 using 24 CPUs after 21 iterations, while the serial code runs roughly 3.5 hours to achieve the same results on a single processor. Both the direct inversion method and the iterative method give good estimates of the unknown geopotential coefficients. In this sense, the iterative approach is preferable for its much shorter running time, although it provides only an approximation of the estimated variance-covariance matrix. The scalability of the direct and iterative methods is also analyzed in this study. Numerical results show that the NMA method and the conjugate gradient method achieve good scalability in our simulations. While the DMA method is not as scalable as the other two for smaller problem sizes, its efficiency improves gradually with increasing problem size and processor count. The developed codes are potentially portable across different computer platforms and applicable to other generalized large geophysical inverse problems.
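    As a serial, NumPy-level illustration of the two solution paths described above, the sketch below accumulates normal equations from panels of the design matrix (echoing the panel-wise accumulation of the DMA approach) and solves them with a conjugate gradient iteration preconditioned by the block-diagonal part of the normal equations matrix. The function names, panel sizes, and block size are assumptions; the actual implementation in the report is built on PLAPACK and MPI.

```python
import numpy as np

def accumulate_normals(design_panels, obs_panels):
    """Accumulate N = A^T A and rhs = A^T y from panel-sized pieces of the
    design matrix: each panel of observations contributes one rank-k update."""
    N = rhs = None
    for Ak, yk in zip(design_panels, obs_panels):
        if N is None:
            p = Ak.shape[1]
            N, rhs = np.zeros((p, p)), np.zeros(p)
        N += Ak.T @ Ak
        rhs += Ak.T @ yk
    return N, rhs

def pcg_block_diag(N, rhs, block=64, tol=1e-10, maxit=100):
    """Conjugate gradients on N x = rhs, preconditioned by the inverse of the
    block-diagonal part of N."""
    p = N.shape[0]
    # precompute the inverse diagonal blocks of the preconditioner
    blocks = [np.linalg.inv(N[k:k + block, k:k + block]) for k in range(0, p, block)]

    def apply_minv(r):
        z = np.empty_like(r)
        for idx, k in enumerate(range(0, p, block)):
            z[k:k + block] = blocks[idx] @ r[k:k + block]
        return z

    x = np.zeros(p)
    r = rhs - N @ x
    z = apply_minv(r)
    d = z.copy()
    rz = r @ z
    for it in range(maxit):
        Nd = N @ d
        alpha = rz / (d @ Nd)
        x += alpha * d
        r -= alpha * Nd
        if np.linalg.norm(r) <= tol * np.linalg.norm(rhs):
            break
        z = apply_minv(r)
        rz_new = r @ z
        d = z + (rz_new / rz) * d
        rz = rz_new
    return x, it + 1
```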
