757 research outputs found

    The Parallelism Motifs of Genomic Data Analysis

    Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share these data with the research community, but some genomic data analysis problems require large-scale computational platforms to meet both the memory and computational requirements. These applications differ from the scientific simulations that dominate the workload on high-end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns, or motifs, that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing.
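    The hashing motif mentioned above can be illustrated with k-mer counting, a core step in genomic profiling. The sketch below is a minimal serial version; in a distributed setting each k-mer would be hashed to an owner process and updated asynchronously. The sequence and k are illustrative:

```python
from collections import Counter

def count_kmers(sequence, k):
    """Count all length-k substrings (k-mers) of a DNA sequence.

    Hash-based k-mer counting is one instance of the hashing motif:
    each distinct k-mer is a key in a hash table whose value is
    incremented as the sequence is scanned.
    """
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

counts = count_kmers("ACGTACGTAC", 3)  # each of the 4 distinct 3-mers occurs twice
```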

    Analysis of microstrip antennas by multilevel matrix decomposition algorithm

    Integral equation (IE) methods are widely used in conjunction with Method of Moments (MoM) discretization for the numerical analysis of microstrip antennas. However, their application to large antenna arrays is difficult because the computational requirements increase rapidly with the number of unknowns N. Several techniques have been proposed to reduce the computational cost of IE-MoM. The Multilevel Matrix Decomposition Algorithm (MLMDA) has been implemented in 3D for arbitrary perfectly conducting surfaces discretized with Rao-Wilton-Glisson linear triangle basis functions. This algorithm requires an operation count proportional to N·log²N. Its performance is much better for planar or piecewise planar objects than for general 3D problems, which makes it particularly well suited to the analysis of microstrip antennas. The memory requirements are proportional to N·logN and very low. The main advantage of the MLMDA over other efficient techniques for solving integral equations is that it does not rely on specific mathematical properties of the Green's functions being used. Thus, the method can be applied to interesting configurations governed by special Green's functions, such as multilayered media. In fact, the MDA-MLMDA method can be used on top of any existing MoM code. In this paper we present its application to the analysis of large printed antenna arrays. Peer Reviewed. Postprint (published version).

    A 2D algorithm with asymmetric workload for the UPC conjugate gradient method

    This is a post-peer-review, pre-copyedit version of an article published in the Journal of Supercomputing. The final authenticated version is available online at: https://doi.org/10.1007/s11227-014-1300-0 [Abstract] This paper examines four different strategies, each with its own data distribution, for implementing the parallel conjugate gradient (CG) method, and how they impact communication and overall performance. First, typical 1D and 2D distributions of the matrix involved in CG computations are considered. Then, a new 2D version of the CG method with asymmetric workload, based on leaving some threads idle during part of the computation to reduce communication, is proposed. The four strategies are independent of sparse storage schemes and are implemented using Unified Parallel C (UPC), a Partitioned Global Address Space (PGAS) language. The strategies are evaluated on two different platforms using a set of matrices that exhibit distinct sparsity patterns, demonstrating that our asymmetric proposal outperforms the others except for one matrix on one platform. Ministerio de Economía y Competitividad; TIN2013-42148-P. Xunta de Galicia; GRC2013/055. United States Department of Energy; DEAC03-76SF0009
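    As a serial reference point for the parallel variants discussed above, the CG iteration can be sketched as follows (dense matrix, pure Python). The parallel versions in the paper distribute the matrix-vector product and the dot products across UPC threads; this sketch only shows the numerical skeleton:

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A,
    given as a row-major list of lists. Serial reference version
    of the conjugate gradient iteration."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                     # residual r = b - A x, with x = 0
    p = r[:]                     # search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new ** 0.5 < tol:
            break
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
    return x

x = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```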

    Performance Evaluation of Sparse Matrix Products in UPC

    This is a post-peer-review, pre-copyedit version of an article published in The Journal of Supercomputing. The final authenticated version is available online at: https://doi.org/10.1007/s11227-012-0796-4 [Abstract] Unified Parallel C (UPC) is a Partitioned Global Address Space (PGAS) language whose popularity has increased in recent years owing to its high programmability and reasonable performance through an efficient exploitation of data locality, especially on hierarchical architectures such as multicore clusters. However, the performance issues that arise in this language due to the irregular structure of sparse matrix operations have not yet been studied. Among them, the selection of an adequate storage format for the sparse matrices can significantly improve the efficiency of the parallel codes. This paper presents an evaluation, using UPC, of the most common sparse storage formats with different implementations of the matrix-vector and matrix-matrix products, which are key kernels in many scientific applications. Ministerio de Ciencia e Innovación; TIN2010-16735. Ministerio de Educación; AP2008-01578. Ministerio de Ciencia e Innovación; CAPAP-H3; TIN2010-12011-
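    One of the most common storage formats evaluated in such studies is Compressed Sparse Row (CSR). The sketch below shows the matrix-vector product kernel over the three CSR arrays; the sample matrix is illustrative:

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Sparse matrix-vector product y = A x with A in Compressed Sparse
    Row (CSR) format: values holds the nonzeros row by row, col_idx the
    column index of each nonzero, and row_ptr[i]:row_ptr[i+1] delimits
    the nonzeros of row i."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# A = [[5, 0, 0],
#      [0, 8, 3],
#      [0, 0, 6]]
y = csr_matvec([5.0, 8.0, 3.0, 6.0], [0, 1, 2, 2], [0, 1, 3, 4], [1.0, 2.0, 3.0])
```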

    A Flexible NTT-based multiplier for Post-Quantum Cryptography

    In this work, an NTT-based (Number Theoretic Transform) multiplier for code-based Post-Quantum Cryptography (PQC) is presented, supporting Quasi-Cyclic Low/Moderate-Density Parity-Check (QC-LDPC/MDPC) codes. The cyclic matrix product, which is the fundamental operation required in this application, is treated as a polynomial product and adapted to the specific case of the QC-MDPC codes proposed for Rounds 3 and 4 of the National Institute of Standards and Technology (NIST) competition for PQC. The multiplier is a fundamental component in both encryption and decryption, and the proposed solution leads to a flexible NTT-based multiplier that can efficiently handle all types of required products, where the vectors have a length of ≈10⁴ and can be moderately sparse. The proposed architecture is implemented in both Field Programmable Gate Array (FPGA) and Application Specific Integrated Circuit (ASIC) technologies and, when compared with the best published results, it features a 10-fold reduction in encryption time with a 3-fold increase in area. The proposed multiplier, incorporated in the encryption and decryption stages of a code-based PQC cryptosystem, leads to an improvement over the best published results of between 3 and 10 times in terms of the LC product (LUT count times latency).
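    The core idea of NTT-based polynomial multiplication can be sketched as follows. This is a generic radix-2 NTT over an illustrative NTT-friendly prime, not the paper's architecture; a QC-MDPC cyclic product would additionally reduce the result modulo x^n − 1:

```python
MOD = 998244353          # illustrative NTT-friendly prime: 119 * 2**23 + 1
ROOT = 3                 # a primitive root of MOD

def ntt(a, invert=False):
    """In-place iterative radix-2 Number Theoretic Transform."""
    n = len(a)
    j = 0
    for i in range(1, n):            # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:               # butterfly passes
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)
        for i in range(0, n, length):
            w = 1
            for k in range(i, i + length // 2):
                u = a[k]
                v = a[k + length // 2] * w % MOD
                a[k] = (u + v) % MOD
                a[k + length // 2] = (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        for i in range(n):
            a[i] = a[i] * n_inv % MOD

def poly_mul(f, g):
    """Multiply polynomials f and g (coefficient lists, low order first)
    by transforming, multiplying pointwise, and transforming back."""
    n = 1
    while n < len(f) + len(g) - 1:
        n <<= 1
    fa, fb = f + [0] * (n - len(f)), g + [0] * (n - len(g))
    ntt(fa)
    ntt(fb)
    prod = [x * y % MOD for x, y in zip(fa, fb)]
    ntt(prod, invert=True)
    return prod[:len(f) + len(g) - 1]
```

Pointwise multiplication in the transform domain turns an O(n²) polynomial product into O(n·log n), which is what makes the approach attractive for vectors of length ≈10⁴.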

    Dynamic Multiple Work Stealing Strategy for Flexible Load Balancing

    Lazy task creation is an efficient method of overcoming the overhead of the grain-size problem in parallel computing, and work stealing is an effective load-balancing strategy. In this paper, we present dynamic work stealing strategies within a lazy-task-creation technique for efficient fine-grain task scheduling. The basic idea is to control load-balancing granularity depending on the number of task parents in a stack. The dynamic-length work stealing strategy uses run-time information, namely the load of the victim, to determine the number of tasks that a thief is allowed to steal. We compare it with the bottommost-first work stealing strategy used in StackThreads/MP, and with the fixed-length work stealing strategy, where a thief requests a fixed number of tasks, as well as with other multithreaded frameworks such as Cilk and OpenMP task implementations. The experiments show that the dynamic-length strategy performs well on irregular workloads such as the UTS benchmarks, as well as on regular workloads such as Fibonacci, Strassen's matrix multiplication, FFT, and sparse LU factorization. The dynamic-length strategy works better than the fixed-length strategy because it is more flexible: it can avoid the load imbalance caused by overstealing.
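    The difference between the fixed-length and dynamic-length strategies can be sketched with a per-worker deque. This is an illustrative model, not the StackThreads/MP implementation; the half-of-queue steal size below stands in for the victim-load heuristic:

```python
from collections import deque

class WorkerQueue:
    """Per-worker task deque: the owner pushes and pops at the bottom,
    while thieves steal from the top (the oldest tasks)."""

    def __init__(self, tasks=()):
        self.tasks = deque(tasks)

    def pop_bottom(self):
        """Owner takes its own most recently pushed task."""
        return self.tasks.pop() if self.tasks else None

    def steal_fixed(self, k):
        """Fixed-length strategy: take up to k tasks from the top,
        regardless of how loaded the victim is."""
        return [self.tasks.popleft() for _ in range(min(k, len(self.tasks)))]

    def steal_dynamic(self):
        """Dynamic-length strategy (sketch): size the steal from the
        victim's current load, here half of its queue, so a lightly
        loaded victim is not overstolen."""
        return [self.tasks.popleft() for _ in range(len(self.tasks) // 2)]

victim = WorkerQueue(range(8))
stolen = victim.steal_dynamic()     # thief takes half of the 8 tasks
```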

    Method of moments enhancement technique for the analysis of Sierpinski pre-fractal antennas

    The numerical analysis of highly iterated Sierpinski microstrip patch antennas by the method of moments (MoM) involves many tiny subdomain basis functions, resulting in a very large number of unknowns. The Sierpinski pre-fractal can be defined by an iterated function system (IFS); as a consequence, the geometry has a multilevel structure with many identical subdomains. This property, together with a multilevel matrix decomposition algorithm (MLMDA) implementation in which the MLMDA blocks are equal to the IFS generating shape, is used to reduce the computational cost of the frequency analysis of a Sierpinski-based structure. Peer Reviewed
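    The IFS definition of the Sierpinski geometry can be illustrated with the classic chaos game: three contractions, each halving the plane toward one vertex of a triangle. The vertex coordinates are illustrative:

```python
import random

# Sierpinski triangle IFS: three maps, each moving a point halfway
# toward one vertex of the (illustrative) bounding triangle.
VERTICES = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.866)]

def chaos_game(n_points, seed=0):
    """Approximate the Sierpinski attractor by repeatedly applying a
    randomly chosen IFS map to the current point."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    points = []
    for _ in range(n_points):
        vx, vy = rng.choice(VERTICES)
        x, y = (x + vx) / 2.0, (y + vy) / 2.0
        points.append((x, y))
    return points

pts = chaos_game(1000)
```

Because every generated point is a convex combination of the three vertices, the cloud stays inside the triangle; the self-similarity of this construction is exactly what lets the MLMDA blocks mirror the IFS generating shape.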

    Parallel Sparse Matrix-Matrix Multiplication and Indexing: Implementation and Experiments

    Generalized sparse matrix-matrix multiplication (SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. Here we show that SpGEMM also yields efficient algorithms for general sparse-matrix indexing in distributed memory, provided that the underlying SpGEMM implementation is sufficiently flexible and scalable. We demonstrate that our parallel SpGEMM methods, which use two-dimensional block data distributions with serial hypersparse kernels, are indeed highly flexible, scalable, and memory-efficient in the general case. This algorithm is the first to yield increasing speedup on an unbounded number of processors; our experiments show scaling up to thousands of processors in a variety of test scenarios.
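    The serial kernel underlying row-wise SpGEMM can be sketched with Gustavson's algorithm, accumulating each output row in a hash table. The dict-of-dicts representation is an illustrative stand-in for the hypersparse formats used in distributed implementations:

```python
def spgemm(A, B):
    """Row-wise (Gustavson) sparse matrix-matrix product C = A * B,
    with each matrix stored as {row: {col: value}}. For every nonzero
    A[i][k], row k of B is scaled by A[i][k] and accumulated into row i
    of C through a hash-table accumulator."""
    C = {}
    for i, row in A.items():
        acc = {}
        for k, a_ik in row.items():
            for j, b_kj in B.get(k, {}).items():
                acc[j] = acc.get(j, 0) + a_ik * b_kj
        if acc:
            C[i] = acc
    return C

A = {0: {0: 1, 1: 2}, 1: {1: 3}}
B = {0: {0: 4}, 1: {0: 5, 1: 6}}
C = spgemm(A, B)   # {0: {0: 14, 1: 12}, 1: {0: 15, 1: 18}}
```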

    Architectures for Code-based Post-Quantum Cryptography

    The abstract is in the attachment.