A Bulk-Parallel Priority Queue in External Memory with STXXL

Author: GS Brodal
J Singler
JS Vitter
L Arge
MC Pinotti
N Deo
P Sanders
PJ Varman
R Dementiev
Publication date: 01/01/2015

We propose the design and an implementation of a bulk-parallel external memory priority queue to take advantage of both shared-memory parallelism and high external memory transfer speeds to parallel disks. To achieve higher performance by decoupling item insertions and extractions, we offer two parallelization interfaces: one using "bulk" sequences, the other by defining "limit" items. In the design, we discuss how to parallelize insertions using multiple heaps, and how to calculate a dynamic prediction sequence to prefetch blocks and apply parallel multiway merge for extraction. Our experimental results show that in the selected benchmarks the priority queue reaches 75% of the full parallel I/O bandwidth of rotational disks and 65% of SSDs, or the speed of sorting in external memory when bounded by computation. Comment: extended version of SEA'15 conference paper.

arXiv.org e-Print Archive

Crossref

KITopen

The Parallelism Motifs of Genomic Data Analysis

Author: Awan Muaaz
Azad Ariful
Brock Benjamin
Buluc Aydin
Egan Rob
Ekanayake Saliya
Ellis Marquita
Georganas Evangelos
Guidi Giulia
Hofmeyr Steven
Oliker Leonid
Selvitopi Oguz
Teodoropol Cristina
Yelick Katherine
Publication venue: The Royal Society
Publication date: 20/01/2020

Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing.

arXiv.org e-Print Archive

eScholarship - University of California

Actors vs Shared Memory: two models at work on Big Data application frameworks

Author: Crafa Silvia
Tronchin Luca
Publication date: 01/01/2015

This work aims at analyzing how two different concurrency models, namely the shared memory model and the actor model, can influence the development of applications that manage huge masses of data, distinctive of Big Data applications. The paper compares the two models by analyzing a couple of concrete projects based on the MapReduce and Bulk Synchronous Parallel algorithmic schemes. Both projects are doubly implemented on two concrete platforms: Akka Cluster and Managed X10. The result is both a conceptual comparison of models in the Big Data Analytics scenario, and an experimental analysis based on concrete executions on a cluster platform.

arXiv.org e-Print Archive

CiteSeerX

Archivio istituzionale della ricerca - Università di Padova

Communications

Author: Field Alexander J.
Publication venue: Scholar Commons
Publication date: 01/05/2006

The communications sector of an economy comprises a range of technologies, physical media, and institutions/rules that facilitate the storage of information through means other than a society's oral tradition and the transmission of that information over distances beyond the normal reach of human conversation. This chapter provides data on the historical evolution of a disparate range of industries and institutions contributing to the movement and storage of information in the United States over the past two centuries. These include the U.S. Postal Service, the newspaper industry, book publishing, the telegraph, wired and cellular telephone service, radio and television, and the Internet.

Scholar Commons - Santa Clara University