5,683 research outputs found
Fault Tolerant Distributed Computing Framework for Scientific Algorithms
Arvuti riistvara füüsilised piirangud on lõpetanud protsessorite tuumade arvutusvõimsuse suurenemist, kuid arvutiarhitektuuride suurenev parallelsus säilitab Moore'i seaduse kehtivust. Samal ajal tõuseb arvutusvõimsuse nõudlus pidevalt, sundides inimesi kohandada algoritme paralleelsete arhitektuuride kasutamiseks. Üks paljudest paralleelsete arhitektuuride probleemidest on tõrkete tekkimise tõenäosuse suurenemine parallelsete komponentide arvu suurenemisega. Piinlikult paralleelsete ja andmemahukate algoritmidega seoses on MapReduce läbinud pika tee, et tagada kasutajatele suure hulga hajutatud arvutiressursside lihtsustatud kasutamine ilma töö kaotamise hirmuta. Sama ei sa öelda kommunikatsiooni intensiivsete algoritmide jaoks mis on levinud teadusarvutuse domeenis. Selles töös on pakutud uus BSP ({\it Bulk Synchronous Parallel}) inspireeritud parallelprogrammeerimise mudel, mille lähenemisviis on sarnane {\it continuation passing} programmeerimis stiiliga ja mis võimaldab rakendada BSP struktuuril baseeruvat loomulikku tõrkekindlust. Töös on kirjeldatud loodud hajusarvutuste raamistik NEWT, mis põhineb pakutud mudelil ja on kasutatud selle lähenemisviisi valideerimiseks. Raamistik säilitab enamik MapReduce eelisi ning efektiivsemalt toetab suuremat algoritmide hulka, nagu näiteks eelmainitud iteratiivsed algoritmid.The physical limitations of computing hardware have put a stop on the increase of a single processor core's computing power. However, Moore's law is still maintained through the ever increasing parallelism of the computing architectures. At the same time the demand for computational power has been unrelentingly growing, forcing people to adapt the algorithms they use to these parallel architectures. One of the many downsides to parallel architectures is that with the rise in the number of components, the chance of failure of one of these components increases. When it comes to embarrassingly parallel data-intensive algorithms, Map-Reduce has gone a long way in ensuring users can easily utilize large amounts of distributed computing resources without the fear of losing work. However, this does not apply to iterative communication-intensive algorithms common in the scientific computing domain. In this work a new BSP-inspired (Bulk Synchronous Parallel) programming model is proposed, which adopts an approach similar to continuation passing for implementing parallel algorithms and facilitates fault-tolerance inherent in the BSP program structure. The distributed computing framework NEWT, which is based on the proposed model, is described and used to validate the approach. The framework retains most of the advantages that Map-Reduce provides, yet efficiently supports a larger assortment of algorithms, such as the aforementioned iterative ones
The Parallelism Motifs of Genomic Data Analysis
Genomic data sets are growing dramatically as the cost of sequencing
continues to decline and small sequencing devices become available. Enormous
community databases store and share this data with the research community, but
some of these genomic data analysis problems require large scale computational
platforms to meet both the memory and computational requirements. These
applications differ from scientific simulations that dominate the workload on
high end parallel systems today and place different requirements on programming
support, software libraries, and parallel architectural design. For example,
they involve irregular communication patterns such as asynchronous updates to
shared data structures. We consider several problems in high performance
genomics analysis, including alignment, profiling, clustering, and assembly for
both single genomes and metagenomes. We identify some of the common
computational patterns or motifs that help inform parallelization strategies
and compare our motifs to some of the established lists, arguing that at least
two key patterns, sorting and hashing, are missing
Parallel Processing of Large Graphs
More and more large data collections are gathered worldwide in various IT
systems. Many of them possess the networked nature and need to be processed and
analysed as graph structures. Due to their size they require very often usage
of parallel paradigm for efficient computation. Three parallel techniques have
been compared in the paper: MapReduce, its map-side join extension and Bulk
Synchronous Parallel (BSP). They are implemented for two different graph
problems: calculation of single source shortest paths (SSSP) and collective
classification of graph nodes by means of relational influence propagation
(RIP). The methods and algorithms are applied to several network datasets
differing in size and structural profile, originating from three domains:
telecommunication, multimedia and microblog. The results revealed that
iterative graph processing with the BSP implementation always and
significantly, even up to 10 times outperforms MapReduce, especially for
algorithms with many iterations and sparse communication. Also MapReduce
extension based on map-side join usually noticeably presents better efficiency,
although not as much as BSP. Nevertheless, MapReduce still remains the good
alternative for enormous networks, whose data structures do not fit in local
memories.Comment: Preprint submitted to Future Generation Computer System
- …