Search CORE

84 research outputs found

Design and Implementation of MapReduce using the PGAS Programming Model with UPC

Author: Doallo Ramón
López Taboada Guillermo
Teijeiro Barjas Carlos
Touriño Juan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 03/01/2012
Field of study

This is a post-peer-review, pre-copyedit version of an article published in International Conference on Parallel and Distributed Systems. Proceedings. The final authenticated version is available online at: http://dx.doi.org/10.1109/ICPADS.2011.162[Abstract] MapReduce is a powerful tool for processing large data sets used by many applications running in distributed environments. However, despite the increasing number of computationally intensive problems that require low-latency communications, the adoption of MapReduce in High Performance Computing (HPC) is still emerging. Here languages based on the Partitioned Global Address Space (PGAS) programming model have shown to be a good choice for implementing parallel applications, in order to take advantage of the increasing number of cores per node and the programmability benefits achieved by their global memory view, such as the transparent access to remote data. This paper presents the first PGAS-based MapReduce implementation that uses the Unified Parallel C (UPC) language, which (1) obtains programmability benefits in parallel programming, (2) offers advanced configuration options to define a customized load distribution for different codes, and (3) overcomes performance penalties and bottlenecks that have traditionally prevented the deployment of MapReduce applications in HPC. The performance evaluation of representative applications on shared and distributed memory environments assesses the scalability of the presented MapReduce framework, confirming its suitability.Ministerio de Ciencia e Innovación; TIN2010-1673

Repositorio da Universidade da Coruña

Crossref

High-Performance Computational and Information Technologies for Numerical Models and Data Processing

Author: Aidarov Kanat
Akhmed-Zaki Darkhan
Aubakirov Sanzhar
Daribayev Beimbet
Imankulov Timur
Lebedev Danil
Mansurova Madina
Shomanov Aday
Turar Olzhas
Publication venue: 'IntechOpen'
Publication date: 30/05/2018
Field of study

This chapter discusses high-performance computational and information technologies for numerical models and data processing. In the first part of the chapter, the numerical model of the oil displacement problem was considered by injection of chemical reagents to increase oil recovery of reservoir. Moreover the fragmented algorithm was developed for solving this problem and the algorithm for high-performance visualization of calculated data. Analysis and comparison of parallel algorithms based on the fragmented approach and using MPI technologies are also presented. The algorithm for solving given problem on mobile platforms and analysis of computational results is given too. In the second part of the chapter, the problem of unstructured and semi-structured data processing was considered. It was decided to address the task of n-gram extraction which requires a lot of computing with large amount of textual data. In order to deal with such complexity, there was a need to adopt and implement parallelization patterns. The second part of the chapter also describes parallel implementation of the document clustering algorithm that used a heuristic genetic algorithm. Finally, a novel UPC implementation of MapReduce framework for semi-structured data processing was introduced which allows to express data parallel applications using simple sequential code

IntechOpen

Crossref

Extended collectives library for unified parallel C

Author: Teijeiro Barjas Carlos
Publication venue
Publication date: 01/01/2013
Field of study

[Abstract] Current multicore processors mitigate single-core processor problems (e.g., power, memory and instruction-level parallelism walls), but they have raised the programmability wall. In this scenario, the use of a suitable parallel programming model is key to facilitate a paradigm shift from sequential application development while maximizing the productivity of code developers. At this point, the PGAS (Partitioned Global Address Space) paradigm represents a relevant research advance for its application to multicore systems, as its memory model, with a shared memory view while providing private memory for taking advantage of data locality, mimics the memory structure provided by these architectures. Unified Parallel C (UPC), a PGAS-based extension of ANSI C, has been grabbing the attention of developers for the last years. Nevertheless, the focus on improving performance of current UPC compilers/ runtimes has been relegating the goal of providing higher programmability, where the available constructs have not always guaranteed good performance. Therefore, this Thesis focuses on making original contributions to the state of the art of UPC programmability by means of two main tasks: (1) presenting an analytical and empirical study of the features of the language, and (2) providing new functionalities that favor programmability, while not hampering performance. Thus, the main contribution of this Thesis is the development of a library of extended collective functions, which complements and improves the existing UPC standard library with programmable constructs based on efficient algorithms. A UPC MapReduce framework (UPC-MR) has also been implemented to support this highly scalable computing model for UPC applications. Finally, the analysis and development of relevant kernels and applications (e.g., a large parallel particle simulation based on Brownian dynamics) confirm the usability of these libraries, concluding that UPC can provide high performance and scalability, especially for environments with a large number of threads at a competitive development cost

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

The Parallelism Motifs of Genomic Data Analysis

Author: Awan Muaaz
Azad Ariful
Brock Benjamin
Buluc Aydin
Egan Rob
Ekanayake Saliya
Ellis Marquita
Georganas Evangelos
Guidi Giulia
Hofmeyr Steven
Oliker Leonid
Selvitopi Oguz
Teodoropol Cristina
Yelick Katherine
Publication venue: 'The Royal Society'
Publication date: 20/01/2020
Field of study

Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing

arXiv.org e-Print Archive

eScholarship - University of California

A system’s approach to cache hierarchy-aware decomposition of data-parallel computations

Author: Delgado Nuno Miguel de Brito
Publication venue: Faculdade de Ciências e Tecnologia
Publication date: 01/01/2014
Field of study

Dissertação para obtenção do Grau de Mestre em Engenharia InformáticaThe architecture of nowadays’ processors is very complex, comprising several computational cores and an intricate hierarchy of cache memories. The latter, in particular, differ considerably between the many processors currently available in the market, resulting in a wide variety of configurations. Application development is typically oblivious of this complexity and diversity, taking only into consideration the number of available execution cores. This oblivion prevents such applications from fully harnessing the computing power available in these architectures. This problem has been recognized by the community, which has proposed languages and models to express and tune applications according to the underlying machine’s hierarchy. These, however, lack the desired abstraction level, forcing the programmer to have deep knowledge of computer architecture and parallel programming, in order to ensure performance portability across a wide range of architectures. Realizing these limitations, the goal of this thesis is to delegate these hierarchy-aware optimizations to the runtime system. Accordingly, the programmer’s responsibilities are confined to the definition of procedures for decomposing an application’s domain, into an arbitrary number of partitions. With this, the programmer has only to reason about the application’s data representation and manipulation. We prototyped our proposal on top of a Java parallel programming framework, and evaluated it from a performance perspective, against cache neglectful domain decompositions. The results demonstrate that our optimizations deliver significant speedups against decomposition strategies based solely on the number of execution cores, without requiring the programmer to reason about the machine’s hardware. These facts allow us to conclude that it is possible to obtain performance gains by transferring hierarchyaware optimizations concerns to the runtime system

Repositório da Universidade Nova de Lisboa

Exascale machines require new programming paradigms and runtimes

Author: Astsatryan Hrachya
Da Costa Georges
Fahringer Thomas
Grasso Ivan
Hristov Atanas
Karatza Helen D.
Lastovetsky Alexey
Marozzo Fabrizio
Petcu Dana
Rico-Gallego Juan-Antonio
Stavrinides Georgios L.
Talia Domenico
Trufio Paolo
Publication venue
Publication date: 01/01/2015
Field of study

Extreme scale parallel computing systems will have tens of thousands of optionally accelerator-equiped nodes with hundreds of cores each, as well as deep memory hierarchies and complex interconnect topologies. Such Exascale systems will provide hardware parallelism at multiple levels and will be energy constrained. Their extreme scale and the rapidly deteriorating reliablity of their hardware components means that Exascale systems will exhibit low mean-time-between-failure values. Furthermore, existing programming models already require heroic programming and optimisation efforts to achieve high efficiency on current supercomputers. Invariably, these efforts are platform-specific and non-portable. In this paper we will explore the shortcomings of existing programming models and runtime systems for large scale computing systems. We then propose and discuss important features of programming paradigms and runtime system to deal with large scale computing systems with a special focus on data-intensive applications and resilience. Finally, we also discuss code sustainability issues and propose several software metrics that are of paramount importance for code development for large scale computing systems

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

Open Access Repository

Doctor of Philosophy

Author: Tew Kevin B.
Publication venue: University of Utah
Publication date: 01/08/2013
Field of study

dissertationPlaces and distributed places bring new support for message-passing parallelism to Racket. This dissertation describes the programming model and how Racket's sequential runtime-system was modified to support places and distributed places. The freedom to design the places programming model helped make the implementation tractable; specifically, the conventional pain of adding just the right amount of locking to a big, legacy runtime system was avoided. The dissertation presents an evaluation of the places design that includes both real-world applications and standard parallel benchmarks. Distributed places are introduced as a language extension of the places design and architecture. The distributed places extension augments places with the features of remote process launch, remote place invocation, and distributed message passing. Distributed places provide a foundation for constructing higher-level distributed frameworks. Example implementations of RPC, MPI, map reduce, and nested data parallelism demonstrate the extensibility of the distributed places API

The University of Utah: J. Willard Marriott Digital Library