18 research outputs found
SPar: A DSL for High-Level and Productive Stream Parallelism
This paper introduces SPar, an internal C++ Domain-Specific Language (DSL) that supports the development of classic stream parallel applications. The DSL uses standard C++ attributes to introduce annotations tagging the notable components of stream parallel applications: stream sources and stream processing stages. A set of tools processes SPar code (C++ code annotated with the SPar attributes) to generate FastFlow C++ code that exploits the stream parallelism denoted by the SPar annotations while targeting shared-memory multi-core architectures. We outline the main SPar features along with the main implementation techniques and tools. We also show the results of experiments assessing the feasibility of the entire approach as well as SPar's performance and expressiveness.
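The annotation idea described above can be sketched on a small sequential loop. This is only an illustration: the attribute names (`spar::ToStream`, `spar::Stage`, `spar::Input`, `spar::Output`, `spar::Replicate`) follow the SPar publications rather than this abstract, and the function `to_upper_stream` and its data are hypothetical. A standard C++17 compiler ignores the unknown attributes (usually with a warning) and runs the code sequentially; only the SPar tools would turn it into FastFlow code.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical example: upper-cases every string of an input "stream".
// The spar:: attributes are assumptions taken from the SPar literature;
// a standard compiler ignores them, so this runs as plain sequential C++.
std::vector<std::string> to_upper_stream(const std::vector<std::string>& input) {
    std::vector<std::string> output;
    std::size_t i = 0;

    // Stream source: each loop iteration produces one stream item.
    [[spar::ToStream, spar::Input(i, input), spar::Output(output)]]
    while (i < input.size()) {
        std::string item = input[i++];

        // Stateless processing stage; the annotation asks for 4 replicas.
        [[spar::Stage, spar::Input(item), spar::Output(item), spar::Replicate(4)]]
        {
            for (char& c : item) {
                c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
            }
        }

        // Final stage: collects the results.
        [[spar::Stage, spar::Input(item, output)]]
        {
            output.push_back(item);
        }
    }
    assert(output.size() == input.size());
    return output;
}
```

Calling `to_upper_stream({"ab", "cd"})` with an ordinary compiler returns the two strings upper-cased; the annotations change nothing in the sequential semantics, which is the point of expressing the parallelism through attributes.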
Raising the Parallel Abstraction Level for Streaming Analytics Applications
In the stream processing domain, applications are represented by graphs of operators arbitrarily connected and filled with their business logic code. The APIs of existing Stream Processing Systems (SPSs) ease the development of transformations that recur in streaming practice (e.g., filtering, aggregation, and joins). In contrast, their parallelism abstractions are quite limited, since they support only stateless operators or operators whose state is organized as a set of key-value pairs. This paper presents how the parallel patterns methodology can be revisited for sliding-window streaming analytics. Our vision fosters a design process in which the application is built by composing and nesting ready-to-use patterns provided through a C++17 fluent interface. Our prototype implements the run-time system of the patterns in the FastFlow parallel library, expressing thread-based parallelism. The experimental analysis shows three outcomes. First, our pattern-based approach allows easy prototyping of different versions of the application, and the programmer can leverage nesting of patterns to increase performance (up to 37% in one of the two considered test-bed cases). Second, our FastFlow implementation is three times faster than the handmade porting of our patterns to popular JVM-based SPSs. Finally, in the concluding part of this paper, we explore the use of a task-based run-time system, deriving insights into how to make our patterns library suitable for multiple backends.
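To make "sliding-window" concrete, here is a minimal sequential sketch of a count-based sliding-window aggregation. It is not the paper's pattern library or its fluent interface; the function name `sliding_window_sums` and the parameters `length` and `slide` are hypothetical. A window covers `length` consecutive items and a new window starts every `slide` items.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <numeric>
#include <vector>

// Hypothetical helper: returns the sum of every complete count-based window.
// Window k covers the items at positions [k * slide, k * slide + length).
std::vector<long> sliding_window_sums(const std::vector<int>& stream,
                                      std::size_t length, std::size_t slide) {
    assert(length > 0 && slide > 0);
    std::vector<long> results;
    std::deque<int> window;  // holds at most the last `length` items

    for (std::size_t i = 0; i < stream.size(); ++i) {
        window.push_back(stream[i]);
        if (window.size() > length) {
            window.pop_front();  // evict the item that left the window
        }
        // A window closes when item i is its last element.
        const std::size_t seen = i + 1;
        if (seen >= length && (seen - length) % slide == 0) {
            results.push_back(std::accumulate(window.begin(), window.end(), 0L));
        }
    }
    return results;
}
```

The window buffer is exactly the kind of operator state that rules out naive replication: consecutive windows overlap, so a parallel version has to decide how windows or window fragments are assigned to workers.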
DCDB Wintermute: Enabling Online and Holistic Operational Data Analytics on HPC Systems
As we approach the exascale era, the size and complexity of HPC systems
continue to increase, raising concerns about their manageability and
sustainability. For this reason, more and more HPC centers are experimenting
with fine-grained monitoring coupled with Operational Data Analytics (ODA) to
optimize efficiency and effectiveness of system operations. However, while
monitoring is a common reality in HPC, there is no well-stated and
comprehensive list of requirements, nor matching frameworks, to support
holistic and online ODA. This leads to insular ad-hoc solutions, each
addressing only specific aspects of the problem.
In this paper we propose Wintermute, a novel generic framework to enable
online ODA on large-scale HPC installations. Its design is based on the results
of a literature survey of common operational requirements. We implement
Wintermute on top of the holistic DCDB monitoring system, offering a large
variety of configuration options to accommodate the varying requirements of ODA
applications. Moreover, Wintermute is based on a set of logical abstractions to
ease the configuration of models at a large scale and maximize code re-use. We
highlight Wintermute's flexibility through a series of practical case studies,
each targeting a different aspect of the management of HPC systems, and then
demonstrate the small resource footprint of our implementation.
Comment: Accepted for publication at the 29th ACM International Symposium on
High-Performance Parallel and Distributed Computing (HPDC 2020)
Proposal of a domain-specific language for parallel programming oriented to parallel patterns: a case study based on the master/slave pattern for multi-core architectures
Previous issue date: 2012-03-19
This work proposes a Domain-Specific Language for Parallel Patterns Oriented Parallel Programming (LED-PPOPP). Its main purpose is to reduce the effort needed to develop parallel programs by guiding developers through patterns that are implemented by the language interface, while avoiding large performance losses in the applications. Patterns are specialized, previously studied solutions used to solve a frequent problem. Parallel patterns thus offer a higher abstraction level for organizing algorithms when exploiting parallelism, and they can be easily learned by inexperienced programmers and software engineers. As a starting point, this work carried out a case study based on the Master/Slave pattern, focusing on the parallelization of algorithms for multi-core architectures. The implementation was validated through experiments that evaluated the programming effort of writing code in LED-PPOPP and the performance achieved by the automatically generated parallel code. The results show a significant reduction in parallel programming effort compared with using the Pthreads library. Additionally, the final performance of the parallelized algorithms confirms that parallelization with LED-PPOPP does not incur significant losses relative to parallelization using OpenMP in almost all of the experiments carried out.
Stream Parallelism Annotations for Multi-Core Frameworks
Data generation, collection, and processing are an important workload of modern computer architectures. Stream or high-intensity data flow applications are commonly employed in extracting and interpreting the information contained in this data. Due to the computational complexity of these applications, high performance ought to be achieved using parallel computing. However, the efficient exploitation of the parallel resources available in the architecture remains a challenging task for programmers. Techniques and methodologies are required to help shift the effort from the complexity of parallelism exploitation to specific algorithmic solutions. To tackle this problem, we propose a methodology that provides the developer with a suitable abstraction layer between a clean and effective parallel programming interface and different multi-core parallel programming frameworks. We use standard C++ code annotations that the programmer may insert in the source code. A compiler then parses the annotated C++ code and generates calls to the desired parallel runtime API. Our experiments demonstrate the feasibility of our methodology and the performance of the abstraction layer: in four applications, the difference with respect to state-of-the-art C++ parallel programming frameworks is negligible. Additionally, our methodology allows improving application performance, since developers can choose the runtime that performs best on their system.
Domain-specific language & support tools for high-level stream parallelism
Previous issue date: 2016-03-30
Stream-based systems are representative of several application domains, including video, audio, networking, and graphics processing. Stream programs may run on different kinds of parallel architectures (desktops, servers, cell phones, and supercomputers) and represent significant workloads on our current computing systems. Nevertheless, most of them are still not parallelized. Moreover, when new software has to be developed, programmers often face a trade-off between coding productivity, code portability, and performance. To solve this problem, we provide a new Domain-Specific Language (DSL) that naturally (on the fly) captures and represents parallelism for stream-based applications. The aim is to offer a set of attributes (through annotations) for annotating parallelism that preserves the program's source code and is not architecture-dependent. We used the C++ attribute mechanism to design a "de facto" standard C++ embedded DSL named SPar. However, implementing DSLs using compiler-based tools is difficult and complicated, and it usually involves a significant learning curve. This is even harder for those who are not familiar with compiler technology. Therefore, our motivation is to simplify this path for other researchers (experts in their domain) with support tools (our tool is CINCLE) to create high-level and productive DSLs through powerful and aggressive source-to-source transformations. In fact, parallel programmers can use their expertise without having to design and implement low-level code. The main goal of this thesis was to create a DSL and support tools for high-level stream parallelism in the context of a programming framework that is compiler-based and domain-oriented. Thus, we implemented SPar using CINCLE. SPar supports the software developer with productivity, performance, and code portability, while CINCLE provides sufficient support to generate new DSLs. Also, SPar targets source-to-source transformation, producing parallel pattern code built on top of FastFlow and MPI. Finally, we provide a full set of experiments showing that SPar provides better coding productivity without significant performance degradation in multi-core systems, as well as transformation rules that are able to achieve code portability (for cluster architectures) through its generalized attributes.
Assessing Application Efficiency and Performance Portability in Single-Source Programming for Heterogeneous Parallel Systems
We analyze the performance portability of the skeleton-based, single-source, multi-backend high-level programming framework SkePU across multiple different CPU-GPU heterogeneous systems. Thereby, we provide a systematic application efficiency characterization of SkePU-generated code in comparison to equivalent hand-written code in lower-level parallel programming models such as OpenMP and CUDA. For this purpose, we contribute ports of the STREAM benchmark suite and of a part of the NAS Parallel Benchmark suite to SkePU. We show that for STREAM and the EP benchmark, SkePU regularly scores efficiency values above 80%, and that on CPU systems in particular, SkePU can outperform hand-written code.
Funding agencies: ELLIIT; project GPAI; SNIC 2021/22-971
GMAP/SPBench: SPBench version v0.5-alpha
What's Changed
- 0.4.1 by @adrianomg in https://github.com/GMAP/SPBench/pull/2
- V0.4.3 alpha by @adrianomg in https://github.com/GMAP/SPBench/pull/4
- SPBench 0.4.4-alpha by @adrianomg in https://github.com/GMAP/SPBench/pull/5
- Adds OpenMP and Threads by @renatobhf in https://github.com/GMAP/SPBench/pull/9
- New app by @adrianomg in https://github.com/GMAP/SPBench/pull/15
New Contributors
- @renatobhf made their first contribution in https://github.com/GMAP/SPBench/pull/9
Full Changelog: https://github.com/GMAP/SPBench/compare/v0.4-alpha...v0.5-alpha
Stream parallelism with ordered data constraints on multi-core systems
It is often a challenge to keep the input tasks and output results in order in parallel computations over data streams, particularly when stateless operators are replicated to increase parallelism and the tasks are irregular. Maintaining input/output order requires additional coding effort and may significantly impact the application’s actual throughput. Thus, we propose a new implementation technique designed to be easily integrated with any of the existing C++ parallel programming frameworks that support stream parallelism. In this paper, it is first implemented and studied using SPar, our high-level domain-specific language for stream parallelism. We discuss the results of a set of experiments with real-world applications, revealing that significant performance improvements may be achieved when our proposed solution is integrated within SPar, especially for data compression applications. We also show the results of experiments performed after integrating our solution within FastFlow and TBB, revealing no significant overheads.
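The ordering problem can be illustrated with a generic reorder buffer at the collector side. This is a common textbook approach and not necessarily the technique the paper proposes; the class `ReorderBuffer` and its methods are hypothetical names. Each item carries the sequence number it had in the input stream, and results that arrive early are held back until all their predecessors have been released.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical collector that restores input order for results produced by
// replicated workers. Single-threaded sketch: a concurrent version would
// need to synchronize access to put().
class ReorderBuffer {
public:
    // Accepts one result tagged with its position in the input stream and
    // returns the results that can now be released in order (possibly none).
    std::vector<int> put(std::size_t seq, int value) {
        assert(seq >= next_ && pending_.count(seq) == 0);
        pending_.emplace(seq, value);

        std::vector<int> released;
        // Release the longest run of consecutive results starting at next_.
        for (auto it = pending_.find(next_); it != pending_.end();
             it = pending_.find(next_)) {
            released.push_back(it->second);
            pending_.erase(it);
            ++next_;
        }
        return released;
    }

    // Number of results currently held back because a predecessor is missing.
    std::size_t buffered() const { return pending_.size(); }

private:
    std::map<std::size_t, int> pending_;  // early results, keyed by sequence number
    std::size_t next_ = 0;                // sequence number expected next
};
```

If results 1 and 2 arrive before result 0, both calls return nothing and the buffer grows; once result 0 arrives, all three are released together. The held-back items show why irregular tasks hurt throughput: one slow task delays every result behind it.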