The Maximum Common Subgraph Problem: A Parallel and Multi-Engine Approach
The maximum common subgraph of two graphs is the largest possible common subgraph,
i.e., the common subgraph with as many vertices as possible. Although this problem is very challenging,
having long been proven NP-hard, its countless practical applications still motivate the search
for exact solutions. This work discusses how to extend an existing, very effective
branch-and-bound procedure on parallel multi-core and many-core architectures. We analyze
a parallel multi-core implementation that exploits a divide-and-conquer approach based on a
thread pool, which preserves the original algorithmic efficiency and minimizes
data structure repetitions. We also extend the original algorithm to parallel many-core GPU
architectures adopting the CUDA programming framework, and we show how to handle the heavy
workload imbalance and the massive data dependencies. Then, we suggest new heuristics to reorder
the adjacency matrix, to deal with “dead-ends”, and to randomize the search with automatic restarts.
These heuristics can achieve significant speed-ups on specific instances, even if they may not be competitive with the original strategy on average. Finally, we propose a portfolio approach, which integrates all the different local search algorithms as component tools; rather than choosing the best tool for a given instance up-front, the portfolio makes this decision on-line. The proposed approach drastically limits memory-bandwidth constraints and avoids other typical portfolio fragilities, since the CPU and GPU versions often show complementary efficiency and run on separate platforms. Experimental results support these claims and motivate further research into better exploiting GPUs in embedded task-intensive and multi-engine parallel applications.
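The divide-and-conquer task-splitting idea described above can be sketched on a toy problem. The code below applies it to a subset-sum branch-and-bound rather than the authors' maximum-common-subgraph solver, and fans the top-level branches out as independent tasks, each with private search state; all names and the problem choice are illustrative.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Sequential branch and bound: maximize the sum of a subset of `w`
// without exceeding `cap`. `rem` is the sum of items w[i..end), used
// as an upper bound to prune hopeless branches.
long long bb(const std::vector<long long>& w, std::size_t i, long long cur,
             long long rem, long long cap, long long best) {
    if (cur > best) best = cur;                           // new incumbent
    if (i == w.size() || cur + rem <= best) return best;  // prune / done
    rem -= w[i];
    if (cur + w[i] <= cap)                          // branch 1: take item i
        best = bb(w, i + 1, cur + w[i], rem, cap, best);
    return bb(w, i + 1, cur, rem, cap, best);       // branch 2: skip item i
}

// Parallel driver: the two top-level branches run as independent tasks,
// each on private search state (no shared mutable data), mirroring the
// thread-pool divide-and-conquer scheme with minimal data duplication.
long long parallel_bb(const std::vector<long long>& w, long long cap) {
    if (w.empty()) return 0;
    long long rem = 0;
    for (long long x : w) rem += x;
    auto take = std::async(std::launch::async, [&] {
        return w[0] <= cap ? bb(w, 1, w[0], rem - w[0], cap, 0) : 0LL;
    });
    auto skip = std::async(std::launch::async, [&] {
        return bb(w, 1, 0, rem - w[0], cap, 0);
    });
    return std::max(take.get(), skip.get());
}
```

A real solver would split more levels of the search tree into tasks and share the incumbent bound atomically; the sketch keeps the two branches fully independent for clarity.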
PaPaS: A Portable, Lightweight, and Generic Framework for Parallel Parameter Studies
The current landscape of scientific research is widely based on modeling and
simulation, typically with complex execution flows and parameterization
properties. Execution flows are not necessarily straightforward, since they may
require multiple processing tasks and iterations.
Furthermore, parameter and performance studies are common approaches used to
characterize a simulation, often requiring traversal of a large parameter
space. High-performance computers offer practical resources at the expense of
users handling the setup, submission, and management of jobs. This work
presents the design of PaPaS, a portable, lightweight, and generic workflow
framework for conducting parallel parameter and performance studies. Workflows
are defined using parameter files with a keyword-value pair syntax, thus
relieving the user of the overhead of creating complex scripts to manage the
workflow. A parameter set consists of any combination of environment variables,
files, partial file contents, and command line arguments. PaPaS is being
developed in Python 3 with support for distributed parallelization using SSH,
batch systems, and C++ MPI. The PaPaS framework will run as user processes, and
can be used in single/multi-node and multi-tenant computing systems. An example
simulation using the BehaviorSpace tool from NetLogo and a matrix multiply
using OpenMP are presented as parameter and performance studies, respectively.
The results demonstrate that the PaPaS framework offers a simple method for
defining and managing parameter studies, while increasing resource utilization.
Comment: 8 pages, 6 figures, PEARC '18: Practice and Experience in Advanced Research Computing, July 22-26, 2018, Pittsburgh, PA, US
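As an illustration of the keyword-value style the abstract describes, a parameter file for a small sweep might look like the following; the keywords and syntax here are hypothetical, not the actual PaPaS format:

```
# Hypothetical parameter file (illustrative only, not real PaPaS syntax)
command         = ./stencil --size ${size} --iters ${iters}
size            = 512, 1024, 2048     # swept command-line argument
iters           = 100                 # fixed parameter
OMP_NUM_THREADS = 1, 2, 4, 8          # swept environment variable
```

Each combination of swept values would then be expanded into one job, so the cross-product above yields twelve runs without any user-written driver script.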
D-SPACE4Cloud: A Design Tool for Big Data Applications
Recent years have seen a steep rise in data generation worldwide, with the
development and widespread adoption of several software projects targeting the
Big Data paradigm. Many companies currently engage in Big Data analytics as
part of their core business activities; nonetheless, there are no tools or
techniques to support the design of the underlying hardware configuration
backing such systems. In particular, this report focuses on Cloud-deployed
clusters, which represent a cost-effective alternative to on-premises
installations. We propose a novel tool implementing a battery of optimization
and prediction techniques integrated so as to efficiently assess several
alternative resource configurations, in order to determine the minimum cost
cluster deployment satisfying QoS constraints. Further, the experimental
campaign conducted on real systems shows the validity and relevance of the
proposed method.
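The core search the abstract describes, assessing alternative resource configurations and keeping the cheapest one that satisfies the QoS deadline, could be sketched as below. The linear cost model and all names here are invented for illustration; the actual tool combines far richer optimization and prediction techniques.

```cpp
#include <limits>
#include <string>
#include <vector>

struct VmType { std::string name; double hourly_cost; double speed; };
struct Choice { std::string vm; int count; double cost; };

// Enumerate (VM type, node count) configurations, predict the run time with
// a toy work/throughput model, discard configurations that violate the QoS
// deadline, and keep the minimum-cost feasible deployment.
Choice cheapest_deployment(const std::vector<VmType>& types, double work,
                           double deadline_h, int max_nodes) {
    Choice best{"", 0, std::numeric_limits<double>::infinity()};
    for (const auto& t : types)
        for (int n = 1; n <= max_nodes; ++n) {
            double hours = work / (n * t.speed);   // predicted duration
            if (hours > deadline_h) continue;      // QoS constraint violated
            double cost = hours * n * t.hourly_cost;
            if (cost < best.cost) best = {t.name, n, cost};
        }
    return best;
}
```

The real design space is not exhaustively enumerable like this, which is why the tool pairs the search with prediction models instead of measuring every candidate.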
RELEASE: A High-level Paradigm for Reliable Large-scale Server Software
Erlang is a functional language with a much-emulated model for building reliable distributed systems. This paper outlines the RELEASE project and describes the progress made in the first six months. The project aims to scale Erlang's radical concurrency-oriented programming paradigm to build reliable general-purpose software, such as server-based systems, on massively parallel machines. Erlang currently has inherently scalable computation and reliability models, but in practice scalability is constrained by aspects of the language and virtual machine. We are working at three levels to address these challenges: evolving the Erlang virtual machine so that it can work effectively on large-scale multicore systems; evolving the language to Scalable Distributed (SD) Erlang; and developing a scalable Erlang infrastructure to integrate multiple, heterogeneous clusters. We are also developing state-of-the-art tools that allow programmers to understand the behaviour of massively parallel SD Erlang programs. We will demonstrate the effectiveness of the RELEASE approach using demonstrators and two large case studies on a Blue Gene.
LIKWID: Lightweight Performance Tools
Exploiting the performance of today's microprocessors requires intimate
knowledge of the microarchitecture as well as an awareness of the ever-growing
complexity in thread and cache topology. LIKWID is a set of command line
utilities that addresses four key problems: Probing the thread and cache
topology of a shared-memory node, enforcing thread-core affinity on a program,
measuring performance counter metrics, and microbenchmarking for reliable upper
performance bounds. Moreover, it includes an mpirun wrapper allowing for
portable thread-core affinity in MPI and hybrid MPI/threaded applications. To
demonstrate the capabilities of the tool set we show the influence of thread
affinity on performance using the well-known OpenMP STREAM triad benchmark, use
hardware counter tools to study the performance of a stencil code, and finally
show how to detect bandwidth problems on ccNUMA-based compute nodes.
Comment: 12 pages
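For reference, the STREAM triad kernel mentioned above is a fused scale-and-add over three arrays; a minimal OpenMP version (function and variable names illustrative) looks like this:

```cpp
#include <cstddef>
#include <vector>

// STREAM "triad" kernel: a[i] = b[i] + s * c[i].
// The OpenMP pragma distributes iterations across threads; with first-touch
// placement, each thread's pages land on its local ccNUMA domain, which is
// exactly the affinity effect the benchmark is used to expose.
std::vector<double> triad(const std::vector<double>& b,
                          const std::vector<double>& c, double s) {
    std::vector<double> a(b.size());
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < (std::ptrdiff_t)a.size(); ++i)
        a[i] = b[i] + s * c[i];
    return a;
}
```

Pinning the threads (e.g. with LIKWID's affinity tools) keeps each thread next to the memory it touched first, which is what makes the measured bandwidth reproducible.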
Using Cognitive Computing for Learning Parallel Programming: An IBM Watson Solution
While modern parallel computing systems provide high performance resources,
utilizing them to the highest extent requires advanced programming expertise.
Programming for parallel computing systems is much more difficult than
programming for sequential systems. OpenMP extends the C, C++, and Fortran
programming languages, enabling programmers to express parallelism using compiler directives. While
OpenMP alleviates parallel programming by reducing the lines of code that the
programmer needs to write, deciding how and when to use these compiler
directives is up to the programmer. Novice programmers may make mistakes that
may lead to performance degradation or unexpected program behavior. Cognitive
computing has shown impressive results in various domains, such as health or
marketing. In this paper, we describe the use of the IBM Watson cognitive
system to educate novice parallel programmers. Using the IBM Watson dialogue
service, we have developed a solution that assists the programmer in avoiding
common OpenMP mistakes. To evaluate our approach we have conducted a survey
with a number of novice parallel programmers at Linnaeus University, and
obtained encouraging results regarding the usefulness of our approach.
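One canonical mistake such an assistant might flag is accumulating into a shared variable inside a parallel loop. The sketch below shows the bug and the standard reduction fix; it is illustrative, not an example taken from the paper:

```cpp
#include <vector>

// BUG: `s` is shared by all threads, so the += updates race with each
// other and the result is nondeterministic when OpenMP is enabled.
double sum_racy(const std::vector<double>& v) {
    double s = 0.0;
    #pragma omp parallel for
    for (long i = 0; i < (long)v.size(); ++i)
        s += v[i];
    return s;
}

// FIX: the reduction clause gives each thread a private copy of `s`
// and combines the partial sums safely at the end of the loop.
double sum_fixed(const std::vector<double>& v) {
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)
    for (long i = 0; i < (long)v.size(); ++i)
        s += v[i];
    return s;
}
```

The two versions differ by a single clause, which is precisely why such mistakes are easy for novices to make and natural for a dialogue-based assistant to catch.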