2,134 research outputs found
A uniform framework for modelling nondeterministic, probabilistic, stochastic, or mixed processes and their behavioral equivalences
Labeled transition systems are typically used as behavioral models of concurrent processes, and the labeled transitions define the a one-step state-to-state reachability relation. This model can be made generalized by modifying the transition relation to associate a state reachability distribution, rather than a single target state, with any pair of source state and transition label. The state reachability distribution becomes a function mapping each possible target state to a value that expresses the degree of one-step reachability of that state. Values are taken from a preordered set equipped with a minimum that denotes unreachability. By selecting suitable preordered sets, the resulting model, called ULTraS from Uniform Labeled Transition System, can be specialized to capture well-known models of fully nondeterministic processes (LTS), fully
probabilistic processes (ADTMC), fully stochastic processes (ACTMC), and of nondeterministic and probabilistic (MDP) or nondeterministic and stochastic (CTMDP) processes. This uniform treatment of different behavioral models extends to behavioral equivalences. These can be defined on ULTraS by relying on appropriate measure functions that expresses the degree of reachability of a set of states when performing
single-step or multi-step computations. It is shown that the specializations of bisimulation, trace, and testing
equivalences for the different classes of ULTraS coincide with the behavioral equivalences defined in the literature over traditional models
Runtime-guided mitigation of manufacturing variability in power-constrained multi-socket NUMA nodes
This work has been supported by the Spanish Government (Severo Ochoa grants SEV2015-0493, SEV-2011-00067), by
the Spanish Ministry of Science and Innovation (contracts TIN2015-65316-P), by Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), by the RoMoL ERC Advanced Grant (GA 321253) and the European HiPEAC Network of Excellence. M. Moretó has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship number JCI-2012-15047. M. Casas is supported by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Cofund
programme of the Marie Curie Actions of the 7th R&D Framework Programme of the European Union (Contract 2013 BP B 00243). This work was also partially performed
under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 (LLNL-CONF-689878).
Finally, the authors are grateful to the reviewers for their valuable comments, to the RoMoL team, to Xavier Teruel and Kallia Chronaki from the Programming Models group
of BSC and the Computation Department of LLNL for their technical support and useful feedback.Peer ReviewedPostprint (published version
Parallel Anisotropic Unstructured Grid Adaptation
Computational Fluid Dynamics (CFD) has become critical to the design and analysis of aerospace vehicles. Parallel grid adaptation that resolves multiple scales with anisotropy is identified as one of the challenges in the CFD Vision 2030 Study to increase the capacity and capability of CFD simulation. The Study also cautions that computer architectures are undergoing a radical change and dramatic increases in algorithm concurrency will be required to exploit full performance. This paper reviews four different methods to parallel anisotropic grid generation. They cover both ends of the spectrum: (i) using existing state-of-the-art software optimized for a single core and modifying it for parallel platforms and (ii) designing and implementing scalable software with incomplete, but rapidly maturating functionality. A brief overview for each grid adaptation system is presented in the context of a telescopic approach for multilevel concurrency. These methods employ different approaches to enable parallel execution, which provides a unique opportunity to illustrate the relative behavior of each approach. Qualitative and quantitative metric evaluations are used to draw lessons for future developments in this critical area for parallel CFD simulation
GLB: Lifeline-based Global Load Balancing library in X10
We present GLB, a programming model and an associated implementation that can
handle a wide range of irregular paral- lel programming problems running over
large-scale distributed systems. GLB is applicable both to problems that are
easily load-balanced via static scheduling and to problems that are hard to
statically load balance. GLB hides the intricate syn- chronizations (e.g.,
inter-node communication, initialization and startup, load balancing,
termination and result collection) from the users. GLB internally uses a
version of the lifeline graph based work-stealing algorithm proposed by
Saraswat et al. Users of GLB are simply required to write several pieces of
sequential code that comply with the GLB interface. GLB then schedules and
orchestrates the parallel execution of the code correctly and efficiently at
scale. We have applied GLB to two representative benchmarks: Betweenness
Centrality (BC) and Unbalanced Tree Search (UTS). Among them, BC can be
statically load-balanced whereas UTS cannot. In either case, GLB scales well--
achieving nearly linear speedup on different computer architectures (Power,
Blue Gene/Q, and K) -- up to 16K cores
Revisiting Actor Programming in C++
The actor model of computation has gained significant popularity over the
last decade. Its high level of abstraction makes it appealing for concurrent
applications in parallel and distributed systems. However, designing a
real-world actor framework that subsumes full scalability, strong reliability,
and high resource efficiency requires many conceptual and algorithmic additives
to the original model.
In this paper, we report on designing and building CAF, the "C++ Actor
Framework". CAF targets at providing a concurrent and distributed native
environment for scaling up to very large, high-performance applications, and
equally well down to small constrained systems. We present the key
specifications and design concepts---in particular a message-transparent
architecture, type-safe message interfaces, and pattern matching
facilities---that make native actors a viable approach for many robust,
elastic, and highly distributed developments. We demonstrate the feasibility of
CAF in three scenarios: first for elastic, upscaling environments, second for
including heterogeneous hardware like GPGPUs, and third for distributed runtime
systems. Extensive performance evaluations indicate ideal runtime behaviour for
up to 64 cores at very low memory footprint, or in the presence of GPUs. In
these tests, CAF continuously outperforms the competing actor environments
Erlang, Charm++, SalsaLite, Scala, ActorFoundry, and even the OpenMPI.Comment: 33 page
Design and Implementation of a Distributed Middleware for Parallel Execution of Legacy Enterprise Applications
A typical enterprise uses a local area network of computers to perform its
business. During the off-working hours, the computational capacities of these
networked computers are underused or unused. In order to utilize this
computational capacity an application has to be recoded to exploit concurrency
inherent in a computation which is clearly not possible for legacy applications
without any source code. This thesis presents the design an implementation of a
distributed middleware which can automatically execute a legacy application on
multiple networked computers by parallelizing it. This middleware runs multiple
copies of the binary executable code in parallel on different hosts in the
network. It wraps up the binary executable code of the legacy application in
order to capture the kernel level data access system calls and perform them
distributively over multiple computers in a safe and conflict free manner. The
middleware also incorporates a dynamic scheduling technique to execute the
target application in minimum time by scavenging the available CPU cycles of
the hosts in the network. This dynamic scheduling also supports the CPU
availability of the hosts to change over time and properly reschedule the
replicas performing the computation to minimize the execution time. A prototype
implementation of this middleware has been developed as a proof of concept of
the design. This implementation has been evaluated with a few typical case
studies and the test results confirm that the middleware works as expected
Weak Markovian Bisimulation Congruences and Exact CTMC-Level Aggregations for Concurrent Processes
We have recently defined a weak Markovian bisimulation equivalence in an
integrated-time setting, which reduces sequences of exponentially timed
internal actions to individual exponentially timed internal actions having the
same average duration and execution probability as the corresponding sequences.
This weak Markovian bisimulation equivalence is a congruence for sequential
processes with abstraction and turns out to induce an exact CTMC-level
aggregation at steady state for all the considered processes. However, it is
not a congruence with respect to parallel composition. In this paper, we show
how to generalize the equivalence in a way that a reasonable tradeoff among
abstraction, compositionality, and exactness is achieved for concurrent
processes. We will see that, by enhancing the abstraction capability in the
presence of concurrent computations, it is possible to retrieve the congruence
property with respect to parallel composition, with the resulting CTMC-level
aggregation being exact at steady state only for a certain subset of the
considered processes.Comment: In Proceedings QAPL 2012, arXiv:1207.055
- …