41,154 research outputs found
Components and Interfaces of a Process Management System for Parallel Programs
Parallel jobs are different from sequential jobs and require a different type
of process management. We present here a process management system for parallel
programs such as those written using MPI. A primary goal of the system, which
we call MPD (for multipurpose daemon), is to be scalable. By this we mean that
startup of interactive parallel jobs comprising thousands of processes is
quick, that signals can be quickly delivered to processes, and that stdin,
stdout, and stderr are managed intuitively. Our primary target is parallel
machines made up of clusters of SMPs, but the system is also useful in more
tightly integrated environments. We describe how MPD enables much faster
startup and better runtime management of parallel jobs. We show how close
control of stdio can support the easy implementation of a number of convenient
system utilities, even a parallel debugger. We describe a simple but general
interface that can be used to separate any process manager from a parallel
library, which we use to keep MPD separate from MPICH.Comment: 12 pages, Workshop on Clusters and Computational Grids for Scientific
Computing, Sept. 24-27, 2000, Le Chateau de Faverges de la Tour, Franc
RELEASE: A High-level Paradigm for Reliable Large-scale Server Software
Erlang is a functional language with a much-emulated model for building reliable distributed systems. This paper outlines the RELEASE project, and describes the progress in the rst six months. The project aim is to scale the Erlang's radical concurrency-oriented programming paradigm to build reliable general-purpose software, such as server-based systems, on massively parallel machines. Currently Erlang has inherently scalable computation and reliability models, but in practice scalability is constrained by aspects of the language and virtual machine. We are working at three levels to address these challenges: evolving the Erlang virtual machine so that it can work effectively on large scale multicore systems; evolving the language to Scalable Distributed (SD) Erlang; developing a scalable Erlang infrastructure to integrate multiple, heterogeneous clusters. We are also developing state of the art tools that allow programmers to understand the behaviour of massively parallel SD Erlang programs. We will demonstrate the e ectiveness of the RELEASE approach using demonstrators and two large case studies on a Blue Gene
RELEASE: A High-level Paradigm for Reliable Large-scale Server Software
Erlang is a functional language with a much-emulated model for building reliable distributed systems. This paper outlines the RELEASE project, and describes the progress in the first six months. The project aim is to scale the Erlang’s radical concurrency-oriented programming paradigm to build reliable general-purpose software, such as server-based systems, on massively parallel machines. Currently Erlang has inherently scalable computation and reliability models, but in practice scalability is constrained by aspects of the language and virtual machine. We are working at three levels to address these challenges: evolving the Erlang virtual machine so that it can work effectively on large scale multicore systems; evolving the language to Scalable Distributed (SD) Erlang; developing a scalable Erlang infrastructure to integrate multiple, heterogeneous clusters. We are also developing state of the art tools that allow programmers to understand the behaviour of massively parallel SD Erlang programs. We will demonstrate the effectiveness of the RELEASE approach using demonstrators and two large case studies on a Blue Gene
21st Century Simulation: Exploiting High Performance Computing and Data Analysis
This paper identifies, defines, and analyzes the limitations imposed on Modeling and Simulation by outmoded
paradigms in computer utilization and data analysis. The authors then discuss two emerging capabilities to
overcome these limitations: High Performance Parallel Computing and Advanced Data Analysis. First, parallel
computing, in supercomputers and Linux clusters, has proven effective by providing users an advantage in
computing power. This has been characterized as a ten-year lead over the use of single-processor computers.
Second, advanced data analysis techniques are both necessitated and enabled by this leap in computing power.
JFCOM's JESPP project is one of the few simulation initiatives to effectively embrace these concepts. The
challenges facing the defense analyst today have grown to include the need to consider operations among non-combatant
populations, to focus on impacts to civilian infrastructure, to differentiate combatants from non-combatants,
and to understand non-linear, asymmetric warfare. These requirements stretch both current
computational techniques and data analysis methodologies. In this paper, documented examples and potential
solutions will be advanced. The authors discuss the paths to successful implementation based on their experience.
Reviewed technologies include parallel computing, cluster computing, grid computing, data logging, OpsResearch,
database advances, data mining, evolutionary computing, genetic algorithms, and Monte Carlo sensitivity analyses.
The modeling and simulation community has significant potential to provide more opportunities for training and
analysis. Simulations must include increasingly sophisticated environments, better emulations of foes, and more
realistic civilian populations. Overcoming the implementation challenges will produce dramatically better insights,
for trainees and analysts. High Performance Parallel Computing and Advanced Data Analysis promise increased
understanding of future vulnerabilities to help avoid unneeded mission failures and unacceptable personnel losses.
The authors set forth road maps for rapid prototyping and adoption of advanced capabilities. They discuss the
beneficial impact of embracing these technologies, as well as risk mitigation required to ensure success
Don't Repeat Yourself: Seamless Execution and Analysis of Extensive Network Experiments
This paper presents MACI, the first bespoke framework for the management, the
scalable execution, and the interactive analysis of a large number of network
experiments. Driven by the desire to avoid repetitive implementation of just a
few scripts for the execution and analysis of experiments, MACI emerged as a
generic framework for network experiments that significantly increases
efficiency and ensures reproducibility. To this end, MACI incorporates and
integrates established simulators and analysis tools to foster rapid but
systematic network experiments.
We found MACI indispensable in all phases of the research and development
process of various communication systems, such as i) an extensive DASH video
streaming study, ii) the systematic development and improvement of Multipath
TCP schedulers, and iii) research on a distributed topology graph pattern
matching algorithm. With this work, we make MACI publicly available to the
research community to advance efficient and reproducible network experiments
- …