ReSHAPE: A Framework for Dynamic Resizing and Scheduling of Homogeneous Applications in a Parallel Environment
Applications in science and engineering often require huge computational
resources for solving problems within a reasonable time frame. Parallel
supercomputers provide the computational infrastructure for solving such
problems. A traditional application scheduler running on a parallel cluster
only supports static scheduling where the number of processors allocated to an
application remains fixed throughout the lifetime of execution of the job. Due
to the unpredictability in job arrival times and varying resource requirements,
static scheduling can result in idle system resources thereby decreasing the
overall system throughput. In this paper we present a prototype framework
called ReSHAPE, which supports dynamic resizing of parallel MPI applications
executed on distributed memory platforms. The framework includes a scheduler
that supports resizing of applications, an API to enable applications to
interact with the scheduler, and a library that makes resizing viable.
Applications executed using the ReSHAPE scheduler framework can expand to take
advantage of additional free processors or can shrink to accommodate a high
priority application, without getting suspended. In our research, we have
mainly focused on structured applications that have two-dimensional data arrays
distributed across a two-dimensional processor grid. The resize library
includes algorithms for processor selection and processor mapping. Experimental
results show that the ReSHAPE framework can improve individual job turn-around
time and overall system throughput. Comment: 15 pages, 10 figures, 5 tables. Submitted to International Conference
on Parallel Processing (ICPP'07)
Better Process Mapping and Sparse Quadratic Assignment
Communication- and topology-aware process mapping is a powerful approach to reduce communication time in parallel applications with known communication patterns on large, distributed memory systems. We address the problem as a quadratic assignment problem (QAP), and present algorithms to construct initial mappings of processes to processors as well as fast local search algorithms to further improve the mappings. By exploiting assumptions that typically hold for applications and modern supercomputer systems, such as sparse communication patterns and hierarchically organized communication systems, we arrive at significantly more powerful algorithms for these special QAPs. Our multilevel construction algorithms employ recently developed, perfectly balanced graph partitioning techniques and extensively exploit the given communication system hierarchy. We present improvements to a local search algorithm of Brandfass et al. (2013), and decrease the running time by reducing the time needed to perform swaps in the assignment as well as by carefully constraining local search neighborhoods. Experiments indicate that our algorithms not only dramatically speed up local search, but, due to the multilevel approach, also find much better solutions in practice.
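To make the QAP formulation in the abstract above concrete, here is a minimal sketch, not the authors' algorithm. It assumes a sparse communication pattern given as a dictionary of process pairs to volumes and a hierarchical distance matrix between processing elements. The names `mapping_cost`, `swap_local_search`, `comm`, `dist`, and `start` are hypothetical and chosen only for illustration. The sketch recomputes the full objective after every swap, which is exactly the naive cost that faster swap evaluation and constrained neighborhoods aim to avoid.

```python
from itertools import combinations


def mapping_cost(comm, dist, assign):
    """QAP objective: sum of (communication volume * distance) over
    all communicating process pairs. assign[p] is the processing
    element that process p is mapped to."""
    return sum(vol * dist[assign[u]][assign[v]] for (u, v), vol in comm.items())


def swap_local_search(comm, dist, assign):
    """Naive pairwise-swap local search (illustrative assumption):
    try every swap, keep it only if the objective strictly improves,
    and repeat until a full pass finds no improving swap."""
    assign = list(assign)
    best = mapping_cost(comm, dist, assign)
    improved = True
    while improved:
        improved = False
        for a, b in combinations(range(len(assign)), 2):
            assign[a], assign[b] = assign[b], assign[a]
            cost = mapping_cost(comm, dist, assign)
            if cost < best:
                best = cost
                improved = True
            else:
                # Undo the swap: it did not reduce the objective.
                assign[a], assign[b] = assign[b], assign[a]
    return assign, best


# Hypothetical machine: 2 nodes with 2 cores each. Distance is 0 to
# itself, 1 within a node, and 10 across nodes.
dist = [[0 if i == j else (1 if i // 2 == j // 2 else 10) for j in range(4)]
        for i in range(4)]

# Sparse communication pattern: only three process pairs communicate.
comm = {(0, 1): 10, (2, 3): 10, (0, 2): 1}

# Poor initial mapping: heavy communicators are placed on different nodes.
start = [0, 2, 1, 3]
start_cost = mapping_cost(comm, dist, start)
final, best = swap_local_search(comm, dist, start)
```

In this toy instance the search moves each heavily communicating pair onto a single node. A multilevel construction as described in the abstract would aim to produce a good starting mapping before any local search runs.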
A Review on Software Architectures for Heterogeneous Platforms
The increasing demands for computing performance have been a reality
regardless of the requirements for smaller and more energy efficient devices.
Throughout the years, the strategy adopted by industry was to increase the
robustness of a single processor by increasing its clock frequency and mounting
more transistors so more calculations could be executed. However, it is known
that the physical limits of such processors are being reached, and one way to
fulfill such increasing computing demands has been to adopt a strategy based on
heterogeneous computing, i.e., using a heterogeneous platform containing more
than one type of processor. This way, different types of tasks can be executed
by processors that are specialized in them. Heterogeneous computing, however,
poses a number of challenges to software engineering, especially in the
architecture and deployment phases. In this paper, we conduct an empirical
study that aims at discovering the state-of-the-art in software architecture
for heterogeneous computing, with focus on deployment. We conduct a systematic
mapping study that retrieved 28 studies, which were critically assessed to
obtain an overview of the research field. We identified gaps and trends that
can be used by both researchers and practitioners as guides to further
investigate the topic.
Scalable data abstractions for distributed parallel computations
The ability to express a program as a hierarchical composition of parts is an
essential tool in managing the complexity of software, and a key abstraction
it provides is the separation of data representation from computation.
Many current parallel programming models use a shared memory model to provide
data abstraction but this doesn't scale well with large numbers of cores due to
non-determinism and access latency. This paper proposes a simple programming
model that allows scalable parallel programs to be expressed with distributed
representations of data and it provides the programmer with the flexibility to
employ shared or distributed styles of data-parallelism where applicable. It is
capable of an efficient implementation, and with the provision of a small set
of primitive capabilities in the hardware, it can be compiled to operate
directly on the hardware, in the same way stack-based allocation operates for
subroutines in sequential machines.
Design of object processing systems
Object processing systems are encountered often in everyday life, in industry, tourism, commerce, etc. When designing such a system, many problems can be posed and considered, depending on the scope and purpose of the design. We give here a general approach which involves graph theory, and which can have many applications. The generation of possible designs for an object processing system, known as synthesis in the engineering field, is reduced to first solving a graph embedding problem. We believe that our model could be implemented successfully and relatively easily in a software tool, called Smart Synthesis Tool, so that the engineering design process proceeds more quickly. We propose three types of graph transformations that aid the design of an object processing system. Future work will show to what extent these transformation types suffice for generating most of the layouts of object processing systems.
Montage: a grid portal and software toolkit for science-grade astronomical image mosaicking
Montage is a portable software toolkit for constructing custom, science-grade
mosaics by composing multiple astronomical images. The mosaics constructed by
Montage preserve the astrometry (position) and photometry (intensity) of the
sources in the input images. The mosaic to be constructed is specified by the
user in terms of a set of parameters, including dataset and wavelength to be
used, location and size on the sky, coordinate system and projection, and
spatial sampling rate. Many astronomical datasets are massive, and are stored
in distributed archives that are, in most cases, remote with respect to the
available computational resources. Montage can be run on both single- and
multi-processor computers, including clusters and grids. Standard grid tools
are used to run Montage in the case where the data or computers used to
construct a mosaic are located remotely on the Internet. This paper describes
the architecture, algorithms, and usage of Montage as both a software toolkit
and as a grid portal. Timing results are provided to show how Montage
performance scales with number of processors on a cluster computer. In
addition, we compare the performance of two methods of running Montage in
parallel on a grid. Comment: 16 pages, 11 figures