Parallel memetic algorithms for independent job scheduling in computational grids
In this chapter we present parallel implementations of Memetic Algorithms (MAs) for the problem of scheduling independent jobs in computational grids. Scheduling in computational grids is known to be computationally demanding. In this work we exploit the intrinsic parallel nature of MAs, as well as the fact that computational grids offer a large amount of resources, part of which can be used to compute an efficient allocation of jobs to grid resources.
The parallel models exploited in this work for MAs include both fine-grained and coarse-grained parallelization and their hybridization. The resulting schedulers have been tested on different grid scenarios generated by a grid simulator to match different possible configurations of computational grids in terms of size (number of jobs and resources) and the computational characteristics of resources. All in all, the results of this work show that parallel MAs are very good alternatives for meeting different performance requirements on fast scheduling of jobs to grid resources.
Dynamic load balancing for the distributed mining of molecular structures
In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of
methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the
past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially
render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to
discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no
reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic
partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated
load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer
Institute’s HIV-screening data set, where we were able to show close-to-linear speedup in a network of workstations. The proposed
approach also allows for dynamic resource aggregation in a non-dedicated computational environment. These features make it suitable
for large-scale, multi-domain, heterogeneous environments, such as computational grids.
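The receiver-initiated scheme described above can be illustrated with a small simulation: an idle worker (the receiver) asks a busy peer to donate half of its pending search-tree nodes. This is a hedged sketch, not the paper's algorithm; `explore` is a hypothetical stand-in for expanding a subgraph candidate, and the round-robin loop only mimics concurrent workers.

```python
import random
from collections import deque

def explore(node):
    """Hypothetical node expansion: a (depth, width) node spawns
    `width` children of smaller depth, mimicking an irregular tree."""
    depth, width = node
    return [] if depth == 0 else [(depth - 1, width)] * width

def receiver_initiated_search(roots, workers=4):
    """Sketch of receiver-initiated load balancing: whenever a worker's
    queue is empty, it requests work from a randomly chosen busy peer,
    which donates half of its pending nodes. Returns nodes processed."""
    queues = [deque() for _ in range(workers)]
    for i, r in enumerate(roots):
        queues[i % workers].append(r)
    processed = 0
    while any(queues):
        for w in range(workers):
            if not queues[w]:
                # receiver-initiated step: the idle worker asks for work
                donors = [d for d in range(workers) if len(queues[d]) > 1]
                if donors:
                    d = random.choice(donors)
                    for _ in range(len(queues[d]) // 2):
                        queues[w].append(queues[d].pop())
            if queues[w]:
                processed += 1
                queues[w].extend(explore(queues[w].popleft()))
    return processed
```

Because donation is triggered by the receiver rather than by the donor, no workload prediction is needed, which matches the highly irregular search trees the abstract describes.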
Distributed-Memory Breadth-First Search on Massive Graphs
This chapter studies the problem of traversing large graphs using the
breadth-first search order on distributed-memory supercomputers. We consider
both the traditional level-synchronous top-down algorithm as well as the
recently discovered direction optimizing algorithm. We analyze the performance
and scalability trade-offs in using different local data structures such as CSR
and DCSC, enabling in-node multithreading, and graph decompositions such as 1D
and 2D decomposition.
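The direction-optimizing idea mentioned above combines the classical top-down sweep with a bottom-up sweep in which unvisited vertices search for a parent in the frontier. The following single-node sketch shows the switching logic only; the threshold `alpha` and the dense unvisited scan are illustrative simplifications, not the chapter's distributed-memory implementation.

```python
def direction_optimizing_bfs(adj, source, alpha=2):
    """Sketch of direction-optimizing BFS: run top-down while the
    frontier is small, switch to bottom-up once the frontier is large
    relative to the unvisited set. `adj` is an adjacency list of an
    undirected graph; `alpha` is a hypothetical switching threshold."""
    n = len(adj)
    dist = [-1] * n
    dist[source] = 0
    frontier, level = {source}, 0
    while frontier:
        nxt = set()
        unvisited = [v for v in range(n) if dist[v] == -1]
        if len(frontier) * alpha < len(unvisited):
            # top-down: expand edges out of the frontier
            for u in frontier:
                for v in adj[u]:
                    if dist[v] == -1:
                        dist[v] = level + 1
                        nxt.add(v)
        else:
            # bottom-up: each unvisited vertex looks for a frontier parent
            for v in unvisited:
                if any(u in frontier for u in adj[v]):
                    dist[v] = level + 1
                    nxt.add(v)
        frontier, level = nxt, level + 1
    return dist
```

The bottom-up phase pays off on low-diameter graphs where the frontier briefly contains most vertices, since each unvisited vertex can stop scanning as soon as one frontier neighbor is found.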
A Multi-Code Analysis Toolkit for Astrophysical Simulation Data
The analysis of complex multiphysics astrophysical simulations presents a
unique and rapidly growing set of challenges: reproducibility, parallelization,
and vast increases in data size and complexity chief among them. In order to
meet these challenges, and in order to open up new avenues for collaboration
between users of multiple simulation platforms, we present yt (available at
http://yt.enzotools.org/), an open source, community-developed astrophysical
analysis and visualization toolkit. Analysis and visualization with yt are
oriented around physically relevant quantities rather than quantities native to
astrophysical simulation codes. While originally designed for handling Enzo's
structure adaptive mesh refinement (AMR) data, yt has been extended to work
with several different simulation methods and simulation codes including Orion,
RAMSES, and FLASH. We report on its methods for reading, handling, and
visualizing data, including projections, multivariate volume rendering,
multi-dimensional histograms, halo finding, light cone generation and
topologically-connected isocontour identification. Furthermore, we discuss the
underlying algorithms yt uses for processing and visualizing data, and its
mechanisms for parallelization of analysis tasks.
Distributed mining of molecular fragments
In real-world applications, sequential algorithms of
data mining and data exploration are often unsuitable for
datasets with enormous size, high-dimensionality and complex
data structure. Grid computing promises unprecedented
opportunities for unlimited computing and storage resources. In this context, there is a need to develop
high-performance distributed data mining algorithms.
However, the computational complexity of the problem and
the large amount of data to be explored often make the design of large-scale applications particularly challenging. In this paper we present the first distributed formulation of a frequent subgraph mining algorithm for discriminative fragments of molecular compounds. Two distributed approaches have been developed and compared on the well-known National Cancer Institute’s HIV-screening dataset. We present experimental results on a small-scale computing environment.
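The discriminative-fragment criterion behind this work can be stated compactly: keep fragments that are frequent among active compounds but rare among inactive ones. The sketch below is an illustrative filter under assumed thresholds; the function name, the occurrence-map representation, and the default support values are hypothetical, not taken from the paper.

```python
def discriminative_fragments(fragment_occurrences, actives, inactives,
                             min_active_support=0.5, max_inactive_support=0.1):
    """Keep fragments frequent in the active set and rare in the
    inactive set. `fragment_occurrences` maps a fragment id to the set
    of compound ids containing it; thresholds are illustrative."""
    selected = []
    for frag, compounds in fragment_occurrences.items():
        support_active = len(compounds & actives) / len(actives)
        support_inactive = len(compounds & inactives) / len(inactives)
        if (support_active >= min_active_support
                and support_inactive <= max_inactive_support):
            selected.append(frag)
    return selected
```

In a distributed formulation, the expensive part is enumerating candidate fragments and counting their occurrences across partitions of the compound database; the final filter above is cheap by comparison.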
Accelerating scientific codes by performance and accuracy modeling
Scientific software is often driven by multiple parameters that affect both
accuracy and performance. Since finding the optimal configuration of these
parameters is a highly complex task, it is extremely common that the software is
used suboptimally. In a typical scenario, accuracy requirements are imposed,
and attained through suboptimal performance. In this paper, we present a
methodology for the automatic selection of parameters for simulation codes, and
a corresponding prototype tool. To be amenable to our methodology, the target
code must expose the parameters affecting accuracy and performance, and there
must be formulas available for error bounds and computational complexity of the
underlying methods. As a case study, we consider the particle-particle
particle-mesh method (PPPM) from the LAMMPS suite for molecular dynamics, and
use our tool to identify configurations of the input parameters that achieve a
given accuracy in the shortest execution time. When compared with the
configurations suggested by expert users, the parameters selected by our tool
yield reductions in the time-to-solution ranging between 10% and 60%. In other
words, for the typical scenario where a fixed number of core-hours are granted
and simulations of a fixed number of timesteps are to be run, usage of our tool
may allow up to twice as many simulations. While we develop our ideas using
LAMMPS as computational framework and use the PPPM method for dispersion as
case study, the methodology is general and valid for a range of software tools
and methods.
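The selection methodology described above reduces to a constrained optimization: among all candidate configurations, discard those whose analytic error bound exceeds the accuracy target, then pick the one with the smallest predicted cost. The sketch below illustrates this under entirely hypothetical error and cost formulas (a grid spacing `h` and a cutoff `r`); the real methodology relies on the bounds and complexity formulas exposed by the target code, e.g. those of the PPPM method.

```python
from itertools import product

def select_parameters(configs, error_bound, cost_model, accuracy_target):
    """Keep configurations whose error bound meets the target, then
    return the one with the smallest predicted cost."""
    feasible = [c for c in configs if error_bound(c) <= accuracy_target]
    if not feasible:
        raise ValueError("no configuration meets the accuracy target")
    return min(feasible, key=cost_model)

# Illustrative stand-ins for a grid spacing h and a cutoff radius r:
configs = [{"h": h, "r": r} for h, r in product([0.5, 0.25, 0.125], [2, 4, 8])]
err = lambda c: c["h"] ** 2 + 1.0 / c["r"] ** 2    # hypothetical error bound
cost = lambda c: (1.0 / c["h"]) ** 2 + c["r"] ** 3  # hypothetical cost model
best = select_parameters(configs, err, cost, accuracy_target=0.1)
```

Note the trade-off the search resolves: a finer grid (smaller `h`) and a larger cutoff both reduce the error but inflate the cost in different terms, so the cheapest feasible point is not obvious by inspection.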
Lattice QCD Thermodynamics on the Grid
We describe how we simultaneously used nodes of the
EGEE Grid, accumulating ca. 300 CPU-years in 2-3 months, to determine an
important property of Quantum Chromodynamics. We explain how Grid resources
were exploited efficiently and with ease, using a user-level overlay based on
the Ganga and DIANE tools on top of the standard Grid software stack. Application-specific
scheduling and resource selection based on simple but powerful heuristics
allowed us to improve the efficiency of the processing and to obtain the desired scientific
results by a specified deadline. This is also a demonstration of the combined use
of supercomputers, to calculate the initial state of the QCD system, and Grids,
to perform the subsequent massively distributed simulations. The QCD simulation
was performed on a lattice. Keeping the strange quark mass at
its physical value, we reduced the masses of the up and down quarks until,
under an increase of temperature, the system underwent a second-order phase
transition to a quark-gluon plasma. Then we measured the response of this
system to an increase in the quark density. We find that the transition is
smoothed rather than sharpened. If confirmed on a finer lattice, this finding
makes it unlikely that ongoing experimental searches will find a QCD critical
point at small chemical potential.
Recent Advances in Graph Partitioning
We survey recent trends in practical algorithms for balanced graph
partitioning, together with applications and future research directions.