Design and Evaluation of a Collective IO Model for Loosely Coupled Petascale Programming
Loosely coupled programming is a powerful paradigm for rapidly creating
higher-level applications from scientific programs on petascale systems,
typically using scripting languages. This paradigm is a form of many-task
computing (MTC) which focuses on the passing of data between programs as
ordinary files rather than messages. While it has the significant benefits of
decoupling producer and consumer and allowing existing application programs to
be executed in parallel with no recoding, its typical implementation using
shared file systems places a high performance burden on the overall system and
on the user who will analyze and consume the downstream data. Previous efforts
have achieved great speedups with loosely coupled programs, but have done so
with careful manual tuning of all shared file system access. In this work, we
evaluate a prototype collective IO model for file-based MTC. The model enables
efficient and easy distribution of input data files to computing nodes and
gathering of output results from them. It eliminates the need for such manual
tuning and makes the programming of large-scale clusters using a loosely
coupled model easier. Our approach, inspired by in-memory approaches to
collective operations for parallel programming, builds on fast local file
systems to provide high-speed local file caches for parallel scripts, uses a
broadcast approach to handle distribution of common input data, and uses
efficient scatter/gather and caching techniques for input and output. We
describe the design of the prototype model and its implementation on the Blue
Gene/P supercomputer, and present preliminary measurements of its performance
on synthetic benchmarks and on a large-scale molecular dynamics application.
Comment: IEEE Many-Task Computing on Grids and Supercomputers (MTAGS08) 200
Data Mining and Machine Learning in Astronomy
We review the current state of data mining and machine learning in astronomy.
'Data Mining' can have a somewhat mixed connotation from the point of view of a
researcher in this field. If used correctly, it can be a powerful approach,
holding the potential to fully exploit the exponentially increasing amount of
available data, promising great scientific advance. However, if misused, it can
be little more than the black-box application of complex computing algorithms
that may give little physical insight, and provide questionable results. Here,
we give an overview of the entire data mining process, from data collection
through to the interpretation of results. We cover common machine learning
algorithms, such as artificial neural networks and support vector machines,
applications from a broad range of astronomy, emphasizing those where data
mining techniques directly resulted in improved science, and important current
and future directions, including probability density functions, parallel
algorithms, petascale computing, and the time domain. We conclude that, so long
as one carefully selects an appropriate algorithm, and is guided by the
astronomical problem at hand, data mining can be very much the powerful tool,
and not the questionable black box.
Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra
figures, some minor additions to the tex
3rd EGEE User Forum
We have organized this book as a sequence of chapters, each associated with an application or technical theme and introduced by an overview of its contents and a summary of the main conclusions from the Forum on that topic. The first chapter gathers all the plenary session keynote addresses, followed by a sequence of chapters covering the application-oriented sessions. These are followed by chapters focused on Computer Science and Grid Technology. The final chapter covers the large number of practical demonstrations and posters exhibited at the Forum. Much of the work presented has a direct link to specific areas of Science, and so we have created a Science Index, presented below. In addition, at the end of this book, we provide a complete list of the institutes and countries involved in the User Forum.