Open-architecture Implementation of Fragment Molecular Orbital Method for Peta-scale Computing
We present our perspective and goals on high-performance computing for
nanoscience in accordance with the global trend toward "peta-scale computing."
After reviewing our results obtained through the grid-enabled version of the
fragment molecular orbital method (FMO) on the grid testbed by the Japanese
Grid Project, National Research Grid Initiative (NAREGI), we show that FMO is
one of the best candidates for peta-scale applications by predicting its
effective performance in peta-scale computers. Finally, we introduce our new
project constructing a peta-scale application in an open-architecture
implementation of FMO in order to realize both goals of high performance in
peta-scale computers and extendibility to multiphysics simulations.
Comment: 6 pages, 9 figures, proceedings of the 2nd IEEE/ACM international
workshop on high performance computing for nano-science and technology
(HPCNano06)
Adaptive structured parallelism for computational grids
Algorithmic skeletons abstract commonly-used patterns of parallel computation, communication, and interaction. They provide top-down design composition and control inheritance throughout the whole structure. Parallel programs are expressed by interweaving parameterised skeletons analogously to the way sequential structured programs are constructed.
This design paradigm, known as structured parallelism, provides a high-level parallel programming method which allows the abstract description of programs and fosters portability. That is to say, structured parallelism requires the description of the algorithm rather than its implementation, providing a clear and consistent meaning across platforms while their associated structure depends on the particular implementation. By decoupling the structure from the meaning of a parallel program, it benefits entirely from any performance improvements in the systems infrastructure.
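To illustrate the idea of composing parameterised skeletons described in this abstract, here is a minimal Python sketch. The names `farm` and `pipeline` and the thread-pool back end are assumptions made for illustration; they are not taken from the paper or from any particular skeleton library.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def farm(worker, workers=4):
    """Task-farm skeleton (hypothetical name): apply `worker` to each input independently."""
    def run(inputs):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # pool.map returns results in the same order as the inputs.
            return list(pool.map(worker, inputs))
    return run


def pipeline(*stages):
    """Pipeline skeleton (hypothetical name): feed each stage's output into the next stage."""
    def run(inputs):
        return reduce(lambda data, stage: stage(data), stages, inputs)
    return run


# Skeletons nest like structured sequential code: a pipeline of two farms.
square_then_increment = pipeline(
    farm(lambda x: x * x),
    farm(lambda x: x + 1),
)
result = square_then_increment([1, 2, 3, 4])
print(result)
```

The caller describes only the algorithm (which worker runs in which pattern). How `farm` distributes the work could be swapped for a process pool or a grid scheduler without touching the program, which is the decoupling of structure from meaning that the abstract refers to.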
A Taxonomy of Workflow Management Systems for Grid Computing
With the advent of Grid and application technologies, scientists and
engineers are building more and more complex applications to manage and process
large data sets, and execute scientific experiments on distributed resources.
Such application scenarios require means for composing and executing complex
workflows. Therefore, many efforts have been made towards the development of
workflow management systems for Grid computing. In this paper, we propose a
taxonomy that characterizes and classifies various approaches for building and
executing workflows on Grids. We also survey several representative Grid
workflow systems developed by various projects world-wide to demonstrate the
comprehensiveness of the taxonomy. The taxonomy not only highlights the design
and engineering similarities and differences of state-of-the-art Grid
workflow systems, but also identifies the areas that need further research.
Comment: 29 pages, 15 figures
A compiler approach to scalable concurrent program design
The programmer's most powerful tool for controlling complexity in program design is abstraction. We seek to use abstraction in the design of concurrent programs, so as to
separate design decisions concerned with decomposition, communication, synchronization, mapping, granularity, and load balancing. This paper describes programming and compiler techniques intended to facilitate this design strategy. The programming techniques are based on a core programming notation with two important properties: the ability to separate concurrent programming concerns, and extensibility with reusable programmer-defined
abstractions. The compiler techniques are based on a simple transformation system together with a set of compilation transformations and portable run-time support. The
transformation system allows programmer-defined abstractions to be defined as source-to-source transformations that convert abstractions into the core notation. The same
transformation system is used to apply compilation transformations that incrementally transform the core notation toward an abstract concurrent machine. This machine can be implemented on a variety of concurrent architectures using simple run-time support.
The transformation, compilation, and run-time system techniques have been implemented and are incorporated in a public-domain program development toolkit. This
toolkit operates on a wide variety of networked workstations, multicomputers, and shared-memory
multiprocessors. It includes a program transformer, concurrent compiler, syntax checker, debugger, performance analyzer, and execution animator. A variety of substantial
applications have been developed using the toolkit, in areas such as climate modeling and fluid dynamics.
Supporting simulation in industry through the application of grid computing
An increased need for collaborative research, together with continuing advances in communication technology and computer hardware, has facilitated the development of distributed systems that can provide users access to geographically dispersed computing resources that are administered in multiple computer domains. The term grid computing, or grids, is popularly used to refer to such distributed systems. Simulation is characterized by the need to run multiple sets of computationally intensive experiments. Large-scale scientific simulations have traditionally been the primary beneficiary of grid computing. The application of this technology to simulation in industry has, however, been negligible. This research investigates how grid technology can be effectively exploited by users to model simulations in industry. It introduces our desktop grid, WinGrid, and presents a case study conducted at a leading European investment bank. Results indicate that grid computing does indeed hold promise for simulation in industry.
Parallel detrended fluctuation analysis for fast event detection on massive PMU data
(c) 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
Phasor measurement units (PMUs) are being rapidly deployed in power grids due to their high sampling rates and synchronized measurements. The devices' high data reporting rates present major computational challenges in the requirement to process potentially massive volumes of data, in addition to new issues surrounding data storage. Fast algorithms capable of processing massive volumes of data are now required in the field of power systems. This paper presents a novel parallel detrended fluctuation analysis (PDFA) approach for fast event detection on massive volumes of PMU data, taking advantage of a cluster computing platform. The PDFA algorithm is evaluated using data from installed PMUs on the transmission system of Great Britain from the aspects of speedup, scalability, and accuracy. The speedup of the PDFA in computation is initially analyzed through Amdahl's Law. A revision to the law is then proposed, suggesting enhancements to its capability to analyze the performance gain in computation when parallelizing data intensive applications in a cluster computing environment.
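For reference, the classic form of Amdahl's Law mentioned in this abstract can be written as a short Python function. This is only the textbook formula; the revised law proposed in the paper is not stated in the abstract and is not reproduced here. The function name and the example values are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Classic Amdahl's Law: speedup = 1 / ((1 - p) + p / n).

    parallel_fraction (p): share of the runtime that can be parallelized, in [0, 1].
    workers (n): number of processors or cluster nodes, at least 1.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Illustrative values only (not measurements from the paper):
# a job that is 90% parallelizable, run on 10 workers.
print(round(amdahl_speedup(0.9, 10), 2))
```

With a serial share of 10%, the speedup can never exceed 10 no matter how many workers are added, which is why the serial part dominates scalability analyses of this kind.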
Efficient Parallelization of Short-Range Molecular Dynamics Simulations on Many-Core Systems
This article introduces a highly parallel algorithm for molecular dynamics
simulations with short-range forces on single node multi- and many-core
systems. The algorithm is designed to achieve high parallel speedups for
strongly inhomogeneous systems like nanodevices or nanostructured materials. In
the proposed scheme the calculation of the forces and the generation of
neighbor lists is divided into small tasks. The tasks are then executed by a
thread pool according to a dependent task schedule. This schedule is
constructed in such a way that a particle is never accessed by two threads at
the same time. Benchmark simulations on a typical 12 core machine show that the
described algorithm achieves excellent parallel efficiencies above 80% for
different kinds of systems and all numbers of cores. For inhomogeneous systems
the speedups are strongly superior to those obtained with spatial
decomposition. Further benchmarks were performed on an Intel Xeon Phi
coprocessor. These simulations demonstrate that the algorithm scales well to
large numbers of cores.
Comment: 12 pages, 8 figures
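The constraint that a particle is never accessed by two threads at the same time can be illustrated with a small scheduling sketch. The paper builds a dependent task schedule; the version below swaps in a simpler greedy grouping into rounds, which is an assumption made for brevity and not the authors' algorithm. The task names and cell numbering are hypothetical.

```python
def schedule_rounds(tasks):
    """Greedily group tasks into rounds whose tasks touch disjoint cells.

    tasks: dict mapping a task name to the set of cell ids it reads or writes.
    Returns a list of rounds; every task inside one round may run concurrently
    because no two of them share a cell (and hence no particle).
    """
    rounds = []  # each entry is (task_names, cells_in_use)
    for name, cells in tasks.items():
        for names, used in rounds:
            if used.isdisjoint(cells):
                names.append(name)
                used |= cells  # in-place update of the round's cell set
                break
        else:
            # No existing round is conflict-free for this task: open a new one.
            rounds.append(([name], set(cells)))
    return [names for names, _ in rounds]


# Hypothetical 1D chain of four cells; each force task touches two neighbors.
force_tasks = {
    "f01": {0, 1},
    "f12": {1, 2},
    "f23": {2, 3},
}
rounds = schedule_rounds(force_tasks)
print(rounds)
```

A thread pool could execute each round in parallel and synchronize between rounds. A dependency-driven schedule, as described in the abstract, avoids those global synchronization points by letting a task start as soon as the tasks it conflicts with have finished.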