Leveraging legacy codes to distributed problem solving environments: A web service approach
This paper describes techniques used to leverage high performance legacy codes as CORBA components to a distributed problem solving environment. It first briefly introduces the software architecture adopted by the environment. Then it presents a CORBA oriented wrapper generator (COWG) which can be used to automatically wrap high performance legacy codes as CORBA components. Two legacy codes have been wrapped with COWG. One is an MPI-based molecular dynamic simulation (MDS) code, the other is a finite element based computational fluid dynamics (CFD) code for simulating incompressible Navier-Stokes flows. Performance comparisons between runs of the MDS CORBA component and the original MDS legacy code on a cluster of workstations and on a parallel computer are also presented. Wrapped as CORBA components, these legacy codes can be reused in a distributed computing environment. The first case shows that high performance can be maintained with the wrapped MDS component. The second case shows that a Web user can submit a task to the wrapped CFD component through a Web page without knowing the exact implementation of the component. In this way, a user's desktop computing environment can be extended to a high performance computing environment using a cluster of workstations or a parallel computer.
A Tale of Two Data-Intensive Paradigms: Applications, Abstractions, and Architectures
Scientific problems that depend on processing large amounts of data require
overcoming challenges in multiple areas: managing large-scale data
distribution, co-placement and scheduling of data with compute resources, and
storing and transferring large volumes of data. We analyze the ecosystems of
the two prominent paradigms for data-intensive applications, hereafter referred
to as the high-performance computing and the Apache-Hadoop paradigm. We propose
a basis, common terminology and functional factors upon which to analyze the
two approaches of both paradigms. We discuss the concept of "Big Data Ogres"
and their facets as means of understanding and characterizing the most common
application workloads found across the two paradigms. We then discuss the
salient features of the two paradigms, and compare and contrast the two
approaches. Specifically, we examine common implementation/approaches of these
paradigms, shed light upon the reasons for their current "architecture" and
discuss some typical workloads that utilize them. In spite of the significant
software distinctions, we believe there is architectural similarity. We discuss
the potential integration of different implementations, across the different
levels and components. Our comparison progresses from a fully qualitative
examination of the two paradigms, to a semi-quantitative methodology. We use a
simple and broadly used Ogre (K-means clustering), characterize its performance
on a range of representative platforms, covering several implementations from
both paradigms. Our experiments provide an insight into the relative strengths
of the two paradigms. We propose that the set of Ogres will serve as a
benchmark to evaluate the two paradigms along different dimensions.
Comment: 8 pages, 2 figures
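For readers unfamiliar with the K-means Ogre mentioned in this abstract, the following is a minimal single-process sketch of the standard Lloyd iteration. It is not taken from the paper and does not reflect any of the implementations it benchmarks; the function name `kmeans` and its parameters are illustrative assumptions.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate an assignment step and an update step.

    `points` and `centroids` are sequences of equal-length numeric tuples.
    Names and signature are illustrative, not from the paper.
    """
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = []
        for old, members in zip(centroids, clusters):
            if members:
                updated.append(tuple(sum(axis) / len(members)
                                     for axis in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its centroid
        if updated == centroids:
            break  # converged: assignments can no longer change
        centroids = updated
    return centroids, clusters


points = [(0, 0), (0, 2), (10, 10), (10, 12)]
centroids, clusters = kmeans(points, [(0, 0), (10, 10)])
```

The assignment step is independent for each point, which is why the same algorithm maps naturally onto both MPI-style and MapReduce-style implementations.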
DART-MPI: An MPI-based Implementation of a PGAS Runtime System
A Partitioned Global Address Space (PGAS) approach treats a distributed
system as if the memory were shared on a global level. Given such a global view
on memory, the user may program applications very much like shared memory
systems. This greatly simplifies the tasks of developing parallel applications,
because no explicit communication has to be specified in the program for data
exchange between different computing nodes. In this paper we present DART, a
runtime environment, which implements the PGAS paradigm on large-scale
high-performance computing clusters. A specific feature of our implementation
is the use of one-sided communication of the Message Passing Interface (MPI)
version 3 (i.e. MPI-3) as the underlying communication substrate. We evaluated
the performance of the implementation with several low-level kernels in order
to determine overheads and limitations in comparison to the underlying MPI-3.
Comment: 11 pages, International Conference on Partitioned Global Address
Space Programming Models (PGAS14)
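The global view of memory described in this abstract can be illustrated with a small in-process sketch. It is not DART's API and performs no communication: it only shows how a global index can be resolved to an owning unit and a local offset under a block distribution, which is an assumed layout. `locate` and `GlobalArray` are hypothetical names; in a real PGAS runtime a remote access would be carried out by a one-sided operation such as `MPI_Get` or `MPI_Put`.

```python
def locate(global_index, total, units):
    """Map a global index to (owning unit, local offset), block distribution."""
    if not 0 <= global_index < total:
        raise IndexError(global_index)
    block = -(-total // units)  # ceiling division: elements per unit
    return divmod(global_index, block)


class GlobalArray:
    """Toy global array whose storage is split into per-unit local lists."""

    def __init__(self, total, units):
        self.total, self.units = total, units
        block = -(-total // units)
        # The last units may hold fewer elements, or none at all.
        self.local = [[0] * max(0, min(block, total - u * block))
                      for u in range(units)]

    def __getitem__(self, i):
        unit, offset = locate(i, self.total, self.units)
        return self.local[unit][offset]  # stand-in for a one-sided get

    def __setitem__(self, i, value):
        unit, offset = locate(i, self.total, self.units)
        self.local[unit][offset] = value  # stand-in for a one-sided put


ga = GlobalArray(total=10, units=4)
ga[9] = 7  # the caller never names the owning unit explicitly
```

The caller indexes the array as if it were shared memory, and the address translation is hidden behind `__getitem__` and `__setitem__`.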
Scientific Computing Meets Big Data Technology: An Astronomy Use Case
Scientific analyses commonly compose multiple single-process programs into a
dataflow. An end-to-end dataflow of single-process programs is known as a
many-task application. Typically, tools from the HPC software stack are used to
parallelize these analyses. In this work, we investigate an alternate approach
that uses Apache Spark -- a modern big data platform -- to parallelize
many-task applications. We present Kira, a flexible and distributed astronomy
image processing toolkit using Apache Spark. We then use the Kira toolkit to
implement a Source Extractor application for astronomy images, called Kira SE.
With Kira SE as the use case, we study the programming flexibility, dataflow
richness, scheduling capacity and performance of Apache Spark running on the
EC2 cloud. By exploiting data locality, Kira SE achieves a 2.5x speedup over an
equivalent C program when analyzing a 1TB dataset using 512 cores on the Amazon
EC2 cloud. Furthermore, we show that by leveraging software originally designed
for big data infrastructure, Kira SE achieves competitive performance to the C
implementation running on the NERSC Edison supercomputer. Our experience with
Kira indicates that emerging Big Data platforms such as Apache Spark are a
performant alternative for many-task scientific applications.
Going Stupid with EcoLab
In 2005, Railsback et al. proposed a very simple model (Stupid
Model) that could be implemented within a couple of hours, and later
extended to demonstrate the use of common ABM platform functionality. They
provided implementations of the model in several agent based modelling
platforms, and compared the platforms for ease of implementation of this simple
model, and performance. In this paper, I implement Railsback et al.'s Stupid
Model in the EcoLab simulation platform, a C++ based modelling platform,
demonstrating that it is a feasible platform for these sorts of models, and
compare the performance of the implementation with Repast, Mason and Swarm
versions.
Checkpointing as a Service in Heterogeneous Cloud Environments
A non-invasive, cloud-agnostic approach is demonstrated for extending
existing cloud platforms to include checkpoint-restart capability. Most cloud
platforms currently rely on each application to provide its own fault
tolerance. A uniform mechanism within the cloud itself serves two purposes: (a)
direct support for long-running jobs, which would otherwise require a custom
fault-tolerant mechanism for each application; and (b) the administrative
capability to manage an over-subscribed cloud by temporarily swapping out jobs
when higher priority jobs arrive. An advantage of this uniform approach is that
it also supports parallel and distributed computations, over both TCP and
InfiniBand, thus allowing traditional HPC applications to take advantage of an
existing cloud infrastructure. Additionally, an integrated health-monitoring
mechanism detects when long-running jobs either fail or incur exceptionally low
performance, perhaps due to resource starvation, and proactively suspends the
job. The cloud-agnostic feature is demonstrated by applying the implementation
to two very different cloud platforms: Snooze and OpenStack. The use of a
cloud-agnostic architecture also enables, for the first time, migration of
applications from one cloud platform to another.
Comment: 20 pages, 11 figures, appears in CCGrid, 201
High-Performance Cloud Computing: A View of Scientific Applications
Scientific computing often requires the availability of a massive number of
computers for performing large scale experiments. Traditionally, these needs
have been addressed by using high-performance computing solutions and installed
facilities such as clusters and supercomputers, which are difficult to set up,
maintain, and operate. Cloud computing provides scientists with a completely
new model of utilizing the computing infrastructure. Compute resources, storage
resources, as well as applications, can be dynamically provisioned (and
integrated within the existing infrastructure) on a pay per use basis. These
resources can be released when they are no longer needed. Such services are often
offered within the context of a Service Level Agreement (SLA), which ensures the
desired Quality of Service (QoS). Aneka, an enterprise Cloud computing
solution, harnesses the power of compute resources by relying on private and
public Clouds and delivers to users the desired QoS. Its flexible and service
based infrastructure supports multiple programming paradigms that allow Aneka
to address a variety of different scenarios: from finance applications to
computational science. As examples of scientific computing in the Cloud, we
present a preliminary case study on using Aneka for the classification of gene
expression data and the execution of fMRI brain imaging workflow.
Comment: 13 pages, 9 figures, conference paper
Minerva and minepy: a C engine for the MINE suite and its R, Python and MATLAB wrappers
We introduce a novel implementation in ANSI C of the MINE family of
algorithms for computing maximal information-based measures of dependence
between two variables in large datasets, with the aim of a low memory footprint
and ease of integration within bioinformatics pipelines. We provide the
libraries minerva (with the R interface) and minepy for Python, MATLAB, Octave
and C++. The C solution reduces the large memory requirement of the original
Java implementation, has good upscaling properties, and offers a native
parallelization for the R interface. Low memory requirements are demonstrated
on the MINE benchmarks as well as on large (n=1340) microarray and Illumina
GAII RNA-seq transcriptomics datasets.
Availability and Implementation: Source code and binaries are freely
available for download under GPL3 licence at http://minepy.sourceforge.net for
minepy and through the CRAN repository http://cran.r-project.org for the R
package minerva. All software is multiplatform (MS Windows, Linux and OSX).
Comment: Bioinformatics 2012, in press