38,064 research outputs found
A Taxonomy of Workflow Management Systems for Grid Computing
With the advent of Grid and application technologies, scientists and
engineers are building more and more complex applications to manage and process
large data sets, and execute scientific experiments on distributed resources.
Such application scenarios require means for composing and executing complex
workflows. Therefore, many efforts have been made towards the development of
workflow management systems for Grid computing. In this paper, we propose a
taxonomy that characterizes and classifies various approaches for building and
executing workflows on Grids. We also survey several representative Grid
workflow systems developed by various projects world-wide to demonstrate the
comprehensiveness of the taxonomy. The taxonomy not only highlights the design
and engineering similarities and differences of state-of-the-art in Grid
workflow systems, but also identifies the areas that need further research.Comment: 29 pages, 15 figure
Early Accurate Results for Advanced Analytics on MapReduce
Approximate results based on samples often provide the only way in which
advanced analytical applications on very massive data sets can satisfy their
time and resource constraints. Unfortunately, methods and tools for the
computation of accurate early results are currently not supported in
MapReduce-oriented systems although these are intended for `big data'.
Therefore, we proposed and implemented a non-parametric extension of Hadoop
which allows the incremental computation of early results for arbitrary
work-flows, along with reliable on-line estimates of the degree of accuracy
achieved so far in the computation. These estimates are based on a technique
called bootstrapping that has been widely employed in statistics and can be
applied to arbitrary functions and data distributions. In this paper, we
describe our Early Accurate Result Library (EARL) for Hadoop that was designed
to minimize the changes required to the MapReduce framework. Various tests of
EARL of Hadoop are presented to characterize the frequent situations where EARL
can provide major speed-ups over the current version of Hadoop.Comment: VLDB201
Developing Real-Time Emergency Management Applications: Methodology for a Novel Programming Model Approach
The last years have been characterized by the arising of highly distributed computing
platforms composed of a heterogeneity of computing and communication resources including
centralized high-performance computing architectures (e.g. clusters or large shared-memory
machines), as well as multi-/many-core components also integrated into mobile nodes
and network facilities. The emerging of computational paradigms such as Grid and Cloud
Computing, provides potential solutions to integrate such platforms with data systems, natural
phenomena simulations, knowledge discovery and decision support systems responding to a
dynamic demand of remote computing and communication resources and services.
In this context time-critical applications, notably emergency management systems, are
composed of complex sets of application components specialized for executing specific
computations, which are able to cooperate in such a way as to perform a global goal in a
distributed manner. Since the last years the scientific community has been involved in facing
with the programming issues of distributed systems, aimed at the definition of applications
featuring an increasing complexity in the number of distributed components, in the spatial
distribution and cooperation between interested parties and in their degree of heterogeneity.
Over the last decade the research trend in distributed computing has been focused on
a crucial objective. The wide-ranging composition of distributed platforms in terms of
different classes of computing nodes and network technologies, the strong diffusion of
applications that require real-time elaborations and online compute-intensive processing as
in the case of emergency management systems, lead to a pronounced tendency of systems
towards properties like self-managing, self-organization, self-controlling and strictly speaking
adaptivity.
Adaptivity implies the development, deployment, execution and management of applications
that, in general, are dynamic in nature. Dynamicity concerns the number and the specific
identification of cooperating components, the deployment and composition of the most
suitable versions of software components on processing and networking resources and
services, i.e., both the quantity and the quality of the application components to achieve
the needed Quality of Service (QoS). In time-critical applications the QoS specification
can dynamically vary during the execution, according to the user intentions and the
Developing Real-Time Emergency
Management Applications: Methodology for
a Novel Programming Model Approach
Gabriele Mencagli and Marco Vanneschi
Department of Computer Science, University of Pisa, L. Bruno Pontecorvo, Pisa
Italy
2
2 Will-be-set-by-IN-TECH
information produced by sensors and services, as well as according to the monitored state
and performance of networks and nodes.
The general reference point for this kind of systems is the Grid paradigm which, by
definition, aims to enable the access, selection and aggregation of a variety of distributed and
heterogeneous resources and services. However, though notable advancements have been
achieved in recent years, current Grid technology is not yet able to supply the needed software
tools with the features of high adaptivity, ubiquity, proactivity, self-organization, scalability
and performance, interoperability, as well as fault tolerance and security, of the emerging
applications.
For this reason in this chapter we will study a methodology for designing high-performance
computations able to exploit the heterogeneity and dynamicity of distributed environments
by expressing adaptivity and QoS-awareness directly at the application level. An effective
approach needs to address issues like QoS predictability of different application configurations
as well as the predictability of reconfiguration costs. Moreover adaptation strategies need to
be developed assuring properties like the stability degree of a reconfiguration decision and the
execution optimality (i.e. select reconfigurations accounting proper trade-offs among different
QoS objectives). In this chapter we will present the basic points of a novel approach that lays
the foundations for future programming model environments for time-critical applications
such as emergency management systems.
The organization of this chapter is the following. In Section 2 we will compare the existing
research works for developing adaptive systems in critical environments, highlighting their
drawbacks and inefficiencies. In Section 3, in order to clarify the application scenarios that
we are considering, we will present an emergency management system in which the run-time
selection of proper application configuration parameters is of great importance for meeting the
desired QoS constraints. In Section 4we will describe the basic points of our approach in terms
of how compute-intensive operations can be programmed, how they can be dynamically
modified and how adaptation strategies can be expressed. In Section 5 our approach will
be contextualize to the definition of an adaptive parallel module, which is a building block
for composing complex and distributed adaptive computations. Finally in Section 6 we will
describe a set of experimental results that show the viability of our approach and in Section 7
we will give the concluding remarks of this chapter
Replica Creation Algorithm for Data Grids
Data grid system is a data management infrastructure that facilitates reliable access and sharing of large amount of data, storage resources, and data transfer services that can be scaled across distributed locations. This thesis presents a new replication algorithm that improves data access performance in data grids by distributing relevant data copies around the grid. The new Data Replica Creation Algorithm (DRCM) improves performance of data grid systems by reducing job execution time and making the best use of data grid resources (network bandwidth and storage space). Current algorithms focus on number of accesses in deciding which file to replicate and where to place them, which ignores resources’ capabilities. DRCM differs by considering both user and resource perspectives; strategically placing replicas at locations that provide the lowest transfer cost. The proposed algorithm uses three strategies: Replica Creation and Deletion Strategy (RCDS), Replica Placement Strategy (RPS), and Replica Replacement Strategy (RRS). DRCM was evaluated using network simulation (OptorSim) based on selected performance metrics (mean job execution time, efficient network usage, average storage usage, and computing element usage), scenarios, and topologies. Results revealed better job execution time with lower resource consumption than existing approaches. This research contributes replication strategies embodied in one algorithm that enhances data grid performance, capable of making a decision on creating or deleting more than one file during same decision. Furthermore, dependency-level-between-files criterion was utilized and integrated with the exponential growth/decay model to give an accurate file evaluation
- …