Proceedings of the Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015), Krakow, Poland, September 10-11, 2015
Virtual Organization Clusters: Self-Provisioned Clouds on the Grid
Virtual Organization Clusters (VOCs) provide a novel architecture for overlaying dedicated cluster systems on existing grid infrastructures. VOCs provide customized, homogeneous execution environments on a per-Virtual Organization basis, without the cost of physical cluster construction or the overhead of per-job containers. Administrative access and overlay network capabilities are granted to Virtual Organizations (VOs) that choose to implement VOC technology, while the system remains completely transparent to end users and non-participating VOs. Unlike alternative systems that require explicit leases, VOCs are autonomically self-provisioned according to configurable usage policies. As a grid computing architecture, VOCs are designed to be technology agnostic and are implementable by any combination of software and services that follows the Virtual Organization Cluster Model. As demonstrated through simulation testing and evaluation of an implemented prototype, VOCs are a viable mechanism for increasing end-user job compatibility on grid sites. On existing production grids, where jobs are frequently submitted to a small subset of sites and thus experience high queuing delays relative to average job length, the grid-wide addition of VOCs does not adversely affect mean job sojourn time. By load-balancing jobs among grid sites, VOCs can reduce the total amount of queuing on a grid to a level sufficient to counteract the performance overhead introduced by virtualization.
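The autonomic self-provisioning described above can be pictured as a small reconciliation loop: size a VO's virtual cluster from its pending job count under a configurable usage policy. The sketch below is a hypothetical illustration, not part of the VOC prototype; all names (UsagePolicy, jobs_per_node, reconcile) are invented for this example.

```python
# Hypothetical sketch of policy-driven autonomic self-provisioning.
# The VOC model is technology agnostic; every name here is illustrative.
from dataclasses import dataclass


@dataclass
class UsagePolicy:
    min_nodes: int = 0       # virtual nodes kept warm even when idle
    max_nodes: int = 64      # cap imposed by the hosting grid site
    jobs_per_node: int = 1   # target queue depth per virtual node


def target_cluster_size(pending_jobs: int, policy: UsagePolicy) -> int:
    """How many virtual nodes the VO's overlay cluster should have."""
    wanted = -(-pending_jobs // policy.jobs_per_node)  # ceiling division
    return max(policy.min_nodes, min(policy.max_nodes, wanted))


def reconcile(current_nodes: int, pending_jobs: int, policy: UsagePolicy) -> int:
    """Positive result: boot that many VMs; negative: terminate that many."""
    return target_cluster_size(pending_jobs, policy) - current_nodes
```

For instance, with 10 pending jobs, 4 running nodes and a policy of 2 jobs per node, the loop would boot one more VM; with an empty queue it would drain the cluster down to the policy's minimum.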
Developing Real-Time Emergency Management Applications: Methodology for a Novel Programming Model Approach
Recent years have been characterized by the rise of highly distributed computing
platforms composed of heterogeneous computing and communication resources, including
centralized high-performance computing architectures (e.g. clusters or large shared-memory
machines) as well as multi-/many-core components, also integrated into mobile nodes
and network facilities. The emergence of computational paradigms such as Grid and Cloud
Computing provides potential solutions for integrating such platforms with data systems, natural
phenomena simulations, knowledge discovery and decision support systems, responding to a
dynamic demand for remote computing and communication resources and services.
In this context, time-critical applications, notably emergency management systems, are
composed of complex sets of application components specialized for executing specific
computations, which cooperate to achieve a global goal in a distributed manner. For several
years the scientific community has been tackling the programming issues of distributed
systems, aiming at the definition of applications of increasing complexity in the number of
distributed components, in the spatial distribution of and cooperation between the interested
parties, and in their degree of heterogeneity.
Over the last decade the research trend in distributed computing has focused on a crucial
objective: the wide-ranging composition of distributed platforms in terms of different classes
of computing nodes and network technologies, together with the strong diffusion of
applications that require real-time elaboration and online compute-intensive processing, as
in the case of emergency management systems, has led to a pronounced tendency of systems
towards properties like self-management, self-organization, self-control and, in short,
adaptivity.
Adaptivity implies the development, deployment, execution and management of applications
that are, in general, dynamic in nature. Dynamicity concerns the number and the specific
identification of the cooperating components, as well as the deployment and composition of
the most suitable versions of software components on processing and networking resources
and services, i.e., both the quantity and the quality of the application components needed to
achieve the required Quality of Service (QoS). In time-critical applications the QoS
specification can vary dynamically during the execution, according to the user intentions and
the information produced by sensors and services, as well as according to the monitored state
and performance of networks and nodes.
The general reference point for this kind of system is the Grid paradigm which, by
definition, aims to enable the access, selection and aggregation of a variety of distributed and
heterogeneous resources and services. However, although notable advancements have been
achieved in recent years, current Grid technology is not yet able to supply software
tools with the high adaptivity, ubiquity, proactivity, self-organization, scalability and
performance, interoperability, fault tolerance and security required by emerging
applications.
For this reason, in this chapter we will study a methodology for designing high-performance
computations able to exploit the heterogeneity and dynamicity of distributed environments
by expressing adaptivity and QoS awareness directly at the application level. An effective
approach needs to address issues such as the QoS predictability of different application
configurations as well as the predictability of reconfiguration costs. Moreover, adaptation
strategies must assure properties such as the stability of a reconfiguration decision and
execution optimality (i.e., selecting reconfigurations that strike proper trade-offs among
different QoS objectives). In this chapter we will present the basic points of a novel approach
that lays the foundations for future programming model environments for time-critical
applications such as emergency management systems.
The organization of this chapter is the following. In Section 2 we will compare the existing
research works on developing adaptive systems in critical environments, highlighting their
drawbacks and inefficiencies. In Section 3, in order to clarify the application scenarios that
we are considering, we will present an emergency management system in which the run-time
selection of proper application configuration parameters is of great importance for meeting the
desired QoS constraints. In Section 4 we will describe the basic points of our approach in terms
of how compute-intensive operations can be programmed, how they can be dynamically
modified and how adaptation strategies can be expressed. In Section 5 our approach will
be contextualized in the definition of an adaptive parallel module, which is a building block
for composing complex and distributed adaptive computations. Finally, in Section 6 we will
describe a set of experimental results that show the viability of our approach, and in Section 7
we will give the concluding remarks of this chapter.
Advances in Grid Computing
This book approaches grid computing from a perspective on the latest achievements in the field, providing an insight into current research trends and advances, and presenting a large range of innovative research papers. The topics covered include resource and data management, grid architectures and development, and grid-enabled applications. New ideas employing heuristic methods from swarm intelligence, genetic algorithms and quantum encryption are considered in order to address two main aspects of grid computing: resource management and data management. The book also covers aspects of grid computing that regard architecture and development, and includes a diverse range of applications for grid computing, including a possible human grid computing system, simulation of the fusion reaction, ubiquitous healthcare service provisioning and complex water systems.
Fail Over Strategy for Fault Tolerance in Cloud Computing Environment
Cloud fault tolerance is an important issue in cloud computing platforms and applications. In the event of an unexpected
system failure or malfunction, a robust fault-tolerant design may allow the cloud to continue functioning correctly,
possibly at a reduced level, instead of failing completely. To ensure high availability of critical cloud services,
application execution and hardware performance, various fault-tolerance techniques exist for building self-autonomous
cloud systems. In comparison to current approaches, this paper proposes a more robust and reliable architecture that uses
an optimal checkpointing strategy to ensure high system availability and a reduced task service finish time. Using
pass rates and virtualised mechanisms, the proposed Smart Failover Strategy (SFS) scheme uses components such as a
Cloud fault manager, Cloud controller, Cloud load balancer and a selection mechanism, providing fault tolerance via
redundancy, optimized selection and checkpointing. In our approach, the Cloud fault manager repairs faults generated
before the task deadline is reached, blocking unrecoverable faulty nodes as well as their virtual nodes. The scheme
is also able to remove temporary software faults from recoverable faulty nodes, thereby making them available for future
requests. We argue that the proposed SFS algorithm makes the system highly fault tolerant by considering forward and
backward recovery using diverse software tools. Compared to existing approaches, preliminary experiments with the SFS
algorithm indicate an increase in pass rates and a consequent decrease in failure rates, showing overall good
performance in task allocation. We present these results using experimental validation tools, with comparison to other
techniques, laying the foundation for a fully fault-tolerant IaaS Cloud environment.
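As a rough illustration of the checkpoint-and-failover idea underlying SFS, the sketch below models a task that resumes from its last checkpoint on a fresh node after a simulated crash rather than restarting from scratch. This is not the paper's implementation; the class names and the random failure model are assumptions made for the example.

```python
# Hedged sketch of checkpoint-based failover: on a simulated node crash,
# work resumes from the last checkpoint instead of from step zero.
import random


class Checkpoint:
    """Last completed unit of work, persisted by the fault manager."""
    def __init__(self):
        self.step = 0


def run_task(total_steps: int, fail_prob: float, rng: random.Random) -> int:
    """Run a task to completion with failover; return the restart count."""
    ckpt, restarts = Checkpoint(), 0
    while ckpt.step < total_steps:
        if rng.random() < fail_prob:
            restarts += 1        # crash detected: fail over to another node,
            continue             # resuming from ckpt.step (no lost progress)
        ckpt.step += 1           # work unit done -> checkpoint saved
    return restarts
```

With a zero failure probability the task completes with no restarts; with failures injected, total progress is preserved across restarts because only the steps since the last checkpoint are ever repeated.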
Hadoop performance modeling and job optimization for big data analytics
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London. Big data has gained momentum in both academia and industry. The MapReduce model has emerged as a major computing model in support of big data analytics. Hadoop, an open source implementation of the MapReduce model, has been widely taken up by the community, and cloud service providers such as the Amazon EC2 cloud now support Hadoop user applications. However, a key challenge is that the cloud service providers do not have a resource provisioning mechanism to satisfy user jobs with deadline requirements. Currently, it is solely the user's responsibility to estimate the required amount of resources for running their job in a public cloud. This thesis presents a Hadoop performance model that accurately estimates the execution duration of a job and further provisions the required amount of resources for the job to be completed within a deadline. The proposed model employs Locally Weighted Linear Regression (LWLR) to estimate the execution time of a job and the Lagrange Multiplier technique for resource provisioning to satisfy user jobs with a given deadline. The performance of the proposed model is extensively evaluated on both an in-house Hadoop cluster and the Amazon EC2 Cloud. Experimental results show that the proposed model is highly accurate in job execution estimation and that jobs are completed within the required deadlines when following the resource provisioning scheme of the proposed model. In addition, the Hadoop framework has over 190 configuration parameters, some of which have significant effects on the performance of a Hadoop job. Manually setting the optimum values for these parameters is a challenging and time-consuming task. This thesis therefore presents optimization work that enhances the performance of Hadoop by automatically tuning its parameter values. It employs the Gene Expression Programming (GEP) technique to build an objective function that represents the performance of a job and the correlation among the configuration parameters. For the purpose of optimization, Particle Swarm Optimization (PSO) is employed to automatically find optimal or near-optimal configuration settings. The performance of the proposed work is intensively evaluated on a Hadoop cluster, and the experimental results show that it enhances the performance of Hadoop significantly compared with the default settings.
Abdul Wali Khan University Marda
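The LWLR estimator named above can be sketched in a few lines. The following is a generic one-dimensional locally weighted linear regression, not the thesis's multi-feature model; the Gaussian kernel, the bandwidth tau and the sample data are illustrative assumptions.

```python
# Minimal 1-D locally weighted linear regression (LWLR) sketch: each
# prediction fits a weighted least-squares line y = a + b*x, weighting
# training samples near the query point more heavily (Gaussian kernel).
import math


def lwlr_predict(x_query, xs, ys, tau=1.0):
    """Predict y at x_query from paired 1-D samples (xs, ys)."""
    w = [math.exp(-((x - x_query) ** 2) / (2 * tau ** 2)) for x in xs]
    # Weighted normal equations for intercept a and slope b.
    sw = sum(w)
    sx = sum(wi * x for wi, x in zip(w, xs))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sxx = sum(wi * x * x for wi, x in zip(w, xs))
    sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    b = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
    a = (sy - b * sx) / sw
    return a + b * x_query
```

In a deadline-provisioning setting, xs could be input sizes of past jobs and ys their measured durations, with the prediction at a new input size feeding the resource calculation; on exactly linear data the weighted fit recovers the line regardless of the kernel weights.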
Adaptive heterogeneous parallelism for semi-empirical lattice dynamics in computational materials science
With the variability in performance of the multitude of parallel environments available today, the conceptual overhead created by the need to anticipate runtime information in order to make design-time decisions has become overwhelming. Performance-critical applications and libraries carry implicit assumptions based on incidental metrics that are not portable to emerging computational platforms or even to alternative contemporary architectures. Furthermore, the significance of runtime concerns such as makespan, energy efficiency and fault tolerance depends on the situational context. This thesis presents a case study in the application of both Mattson's prescriptive pattern-oriented approach and the more principled structured parallelism formalism to the computational simulation of inelastic neutron scattering spectra on hybrid CPU/GPU platforms. The original ad hoc implementation, as well as new pattern-based and structured implementations, are evaluated for relative performance and scalability. Two new structural abstractions are introduced to facilitate adaptation by lazy optimisation and runtime feedback. A deferred-choice abstraction represents a unified space of alternative structural program variants, allowing static adaptation through model-specific exhaustive calibration with regard to the extrafunctional concerns of runtime, average instantaneous power and total energy usage. Instrumented queues serve as a mechanism for structural composition and provide a representation of extrafunctional state that allows realisation of a market-based decentralised coordination heuristic for competitive resource allocation and of the Lyapunov drift algorithm for cooperative scheduling.
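The deferred-choice abstraction described above can be approximated by a small calibration wrapper: register several structural variants of the same computation, time each once on a sample input, and commit to the winner. This is a simplified sketch for a single extrafunctional concern (wall-clock time) with invented names, not the thesis's implementation, which also calibrates against power and energy usage.

```python
# Simplified sketch of a deferred choice: the structural variant is left
# open at design time and selected by a one-off runtime calibration.
import time


class DeferredChoice:
    def __init__(self, *variants):
        self.variants = variants   # interchangeable implementations
        self.chosen = None

    def calibrate(self, sample_input):
        """Time every variant once on a sample; commit to the fastest."""
        timings = []
        for f in self.variants:
            t0 = time.perf_counter()
            f(sample_input)
            timings.append((time.perf_counter() - t0, f))
        self.chosen = min(timings, key=lambda t: t[0])[1]

    def __call__(self, x):
        return self.chosen(x)


# Two structurally different variants of the same sum-of-squares reduction:
def variant_loop(xs):
    total = 0
    for v in xs:
        total += v * v
    return total


def variant_builtin(xs):
    return sum(v * v for v in xs)
```

Whichever variant wins calibration, callers see one stable interface; a fuller realisation would calibrate per platform and per concern (time, power, energy) as the thesis describes.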