2,490 research outputs found
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also helps to provide an easy way for new practitioners to understand
this complex area of research.Comment: 46 pages, 16 figures, Technical Repor
Workflow Partitioning and Deployment on the Cloud using Orchestra
Orchestrating service-oriented workflows is typically based on a design model
that routes both data and control through a single point - the centralised
workflow engine. This causes scalability problems that include the unnecessary
consumption of the network bandwidth, high latency in transmitting data between
the services, and performance bottlenecks. These problems are highly prominent
when orchestrating workflows that are composed from services dispersed across
distant geographical locations. This paper presents a novel workflow
partitioning approach, which attempts to improve the scalability of
orchestrating large-scale workflows. It permits the workflow computation to be
moved towards the services providing the data in order to garner optimal
performance results. This is achieved by decomposing the workflow into smaller
sub workflows for parallel execution, and determining the most appropriate
network locations to which these sub workflows are transmitted and subsequently
executed. This paper demonstrates the efficiency of our approach using a set of
experimental workflows that are orchestrated over Amazon EC2 and across several
geographic network regions.Comment: To appear in Proceedings of the IEEE/ACM 7th International Conference
on Utility and Cloud Computing (UCC 2014
Reusable cost-based scheduling of grid workflows operating on higher-order components
Grid applications are increasingly being developed as workflows built of well-structured, reusable components. We develop a user-transparent scheduling approach for Higher-Order Components (HOCs) . parallel implementations of typical programming patterns, accessible and customizable via Web services. We introduce a set of cost functions for a reusable scheduling: when the workflow recurs, it is mapped to the same execution nodes, avoiding the need for a repeated scheduling phase. We prove the efficiency of our scheduling by implementing it within the KOALA scheduler and comparing it with KOALA's standard Closeto- File policy. Experiments on scheduling HOC-based applications achieve a 40% speedup in communication and a 100% throughput increase
HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges
High Performance Computing (HPC) clouds are becoming an alternative to
on-premise clusters for executing scientific applications and business
analytics services. Most research efforts in HPC cloud aim to understand the
cost-benefit of moving resource-intensive applications from on-premise
environments to public cloud platforms. Industry trends show hybrid
environments are the natural path to get the best of the on-premise and cloud
resources---steady (and sensitive) workloads can run on on-premise resources
and peak demand can leverage remote resources in a pay-as-you-go manner.
Nevertheless, there are plenty of questions to be answered in HPC cloud, which
range from how to extract the best performance of an unknown underlying
platform to what services are essential to make its usage easier. Moreover, the
discussion on the right pricing and contractual models to fit small and large
users is relevant for the sustainability of HPC clouds. This paper brings a
survey and taxonomy of efforts in HPC cloud and a vision on what we believe is
ahead of us, including a set of research challenges that, once tackled, can
help advance businesses and scientific discoveries. This becomes particularly
relevant due to the fast increasing wave of new HPC applications coming from
big data and artificial intelligence.Comment: 29 pages, 5 figures, Published in ACM Computing Surveys (CSUR
Failure-awareness and dynamic adaptation in data scheduling
Over the years, scientific applications have become more complex and more data intensive. Especially large scale simulations and scientific experiments in areas such as physics, biology, astronomy and earth sciences demand highly distributed resources to satisfy excessive computational requirements. Increasing data requirements and the distributed nature of the resources made I/O the major bottleneck for end-to-end application performance. Existing systems fail to address issues such as reliability, scalability, and efficiency in dealing with wide area data access, retrieval and processing. In this study, we explore data-intensive distributed computing and study challenges in data placement in distributed environments. After analyzing different application scenarios, we develop new data scheduling methodologies and the key attributes for reliability, adaptability and performance optimization of distributed data placement tasks. Inspired by techniques used in microprocessor and operating system architectures, we extend and adapt some of the known low-level data handling and optimization techniques to distributed computing. Two major contributions of this work include (i) a failure-aware data placement paradigm for increased fault-tolerance, and (ii) adaptive scheduling of data placement tasks for improved end-to-end performance. The failure-aware data placement includes early error detection, error classification, and use of this information in scheduling decisions for the prevention of and recovery from possible future errors. The adaptive scheduling approach includes dynamically tuning data transfer parameters over wide area networks for efficient utilization of available network capacity and optimized end-to-end data transfer performance
- …