108,540 research outputs found
Failure-awareness and dynamic adaptation in data scheduling
Over the years, scientific applications have become more complex and more data intensive. Especially large scale simulations and scientific experiments in areas such as physics, biology, astronomy and earth sciences demand highly distributed resources to satisfy excessive computational requirements. Increasing data requirements and the distributed nature of the resources made I/O the major bottleneck for end-to-end application performance. Existing systems fail to address issues such as reliability, scalability, and efficiency in dealing with wide area data access, retrieval and processing. In this study, we explore data-intensive distributed computing and study challenges in data placement in distributed environments. After analyzing different application scenarios, we develop new data scheduling methodologies and the key attributes for reliability, adaptability and performance optimization of distributed data placement tasks. Inspired by techniques used in microprocessor and operating system architectures, we extend and adapt some of the known low-level data handling and optimization techniques to distributed computing. Two major contributions of this work include (i) a failure-aware data placement paradigm for increased fault-tolerance, and (ii) adaptive scheduling of data placement tasks for improved end-to-end performance. The failure-aware data placement includes early error detection, error classification, and use of this information in scheduling decisions for the prevention of and recovery from possible future errors. The adaptive scheduling approach includes dynamically tuning data transfer parameters over wide area networks for efficient utilization of available network capacity and optimized end-to-end data transfer performance
Replica placement in peer-to-peer systems
In today’s distributed applications, replica placement is essential since moving the data in the vicinity of an application will provide many benefits. The increasing requirements of data for scientific applications and collaborative access to these data make data placement even more important. Until now, replication is one of the main mechanisms used in distributed data whereby identical copies of data are generated and stored at various distributed sites to improve data access performance and data availability. Most work considers file’s popularity as one of the important parameters taken into consideration when designing replica placement strategies. However, this thesis argues that a combination of popularity and affinity files are the most important parameters which can be used in decision making whilst improving data access performance and data availability in distributed environments. A replica placement mechanism called Affinity Replica Placement Mechanism (ARPM) is proposed focusing on popular files and affinity files. The idea of ARPM is to improve data availability and accessibility in peer-to-peer (P2P) replica placement strategy. A P2P simulator, PeerSim, was used to evaluate the performance of this dynamic replica placement strategy. The simulation results demonstrated the effectiveness of ARPM hence provided a proof that ARPM has contributed towards a new dimension of replica placement strategy that incorporates the affinity and popularity of files replicas in P2P systems
HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges
High Performance Computing (HPC) clouds are becoming an alternative to
on-premise clusters for executing scientific applications and business
analytics services. Most research efforts in HPC cloud aim to understand the
cost-benefit of moving resource-intensive applications from on-premise
environments to public cloud platforms. Industry trends show hybrid
environments are the natural path to get the best of the on-premise and cloud
resources---steady (and sensitive) workloads can run on on-premise resources
and peak demand can leverage remote resources in a pay-as-you-go manner.
Nevertheless, there are plenty of questions to be answered in HPC cloud, which
range from how to extract the best performance of an unknown underlying
platform to what services are essential to make its usage easier. Moreover, the
discussion on the right pricing and contractual models to fit small and large
users is relevant for the sustainability of HPC clouds. This paper brings a
survey and taxonomy of efforts in HPC cloud and a vision on what we believe is
ahead of us, including a set of research challenges that, once tackled, can
help advance businesses and scientific discoveries. This becomes particularly
relevant due to the fast increasing wave of new HPC applications coming from
big data and artificial intelligence.Comment: 29 pages, 5 figures, Published in ACM Computing Surveys (CSUR
Predicting Intermediate Storage Performance for Workflow Applications
Configuring a storage system to better serve an application is a challenging
task complicated by a multidimensional, discrete configuration space and the
high cost of space exploration (e.g., by running the application with different
storage configurations). To enable selecting the best configuration in a
reasonable time, we design an end-to-end performance prediction mechanism that
estimates the turn-around time of an application using storage system under a
given configuration. This approach focuses on a generic object-based storage
system design, supports exploring the impact of optimizations targeting
workflow applications (e.g., various data placement schemes) in addition to
other, more traditional, configuration knobs (e.g., stripe size or replication
level), and models the system operation at data-chunk and control message
level.
This paper presents our experience to date with designing and using this
prediction mechanism. We evaluate this mechanism using micro- as well as
synthetic benchmarks mimicking real workflow applications, and a real
application.. A preliminary evaluation shows that we are on a good track to
meet our objectives: it can scale to model a workflow application run on an
entire cluster while offering an over 200x speedup factor (normalized by
resource) compared to running the actual application, and can achieve, in the
limited number of scenarios we study, a prediction accuracy that enables
identifying the best storage system configuration
Cloud computing resource scheduling and a survey of its evolutionary approaches
A disruptive technology fundamentally transforming the way that computing services are delivered, cloud computing offers information and communication technology users a new dimension of convenience of resources, as services via the Internet. Because cloud provides a finite pool of virtualized on-demand resources, optimally scheduling them has become an essential and rewarding topic, where a trend of using Evolutionary Computation (EC) algorithms is emerging rapidly. Through analyzing the cloud computing architecture, this survey first presents taxonomy at two levels of scheduling cloud resources. It then paints a landscape of the scheduling problem and solutions. According to the taxonomy, a comprehensive survey of state-of-the-art approaches is presented systematically. Looking forward, challenges and potential future research directions are investigated and invited, including real-time scheduling, adaptive dynamic scheduling, large-scale scheduling, multiobjective scheduling, and distributed and parallel scheduling. At the dawn of Industry 4.0, cloud computing scheduling for cyber-physical integration with the presence of big data is also discussed. Research in this area is only in its infancy, but with the rapid fusion of information and data technology, more exciting and agenda-setting topics are likely to emerge on the horizon
A Time-driven Data Placement Strategy for a Scientific Workflow Combining Edge Computing and Cloud Computing
Compared to traditional distributed computing environments such as grids,
cloud computing provides a more cost-effective way to deploy scientific
workflows. Each task of a scientific workflow requires several large datasets
that are located in different datacenters from the cloud computing environment,
resulting in serious data transmission delays. Edge computing reduces the data
transmission delays and supports the fixed storing manner for scientific
workflow private datasets, but there is a bottleneck in its storage capacity.
It is a challenge to combine the advantages of both edge computing and cloud
computing to rationalize the data placement of scientific workflow, and
optimize the data transmission time across different datacenters. Traditional
data placement strategies maintain load balancing with a given number of
datacenters, which results in a large data transmission time. In this study, a
self-adaptive discrete particle swarm optimization algorithm with genetic
algorithm operators (GA-DPSO) was proposed to optimize the data transmission
time when placing data for a scientific workflow. This approach considered the
characteristics of data placement combining edge computing and cloud computing.
In addition, it considered the impact factors impacting transmission delay,
such as the band-width between datacenters, the number of edge datacenters, and
the storage capacity of edge datacenters. The crossover operator and mutation
operator of the genetic algorithm were adopted to avoid the premature
convergence of the traditional particle swarm optimization algorithm, which
enhanced the diversity of population evolution and effectively reduced the data
transmission time. The experimental results show that the data placement
strategy based on GA-DPSO can effectively reduce the data transmission time
during workflow execution combining edge computing and cloud computing
- …