6,130 research outputs found
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also helps to provide an easy way for new practitioners to understand
this complex area of research.Comment: 46 pages, 16 figures, Technical Repor
A Case for Cooperative and Incentive-Based Coupling of Distributed Clusters
Research interest in Grid computing has grown significantly over the past
five years. Management of distributed resources is one of the key issues in
Grid computing. Central to management of resources is the effectiveness of
resource allocation as it determines the overall utility of the system. The
current approaches to superscheduling in a grid environment are non-coordinated
since application level schedulers or brokers make scheduling decisions
independently of the others in the system. Clearly, this can exacerbate the
load sharing and utilization problems of distributed resources due to
suboptimal schedules that are likely to occur. To overcome these limitations,
we propose a mechanism for coordinated sharing of distributed clusters based on
computational economy. The resulting environment, called
\emph{Grid-Federation}, allows the transparent use of resources from the
federation when local resources are insufficient to meet its users'
requirements. The use of computational economy methodology in coordinating
resource allocation not only facilitates the QoS based scheduling, but also
enhances utility delivered by resources.Comment: 22 pages, extended version of the conference paper published at IEEE
Cluster'05, Boston, M
Master/worker parallel discrete event simulation
The execution of parallel discrete event simulation across metacomputing infrastructures is examined. A master/worker architecture for parallel discrete event simulation is proposed providing robust executions under a dynamic set of services with system-level support for fault tolerance, semi-automated client-directed load balancing, portability across heterogeneous machines, and the ability to run codes on idle or time-sharing clients without significant interaction by users. Research questions and challenges associated with issues and limitations with the work distribution paradigm, targeted computational domain, performance metrics, and the intended class of applications to be used in this context are analyzed and discussed. A portable web services approach to master/worker parallel discrete event simulation is proposed and evaluated with subsequent optimizations to increase the efficiency of large-scale simulation execution through distributed master service design and intrinsic overhead reduction. New techniques for addressing challenges associated with optimistic parallel discrete event simulation across metacomputing such as rollbacks and message unsending with an inherently different computation paradigm utilizing master services and time windows are proposed and examined. Results indicate that a master/worker approach utilizing loosely coupled resources is a viable means for high throughput parallel discrete event simulation by enhancing existing computational capacity or providing alternate execution capability for less time-critical codes.Ph.D.Committee Chair: Fujimoto, Richard; Committee Member: Bader, David; Committee Member: Perumalla, Kalyan; Committee Member: Riley, George; Committee Member: Vuduc, Richar
Methodology for modeling high performance distributed and parallel systems
Performance modeling of distributed and parallel systems is of considerable importance to the high performance computing community. To achieve high performance, proper task or process assignment and data or file allocation among processing sites is essential. This dissertation describes an elegant approach to model distributed and parallel systems, which combines the optimal static solutions for data allocation with dynamic policies for task assignment. A performance-efficient system model is developed using analytical tools and techniques.
The system model is accomplished in three steps. First, the basic client-server model which allows only data transfer is evaluated. A prediction and evaluation method is developed to examine the system behavior and estimate performance measures. The method is based on known product form queueing networks. The next step extends the model so that each site of the system behaves as both client and server. A data-allocation strategy is designed at this stage which optimally assigns the data to the processing sites. The strategy is based on flow deviation technique in queueing models. The third stage considers process-migration policies. A novel on-line adaptive load-balancing algorithm is proposed which dynamically migrates processes and transfers data among different sites to minimize the job execution cost. The gradient-descent rule is used to optimize the cost function, which expresses the cost of process execution at different processing sites.
The accuracy of the prediction method and the effectiveness of the analytical techniques is established by the simulations. The modeling procedure described here is general and applicable to any message-passing distributed and parallel system. The proposed techniques and tools can be easily utilized in other related areas such as networking and operating systems. This work contributes significantly towards the design of distributed and parallel systems where performance is critical
Decentralized load balancing in heterogeneous computational grids
With the rapid development of high-speed wide-area networks and powerful yet low-cost computational resources, grid computing has emerged as an attractive computing paradigm. The space limitations of conventional distributed systems can thus be overcome, to fully exploit the resources of under-utilised computing resources in every region around the world for distributed jobs. Workload and resource management are key grid services at the service level of grid software infrastructure, where issues of load balancing represent a common concern for most grid infrastructure developers. Although these are established research areas in parallel and distributed computing, grid computing environments present a number of new challenges, including large-scale computing resources, heterogeneous computing power, the autonomy of organisations hosting the resources, uneven job-arrival pattern among grid sites, considerable job transfer costs, and considerable communication overhead involved in capturing the load information of sites. This dissertation focuses on designing solutions for load balancing in computational grids that can cater for the unique characteristics of grid computing environments. To explore the solution space, we conducted a survey for load balancing solutions, which enabled discussion and comparison of existing approaches, and the delimiting and exploration of the apportion of solution space. A system model was developed to study the load-balancing problems in computational grid environments. In particular, we developed three decentralised algorithms for job dispatching and load balancing—using only partial information: the desirability-aware load balancing algorithm (DA), the performance-driven desirability-aware load-balancing algorithm (P-DA), and the performance-driven region-based load-balancing algorithm (P-RB). All three are scalable, dynamic, decentralised and sender-initiated. We conducted extensive simulation studies to analyse the performance of our load-balancing algorithms. Simulation results showed that the algorithms significantly outperform preexisting decentralised algorithms that are relevant to this research
Microservices-based IoT Applications Scheduling in Edge and Fog Computing: A Taxonomy and Future Directions
Edge and Fog computing paradigms utilise distributed, heterogeneous and
resource-constrained devices at the edge of the network for efficient
deployment of latency-critical and bandwidth-hungry IoT application services.
Moreover, MicroService Architecture (MSA) is increasingly adopted to keep up
with the rapid development and deployment needs of the fast-evolving IoT
applications. Due to the fine-grained modularity of the microservices along
with their independently deployable and scalable nature, MSA exhibits great
potential in harnessing both Fog and Cloud resources to meet diverse QoS
requirements of the IoT application services, thus giving rise to novel
paradigms like Osmotic computing. However, efficient and scalable scheduling
algorithms are required to utilise the said characteristics of the MSA while
overcoming novel challenges introduced by the architecture. To this end, we
present a comprehensive taxonomy of recent literature on microservices-based
IoT applications scheduling in Edge and Fog computing environments.
Furthermore, we organise multiple taxonomies to capture the main aspects of the
scheduling problem, analyse and classify related works, identify research gaps
within each category, and discuss future research directions.Comment: 35 pages, 10 figures, submitted to ACM Computing Survey
Optimal pre-scheduling of problem remappings
A large class of scientific computational problems can be characterized as a sequence of steps where a significant amount of computation occurs each step, but the work performed at each step is not necessarily identical. Two good examples of this type of computation are: (1) regridding methods which change the problem discretization during the course of the computation, and (2) methods for solving sparse triangular systems of linear equations. Recent work has investigated a means of mapping such computations onto parallel processors; the method defines a family of static mappings with differing degrees of importance placed on the conflicting goals of good load balance and low communication/synchronization overhead. The performance tradeoffs are controllable by adjusting the parameters of the mapping method. To achieve good performance it may be necessary to dynamically change these parameters at run-time, but such changes can impose additional costs. If the computation's behavior can be determined prior to its execution, it can be possible to construct an optimal parameter schedule using a low-order-polynomial-time dynamic programming algorithm. Since the latter can be expensive, the performance is studied of the effect of a linear-time scheduling heuristic on one of the model problems, and it is shown to be effective and nearly optimal
Dynamic remapping of parallel computations with varying resource demands
A large class of computational problems is characterized by frequent synchronization, and computational requirements which change as a function of time. When such a problem must be solved on a message passing multiprocessor machine, the combination of these characteristics lead to system performance which decreases in time. Performance can be improved with periodic redistribution of computational load; however, redistribution can exact a sometimes large delay cost. We study the issue of deciding when to invoke a global load remapping mechanism. Such a decision policy must effectively weigh the costs of remapping against the performance benefits. We treat this problem by constructing two analytic models which exhibit stochastically decreasing performance. One model is quite tractable; we are able to describe the optimal remapping algorithm, and the optimal decision policy governing when to invoke that algorithm. However, computational complexity prohibits the use of the optimal remapping decision policy. We then study the performance of a general remapping policy on both analytic models. This policy attempts to minimize a statistic W(n) which measures the system degradation (including the cost of remapping) per computation step over a period of n steps. We show that as a function of time, the expected value of W(n) has at most one minimum, and that when this minimum exists it defines the optimal fixed-interval remapping policy. Our decision policy appeals to this result by remapping when it estimates that W(n) is minimized. Our performance data suggests that this policy effectively finds the natural frequency of remapping. We also use the analytic models to express the relationship between performance and remapping cost, number of processors, and the computation's stochastic activity
The OpenDC Microservice Simulator: Design, Implementation, and Experimentation
Microservices is an architectural style that structures an application as a
collection of loosely coupled services, making it easy for developers to build
and scale their applications. The microservices architecture approach differs
from the traditional monolithic style of treating software development as a
single entity. Microservice architecture is becoming more and more adapted.
However, microservice systems can be complex due to dependencies between the
microservices, resulting in unpredictable performance at a large scale.
Simulation is a cheap and fast way to investigate the performance of
microservices in more detail. This study aims to build a microservices
simulator for evaluating and comparing microservices based applications. The
microservices reference architecture is designed. The architecture is used as
the basis for a simulator. The simulator implementation uses statistical models
to generate the workload. The compelling features added to the simulator
include concurrent execution of microservices, configurable request depth,
three load-balancing policies and four request execution order policies. This
paper contains two experiments to show the simulator usage. The first
experiment covers request execution order policies at the microservice
instance. The second experiment compares load balancing policies across
microservice instances.Comment: Bachelor's thesi
- …