1,563 research outputs found
Dimensioning and workload distribution algorithms for lambda grids (Dimensionerings- en werkverdelingsalgoritmen voor lambda grids)
Grids consist of a collection of computing and storage elements that may be geographically dispersed but whose combined capacity one wishes to exploit. To that end, these elements must be connected by a network. Since many scientific applications make use of a Grid, and these applications typically process large volumes of data, it is necessary to provide a network that can reliably transport such large data flows. Optical transport networks are excellently suited to this. Grids that use such a network are called lambda Grids. This thesis describes a framework in which the design and dimensioning of optical networks for lambda Grids can be described. It also discusses how workload can be distributed over a Grid once it has been dimensioned. A large part of the results was obtained through simulation, using a custom Grid simulation package that focuses specifically on network and Grid elements. The design of this simulator and the associated implementation choices are therefore explained in detail in this work.
Task assignment in server farms under realistic workload conditions
Server farms have become very popular in recent years because they effectively address the problem of large delays, which is common among organisations whose systems receive high volumes of traffic. Server farms are now widely used in two main areas, namely Web hosting and scientific computing. The performance of such server farms relies heavily on the underlying task assignment policy, a specific set of rules that defines how incoming tasks are assigned to and processed at hosts. The aim of a task assignment policy is to optimise certain performance criteria such as the expected waiting time and slowdown. One of the key factors that affect the performance of these policies is the service time distribution of tasks. There is extensive evidence indicating that the service times of modern computer workloads closely follow heavy-tailed distributions that possess high variance. However, in certain environments, the service time distributions of tasks are unknown. Imposing parametric assumptions in such cases can lead to inaccurate and unreliable inferences. Considerable efforts have been made in recent years to devise efficient policies. Although these policies perform well under specific workload conditions, they have several major limitations. These include the assumption of known service times, the inability to efficiently assign tasks in time-sharing server farms, poor performance under changing workload conditions and poor performance in multiple server farms. This thesis proposes novel task assignment policies for server farms under two main classes of realistic workload conditions, namely heavy-tailed and arbitrary service time distributions. Arbitrary service time distributions are assumed for cases where the underlying service time distribution of tasks is unknown. First, we investigate ways to optimise the performance of a time-sharing server. We concentrate on a particular scheduling policy called the multi-level time sharing policy (MLTP). We provide an extensive performance analysis of MLTP and show that MLTP can result in significant performance improvements under certain traffic conditions. Second, we investigate how to improve the performance of time-sharing server farms using MLTP. Three task assignment policies are proposed for time-sharing server farms. Third, we investigate how to design efficient task assignment policies for multiple server farms. We propose MCTPM, which is based on a multi-tier host architecture. MCTPM supports preemptive task migration and controls the traffic flow into server farms via a global dispatching device so as to optimise the performance. Finally, we investigate ways to design adaptive task assignment policies that make no assumptions regarding the underlying service time distribution of tasks. We propose a novel task assignment policy, called ADAPT-POLICY, which is based on a set of static task assignment policies for the server farm and adaptively changes the task assignment policy to suit the most recent traffic conditions. The experimental performance analysis of ADAPT-POLICY shows that it outperforms other policies under a range of traffic conditions.
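The abstract's point that high service-time variance hurts performance can be illustrated with the Pollaczek-Khinchine formula for a single first-come-first-served M/G/1 queue. The following is a minimal sketch, not taken from the thesis: it assumes Poisson arrivals and one host, and the function name `mg1_mean_wait` and its parameters are hypothetical.

```python
def mg1_mean_wait(arrival_rate: float, mean_service: float, scv: float) -> float:
    """Mean queueing delay in an M/G/1 FCFS queue (Pollaczek-Khinchine).

    arrival_rate: Poisson arrival rate (tasks per time unit).
    mean_service: mean service time E[S].
    scv: squared coefficient of variation of the service time, Var[S] / E[S]^2.
    All names are illustrative assumptions, not identifiers from the thesis.
    """
    utilization = arrival_rate * mean_service
    if not 0.0 <= utilization < 1.0:
        raise ValueError("queue is unstable: utilization must be below 1")
    # E[S^2] = Var[S] + E[S]^2 = (1 + scv) * E[S]^2
    second_moment = (1.0 + scv) * mean_service ** 2
    # Pollaczek-Khinchine: E[W] = lambda * E[S^2] / (2 * (1 - rho))
    return arrival_rate * second_moment / (2.0 * (1.0 - utilization))
```

At 50% load with a mean service time of 1, exponential service times (`scv=1.0`) give a mean wait of 1.0 time units, while a high-variance workload with `scv=99.0` gives 50.0, even though the mean service time and the load are unchanged.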
Self-management for large-scale distributed systems
Autonomic computing aims at making computing systems self-managing by using autonomic managers in order to reduce obstacles caused by management complexity. This thesis presents results of research on self-management for large-scale distributed systems. This research was motivated by the increasing complexity of computing systems and their management.
In the first part, we present our platform, called Niche, for programming self-managing component-based distributed applications. In our work on Niche, we have faced and addressed the following four challenges in achieving self-management in a dynamic environment characterized by volatile resources and high churn: resource discovery, robust and efficient sensing and actuation, management bottleneck, and scale. We present results of our research on addressing the above challenges. Niche implements the autonomic computing architecture, proposed by IBM, in a fully decentralized way. Niche supports a network-transparent view of the system architecture, simplifying the design of distributed self-management. Niche provides a concise and expressive API for self-management. The implementation of the platform relies on the scalability and robustness of structured overlay networks. We proceed by presenting a methodology for designing the management part of a distributed self-managing application. We define design steps that include partitioning of management functions and orchestration of multiple autonomic managers.
In the second part, we discuss robustness of management and data consistency, which are necessary in a distributed system. Dealing with the effect of churn on management increases the complexity of the management logic and thus makes its development time-consuming and error-prone. We propose the abstraction of Robust Management Elements, which are able to heal themselves under continuous churn. Our approach is based on replicating a management element using finite state machine replication with a reconfigurable replica set. Our algorithm automates the reconfiguration (migration) of the replica set in order to tolerate continuous churn. For data consistency, we propose a majority-based distributed key-value store supporting multiple consistency levels that is based on a peer-to-peer network. The store enables the tradeoff between high availability and data consistency. Using a majority avoids potential drawbacks of master-based consistency control, namely a single point of failure and a potential performance bottleneck.
In the third part, we investigate self-management for Cloud-based storage systems with the focus on elasticity control using elements of control theory and machine learning. We have conducted research on a number of different designs of an elasticity controller, including a State-Space feedback controller and a controller that combines feedback and feedforward control. We describe our experience in designing an elasticity controller for a Cloud-based key-value store using a state-space model that makes it possible to trade off performance for cost. We describe the steps in designing an elasticity controller. We continue by presenting the design and evaluation of ElastMan, an elasticity controller for Cloud-based elastic key-value stores that combines feedforward and feedback control.
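To make the idea of combining feedforward and feedback control concrete, here is a deliberately simplified sketch. It is not the ElastMan design and not any controller from the thesis: the class `ElasticityController`, its fields, the linear capacity model, and the proportional gain are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ElasticityController:
    """Toy controller that sizes a storage cluster (hypothetical, for illustration)."""

    capacity_per_server: float  # assumed requests/s one server can sustain
    target_latency_ms: float    # desired latency set point
    gain: float = 0.05          # assumed servers added per ms of latency error
    min_servers: int = 1

    def decide(self, workload_rps: float, measured_latency_ms: float) -> int:
        # Feedforward term: predict the required servers from the measured
        # workload alone, using the assumed linear capacity model.
        feedforward = workload_rps / self.capacity_per_server
        # Feedback term: correct the prediction in proportion to how far the
        # observed latency is from the target.
        feedback = self.gain * (measured_latency_ms - self.target_latency_ms)
        return max(self.min_servers, math.ceil(feedforward + feedback))
```

In this sketch the feedforward term reacts immediately to workload changes, while the feedback term compensates for errors in the capacity model. A real controller would typically add integral action, actuation delays, and limits on how fast the cluster may grow or shrink.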
Towards Loosely-Coupled Programming on Petascale Systems
We have extended the Falkon lightweight task execution framework to make
loosely coupled programming on petascale systems a practical and useful
programming model. This work studies and measures the performance factors
involved in applying this approach to enable the use of petascale systems by a
broader user community, and with greater ease. Our work enables the execution
of highly parallel computations composed of loosely coupled serial jobs with no
modifications to the respective applications. This approach allows a new (and
potentially far larger) class of applications to leverage petascale systems,
such as the IBM Blue Gene/P supercomputer. We present the challenges of I/O
performance encountered in making this model practical, and show results using
both microbenchmarks and real applications from two domains: economic energy
modeling and molecular dynamics. Our benchmarks show that we can scale up to
160K processor-cores with high efficiency, and can achieve sustained execution
rates of thousands of tasks per second.
Comment: IEEE/ACM International Conference for High Performance Computing,
Networking, Storage and Analysis (SuperComputing/SC) 200
Climbing Up Cloud Nine: Performance Enhancement Techniques for Cloud Computing Environments
With the transformation of cloud computing technologies from an attractive trend to a business reality, the need is more pressing than ever for efficient cloud service management tools and techniques. As cloud technologies continue to mature, the service model, resource allocation methodologies, energy efficiency models and general service management schemes are not yet saturated. The burden of making this all tick perfectly falls on cloud providers. Economies of scale, existing infrastructure and a large workforce are certainly advantages, but operation is far from straightforward from that point on. Performance and service delivery will still depend on the providers’ algorithms and policies, which affect all operational areas.
With that in mind, this thesis tackles a set of the more critical challenges faced by cloud providers with the purpose of enhancing cloud service performance and saving on providers’ cost. This is done by exploring innovative resource allocation techniques and developing novel tools and methodologies in the context of cloud resource management, power efficiency, high availability and solution evaluation.
Optimal and suboptimal solutions to the resource allocation problem in cloud data centers from both the computational and the network sides are proposed. Next, a deep dive into the energy efficiency challenge in cloud data centers is presented. Consolidation-based and non-consolidation-based solutions containing a novel dynamic virtual machine idleness prediction technique are proposed and evaluated. An investigation of the problem of simulating cloud environments follows. Available simulation solutions are comprehensively evaluated and a novel design framework for cloud simulators covering multiple variations of the problem is presented. Moreover, the challenge of evaluating the performance of cloud resource management solutions in terms of high availability is addressed. An extensive framework is introduced to design high availability-aware cloud simulators and a prominent cloud simulator (GreenCloud) is extended to implement it. Finally, the evaluation of real cloud application scenarios is demonstrated using the new tool.
The primary argument made in this thesis is that the proposed resource allocation and simulation techniques can serve as a basis for effective solutions that mitigate performance and cost challenges faced by cloud providers pertaining to resource utilization, energy efficiency, and client satisfaction.
Separation Framework: An Enabler for Cooperative and D2D Communication for Future 5G Networks
Soaring capacity and coverage demands dictate that future cellular networks
need to soon migrate towards ultra-dense networks. However, network
densification comes with a host of challenges that include compromised energy
efficiency, complex interference management, cumbersome mobility management,
burdensome signaling overheads and higher backhaul costs. Interestingly, most
of the problems that beleaguer network densification stem from one common
feature of legacy networks, i.e., tight coupling between the control and data
planes regardless of their degree of heterogeneity and cell density.
Consequently, in the wake of 5G, the control and data planes separation
architecture (SARC) has recently been conceived as a promising paradigm that
has the potential to address most of the aforementioned challenges. In this
article, we review various proposals that have been presented in the
literature so far to enable SARC.
More specifically, we analyze how and to what degree various SARC proposals
address the four main challenges in network densification namely: energy
efficiency, system level capacity maximization, interference management and
mobility management. We then focus on two salient features of future cellular
networks that have not yet been adopted in legacy networks at a wide scale and
thus remain a hallmark of 5G, i.e., coordinated multipoint (CoMP) and
device-to-device (D2D) communications. After providing necessary background on
CoMP and D2D, we analyze how SARC can particularly act as a major enabler for
CoMP and D2D in the context of 5G. This article thus serves as both a tutorial
and an up-to-date survey on SARC, CoMP and D2D. Most importantly, the
article provides an extensive outlook of challenges and opportunities that lie
at the crossroads of these three mutually entangled emerging technologies.
Comment: 28 pages, 11 figures, IEEE Communications Surveys & Tutorials 201
Performance Modeling of Softwarized Network Services Based on Queuing Theory with Experimental Validation
Network Functions Virtualization facilitates the automation of the scaling of softwarized network services (SNSs).
However, the realization of such a scenario requires a way to
determine the needed amount of resources so that the SNSs' performance requirements are met for a given workload. This problem is
known as resource dimensioning, and it can be efficiently tackled
by performance modeling. In this vein, this paper describes an
analytical model based on an open queuing network of G/G/m
queues to evaluate the response time of SNSs. We validate our
model experimentally for a virtualized Mobility Management
Entity (vMME) with a three-tiered architecture running on
a testbed that resembles a typical data center virtualization
environment. We detail the description of our experimental
setup and procedures. We solve our resulting queueing network
by using the Queueing Networks Analyzer (QNA), Jackson’s
networks, and Mean Value Analysis methodologies, and compare
them in terms of estimation error. Results show that, for medium
and high workloads, the QNA method achieves less than half the
error of the standard techniques. For low workloads,
the three methods produce an error lower than 10%. Finally,
we show experimentally the usefulness of the model for the dynamic
provisioning of the vMME.
This work has been partially funded by the H2020 research
and innovation project 5G-CLARITY (Grant No. 871428), the national research
project 5G-City (TEC2016-76795-C6-4-R), and the Spanish Ministry of
Education, Culture and Sport (FPU Grant 13/04833). We would also like to
thank the reviewers for their valuable feedback to enhance the quality
and contribution of this work.
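As a rough illustration of how a single G/G/m queue's response time can be estimated for resource dimensioning, the sketch below uses the Allen-Cunneen approximation. This is a stand-in technique, not the QNA method, Jackson network, or Mean Value Analysis used in the paper, and it covers one isolated queue rather than an open queueing network. All function and parameter names are hypothetical.

```python
def erlang_c(servers: int, offered_load: float) -> float:
    """Probability that an arrival must wait in an M/M/m queue (Erlang C)."""
    # Erlang B via the standard numerically stable recursion.
    blocking = 1.0
    for k in range(1, servers + 1):
        blocking = offered_load * blocking / (k + offered_load * blocking)
    utilization = offered_load / servers
    return blocking / (1.0 - utilization * (1.0 - blocking))


def ggm_mean_wait(arrival_rate: float, service_rate: float, servers: int,
                  scv_arrival: float, scv_service: float) -> float:
    """Allen-Cunneen approximation of the mean queueing delay in a G/G/m queue.

    scv_arrival and scv_service are the squared coefficients of variation of
    the interarrival and service times (both 1.0 for M/M/m, where the result
    is exact).
    """
    offered_load = arrival_rate / service_rate
    if servers < 1 or offered_load >= servers:
        raise ValueError("queue is unstable: need arrival_rate < servers * service_rate")
    # Exact M/M/m delay, then scaled by the variability factor.
    mmm_wait = erlang_c(servers, offered_load) / (servers * service_rate - arrival_rate)
    return mmm_wait * (scv_arrival + scv_service) / 2.0


def ggm_mean_response(arrival_rate: float, service_rate: float, servers: int,
                      scv_arrival: float, scv_service: float) -> float:
    """Mean response time: queueing delay plus mean service time."""
    wait = ggm_mean_wait(arrival_rate, service_rate, servers, scv_arrival, scv_service)
    return wait + 1.0 / service_rate
```

For dimensioning, one could increase `servers` until `ggm_mean_response` falls below a target response time for the expected workload. That loop is left out here because the paper's actual provisioning procedure is not described in the abstract.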