Achieving efficient distributed scheduling with cloud message queues for many-task computing and high-performance computing
Due to the growth of data and of the number of computational tasks, systems must sustain the required level of performance. Performance can be increased by scaling the system horizontally or vertically, but adding computing resources alone does not solve every problem. For example, a complex computational problem should be decomposed into smaller subtasks, each of which takes much less time to compute. However, the number of such subtasks may grow continuously, so processing on the services is delayed or some messages are never processed at all. In many cases message processing must also be coordinated; for example, message A should be processed only after messages B and C. Given these problems of processing a large number of subtasks, the aim of this work is to design a mechanism for efficient distributed scheduling through message queues. As the underlying services we use Amazon Web Services cloud services such as Amazon EC2, SQS and DynamoDB. Our FlexQueue solution can compete with state-of-the-art systems such as Sparrow and MATRIX. Distributed systems are complex and require sophisticated algorithms and control components, so solving this problem requires detailed research.
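The ordering constraint above (message A is processed only after messages B and C) can be illustrated with a minimal in-memory sketch. It assumes a single process and uses `collections.deque` as a stand-in for a cloud queue such as SQS; `DependencyQueue`, `send`, `receive`, `ack` and `drain` are hypothetical names, not the SQS API and not the FlexQueue design. A message is released to the ready queue only once every prerequisite has been acknowledged.

```python
from collections import deque


class DependencyQueue:
    """Hypothetical in-memory queue where each message may wait on prerequisite messages."""

    def __init__(self):
        self.ready = deque()   # messages whose prerequisites are all done
        self.waiting = {}      # message id -> (payload, set of unmet prerequisite ids)
        self.done = set()      # ids of acknowledged (fully processed) messages

    def send(self, msg_id, payload, after=()):
        pending = set(after) - self.done
        if pending:
            self.waiting[msg_id] = (payload, pending)
        else:
            self.ready.append((msg_id, payload))

    def receive(self):
        """Return the next runnable (id, payload) pair, or None if nothing is runnable."""
        return self.ready.popleft() if self.ready else None

    def ack(self, msg_id):
        """Mark a message as processed and release any message that was waiting only on it."""
        self.done.add(msg_id)
        for waiting_id in list(self.waiting):
            payload, pending = self.waiting[waiting_id]
            pending.discard(msg_id)
            if not pending:
                del self.waiting[waiting_id]
                self.ready.append((waiting_id, payload))


def drain(queue):
    """Process every runnable message in order and return the processing order."""
    order = []
    while (item := queue.receive()) is not None:
        msg_id, _payload = item
        order.append(msg_id)
        queue.ack(msg_id)  # a real worker would ack only after its work succeeded
    return order


if __name__ == "__main__":
    q = DependencyQueue()
    q.send("A", "aggregate results", after=["B", "C"])
    q.send("B", "subtask 1")
    q.send("C", "subtask 2")
    print(drain(q))  # ['B', 'C', 'A']
```

Releasing dependants on `ack` rather than on `receive` means a message that was received but never finished cannot unblock its successors, which is what the coordination requirement needs.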
Cloud architecture for real-time data processing from a group of mobile robots
The master's thesis addresses the problem of synchronizing large volumes of real-time data coming from a group of mobile robots. Cloud technologies were chosen as the solution.
The problem analysis and problem statement section identifies the main problems that arise during data synchronization, such as processing delays, service failures and data loss. The task is to develop a solution that scales the system flexibly, recovers information lost during processing, and minimizes delays in sending control signals.
The technology selection section analyzes possible technological approaches and the services of the cloud provider Amazon Web Services (AWS), and identifies the services that form the core components of the architecture: AWS IoT, AWS DynamoDB, AWS Kinesis and others.
The architecture design section develops architectural concepts that address the stated problem and describes the data aggregation services that make up the designed architecture.
The startup marketing analysis section analyzes the current market situation and develops strategies and marketing plans for introducing the solution.
The explanatory note is 136 pages long and contains 57 illustrations, 26 tables and 8 appendices.
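Recovering information lost during processing is commonly done by keeping an ordered, replayable log and checkpointing the consumer's position, which is the model that streaming services such as Kinesis follow. The sketch below is a minimal in-memory illustration of that idea only, under that assumption; `StreamLog`, `CheckpointingConsumer` and the record format are hypothetical and do not use the AWS SDK. A consumer that fails between checkpoints re-reads the records after its last checkpoint instead of losing them.

```python
class StreamLog:
    """Hypothetical append-only in-memory log; a record's index is its sequence number."""

    def __init__(self):
        self.records = []

    def put(self, record):
        self.records.append(record)
        return len(self.records) - 1  # sequence number of the new record

    def read_from(self, seq):
        """Yield (sequence number, record) pairs starting at seq."""
        for i in range(seq, len(self.records)):
            yield i, self.records[i]


class CheckpointingConsumer:
    """Hypothetical consumer that checkpoints its position every `checkpoint_every` records."""

    def __init__(self, log, store, checkpoint_every=2):
        self.log = log
        self.store = store                    # downstream store; keyed writes make replays idempotent
        self.checkpoint = 0                   # would be persisted durably in a real system
        self.checkpoint_every = checkpoint_every

    def run(self, fail_at=None):
        """Process from the last checkpoint; fail_at simulates a crash at that sequence number."""
        since_checkpoint = 0
        for seq, record in self.log.read_from(self.checkpoint):
            if seq == fail_at:
                raise RuntimeError(f"simulated failure at record {seq}")
            self.store[seq] = record["value"] * 2  # stand-in for real aggregation
            since_checkpoint += 1
            if since_checkpoint == self.checkpoint_every:
                self.checkpoint = seq + 1
                since_checkpoint = 0
        self.checkpoint = len(self.log.records)   # the whole log has now been processed


if __name__ == "__main__":
    log = StreamLog()
    for v in range(5):
        log.put({"robot": "r1", "value": v})
    store = {}
    consumer = CheckpointingConsumer(log, store)
    try:
        consumer.run(fail_at=3)   # fails after checkpointing records 0-1 only
    except RuntimeError:
        pass
    consumer.run()                # restart: replays from record 2, nothing is lost
    print(sorted(store.items()))  # [(0, 0), (1, 2), (2, 4), (3, 6), (4, 8)]
```

Record 2 is processed twice in this run; keying the downstream write by sequence number is what makes that replay harmless.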
Cloud access to interoperable IVOA-compliant VOSpace storage
Handling, processing and archiving the huge amount of data produced by the new generation of experiments and instruments in Astronomy and Astrophysics are among the more exciting challenges to address in designing the future data management infrastructures and computing services. We investigated the feasibility of a data management and computation infrastructure, available world-wide, with the aim of merging the FAIR data management
provided by IVOA standards with the efficiency and reliability of a cloud approach. Our work involved the Canadian Advanced Network for Astronomy
Research (CANFAR) infrastructure and the European EGI federated cloud
(EFC). We designed and deployed a pilot data management and computation
infrastructure that provides IVOA-compliant VOSpace storage resources and
wide access to interoperable federated clouds. In this paper, we detail the
main user requirements covered, the technical choices and the implemented
solutions, and we describe the resulting Hybrid cloud Worldwide infrastructure, its benefits and limitations.
The Inter-cloud meta-scheduling
Inter-cloud is a recently emerging approach that expands cloud elasticity. By providing an adaptable setting, it aims to realize scalable resource provisioning that allows a diversity of cloud user requirements to be handled efficiently. This study contributes to the inter-cloud performance optimization of job executions using meta-scheduling concepts. This includes the development of the inter-cloud meta-scheduling (ICMS) framework, the ICMS optimal schemes and the SimIC toolkit. The ICMS model is an architectural strategy for managing and scheduling user services in virtualized, dynamically inter-linked clouds. It is realized through a model that includes a set of algorithms, namely the Service-Request, Service-Distribution, Service-Availability and Service-Allocation algorithms. Together with the resource management optimal schemes, these provide the novel functionality of the ICMS: message exchange implements the job distribution method, VM deployment provides the VM management features, and the local resource management system handles the management of the local cloud schedulers. The resulting system offers great flexibility through a lightweight resource management methodology, while handling the heterogeneity of different clouds through advanced service level agreement coordination. Experimental results are encouraging: the proposed ICMS model improves the performance of service distribution on a variety of criteria, such as service execution times, makespan, turnaround times, utilization levels and energy consumption rates, for various inter-cloud entities, e.g. users, hosts and VMs. For example, ICMS improves the performance of a non-meta-brokering inter-cloud by 3%, while ICMS with the full optimal schemes achieves a 9% improvement for the same configurations.
The whole experimental platform is implemented in the inter-cloud simulation toolkit (SimIC), a discrete-event simulation framework developed by the author.
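To make the evaluation criteria named above (makespan, turnaround time, utilization) concrete, here is a toy example. It is not the ICMS algorithm: it assumes a simple greedy broker that sends each job to the cloud where it would finish earliest, with each cloud modelled as a single queue with a relative speed factor. The `Cloud` type, the functions and all numbers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cloud:
    """Hypothetical cloud modelled as one FIFO execution queue."""
    name: str
    speed: float                  # relative processing speed (work units per time unit)
    free_at: float = 0.0          # time at which this cloud's queue becomes idle
    busy_time: float = 0.0        # total time spent executing jobs
    jobs: list = field(default_factory=list)


def distribute(jobs, clouds):
    """Greedy earliest-finish-time placement of (job_id, submit_time, work) tuples."""
    times = {}
    for job_id, submit, work in sorted(jobs, key=lambda j: j[1]):
        best = min(clouds, key=lambda c: max(c.free_at, submit) + work / c.speed)
        start = max(best.free_at, submit)
        runtime = work / best.speed
        best.free_at = start + runtime
        best.busy_time += runtime
        best.jobs.append(job_id)
        times[job_id] = (submit, best.free_at)   # (submit time, finish time)
    return times


def metrics(times, clouds):
    """Makespan, mean turnaround time and average utilization over the makespan."""
    first_submit = min(submit for submit, _ in times.values())
    makespan = max(end for _, end in times.values()) - first_submit
    turnaround = sum(end - submit for submit, end in times.values()) / len(times)
    utilization = sum(c.busy_time for c in clouds) / (makespan * len(clouds))
    return makespan, turnaround, utilization


if __name__ == "__main__":
    clouds = [Cloud("cloud-a", speed=1.0), Cloud("cloud-b", speed=2.0)]
    jobs = [("j1", 0.0, 4.0), ("j2", 0.0, 4.0), ("j3", 1.0, 2.0)]
    times = distribute(jobs, clouds)
    print(metrics(times, clouds))  # makespan 4.0, mean turnaround 8/3, utilization 0.875
```

A meta-scheduler is judged by how it moves these numbers for the same workload, which is how the 3% and 9% comparisons above should be read.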
DRIVE: A Distributed Economic Meta-Scheduler for the Federation of Grid and Cloud Systems
The computational landscape is littered with islands of disjoint resource providers including
commercial Clouds, private Clouds, national Grids, institutional Grids, clusters, and data centers.
These providers are independent and isolated due to a lack of communication and coordination;
they are also often proprietary, without standardised interfaces, protocols, or execution environments.
The lack of standardisation and global transparency has the effect of binding consumers
to individual providers. With the increasing ubiquity of computation providers there is an opportunity
to create federated architectures that span both Grid and Cloud computing providers
effectively creating a global computing infrastructure. In order to realise this vision, secure and
scalable mechanisms to coordinate resource access are required. This thesis proposes a generic
meta-scheduling architecture to facilitate federated resource allocation in which users can provision
resources from a range of heterogeneous (service) providers.
Efficient resource allocation is difficult in large scale distributed environments due to the inherent
lack of centralised control. In a Grid model, local resource managers govern access to a
pool of resources within a single administrative domain but have only a local view of the Grid
and are unable to collaborate when allocating jobs. Meta-schedulers act at a higher level and are able to
submit jobs to multiple resource managers; however, they are most often deployed on a per-client
basis and are therefore concerned only with their own allocations, essentially competing against one
another. In a federated environment the widespread adoption of utility computing models seen in
commercial Cloud providers has re-motivated the need for economically aware meta-schedulers.
Economies provide a way to represent the different goals and strategies that exist in a competitive
distributed environment. The use of economic allocation principles effectively creates an
open service market that provides efficient allocation and incentives for participation.
The major contributions of this thesis are the architecture and prototype implementation of the
DRIVE meta-scheduler. DRIVE is a Virtual Organisation (VO) based distributed economic metascheduler
in which members of the VO collaboratively allocate services or resources. Providers
joining the VO contribute obligation services to the VO. These contributed services are in effect
membership “dues” and are used in the running of the VO's operations – for example allocation,
advertising, and general management. DRIVE is independent from a particular class of provider
(Service, Grid, or Cloud) or specific economic protocol. This independence enables allocation in
federated environments composed of heterogeneous providers in vastly different scenarios. Protocol
independence facilitates the use of arbitrary protocols based on specific requirements and
infrastructural availability. For instance, within a single organisation where internal trust exists,
users can achieve maximum allocation performance by choosing a simple economic protocol.
In a global utility Grid no such trust exists. The same meta-scheduler architecture can be used
with a secure protocol which ensures the allocation is carried out fairly in the absence of trust.
DRIVE establishes contracts between participants as the result of allocation. A contract describes
individual requirements and obligations of each party. A unique two stage contract negotiation
protocol is used to minimise the effect of allocation latency. In addition, due to the co-op nature of
the architecture and the use of secure privacy preserving protocols, DRIVE can be deployed in a
distributed environment without requiring large scale dedicated resources.
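As an illustration of the kind of simple economic protocol that could be used inside a trusted organisation, the sketch below runs a sealed-bid reverse auction for one job: providers submit asking prices, the lowest ask wins, and the winner is paid the second-lowest ask (a Vickrey-style rule, which gives providers an incentive to bid their true cost). The outcome is recorded as a contract. This is an assumed example protocol, not DRIVE's own; `Contract`, `vickrey_reverse_auction`, the provider names and prices are hypothetical, and it omits the trust and privacy machinery that DRIVE's secure protocols provide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    """Hypothetical allocation result: which provider runs the job and at what price."""
    job_id: str
    provider: str
    price: float


def vickrey_reverse_auction(job_id, asks):
    """Allocate one job: the lowest ask wins and is paid the second-lowest ask.

    asks maps provider name -> asking price; ties are broken by provider name.
    """
    if len(asks) < 2:
        raise ValueError("a second-price auction needs at least two bids")
    ranked = sorted(asks.items(), key=lambda item: (item[1], item[0]))
    (winner, _winning_ask), (_runner_up, second_ask) = ranked[0], ranked[1]
    return Contract(job_id=job_id, provider=winner, price=second_ask)


if __name__ == "__main__":
    asks = {"cloud-a": 12.0, "grid-b": 9.5, "cluster-c": 10.0}
    print(vickrey_reverse_auction("job-42", asks))
    # Contract(job_id='job-42', provider='grid-b', price=10.0)
```

Because the auction logic is isolated behind one function, swapping in a different protocol leaves the caller unchanged, which mirrors the protocol independence described above.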
This thesis presents several other contributions related to meta-scheduling and open service
markets. To overcome the perceived performance limitations of economic systems, four high-utilisation
strategies have been developed and evaluated. Each strategy is shown to improve occupancy,
utilisation and profit using synthetic workloads based on a production Grid trace. The
gRAVI service wrapping toolkit is presented to address the difficulty of web-enabling existing applications.
The gRAVI toolkit has been extended for this thesis such that it creates economically
aware (DRIVE-enabled) services that can be transparently traded in a DRIVE market without requiring
developer input. The final contribution of this thesis is the definition and architecture of
a Social Cloud – a dynamic Cloud computing infrastructure composed of virtualised resources
contributed by members of a Social network. The Social Cloud prototype is based on DRIVE
and highlights the ease with which dynamic DRIVE markets can be created and used in different
domains.
Automated Bidding in Computing Service Markets. Strategies, Architectures, Protocols
This dissertation contributes to the research on Computational Mechanism Design by providing novel theoretical and software models: a novel bidding strategy called Q-Strategy, which automates bidding processes in imperfect information markets; a software framework for realizing agents and bidding strategies called BidGenerator; and a communication protocol called MX/CS for expressing and exchanging economic and technical information in a market-based scheduling system.
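The abstract does not describe how Q-Strategy works internally. To illustrate what automated bidding under imperfect information can look like, the sketch below uses a generic reinforcement-learning bidder: a single-state Q-value estimate per discrete bid level, with epsilon-greedy exploration. This technique is chosen here as an assumption for illustration only; `EpsilonGreedyBidder`, `simulate` and all parameters are hypothetical. The bidder never sees the market's clearing price, only its own payoff after each round.

```python
import random


class EpsilonGreedyBidder:
    """Hypothetical bidder that learns a payoff estimate per discrete bid level."""

    def __init__(self, markups, epsilon=0.1, alpha=0.2, seed=0):
        self.markups = list(markups)             # bid as a fraction of the bidder's own valuation
        self.q = {m: 0.0 for m in self.markups}  # estimated payoff of each bid level
        self.epsilon = epsilon                   # probability of exploring a random bid level
        self.alpha = alpha                       # learning rate
        self.rng = random.Random(seed)

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.markups)
        return max(self.markups, key=lambda m: self.q[m])

    def update(self, markup, payoff):
        # Single-state Q-learning step: move the estimate towards the observed payoff.
        self.q[markup] += self.alpha * (payoff - self.q[markup])


def simulate(clearing_price=7.0, valuation=10.0, rounds=2000, seed=0):
    """Repeated purchases: a bid at or above the hidden clearing price wins and pays the bid."""
    bidder = EpsilonGreedyBidder([0.5, 0.65, 0.75, 0.85, 1.0], seed=seed)
    for _ in range(rounds):
        markup = bidder.choose()
        bid = valuation * markup
        payoff = valuation - bid if bid >= clearing_price else 0.0
        bidder.update(markup, payoff)
    return bidder


if __name__ == "__main__":
    bidder = simulate()
    print(max(bidder.q, key=bidder.q.get))  # the most profitable bid level it has learned
```

With these hypothetical numbers only the 0.75 and 0.85 bid levels ever earn a payoff (2.5 and 1.5 per win), and the learner settles on 0.75: bidding too low never wins, bidding too high wins but leaves no surplus.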
Supporting SLA Provisioning in Grids by Risk Management Processes
Grid technologies have reached a high level of development; however, core shortcomings have been identified relating to the security, trust and dependability of the Grid, which reduce its appeal to potential commercial adopters. Users require their jobs to be executed with a desired priority and quality.
In order to stipulate such requirements, Service Level Agreements (SLAs) can be negotiated. These are a powerful instrument that enables the business relationship between service providers and service users to be specified in detail. However, providers are aware of various threats of SLA violations and are reluctant to adopt a mechanism that requires them to meet strict requirements and to guarantee the associated quality constraints. If strict guarantees cannot be agreed by contract, many users prefer to operate their own resources instead of using the Grid. This is more expensive, but they retain control over their applications, which removes the issue of trust and ensures dependability regarding their successful completion. To establish a commercial Grid environment, it is essential that Grid providers are prepared to accept an approach involving SLAs with associated guarantees. To accept such SLAs, providers need estimates of the likelihood that they will be unable to fulfill an SLA, i.e. Risk Assessment. Furthermore, the resource management should take such estimates into account when allocating resources or initiating fault-tolerance mechanisms, i.e. Risk Management. This work integrates risk awareness into the provider's processes involved in SLA provisioning: during SLA negotiation, providers evaluate which resources can be used for service provisioning and estimate the Probability of Failure (PoF) of those resources and of fulfilling the SLA. If the estimated PoF is too high, the provider may be able to reduce it sufficiently by applying risk reduction mechanisms, and then accept the SLA. The estimated PoF is also considered by the service provider and service consumer when determining the revenue and the contractual penalty. Compared to a service request requiring a relatively low quality of service, a more reliable service commands a higher price, since more guarantees have to be ensured.
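Here is a worked sketch of the PoF reasoning described above, under simplifying assumptions that are ours, not the thesis's. Resource failures are independent. The SLA fails if any resource it needs fails, so PoF_SLA = 1 - prod(1 - p_i). Replicating each task on k independent resources (one possible risk-reduction mechanism) fails only if all replicas fail, giving p_i^k. The function names, threshold and failure probabilities are hypothetical.

```python
from math import prod


def sla_pof(resource_pofs):
    """PoF of an SLA that fails if any of its (assumed independent) resources fails."""
    return 1.0 - prod(1.0 - p for p in resource_pofs)


def replicated_pof(pof, replicas):
    """PoF of a task replicated on identical independent resources: all replicas must fail."""
    return pof ** replicas


def replicas_needed(resource_pofs, threshold, max_replicas=5):
    """Smallest uniform replication factor that brings the SLA PoF to the threshold, or None."""
    for k in range(1, max_replicas + 1):
        if sla_pof([replicated_pof(p, k) for p in resource_pofs]) <= threshold:
            return k
    return None  # risk cannot be reduced enough: reject or renegotiate the SLA


if __name__ == "__main__":
    pofs = [0.05, 0.10]                     # hypothetical per-resource failure probabilities
    print(round(sla_pof(pofs), 6))          # 0.145: too risky for a 2% threshold
    print(replicas_needed(pofs, 0.02))      # 2: duplicating each task gives about 0.0125
```

The independence assumption is the weak point of such a model; correlated failures (for example, a shared site outage) would make replication less effective than these numbers suggest.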
If a more reliable service is provided, the consumer might also define a higher contractual penalty. Thus, the PoF is an additional decision-making element in SLA negotiation, since it enables end-users to compare different SLA offers by an objective measure. Once providers have accepted an SLA, they must be able to compensate for resource failures in order to prevent SLA violations. Fault-tolerance mechanisms combined with risk awareness support Grid providers in this task. The Risk Management processes are interlaced with the resource management and are thereby transparent to Grid service consumers. An important aspect of the Risk Management developed for the Grid is a set of self-organising mechanisms that initiate a fault-tolerance action, or a chain of them, to manage resource instabilities or resource outages. Decisions are made on the basis of financial considerations, such as the profit margin, the cost of performing fault-tolerance, and the expected profit when executing a job. Taking such financial factors into account is of high importance for commercial Grid providers. In conclusion, the integration of Risk Management into the processes of Grid providers is the initial step towards a risk-aware Grid. It will increase transparency, reliability and trust, and provides an objective basis for decision processes in the resource management. Risk Management is integrated to address both the SLA negotiation and the post-negotiation phase, and thereby improves the SLA provisioning process in general.
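The financially driven choice of a fault-tolerance action can be sketched as an expected-profit comparison. Assume the provider receives the agreed price on success and pays the contractual penalty on failure. The expected profit of a job is then price * (1 - PoF) - penalty * PoF - cost, and a fault-tolerance action is worth taking only if it raises this value after its own cost. This formula is our simplified assumption, and the action names, costs and PoF values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical fault-tolerance action and its estimated effect."""
    name: str
    cost: float        # extra cost of performing the action
    pof_after: float   # estimated SLA PoF if the action is taken


def expected_profit(price, penalty, pof, cost):
    return price * (1.0 - pof) - penalty * pof - cost


def best_action(price, penalty, base_cost, current_pof, actions):
    """Return (action name, expected profit) with the highest expected profit; 'none' = do nothing."""
    options = [("none", expected_profit(price, penalty, current_pof, base_cost))]
    for a in actions:
        options.append((a.name, expected_profit(price, penalty, a.pof_after, base_cost + a.cost)))
    return max(options, key=lambda option: option[1])


if __name__ == "__main__":
    # A resource has become unstable, raising the SLA PoF to 0.2 (hypothetical numbers).
    actions = [Action("checkpoint-restart", cost=5.0, pof_after=0.05),
               Action("migrate", cost=20.0, pof_after=0.01)]
    print(best_action(price=100.0, penalty=200.0, base_cost=40.0,
                      current_pof=0.2, actions=actions))
    # expected profits: none = 0, checkpoint-restart = 40, migrate = 37
```

Note that the cheaper action wins even though migration lowers the PoF further: once the PoF is small, the extra risk reduction is worth less than its cost.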
Autonomous grid scheduling using probabilistic job runtime scheduling
Computational Grids are evolving into a global, service-oriented architecture –
a universal platform for delivering future computational services to a range of
applications of varying complexity and resource requirements. The thesis focuses
on developing a new scheduling model for general-purpose, utility clusters
based on the concept of user requested job completion deadlines. In such a
system, a user would be able to request each job to finish by a certain deadline,
and possibly to a certain monetary cost. Implementing deadline scheduling is
dependent on the ability to predict the execution time of each queued job, and
on an adaptive scheduling algorithm able to use those predictions to maximise
deadline adherence. The thesis proposes novel solutions to these two problems
and documents their implementation in a largely autonomous and self-managing
way.
The starting point of the work is an extensive analysis of a representative
Grid workload revealing consistent workflow patterns, usage cycles and correlations between the execution times of jobs and their properties commonly collected
by the Grid middleware for accounting purposes. An automated approach is
proposed to identify these dependencies and use them to partition the highly
variable workload into subsets of more consistent and predictable behaviour.
A range of time-series forecasting models, applied in this context for the first
time, were used to model the job execution times as a function of their historical
behaviour and associated properties. Based on the resulting predictions of job
runtimes a novel scheduling algorithm is able to estimate the latest job start
time necessary to meet the requested deadline and sort the queue accordingly to
minimise the amount of deadline overrun.
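The partition, forecast and sort pipeline described above can be sketched as follows, under stated assumptions. Jobs are grouped by a hypothetical key of three accounting properties (user, queue, requested cores). Each group's next runtime is predicted with an exponentially weighted moving average of its history, a simpler stand-in for the ARMA-type models the thesis evaluated. The queue is then sorted by latest start time = deadline - predicted runtime. Total overrun is measured by running jobs back to back on one execution slot. All names and numbers are hypothetical, and this is not the thesis's full adaptive algorithm.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    user: str
    queue: str
    cores: int
    deadline: float        # requested completion time, in seconds from now
    actual_runtime: float  # only used to simulate execution


def partition_key(job):
    # Hypothetical partitioning on three properties collected for accounting.
    return (job.user, job.queue, job.cores)


class PartitionedForecaster:
    """Per-partition runtime forecast; an EWMA stands in for the thesis's ARMA models."""

    def __init__(self, alpha=0.5, default=3600.0):
        self.alpha = alpha       # smoothing factor
        self.default = default   # prediction for partitions without history
        self.level = {}          # partition key -> smoothed runtime

    def observe(self, key, runtime):
        prev = self.level.get(key)
        self.level[key] = runtime if prev is None else self.alpha * runtime + (1 - self.alpha) * prev

    def predict(self, job):
        return self.level.get(partition_key(job), self.default)


def schedule(queue, forecaster):
    """Sort by latest start time (deadline minus predicted runtime), least slack first."""
    return sorted(queue, key=lambda job: job.deadline - forecaster.predict(job))


def total_overrun(ordered_jobs):
    """Run jobs back to back on one slot from t=0 and sum the time past each deadline."""
    clock, overrun = 0.0, 0.0
    for job in ordered_jobs:
        clock += job.actual_runtime
        overrun += max(0.0, clock - job.deadline)
    return overrun


if __name__ == "__main__":
    f = PartitionedForecaster()
    for key, runtime in [(("u1", "long", 8), 50.0), (("u1", "long", 8), 70.0),
                         (("u2", "short", 1), 20.0), (("u3", "medium", 4), 30.0)]:
        f.observe(key, runtime)
    queue = [Job("j1", "u1", "long", 8, 100.0, 55.0),
             Job("j2", "u2", "short", 1, 30.0, 25.0),
             Job("j3", "u3", "medium", 4, 65.0, 30.0)]
    print(total_overrun(queue))                  # 95.0 in submission (FIFO) order
    print(total_overrun(schedule(queue, f)))     # 10.0 in latest-start-time order
```

The ordering only needs predictions that rank slack correctly, so moderate forecast errors still help; this is why prediction accuracy and overrun reduction are reported together.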
The testing of the proposed approach was done using the actual job trace
collected from a production Grid facility. The best performing execution time
predictor (the auto-regressive moving average method) coupled to workload
partitioning based on three simultaneous job properties returned the median
absolute percentage error centroid of only 4.75%. This level of prediction
accuracy enabled the proposed deadline scheduling method to reduce the average deadline overrun time ten-fold compared to the benchmark batch scheduler.
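For reference, the median absolute percentage error used to score the predictors can be computed as below: APE_i = |actual_i - predicted_i| / |actual_i| * 100, and MdAPE is the median of these values. The abstract reports a "centroid" of this error across partitions without stating how that centroid is formed, so this sketch only computes the MdAPE of a single set of predictions. The sample values are hypothetical.

```python
from statistics import median


def mdape(actual, predicted):
    """Median absolute percentage error, in percent; actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return median(abs(a - p) / abs(a) * 100.0 for a, p in zip(actual, predicted))


if __name__ == "__main__":
    # Hypothetical runtimes in seconds: errors of 10%, 5% and 0% give a median of 5%.
    print(mdape([100.0, 200.0, 400.0], [110.0, 190.0, 400.0]))
```

Using the median rather than the mean keeps a few badly mispredicted jobs from dominating the score, which matters for the highly variable Grid workloads described above.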
Overall, the thesis demonstrates that deadline scheduling of computational
jobs on the Grid is achievable using statistical forecasting of job execution times
based on historical information. The proposed approach is easily implementable,
substantially self-managing and better matched to the human workflow, making
it well suited for implementation in the utility Grids of the future.
Autonomous grid scheduling using probabilistic job runtime forecasting.
Computational Grids are evolving into a global, service-oriented architecture – a universal platform for delivering future computational services to a range of applications of varying complexity and resource requirements. The thesis focuses on developing a new scheduling model for general-purpose, utility clusters based on the concept of user requested job completion deadlines. In such a system, a user would be able to request each job to finish by a certain deadline, and possibly to a certain monetary cost. Implementing deadline scheduling is dependent on the ability to predict the execution time of each queued job, and on an adaptive scheduling algorithm able to use those predictions to maximise deadline adherence. The thesis proposes novel solutions to these two problems and documents their implementation in a largely autonomous and self-managing way. The starting point of the work is an extensive analysis of a representative Grid workload revealing consistent workflow patterns, usage cycles and correlations between the execution times of jobs and their properties commonly collected by the Grid middleware for accounting purposes. An automated approach is proposed to identify these dependencies and use them to partition the highly variable workload into subsets of more consistent and predictable behaviour. A range of time-series forecasting models, applied in this context for the first time, were used to model the job execution times as a function of their historical behaviour and associated properties. Based on the resulting predictions of job runtimes a novel scheduling algorithm is able to estimate the latest job start time necessary to meet the requested deadline and sort the queue accordingly to minimise the amount of deadline overrun. The testing of the proposed approach was done using the actual job trace collected from a production Grid facility.
The best performing execution time predictor (the auto-regressive moving average method) coupled to workload partitioning based on three simultaneous job properties returned the median absolute percentage error centroid of only 4.75%. This level of prediction accuracy enabled the proposed deadline scheduling method to reduce the average deadline overrun time ten-fold compared to the benchmark batch scheduler. Overall, the thesis demonstrates that deadline scheduling of computational jobs on the Grid is achievable using statistical forecasting of job execution times based on historical information. The proposed approach is easily implementable, substantially self-managing and better matched to the human workflow, making it well suited for implementation in the utility Grids of the future.
A grid computing framework for commercial simulation packages
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. An increased need for collaborative research among different organizations, together with continuing advances in communication technology and computer hardware, has facilitated the development of distributed systems that can provide users with non-trivial access to geographically dispersed computing resources (processors, storage, applications, data, instruments, etc.) that are administered in multiple computer domains. The term grid computing or grids is popularly used to refer to such distributed systems. A broader definition of grid computing includes the use of computing resources within an organization for running organization-specific applications. This research is in the context of using grid computing within an enterprise to maximize the use of available hardware and software resources for processing enterprise applications. Large-scale scientific simulations have traditionally been the primary beneficiary of grid computing. The application of this technology to simulation in industry has, however, been negligible. This research investigates how grid technology can be effectively exploited by simulation practitioners using Windows-based commercially available simulation packages to model simulations in industry. These packages are commonly referred to as Commercial Off-The-Shelf (COTS) Simulation Packages (CSPs). The study identifies several higher-level grid services that could potentially be used to support the practice of simulation in industry. It proposes a grid computing framework to investigate these services in the context of CSP-based simulations. This framework is called the CSP-Grid Computing (CSP-GC) Framework. Each identified higher-level grid service in this framework is referred to as a CSP-specific service.
A total of six case studies are presented to experimentally evaluate how grid computing technologies can be used together with unmodified simulation packages to support some of the CSP-specific services. The contribution of this thesis is the CSP-GC framework, which identifies how simulation practice in industry may benefit from the use of grid technology. A further contribution is the identification of specific grid computing software (grid middleware) that could be used together with existing CSPs to provide grid support. With its focus on end-users and end-user tools, this research is intended to encourage wider adoption of grid computing in the workplace and to help simulation users derive benefit from this technology.