Partitioning workflow applications over federated clouds to meet non-functional requirements
PhD Thesis
With cloud computing, users can acquire computer resources when they need them, on a pay-as-you-go business model. Because of this, many applications are now being deployed in the cloud, and there are many different cloud providers worldwide. Importantly, these infrastructure providers offer services with different levels of quality. For example, cloud data centres are governed by the privacy and security policies of the country where the centre is located, while many organisations have created their own internal "private cloud" to meet security needs.
With all these variations and uncertainties, application developers who decide to host their system in the cloud face the question of which cloud to choose to get the best operational conditions in terms of price, reliability and security. The decision becomes even more complicated if the application consists of a number of distributed components, each with slightly different requirements.
Rather than trying to identify the single best cloud for an application, this thesis considers an alternative approach: combining different clouds to meet users' non-functional requirements. Cloud federation offers the ability to distribute a single application across two or more clouds, so that the application can benefit from the advantages of each. The key challenge for this approach is how to find the distribution (or deployment) of application components that yields the greatest benefits. In this thesis, we tackle this problem and propose a set of algorithms, and a framework, to partition a workflow-based application over federated clouds in order to exploit the strengths of each cloud. The specific goal is to split a distributed application structured as a workflow such that the security and reliability requirements of each component are met, whilst the overall cost of execution is minimised.
To achieve this, we propose and evaluate a cloud broker for partitioning a workflow application over federated clouds. The broker integrates with the e-Science Central cloud platform to automatically deploy a workflow over public and private clouds. We developed a deployment planning algorithm to partition a large workflow application across federated clouds so as to meet security requirements and minimise the monetary cost.
A more generic framework is then proposed to model, quantify and guide the partitioning and deployment of workflows over federated clouds. This framework considers the situation where changes in cloud availability (including cloud failure) arise during workflow execution.
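The partitioning goal (meet each component's security requirement while minimising execution plus data-transfer cost) can be illustrated with a brute-force search over task-to-cloud assignments. This is a minimal sketch, not the thesis's deployment planning algorithm: it assumes integer security levels per cloud, a flat cost for every workflow edge that crosses clouds, and hypothetical names (`CLOUDS`, `plan`, the example tasks and prices). Exhaustive search is exponential in the number of tasks, so it only suits small workflows.

```python
from itertools import product

# Hypothetical clouds and their security levels (higher = more trusted).
CLOUDS = {"private": 3, "public_a": 1, "public_b": 2}


def plan(tasks, edges, exec_cost, transfer_cost):
    """Exhaustively search task-to-cloud assignments for the cheapest valid one.

    tasks: dict task -> minimum security level the task requires
    edges: list of (src, dst) data dependencies in the workflow
    exec_cost: dict (task, cloud) -> cost of running the task on that cloud
    transfer_cost: flat cost for each edge whose two tasks run on different clouds
    Returns (total_cost, assignment), or None if no assignment meets the constraints.
    """
    names = list(tasks)
    best = None
    for choice in product(CLOUDS, repeat=len(names)):
        assignment = dict(zip(names, choice))
        # Security constraint: each task must run on a sufficiently trusted cloud.
        if any(CLOUDS[assignment[t]] < tasks[t] for t in names):
            continue
        cost = sum(exec_cost[(t, assignment[t])] for t in names)
        cost += sum(transfer_cost for src, dst in edges if assignment[src] != assignment[dst])
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


# Example: a three-step workflow whose middle step handles sensitive data.
TASKS = {"read": 1, "anonymise": 3, "analyse": 1}
EDGES = [("read", "anonymise"), ("anonymise", "analyse")]
PRICE = {"private": 5, "public_a": 1, "public_b": 2}
EXEC_COST = {(t, c): PRICE[c] for t in TASKS for c in CLOUDS}

if __name__ == "__main__":
    print(plan(TASKS, EDGES, EXEC_COST, transfer_cost=3))
```

With these invented prices, the sensitive step stays on the private cloud and the two other steps move to `public_a`, because running each of them there plus one transfer (1 + 3) is still cheaper than the private cloud (5).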
Using Blockchain to support Data & Service Monetization
Two required features of a data monetization platform are query and retrieval of the metadata of the resources to be monetized. Centralized platforms rely on the maturity of traditional NoSQL database systems to support these features. These databases (MongoDB, for example) allow very efficient query and retrieval of the data they store. However, centralized platforms raise a number of security and privacy concerns, which makes them a poor fit for a data monetization platform. On the other hand, most existing decentralized platforms are only partially decentralized. In this research, I developed Cowry, a platform for publishing metadata describing available resources (data or services) and for discovering published metadata, including fast search and filtering. My main contribution is a fully decentralized architecture that combines a blockchain with a traditional distributed database to gain additional features, such as efficient query and retrieval of metadata stored on the blockchain.
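The idea of pairing an append-only chain with a separate query index can be sketched as a toy, assuming a SHA-256 hash chain stands in for the blockchain and an in-memory keyword index stands in for the distributed database. `MetadataLedger`, `publish`, `search` and `verify` are hypothetical names, not Cowry's API, and there is no consensus or replication here.

```python
import hashlib
import json
from collections import defaultdict


class MetadataLedger:
    """Toy hash chain of metadata records plus a keyword index (hypothetical design)."""

    GENESIS = "0" * 64

    def __init__(self):
        self.blocks = []               # append-only chain (stand-in for the blockchain)
        self.index = defaultdict(set)  # keyword -> block positions (stand-in for the database)

    @staticmethod
    def _hash(prev_hash, record):
        # Deterministic serialisation so the same record always hashes the same way.
        payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def publish(self, record):
        prev = self.blocks[-1]["hash"] if self.blocks else self.GENESIS
        self.blocks.append({"prev": prev, "record": record, "hash": self._hash(prev, record)})
        position = len(self.blocks) - 1
        for keyword in record.get("keywords", []):
            self.index[keyword.lower()].add(position)
        return self.blocks[-1]["hash"]

    def search(self, keyword):
        # The index answers the query; matching records are read back from the chain.
        positions = sorted(self.index.get(keyword.lower(), ()))
        return [self.blocks[i]["record"] for i in positions]

    def verify(self):
        # Recompute every link; any edited record breaks the chain.
        prev = self.GENESIS
        for block in self.blocks:
            if block["prev"] != prev or block["hash"] != self._hash(prev, block["record"]):
                return False
            prev = block["hash"]
        return True
```

The chain provides tamper evidence, while searches go through the index instead of scanning every block, which is the trade-off the abstract describes at a much larger scale.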
Trustworthy Federated Learning: A Survey
Federated Learning (FL) has emerged as a significant advancement in the field
of Artificial Intelligence (AI), enabling collaborative model training across
distributed devices while maintaining data privacy. As the importance of FL
increases, addressing trustworthiness issues in its various aspects becomes
crucial. In this survey, we provide an extensive overview of the current state
of Trustworthy FL, exploring existing solutions and well-defined pillars
relevant to Trustworthy FL. Despite the growth in the literature on trustworthy
centralized Machine Learning (ML)/Deep Learning (DL), further efforts are
necessary to identify trustworthiness pillars and evaluation metrics specific
to FL models, as well as to develop solutions for computing trustworthiness
levels. We propose a taxonomy that encompasses three main pillars:
Interpretability, Fairness, and Security & Privacy. Each pillar represents a
dimension of trust, further broken down into different notions. Our survey
covers trustworthiness challenges at every level in FL settings. We present a
comprehensive architecture of Trustworthy FL, addressing the fundamental
principles underlying the concept, and offer an in-depth analysis of trust
assessment mechanisms. In conclusion, we identify key research challenges
related to every aspect of Trustworthy FL and suggest future research
directions. This comprehensive survey serves as a valuable resource for
researchers and practitioners working on the development and implementation of
Trustworthy FL systems, contributing to a more secure and reliable AI
landscape.
Comment: 45 pages, 8 figures, 9 tables
Architecture for Provenance Systems
This document covers the logical and process architectures of provenance systems. The logical architecture identifies key roles and their interactions, whereas the process architecture discusses distribution and security. A fundamental aspect of our presentation is its technology-independent nature, which makes it reusable: the principles exposed in this document may be applied to different technologies.
Distributed Management of Grid-based Scientific Workflows
Grids and service-oriented technologies are emerging as dominant approaches for distributed systems. With the evolution of these technologies, scientific workflows have been introduced as a tool for scientists to assemble highly specialized applications and to exchange large heterogeneous datasets, in order to automate and accelerate complex scientific tasks. Several Scientific Workflow Management Systems (SWfMS) have already been designed to support the specification, execution, and monitoring of scientific workflows. However, they still face key challenges from two perspectives: system usability and system efficiency. From the usability perspective, current SWfMS are not simple enough for scientists with limited IT knowledge. Moreover, there is no easy mechanism by which scientists can share and re-use scientific experiments that have already been designed and proven by others. From the efficiency perspective, existing SWfMS coordinate and execute workflows centrally, using a single scheduler and/or workflow enactor. This creates a single point of failure, forms a scalability bottleneck, and forces fault handling to be centralized. In addition, they do not consider load balancing when mapping abstract jobs onto computational nodes. A further challenge arises because scientific workflow applications typically exchange huge amounts of data during execution. Some SWfMS use a mediator-based approach, in which data must first be transferred to a centralized data manager, which is very inefficient. Other SWfMS apply a peer-to-peer approach via data references. Even this approach is not sufficient for scientific workflows, since a single complex scientific activity can produce a very large amount of data.
In this thesis, we introduce the SWIMS (Scientific Workflow Integration and Management System) framework. It uses Web Services technology to build a distributed management system for data-intensive scientific workflows. SWIMS aims to overcome the challenges above through a set of salient features: i) support for distributed execution and management of workflows, ii) reduced communication traffic, iii) support for smart re-run, iv) distributed fault handling and load balancing, v) ease of use, and vi) extensive sharing of scientific workflows. We discuss the motivation, design, and implementation of the SWIMS framework, and then evaluate it using the Montage application from the astronomy domain.
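The load-balancing concern raised above (mapping abstract jobs onto several computational nodes) can be illustrated with a greedy longest-job-first mapping onto the currently least-loaded node. This is a minimal sketch assuming per-job runtime estimates are known in advance; `balance`, the job names and the node names are hypothetical and do not describe SWIMS's actual scheduler.

```python
import heapq


def balance(jobs, nodes):
    """Map jobs to nodes, giving each next-longest job to the least-loaded node.

    jobs: dict job -> estimated runtime (hypothetical estimates)
    nodes: list of node names
    Returns (mapping job -> node, dict node -> total assigned load).
    """
    heap = [(0.0, node) for node in nodes]  # (current load, node)
    heapq.heapify(heap)
    mapping = {}
    # Longest-processing-time-first usually balances better than arrival order;
    # ties on runtime are broken by job name so the result is deterministic.
    for job, runtime in sorted(jobs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, node = heapq.heappop(heap)
        mapping[job] = node
        heapq.heappush(heap, (load + runtime, node))
    loads = {node: load for load, node in heap}
    return mapping, loads
```

For example, jobs with runtimes 4, 3, 3 and 2 on two nodes end up with a load of 6 on each node, whereas sending every job to a single central node would give a load of 12.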
Managing scientific data with named data networking
Many scientific domains, such as climate science and High Energy Physics (HEP), have data management requirements that are not well supported by the IP network architecture. Named Data Networking (NDN) is a new network architecture whose service model is better aligned with the needs of data-oriented applications. NDN provides features such as best-location retrieval, caching, load sharing, and transparent failover that would otherwise be painstakingly (re-)implemented by each application using point-to-point semantics in an IP network.
We present the first scientific data management application designed and implemented on top of NDN. We use this application to manage climate and HEP data over a dedicated, high-performance testbed. Our application has two main components: a UI for dataset discovery queries and a federation of synchronized name catalogs. We show how NDN primitives can be used to implement common data management operations such as publishing, search, efficient retrieval, and publication access control.
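The publish and search operations over a name catalog can be sketched in a few lines, assuming dataset names are hierarchical NDN-style names matched component by component. `NameCatalog` and its methods are hypothetical, and the example names are invented. `sync_from` is a naive set union standing in for catalog synchronization, not NDN's sync protocol. Retrieval, caching and access control are not modelled.

```python
class NameCatalog:
    """In-memory catalog of hierarchical dataset names (illustrative, not the NDN API)."""

    def __init__(self):
        self.names = set()

    @staticmethod
    def _split(name):
        # "/climate/cmip5/tas" -> ("climate", "cmip5", "tas")
        return tuple(part for part in name.split("/") if part)

    def publish(self, name):
        self.names.add(self._split(name))

    def search(self, prefix):
        # Match whole components, so "/climate/cm" does not match "/climate/cmip5/...".
        p = self._split(prefix)
        return sorted("/" + "/".join(n) for n in self.names if n[:len(p)] == p)

    def sync_from(self, other):
        # Naive stand-in for synchronization: merge the other catalog's names.
        self.names |= other.names
```

Matching on whole name components rather than raw string prefixes mirrors how NDN names are structured, which is what makes prefix queries such as "all climate datasets" natural in this design.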