CloudOps: Towards the Operationalization of the Cloud Continuum: Concepts, Challenges and a Reference Framework
The current trend of developing highly distributed, context-aware, compute-intensive and data-sensitive applications is pushing the boundaries of cloud computing. Encouraged by the growing IoT paradigm and the availability of flexible edge devices, DevOps teams can now draw on an ecosystem of combined resources, ranging from high-density compute and storage to very lightweight embedded computers running on batteries or solar power, known as the Cloud Continuum. In this dynamic context, manageability is key, as are controlled operations and resource monitoring for handling anomalies. Unfortunately, operating and managing such heterogeneous computing environments (including edge, cloud and network services) is complex: operators face challenges such as the continuous optimization and autonomous (re-)deployment of context-aware stateless and stateful applications, while having to ensure service continuity and anticipate potential failures in the underlying infrastructure. In this paper, we propose a novel CloudOps workflow that extends the traditional DevOps pipeline, along with techniques and methods that allow application operators to fully embrace the possibilities of the Cloud Continuum. Our approach supports DevOps teams in the operationalization of the Cloud Continuum. We also provide an extensive explanation of the scope, possibilities and future of CloudOps. This research was funded by the European project PIACERE (Horizon 2020 Research and Innovation Programme, under grant agreement No. 101000162)
Parallel database operations in heterogeneous environments
In contrast to the traditional notion of a supercomputer, in which many processors are connected by a local high-speed bus, heterogeneous computing environments rely on "complete" computer nodes (CPU, storage, network interface, etc.) connected to a private or public network by a conventional network interface. Computer networking has evolved over the past three decades and, like many technologies, has grown enormously in terms of performance, functionality and reliability. At the beginning of the twenty-first century, high-speed, highly reliable Internet connectivity has become as commonplace as electricity, and computing resources have become as standard, in terms of availability and universal use, as electrical power.
To use heterogeneous Grids for applications requiring high processing power, researchers have proposed the notion of computational Grids, in which rules define the available computing services while hiding the complexity of the Grid organization from users. Users would thus find a Grid as easy to use as electrical power.
Generally, there is no widely accepted definition of Grids. Some researchers define them as high-performance distributed environments; some take into account their geographically distributed, multi-domain nature; others define Grids by the number of resources they unify.
Parallel database systems have gained an important role in database research over the past two decades, driven by the need to handle large distributed datasets in scientific computing fields such as bioinformatics, fluid dynamics and high-energy physics (HEP). This trend accompanied the shift from the ultimately unsuccessful development of highly specialized database machines to the use of conventional parallel hardware architectures. Concurrent execution is generally achieved through either operator parallelism or data parallelism. In the first case, a partitioned query execution plan is executed in parallel by different database operators; in the second, the data are partitioned and multiple processors perform the same operation in parallel on portions of the data.
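The data-parallel case can be sketched in a few lines of Python. This is a minimal illustration under invented names, not drawn from any system discussed here: the same operation (a filtered count) runs on disjoint partitions of the data, and the partial results are merged.

```python
# Sketch of data parallelism: apply the identical operator to each
# partition, then combine the partial results (the merge step).
# A real parallel database would spread partitions across nodes;
# a thread pool stands in for that here.
from concurrent.futures import ThreadPoolExecutor

def partial_count(partition):
    """The identical operator applied to one data partition."""
    return sum(1 for row in partition if row % 2 == 0)

def parallel_count(data, workers=4):
    # Partition the data round-robin into one chunk per worker.
    chunks = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_count, chunks))

print(parallel_count(list(range(100))))  # counts the 50 even numbers
```

Operator parallelism would instead run *different* operators of one query plan (scan, join, sort) concurrently, pipelining tuples between them.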
Parallel database operation algorithms have been well analyzed for sequential processors, and a number of publications have proposed and analyzed such algorithms for parallel database machines. To the best of the author's knowledge, however, no analysis of parallel algorithms has so far focused on the specific characteristics of a Grid infrastructure.
The specific difference lies in the heterogeneous nature of Grid resources. In a "shared nothing" architecture, as found in classical supercomputers and cluster systems, all resources, such as processing nodes, disks and network interconnects, typically have homogeneous characteristics with regard to performance, access time and bandwidth. In contrast, a Grid architecture comprises heterogeneous resources with different performance characteristics. The challenge of this research is to determine how to cope with, or even exploit, this heterogeneity in order to maximize performance, and to define algorithms that lead to an optimized workflow orchestration.
To address this challenge, we developed a mathematical model to investigate the performance behavior of parallel database operations in heterogeneous environments, such as a Grid, based on a generalized multiprocessor architecture. We studied both the parameters and their influence on performance and the behavior of the algorithms in heterogeneous environments. We found that only a small adjustment to the algorithms is needed to significantly improve their performance in heterogeneous environments. We also developed a graphical representation of the node configuration and an optimized algorithm for finding the optimal node configuration for executing the parallel binary merge sort.
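The flavor of such an adjustment can be illustrated with a short sketch, assuming node names and relative speed figures that are entirely invented (this is not the thesis's actual algorithm): rather than splitting a relation evenly across homogeneous nodes, partition sizes are made proportional to each node's measured throughput, so faster nodes receive more tuples.

```python
# Hypothetical heterogeneity-aware partitioning: row counts are
# assigned proportionally to relative node speed instead of evenly.
def proportional_partition(n_rows, node_speeds):
    """Map each node to a row count proportional to its speed."""
    total = sum(node_speeds.values())
    shares = {node: int(n_rows * s / total) for node, s in node_speeds.items()}
    # Hand any rounding remainder to the fastest node.
    fastest = max(node_speeds, key=node_speeds.get)
    shares[fastest] += n_rows - sum(shares.values())
    return shares

# Invented relative throughputs: node-a is 4x as fast as node-c.
speeds = {"node-a": 4.0, "node-b": 2.0, "node-c": 1.0}
print(proportional_partition(700, speeds))  # {'node-a': 400, 'node-b': 200, 'node-c': 100}
```

A homogeneous "shared nothing" split would give each node ~233 rows; here the slowest node no longer dictates the finishing time of the parallel operation.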
Finally, we validated our findings by implementing the new algorithm on a service-oriented architecture (SODA). The implementation confirmed both the model and the newly developed optimized algorithm.
We also outline useful extensions to our model, such as performance indices for the node-selection algorithms, node reliability, and approaches for the dynamic optimization of workflows.
Realising the Network Service Federation vision
In Press. The 5G-TRANSFORMER project proposes an NFV/SDN-based architecture to manage the end-to-end deployment of composite NFV network services, which may involve multiple administrative domains and hence require network service federation capabilities. At the architectural level, this article presents the service federation functionality of the 5G-TRANSFORMER service orchestrator, covering the gaps identified in ETSI NFV reports and specifications (e.g., IFA028). Some recommendations are also presented based on this experience, particularly on the relevance of multi-domain resource orchestration. Experimental results show that the federated service under evaluation is deployed in less than 5 minutes. Time profiling of the various federation-related processing operations shows their limited impact on the experienced deployment time. A comparison of service deployments of increasing complexity also offers valuable insights. This work has been partially funded by the EC H2020 5G-Transformer Project (grant no. 761536), by MINECO grant TEC2017-88373-R (5G-REFINE) and Generalitat de Catalunya grant 2017 SGR 1195
A contribution to stimulating the use of Cloud Computing solutions: design of a Cloud service broker to foster trustworthy, interoperable and law-compliant distributed digital ecosystems. Application in multi-cloud environments.
184 p. The objective of the research presented in this thesis is to help developers and operators of applications deployed across multiple clouds to discover and manage the different cloud services, supporting their reuse and combination, in order to generate a network of interoperable services that comply with the law and whose service-level agreements can be continuously evaluated. One contribution of this thesis is the design and development of a cloud service broker called ACSmI (Advanced Cloud Services meta-Intermediator). ACSmI makes it possible to evaluate compliance with service-level agreements, including legislation. ACSmI also provides an intermediate abstraction layer for cloud services through which developers can easily access a catalogue of accredited services compatible with the established non-functional requirements. In addition, this research proposes a characterization of multi-cloud native applications and the concept of "extended DevOps", designed specifically for this type of application. The "extended DevOps" concept aims to solve some of the current problems in the design, development, deployment and adaptation of multi-cloud applications by providing a novel, extended DevOps approach that adapts current DevOps practices to the multi-cloud paradigm
mkite: A distributed computing platform for high-throughput materials simulations
Advances in high-throughput simulation (HTS) software enabled computational
databases and big data to become common resources in materials science.
However, while computational power is increasingly larger, software packages
orchestrating complex workflows in heterogeneous environments are scarce. This
paper introduces mkite, a Python package for performing HTS in distributed
computing environments. The mkite toolkit is built with the server-client
pattern, decoupling production databases from client runners. When used in
combination with message brokers, mkite enables any available client to perform
calculations without prior hardware specification on the server side.
Furthermore, the software enables the creation of complex workflows with
multiple inputs and branches, facilitating the exploration of combinatorial
chemical spaces. Software design principles are discussed in detail,
highlighting the usefulness of decoupling simulations and data management tasks
to diversify simulation environments. To exemplify how mkite handles simulation
workflows of combinatorial systems, case studies on zeolite synthesis and
surface catalyst discovery are provided. Finally, key differences with other
atomistic simulation workflows are outlined. The mkite suite can enable HTS in
distributed computing environments, simplifying workflows with heterogeneous
hardware and software, and helping deploy calculations at scale. Comment: preprint; code available soon
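The server-client pattern described in the abstract can be sketched minimally as follows, assuming an in-memory queue standing in for a real message broker; the job format and function names are invented for illustration and are not mkite's actual API. The server publishes job descriptions without specifying hardware, and any available client pulls and executes them.

```python
# Sketch of the broker-mediated server-client pattern: the server
# enqueues jobs; clients drain the queue, so work lands on whichever
# worker is free, with no prior hardware assignment on the server side.
import queue
import threading

jobs = queue.Queue()
results = []

def client_runner(name):
    # Each client pulls jobs until the queue is empty, mimicking
    # heterogeneous workers consuming from a message broker.
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return
        results.append((name, job["recipe"], job["x"] ** 2))
        jobs.task_done()

# "Server" side: publish jobs without naming a target client.
for x in range(6):
    jobs.put({"recipe": "square", "x": x})

workers = [threading.Thread(target=client_runner, args=(f"client-{i}",))
           for i in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(sorted(r[2] for r in results))  # [0, 1, 4, 9, 16, 25]
```

Decoupling production from consumption in this way is what lets a workflow span heterogeneous hardware: the queue, not the server, decides where each calculation runs.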