12 research outputs found

    A horizontally-scalable multiprocessing platform based on Node.js

    This paper presents a scalable web-based platform called Node Scala, which splits and handles requests on a parallel distributed system according to pre-defined use cases. We applied this platform to a client application that visualizes climate data stored in the NoSQL database MongoDB. The design of Node Scala leads to efficient usage of available computing resources and allows the system to scale simply by adding new workers. Performance evaluation of Node Scala demonstrated a gain of up to 74% compared to state-of-the-art techniques. Comment: 8 pages, 7 figures. Accepted for publication as a conference paper for the 13th IEEE International Symposium on Parallel and Distributed Processing with Applications (IEEE ISPA-15).
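    As a rough illustration of splitting one request across workers, the sketch below fans a numeric range out to a thread pool and merges the partial results. It is not Node Scala's implementation: the function names, the summing task, and the use of Python threads are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(start, stop, parts):
    """Split [start, stop) into at most `parts` contiguous, non-empty sub-ranges."""
    size, extra = divmod(stop - start, parts)
    chunks, lo = [], start
    for i in range(parts):
        # The first `extra` chunks take one additional item each.
        hi = lo + size + (1 if i < extra else 0)
        if hi > lo:
            chunks.append((lo, hi))
        lo = hi
    return chunks


def handle_chunk(chunk):
    """Hypothetical worker task: sum the integers in one sub-range."""
    lo, hi = chunk
    return sum(range(lo, hi))


def handle_request(start, stop, workers):
    """Fan one request out to `workers` threads and merge the partial sums."""
    chunks = split_range(start, stop, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(handle_chunk, chunks))
```

    In this sketch, changing the `workers` argument changes only how the range is partitioned, not the merged result.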

    A container orchestration development that optimizes the etherpad collaborative editing tool through a novel management system

    The use of collaborative tools has notably increased recently. It is common to see distinct users who need to work simultaneously on shared documents. In most cases, large companies provide tools whose implementation has been a very complicated and expensive task. Likewise, deploying their platforms requires robust hardware infrastructure. This becomes even more critical when the main target is scalability and high availability. Therefore, this study aims to design and implement a microservices-based collaborative architecture using assembled containers in the cloud, enabling them to deploy Etherpad instances to guarantee high availability. To this end, we developed and optimized a central management system that creates Etherpad instances and continuously interacts with other Etherpad tools running on Docker containers. This design moves from monolithic Etherpad instantiation and handling towards a service architecture in which every Etherpad is offered as a microservice. Furthermore, the management system implements the popular Observer, Factory Method, Proxy, and Service Layer design patterns. This allows users to gain more privacy through access to validations and shared resources. Our results indicate both correct operation in automating the creation of containers for new users who register in the system and a quantifiable improvement in performance. The funding of this research is provided by the Mobility Regulation of the Universidad de las Fuerzas Armadas ESPE, from Sangolquí, Ecuador.

    OpenDataHub: an open dataset management system

    This thesis presents a cloud-based software platform for sharing publicly available scientific datasets. The proposed platform leverages the potential of NoSQL databases and asynchronous IO technologies, such as Node.JS, in order to achieve high performance and flexible solutions. This solution will serve two main groups of users: the dataset providers, who are the researchers responsible for sharing and maintaining datasets, and the dataset users, who are those who wish to access the public data. The former are given tools to easily publish and maintain large volumes of data, whereas the latter are given tools to preview the original data and create subsets of it through filter and aggregation operations. The choice of NoSQL over a more traditional RDBMS emerged from an extended benchmark between a relational database (MySQL) and a NoSQL database (MongoDB) that is also presented in this thesis. The results obtained confirm the theoretical expectation that NoSQL databases are more suitable for the kind of data that our system's users will be handling, i.e., non-homogeneous data structures that can grow very fast. It is envisioned that a platform like this can lead the way to a new era of scientific data sharing where researchers are able to easily share and access all kinds of datasets, and even, in more advanced scenarios, be presented with recommended datasets and existing research results on top of those recommendations.
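    To make the filter and aggregation operations on non-homogeneous records more concrete, here is a minimal in-memory sketch. It does not reproduce the platform's actual interface: the record fields (`station`, `temp`) and both helper functions are hypothetical, and a real deployment would presumably push this work to the database.

```python
from collections import defaultdict


def filter_rows(rows, predicate):
    """Return the subset of records for which `predicate` is true."""
    return [row for row in rows if predicate(row)]


def aggregate_mean(rows, group_key, value_key):
    """Mean of `value_key` per `group_key`, skipping records that lack either field."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [running sum, count]
    for row in rows:
        if group_key in row and value_key in row:
            acc = totals[row[group_key]]
            acc[0] += row[value_key]
            acc[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Hypothetical records with differing fields, as in a schemaless store.
records = [
    {"station": "A", "temp": 10.0},
    {"station": "A", "temp": 20.0},
    {"station": "B", "temp": 5.0, "humidity": 0.4},
    {"station": "B"},  # no temperature reading
]

warm = filter_rows(records, lambda r: r.get("temp", float("-inf")) >= 10.0)
means = aggregate_mean(records, "station", "temp")
```

    Records that are missing a field are skipped rather than rejected, which is one simple way to cope with non-uniform structure.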

    Advanced Elastic Platforms for High Throughput Computing on Container-based and Serverless Infrastructures

    The main objective of this thesis is to allow scientific users to deploy and execute highly-parallel event-driven file-processing serverless applications both in public (e.g. AWS) and in private (e.g. OpenNebula, OpenStack) cloud infrastructures. To achieve this objective, different tools and platforms are developed and integrated to provide scientific users with a way to deploy High Throughput Computing applications based on containers that can benefit from the high elasticity of serverless environments. First, an open-source tool to deploy generic serverless workloads in the AWS public Cloud provider has been created. This tool allows scientific users to benefit from the features of AWS Lambda (e.g. high scalability, event-driven computing) for the deployment and integration of compute-intensive applications that use the Functions as a Service (FaaS) model. Second, an event-driven file-processing high-throughput programming model has been developed to let users deploy generic applications as workflows of functions in serverless architectures, offering transparent data management. Third, in order to overcome the drawbacks of public serverless services, such as limited execution time or computing capabilities, an open-source platform to support FaaS for compute-intensive applications in on-premises Clouds was created. The platform can be automatically deployed on multi-Clouds in order to create highly-parallel event-driven file-processing serverless applications. Finally, in order to assess and validate all the developed tools and platforms, several use cases with business and scientific backgrounds have been tested. Pérez González, AM. (2020). Advanced Elastic Platforms for High Throughput Computing on Container-based and Serverless Infrastructures [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/146365

    Rise of the Planet of Serverless Computing: A Systematic Review

    Serverless computing is an emerging cloud computing paradigm that is being adopted to develop a wide range of software applications. It allows developers to focus on the application logic at the granularity of functions, thereby freeing them from tedious and error-prone infrastructure management. Meanwhile, its unique characteristics pose new challenges to the development and deployment of serverless-based applications. Substantial research effort has been devoted to tackling these challenges. This paper provides a comprehensive literature review to characterize the current research state of serverless computing. Specifically, this paper covers 164 papers on 17 research directions of serverless computing, including performance optimization, programming framework, application migration, multi-cloud development, testing and debugging, etc. It also derives research trends, focus, and commonly-used platforms for serverless computing, as well as promising research opportunities.

    Prototype implementation of an oBPM execution environment: based on the NoSQL database ArangoDB (Prototypische Implementation einer oBPM-Ausführungsumgebung : basierend auf der NoSQL-Datenbank ArangoDB)

    The concept of process automation systems has been an established part of business organizations for decades. Over the years, different application areas have given rise to different concepts for how process models are defined and implemented in automation solutions. Alongside traditional control-flow-based process models, document-centric and artifact-centric modeling concepts have become established. These place the documents and artifacts of a process at the center and focus less on the static control flow of traditional process models. The concept of Opportunistic Business Process Modeling (oBPM) has joined the existing document-centric process models. In this work, a software prototype based on the database system ArangoDB is implemented, on which oBPM-based process models can be defined and executed. The prototype is used to examine how well ArangoDB is suited to implementing an oBPM system with regard to performance, scalability, and other non-functional requirements. To this end, the work first analyzes and summarizes the requirements for an oBPM modeling and execution system. Next, the feature set and possible uses of ArangoDB are examined in order to plan the data structure to be implemented. Various possible system architectures are then evaluated and compared. After the analysis, the implementation is presented in terms of its data structures and application interfaces. The final part of the work shows how the implementation is tested with respect to functional requirements, performance, and scalability.
    The prototype implemented in this work shows that the software components used, ArangoDB in particular, are very well suited to implementing an oBPM system. All functional requirements can be realized in the prototype. In particular, ArangoDB's multi-model concept, which combines document-based and graph-based database concepts, is well suited to persisting the data structures that arise in oBPM modeling. Performance tests based on various usage scenarios show that the performance and scalability achieved by the implemented prototype are not sufficient for production use. Under high load, the system's response time considerably exceeds the target of under 2 seconds defined in the test scenarios. Nevertheless, this work shows that implementing an oBPM-based system for modeling and executing processes is functionally possible and that the database system ArangoDB proves itself as the central unit of an oBPM environment.

    Optimization of performance and energy efficiency in massively parallel systems (Optimización del rendimiento y la eficiencia energética en sistemas masivamente paralelos)

    ABSTRACT Heterogeneous systems are becoming increasingly relevant, due to their performance and energy efficiency capabilities, being present in all types of computing platforms, from embedded devices and servers to HPC nodes in large data centers. Their complexity means that they are usually used under the task paradigm and the host-device programming model. This strongly penalizes accelerator utilization and system energy consumption, and makes it difficult to adapt applications. Co-execution allows all devices to simultaneously compute the same problem, cooperating to consume less time and energy. However, programmers must handle all device management, workload distribution and code portability between systems, significantly complicating their programming. This thesis offers contributions to improve performance and energy efficiency in these massively parallel systems. The proposals address the following generally conflicting objectives: usability and programmability are improved, while ensuring enhanced system abstraction and extensibility, and at the same time performance, scalability and energy efficiency are increased. To achieve this, two runtime systems with completely different approaches are proposed. EngineCL, focused on OpenCL and with a high-level API, provides an extensible modular system and favors maximum compatibility between all types of devices. Its versatility allows it to be adapted to environments for which it was not originally designed, including applications with time-constrained executions or molecular dynamics HPC simulators, such as the one used in an international research center. Considering industrial trends and emphasizing professional applicability, CoexecutorRuntime offers a flexible C++/SYCL-based system that provides co-execution support for oneAPI technology. This runtime brings programmers closer to the problem domain, enabling the exploitation of dynamic adaptive strategies that improve efficiency in all types of applications.
    Funding: This PhD has been supported by the Spanish Ministry of Education (FPU16/03299 grant), the Spanish Science and Technology Commission under contracts TIN2016-76635-C2-2-R and PID2019-105660RB-C22. This work has also been partially supported by the Mont-Blanc 3: European Scalable and Power Efficient HPC Platform based on Low-Power Embedded Technology project (G.A. No. 671697) from the European Union’s Horizon 2020 Research and Innovation Programme (H2020 Programme). Some activities have also been funded by the Spanish Science and Technology Commission under contract TIN2016-81840-REDT (CAPAP-H6 network). The "Integration II: Hybrid programming models" work of Chapter 4 has been partially performed under the Project HPC-EUROPA3 (INFRAIA-2016-1-730897), with the support of the EC Research Innovation Action under the H2020 Programme. In particular, the author gratefully acknowledges the support of the SPMT Department of the High Performance Computing Center Stuttgart (HLRS).
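    The workload-distribution problem that co-execution raises can be illustrated with a simple static split, in which each device receives a contiguous slice of the work sized by its relative speed. This is only a sketch under assumed inputs: the device names and speed figures are hypothetical, and it is not the scheduling used by EngineCL or CoexecutorRuntime, whose dynamic adaptive strategies are not detailed in the abstract.

```python
def proportional_split(total_items, speeds):
    """Give each device a contiguous [start, stop) slice sized by its relative speed.

    `speeds` maps a device name to a positive relative throughput estimate.
    """
    if total_items < 0 or not speeds or any(s <= 0 for s in speeds.values()):
        raise ValueError("need a non-negative item count and positive device speeds")
    total_speed = sum(speeds.values())
    names = list(speeds)
    slices, offset = {}, 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            # The last device takes the remainder so every item is assigned.
            count = total_items - offset
        else:
            count = int(total_items * speeds[name] / total_speed)
        slices[name] = (offset, offset + count)
        offset += count
    return slices


# Hypothetical example: a GPU assumed to be three times as fast as the CPU.
plan = proportional_split(1000, {"cpu": 1.0, "gpu": 3.0})
```

    A static split like this depends on accurate speed estimates up front, which is one motivation for the dynamic strategies that the abstract mentions.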