78 research outputs found

    A customizable multi-agent system for distributed data mining

    Get PDF
    We present a general Multi-Agent System framework for distributed data mining based on a Peer-to-Peer model. Agent protocols are implemented through message-based asynchronous communication. The framework adopts a dynamic load balancing policy that is particularly suitable for irregular search algorithms. A modular design allows a separation of the general-purpose system protocols and software components from the specific data mining algorithm. The experimental evaluation has been carried out on a parallel frequent subgraph mining algorithm, which has shown good scalability performances

    The future of computing beyond Moore's Law.

    Get PDF
    Moore's Law is a techno-economic model that has enabled the information technology industry to double the performance and functionality of digital electronics roughly every 2 years within a fixed cost, power and area. Advances in silicon lithography have enabled this exponential miniaturization of electronics, but, as transistors reach atomic scale and fabrication costs continue to rise, the classical technological driver that has underpinned Moore's Law for 50 years is failing and is anticipated to flatten by 2025. This article provides an updated view of what a post-exascale system will look like and the challenges ahead, based on our most recent understanding of technology roadmaps. It also discusses the tapering of historical improvements, and how it affects options available to continue scaling of successors to the first exascale machine. Lastly, this article covers the many different opportunities and strategies available to continue computing performance improvements in the absence of historical technology drivers. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'

    A survey of high level frameworks in block-structured adaptive mesh refinement packages

    Get PDF
    pre-printOver the last decade block-structured adaptive mesh refinement (SAMR) has found increasing use in large, publicly available codes and frameworks. SAMR frameworks have evolved along different paths. Some have stayed focused on specific domain areas, others have pursued a more general functionality, providing the building blocks for a larger variety of applications. In this survey paper we examine a representative set of SAMR packages and SAMR-based codes that have been in existence for half a decade or more, have a reasonably sized and active user base outside of their home institutions, and are publicly available. The set consists of a mix of SAMR packages and application codes that cover a broad range of scientific domains. We look at their high-level frameworks, their design trade-offs and their approach to dealing with the advent of radical changes in hardware architecture. The codes included in this survey are BoxLib, Cactus, Chombo, Enzo, FLASH, and Uintah

    Parallel Processes in HPX: Designing an Infrastructure for Adaptive Resource Management

    Get PDF
    Advancement in cutting edge technologies have enabled better energy efficiency as well as scaling computational power for the latest High Performance Computing(HPC) systems. However, complexity, due to hybrid architectures as well as emerging classes of applications, have shown poor computational scalability using conventional execution models. Thus alternative means of computation, that addresses the bottlenecks in computation, is warranted. More precisely, dynamic adaptive resource management feature, both from systems as well as application\u27s perspective, is essential for better computational scalability and efficiency. This research presents and expands the notion of Parallel Processes as a placeholder for procedure definitions, targeted at one or more synchronous domains, meta data for computation and resource management as well as infrastructure for dynamic policy deployment. In addition to this, the research presents additional guidelines for a framework for resource management in HPX runtime system. Further, this research also lists design principles for scalability of Active Global Address Space (AGAS), a necessary feature for Parallel Processes. Also, to verify the usefulness of Parallel Processes, a preliminary performance evaluation of different task scheduling policies is carried out using two different applications. The applications used are: Unbalanced Tree Search, a reference dynamic graph application, implemented by this research in HPX and MiniGhost, a reference stencil based application using bulk synchronous parallel model. The results show that different scheduling policies provide better performance for different classes of applications; and for the same application class, in certain instances, one policy fared better than the others, while vice versa in other instances, hence supporting the hypothesis of the need of dynamic adaptive resource management infrastructure, for deploying different policies and task granularities, for scalable distributed computing

    Resiliency in numerical algorithm design for extreme scale simulations

    Get PDF
    This work is based on the seminar titled ‘Resiliency in Numerical Algorithm Design for Extreme Scale Simulations’ held March 1–6, 2020, at Schloss Dagstuhl, that was attended by all the authors. Advanced supercomputing is characterized by very high computation speeds at the cost of involving an enormous amount of resources and costs. A typical large-scale computation running for 48 h on a system consuming 20 MW, as predicted for exascale systems, would consume a million kWh, corresponding to about 100k Euro in energy cost for executing 1023 floating-point operations. It is clearly unacceptable to lose the whole computation if any of the several million parallel processes fails during the execution. Moreover, if a single operation suffers from a bit-flip error, should the whole computation be declared invalid? What about the notion of reproducibility itself: should this core paradigm of science be revised and refined for results that are obtained by large-scale simulation? Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of Petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features as well as specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation and (2) how do we best design the algorithms and software to meet these requirements? While the analysis of use cases can help understand the particular reliability requirements, the construction of remedies is currently wide open. One avenue would be to refine and improve on system- or application-level checkpointing and rollback strategies in the case an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors. These ideas constituted an essential topic of the seminar. The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge. This article gathers a broad range of perspectives on the role of algorithms, applications and systems in achieving resilience for extreme scale simulations. The ultimate goal is to spark novel ideas and encourage the development of concrete solutions for achieving such resilience holistically.Peer Reviewed"Article signat per 36 autors/es: Emmanuel Agullo, Mirco Altenbernd, Hartwig Anzt, Leonardo Bautista-Gomez, Tommaso Benacchio, Luca Bonaventura, Hans-Joachim Bungartz, Sanjay Chatterjee, Florina M. Ciorba, Nathan DeBardeleben, Daniel Drzisga, Sebastian Eibl, Christian Engelmann, Wilfried N. Gansterer, Luc Giraud, Dominik G ̈oddeke, Marco Heisig, Fabienne Jezequel, Nils Kohl, Xiaoye Sherry Li, Romain Lion, Miriam Mehl, Paul Mycek, Michael Obersteiner, Enrique S. Quintana-Ortiz, Francesco Rizzi, Ulrich Rude, Martin Schulz, Fred Fung, Robert Speck, Linda Stals, Keita Teranishi, Samuel Thibault, Dominik Thonnes, Andreas Wagner and Barbara Wohlmuth"Postprint (author's final draft

    A component-based framework for certification of components in a cloud of HPC services

    Get PDF
    HPC Shelfis a proposal of a cloud computing platform to provide component-oriented services for High Performance Computing (HPC) applications. This paper presents a Verification-as-a-Service (VaaS) framework for component certification onHPC Shelf. Certification is aimed at providing higher confidence that components of parallel computing systems ofHPC Shelfbehave as expected according to one or more requirements expressed in their contracts. To this end, new abstractions are introduced, starting with certifier components. They are designed to inspect other components and verify them for different types of functional, non-functional and behavioral requirements. The certification framework is naturally based on parallel computing techniques to speed up verification tasks.NORTE-01-0145- FEDER-000037

    RIACS

    Get PDF
    The Research Institute for Advanced Computer Science (RIACS) was established by the Universities Space Research Association (USRA) at the NASA Ames Research Center (ARC) on June 6, 1983. RIACS is privately operated by USRA, a consortium of universities that serves as a bridge between NASA and the academic community. Under a five-year co-operative agreement with NASA, research at RIACS is focused on areas that are strategically enabling to the Ames Research Center's role as NASA's Center of Excellence for Information Technology. The primary mission of RIACS is charted to carry out research and development in computer science. This work is devoted in the main to tasks that are strategically enabling with respect to NASA's bold mission in space exploration and aeronautics. There are three foci for this work: (1) Automated Reasoning. (2) Human-Centered Computing. and (3) High Performance Computing and Networking. RIACS has the additional goal of broadening the base of researcher in these areas of importance to the nation's space and aeronautics enterprises. Through its visiting scientist program, RIACS facilitates the participation of university-based researchers, including both faculty and students, in the research activities of NASA and RIACS. RIACS researchers work in close collaboration with NASA computer scientists on projects such as the Remote Agent Experiment on Deep Space One mission, and Super-Resolution Surface Modeling

    Modern web-programming language concurrency

    Get PDF
    This Masters Thesis compares Elixir, Go and JavaScript (Node.js) as programming language candi- dates for writing concurrent RESTful webservice backends. First we describe each of the languages. Next we compare the functional concurrency characteristics of the languages to each other. Finally we do scalability testing for each of the languages. Scalability testing is done using the Locust.io framework. For testing purposes we introduce for simple REST-api implementations for each of the languages. Result from the tests was that JavaScript performed the worst of the languages and Go was the most verbose language to program with

    Edge-Cloud Polarization and Collaboration: A Comprehensive Survey for AI

    Full text link
    Influenced by the great success of deep learning via cloud computing and the rapid development of edge chips, research in artificial intelligence (AI) has shifted to both of the computing paradigms, i.e., cloud computing and edge computing. In recent years, we have witnessed significant progress in developing more advanced AI models on cloud servers that surpass traditional deep learning models owing to model innovations (e.g., Transformers, Pretrained families), explosion of training data and soaring computing capabilities. However, edge computing, especially edge and cloud collaborative computing, are still in its infancy to announce their success due to the resource-constrained IoT scenarios with very limited algorithms deployed. In this survey, we conduct a systematic review for both cloud and edge AI. Specifically, we are the first to set up the collaborative learning mechanism for cloud and edge modeling with a thorough review of the architectures that enable such mechanism. We also discuss potentials and practical experiences of some on-going advanced edge AI topics including pretraining models, graph neural networks and reinforcement learning. Finally, we discuss the promising directions and challenges in this field.Comment: 20 pages, Transactions on Knowledge and Data Engineerin

    Serverless Computing Strategies on Cloud Platforms

    Full text link
    [ES] Con el desarrollo de la Computación en la Nube, la entrega de recursos virtualizados a través de Internet ha crecido enormemente en los últimos años. Las Funciones como servicio (FaaS), uno de los modelos de servicio más nuevos dentro de la Computación en la Nube, permite el desarrollo e implementación de aplicaciones basadas en eventos que cubren servicios administrados en Nubes públicas y locales. Los proveedores públicos de Computación en la Nube adoptan el modelo FaaS dentro de su catálogo para proporcionar computación basada en eventos altamente escalable para las aplicaciones. Por un lado, los desarrolladores especializados en esta tecnología se centran en crear marcos de código abierto serverless para evitar el bloqueo con los proveedores de la Nube pública. A pesar del desarrollo logrado por la informática serverless, actualmente hay campos relacionados con el procesamiento de datos y la optimización del rendimiento en la ejecución en los que no se ha explorado todo el potencial. En esta tesis doctoral se definen tres estrategias de computación serverless que permiten evidenciar los beneficios de esta tecnología para el procesamiento de datos. Las estrategias implementadas permiten el análisis de datos con la integración de dispositivos de aceleración para la ejecución eficiente de aplicaciones científicas en plataformas cloud públicas y locales. En primer lugar, se desarrolló la plataforma CloudTrail-Tracker. CloudTrail-Tracker es una plataforma serverless de código abierto basada en eventos para el procesamiento de datos que puede escalar automáticamente hacia arriba y hacia abajo, con la capacidad de escalar a cero para minimizar los costos operativos. Seguidamente, se plantea la integración de GPUs en una plataforma serverless local impulsada por eventos para el procesamiento de datos escalables. La plataforma admite la ejecución de aplicaciones como funciones severless en respuesta a la carga de un archivo en un sistema de almacenamiento de ficheros, lo que permite la ejecución en paralelo de las aplicaciones según los recursos disponibles. Este procesamiento es administrado por un cluster Kubernetes elástico que crece y decrece automáticamente según las necesidades de procesamiento. Ciertos enfoques basados en tecnologías de virtualización de GPU como rCUDA y NVIDIA-Docker se evalúan para acelerar el tiempo de ejecución de las funciones. Finalmente, se implementa otra solución basada en el modelo serverless para ejecutar la fase de inferencia de modelos de aprendizaje automático previamente entrenados, en la plataforma de Amazon Web Services y en una plataforma privada con el framework OSCAR. El sistema crece elásticamente de acuerdo con la demanda y presenta una escalado a cero para minimizar los costes. Por otra parte, el front-end proporciona al usuario una experiencia simplificada en la obtención de la predicción de modelos de aprendizaje automático. Para demostrar las funcionalidades y ventajas de las soluciones propuestas durante esta tesis se recogen varios casos de estudio que abarcan diferentes campos del conocimiento como la analítica de aprendizaje y la Inteligencia Artificial. Esto demuestra que la gama de aplicaciones donde la computación serverless puede aportar grandes beneficios es muy amplia. Los resultados obtenidos avalan el uso del modelo serverless en la simplificación del diseño de arquitecturas para el uso intensivo de datos en aplicaciones complejas.[CA] Amb el desenvolupament de la Computació en el Núvol, el lliurament de recursos virtualitzats a través d'Internet ha crescut granment en els últims anys. Les Funcions com a Servei (FaaS), un dels models de servei més nous dins de la Computació en el Núvol, permet el desenvolupament i implementació d'aplicacions basades en esdeveniments que cobreixen serveis administrats en Núvols públics i locals. Els proveïdors de computació en el Núvol públic adopten el model FaaS dins del seu catàleg per a proporcionar a les aplicacions computació altament escalable basada en esdeveniments. D'una banda, els desenvolupadors especialitzats en aquesta tecnologia se centren en crear marcs de codi obert serverless per a evitar el bloqueig amb els proveïdors del Núvol públic. Malgrat el desenvolupament alcançat per la informàtica serverless, actualment hi ha camps relacionats amb el processament de dades i l'optimització del rendiment d'execució en els quals no s'ha explorat tot el potencial. En aquesta tesi doctoral es defineixen tres estratègies informàtiques serverless que permeten demostrar els beneficis d'aquesta tecnologia per al processament de dades. Les estratègies implementades permeten l'anàlisi de dades amb a integració de dispositius accelerats per a l'execució eficient d'aplicacion scientífiques en plataformes de Núvol públiques i locals. En primer lloc, es va desenvolupar la plataforma CloudTrail-Tracker. CloudTrail-Tracker és una plataforma de codi obert basada en esdeveniments per al processament de dades serverless que pot escalar automáticament cap amunt i cap avall, amb la capacitat d'escalar a zero per a minimitzar els costos operatius. A continuació es planteja la integració de GPUs en una plataforma serverless local impulsada per esdeveniments per al processament de dades escalables. La plataforma admet l'execució d'aplicacions com funcions severless en resposta a la càrrega d'un arxiu en un sistema d'emmagatzemaments de fitxers, la qual cosa permet l'execució en paral·lel de les aplicacions segon sels recursos disponibles. Este processament és administrat per un cluster Kubernetes elàstic que creix i decreix automàticament segons les necessitats de processament. Certs enfocaments basats en tecnologies de virtualització de GPU com rCUDA i NVIDIA-Docker s'avaluen per a accelerar el temps d'execució de les funcions. Finalment s'implementa una altra solució basada en el model serverless per a executar la fase d'inferència de models d'aprenentatge automàtic prèviament entrenats en la plataforma de Amazon Web Services i en una plataforma privada amb el framework OSCAR. El sistema creix elàsticament d'acord amb la demanda i presenta una escalada a zero per a minimitzar els costos. D'altra banda el front-end proporciona a l'usuari una experiència simplificada en l'obtenció de la predicció de models d'aprenentatge automàtic. Per a demostrar les funcionalitats i avantatges de les solucions proposades durant esta tesi s'arrepleguen diversos casos d'estudi que comprenen diferents camps del coneixement com l'analítica d'aprenentatge i la Intel·ligència Artificial. Això demostra que la gamma d'aplicacions on la computació serverless pot aportar grans beneficis és molt àmplia. Els resultats obtinguts avalen l'ús del model serverless en la simplificació del disseny d'arquitectures per a l'ús intensiu de dades en aplicacions complexes.[EN] With the development of Cloud Computing, the delivery of virtualized resources over the Internet has greatly grown in recent years. Functions as a Service (FaaS), one of the newest service models within Cloud Computing, allows the development and implementation of event-based applications that cover managed services in public and on-premises Clouds. Public Cloud Computing providers adopt the FaaS model within their catalog to provide event-driven highly-scalable computing for applications. On the one hand, developers specialized in this technology focus on creating open-source serverless frameworks to avoid the lock-in with public Cloud providers. Despite the development achieved by serverless computing, there are currently fields related to data processing and execution performance optimization where the full potential has not been explored. In this doctoral thesis three serverless computing strategies are defined that allow to demonstrate the benefits of this technology for data processing. The implemented strategies allow the analysis of data with the integration of accelerated devices for the efficient execution of scientific applications on public and on-premises Cloud platforms. Firstly, the CloudTrail-Tracker platform was developed to extract and process learning analytics in the Cloud. CloudTrail-Tracker is an event-driven open-source platform for serverless data processing that can automatically scale up and down, featuring the ability to scale to zero for minimizing the operational costs. Next, the integration of GPUs in an event-driven on-premises serverless platform for scalable data processing is discussed. The platform supports the execution of applications as serverless functions in response to the loading of a file in a file storage system, which allows the parallel execution of applications according to available resources. This processing is managed by an elastic Kubernetes cluster that automatically grows and shrinks according to the processing needs. Certain approaches based on GPU virtualization technologies such as rCUDA and NVIDIA-Docker are evaluated to speed up the execution time of the functions. Finally, another solution based on the serverless model is implemented to run the inference phase of previously trained machine learning models on theAmazon Web Services platform and in a private platform with the OSCAR framework. The system grows elastically according to demand and is scaled to zero to minimize costs. On the other hand, the front-end provides the user with a simplified experience in obtaining the prediction of machine learning models. To demonstrate the functionalities and advantages of the solutions proposed during this thesis, several case studies are collected covering different fields of knowledge such as learning analytics and Artificial Intelligence. This shows the wide range of applications where serverless computing can bring great benefits. The results obtained endorse the use of the serverless model in simplifying the design of architectures for the intensive data processing in complex applications.Naranjo Delgado, DM. (2021). Serverless Computing Strategies on Cloud Platforms [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/160916TESI
    corecore