2,530 research outputs found

    Domain-specific implementation of high-order Discontinuous Galerkin methods in spherical geometry

    Get PDF
    In recent years, domain-specific languages (DSLs) have achieved significant success in large-scale efforts to reimplement existing meteorological models in a performance-portable manner. The dynamical cores of these models are based on finite difference and finite volume schemes, and existing DSLs are generally limited to supporting only these numerical methods. Meanwhile, there have been numerous attempts to use high-order Discontinuous Galerkin (DG) methods for atmospheric dynamics, which are currently largely unsupported in mainstream DSLs. To link these developments, we present two domain-specific languages which extend the existing GridTools (GT) ecosystem to high-order DG discretization. The first is a C++-based DSL called G4GT, which, despite being no longer supported, gave us the impetus to implement extensions to the subsequent Python-based production DSL called GT4Py to support the operations needed for DG solvers. As a proof of concept, the shallow water equations in spherical geometry are implemented in both DSLs, thus providing a blueprint for the application of domain-specific languages to the development of global atmospheric models. We believe this is the first GPU-capable DSL implementation of DG in spherical geometry. The results demonstrate that a DSL designed for finite difference/volume methods can be successfully extended to implement a DG solver, while preserving the performance portability of the DSL. ISSN: 0010-4655; ISSN: 1879-294
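    The abstract does not reproduce the G4GT or GT4Py code itself. Purely as an illustration of the kind of operation such DSL extensions must express, the minimal NumPy sketch below (all sizes and matrices hypothetical) applies a per-element derivative matrix and a stand-in inverse mass matrix to nodal values: the element-local dense algebra at the core of a nodal DG right-hand side, in contrast to the pointwise stencils that finite difference DSLs target.

        import numpy as np

        # Minimal sketch (not the paper's GT4Py/G4GT code): the element-local dense
        # algebra at the heart of a nodal DG discretisation. Each element carries
        # n_nodes solution values; D maps nodal values to nodal derivatives and
        # M_inv is a stand-in inverse mass matrix.
        n_elements, n_nodes = 1024, 4                     # hypothetical sizes
        rng = np.random.default_rng(0)
        D = rng.standard_normal((n_nodes, n_nodes))       # stand-in derivative matrix
        M_inv = np.eye(n_nodes)                           # stand-in inverse mass matrix
        u = rng.standard_normal((n_elements, n_nodes))    # per-element nodal values

        def dg_rhs(u, D, M_inv, flux_scale=1.0):
            """Toy right-hand side: per-element matrix products, the pattern a
            DG-capable DSL must express in addition to pointwise stencils."""
            du = u @ D.T                  # nodal derivatives in every element at once
            return flux_scale * (du @ M_inv.T)

        print(dg_rhs(u, D, M_inv).shape)  # (1024, 4): one update value per element node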

    Serverless Strategies and Tools in the Cloud Computing Continuum

    Full text link
    Thesis by compendium. In recent years, the popularity of Cloud computing has allowed users to access unprecedented compute, network, and storage resources under a pay-per-use model. This popularity has led to new services that solve specific large-scale computing challenges and simplify the development and deployment of applications. Among the most prominent services in recent years are FaaS (Function as a Service) platforms, whose primary appeal is the ease of deploying small pieces of code in certain programming languages to perform specific tasks on an event-driven basis. These functions are executed on the Cloud provider's servers without users worrying about their maintenance or elasticity management, always keeping a fine-grained pay-per-use model. FaaS platforms belong to the computing paradigm known as Serverless, which aims to abstract the management of servers away from users, allowing them to focus their efforts solely on the development of applications. The problem with the FaaS model is that it focuses mainly on microservices and tends to have limitations regarding execution time and computing capabilities (e.g. lack of support for acceleration hardware such as GPUs). However, it has been demonstrated that the self-provisioning capability and high degree of parallelism of these services can be well suited to a broader range of applications. In addition, their inherent event-driven triggering makes functions perfectly suitable to be defined as steps in file-processing workflows (e.g. scientific computing workflows). Furthermore, the rise of smart and embedded devices (IoT), innovations in communication networks, and the need to reduce latency in challenging use cases have led to the concept of Edge computing, which consists of processing on devices close to the data sources to improve response times. The coupling of this paradigm with Cloud computing, involving architectures with devices at different levels depending on their proximity to the source and their compute capability, has been coined the Cloud Computing Continuum (or Computing Continuum). Therefore, this PhD thesis aims to apply different Serverless strategies to enable the deployment of generalist applications, packaged in software containers, across the different tiers of the Cloud Computing Continuum. To this end, multiple tools have been developed in order to: i) adapt FaaS services from public Cloud providers; ii) integrate different software components to define a Serverless platform on on-premises and Edge infrastructures; iii) leverage acceleration devices on Serverless platforms; and iv) facilitate the deployment of applications and workflows through user interfaces. Additionally, several use cases have been created and adapted to assess the developments achieved.
    Risco Gallardo, S. (2023). Serverless Strategies and Tools in the Cloud Computing Continuum [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/202013
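    The thesis text itself is not reproduced here. Purely as an illustration of the event-driven FaaS style it describes, the sketch below shows a generic, provider-agnostic Python handler that processes one event (for example a file-upload notification) and hands metadata to the next step of a workflow; the event shape, field names, and return format are hypothetical.

        import json

        # Hypothetical, provider-agnostic handler in the FaaS style described above:
        # a small function triggered per event, performing one processing step and
        # returning metadata that a workflow engine could pass to the next step.
        def handler(event, context=None):
            body = event.get("body")
            record = json.loads(body) if isinstance(body, str) else event
            input_key = record.get("object_key", "unknown")
            # ... one processing step on the referenced object would go here ...
            return {
                "statusCode": 200,
                "body": json.dumps({"processed": input_key, "next_step": "post-process"}),
            }

        # Local invocation with a synthetic event, as an HTTP gateway might deliver it.
        print(handler({"body": json.dumps({"object_key": "samples/input-001.dat"})}))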

    Exploring Hardware Fault Impacts on Different Real Number Representations of the Structural Resilience of TCUs in GPUs

    Get PDF
    The most recent generations of graphics processing units (GPUs) boost the execution of the convolutional operations required by machine learning applications by resorting to specialized and efficient in-chip accelerators (Tensor Core Units, or TCUs) that operate on matrix multiplication tiles. Unfortunately, modern cutting-edge semiconductor technologies are increasingly prone to hardware defects, and the trend to highly stress TCUs during the execution of safety-critical and high-performance computing (HPC) applications increases the likelihood of TCUs producing different kinds of failures. In fact, the intrinsic resiliency of arithmetic units to hardware faults plays a crucial role in safety-critical applications using GPUs (e.g., in automotive, space, and autonomous robotics). Recently, new arithmetic formats have been proposed, particularly those suited to neural network execution. However, a reliability characterization of TCUs supporting different arithmetic formats was still lacking. In this work, we quantitatively assessed the impact of hardware faults in TCU structures while employing two distinct formats (floating-point and posit) and two different configurations (16 and 32 bits) to represent real numbers. For the experimental evaluation, we resorted to an architectural description of a TCU core (PyOpenTCU) and performed 120 fault simulation campaigns, injecting around 200,000 faults per campaign and requiring around 32 days of computation. Our results demonstrate that the posit format is less affected by faults in TCUs than the floating-point one (by up to three orders of magnitude for 16 bits and up to twenty orders of magnitude for 32 bits). We also identified the most sensitive fault locations (i.e., those that produce the largest errors), thus paving the way for adopting smart hardening solutions.
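    PyOpenTCU and the paper's fault-simulation campaigns are not shown here. As a loose, CPU-only illustration of the underlying idea, the sketch below flips a single bit of one float16 operand in a small matrix-multiply tile and measures how far the faulty result drifts from the fault-free one; the tile size, target bit, and error metric are illustrative only, and no posit arithmetic is included.

        import numpy as np

        # Toy single-fault injection (not PyOpenTCU): corrupt one bit of one float16
        # operand of a 4x4 tile multiply and compare against the fault-free result.
        rng = np.random.default_rng(1)
        A = rng.standard_normal((4, 4)).astype(np.float16)
        B = rng.standard_normal((4, 4)).astype(np.float16)
        golden = A.astype(np.float32) @ B.astype(np.float32)

        A_faulty = A.copy()
        A_faulty.view(np.uint16)[0, 0] ^= np.uint16(1 << 14)   # flip a high exponent bit
        faulty = A_faulty.astype(np.float32) @ B.astype(np.float32)

        print("max abs error after one bit flip:", np.abs(faulty - golden).max())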

    GPU-optimized approaches to molecular docking-based virtual screening in drug discovery: A comparative analysis

    Get PDF
    Finding a novel drug is a very long and complex procedure. Using computer simulations, it is possible to accelerate the preliminary phases by performing a virtual screening that filters a large set of drug candidates down to a manageable number. This paper presents and compares two GPU-optimized implementations of a virtual screening algorithm targeting novel GPU architectures. The work focuses on the analysis of parallel computation patterns and their mapping onto the target architecture. The first method adopts a traditional approach that spreads the computation for a single molecule across the entire GPU. The second uses a novel batched approach that exploits the parallel architecture of the GPU to evaluate more molecules in parallel. Experimental results showed different behavior depending on the size of the database to be screened: one implementation reaches its performance plateau sooner, while the other has a longer initial transient period but achieves a higher throughput (up to 5x), making it more suitable for extreme-scale virtual screening campaigns.
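    The paper's CUDA kernels are not reproduced here. The following CPU/NumPy sketch (scoring function and sizes are toy stand-ins) only contrasts the two mapping patterns at a conceptual level: scoring one molecule per call versus scoring a whole batch of molecules in a single, regular computation that keeps wide hardware busy.

        import numpy as np

        # Conceptual CPU/NumPy sketch (not the paper's GPU code) of the two mapping
        # patterns: per-molecule scoring versus batched scoring of many molecules.
        rng = np.random.default_rng(2)
        n_molecules, n_atoms = 10_000, 64
        coords = rng.standard_normal((n_molecules, n_atoms, 3))   # toy ligand poses
        site = rng.standard_normal(3)                             # toy binding-site point

        def score_single(mol):
            """Score one molecule against the site (one 'launch' per molecule)."""
            return np.exp(-np.linalg.norm(mol - site, axis=-1)).sum()

        def score_batched(mols):
            """Score all molecules at once (one large batched computation)."""
            return np.exp(-np.linalg.norm(mols - site, axis=-1)).sum(axis=-1)

        one_by_one = np.array([score_single(m) for m in coords])
        batched = score_batched(coords)
        assert np.allclose(one_by_one, batched)   # same result, different mapping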

    Modern computing: Vision and challenges

    Get PDF
    Over the past six decades, the computing systems field has experienced significant transformations that have profoundly impacted society, with developments such as the Internet and the commodification of computing. Underpinned by technological advancements, computer systems, far from being static, have continuously evolved and adapted to cover multifaceted societal niches. This has led to new paradigms such as cloud, fog, and edge computing and the Internet of Things (IoT), which offer fresh economic and creative opportunities. Nevertheless, this rapid change poses complex research challenges, especially in maximizing potential and enhancing functionality. As such, to maintain an economical level of performance that meets ever-tighter requirements, one must understand the drivers of new model emergence and expansion, and how contemporary challenges differ from past ones. To that end, this article investigates and assesses the factors influencing the evolution of computing systems, covering established systems and architectures as well as newer developments such as serverless computing, quantum computing, and on-device AI at the edge. Tracing the technological trajectory reveals several trends: the rapid obsolescence of frameworks due to business and technical constraints, a move towards specialized systems and models, and varying approaches to centralized and decentralized control. This comprehensive review of modern computing systems looks ahead to the future of research in the field, highlighting key challenges and emerging trends, and underscoring their importance in cost-effectively driving technological progress.

    Risk and threat mitigation techniques in internet of things (IoT) environments: a survey

    Get PDF
    Security in the Internet of Things (IoT) remains a predominant area of concern. Although several other surveys have been published on this topic in recent years, the broad spectrum the area covers, the rapid pace of development, and the variety of concerns make it impossible for any single survey to cover the topic adequately. This survey updates the state of the art covered in previous surveys and focuses on defences and mitigations against threats rather than on the threats alone, an area less extensively covered by other surveys. It collates current research on the dynamicity of the IoT environment, a topic missed in other surveys that warrants particular attention. To account for IoT mobility, a life-cycle approach is adopted to study dynamic and mobile IoT environments and the means of deploying defences against malicious actors who aim to compromise an IoT network and to move their attack laterally within it and beyond it. The survey takes a more comprehensive and detailed step by analysing a broad variety of methods for accomplishing each of the mitigation steps, presenting these uniquely by introducing a “defence-in-depth” approach that could significantly slow down the progress of an attack in the dynamic IoT environment. It also sheds light on leveraging redundancy, an inherent property of multi-sensor IoT applications, to improve integrity and recovery. Finally, this study highlights the challenges of each mitigation step, emphasises novel perspectives, and reconnects the discussed mitigation steps to the ground principles they seek to implement.
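    The survey's mitigation steps are only summarised above. As a small, purely illustrative sketch of the redundancy point (function name, values, and threshold hypothetical), the snippet below median-votes several sensors that observe the same quantity, masking a single faulty or tampered reading and flagging unusual disagreement.

        import statistics

        # Illustrative only: median voting over redundant sensor readings masks a
        # single faulty or tampered value; a large spread is flagged for inspection.
        def fused_reading(readings, spread_limit=5.0):
            voted = statistics.median(readings)
            suspicious = max(readings) - min(readings) > spread_limit
            return voted, suspicious

        print(fused_reading([21.2, 21.4, 85.0]))   # -> (21.4, True): outlier masked, flagged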

    In-depth Analysis On Parallel Processing Patterns for High-Performance Dataframes

    Full text link
    The Data Science domain has expanded monumentally in both the research and industry communities during the past decade, predominantly owing to the Big Data revolution. Artificial Intelligence (AI) and Machine Learning (ML) are bringing more complexity to data engineering applications, which are now integrated into data processing pipelines to process terabytes of data. Typically, a significant amount of time in these pipelines is spent on data preprocessing, so improving its efficiency directly impacts overall pipeline performance. The community has recently embraced Dataframes as the de facto data structure for data representation and manipulation. However, the most widely used serial Dataframes today (R, pandas) experience performance limitations even on moderately large data sets. We believe there is plenty of room for improvement by looking at this problem from a high-performance computing point of view. In a prior publication, we presented a set of parallel processing patterns for distributed dataframe operators and the reference runtime implementation, Cylon [1]. In this paper, we expand on the initial concept by introducing a cost model for evaluating the said patterns. Furthermore, we evaluate the performance of Cylon on the ORNL Summit supercomputer.
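    Cylon's actual operators and cost model are not reproduced here. The toy Python sketch below (function and variable names hypothetical) only illustrates the hash-partition shuffle pattern that distributed dataframe operators such as joins and group-bys build on: rows are routed to workers by hashing the key, so each partition can then be processed independently.

        from collections import defaultdict

        # Toy hash-partition shuffle (not Cylon's API): route rows to workers by
        # hashing the key column, so per-partition work can proceed independently.
        def hash_partition(rows, key, n_workers):
            parts = defaultdict(list)
            for row in rows:
                parts[hash(row[key]) % n_workers].append(row)
            return parts

        rows = [{"id": i, "value": i * i} for i in range(10)]
        for worker, part in sorted(hash_partition(rows, "id", 3).items()):
            print(worker, [r["id"] for r in part])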

    Guided rewriting and constraint satisfaction for parallel GPU code generation

    Get PDF
    Graphics Processing Units (GPUs) are notoriously hard to optimise for manually due to their scheduling and memory hierarchies. What is needed are good automatic code generators and optimisers for such parallel hardware. Functional approaches such as Accelerate, Futhark and LIFT leverage a high-level algorithmic Intermediate Representation (IR) to expose parallelism and abstract the implementation details away from the user. However, producing efficient code for a given accelerator remains challenging. Existing code generators either depend on user input to choose from a subset of hard-coded optimisations, or rely on automated exploration of the implementation search space. The former suffers from a lack of extensibility, while the latter is too costly due to the size of the search space. A hybrid approach is needed, in which a space of valid implementations is built automatically and explored with the aid of human expertise. This thesis presents a solution that combines user-guided rewriting with automatically generated constraints to produce high-performance code. The first contribution is an automatic tuning technique that finds a balance between performance and memory consumption. Leveraging its functional patterns, the LIFT compiler is empowered to infer tuning constraints and limit the search to valid tuning combinations only. Next, the thesis reframes parallelisation as a constraint satisfaction problem. Parallelisation constraints are extracted automatically from the input expression, and a solver is used to identify valid rewritings. The constraints truncate the search space to valid parallel mappings only by capturing the scheduling restrictions of the GPU in the context of a given program. A synchronisation barrier insertion technique is proposed to prevent data races and improve the efficiency of the generated parallel mappings. The final contribution of this thesis is the guided rewriting method, in which the user encodes a design space of structural transformations using high-level IR nodes called rewrite points. These strongly typed pragmas express macro rewrites and expose design choices as explorable parameters. The thesis proposes a small set of reusable rewrite points to achieve tiling, cache locality, data reuse and memory optimisation. A comparison with the vendor-provided handwritten kernels of the ARM Compute Library and the TVM code generator demonstrates the effectiveness of this thesis' contributions. With convolution as a use case, LIFT-generated direct and GEMM-based convolution implementations are shown to perform on par with the state-of-the-art solutions on a mobile GPU. Overall, this thesis demonstrates that a functional IR lends itself well to user-guided and automatic rewriting for high-performance code generation.
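    The thesis' constraint formulation is not reproduced here. As a deliberately simplified illustration of how constraints can prune an implementation search space down to valid mappings, the sketch below enumerates hypothetical work-group and tile sizes, keeps only the combinations that satisfy illustrative hardware-style limits, and ranks the survivors by a crude cost proxy; the limits, sizes, and proxy bear no relation to LIFT's actual solver.

        from itertools import product

        # Toy constraint-pruned search (not LIFT's solver): keep only work-group and
        # tile-size combinations that satisfy illustrative hardware-style limits.
        MAX_WORKGROUP = 256            # threads per work-group (illustrative)
        LOCAL_MEM_BYTES = 48 * 1024    # local memory budget (illustrative)

        def valid(wg_x, wg_y, tile):
            threads = wg_x * wg_y
            staged = 2 * tile * tile * 4           # two float32 tiles staged locally
            return (threads <= MAX_WORKGROUP
                    and tile % wg_x == 0 and tile % wg_y == 0
                    and staged <= LOCAL_MEM_BYTES)

        candidates = [(x, y, t)
                      for x, y, t in product([4, 8, 16, 32], [4, 8, 16, 32], [16, 32, 64])
                      if valid(x, y, t)]
        best = max(candidates, key=lambda c: c[0] * c[1])   # crude proxy: more threads
        print(len(candidates), "valid mappings; best by proxy:", best)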