50 research outputs found

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 261, ICALP 2023, Complete Volum

    LIPIcs, Volume 274, ESA 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 274, ESA 2023, Complete Volum

    LIPIcs, Volume 258, SoCG 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 258, SoCG 2023, Complete Volum

    Leveraging disaggregated accelerators and non-volatile memories to improve the efficiency of modern datacenters

    Get PDF
    (English) Traditional data centers consist of computing nodes that possess all the resources physically attached. When there was the need to deal with more significant demands, the solution has been to either add more nodes (scaling out) or increase the capacity of existing ones (scaling-up). Workload requirements are traditionally fulfilled by selecting compute platforms from pools that better satisfy their average or maximum resource requirements depending on the price that the user is willing to pay. The amount of processor, memory, storage, and network bandwidth of a selected platform needs to meet or exceed the platform requirements of the workload. Beyond those explicitly required by the workload, additional resources are considered stranded resources (if not used) or bonus resources (if used). Meanwhile, workloads in all market segments have evolved significantly during the last decades. Today, workloads have a larger variety of requirements in terms of characteristics related to the computing platforms. Those workload new requirements include new technologies such as GPU, FPGA, NVMe, etc. These new technologies are more expensive and thus become more limited. It is no longer feasible to increase the number of resources according to potential peak demands, as this significantly raises the total cost of ownership. Software-Defined-Infrastructures (SDI), a new concept for the data center architecture, is being developed to address those issues. The main SDI proposition is to disaggregate all the resources over the fabric to enable the required flexibility. On SDI, instead of pools of computational nodes, the pools consist of individual units of resources (CPU, memory, FPGA, NVMe, GPU, etc.). When an application needs to be executed, SDI identifies the computational requirements and assembles all the resources required, creating a composite node. Resource disaggregation brings new challenges and opportunities that this thesis will explore. This thesis demonstrates that resource disaggregation brings opportunities to increase the efficiency of modern data centers. This thesis demonstrates that resource disaggregation may increase workloads' performance when sharing a single resource. Thus, needing fewer resources to achieve similar results. On the other hand, this thesis demonstrates how through disaggregation, aggregation of resources can be made, increasing a workload's performance. However, to take maximum advantage of those characteristics and flexibility, orchestrators must be aware of them. This thesis demonstrates how workload-aware techniques applied at the resource management level allow for improved quality of service leveraging resource disaggregation. Enabling resource disaggregation, this thesis demonstrates a reduction of up to 49% missed deadlines compared to a traditional schema. This reduction can rise up to 100% when enabling workload awareness. Moreover, this thesis demonstrates that GPU partitioning and disaggregation further enhances the data center flexibility. This increased flexibility can achieve the same results with half the resources. That is, with a single physical GPU partitioned and disaggregated, the same results can be achieved with 2 GPU disaggregated but not partitioned. Finally, this thesis demonstrates that resource fragmentation becomes key when having a limited set of heterogeneous resources, namely NVMe and GPU. For the case of an heterogeneous set of resources, and specifically when some of those resources are highly demanded but limited in quantity. That is, the situation where the demand for a resource is unexpectedly high, this thesis proposes a technique to minimize fragmentation that reduces deadlines missed compared to a disaggregation-aware policy of up to 86%.(Català) Els datacenters tradicionals consisteixen en un seguit de nodes computacionals que contenen al seu interior tots els recursos necessaris. Quan hi ha una necessitat de gestionar demandes superiors la solució era o afegir més nodes (scale-out) o incrementar la capacitat dels existents (scale-up). Els requisits de les aplicacions tradicionalment són satisfets seleccionant recursos de racks que satisfan millor el seu SLA basats o en la mitjana dels requisits o en el màxim possible, en funció del preu que l'usuari estigui disposat a pagar. La quantitat de processadors, memòria, disc, i banda d'ampla d'un rack necessita satisfer o excedir els requisits de l'aplicació. Els recursos addicionals als requerits per les aplicacions són considerats inactius (si no es fan servir) o addicionals (si es fan servir). Per altra banda, les aplicacions en tots els segments de mercat han evolucionat significativament en les últimes dècades. Avui en dia, les aplicacions tenen una gran varietat de requisits en termes de característiques que ha de tenir la infraestructura. Aquests nous requisits inclouen tecnologies com GPU, FPGA, NVMe, etc. Aquestes tecnologies són més cares i, per tant, més limitades. Ja no és factible incrementar el nombre de recursos segons el potencial pic de demanda, ja que això incrementa significativament el cost total de la infraestructura. Software-Defined Infrastructures és un nou concepte per a l'arquitectura de datacenters que s'està desenvolupant per pal·liar aquests problemes. La proposició principal de SDI és desagregar tots els recursos sobre la xarxa per garantir una major flexibilitat. Sota SDI, en comptes de racks de nodes computacionals, els racks consisteix en unitats individuals de recursos (CPU, memòria, FPGA, NVMe, GPU, etc). Quan una aplicació necessita executar, SDI identifica els requisits computacionals i munta una plataforma amb tots els recursos necessaris, creant un node composat. La desagregació de recursos porta nous reptes i oportunitats que s'exploren en aquesta tesi. Aquesta tesi demostra que la desagregació de recursos ens dona l'oportunitat d'incrementar l'eficiència dels datacenters moderns. Aquesta tesi demostra la desagregació pot incrementar el rendiment de les aplicacions. Però per treure el màxim partit a aquestes característiques i d'aquesta flexibilitat, els orquestradors n'han de ser conscient. Aquesta tesi demostra que aplicant tècniques conscients de l'aplicació aplicades a la gestió de recursos permeten millorar la qualitat del servei a través de la desagregació de recursos. Habilitar la desagregació de recursos porta a una reducció de fins al 49% els deadlines perduts comparat a una política tradicional. Aquesta reducció pot incrementar-se fins al 100% quan s'habilita la consciència de l'aplicació. A més a més, aquesta tesi demostra que el particionat de GPU combinat amb la desagregació millora encara més la flexibilitat. Aquesta millora permet aconseguir els mateixos resultats amb la meitat de recursos. És a dir, amb una sola GPU física particionada i desagregada, els mateixos resultats són obtinguts que utilitzant-ne dues desagregades però no particionades. Finalment, aquesta tesi demostra que la gestió de la fragmentació de recursos és una peça clau quan la quantitat de recursos és limitada en un conjunt heterogeni de recursos. Pel cas d'un conjunt heterogeni de recursos, i especialment quan aquests recursos tenen molta demanda però són limitats en quantitat. És a dir, quan la demanda pels recursos és inesperadament alta, aquesta tesi proposa una tècnica minimitzant la fragmentació que redueix els deadlines perduts comparats a una política de desagregació de fins al 86%.Arquitectura de computador

    Proyecto Docente e Investigador, Trabajo Original de Investigación y Presentación de la Defensa, preparado por Germán Moltó para concursar a la plaza de Catedrático de Universidad, concurso 082/22, plaza 6708, área de Ciencia de la Computación e Inteligencia Artificial

    Full text link
    Este documento contiene el proyecto docente e investigador del candidato Germán Moltó Martínez presentado como requisito para el concurso de acceso a plazas de Cuerpos Docentes Universitarios. Concretamente, el documento se centra en el concurso para la plaza 6708 de Catedrático de Universidad en el área de Ciencia de la Computación en el Departamento de Sistemas Informáticos y Computación de la Universitat Politécnica de València. La plaza está adscrita a la Escola Técnica Superior d'Enginyeria Informàtica y tiene como perfil las asignaturas "Infraestructuras de Cloud Público" y "Estructuras de Datos y Algoritmos".También se incluye el Historial Académico, Docente e Investigador, así como la presentación usada durante la defensa.Germán Moltó Martínez (2022). Proyecto Docente e Investigador, Trabajo Original de Investigación y Presentación de la Defensa, preparado por Germán Moltó para concursar a la plaza de Catedrático de Universidad, concurso 082/22, plaza 6708, área de Ciencia de la Computación e Inteligencia Artificial. http://hdl.handle.net/10251/18903

    Acceleration of Computational Geometry Algorithms for High Performance Computing Based Geo-Spatial Big Data Analysis

    Get PDF
    Geo-Spatial computing and data analysis is the branch of computer science that deals with real world location-based data. Computational geometry algorithms are algorithms that process geometry/shapes and is one of the pillars of geo-spatial computing. Real world map and location-based data can be huge in size and the data structures used to process them extremely big leading to huge computational costs. Furthermore, Geo-Spatial datasets are growing on all V’s (Volume, Variety, Value, etc.) and are becoming larger and more complex to process in-turn demanding more computational resources. High Performance Computing is a way to breakdown the problem in ways that it can run in parallel on big computers with massive processing power and hence reduce the computing time delivering the same results but much faster.This dissertation explores different techniques to accelerate the processing of computational geometry algorithms and geo-spatial computing like using Many-core Graphics Processing Units (GPU), Multi-core Central Processing Units (CPU), Multi-node setup with Message Passing Interface (MPI), Cache optimizations, Memory and Communication optimizations, load balancing, Algorithmic Modifications, Directive based parallelization with OpenMP or OpenACC and Vectorization with compiler intrinsic (AVX). This dissertation has applied at least one of the mentioned techniques to the following problems. Novel method to parallelize plane sweep based geometric intersection for GPU with directives is presented. Parallelization of plane sweep based Voronoi construction, parallelization of Segment tree construction, Segment tree queries and Segment tree-based operations has been presented. Spatial autocorrelation, computation of getis-ord hotspots are also presented. Acceleration performance and speedup results are presented in each corresponding chapter

    Design Space Exploration and Resource Management of Multi/Many-Core Systems

    Get PDF
    The increasing demand of processing a higher number of applications and related data on computing platforms has resulted in reliance on multi-/many-core chips as they facilitate parallel processing. However, there is a desire for these platforms to be energy-efficient and reliable, and they need to perform secure computations for the interest of the whole community. This book provides perspectives on the aforementioned aspects from leading researchers in terms of state-of-the-art contributions and upcoming trends

    Partial aggregation for collective communication in distributed memory machines

    Get PDF
    High Performance Computing (HPC) systems interconnect a large number of Processing Elements (PEs) in high-bandwidth networks to simulate complex scientific problems. The increasing scale of HPC systems poses great challenges on algorithm designers. As the average distance between PEs increases, data movement across hierarchical memory subsystems introduces high latency. Minimizing latency is particularly challenging in collective communications, where many PEs may interact in complex communication patterns. Although collective communications can be optimized for network-level parallelism, occasional synchronization delays due to dependencies in the communication pattern degrade application performance. To reduce the performance impact of communication and synchronization costs, parallel algorithms are designed with sophisticated latency hiding techniques. The principle is to interleave computation with asynchronous communication, which increases the overall occupancy of compute cores. However, collective communication primitives abstract parallelism which limits the integration of latency hiding techniques. Approaches to work around these limitations either modify the algorithmic structure of application codes, or replace collective primitives with verbose low-level communication calls. While these approaches give fine-grained control for latency hiding, implementing collective communication algorithms is challenging and requires expertise knowledge about HPC network topologies. A collective communication pattern is commonly described as a Directed Acyclic Graph (DAG) where a set of PEs, represented as vertices, resolve data dependencies through communication along the edges. Our approach improves latency hiding in collective communication through partial aggregation. Based on mathematical rules of binary operations and homomorphism, we expose data parallelism in a respective DAG to overlap computation with communication. The proposed concepts are implemented and evaluated with a subset of collective primitives in the Message Passing Interface (MPI), an established communication standard in scientific computing. An experimental analysis with communication-bound microbenchmarks shows considerable performance benefits for the evaluated collective primitives. A detailed case study with a large-scale distributed sort algorithm demonstrates, how partial aggregation significantly improves performance in data-intensive scenarios. Besides better latency hiding capabilities with collective communication primitives, our approach enables further optimizations of their implementations within MPI libraries. The vast amount of asynchronous programming models, which are actively studied in the HPC community, benefit from partial aggregation in collective communication patterns. Future work can utilize partial aggregation to improve the interaction of MPI collectives with acclerator architectures, and to design more efficient communication algorithms

    Mobile Ad-Hoc Networks

    Get PDF
    Being infrastructure-less and without central administration control, wireless ad-hoc networking is playing a more and more important role in extending the coverage of traditional wireless infrastructure (cellular networks, wireless LAN, etc). This book includes state-of the-art techniques and solutions for wireless ad-hoc networks. It focuses on the following topics in ad-hoc networks: vehicular ad-hoc networks, security and caching, TCP in ad-hoc networks and emerging applications. It is targeted to provide network engineers and researchers with design guidelines for large scale wireless ad hoc networks

    Resource Management in Multi-Access Edge Computing (MEC)

    Get PDF
    This PhD thesis investigates the effective ways of managing the resources of a Multi-Access Edge Computing Platform (MEC) in 5th Generation Mobile Communication (5G) networks. The main characteristics of MEC include distributed nature, proximity to users, and high availability. Based on these key features, solutions have been proposed for effective resource management. In this research, two aspects of resource management in MEC have been addressed. They are the computational resource and the caching resource which corresponds to the services provided by the MEC. MEC is a new 5G enabling technology proposed to reduce latency by bringing cloud computing capability closer to end-user Internet of Things (IoT) and mobile devices. MEC would support latency-critical user applications such as driverless cars and e-health. These applications will depend on resources and services provided by the MEC. However, MEC has limited computational and storage resources compared to the cloud. Therefore, it is important to ensure a reliable MEC network communication during resource provisioning by eradicating the chances of deadlock. Deadlock may occur due to a huge number of devices contending for a limited amount of resources if adequate measures are not put in place. It is crucial to eradicate deadlock while scheduling and provisioning resources on MEC to achieve a highly reliable and readily available system to support latency-critical applications. In this research, a deadlock avoidance resource provisioning algorithm has been proposed for industrial IoT devices using MEC platforms to ensure higher reliability of network interactions. The proposed scheme incorporates Banker’s resource-request algorithm using Software Defined Networking (SDN) to reduce communication overhead. Simulation and experimental results have shown that system deadlock can be prevented by applying the proposed algorithm which ultimately leads to a more reliable network interaction between mobile stations and MEC platforms. Additionally, this research explores the use of MEC as a caching platform as it is proclaimed as a key technology for reducing service processing delays in 5G networks. Caching on MEC decreases service latency and improve data content access by allowing direct content delivery through the edge without fetching data from the remote server. Caching on MEC is also deemed as an effective approach that guarantees more reachability due to proximity to endusers. In this regard, a novel hybrid content caching algorithm has been proposed for MEC platforms to increase their caching efficiency. The proposed algorithm is a unification of a modified Belady’s algorithm and a distributed cooperative caching algorithm to improve data access while reducing latency. A polynomial fit algorithm with Lagrange interpolation is employed to predict future request references for Belady’s algorithm. Experimental results show that the proposed algorithm obtains 4% more cache hits due to its selective caching approach when compared with case study algorithms. Results also show that the use of a cooperative algorithm can improve the total cache hits up to 80%. Furthermore, this thesis has also explored another predictive caching scheme to further improve caching efficiency. The motivation was to investigate another predictive caching approach as an improvement to the formal. A Predictive Collaborative Replacement (PCR) caching framework has been proposed as a result which consists of three schemes. Each of the schemes addresses a particular problem. The proactive predictive scheme has been proposed to address the problem of continuous change in cache popularity trends. The collaborative scheme addresses the problem of cache redundancy in the collaborative space. Finally, the replacement scheme is a solution to evict cold cache blocks and increase hit ratio. Simulation experiment has shown that the replacement scheme achieves 3% more cache hits than existing replacement algorithms such as Least Recently Used, Multi Queue and Frequency-based replacement. PCR algorithm has been tested using a real dataset (MovieLens20M dataset) and compared with an existing contemporary predictive algorithm. Results show that PCR performs better with a 25% increase in hit ratio and a 10% CPU utilization overhead
    corecore