510 research outputs found

    Serverless Strategies and Tools in the Cloud Computing Continuum

    Full text link
    Tesis por compendio[ES] En los últimos años, la popularidad de la computación en nube ha permitido a los usuarios acceder a recursos de cómputo, red y almacenamiento sin precedentes bajo un modelo de pago por uso. Esta popularidad ha propiciado la aparición de nuevos servicios para resolver determinados problemas informáticos a gran escala y simplificar el desarrollo y el despliegue de aplicaciones. Entre los servicios más destacados en los últimos años se encuentran las plataformas FaaS (Función como Servicio), cuyo principal atractivo es la facilidad de despliegue de pequeños fragmentos de código en determinados lenguajes de programación para realizar tareas específicas en respuesta a eventos. Estas funciones son ejecutadas en los servidores del proveedor Cloud sin que los usuarios se preocupen de su mantenimiento ni de la gestión de su elasticidad, manteniendo siempre un modelo de pago por uso de grano fino. Las plataformas FaaS pertenecen al paradigma informático conocido como Serverless, cuyo propósito es abstraer la gestión de servidores por parte de los usuarios, permitiéndoles centrar sus esfuerzos únicamente en el desarrollo de aplicaciones. El problema del modelo FaaS es que está enfocado principalmente en microservicios y tiende a tener limitaciones en el tiempo de ejecución y en las capacidades de computación (por ejemplo, carece de soporte para hardware de aceleración como GPUs). Sin embargo, se ha demostrado que la capacidad de autoaprovisionamiento y el alto grado de paralelismo de estos servicios pueden ser muy adecuados para una mayor variedad de aplicaciones. Además, su inherente ejecución dirigida por eventos hace que las funciones sean perfectamente adecuadas para ser definidas como pasos en flujos de trabajo de procesamiento de archivos (por ejemplo, flujos de trabajo de computación científica). Por otra parte, el auge de los dispositivos inteligentes e integrados (IoT), las innovaciones en las redes de comunicación y la necesidad de reducir la latencia en casos de uso complejos han dado lugar al concepto de Edge computing, o computación en el borde. El Edge computing consiste en el procesamiento en dispositivos cercanos a las fuentes de datos para mejorar los tiempos de respuesta. La combinación de este paradigma con la computación en nube, formando arquitecturas con dispositivos a distintos niveles en función de su proximidad a la fuente y su capacidad de cómputo, se ha acuñado como continuo de la computación en la nube (o continuo computacional). Esta tesis doctoral pretende, por lo tanto, aplicar diferentes estrategias Serverless para permitir el despliegue de aplicaciones generalistas, empaquetadas en contenedores de software, a través de los diferentes niveles del continuo computacional. Para ello, se han desarrollado múltiples herramientas con el fin de: i) adaptar servicios FaaS de proveedores Cloud públicos; ii) integrar diferentes componentes software para definir una plataforma Serverless en infraestructuras privadas y en el borde; iii) aprovechar dispositivos de aceleración en plataformas Serverless; y iv) facilitar el despliegue de aplicaciones y flujos de trabajo a través de interfaces de usuario. Además, se han creado y adaptado varios casos de uso para evaluar los desarrollos conseguidos.[CA] En els últims anys, la popularitat de la computació al núvol ha permès als usuaris accedir a recursos de còmput, xarxa i emmagatzematge sense precedents sota un model de pagament per ús. Aquesta popularitat ha propiciat l'aparició de nous serveis per resoldre determinats problemes informàtics a gran escala i simplificar el desenvolupament i desplegament d'aplicacions. Entre els serveis més destacats en els darrers anys hi ha les plataformes FaaS (Funcions com a Servei), el principal atractiu de les quals és la facilitat de desplegament de petits fragments de codi en determinats llenguatges de programació per realitzar tasques específiques en resposta a esdeveniments. Aquestes funcions són executades als servidors del proveïdor Cloud sense que els usuaris es preocupen del seu manteniment ni de la gestió de la seva elasticitat, mantenint sempre un model de pagament per ús de gra fi. Les plataformes FaaS pertanyen al paradigma informàtic conegut com a Serverless, el propòsit del qual és abstraure la gestió de servidors per part dels usuaris, permetent centrar els seus esforços únicament en el desenvolupament d'aplicacions. El problema del model FaaS és que està enfocat principalment a microserveis i tendeix a tenir limitacions en el temps d'execució i en les capacitats de computació (per exemple, no té suport per a maquinari d'acceleració com GPU). Tot i això, s'ha demostrat que la capacitat d'autoaprovisionament i l'alt grau de paral·lelisme d'aquests serveis poden ser molt adequats per a més aplicacions. A més, la seva inherent execució dirigida per esdeveniments fa que les funcions siguen perfectament adequades per ser definides com a passos en fluxos de treball de processament d'arxius (per exemple, fluxos de treball de computació científica). D'altra banda, l'auge dels dispositius intel·ligents i integrats (IoT), les innovacions a les xarxes de comunicació i la necessitat de reduir la latència en casos d'ús complexos han donat lloc al concepte d'Edge computing, o computació a la vora. L'Edge computing consisteix en el processament en dispositius propers a les fonts de dades per millorar els temps de resposta. La combinació d'aquest paradigma amb la computació en núvol, formant arquitectures amb dispositius a diferents nivells en funció de la proximitat a la font i la capacitat de còmput, s'ha encunyat com a continu de la computació al núvol (o continu computacional). Aquesta tesi doctoral pretén, doncs, aplicar diferents estratègies Serverless per permetre el desplegament d'aplicacions generalistes, empaquetades en contenidors de programari, a través dels diferents nivells del continu computacional. Per això, s'han desenvolupat múltiples eines per tal de: i) adaptar serveis FaaS de proveïdors Cloud públics; ii) integrar diferents components de programari per definir una plataforma Serverless en infraestructures privades i a la vora; iii) aprofitar dispositius d'acceleració a plataformes Serverless; i iv) facilitar el desplegament d'aplicacions i fluxos de treball mitjançant interfícies d'usuari. A més, s'han creat i s'han adaptat diversos casos d'ús per avaluar els desenvolupaments aconseguits.[EN] In recent years, the popularity of Cloud computing has allowed users to access unprecedented compute, network, and storage resources under a pay-per-use model. This popularity led to new services to solve specific large-scale computing challenges and simplify the development and deployment of applications. Among the most prominent services in recent years are FaaS (Function as a Service) platforms, whose primary appeal is the ease of deploying small pieces of code in certain programming languages to perform specific tasks on an event-driven basis. These functions are executed on the Cloud provider's servers without users worrying about their maintenance or elasticity management, always keeping a fine-grained pay-per-use model. FaaS platforms belong to the computing paradigm known as Serverless, which aims to abstract the management of servers from the users, allowing them to focus their efforts solely on the development of applications. The problem with FaaS is that it focuses on microservices and tends to have limitations regarding the execution time and the computing capabilities (e.g. lack of support for acceleration hardware such as GPUs). However, it has been demonstrated that the self-provisioning capability and high degree of parallelism of these services can be well suited to broader applications. In addition, their inherent event-driven triggering makes functions perfectly suitable to be defined as steps in file processing workflows (e.g. scientific computing workflows). Furthermore, the rise of smart and embedded devices (IoT), innovations in communication networks and the need to reduce latency in challenging use cases have led to the concept of Edge computing. Edge computing consists of conducting the processing on devices close to the data sources to improve response times. The coupling of this paradigm together with Cloud computing, involving architectures with devices at different levels depending on their proximity to the source and their compute capability, has been coined as Cloud Computing Continuum (or Computing Continuum). Therefore, this PhD thesis aims to apply different Serverless strategies to enable the deployment of generalist applications, packaged in software containers, across the different tiers of the Cloud Computing Continuum. To this end, multiple tools have been developed in order to: i) adapt FaaS services from public Cloud providers; ii) integrate different software components to define a Serverless platform on on-premises and Edge infrastructures; iii) leverage acceleration devices on Serverless platforms; and iv) facilitate the deployment of applications and workflows through user interfaces. Additionally, several use cases have been created and adapted to assess the developments achieved.Risco Gallardo, S. (2023). Serverless Strategies and Tools in the Cloud Computing Continuum [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/202013Compendi

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 261, ICALP 2023, Complete Volum

    Development and application of methodologies and infrastructures for cancer genome analysis within Personalized Medicine

    Full text link
    [eng] Next-generation sequencing (NGS) has revolutionized biomedical sciences, especially in the area of cancer. It has nourished genomic research with extensive collections of sequenced genomes that are investigated to untangle the molecular bases of disease, as well as to identify potential targets for the design of new treatments. To exploit all this information, several initiatives have emerged worldwide, among which the Pan-Cancer project of the ICGC (International Cancer Genome Consortium) stands out. This project has jointly analyzed thousands of tumor genomes of different cancer types in order to elucidate the molecular bases of the origin and progression of cancer. To accomplish this task, new emerging technologies, including virtualization systems such as virtual machines or software containers, were used and had to be adapted to various computing centers. The portability of this system to the supercomputing infrastructure of the BSC (Barcelona Supercomputing Center) has been carried out during the first phase of the thesis. In parallel, other projects promote the application of genomics discoveries into the clinics. This is the case of MedPerCan, a national initiative to design a pilot project for the implementation of personalized medicine in oncology in Catalonia. In this context, we have centered our efforts on the methodological side, focusing on the detection and characterization of somatic variants in tumors. This step is a challenging action, due to the heterogeneity of the different methods, and an essential part, as it lays at the basis of all downstream analyses. On top of the methodological section of the thesis, we got into the biological interpretation of the results to study the evolution of chronic lymphocytic leukemia (CLL) in a close collaboration with the group of Dr. Elías Campo from the Hospital Clínic/IDIBAPS. In the first study, we have focused on the Richter transformation (RT), a transformation of CLL into a high-grade lymphoma that leads to a very poor prognosis and with unmet clinical needs. We found that RT has greater genomic, epigenomic and transcriptomic complexity than CLL. Its genome may reflect the imprint of therapies that the patients received prior to RT, indicating the presence of cells exposed to these mutagenic treatments which later expand giving rise to the clinical manifestation of the disease. Multiple NGS- based techniques, including whole-genome sequencing and single-cell DNA and RNA sequencing, among others, confirmed the pre-existence of cells with the RT characteristics years before their manifestation, up to the time of CLL diagnosis. The transcriptomic profile of RT is remarkably different from that of CLL. Of particular importance is the overexpression of the OXPHOS pathway, which could be used as a therapeutic vulnerability. Finally, in a second study, the analysis of a case of CLL in a young adult, based on whole genome and single-cell sequencing at different times of the disease, revealed that the founder clone of CLL did not present any somatic driver mutations and was characterized by germline variants in ATM, suggesting its role in the origin of the disease, and highlighting the possible contribution of germline variants or other non-genetic mechanisms in the initiation of CLL

    Co-designing reliability and performance for datacenter memory

    Get PDF
    Memory is one of the key components that affects reliability and performance of datacenter servers. Memory in today’s servers is organized and shared in several ways to provide the most performant and efficient access to data. For example, cache hierarchy in multi-core chips to reduce access latency, non-uniform memory access (NUMA) in multi-socket servers to improve scalability, disaggregation to increase memory capacity. In all these organizations, hardware coherence protocols are used to maintain memory consistency of this shared memory and implicitly move data to the requesting cores. This thesis aims to provide fault-tolerance against newer models of failure in the organization of memory in datacenter servers. While designing for improved reliability, this thesis explores solutions that can also enhance performance of applications. The solutions build over modern coherence protocols to achieve these properties. First, we observe that DRAM memory system failure rates have increased, demanding stronger forms of memory reliability. To combat this, the thesis proposes Dvé, a hardware driven replication mechanism where data blocks are replicated across two different memory controllers in a cache-coherent NUMA system. Data blocks are accompanied by a code with strong error detection capabilities so that when an error is detected, correction is performed using the replica. Dvé’s organization offers two independent points of access to data which enables: (a) strong error correction that can recover from a range of faults affecting any of the components in the memory and (b) higher performance by providing another nearer point of memory access. Dvé’s coherent replication keeps the replicas in sync for reliability and also provides coherent access to read replicas during fault-free operation for improved performance. Dvé can flexibly provide these benefits on-demand at runtime. Next, we observe that the coherence protocol itself requires to be hardened against failures. Memory in datacenter servers is being disaggregated from the compute servers into dedicated memory servers, driven by standards like CXL. CXL specifies the coherence protocol semantics for compute servers to access and cache data from a shared region in the disaggregated memory. However, the CXL specification lacks the requisite level of fault-tolerance necessary to operate at an inter-server scale within the datacenter. Compute servers can fail or be unresponsive in the datacenter and therefore, it is important that the coherence protocol remain available in the presence of such failures. The thesis proposes Āpta, a CXL-based, shared disaggregated memory system for keeping the cached data consistent without compromising availability in the face of compute server failures. Āpta architects a high-performance fault-tolerant object-granular memory server that significantly improves performance for stateless function-as-a-service (FaaS) datacenter applications

    Development and application of methodologies and infrastructures for cancer genome analysis within Personalized Medicine

    Get PDF
    Programa de Doctorat en Biomedicina / Tesi realitzada al Barcelona Supercomputing Cener (BSC)[eng] Next-generation sequencing (NGS) has revolutionized biomedical sciences, especially in the area of cancer. It has nourished genomic research with extensive collections of sequenced genomes that are investigated to untangle the molecular bases of disease, as well as to identify potential targets for the design of new treatments. To exploit all this information, several initiatives have emerged worldwide, among which the Pan-Cancer project of the ICGC (International Cancer Genome Consortium) stands out. This project has jointly analyzed thousands of tumor genomes of different cancer types in order to elucidate the molecular bases of the origin and progression of cancer. To accomplish this task, new emerging technologies, including virtualization systems such as virtual machines or software containers, were used and had to be adapted to various computing centers. The portability of this system to the supercomputing infrastructure of the BSC (Barcelona Supercomputing Center) has been carried out during the first phase of the thesis. In parallel, other projects promote the application of genomics discoveries into the clinics. This is the case of MedPerCan, a national initiative to design a pilot project for the implementation of personalized medicine in oncology in Catalonia. In this context, we have centered our efforts on the methodological side, focusing on the detection and characterization of somatic variants in tumors. This step is a challenging action, due to the heterogeneity of the different methods, and an essential part, as it lays at the basis of all downstream analyses. On top of the methodological section of the thesis, we got into the biological interpretation of the results to study the evolution of chronic lymphocytic leukemia (CLL) in a close collaboration with the group of Dr. Elías Campo from the Hospital Clínic/IDIBAPS. In the first study, we have focused on the Richter transformation (RT), a transformation of CLL into a high-grade lymphoma that leads to a very poor prognosis and with unmet clinical needs. We found that RT has greater genomic, epigenomic and transcriptomic complexity than CLL. Its genome may reflect the imprint of therapies that the patients received prior to RT, indicating the presence of cells exposed to these mutagenic treatments which later expand giving rise to the clinical manifestation of the disease. Multiple NGS- based techniques, including whole-genome sequencing and single-cell DNA and RNA sequencing, among others, confirmed the pre-existence of cells with the RT characteristics years before their manifestation, up to the time of CLL diagnosis. The transcriptomic profile of RT is remarkably different from that of CLL. Of particular importance is the overexpression of the OXPHOS pathway, which could be used as a therapeutic vulnerability. Finally, in a second study, the analysis of a case of CLL in a young adult, based on whole genome and single-cell sequencing at different times of the disease, revealed that the founder clone of CLL did not present any somatic driver mutations and was characterized by germline variants in ATM, suggesting its role in the origin of the disease, and highlighting the possible contribution of germline variants or other non-genetic mechanisms in the initiation of CLL

    Fundamentals

    Get PDF
    Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters

    Detection and Mitigation of Steganographic Malware

    Get PDF
    A new attack trend concerns the use of some form of steganography and information hiding to make malware stealthier and able to elude many standard security mechanisms. Therefore, this Thesis addresses the detection and the mitigation of this class of threats. In particular, it considers malware implementing covert communications within network traffic or cloaking malicious payloads within digital images. The first research contribution of this Thesis is in the detection of network covert channels. Unfortunately, the literature on the topic lacks of real traffic traces or attack samples to perform precise tests or security assessments. Thus, a propaedeutic research activity has been devoted to develop two ad-hoc tools. The first allows to create covert channels targeting the IPv6 protocol by eavesdropping flows, whereas the second allows to embed secret data within arbitrary traffic traces that can be replayed to perform investigations in realistic conditions. This Thesis then starts with a security assessment concerning the impact of hidden network communications in production-quality scenarios. Results have been obtained by considering channels cloaking data in the most popular protocols (e.g., TLS, IPv4/v6, and ICMPv4/v6) and showcased that de-facto standard intrusion detection systems and firewalls (i.e., Snort, Suricata, and Zeek) are unable to spot this class of hazards. Since malware can conceal information (e.g., commands and configuration files) in almost every protocol, traffic feature or network element, configuring or adapting pre-existent security solutions could be not straightforward. Moreover, inspecting multiple protocols, fields or conversations at the same time could lead to performance issues. Thus, a major effort has been devoted to develop a suite based on the extended Berkeley Packet Filter (eBPF) to gain visibility over different network protocols/components and to efficiently collect various performance indicators or statistics by using a unique technology. This part of research allowed to spot the presence of network covert channels targeting the header of the IPv6 protocol or the inter-packet time of generic network conversations. In addition, the approach based on eBPF turned out to be very flexible and also allowed to reveal hidden data transfers between two processes co-located within the same host. Another important contribution of this part of the Thesis concerns the deployment of the suite in realistic scenarios and its comparison with other similar tools. Specifically, a thorough performance evaluation demonstrated that eBPF can be used to inspect traffic and reveal the presence of covert communications also when in the presence of high loads, e.g., it can sustain rates up to 3 Gbit/s with commodity hardware. To further address the problem of revealing network covert channels in realistic environments, this Thesis also investigates malware targeting traffic generated by Internet of Things devices. In this case, an incremental ensemble of autoencoders has been considered to face the ''unknown'' location of the hidden data generated by a threat covertly exchanging commands towards a remote attacker. The second research contribution of this Thesis is in the detection of malicious payloads hidden within digital images. In fact, the majority of real-world malware exploits hiding methods based on Least Significant Bit steganography and some of its variants, such as the Invoke-PSImage mechanism. Therefore, a relevant amount of research has been done to detect the presence of hidden data and classify the payload (e.g., malicious PowerShell scripts or PHP fragments). To this aim, mechanisms leveraging Deep Neural Networks (DNNs) proved to be flexible and effective since they can learn by combining raw low-level data and can be updated or retrained to consider unseen payloads or images with different features. To take into account realistic threat models, this Thesis studies malware targeting different types of images (i.e., favicons and icons) and various payloads (e.g., URLs and Ethereum addresses, as well as webshells). Obtained results showcased that DNNs can be considered a valid tool for spotting the presence of hidden contents since their detection accuracy is always above 90% also when facing ''elusion'' mechanisms such as basic obfuscation techniques or alternative encoding schemes. Lastly, when detection or classification are not possible (e.g., due to resource constraints), approaches enforcing ''sanitization'' can be applied. Thus, this Thesis also considers autoencoders able to disrupt hidden malicious contents without degrading the quality of the image

    CEP-DTHP : A Complex Event Processing using the Dual-Tier Hybrid Paradigm Over the Stream Mining Process

    Get PDF
    CEP is a widely used technique for the reliability and recognition of arbitrarily complex patterns in enormous data streams with great performance in real time. Real-time detection of crucial events and rapid response to them are the key goals of sophisticated event processing.  The performance of event processing systems can be improved by parallelizing CEP evaluation procedures. Utilizing CEP in parallel while deploying a multi-core or distributed environment is one of the most popular and widely recognized tackles to accomplish the goal. This paper demonstrates the ability to use an unusual parallelization strategy to effectively process complicated events over streams of data. This method depends on a dual-tier hybrid paradigm that combines several parallelism levels. Thread-level or task-level parallelism (TLP) and Data-level parallelism (DLP) were combined in this research. Many threads or instruction sequences from a comparable application can run concurrently under the TLP paradigm. In the DLP paradigm, instruc-tions from a single stream operate on several data streams at the same time. In our suggested model, there are four major stages: data mining, pre-processing, load shedding, and optimization. The first phase is online data mining, following which the data is materialized into a publicly available solution that combines a CEP engine with a library. Next, data pre-processing encompasses the efficient adaptation of the content or format of raw data from many, perhaps diverse sources. Finally, parallelization approaches have been created to reduce CEP processing time. By providing this two-type parallelism, our proposed solution combines the benefits of DLP and TLP while addressing their constraints. The JAVA tool will be used to assess the suggested technique. The performance of the suggested technique is compared to that of other current ways for determining the efficacy and efficiency of the proposed algorithm
    corecore