138 research outputs found

    Serverless Computing Strategies on Cloud Platforms

    With the development of Cloud Computing, the delivery of virtualized resources over the Internet has grown greatly in recent years. Functions as a Service (FaaS), one of the newest service models within Cloud Computing, allows the development and deployment of event-based applications that cover managed services in public and on-premises Clouds. Public Cloud Computing providers adopt the FaaS model within their catalog to provide event-driven, highly scalable computing for applications. Meanwhile, developers specialized in this technology focus on creating open-source serverless frameworks to avoid lock-in with public Cloud providers. Despite the development achieved by serverless computing, there are currently fields related to data processing and execution performance optimization where its full potential has not been explored. This doctoral thesis defines three serverless computing strategies that demonstrate the benefits of this technology for data processing. The implemented strategies allow data analysis with the integration of acceleration devices for the efficient execution of scientific applications on public and on-premises Cloud platforms. Firstly, the CloudTrail-Tracker platform was developed to extract and process learning analytics in the Cloud. CloudTrail-Tracker is an event-driven open-source platform for serverless data processing that can automatically scale up and down, with the ability to scale to zero to minimize operational costs. Next, the integration of GPUs in an event-driven on-premises serverless platform for scalable data processing is discussed. The platform supports the execution of applications as serverless functions in response to the upload of a file to a file storage system, which allows applications to run in parallel according to the available resources. This processing is managed by an elastic Kubernetes cluster that automatically grows and shrinks according to the processing needs. Several approaches based on GPU virtualization technologies such as rCUDA and NVIDIA-Docker are evaluated to speed up the execution time of the functions. Finally, another solution based on the serverless model is implemented to run the inference phase of previously trained machine learning models on the Amazon Web Services platform and on a private platform with the OSCAR framework. The system grows elastically according to demand and scales to zero to minimize costs. In addition, the front end gives the user a simplified experience in obtaining predictions from machine learning models. To demonstrate the functionalities and advantages of the solutions proposed in this thesis, several case studies are collected covering different fields of knowledge such as learning analytics and Artificial Intelligence. This shows the wide range of applications where serverless computing can bring great benefits. The results obtained endorse the use of the serverless model in simplifying the design of architectures for data-intensive processing in complex applications.
    Naranjo Delgado, DM. (2021). Serverless Computing Strategies on Cloud Platforms [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/160916
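
The scale-to-zero behaviour described in the abstract above can be illustrated with a minimal sketch. It is not the autoscaling logic of OSCAR, Kubernetes, or Amazon Web Services. The function name `desired_workers` and its parameters `jobs_per_worker` and `max_workers` are hypothetical, chosen only to show how a worker pool might follow the number of pending file-upload events and drop to zero when idle.

```python
import math


def desired_workers(pending_events: int, jobs_per_worker: int = 4, max_workers: int = 10) -> int:
    """Return how many workers a hypothetical elastic pool should run.

    pending_events: file-upload events waiting to be processed.
    jobs_per_worker: assumed number of events one worker handles in parallel.
    max_workers: assumed upper bound on the pool size.
    """
    if pending_events < 0 or jobs_per_worker <= 0 or max_workers < 0:
        raise ValueError("arguments must be non-negative and jobs_per_worker positive")
    if pending_events == 0:
        # Scale to zero: no pending work means no running workers and no cost.
        return 0
    # Grow with demand, but never beyond the configured ceiling.
    return min(max_workers, math.ceil(pending_events / jobs_per_worker))


if __name__ == "__main__":
    for pending in (0, 1, 9, 100):
        print(pending, desired_workers(pending))
```

Capping at `max_workers` stands in for the finite resources of an on-premises cluster; a public Cloud deployment would more likely be bounded by cost limits.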

    Hardware dedicado para sistemas empotrados de visión

    The continuous evolution of Information and Communication Technologies (ICT) has not only allowed more than half of the global population to be interconnected through the Internet, but has also been the breeding ground for new paradigms such as the Internet of Things (IoT) and Ambient Intelligence (AmI). These paradigms call for interconnecting objects with different functionalities in order to achieve a digital, sensitive, and adaptive environment that provides services of very different kinds to its users. Achieving this environment requires the development of low-cost electronic devices that, with small size and light weight, can interact with their surroundings, operate with maximum autonomy, and provide a high level of intelligence. Many of these devices will include the capacity to acquire, process, and transmit images, extracting, interpreting, or modifying the visual information of interest for a given application. This PhD thesis, focused on the development of dedicated hardware for implementing the image and video processing algorithms used in embedded vision systems, responds to this challenge. The work has a twofold purpose: on the one hand, the search for solutions whose performance and features allow them to be incorporated into systems with strict requirements on functionality, size, power consumption, and operating speed; on the other hand, the design of a set of functional blocks, implemented as IP modules, that relieve the computational load of the processing units of the systems into which they are integrated. The thesis proposes specific solutions for implementing two operations common in many computer vision systems: background subtraction and connected component labelling. The different alternatives result from applying a suitable trade-off between functionality and cost (the latter understood in terms of computing resources, operating speed, and power consumption), which makes it possible to cover a wide range of applications. In some of the proposed solutions, inference techniques based on Fuzzy Logic have also been applied to improve the quality of the resulting vision systems. The functional blocks were developed with a model-based design methodology, which allowed the whole development cycle to be carried out in a single work environment. That environment combines CAD tools that support algorithm coding, circuit design, physical implementation, and functional and timing verification of the different alternatives, thus accelerating every phase of the design flow and enabling a more efficient exploration of the solution space. Moreover, to demonstrate the functionality of the contributions, some of the proposed solutions have been integrated into real video systems that use common standard buses. The devices selected for these demonstrators are Xilinx FPGAs and SoPCs, whose excellent properties for prototyping and for building systems that combine software and hardware components make them ideal for this type of system.
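
The two operations named in the abstract above, background subtraction and connected component labelling, can be sketched as a small software reference model. This is not the thesis's hardware design: it assumes the simplest possible background model (a fixed reference frame compared against a hypothetical `threshold`) and 4-connectivity, and the names `subtract_background` and `label_components` are illustrative.

```python
from collections import deque
from typing import List

Image = List[List[int]]


def subtract_background(frame: Image, background: Image, threshold: int = 30) -> Image:
    """Mark a pixel as foreground (1) when it differs from the background by more than threshold."""
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def label_components(mask: Image) -> Image:
    """Give each 4-connected foreground region its own positive label (0 = background)."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 0 or labels[r][c] != 0:
                continue  # background pixel, or already part of a labelled region
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of the current region
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels


if __name__ == "__main__":
    background = [[10, 10, 10, 10], [10, 10, 10, 10], [10, 10, 10, 10]]
    frame = [[200, 10, 10, 10], [200, 10, 10, 180], [10, 10, 10, 180]]
    for row in label_components(subtract_background(frame, background)):
        print(row)
```

A flood fill needs random access to the whole image, which is convenient in software; streaming hardware implementations typically favour raster-scan algorithms instead, so this sketch only pins down the expected result.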

    Protection in commodity monolithic operating systems

    This dissertation suggests and partially demonstrates that it is feasible to retrofit real privilege separation within commodity operating systems by "nesting" a small memory management protection domain inside a monolithic kernel's single-address space: all the while allowing both domains to operate at the same hardware privilege level. This dissertation also demonstrates a microarchitectural return-integrity protection domain that efficiently asserts dynamic "return-to-sender" semantics for all operating system return control-flow operations. Employing these protection domains, we provide mitigations to large classes of kernel attacks such as code injection and return-oriented programming and deploy information protection policies that are not feasible with existing systems. Operating systems form the foundation of information protection in multiprogramming environments. Unfortunately, today's commodity operating systems employ monolithic kernel design, where any single exploit in the vast code base undermines all information protection in the system because all kernel code operates with full supervisor privileges, meaning that even perfectly secure applications are vulnerable. This dissertation explores an approach that retrofits fundamental information protection design principles into commodity monolithic operating systems, the aim of which is a micro-evolution of commodity system design that incrementally decomposes monolithic operating systems from the ground up, thereby applying microkernel-like security properties for billions of users worldwide. The key contribution is the creation of a new operating system organization, the Nested Kernel Architecture, which "nests" a new, efficient intra-kernel memory isolation mechanism into a traditional monolithic operating system design. 
Using the Nested Kernel Architecture, we introduce write-protection services that let kernel developers deploy security policies in ways not possible in current systems (while greatly reducing the trusted computing base) and demonstrate the value of these services by deploying three special data protection policies. Overall, the Nested Kernel Architecture demonstrates practical in-place protections that require only minor code modifications with minimal run-time overheads.

    FPGA structures for high speed and low overhead dynamic circuit specialization

    A Field Programmable Gate Array (FPGA) is a programmable digital electronic chip. The FPGA does not come with a predefined function from the manufacturer; instead, the developer has to define its function by implementing a digital circuit on the FPGA resources. The functionality of the FPGA can be reprogrammed as desired, hence the name “field programmable”. FPGAs are useful in small-volume digital electronic products because designing a custom digital chip is expensive. Changing the FPGA (also called configuring it) is done by changing the configuration data (in the form of bitstreams) that defines the FPGA functionality. These bitstreams are stored in a memory of the FPGA called configuration memory. The SRAM cells of LookUp Tables (LUTs), Block Random Access Memories (BRAMs) and DSP blocks together form the configuration memory of an FPGA. The configuration data can be modified according to the user’s needs to implement the user-defined hardware. The simplest way to program the configuration memory is to download the bitstreams using a JTAG interface. However, modern techniques such as Partial Reconfiguration (PR) make it possible to configure part of the configuration memory with partial bitstreams at run-time. The reconfiguration is achieved by swapping partial bitstreams into the configuration memory via a configuration interface called the Internal Configuration Access Port (ICAP). The ICAP is a hardware primitive (macro) present in the FPGA that an embedded processor uses to access the configuration memory internally. The reconfiguration technique adds the flexibility to use specialized circuits that are more compact and more efficient than their bulky counterparts. An example of such an implementation is the use of specialized multipliers instead of big generic multipliers in an FIR implementation with constant coefficients.
To specialize these circuits and reconfigure during the run-time, researchers at the HES group proposed the novel technique called parameterized reconfiguration that can be used to efficiently and automatically implement Dynamic Circuit Specialization (DCS) that is built on top of the Partial Reconfiguration method. It uses the run-time reconfiguration technique that is tailored to implement a parameterized design. An application is said to be parameterized if some of its input values change much less frequently than the rest. These inputs are called parameters. Instead of implementing these parameters as regular inputs, in DCS these inputs are implemented as constants, and the application is optimized for the constants. For every change in parameter values, the design is re-optimized (specialized) during run-time and implemented by reconfiguring the optimized design for a new set of parameters. In DCS, the bitstreams of the parameterized design are expressed as Boolean functions of the parameters. For every infrequent change in parameters, a specialized FPGA configuration is generated by evaluating the corresponding Boolean functions, and the FPGA is reconfigured with the specialized configuration. A detailed study of overheads of DCS and providing suitable solutions with appropriate custom FPGA structures is the primary goal of the dissertation. I also suggest different improvements to the FPGA configuration memory architecture. After offering the custom FPGA structures, I investigated the role of DCS on FPGA overlays and the use of custom FPGA structures that help to reduce the overheads of DCS on FPGA overlays. By doing so, I hope I can convince the developer to use DCS (which now comes with minimal costs) in real-world applications. I start the investigations of overheads of DCS by implementing an adaptive FIR filter (using the DCS technique) on three different Xilinx FPGA platforms: Virtex-II Pro, Virtex-5, and Zynq-SoC. 
Studying how DCS behaves, and what its overhead is across the evolution of the three FPGA platforms, is the basis for discovering the costs of DCS. After that, I propose custom FPGA structures (reconfiguration controllers and reconfiguration drivers) to reduce the main overhead of DCS, the reconfiguration time. These structures not only reduce the reconfiguration time but also help curb the power-hungry part of the DCS system. After these chapters, I study the role of DCS on FPGA overlays. I investigate the effect of the proposed FPGA structures on Virtual Coarse-Grained Reconfigurable Arrays (VCGRAs). I classify the VCGRA implementations into three types, depending on the level of parameterization: the conventional VCGRA, the partially parameterized VCGRA, and the fully parameterized VCGRA. I have designed two variants of VCGRA grids for HPC image processing applications, namely the MAC grid and Pixie. Finally, I try to tackle the reconfiguration time overhead at the hardware level of the FPGA by customizing the FPGA configuration memory architecture. In this part of my research, I propose a parallel memory structure that drastically improves the reconfiguration time of DCS. However, this improvement comes with a significant overhead in hardware resources, which will need to be addressed in future research on commercial FPGA configuration memory architectures.
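
The idea that configuration bits are Boolean functions of the parameters can be illustrated with a software analogy. The sketch below is not the dissertation's tool flow and does not touch real bitstreams: it assumes a toy circuit `(x0 AND p) XOR x1` with a single parameter bit `p`, and the names `generic_circuit`, `specialize_lut`, and `eval_lut` are hypothetical.

```python
from typing import List


def generic_circuit(x0: int, x1: int, p: int) -> int:
    """Toy generic design in which the parameter p is an ordinary input."""
    return (x0 & p) ^ x1


def specialize_lut(p: int) -> List[int]:
    """Evaluate the per-bit Boolean functions of p to obtain a 2-input LUT truth table.

    The entry index is (x0 << 1) | x1. Each entry depends on p only:
      index 0 (x0=0, x1=0): constant 0
      index 1 (x0=0, x1=1): constant 1
      index 2 (x0=1, x1=0): p
      index 3 (x0=1, x1=1): NOT p
    """
    return [0, 1, p, p ^ 1]


def eval_lut(truth_table: List[int], x0: int, x1: int) -> int:
    """Look up the output of the specialized circuit for the regular inputs."""
    return truth_table[(x0 << 1) | x1]


if __name__ == "__main__":
    for p in (0, 1):
        lut = specialize_lut(p)  # done once per (infrequent) parameter change
        print("p =", p, "specialized truth table:", lut)
        for x0 in (0, 1):
            for x1 in (0, 1):
                assert eval_lut(lut, x0, x1) == generic_circuit(x0, x1, p)
```

The specialized table has two inputs instead of three, which mirrors the claim that a design optimized for constant parameters is smaller than its generic counterpart; the price is re-evaluating the Boolean functions and reconfiguring whenever `p` changes.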

    G-PUF: a software-only PUF for GPUs

    Physical Unclonable Functions (PUFs) are security primitives that allow the generation of unique IDs and security keys. Their security stems from the inherent process variations of silicon chip manufacturing and the minute random effects introduced in integrated circuits. PUFs are usually manufactured specifically for this purpose, but in the last few years several proposals have built PUFs from off-the-shelf components. These Intrinsic PUFs avoid hardware modifications and exploit the low cost of adapting existing technologies. Graphical Processing Units (GPUs) are promising candidates for an Intrinsic PUF. GPUs are massively multi-processed systems originally built for graphical computing and more recently redesigned for general computing. These devices are distributed across a variety of systems and application environments, from computer vision platforms to server clusters and home computers. Building PUFs with software-only strategies is a challenging problem, since a PUF must evaluate process variations without affecting system performance, something more easily achieved in hardware. In this work we present G-PUF, an intrinsic PUF technology running entirely on CUDA. The proposed solution maps the distribution of soft errors in matrix multiplications when the GPU runs under adverse conditions of overclocking and undervoltage. The resulting error map is unique to each GPU, and using a novel Challenge-Response Pair extraction algorithm, G-PUF is able to retrieve secure keys or a device ID without disclosing information about the PUF randomness. The system was tested in real setups and requires no modifications whatsoever to an already operational GPU. G-PUF was capable of achieving upwards of 94.73% reliability without any error correction code and can provide up to 253 unique Challenge-Response Pairs.
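
As a rough illustration of what a reliability figure for a PUF measures, the sketch below uses a definition common in PUF literature: one minus the mean fractional Hamming distance between a reference response and repeated measurements of the same challenge. The abstract does not state the exact metric the authors used, so this definition is an assumption, and the bit strings in the example are made up for illustration rather than taken from G-PUF.

```python
from typing import Sequence


def hamming_fraction(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of bit positions in which two equal-length responses differ."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("responses must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def reliability(reference: Sequence[int], measurements: Sequence[Sequence[int]]) -> float:
    """Return 1 minus the mean fractional Hamming distance to the reference response."""
    if not measurements:
        raise ValueError("at least one repeated measurement is required")
    mean_distance = sum(hamming_fraction(reference, m) for m in measurements) / len(measurements)
    return 1.0 - mean_distance


if __name__ == "__main__":
    # Hypothetical 8-bit responses: the second measurement has one flipped bit.
    reference = [1, 0, 1, 1, 0, 0, 1, 0]
    repeated = [
        [1, 0, 1, 1, 0, 0, 1, 0],
        [1, 0, 1, 1, 0, 1, 1, 0],
    ]
    print(reliability(reference, repeated))
```

Under this definition a perfectly stable PUF scores 1.0, and every bit that flips between measurements lowers the score.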

    Annual Report, 2015-2016


    Veröffentlichungen und Vorträge 2009 der Mitglieder der Fakultät für Informatik


    Cloud Radio Access Network architecture. Towards 5G mobile networks
