    Firmware and gateway for the ACE1 reconfigurable accelerator card

    This thesis describes continued work on ACE1, an in-house designed, FPGA-based co-processor daughtercard. The aim is to create an ecosystem of firmware, bootstrapping code, drivers and a development environment that together give users a seamless experience. Setting up and debugging the interface that connects the co-processor daughtercard to the host server raised several challenges: problems with the power network and the edge connectors, and timing problems with the primary protocol, which prevented host-based communications. The options include allowing the daughtercard to function in a stand-alone fashion, and we present a gateware solution that allows users to select from a number of alternatives for each layer of the Open Systems Interconnection (OSI) networking model.

    Next Generation UAS Based Spectral Systems for Environmental Monitoring

    This presentation describes the development of a small Unmanned Aerial System (UAS) with a low-power, high-performance Intelligent Payload Module (IPM) and a hyperspectral imager, intended to enable intelligent gathering of science-grade vegetation data over agricultural fields from about 150 ft. The IPM processes the image data in real time and then directs the navigation system to move the UAS to locations where measurements are optimal for science. This matters because a small UAS typically has about 30 minutes of battery power, so efficient use of resources is essential over large agricultural fields. The key innovation is twofold: shrinking the IPM, and coupling it to the navigation software so that the data processing can influence the desired waypoints, while Field Programmable Gate Arrays deliver high performance on the large data volumes produced by the hyperspectral imager.

    Leveraging NFV heterogeneity at the network edge

    With network function virtualisation (NFV) and network programmability, network functions (NFs) such as firewalls, traffic load balancers, content filters, and intrusion detection systems (IDS) are virtualized and instantiated either on user space hosts, using virtual machines (VMs) or lightweight containers, or in the network data plane, using programmable switching technology such as P4 or offloading onto smart network interface cards (NICs). They are often chained together to create a service function chain (SFC) based on a defined service level agreement (SLA). The need to leverage heterogeneous programmable platforms to support the in-network acceleration of functions keeps growing as emerging use cases come with peculiar requirements. This thesis identifies various heterogeneous frameworks for deploying virtual network functions (vNFs) that network operators can leverage in service provider networks. A novel taxonomy that provides network operators and the wider research community with valuable insights is proposed. Using real testbeds, the thesis presents the performance gains obtained from deploying vNFs on heterogeneous frameworks. In addition, this thesis investigates the optimal placement of vNFs over the distributed edge network while considering the heterogeneity of packet processing elements. In particular, the work questions the status quo of how vNFs are currently deployed, i.e., the lack of frameworks to support the seamless deployment of vNFs that are implemented on diverse packet processing platforms and leverage the capability of the programmable network data plane. In response, the thesis presents a novel integer linear programming (ILP) model for the hybrid placement of diverse network functions that leverages the heterogeneity of the network data plane and the abundant processing capability of user space hosts, with the objective of minimizing end-to-end latency for vNF placement.
A novel hybrid placement heuristic algorithm, HYPHA, is also proposed to find a quick, efficient solution to the hybrid vNF placement problem. Using optimal stopping theory (OST) principles, an optimal placement scheduling model is presented to handle dynamic edge placement scenarios. The results in this work demonstrate that a hybrid deployment scheme that leverages the processing capability of the network data plane yields minimal user-to-vNF latency and overall end-to-end latency, while fulfilling the placement of a diverse set of user requests from emerging use cases and so speeding up service delivery by network operators. The results also show that network operators can leverage the high-speed, low-latency packet processing elements of the data plane to host delay-sensitive applications and improve service delivery for subscribed users. It is shown that the proposed hybrid heuristic algorithm can obtain near-optimal vNF mapping while incurring fewer violations of the latency thresholds set by network operators. Furthermore, in addition to emerging edge use cases, the placement solution presented in this thesis can be adapted to place network functions efficiently in core network infrastructure while leveraging the heterogeneity of servers. The dynamic placement scheduler also minimises the number of latency violations and vNF migrations between heterogeneous hosts based on SLAs set by network operators.
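
The hybrid placement problem described in this abstract can be illustrated with a small sketch. The code below is not the thesis's ILP model and not HYPHA: it is an exhaustive search over a toy instance, and every host name, latency figure, capacity and supported-function set is a hypothetical value chosen for illustration. It shows only the shape of the problem: each vNF in a chain is mapped to a data plane element or a user space host that supports it, subject to capacity, so that the summed processing and link latency is minimal.

```python
from itertools import product

# Hypothetical heterogeneous hosts (all values are illustrative assumptions).
HOSTS = {
    "p4_switch": {"proc_ms": 0.05, "capacity": 1,
                  "supports": {"firewall", "load_balancer"}},
    "smartnic": {"proc_ms": 0.2, "capacity": 1,
                 "supports": {"firewall", "load_balancer", "ids"}},
    "server_vm": {"proc_ms": 1.0, "capacity": 3,
                  "supports": {"firewall", "load_balancer", "ids",
                               "content_filter"}},
}

# Hypothetical symmetric link latencies between distinct hosts, in ms.
LINK_MS = {
    frozenset(("p4_switch", "smartnic")): 0.1,
    frozenset(("p4_switch", "server_vm")): 0.5,
    frozenset(("smartnic", "server_vm")): 0.3,
}


def link_ms(a, b):
    """Latency of moving a packet from host a to host b (0 if co-located)."""
    return 0.0 if a == b else LINK_MS[frozenset((a, b))]


def chain_latency(placement):
    """Processing latency of every hop plus link latency between hops."""
    proc = sum(HOSTS[h]["proc_ms"] for h in placement)
    links = sum(link_ms(a, b) for a, b in zip(placement, placement[1:]))
    return proc + links


def feasible(chain, placement):
    """Each host must support its vNF and stay within its capacity."""
    for nf, host in zip(chain, placement):
        if nf not in HOSTS[host]["supports"]:
            return False
    return all(placement.count(h) <= HOSTS[h]["capacity"] for h in HOSTS)


def best_placement(chain):
    """Return (latency_ms, placement) with minimal latency, or None."""
    best = None
    for placement in product(HOSTS, repeat=len(chain)):
        if not feasible(chain, placement):
            continue
        latency = chain_latency(placement)
        if best is None or latency < best[0]:
            best = (latency, placement)
    return best


if __name__ == "__main__":
    print(best_placement(["firewall", "ids", "content_filter"]))
```

Exhaustive search grows exponentially with chain length, which is one reason a solver-based ILP formulation or a heuristic is needed at realistic scale.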

    NFV Based Gateways for Virtualized Wireless Sensors Networks: A Case Study

    Virtualization enables multiple applications to share the same wireless sensor network (WSN). However, in heterogeneous environments, virtualized wireless sensor networks (VWSNs) raise new challenges, such as the need for on-the-fly, dynamic, elastic and scalable provisioning of gateways. Network Functions Virtualization (NFV) is an emerging paradigm that can help tackle these challenges. It leverages standard virtualization technology to consolidate special-purpose network elements on top of commodity hardware. This article presents a case study on NFV-based gateways for VWSNs. In the study, a VWSN gateway provider operates and manages an NFV-based infrastructure. We use two different brands of wireless sensors. The NFV infrastructure makes possible the dynamic, elastic and scalable deployment of gateway modules in this heterogeneous VWSN environment. The prototype, built with OpenStack as the platform, is described.

    Performance Comparison of 3D Sinc Interpolation for fMRI Motion Correction by Language of Implementation and Hardware Platform

    Substantial effort is devoted to improving neuroimaging data processing; this effort, however, is typically made from the algorithmic perspective only. I demonstrate that substantive running time improvements to neuroscientific data processing algorithms can be realized by considering their implementation. Focusing specifically on 3D sinc interpolation, an algorithm used for processing functional magnetic resonance imaging (fMRI) data, I compare the performance of Python, C and OpenCL implementations of this algorithm across multiple hardware platforms. I also benchmark the performance of a novel implementation of 3D sinc interpolation on a field programmable gate array (FPGA). Together, these comparisons demonstrate that the performance of a neuroimaging data processing algorithm is significantly affected by its implementation. I also present a case study demonstrating the practical benefits of improving a neuroscientific data processing algorithm's implementation, then conclude by addressing threats to the validity of the study and discussing future directions.
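
For readers unfamiliar with the algorithm being benchmarked, the following is a minimal pure-Python sketch of 3D sinc interpolation at a single fractional voxel coordinate. It is not any of the implementations compared in the work. The Hann window, the kernel radius of 4, the weight normalisation and all function names are assumptions made for illustration.

```python
import math


def sinc(x):
    """Normalised sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def hann_sinc(x, radius):
    """Sinc weight tapered by a Hann window; zero outside +/- radius."""
    if abs(x) >= radius:
        return 0.0
    return sinc(x) * 0.5 * (1.0 + math.cos(math.pi * x / radius))


def axis_weights(coord, size, radius):
    """(index, weight) pairs for the samples near coord along one axis."""
    base = math.floor(coord)
    lo = max(0, base - radius + 1)
    hi = min(size - 1, base + radius)
    return [(i, hann_sinc(coord - i, radius)) for i in range(lo, hi + 1)]


def sinc_interp_3d(volume, x, y, z, radius=4):
    """Interpolate volume[x][y][z] at a fractional coordinate.

    The 3D kernel is separable, so the weight of each neighbouring voxel
    is the product of three 1D windowed-sinc weights. Weights are
    normalised so that a constant volume stays constant.
    """
    nx, ny, nz = len(volume), len(volume[0]), len(volume[0][0])
    wx = axis_weights(x, nx, radius)
    wy = axis_weights(y, ny, radius)
    wz = axis_weights(z, nz, radius)
    total = 0.0
    norm = 0.0
    for i, a in wx:
        for j, b in wy:
            for k, c in wz:
                w = a * b * c
                total += w * volume[i][j][k]
                norm += w
    return total / norm
```

In motion correction, a function like this would be evaluated once per output voxel at the coordinate given by the estimated rigid-body transform. The triple loop over a (2 x radius)^3 neighbourhood per voxel is the cost that makes the choice of language and hardware matter.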

    Optimization of performance and energy efficiency in massively parallel systems

    Heterogeneous systems are becoming increasingly relevant because of their performance and energy efficiency, and are present in all types of computing platforms, from embedded devices and servers to HPC nodes in large data centers. Their complexity means that they are usually used under the task paradigm and the host-device programming model. This strongly penalizes accelerator utilization and system energy consumption, and makes it difficult to adapt applications. Co-execution allows all devices to compute the same problem simultaneously, cooperating to consume less time and energy. However, programmers must handle all device management, workload distribution and code portability between systems, which significantly complicates programming. This thesis offers contributions to improve performance and energy efficiency in these massively parallel systems. The proposals address generally conflicting objectives: usability and programmability are improved, while ensuring greater system abstraction and extensibility, and at the same time performance, scalability and energy efficiency are increased. To achieve this, two runtime systems with completely different approaches are proposed. EngineCL, focused on OpenCL and with a high-level API, provides an extensible modular system and favors maximum compatibility between all types of devices. Its versatility allows it to be adapted to environments for which it was not originally designed, including applications with time-constrained executions or molecular dynamics HPC simulators, such as the one used in an international research center. Considering industrial trends and emphasizing professional applicability, CoexecutorRuntime offers a flexible C++/SYCL-based system that provides co-execution support for oneAPI technology. This runtime brings programmers closer to the problem domain, enabling the exploitation of dynamic adaptive strategies that improve efficiency in all types of applications.

    Funding: This PhD has been supported by the Spanish Ministry of Education (FPU16/03299 grant) and the Spanish Science and Technology Commission under contracts TIN2016-76635-C2-2-R and PID2019-105660RB-C22. This work has also been partially supported by the Mont-Blanc 3: European Scalable and Power Efficient HPC Platform based on Low-Power Embedded Technology project (G.A. No. 671697) from the European Union's Horizon 2020 Research and Innovation Programme (H2020 Programme). Some activities have also been funded by the Spanish Science and Technology Commission under contract TIN2016-81840-REDT (CAPAP-H6 network). The Integration II: Hybrid programming models section of Chapter 4 has been partially performed under the Project HPC-EUROPA3 (INFRAIA-2016-1-730897), with the support of the EC Research Innovation Action under the H2020 Programme. In particular, the author gratefully acknowledges the support of the SPMT Department of the High Performance Computing Center Stuttgart (HLRS).