46 research outputs found

    An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics

    Full text link
    Near-sensor data analytics is a promising direction for IoT endpoints, as it minimizes energy spent on communication and reduces network load - but it also poses security concerns, as valuable data is stored or sent over the network at various stages of the analytics pipeline. Using encryption to protect sensitive data at the boundary of the on-chip analytics engine is a way to address data security issues. To cope with the combined workload of analytics and encryption in a tight power envelope, we propose Fulmine, a System-on-Chip based on a tightly-coupled multi-core cluster augmented with specialized blocks for compute-intensive data processing and encryption functions, supporting software programmability for regular computing tasks. The Fulmine SoC, fabricated in 65nm technology, consumes less than 20mW on average at 0.8V achieving an efficiency of up to 70pJ/B in encryption, 50pJ/px in convolution, or up to 25MIPS/mW in software. As a strong argument for real-life flexible application of our platform, we show experimental results for three secure analytics use cases: secure autonomous aerial surveillance with a state-of-the-art deep CNN consuming 3.16pJ per equivalent RISC op; local CNN-based face detection with secured remote recognition in 5.74pJ/op; and seizure detection with encrypted data collection from EEG within 12.7pJ/op.Comment: 15 pages, 12 figures, accepted for publication to the IEEE Transactions on Circuits and Systems - I: Regular Paper

    IXIAM: ISA EXtension for Integrated Accelerator Management

    Get PDF
    During the last few years, hardware accelerators have been gaining popularity thanks to their ability to achieve higher performance and efficiency than classic general-purpose solutions. They are fundamentally shaping the current generations of Systems-on-Chip (SoCs), which are becoming increasingly heterogeneous. However, despite their widespread use, a standard, general solution to manage them while providing speed and consistency has not yet been found. Common methodologies rely on OS mediation and a mix of user-space and kernel-space drivers, which can be inefficient, especially for fine-grained tasks. This paper addresses these sources of inefficiencies by proposing an ISA eXtension for Integrated Accelerator Management (IXIAM), a cost-effective HW-SW framework to control a wide variety of accelerators in a standard way, and directly from the cores. The proposed instructions include reservation, work offloading, data transfer, and synchronization. They can be wrapped in a high-level software API or even integrated into a compiler. IXIAM features also a user-space interrupt mechanism to signal events directly to the user process. We implement it as a RISC-V extension in the gem5 simulator and demonstrate detailed support for complex accelerators, as well as the ability to specify sequences of memory transfers and computations directly from the ISA and with significantly lower overhead than driver-based schemes. IXIAM provides a performance advantage that is more evident for small and medium workloads, reaching around 90x in the best case. This way, we enlarge the set of workloads that would benefit from hardware acceleration

    Remote Attacks on FPGA Hardware

    Get PDF
    Immer mehr Computersysteme sind weltweit miteinander verbunden und über das Internet zugänglich, was auch die Sicherheitsanforderungen an diese erhöht. Eine neuere Technologie, die zunehmend als Rechenbeschleuniger sowohl für eingebettete Systeme als auch in der Cloud verwendet wird, sind Field-Programmable Gate Arrays (FPGAs). Sie sind sehr flexible Mikrochips, die per Software konfiguriert und programmiert werden können, um beliebige digitale Schaltungen zu implementieren. Wie auch andere integrierte Schaltkreise basieren FPGAs auf modernen Halbleitertechnologien, die von Fertigungstoleranzen und verschiedenen Laufzeitschwankungen betroffen sind. Es ist bereits bekannt, dass diese Variationen die Zuverlässigkeit eines Systems beeinflussen, aber ihre Auswirkungen auf die Sicherheit wurden nicht umfassend untersucht. Diese Doktorarbeit befasst sich mit einem Querschnitt dieser Themen: Sicherheitsprobleme die dadurch entstehen wenn FPGAs von mehreren Benutzern benutzt werden, oder über das Internet zugänglich sind, in Kombination mit physikalischen Schwankungen in modernen Halbleitertechnologien. Der erste Beitrag in dieser Arbeit identifiziert transiente Spannungsschwankungen als eine der stärksten Auswirkungen auf die FPGA-Leistung und analysiert experimentell wie sich verschiedene Arbeitslasten des FPGAs darauf auswirken. In der restlichen Arbeit werden dann die Auswirkungen dieser Spannungsschwankungen auf die Sicherheit untersucht. Die Arbeit zeigt, dass verschiedene Angriffe möglich sind, von denen früher angenommen wurde, dass sie physischen Zugriff auf den Chip und die Verwendung spezieller und teurer Test- und Messgeräte erfordern. Dies zeigt, dass bekannte Isolationsmaßnahmen innerhalb FPGAs von böswilligen Benutzern umgangen werden können, um andere Benutzer im selben FPGA oder sogar das gesamte System anzugreifen. Unter Verwendung von Schaltkreisen zur Beeinflussung der Spannung innerhalb eines FPGAs zeigt diese Arbeit aktive Angriffe, die Fehler (Faults) in anderen Teilen des Systems verursachen können. Auf diese Weise sind Denial-of-Service Angriffe möglich, als auch Fault-Angriffe um geheime Schlüsselinformationen aus dem System zu extrahieren. Darüber hinaus werden passive Angriffe gezeigt, die indirekt die Spannungsschwankungen auf dem Chip messen. Diese Messungen reichen aus, um geheime Schlüsselinformationen durch Power Analysis Seitenkanalangriffe zu extrahieren. In einer weiteren Eskalationsstufe können sich diese Angriffe auch auf andere Chips auswirken die an dasselbe Netzteil angeschlossen sind wie der FPGA. Um zu beweisen, dass vergleichbare Angriffe nicht nur innerhalb FPGAs möglich sind, wird gezeigt, dass auch kleine IoT-Geräte anfällig für Angriffe sind welche die gemeinsame Spannungsversorgung innerhalb eines Chips ausnutzen. Insgesamt zeigt diese Arbeit, dass grundlegende physikalische Variationen in integrierten Schaltkreisen die Sicherheit eines gesamten Systems untergraben können, selbst wenn der Angreifer keinen direkten Zugriff auf das Gerät hat. Für FPGAs in ihrer aktuellen Form müssen diese Probleme zuerst gelöst werden, bevor man sie mit mehreren Benutzern oder mit Zugriff von Drittanbietern sicher verwenden kann. In Veröffentlichungen die nicht Teil dieser Arbeit sind wurden bereits einige erste Gegenmaßnahmen untersucht

    HyperFPGA: SoC-FPGA Cluster Architecture for Supercomputing and Scientific applications

    Get PDF
    Since their inception, supercomputers have addressed problems that far exceed those of a single computing device. Modern supercomputers are made up of tens of thousands of CPUs and GPUs in racks that are interconnected via elaborate and most of the time ad hoc networks. These large facilities provide scientists with unprecedented and ever-growing computing power capable of tackling more complex and larger problems. In recent years, the most powerful supercomputers have already reached megawatt power consumption levels, an important issue that challenges sustainability and shows the impossibility of maintaining this trend. With more pressure on energy efficiency, an alternative to traditional architectures is needed. Reconfigurable hardware, such as FPGAs, has repeatedly been shown to offer substantial advantages over the traditional supercomputing approach with respect to performance and power consumption. In fact, several works that advanced the field of heterogeneous supercomputing using FPGAs are described in this thesis \cite{survey-2002}. Each cluster and its architectural characteristics can be studied from three interconnected domains: network, hardware, and software tools, resulting in intertwined challenges that designers must take into account. The classification and study of the architectures illustrate the trade-offs of the solutions and help identify open problems and research lines, which in turn served as inspiration and background for the HyperFPGA. In this thesis, the HyperFPGA cluster is presented as a way to build scalable SoC-FPGA platforms to explore new architectures for improved performance and energy efficiency in high-performance computing, focusing on flexibility and openness. The HyperFPGA is a modular platform based on a SoM that includes power monitoring tools with high-speed general-purpose interconnects to offer a great level of flexibility and introspection. By exploiting the reconfigurability and programmability offered by the HyperFPGA infrastructure, which combines FPGAs and CPUs, with high-speed general-purpose connectors, novel computing paradigms can be implemented. A custom Linux OS and drivers, along with a custom script for hardware definition, provide a uniform interface from application to platform for a programmable framework that integrates existing tools. The development environment is demonstrated using the N-Queens problem, which is a classic benchmark for evaluating the performance of parallel computing systems. Overall, the results of the HyperFPGA using the N-Queens problem highlight the platform's ability to handle computationally intensive tasks and demonstrate its suitability for its use in supercomputing experiments.Since their inception, supercomputers have addressed problems that far exceed those of a single computing device. Modern supercomputers are made up of tens of thousands of CPUs and GPUs in racks that are interconnected via elaborate and most of the time ad hoc networks. These large facilities provide scientists with unprecedented and ever-growing computing power capable of tackling more complex and larger problems. In recent years, the most powerful supercomputers have already reached megawatt power consumption levels, an important issue that challenges sustainability and shows the impossibility of maintaining this trend. With more pressure on energy efficiency, an alternative to traditional architectures is needed. Reconfigurable hardware, such as FPGAs, has repeatedly been shown to offer substantial advantages over the traditional supercomputing approach with respect to performance and power consumption. In fact, several works that advanced the field of heterogeneous supercomputing using FPGAs are described in this thesis \cite{survey-2002}. Each cluster and its architectural characteristics can be studied from three interconnected domains: network, hardware, and software tools, resulting in intertwined challenges that designers must take into account. The classification and study of the architectures illustrate the trade-offs of the solutions and help identify open problems and research lines, which in turn served as inspiration and background for the HyperFPGA. In this thesis, the HyperFPGA cluster is presented as a way to build scalable SoC-FPGA platforms to explore new architectures for improved performance and energy efficiency in high-performance computing, focusing on flexibility and openness. The HyperFPGA is a modular platform based on a SoM that includes power monitoring tools with high-speed general-purpose interconnects to offer a great level of flexibility and introspection. By exploiting the reconfigurability and programmability offered by the HyperFPGA infrastructure, which combines FPGAs and CPUs, with high-speed general-purpose connectors, novel computing paradigms can be implemented. A custom Linux OS and drivers, along with a custom script for hardware definition, provide a uniform interface from application to platform for a programmable framework that integrates existing tools. The development environment is demonstrated using the N-Queens problem, which is a classic benchmark for evaluating the performance of parallel computing systems. Overall, the results of the HyperFPGA using the N-Queens problem highlight the platform's ability to handle computationally intensive tasks and demonstrate its suitability for its use in supercomputing experiments

    A quantum-resistant advanced metering infrastructure

    Get PDF
    This dissertation focuses on discussing and implementing a Quantum-Resistant Advanced Metering Infrastructure (QR-AMI) that employs quantum-resistant asymmetric and symmetric cryptographic schemes to withstand attacks from both quantum and classical computers. The proposed solution involves the integration of Quantum-Resistant Dedicated Cryptographic Modules (QR-DCMs) within Smart Meters (SMs). These QR-DCMs are designed to embed quantum-resistant cryptographic schemes suitable for AMI applications. In this sense, it investigates quantum-resistant asymmetric cryptographic schemes based on strong cryptographic principles and a lightweight approach for AMIs. In addition, it examines the practical deployment of quantum-resistant schemes in QR-AMIs. Two candidates from the National Institute of Standards and Technology (NIST) post-quantum cryptography (PQC) standardization process, FrodoKEM and CRYSTALS-Kyber, are assessed due to their adherence to strong cryptographic principles and lightweight approach. The feasibility of embedding these schemes within QRDCMs in an AMI context is evaluated through software implementations on low-cost hardware, such as microcontroller and processor, and hardware/software co-design implementations using System-on-a-Chip (SoC) devices with Field-Programmable Gate Array (FPGA) components. Experimental results show that the execution time for FrodoKEM and CRYSTALS-Kyber schemes on SoC FPGA devices is at least one-third faster than software implementations. Furthermore, the achieved execution time and resource usage demonstrate the viability of these schemes for AMI applications. The CRYSTALS-Kyber scheme appears to be a superior choice in all scenarios, except when strong cryptographic primitives are necessitated, at least theoretically. Due to the lack of off-the-shelf SMs supporting quantum-resistant asymmetric cryptographic schemes, a QRDCM embedding quantum-resistant scheme is implemented and evaluated. Regarding hardware selection for QR-DCMs, microcontrollers are preferable in situations requiring reduced processing power, while SoC FPGA devices are better suited for those demanding high processing power. The resource usage and execution time outcomes demonstrate the feasibility of implementing AMI based on QR-DCMs (i.e., QR-AMI) using microcontrollers or SoC FPGA devices.Esta tese de doutorado foca na discussão e implementação de uma Infraestrutura de Medição Avançada com Resistência Quântica (do inglês, Quantum-Resistant Advanced Metering Infrastructure - QR-AMI), que emprega esquemas criptográficos assimétricos e simétricos com resistência quântica para suportar ataques proveniente tanto de computadores quânticos, como clássicos. A solução proposta envolve a integração de um Módulo Criptográfico Dedicado com Resistência Quântica (do inglês, Quantum-Resistant Dedicated Cryptographic Modules - QR-DCMs) com Medidores Inteligentes (do inglês, Smart Meter - SM). Os QR-DCMs são projetados para embarcar esquemas criptográficos com resistência quântica adequados para aplicação em AMI. Nesse sentido, é investigado esquemas criptográficos assimétricos com resistência quântica baseado em fortes princípios criptográficos e abordagem com baixo uso de recursos para AMIs. Além disso, é analisado a implantação prática de um esquema com resistência quântica em QR-AMIs. Dois candidatos do processo de padronização da criptografia pós-quântica (do inglês, post-quantum cryptography - PQC) do Instituto Nacional de Padrões e Tecnologia (do inglês, National Institute of Standards and Technology - NIST), FrodoKEM e CRYSTALS-Kyber, são avaliados devido à adesão a fortes princípios criptográficos e abordagem com baixo uso de recursos. A viabilidade de embarcar esses esquemas em QR-DCMs em um contexto de AMI é avaliado por meio de implementação em software em hardwares de baixo custo, como um microcontrolador e processador, e implementações conjunta hardware/software usando um sistema em um chip (do inglês, System-on-a-Chip - SoC) com Arranjo de Porta Programável em Campo (do inglês, Field-Programmable Gate Array - FPGA). Resultados experimentais mostram que o tempo de execução para os esquemas FrodoKEM e CRYSTALSKyber em dispositivos SoC FPGA é, ao menos, um terço mais rápido que implementações em software. Além disso, os tempos de execuções atingidos e o uso de recursos demonstram a viabilidade desses esquemas para aplicações em AMI. O esquema CRYSTALS-Kyber parece ser uma escolha superior em todos os cenários, exceto quando fortes primitivas criptográficas são necessárias, ao menos teoricamente. Devido à falta de SMs no mercado que suportem esquemas criptográficos assimétricos com resistência quântica, um QR-DCM embarcando esquemas com resistência quântica é implementado e avaliado. Quanto à escolha do hardware para os QR-DCMs, microcontroladores são preferíveis em situações que requerem poder de processamento reduzido, enquanto dispositivos SoC FPGA são mais adequados para quando é demandado maior poder de processamento. O uso de recurso e o resultado do tempo de execução demonstram a viabilidade da implementação de AMI baseada em QR-DCMs, ou seja, uma QR-AMI, usando microcontroladores e dispositivos SoC FPGA

    A Comprehensive Survey on Non-Invasive Fault Injection Attacks

    Get PDF
    Non-invasive fault injection attacks have emerged as significant threats to a spectrum of microelectronic systems ranging from commodity devices to high-end customized processors. Unlike their invasive counterparts, these attacks are more affordable and can exploit system vulnerabilities without altering the hardware physically. Furthermore, certain non-invasive fault injection strategies allow for remote vulnerability exploitation without the requirement of physical proximity. However, existing studies lack extensive investigation into these attacks across diverse target platforms, threat models, emerging attack strategies, assessment frameworks, and mitigation approaches. In this paper, we provide a comprehensive overview of contemporary research on non-invasive fault injection attacks. Our objective is to consolidate and scrutinize the various techniques, methodologies, target systems susceptible to the attacks, and existing mitigation mechanisms advanced by the research community. Besides, we categorize attack strategies based on several aspects, present a detailed comparison among the categories, and highlight research challenges with future direction. By underlining and discussing the landscape of cutting-edge, non-invasive fault injection, we hope more researchers, designers, and security professionals examine the attacks further and take such threats into consideration while developing effective countermeasures

    GPU-based Private Information Retrieval for On-Device Machine Learning Inference

    Full text link
    On-device machine learning (ML) inference can enable the use of private user data on user devices without revealing them to remote servers. However, a pure on-device solution to private ML inference is impractical for many applications that rely on embedding tables that are too large to be stored on-device. In particular, recommendation models typically use multiple embedding tables each on the order of 1-10 GBs of data, making them impractical to store on-device. To overcome this barrier, we propose the use of private information retrieval (PIR) to efficiently and privately retrieve embeddings from servers without sharing any private information. As off-the-shelf PIR algorithms are usually too computationally intensive to directly use for latency-sensitive inference tasks, we 1) propose novel GPU-based acceleration of PIR, and 2) co-design PIR with the downstream ML application to obtain further speedup. Our GPU acceleration strategy improves system throughput by more than 20×20 \times over an optimized CPU PIR implementation, and our PIR-ML co-design provides an over 5×5 \times additional throughput improvement at fixed model quality. Together, for various on-device ML applications such as recommendation and language modeling, our system on a single V100 GPU can serve up to 100,000100,000 queries per second -- a >100×>100 \times throughput improvement over a CPU-based baseline -- while maintaining model accuracy

    Optimización del rendimiento y la eficiencia energética en sistemas masivamente paralelos

    Get PDF
    RESUMEN Los sistemas heterogéneos son cada vez más relevantes, debido a sus capacidades de rendimiento y eficiencia energética, estando presentes en todo tipo de plataformas de cómputo, desde dispositivos embebidos y servidores, hasta nodos HPC de grandes centros de datos. Su complejidad hace que sean habitualmente usados bajo el paradigma de tareas y el modelo de programación host-device. Esto penaliza fuertemente el aprovechamiento de los aceleradores y el consumo energético del sistema, además de dificultar la adaptación de las aplicaciones. La co-ejecución permite que todos los dispositivos cooperen para computar el mismo problema, consumiendo menos tiempo y energía. No obstante, los programadores deben encargarse de toda la gestión de los dispositivos, la distribución de la carga y la portabilidad del código entre sistemas, complicando notablemente su programación. Esta tesis ofrece contribuciones para mejorar el rendimiento y la eficiencia energética en estos sistemas masivamente paralelos. Se realizan propuestas que abordan objetivos generalmente contrapuestos: se mejora la usabilidad y la programabilidad, a la vez que se garantiza una mayor abstracción y extensibilidad del sistema, y al mismo tiempo se aumenta el rendimiento, la escalabilidad y la eficiencia energética. Para ello, se proponen dos motores de ejecución con enfoques completamente distintos. EngineCL, centrado en OpenCL y con una API de alto nivel, favorece la máxima compatibilidad entre todo tipo de dispositivos y proporciona un sistema modular extensible. Su versatilidad permite adaptarlo a entornos para los que no fue concebido, como aplicaciones con ejecuciones restringidas por tiempo o simuladores HPC de dinámica molecular, como el utilizado en un centro de investigación internacional. Considerando las tendencias industriales y enfatizando la aplicabilidad profesional, CoexecutorRuntime proporciona un sistema flexible centrado en C++/SYCL que dota de soporte a la co-ejecución a la tecnología oneAPI. Este runtime acerca a los programadores al dominio del problema, posibilitando la explotación de estrategias dinámicas adaptativas que mejoran la eficiencia en todo tipo de aplicaciones.ABSTRACT Heterogeneous systems are becoming increasingly relevant, due to their performance and energy efficiency capabilities, being present in all types of computing platforms, from embedded devices and servers to HPC nodes in large data centers. Their complexity implies that they are usually used under the task paradigm and the host-device programming model. This strongly penalizes accelerator utilization and system energy consumption, as well as making it difficult to adapt applications. Co-execution allows all devices to simultaneously compute the same problem, cooperating to consume less time and energy. However, programmers must handle all device management, workload distribution and code portability between systems, significantly complicating their programming. This thesis offers contributions to improve performance and energy efficiency in these massively parallel systems. The proposals address the following generally conflicting objectives: usability and programmability are improved, while ensuring enhanced system abstraction and extensibility, and at the same time performance, scalability and energy efficiency are increased. To achieve this, two runtime systems with completely different approaches are proposed. EngineCL, focused on OpenCL and with a high-level API, provides an extensible modular system and favors maximum compatibility between all types of devices. Its versatility allows it to be adapted to environments for which it was not originally designed, including applications with time-constrained executions or molecular dynamics HPC simulators, such as the one used in an international research center. Considering industrial trends and emphasizing professional applicability, CoexecutorRuntime provides a flexible C++/SYCL-based system that provides co-execution support for oneAPI technology. This runtime brings programmers closer to the problem domain, enabling the exploitation of dynamic adaptive strategies that improve efficiency in all types of applications.Funding: This PhD has been supported by the Spanish Ministry of Education (FPU16/03299 grant), the Spanish Science and Technology Commission under contracts TIN2016-76635-C2-2-R and PID2019-105660RB-C22. This work has also been partially supported by the Mont-Blanc 3: European Scalable and Power Efficient HPC Platform based on Low-Power Embedded Technology project (G.A. No. 671697) from the European Union’s Horizon 2020 Research and Innovation Programme (H2020 Programme). Some activities have also been funded by the Spanish Science and Technology Commission under contract TIN2016-81840-REDT (CAPAP-H6 network). The Integration II: Hybrid programming models of Chapter 4 has been partially performed under the Project HPC-EUROPA3 (INFRAIA-2016-1-730897), with the support of the EC Research Innovation Action under the H2020 Programme. In particular, the author gratefully acknowledges the support of the SPMT Department of the High Performance Computing Center Stuttgart (HLRS)

    On Information-centric Resiliency and System-level Security in Constrained, Wireless Communication

    Get PDF
    The Internet of Things (IoT) interconnects many heterogeneous embedded devices either locally between each other, or globally with the Internet. These things are resource-constrained, e.g., powered by battery, and typically communicate via low-power and lossy wireless links. Communication needs to be secured and relies on crypto-operations that are often resource-intensive and in conflict with the device constraints. These challenging operational conditions on the cheapest hardware possible, the unreliable wireless transmission, and the need for protection against common threats of the inter-network, impose severe challenges to IoT networks. In this thesis, we advance the current state of the art in two dimensions. Part I assesses Information-centric networking (ICN) for the IoT, a network paradigm that promises enhanced reliability for data retrieval in constrained edge networks. ICN lacks a lower layer definition, which, however, is the key to enable device sleep cycles and exclusive wireless media access. This part of the thesis designs and evaluates an effective media access strategy for ICN to reduce the energy consumption and wireless interference on constrained IoT nodes. Part II examines the performance of hardware and software crypto-operations, executed on off-the-shelf IoT platforms. A novel system design enables the accessibility and auto-configuration of crypto-hardware through an operating system. One main focus is the generation of random numbers in the IoT. This part of the thesis further designs and evaluates Physical Unclonable Functions (PUFs) to provide novel randomness sources that generate highly unpredictable secrets, on low-cost devices that lack hardware-based security features. This thesis takes a practical view on the constrained IoT and is accompanied by real-world implementations and measurements. We contribute open source software, automation tools, a simulator, and reproducible measurement results from real IoT deployments using off-the-shelf hardware. The large-scale experiments in an open access testbed provide a direct starting point for future research
    corecore