157 research outputs found

    Tiny Machine Learning Environment: Enabling Intelligence on Constrained Devices

    Get PDF
    Running machine learning algorithms (ML) on constrained devices at the extreme edge of the network is problematic due to the computational overhead of ML algorithms, available resources on the embedded platform, and application budget (i.e., real-time requirements, power constraints, etc.). This required the development of specific solutions and development tools for what is now referred to as TinyML. In this dissertation, we focus on improving the deployment and performance of TinyML applications, taking into consideration the aforementioned challenges, especially memory requirements. This dissertation contributed to the construction of the Edge Learning Machine environment (ELM), a platform-independent open-source framework that provides three main TinyML services, namely shallow ML, self-supervised ML, and binary deep learning on constrained devices. In this context, this work includes the following steps, which are reflected in the thesis structure. First, we present the performance analysis of state-of-the-art shallow ML algorithms including dense neural networks, implemented on mainstream microcontrollers. The comprehensive analysis in terms of algorithms, hardware platforms, datasets, preprocessing techniques, and configurations shows similar performance results compared to a desktop machine and highlights the impact of these factors on overall performance. Second, despite the assumption that TinyML only permits models inference provided by the scarcity of resources, we have gone a step further and enabled self-supervised on-device training on microcontrollers and tiny IoT devices by developing the Autonomous Edge Pipeline (AEP) system. AEP achieves comparable accuracy compared to the typical TinyML paradigm, i.e., models trained on resource-abundant devices and then deployed on microcontrollers. Next, we present the development of a memory allocation strategy for convolutional neural networks (CNNs) layers, that optimizes memory requirements. This approach reduces the memory footprint without affecting accuracy nor latency. Moreover, e-skin systems share the main requirements of the TinyML fields: enabling intelligence with low memory, low power consumption, and low latency. Therefore, we designed an efficient Tiny CNN architecture for e-skin applications. The architecture leverages the memory allocation strategy presented earlier and provides better performance than existing solutions. A major contribution of the thesis is given by CBin-NN, a library of functions for implementing extremely efficient binary neural networks on constrained devices. The library outperforms state of the art NN deployment solutions by drastically reducing memory footprint and inference latency. All the solutions proposed in this thesis have been implemented on representative devices and tested in relevant applications, of which results are reported and discussed. The ELM framework is open source, and this work is clearly becoming a useful, versatile toolkit for the IoT and TinyML research and development community

    Expectations and expertise in artificial intelligence: specialist views and historical perspectives on conceptualisation, promise, and funding

    Get PDF
    Artificial intelligence’s (AI) distinctiveness as a technoscientific field that imitates the ability to think went through a resurgence of interest post-2010, attracting a flood of scientific and popular expectations as to its utopian or dystopian transformative consequences. This thesis offers observations about the formation and dynamics of expectations based on documentary material from the previous periods of perceived AI hype (1960-1975 and 1980-1990, including in-between periods of perceived dormancy), and 25 interviews with UK-based AI specialists, directly involved with its development, who commented on the issues during the crucial period of uncertainty (2017-2019) and intense negotiation through which AI gained momentum prior to its regulation and relatively stabilised new rounds of long-term investment (2020-2021). This examination applies and contributes to longitudinal studies in the sociology of expectations (SoE) and studies of experience and expertise (SEE) frameworks, proposing a historical sociology of expertise and expectations framework. The research questions, focusing on the interplay between hype mobilisation and governance, are: (1) What is the relationship between AI practical development and the broader expectational environment, in terms of funding and conceptualisation of AI? (2) To what extent does informal and non-developer assessment of expectations influence formal articulations of foresight? (3) What can historical examinations of AI’s conceptual and promissory settings tell about the current rebranding of AI? The following contributions are made: (1) I extend SEE by paying greater attention to the interplay between technoscientific experts and wider collective arenas of discourse amongst non-specialists and showing how AI’s contemporary research cultures are overwhelmingly influenced by the hype environment but also contribute to it. This further highlights the interaction between competing rationales focusing on exploratory, curiosity-driven scientific research against exploitation-oriented strategies at formal and informal levels. (2) I suggest benefits of examining promissory environments in AI and related technoscientific fields longitudinally, treating contemporary expectations as historical products of sociotechnical trajectories through an authoritative historical reading of AI’s shifting conceptualisation and attached expectations as a response to availability of funding and broader national imaginaries. This comes with the benefit of better perceiving technological hype as migrating from social group to social group instead of fading through reductionist cycles of disillusionment; either by rebranding of technical operations, or by the investigation of a given field by non-technical practitioners. It also sensitises to critically examine broader social expectations as factors for shifts in perception about theoretical/basic science research transforming into applied technological fields. Finally, (3) I offer a model for understanding the significance of interplay between conceptualisations, promising, and motivations across groups within competing dynamics of collective and individual expectations and diverse sources of expertise

    Flexible Hardware-based Security-aware Mechanisms and Architectures

    Get PDF
    For decades, software security has been the primary focus in securing our computing platforms. Hardware was always assumed trusted, and inherently served as the foundation, and thus the root of trust, of our systems. This has been further leveraged in developing hardware-based dedicated security extensions and architectures to protect software from attacks exploiting software vulnerabilities such as memory corruption. However, the recent outbreak of microarchitectural attacks has shaken these long-established trust assumptions in hardware entirely, thereby threatening the security of all of our computing platforms and bringing hardware and microarchitectural security under scrutiny. These attacks have undeniably revealed the grave consequences of hardware/microarchitecture security flaws to the entire platform security, and how they can even subvert the security guarantees promised by dedicated security architectures. Furthermore, they shed light on the sophisticated challenges particular to hardware/microarchitectural security; it is more critical (and more challenging) to extensively analyze the hardware for security flaws prior to production, since hardware, unlike software, cannot be patched/updated once fabricated. Hardware cannot reliably serve as the root of trust anymore, unless we develop and adopt new design paradigms where security is proactively addressed and scrutinized across the full stack of our computing platforms, at all hardware design and implementation layers. Furthermore, novel flexible security-aware design mechanisms are required to be incorporated in processor microarchitecture and hardware-assisted security architectures, that can practically address the inherent conflict between performance and security by allowing that the trade-off is configured to adapt to the desired requirements. In this thesis, we investigate the prospects and implications at the intersection of hardware and security that emerge across the full stack of our computing platforms and System-on-Chips (SoCs). On one front, we investigate how we can leverage hardware and its advantages, in contrast to software, to build more efficient and effective security extensions that serve security architectures, e.g., by providing execution attestation and enforcement, to protect the software from attacks exploiting software vulnerabilities. We further propose that they are microarchitecturally configured at runtime to provide different types of security services, thus adapting flexibly to different deployment requirements. On another front, we investigate how we can protect these hardware-assisted security architectures and extensions themselves from microarchitectural and software attacks that exploit design flaws that originate in the hardware, e.g., insecure resource sharing in SoCs. More particularly, we focus in this thesis on cache-based side-channel attacks, where we propose sophisticated cache designs, that fundamentally mitigate these attacks, while still preserving performance by enabling that the performance security trade-off is configured by design. We also investigate how these can be incorporated into flexible and customizable security architectures, thus complementing them to further support a wide spectrum of emerging applications with different performance/security requirements. Lastly, we inspect our computing platforms further beneath the design layer, by scrutinizing how the actual implementation of these mechanisms is yet another potential attack surface. We explore how the security of hardware designs and implementations is currently analyzed prior to fabrication, while shedding light on how state-of-the-art hardware security analysis techniques are fundamentally limited, and the potential for improved and scalable approaches

    SafeBet: Secure, Simple, and Fast Speculative Execution

    Full text link
    Spectre attacks exploit microprocessor speculative execution to read and transmit forbidden data outside the attacker's trust domain and sandbox. Recent hardware schemes allow potentially-unsafe speculative accesses but prevent the secret's transmission by delaying most access-dependent instructions even in the predominantly-common, no-attack case, which incurs performance loss and hardware complexity. Instead, we propose SafeBet which allows only, and does not delay most, safe accesses, achieving both security and high performance. SafeBet is based on the key observation that speculatively accessing a destination location is safe if the location's access by the same static trust domain has been committed previously; and potentially unsafe, otherwise. We extend this observation to handle inter trust-domain code and data interactions. SafeBet employs the Speculative Memory Access Control Table (SMACT) to track non-speculative trust domain code region-destination pairs. Disallowed accesses wait until reaching commit to trigger well-known replay, with virtually no change to the pipeline. Software simulations using SpecCPU benchmarks show that SafeBet uses an 8.3-KB SMACT per core to perform within 6% on average (63% at worst) of the unsafe baseline behind which NDA-restrictive, a previous scheme of security and hardware complexity comparable to SafeBet's, lags by 83% on average

    Tiny Machine Learning Environment: Enabling Intelligence on Constrained Devices

    Get PDF
    Running machine learning algorithms (ML) on constrained devices at the extreme edge of the network is problematic due to the computational overhead of ML algorithms, available resources on the embedded platform, and application budget (i.e., real-time requirements, power constraints, etc.). This required the development of specific solutions and development tools for what is now referred to as TinyML. In this dissertation, we focus on improving the deployment and performance of TinyML applications, taking into consideration the aforementioned challenges, especially memory requirements. This dissertation contributed to the construction of the Edge Learning Machine environment (ELM), a platform-independent open source framework that provides three main TinyML services, namely shallow ML, self-supervised ML, and binary deep learning on constrained devices. In this context, this work includes the following steps, which are reflected in the thesis structure. First, we present the performance analysis of state of the art shallow ML algorithms including dense neural networks, implemented on mainstream microcontrollers. The comprehensive analysis in terms of algorithms, hardware platforms, datasets, pre-processing techniques, and configurations shows similar performance results compared to a desktop machine and highlights the impact of these factors on overall performance. Second, despite the assumption that TinyML only permits models inference provided by the scarcity of resources, we have gone a step further and enabled self-supervised on-device training on microcontrollers and tiny IoT devices by developing the Autonomous Edge Pipeline (AEP) system. AEP achieves comparable accuracy compared to the typical TinyML paradigm, i.e., models trained on resource-abundant devices and then deployed on microcontrollers. Next, we present the development of a memory allocation strategy for convolutional neural networks (CNNs) layers, that optimizes memory requirements. This approach reduces the memory footprint without affecting accuracy nor latency. Moreover, e-skin systems share the main requirements of the TinyML fields: enabling intelligence with low memory, low power consumption, and low latency. Therefore, we designed an efficient Tiny CNN architecture for e-skin applications. The architecture leverages the memory allocation strategy presented earlier and provides better performance than existing solutions. A major contribution of the thesis is given by CBin-NN, a library of functions for implementing extremely efficient binary neural networks on constrained devices. The library outperforms state of the art NN deployment solutions by drastically reducing memory footprint and inference latency. All the solutions proposed in this thesis have been implemented on representative devices and tested in relevant applications, of which results are reported and discussed. The ELM framework is open source, and this work is clearly becoming a useful, versatile toolkit for the IoT and TinyML research and development community

    Architectural Support for Optimizing Huge Page Selection Within the OS

    Get PDF
    © 2023 Copyright held by the owner/author(s). This document is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/ This document is the Accepted version of a Published Work that appeared in final form in 56th ACM/IEEE International Symposium on Microarchitecture (MICRO), Toronto, Canada. To access the final edited and published work see https://doi.org/10.1145/3613424.3614296Irregular, memory-intensive applications often incur high translation lookaside buffer (TLB) miss rates that result in significant address translation overheads. Employing huge pages is an effective way to reduce these overheads, however in real systems the number of available huge pages can be limited when system memory is nearly full and/or fragmented. Thus, huge pages must be used selectively to back application memory. This work demonstrates that choosing memory regions that incur the most TLB misses for huge page promotion best reduces address translation overheads. We call these regions High reUse TLB-sensitive data (HUBs). Unlike prior work which relies on expensive per-page software counters to identify promotion regions, we propose new architectural support to identify these regions dynamically at application runtime. We propose a promotion candidate cache (PCC) that identifies HUB candidates based on hardware page table walks after a lastlevel TLB miss. This small, fixed-size structure tracks huge pagealigned regions (consisting of base pages), ranks them based on observed page table walk frequency, and only keeps the most frequently accessed ones. Evaluated on applications of various memory intensity, our approach successfully identifies application pages incurring the highest address translation overheads. Our approach demonstrates that with the help of a PCC, the OS only needs to promote 4% of the application footprint to achieve more than 75% of the peak achievable performance, yielding 1.19-1.33× speedups over 4KB base pages alone. In real systems where memory is typically fragmented, the PCC outperforms Linux’s page promotion policy by 14% (when 50% of total memory is fragmented) and 16% (when 90% of total memory is fragmented) respectively

    Activity Report 2022

    Get PDF

    Towards the simulation of cooperative perception applications by leveraging distributed sensing infrastructures

    Get PDF
    With the rapid development of Automated Vehicles (AV), the boundaries of their function alities are being pushed and new challenges are being imposed. In increasingly complex and dynamic environments, it is fundamental to rely on more powerful onboard sensors and usually AI. However, there are limitations to this approach. As AVs are increasingly being integrated in several industries, expectations regarding their cooperation ability is growing, and vehicle-centric approaches to sensing and reasoning, become hard to integrate. The proposed approach is to extend perception to the environment, i.e. outside of the vehicle, by making it smarter, via the deployment of wireless sensors and actuators. This will vastly improve the perception capabilities in dynamic and unpredictable scenarios and often in a cheaper way, relying mostly in the use of lower cost sensors and embedded devices, which rely on their scale deployment instead of centralized sensing abilities. Consequently, to support the development and deployment of such cooperation actions in a seamless way, we require the usage of co-simulation frameworks, that can encompass multiple perspectives of control and communications for the AVs, the wireless sensors and actuators and other actors in the environment. In this work, we rely on ROS2 and micro-ROS as the underlying technologies for integrating several simulation tools, to construct a framework, capable of supporting the development, test and validation of such smart, cooperative environments. This endeavor was undertaken by building upon an existing simulation framework known as AuNa. We extended its capabilities to facilitate the simulation of cooperative scenarios by incorporat ing external sensors placed within the environment rather than just relying on vehicle-based sensors. Moreover, we devised a cooperative perception approach within this framework, showcasing its substantial potential and effectiveness. This will enable the demonstration of multiple cooperation scenarios and also ease the deployment phase by relying on the same software architecture.Com o rápido desenvolvimento dos Veículos Autónomos (AV), os limites das suas funcional idades estão a ser alcançados e novos desafios estão a surgir. Em ambientes complexos e dinâmicos, é fundamental a utilização de sensores de alta capacidade e, na maioria dos casos, inteligência artificial. Mas existem limitações nesta abordagem. Como os AVs estão a ser integrados em várias indústrias, as expectativas quanto à sua capacidade de cooperação estão a aumentar, e as abordagens de perceção e raciocínio centradas no veículo, tornam-se difíceis de integrar. A abordagem proposta consiste em extender a perceção para o ambiente, isto é, fora do veículo, tornando-a inteligente, através do uso de sensores e atuadores wireless. Isto irá melhorar as capacidades de perceção em cenários dinâmicos e imprevisíveis, reduzindo o custo, pois a abordagem será baseada no uso de sensores low-cost e sistemas embebidos, que dependem da sua implementação em grande escala em vez da capacidade de perceção centralizada. Consequentemente, para apoiar o desenvolvimento e implementação destas ações em cooperação, é necessária a utilização de frameworks de co-simulação, que abranjam múltiplas perspetivas de controlo e comunicação para os AVs, sensores e atuadores wireless, e outros atores no ambiente. Neste trabalho será utilizado ROS2 e micro-ROS como as tecnologias subjacentes para a integração das ferramentas de simulação, de modo a construir uma framework capaz de apoiar o desenvolvimento, teste e validação de ambientes inteligentes e cooperativos. Esta tarefa foi realizada com base numa framework de simulação denominada AuNa. Foram expandidas as suas capacidades para facilitar a simulação de cenários cooperativos através da incorporação de sensores externos colocados no ambiente, em vez de depender apenas de sensores montados nos veículos. Além disso, concebemos uma abordagem de perceção cooperativa usando a framework, demonstrando o seu potencial e eficácia. Isto irá permitir a demonstração de múltiplos cenários de cooperação e também facilitar a fase de implementação, utilizando a mesma arquitetura de software

    Real-time high-performance computing for embedded control systems

    Get PDF
    The real-time control systems industry is moving towards the consolidation of multiple computing systems into fewer and more powerful ones, aiming for a reduction in size, weight, and power. The increasing demand for higher performance in other critical domains like autonomous driving has led the industry to recently include embedded GPUs for the implementation of advanced functionalities. The highly parallel architecture of GPUs could also be leveraged in the control systems industry to develop more advanced, energy-efficient, and scalable control systems. However, the closed-source and non-deterministic nature of GPUs complicates the resource provisioning analysis required for the implementation of critical real-time systems. On the other hand, there is no indication of the integration of GPUs in the traditional development cycle of control systems, which is oriented to the use of a model-based design approach. Recently, some model-based design tools vendors have extended their development frameworks with GPU code generation capabilities targeting hybrid computing platforms, so that the model-based design environment now enables the concurrent analysis of more complex and diverse functions by simulation and automating the deployment to the final target. However, there is no indication whether these tools are well-suited for the design and development of time-sensitive systems. Motivated by these challenges, in this thesis, we contribute to the state of the art of real-time control systems towards the adoption of embedded GPUs by providing tools to facilitate the resource provisioning analysis and the integration in the model-based design development cycle. First, we present a methodology and an automated tool to extract the properties of GPU memory allocators. This tool allows the computation of the real amount of memory used by GPU applications, facilitating a correct resource provisioning analysis. Then, we present a library which allows the characterization of the use of dynamic memory in GPU applications. We use this library to characterize GPU benchmarks and we identify memory allocation patterns that could be modified to improve performance and memory consumption when targeting embedded GPUs. Based on these results, we present a tool to optimize the use of dynamic memory in legacy GPU applications executed on embedded platforms. This tool allows us to minimize the memory consumption and memory management overhead of GPU applications without rewriting them. Afterwards, we analyze the timing of control algorithms executed in embedded GPUs and we identify techniques to achieve an acceptable real-time behavior. Finally, we evaluate model-based design tools in terms of integration with GPU hardware and GPU code generation, and we propose improvements for the model-based generated GPU code. Then, we present a source-to-source transformation tool to automatically apply the proposed improvements.La industria de los sistemas de control en tiempo real avanza hacia la consolidación de múltiples sistemas informáticos en menos y más potentes sistemas, con el objetivo de reducir el tamaño, el peso y el consumo. La creciente demanda de un mayor rendimiento en otros dominios críticos, como la conducción autónoma, ha llevado a la industria a incluir recientemente GPU embebidas para la implementación de funcionalidades avanzadas. La arquitectura altamente paralela de las GPU también podría aprovecharse en la industria de los sistemas de control para desarrollar sistemas de control más avanzados, eficientes energéticamente y escalables. Sin embargo, la naturaleza privativa y no determinista de las GPUs complica el análisis de aprovisionamiento de recursos requerido para la implementación de sistemas críticos en tiempo real. Por otro lado, no hay indicios de la integración de las GPU en el ciclo de desarrollo tradicional de los sistemas de control, que está orientado al uso de un enfoque de diseño basado en modelos. Recientemente, algunos proveedores de herramientas de diseño basado en modelos han ampliado sus entornos de desarrollo con capacidades de generación de código de GPU dirigidas a plataformas informáticas híbridas, de modo que el entorno de diseño basado en modelos ahora permite el análisis simultáneo de funciones más complejas y diversas mediante la simulación y la automatización de la implementación para el objetivo final. Sin embargo, no hay indicación de si estas herramientas son adecuadas para el diseño y desarrollo de sistemas sensibles al tiempo. Motivados por estos desafíos, en esta tesis contribuimos al estado del arte de los sistemas de control en tiempo real hacia la adopción de GPUs integradas al proporcionar herramientas para facilitar el análisis de aprovisionamiento de recursos y la integración en el ciclo de desarrollo de diseño basado en modelos. Primero, presentamos una metodología y una herramienta automatizada para extraer las propiedades de los asignadores de memoria en GPUs. Esta herramienta permite el cómputo de la cantidad real de memoria utilizada por las aplicaciones GPU, facilitando un correcto análisis del aprovisionamiento de recursos. Luego, presentamos una librería que permite la caracterización del uso de memoria dinámica en aplicaciones de GPU. Usamos esta librería para caracterizar una serie de benchmarks GPU e identificamos patrones de asignación de memoria que podrían modificarse para mejorar el rendimiento y el consumo de memoria al utilizar GPUs embebidas. Con base en estos resultados, presentamos también una herramienta para optimizar el uso de la memoria dinámica en aplicaciones de GPU heredadas al ser ejecutadas en plataformas embebidas. Esta herramienta nos permite minimizar el consumo de memoria y la sobrecarga de administración de memoria de las aplicaciones GPU sin necesidad de reescribirlas. Posteriormente, analizamos el tiempo de los algoritmos de control ejecutados en GPUs embebidas e identificamos técnicas para lograr un comportamiento de tiempo real aceptable. Finalmente, evaluamos las herramientas de diseño basadas en modelos en términos de integración con hardware GPU y generación de código GPU, y proponemos mejoras para el código GPU generado por las herramientas basadas en modelos. Luego, presentamos una herramienta de transformación de código fuente para aplicar automáticamente al código generado las mejoras propuestas.Postprint (published version

    Machine Learning for Microcontroller-Class Hardware -- A Review

    Full text link
    The advancements in machine learning opened a new opportunity to bring intelligence to the low-end Internet-of-Things nodes such as microcontrollers. Conventional machine learning deployment has high memory and compute footprint hindering their direct deployment on ultra resource-constrained microcontrollers. This paper highlights the unique requirements of enabling onboard machine learning for microcontroller class devices. Researchers use a specialized model development workflow for resource-limited applications to ensure the compute and latency budget is within the device limits while still maintaining the desired performance. We characterize a closed-loop widely applicable workflow of machine learning model development for microcontroller class devices and show that several classes of applications adopt a specific instance of it. We present both qualitative and numerical insights into different stages of model development by showcasing several use cases. Finally, we identify the open research challenges and unsolved questions demanding careful considerations moving forward.Comment: Accepted for publication at IEEE Sensors Journa
    corecore