    HyT-NAS: Hybrid Transformers Neural Architecture Search for Edge Devices

    Vision Transformers have enabled recent attention-based Deep Learning (DL) architectures to achieve remarkable results in Computer Vision (CV) tasks. However, due to the extensive computational resources required, these architectures are rarely implemented on resource-constrained platforms. Current research investigates hybrid handcrafted convolution-based and attention-based models for CV tasks such as image classification and object detection. In this paper, we propose HyT-NAS, an efficient Hardware-aware Neural Architecture Search (HW-NAS) that includes hybrid architectures and targets vision tasks on tiny devices. HyT-NAS improves state-of-the-art HW-NAS by enriching the search space and enhancing the search strategy as well as the performance predictors. Our experiments show that HyT-NAS achieves a similar hypervolume with ~5x fewer training evaluations. Our resulting architecture outperforms MLPerf MobileNetV1 by 6.3% in accuracy with 3.5x fewer parameters on Visual Wake Words. Comment: CODAI 2022 Workshop - Embedded System Week (ESWeek).
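
The hypervolume mentioned above is a standard quality indicator for multi-objective search. As a minimal sketch, assuming two objectives that are both minimized (for example classification error and parameter count) and a user-chosen reference point, it can be computed as follows. The function name and the objective pairing are illustrative assumptions, not taken from the HyT-NAS paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (objective 1, objective 2), both minimized


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded by the reference point `ref`."""
    # Keep only points that are strictly better than the reference point.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_f2 = ref[1]  # lowest second objective seen so far in the sweep
    for f1, f2 in pts:  # sweep in ascending order of the first objective
        if f2 < best_f2:  # point is non-dominated among those seen so far
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


if __name__ == "__main__":
    front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
    print(hypervolume_2d(front, ref=(4.0, 4.0)))
```

A larger hypervolume means the front covers more of the objective space below the reference point, so two search methods can be compared by the hypervolume each reaches for a given evaluation budget.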

    Grassroots Operator Search for Model Edge Adaptation

    Hardware-aware Neural Architecture Search (HW-NAS) is increasingly being used to design efficient deep learning architectures. An efficient and flexible search space is crucial to the success of HW-NAS. Current approaches focus on designing a macro-architecture and searching for the architecture's hyperparameters based on a set of possible values. This approach is biased by the expertise of deep learning (DL) engineers and standard modeling approaches. In this paper, we present a Grassroots Operator Search (GOS) methodology. Our HW-NAS adapts a given model for edge devices by searching for efficient operator replacements. We express each operator as a set of mathematical instructions that capture its behavior. The mathematical instructions are then used as the basis for searching and selecting efficient replacement operators that maintain the accuracy of the original model while reducing computational complexity. Our approach is grassroots since it relies on the mathematical foundations to construct new and efficient operators for DL architectures. We demonstrate on various DL models that our method consistently outperforms the original models on two edge devices, namely Redmi Note 7S and Raspberry Pi3, with a minimum of 2.2x speedup while maintaining high accuracy. Additionally, we showcase a use case of our GOS approach in pulse rate estimation on wristband devices, where we achieve state-of-the-art performance while maintaining reduced computational complexity, demonstrating the effectiveness of our approach in practical applications.

    SONATA: Self-adaptive Evolutionary Framework for Hardware-aware Neural Architecture Search

    Recent advancements in Artificial Intelligence (AI), driven by Neural Networks (NN), demand innovative neural architecture designs, particularly within the constrained environments of Internet of Things (IoT) systems, to balance performance and efficiency. HW-aware Neural Architecture Search (HW-aware NAS) emerges as an attractive strategy to automate the design of NN using multi-objective optimization approaches, such as evolutionary algorithms. However, the intricate relationship between NN design parameters and HW-aware NAS optimization objectives remains an underexplored research area, overlooking opportunities to effectively leverage this knowledge to guide the search process accordingly. Furthermore, the large amount of evaluation data produced during the search holds untapped potential for refining the optimization strategy and improving the approximation of the Pareto front. Addressing these issues, we propose SONATA, a self-adaptive evolutionary algorithm for HW-aware NAS. Our method leverages adaptive evolutionary operators guided by the learned importance of NN design parameters. Specifically, through tree-based surrogate models and a Reinforcement Learning agent, we aspire to gather knowledge on 'How' and 'When' to evolve NN architectures. Comprehensive evaluations across various NAS search spaces and hardware devices on the ImageNet-1k dataset have shown the merit of SONATA with up to 0.25% improvement in accuracy and up to 2.42x gains in latency and energy. SONATA achieves up to ~93.6% Pareto dominance over the native NSGA-II, further underscoring the importance of self-adaptive evolution operators in HW-aware NAS. Comment: 13 pages, 9 figures.
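
The abstract does not say how the Pareto dominance figure is computed. As a hedged illustration, one common reading is the share of one front's points that are dominated by at least one point of the other front. The sketch below assumes all objectives are minimized, and the helper names `dominates` and `dominated_fraction` are hypothetical.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` in every objective and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def dominated_fraction(front_a, front_b) -> float:
    """Share of points in `front_b` dominated by at least one point in `front_a`."""
    if not front_b:
        return 0.0
    hits = sum(any(dominates(a, b) for a in front_a) for b in front_b)
    return hits / len(front_b)


if __name__ == "__main__":
    # Each point is (error, latency); lower is better for both (assumed objectives).
    ours = [(0.20, 10.0), (0.30, 5.0)]
    baseline = [(0.25, 12.0), (0.30, 5.0), (0.10, 20.0)]
    print(dominated_fraction(ours, baseline))
```

Identical points do not dominate each other, so a baseline point that also appears in the other front is not counted.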

    Harmonic-NAS: Hardware-Aware Multimodal Neural Architecture Search on Resource-constrained Devices

    The recent surge of interest surrounding Multimodal Neural Networks (MM-NN) is attributed to their ability to effectively process and integrate multiscale information from diverse data sources. MM-NNs extract and fuse features from multiple modalities using adequate unimodal backbones and specific fusion networks. Although this helps strengthen the multimodal information representation, designing such networks is labor-intensive. It requires tuning the architectural parameters of the unimodal backbones, choosing the fusing point, and selecting the operations for fusion. Furthermore, multimodal AI is emerging as a cutting-edge option in Internet of Things (IoT) systems where inference latency and energy consumption are critical metrics in addition to accuracy. In this paper, we propose Harmonic-NAS, a framework for the joint optimization of unimodal backbones and multimodal fusion networks with hardware awareness on resource-constrained devices. Harmonic-NAS involves a two-tier optimization approach for the unimodal backbone architectures and the fusion strategy and operators. By incorporating the hardware dimension into the optimization, evaluation results on various devices and multimodal datasets have demonstrated the superiority of Harmonic-NAS over state-of-the-art approaches, achieving up to 10.9% accuracy improvement, 1.91x latency reduction, and 2.14x energy efficiency gain. Comment: Accepted to the 15th Asian Conference on Machine Learning (ACML 2023).

    FLASH-RL: Federated Learning Addressing System and Static Heterogeneity using Reinforcement Learning

    Federated Learning (FL) has emerged as a promising Machine Learning paradigm, enabling multiple users to collaboratively train a shared model while preserving their local data. To minimize computing and communication costs associated with parameter transfer, it is common practice in FL to select a subset of clients in each training round. This selection must consider both system and static heterogeneity. Therefore, we propose FLASH-RL, a framework that utilizes Double Deep Q-Learning (DDQL) to address both system and static heterogeneity in FL. FLASH-RL introduces a new reputation-based utility function to evaluate client contributions based on their current and past performances. Additionally, an adapted DDQL algorithm is proposed to expedite the learning process. Experimental results on the MNIST and CIFAR-10 datasets have shown FLASH-RL's effectiveness in achieving a balanced trade-off between model performance and end-to-end latency against existing solutions. Indeed, FLASH-RL reduces latency by up to 24.83% compared to FedAVG and 24.67% compared to FAVOR. It also reduces the training rounds by up to 60.44% compared to FedAVG and +76% compared to FAVOR. In fall detection using the MobiAct dataset, FLASH-RL outperforms FedAVG by up to 2.82% in model performance and reduces latency by up to 34.75%. Additionally, FLASH-RL achieves the target performance faster, with up to a 45.32% reduction in training rounds compared to FedAVG. Comment: Accepted in the 41st IEEE International Conference on Computer Design (ICCD 2023).
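
For readers unfamiliar with Double Deep Q-Learning, the sketch below shows the generic target computation it is built on: the online network picks the best next action and the target network evaluates it. This is the textbook rule, not the adapted DDQL variant proposed in FLASH-RL. Treating each action as the choice of a client, along with all names and values, is an assumption made for illustration.

```python
from typing import List


def double_q_target(
    reward: float,
    next_q_online: List[float],
    next_q_target: List[float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Generic Double Q-Learning target for one transition.

    next_q_online: Q-values of the next state from the online network.
    next_q_target: Q-values of the next state from the target network.
    """
    if done:
        return reward  # no bootstrapping past a terminal state
    # Action selection uses the online network ...
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    # ... while action evaluation uses the target network.
    return reward + gamma * next_q_target[best_action]


if __name__ == "__main__":
    # Three candidate actions (assumed here to be three candidate clients).
    print(double_q_target(1.0, [0.1, 0.9, 0.5], [2.0, 1.0, 3.0], gamma=0.5))
```

Decoupling selection from evaluation reduces the overestimation bias of plain Q-Learning, which takes the maximum over a single set of noisy estimates.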

    Autonomic virtual machines placement on hybrid storage system in IaaS cloud

    IaaS cloud providers offer virtualized resources (CPU, storage, and network) as Virtual Machines (VM). The growth and highly competitive nature of this economy have compelled them to optimize the use of their data centers in order to offer attractive services at a lower cost. In addition to investments related to infrastructure purchase and cost of use, energy is a major and constantly increasing point of expenditure (2% of world consumption). Its control represents a vital opportunity. From a technical point of view, the control of energy consumption is mainly based on consolidation approaches. These approaches, which exclusively take into account the CPU use of physical machines (PM) for VM placement, nevertheless have many drawbacks. Indeed, recent studies have shown that storage systems and disk I/O represent a significant part of data center energy consumption (between 14% and 40%). In this thesis we propose a new autonomic model for VM placement optimization based on MAPEK (Monitor, Analyze, Plan, Execute, Knowledge) in which, in addition to CPU, VM I/O and the related storage systems are considered. Our first contribution proposes a multilevel VM I/O tracer which overcomes the limitations of existing I/O monitoring tools. In the Analyze step, the collected I/O traces are fed into a cost model which takes into account the VM I/O profile, the storage system characteristics, and the cloud environment constraints. We also analyze the complementarity between the two main storage classes, resulting in a hybrid storage model exploiting the advantages of each. Indeed, Hard Disk Drives (HDD) are energy-intensive and inefficient devices compared to compute units. However, their low cost per gigabyte and their long lifetime count in their favor. Unlike HDD, flash-based Solid-State Disks (SSD) are more efficient and consume less power, but their high cost per gigabyte and their short lifetime (compared to HDD) represent major constraints. The Plan phase first resulted in an extension of CloudSim that takes into account VM I/O and the hybrid nature of the storage system, and that implements the previously proposed cost model. Second, we proposed several heuristics based on our cost model, integrated and evaluated using CloudSim. Finally, we showed that our contribution improves existing approaches of VM placement optimization by a factor of three.

    Integrating I/Os in Cloudsim for Performance and Energy Estimation

    This article presents an extension of the IaaS Cloud simulator CloudSim. This extension takes into account the processing of the I/O workload generated by virtual machines within a data center, and evaluates the overall performance and energy consumption. Indeed, according to state-of-the-art studies, storage systems may account for as much as 40% of the energy consumed in a data center. We therefore modified the time computation model of CloudSim to consider I/O operations. Additionally, we designed several models of storage system devices, including Hard Disk Drives and Solid-State Drives. We also modeled CPU utilization, using machine learning techniques, to compute the energy consumption related to I/O request processing. Our storage system extensions have been evaluated using video encoding traces. The simulation results show that a significant amount of energy, around 25%, is consumed due to I/O workload execution. This corroborates the soundness of our CloudSim extensions.

    A Multilevel I/O Tracer for Timing and Performance Analysis of Storage Systems in IaaS Cloud

    Data centers increasingly rely on hybrid storage systems consisting of flash-memory-based storage devices and traditional hard disk drives. Optimal data placement in such hybrid storage systems is a very important issue in the domain of cloud computing and virtualization. This is especially the case when users need storage systems to enforce Quality of Service (QoS) requirements on the I/Os performed, for example for multimedia applications. To characterize Virtual Machine (VM) I/O workload properties such as timing predictability or throughput, monitoring services are necessary on such new architectures. This article presents a multilevel I/O tracer for virtual machines that relies on and complements different state-of-the-art tools. It produces I/O traces at different levels of the Linux I/O software stack. The I/O tracer gives exhaustive information that allows administrators to precisely characterize virtual machine I/O behavior in terms of percentage of read/write I/Os, percentage of random/sequential I/Os, I/O request inter-arrival time, etc. This tool is the first piece towards a middleware whose purpose is to meet user QoS requirements thanks to optimal data placement and migration policies in a hybrid storage system in the context of an IaaS Cloud.
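
The workload metrics listed above (read/write mix, random versus sequential access, inter-arrival time) can be derived from a request trace along the following lines. This is a minimal sketch: the `IORequest` record layout and the rule that a request is sequential when it starts exactly where the previous one ended are assumptions, not the tracer's actual output format or classification.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IORequest:
    timestamp: float  # arrival time in seconds
    op: str           # "R" for read, "W" for write
    offset: int       # start offset in bytes
    size: int         # request size in bytes


def characterize(trace: List[IORequest]) -> Dict[str, float]:
    """Summarize a time-ordered I/O trace with simple workload statistics."""
    if not trace:
        raise ValueError("empty trace")
    n = len(trace)
    reads = sum(1 for r in trace if r.op == "R")
    pairs = list(zip(trace, trace[1:]))  # consecutive (previous, current) requests
    # Assumed rule: sequential if the request starts where the previous one ended.
    sequential = sum(1 for prev, cur in pairs if cur.offset == prev.offset + prev.size)
    gaps = [cur.timestamp - prev.timestamp for prev, cur in pairs]
    seq_pct = 100.0 * sequential / len(pairs) if pairs else 0.0
    return {
        "read_pct": 100.0 * reads / n,
        "write_pct": 100.0 * (n - reads) / n,
        "sequential_pct": seq_pct,
        "random_pct": 100.0 - seq_pct if pairs else 0.0,
        "mean_interarrival_s": sum(gaps) / len(gaps) if gaps else 0.0,
    }


if __name__ == "__main__":
    trace = [
        IORequest(0.0, "R", 0, 4096),
        IORequest(0.5, "R", 4096, 4096),
        IORequest(1.5, "W", 100000, 4096),
        IORequest(2.0, "R", 104096, 4096),
    ]
    print(characterize(trace))
```

A real tracer would also have to merge records from several levels of the I/O stack. The sketch works on a single, already ordered stream.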

    A Cost Model for Virtual Machine Storage in Cloud IaaS Context

    This paper proposes a storage system cost model for Infrastructure as a Service (IaaS) Cloud. The proposed cost model takes into account the virtualization environment and the storage system characteristics, in addition to energy and QoS-related parameters (Service Level Agreement and penalties). We show that those parameters are relevant and allow us to accurately estimate the overall cost of the IaaS infrastructure. We validate this cost model against real measurements and show less than 10% error in most cases. Designers and administrators can use this cost model to perform optimization, load balancing, configuration, and pricing of the Cloud infrastructure.