
    A federated learning framework for the next-generation machine learning systems

    MSc dissertation in Industrial Electronics and Computers Engineering (specialization in Embedded Systems and Computers). The end of Moore's Law, aligned with rising concerns about data privacy, is forcing machine learning (ML) to shift from the cloud to the deep edge, near the data source. In the next-generation ML systems, the inference and part of the training process will be performed right on the edge, while the cloud will be responsible for major ML model updates. This new computing paradigm, referred to by academia and industry researchers as federated learning, alleviates the cloud and network infrastructure while increasing data privacy. Recent advances have made it possible to efficiently execute the inference pass of quantized artificial neural networks on Arm Cortex-M and RISC-V (RV32IMCXpulp) microcontroller units (MCUs). Nevertheless, training is still confined to the cloud, requiring the transfer of high volumes of private data over a network. To tackle this issue, this MSc thesis makes the first attempt to run decentralized training on Arm Cortex-M MCUs. To port part of the training process to the deep edge, L-SGD is proposed: a lightweight version of stochastic gradient descent (SGD) optimized for maximum speed and minimal memory footprint on Arm Cortex-M MCUs. L-SGD is 16.35x faster than the TensorFlow solution while registering a memory footprint reduction of 13.72%, at the cost of a negligible accuracy drop of only 0.12%. To merge local model updates returned by edge devices, this MSc thesis proposes R-FedAvg, an implementation of the FedAvg algorithm that reduces the impact of faulty model updates returned by malicious devices.
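For context, the plain FedAvg aggregation that R-FedAvg extends can be sketched as a sample-size-weighted average of the client models (a minimal illustration; R-FedAvg's robustness filtering against malicious updates is not shown, and the function name is my own):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """FedAvg aggregation: weighted average of client parameters,
    with weights proportional to each client's local sample count.
    client_weights: one list of per-layer np.ndarrays per client.
    client_sizes: number of local training samples per client."""
    total = float(sum(client_sizes))
    n_layers = len(client_weights[0])
    return [
        sum(w[layer] * (n / total) for w, n in zip(client_weights, client_sizes))
        for layer in range(n_layers)
    ]
```

Per the abstract, R-FedAvg would additionally down-weight or discard suspicious client updates before taking this average.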

    Compressed Distributed Gradient Descent: Communication-Efficient Consensus over Networks

    Network consensus optimization has received increasing attention in recent years and has found important applications in many scientific and engineering fields. To solve network consensus optimization problems, one of the most well-known approaches is the distributed gradient descent method (DGD). However, in networks with slow communication rates, DGD's performance is unsatisfactory for solving high-dimensional network consensus problems due to the communication bottleneck. This motivates us to design a communication-efficient DGD-type algorithm based on compressed information exchanges. Our contributions in this paper are three-fold: i) we develop a communication-efficient algorithm called amplified-differential compression DGD (ADC-DGD) and show that it converges under any unbiased compression operator; ii) we rigorously prove the convergence performance of ADC-DGD and show that it matches that of DGD without compression; iii) we reveal an interesting phase transition phenomenon in the convergence speed of ADC-DGD. Collectively, our findings advance the state of the art of network consensus optimization theory.
    Comment: 11 pages, 11 figures, IEEE INFOCOM 201
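The key requirement on the compression operator Q in ADC-DGD is unbiasedness, i.e., E[Q(x)] = x. A minimal sketch of one such operator, a stochastic uniform quantizer (my own illustrative choice; the paper admits any unbiased compressor), could look like:

```python
import numpy as np

def stochastic_round(x, levels=4, rng=None):
    """Unbiased stochastic quantizer: E[Q(x)] = x.
    Magnitudes are scaled to [0, levels] and rounded down or up,
    with the round-up probability equal to the fractional part,
    so the expectation equals the input exactly."""
    rng = np.random.default_rng(rng)
    scale = float(np.max(np.abs(x)))
    if scale == 0.0:
        return x.astype(float)              # all-zero vector: nothing to quantize
    y = np.abs(x) / scale * levels          # magnitudes mapped to [0, levels]
    low = np.floor(y)
    q = low + (rng.random(x.shape) < (y - low))  # stochastic rounding
    return np.sign(x) * q * scale / levels       # map back to original range
```

Each entry is then described by a sign, a shared scale, and a small integer, which is what makes the per-iteration information exchange cheap.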

    Multiple description image and video coding for P2P transmissions

    Peer-to-Peer (P2P) media streaming is, nowadays, a very attractive topic because the bandwidth available to serve demanding content scales with the number of peers. A key challenge, however, is making content distribution robust to peer transience. Multiple description coding (MDC) has proven very effective against packet losses, since it generates several descriptions and can reconstruct the original information from any subset of descriptions that reaches the decoder. Multiple descriptions may therefore be effective for robust peer-to-peer media streaming. This dissertation not only shows that, but also that varying the redundancy level of the descriptions on the fly can lead to better performance than keeping this parameter fixed. It is shown, as well, that varying the bitrate on the fly outperforms varying the redundancy. Furthermore, redundancy and bitrate were varied simultaneously, and this combined variation is shown to be more efficient when the packet loss is high. The experiments reported above were conducted on an experimental test bed developed for this purpose at the NMCG lab of the University of Beira Interior. REGPROT, a video encoder developed by our research team, was used to split the video into multiple descriptions, which were then distributed among the peers in the test bed. Upon a client request, the encoder decoded the descriptions as they were received.
    Fundação para a Ciência e a Tecnologia (FCT)
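The split-into-descriptions idea can be illustrated with a minimal two-description sketch (REGPROT's actual coder is far more sophisticated; this toy version splits a sample stream by index parity and, as an assumed concealment strategy, repeats surviving samples when one description is lost):

```python
def mdc_split(samples):
    """Two-description MDC sketch: even-indexed samples form
    description 0, odd-indexed samples form description 1."""
    return samples[0::2], samples[1::2]

def mdc_reconstruct(d0, d1=None):
    """Reconstruct from both descriptions by interleaving, or
    conceal a lost description 1 by repeating each surviving
    sample (a crude stand-in for real error concealment)."""
    if d1 is None:
        out = []
        for s in d0:
            out.extend([s, s])      # repeat to fill the missing slots
        return out
    out = []
    for a, b in zip(d0, d1):
        out.extend([a, b])          # interleave the two descriptions
    if len(d0) > len(d1):
        out.append(d0[-1])          # odd-length streams leave one extra sample
    return out
```

With both descriptions the stream is recovered exactly; with one lost, a degraded but usable approximation survives, which is the property that makes MDC attractive under peer churn.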

    Beyond Massive MIMO: Trade-offs and Opportunities with Large Multi-Antenna Systems

    After the commercial emergence of 5G, the research community is already putting its focus on proposing innovative solutions to enable the upcoming 6G. One important lesson put forth by 5G research was that scaling up the conventional multiple-input multiple-output (MIMO) technology by increasing the number of antennas could be extremely beneficial for effectively multiplexing data streams in the spatial domain. This idea was embodied in massive MIMO, which constitutes one of the major technical advancements included in 5G. Consequently, 6G research efforts have been largely directed towards studying ways to further scale up wireless systems, as can be seen in some of the proposed 6G enabling technologies like large intelligent surface (LIS), cell-free massive MIMO, or even reconfigurable intelligent surface (RIS). This thesis studies the possibilities offered by some of these technologies, as well as the trade-offs that may naturally arise when scaling up such wireless systems.
    An important part of this thesis deals with decentralized solutions for base station (BS) technologies including a large number of antennas. Already in the initial massive MIMO prototypes, the increased number of BS antennas led to scalability issues due to the high interconnection bandwidths required to send the received signals, as well as the channel state information (CSI), to a central processing unit (CPU) in charge of the data processing. These issues can only be exacerbated if we consider novel system proposals like LIS, where the number of BS antennas may be increased by an order of magnitude with respect to massive MIMO, or cell-free massive MIMO, where the BS antennas may be located far from each other. We provide a number of decentralized schemes to process the received data while restricting the information that has to be shared with a CPU.
    We also provide a framework to study architectures with an arbitrary level of decentralization, showing that there exists a direct trade-off between the interconnection bandwidth to a CPU and the complexity of the decentralized processing required for fixed user rates.
    Another part of this thesis studies RIS-based solutions to enhance the multiplexing performance of wireless communication systems. RIS constitutes one of the most attractive 6G enabling technologies since it provides a cost- and energy-efficient solution to improve the wireless propagation links by generating favorable reflections. We extend the concept of RIS by considering reconfigurable surfaces (RSs) with different processing capabilities, and we show how these surfaces may be employed for achieving perfect spatial multiplexing at reduced processing complexity in general multi-antenna communication settings. We also show that these surfaces can exploit the available degrees of freedom (e.g., due to an excess of BS antennas) to embed their own data into the enhanced channel.
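One way to picture the decentralized-processing trade-off is a scheme in which each antenna computes a local partial term and forwards only one scalar per received symbol to the CPU. The sketch below (an illustrative assumption, not the thesis's actual algorithms) implements decentralized maximum-ratio combining for a single-user, noise-free narrowband case:

```python
import numpy as np

def decentralized_mrc(h, y):
    """Decentralized maximum-ratio combining: antenna m computes
    conj(h[m]) * y[m] locally and forwards only that scalar to the
    CPU, which fuses the partial results. The estimate matches
    centralized MRC, but the fronthaul carries one scalar per
    antenna per symbol instead of raw samples plus full CSI."""
    partials = [np.conj(hm) * ym for hm, ym in zip(h, y)]  # local per-antenna work
    return sum(partials) / np.sum(np.abs(h) ** 2)          # fusion at the CPU
```

More capable equalizers (e.g., interference-suppressing ones) need more inter-node exchange, which is precisely the bandwidth-versus-complexity trade-off the abstract describes.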

    Proof of Deep Learning: Approaches, Challenges, and Future Directions

    Full text link
    The rise of computational power has led to unprecedented performance gains for deep learning models. As more data becomes available and model architectures become more complex, the need for more computational power increases. On the other hand, since the introduction of Bitcoin as the first cryptocurrency and the establishment of the concept of blockchain as a distributed ledger, many variants and approaches have been proposed. However, many of them have one thing in common, which is the Proof of Work (PoW) consensus mechanism. PoW is mainly used to support the process of new block generation. While PoW has proven its robustness, its main drawback is that it requires a significant amount of processing power to maintain the security and integrity of the blockchain. This is due to applying brute force to solve a hashing puzzle. To utilize the computational power available in useful and meaningful work while keeping the blockchain secure, many techniques have been proposed, one of which is known as Proof of Deep Learning (PoDL). PoDL is a consensus mechanism that uses the process of training a deep learning model as proof of work to add new blocks to the blockchain. In this paper, we survey the various approaches for PoDL. We discuss the different types of PoDL algorithms, their advantages and disadvantages, and their potential applications. We also discuss the challenges of implementing PoDL and future research directions.
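The core idea, substituting useful training work for a hash puzzle, can be caricatured in a few lines (a toy sketch under assumed names; real PoDL schemes add nonce binding, test-set freshness, and verification protocols the survey discusses):

```python
import hashlib
import json

def validate_podl_block(prev_hash, model_params, test_pred, test_labels, acc_threshold):
    """Toy Proof-of-Deep-Learning check (illustrative only):
    a block is accepted if the submitted model's accuracy on a
    held-out test set meets the published threshold; the block
    hash then commits to the model and the previous block."""
    correct = sum(p == y for p, y in zip(test_pred, test_labels))
    accuracy = correct / len(test_labels)
    if accuracy < acc_threshold:
        return None  # insufficient accuracy: the training does not count as "work"
    payload = json.dumps({"prev": prev_hash, "model": model_params}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()
```

The hard problems the paper surveys live exactly in what this sketch glosses over: proving the training was actually performed, preventing pre-trained model reuse, and verifying claimed accuracy cheaply.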