The advent of serverless computing has ushered in notable advancements in
distributed machine learning, particularly within parameter server-based
architectures. Yet, the integration of serverless features within peer-to-peer
(P2P) distributed networks remains largely uncharted. In this paper, we
introduce SPIRT, a fault-tolerant, reliable, and secure serverless P2P ML
training architecture designed to bridge this gap.
Capitalizing on the robustness and reliability inherent to P2P
systems, SPIRT employs RedisAI for in-database operations, leading to an 82\%
reduction in the time required for model updates and gradient averaging across
a variety of models and batch sizes. This architecture showcases resilience
against peer failures and adeptly manages the integration of new peers, thereby
highlighting its fault-tolerant characteristics and scalability. Furthermore,
SPIRT ensures secure communication between peers, enhancing the reliability of
distributed machine learning tasks. Even in the face of Byzantine attacks, the
system's robust aggregation algorithms maintain high levels of accuracy. These
findings illuminate the promising potential of serverless architectures in P2P
distributed machine learning, offering a significant stride towards the
development of more efficient, scalable, and resilient applications.
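
As an illustration of why robust aggregation matters under Byzantine attacks, the following minimal sketch contrasts plain gradient averaging with a coordinate-wise median. The abstract does not name the aggregation rules SPIRT uses, so the median here is an assumed stand-in chosen for simplicity, and the function names and toy gradients are hypothetical.

```python
from statistics import fmean, median


def average_gradients(peer_grads):
    """Plain coordinate-wise mean of the peers' gradient vectors."""
    return [fmean(coord) for coord in zip(*peer_grads)]


def median_gradients(peer_grads):
    """Coordinate-wise median, one common Byzantine-robust rule.

    Assumption: used only for illustration, not taken from the paper.
    """
    return [median(coord) for coord in zip(*peer_grads)]


# Three honest peers report similar gradients; one Byzantine peer
# reports an extreme vector in an attempt to derail training.
honest = [[0.9, -1.1], [1.0, -1.0], [1.1, -0.9]]
byzantine = [[1000.0, 1000.0]]
grads = honest + byzantine

mean_update = average_gradients(grads)    # dominated by the outlier
robust_update = median_gradients(grads)   # stays near the honest values

print("mean:  ", mean_update)
print("median:", robust_update)
```

In this toy setting a single malicious peer drags the averaged update far from the honest gradients, whereas the median remains close to them, which is the qualitative behavior a robust aggregation rule is meant to provide.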