Distributed Learning over Unreliable Networks
Most of today's distributed machine learning systems assume reliable networks: whenever two machines exchange information (e.g., gradients or models), the network should guarantee the delivery of the message. At the same time, recent work exhibits the impressive tolerance of machine learning algorithms to errors or noise arising from relaxed communication or synchronization. In this paper, we connect these two trends and consider the following question: can we design machine learning systems that are tolerant to network unreliability during training? With this motivation, we focus on a theoretical problem of independent interest: given a standard distributed parameter server architecture, if every communication between a worker and the server has a non-zero probability p of being dropped, does there exist an algorithm that still converges, and at what speed? The technical contribution of this paper is a novel theoretical analysis proving that distributed learning over unreliable networks can achieve a convergence rate comparable to that of centralized or distributed learning over reliable networks. Further, we prove that the influence of the packet drop rate diminishes as the number of parameter servers grows. We map this theoretical result onto a real-world scenario, training deep neural networks over an unreliable network layer, and conduct network simulations to validate the system improvement gained by allowing the networks to be unreliable.
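To make the setting concrete, the following minimal Python sketch simulates the scenario the paper studies: data-parallel SGD through a parameter server in which every worker-to-server and server-to-worker message is dropped independently with probability p, and each side falls back to the last value it successfully received. The toy quadratic objective, the constants, and the stale-fallback rule are illustrative assumptions, not the paper's exact algorithm.

# Sketch of parameter-server SGD over a lossy network (assumed setup,
# not the paper's algorithm): messages drop i.i.d. with probability p_drop,
# and each side keeps the last value it successfully received.
import numpy as np

rng = np.random.default_rng(0)
p_drop = 0.2                            # per-message drop probability
n_workers, dim, steps, lr = 4, 10, 500, 0.05
target = rng.normal(size=dim)           # minimize ||x - target||^2 (toy objective)

server_x = np.zeros(dim)                # model held by the parameter server
worker_x = [server_x.copy() for _ in range(n_workers)]  # last model each worker received

for step in range(steps):
    grads = []
    for w in range(n_workers):
        g = 2 * (worker_x[w] - target)          # local gradient on possibly stale model
        if rng.random() > p_drop:               # worker -> server message survives
            grads.append(g)
    if grads:                                   # server averages whatever arrived
        server_x -= lr * np.mean(grads, axis=0)
    for w in range(n_workers):
        if rng.random() > p_drop:               # server -> worker message survives
            worker_x[w] = server_x.copy()       # otherwise the worker keeps its stale copy

print("final error:", np.linalg.norm(server_x - target))

Even with a 20% drop rate, this toy run still converges, illustrating the kind of tolerance to unreliable delivery that the paper quantifies.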
Performance Evaluation of Networked Systems
Networked systems are tasked with processing large amounts of data and responding promptly to requests arriving at high rates. These tasks involve collaboration and communication among many nodes to achieve high throughput and low latency. Moreover, the system must communicate with end users to receive requests and relay responses. Effective and efficient use of the network is therefore of paramount importance in both settings: within the data center and across the Internet. Networked systems require network services that are both expressive enough and backed by sufficient guarantees to enable performant communication over the network. Moreover, it is desirable for the network service to be continuously validated by a monitoring system, to confirm that it operates in accordance with its guarantees and satisfies the application's communication needs, even in challenging scenarios such as the presence of adversaries. Through the development of models and tools, we gain understanding of the performance of networked systems. This enables us to identify bottlenecks and effective improvements, as well as provision these systems to deliver sufficiently good performance in a cost-efficient manner, avoiding both intolerable performance due to under-provisioning and excessive cost due to over-provisioning.
In this dissertation, we first consider a new network service primitive that permits bounded degradation of delivery and performance in order to speed up co-located network flows. Second, we provide a novel perspective on performance by considering how to make a networked system, along with its monitoring system, robust even in the face of an in-network programmable adversary. Third, we develop a simulator for low Earth orbit satellite constellations that enables convenient performance analysis of such highly dynamic and constantly evolving networked systems. Fourth, we investigate the tradeoff between cost and performance, and build an advisor to cost-efficiently provision a networked system of serverless functions designed to process interactive queries on cold data. With this varied set of contributions, we improve the performance, resilience, analyzability, and efficiency of networked systems.
Resource Allocation in Serverless Query Processing
Data lakes hold a growing amount of cold, infrequently accessed data that must nonetheless be queried with interactive response times. Serverless functions are seen as a way to address this use case, since they offer an appealing alternative to maintaining (and paying for) a fixed infrastructure. Recent research has analyzed the potential of serverless for data processing. In this paper, we expand on such work by looking into the question of allocating serverless resources to data processing tasks (the number and size of the functions). We formulate a general model to roughly estimate completion time and financial cost, which we apply to augment an existing serverless data processing system with an advisory tool that automatically identifies configurations striking a good balance, which we define as being close to the "knee" of their Pareto frontier. The model takes into account key aspects of serverless: start-up, computation, network transfers, and overhead, each as a function of the input sizes and intermediate result exchanges. Using (micro)benchmarks and parts of TPC-H, we show that this advisor is capable of pinpointing configurations desirable to the user. Moreover, we identify and discuss several aspects of data processing on serverless that affect efficiency. An automated tool that configures these resources lowers the barrier to using serverless for data processing and widens the narrow window in which it is cost-effective, by choosing a near-optimal allocation instead of forcing the design to be over-provisioned.
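As an illustration of the kind of model such an advisor could embody, the following Python sketch estimates completion time and cost for candidate configurations (number of functions and their memory size), keeps the Pareto frontier, and picks the point closest to the ideal corner as the "knee". All constants (cold-start time, scan and shuffle rates, the per-GB-second price) and the knee heuristic are assumptions for illustration, not the paper's calibrated model.

# Sketch of a time/cost model for serverless query processing and a
# Pareto-knee picker; all constants are assumed, not calibrated.
import math

def estimate(n_funcs, mem_gb, input_gb):
    startup = 0.5                                   # cold-start seconds (assumed)
    compute = (input_gb / n_funcs) * 8.0 / mem_gb   # scan seconds per function (assumed rate)
    shuffle = (input_gb / n_funcs) * 2.0            # network transfer seconds (assumed rate)
    time_s = startup + compute + shuffle
    price = 0.0000167                               # $/GB-second, Lambda-like (assumed)
    cost = n_funcs * mem_gb * time_s * price
    return time_s, cost

configs = [(n, m) for n in (4, 8, 16, 32, 64) for m in (1, 2, 4, 8)]
points = [(estimate(n, m, input_gb=100.0), (n, m)) for n, m in configs]

# Keep the Pareto frontier: configurations no other point beats on both axes.
frontier = [((t, c), cfg) for (t, c), cfg in points
            if not any(t2 <= t and c2 <= c and (t2, c2) != (t, c)
                       for (t2, c2), _ in points)]

# Pick the "knee": minimal normalized distance to the ideal (fast, cheap) corner.
tmin = min(t for (t, _), _ in frontier); tmax = max(t for (t, _), _ in frontier)
cmin = min(c for (_, c), _ in frontier); cmax = max(c for (_, c), _ in frontier)
knee = min(frontier, key=lambda pt: math.hypot(
    (pt[0][0] - tmin) / ((tmax - tmin) or 1),
    (pt[0][1] - cmin) / ((cmax - cmin) or 1)))
print("suggested (n_funcs, mem_gb):", knee[1])

The model captures the tradeoff the abstract describes: adding functions or memory shortens completion time but raises cost, so the frontier is non-trivial and the knee marks a configuration where neither axis can improve much without hurting the other.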
“You’ll Never Tweet Alone”: Managing Sports Brands through Social Media
The emergence of social media has had a profound impact on the way companies communicate and connect with their customers. Indeed, brands across different industries have started utilizing social media as part of their marketing strategies. However, research on the ways in which businesses use social media for branding purposes has been limited. This study aims to address this gap by drawing on the professional sports industry. Employing a case study approach, the study analyzes the use of Twitter by a professional football organization in order to examine brand attributes (both product-related and non-product-related) and their relation to Twitter’s key engagement features (Reply, Retweet, Favorite). The results extend the current knowledge base of social media brand management in the sports industry, while offering significant insights for practitioners regarding how online followers interact with the communicated brand attributes.