
    Coding for Privacy in Distributed Computing

    In a distributed computing network, multiple devices combine their resources to solve a problem. The network can thereby achieve more than the sum of its parts: cooperation enables the devices to compute more efficiently than any single device could on its own, and even to solve problems that none of them could solve alone. However, devices that take exceptionally long to finish their tasks can significantly increase the overall latency of the computation. This so-called straggler effect can arise from random events such as memory access and tasks running in the background on the individual devices. The effect typically stalls the whole network because most devices must wait for the stragglers to finish. Furthermore, sharing data and results among devices can severely strain the communication network. Especially in a wireless network where devices have to share a common channel, e.g., in edge computing and federated learning, the communication links often become the bottleneck. Last but not least, offloading data to untrusted devices raises privacy concerns. A participant in the distributed computing network might be wary of sharing personal data with other devices without adequately protecting sensitive information. This thesis analyses how ideas from coding theory can mitigate the straggler effect, reduce the communication load, and guarantee data privacy in distributed computing. In particular, Part A gives background on edge computing and federated learning, two popular instances of distributed computing; on linear regression, a common problem to be solved by distributed computing; and on the specific ideas from coding theory that are proposed to tackle the problems arising in distributed computing. Part B contains the papers on the research performed in the framework of this thesis. The papers propose schemes that combine the introduced coding-theory ideas to minimize the overall latency while preserving data privacy in edge computing and federated learning. The proposed schemes significantly outperform state-of-the-art schemes. For example, a scheme from Paper I achieves an 8% speed-up for edge computing compared to a recently proposed non-private scheme while guaranteeing data privacy, and the schemes from Paper II achieve speed-up factors of up to 18 for federated learning compared to current schemes in the literature in the considered scenarios.
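
    To make the coded-computation idea concrete, here is a minimal, self-contained sketch (not one of the schemes proposed in the thesis) of straggler-resilient distributed matrix-vector multiplication with a simple (3, 2) parity code: three workers each receive one block of the data, and any two of their results suffice to recover the full product. The matrix sizes, the parity code, and the worker assignment are illustrative assumptions.

```python
import numpy as np

# A toy (3, 2) parity code for straggler-resilient matrix-vector multiplication:
# the data matrix A is split into two row blocks, and three workers each
# multiply one (possibly coded) block by x. Any two finished workers suffice,
# so one straggler can be ignored. Sizes and assignment are illustrative.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
x = rng.standard_normal(3)

A1, A2 = A[:2], A[2:]
worker_tasks = {0: A1, 1: A2, 2: A1 + A2}      # worker 2 holds the parity block
results = {w: blk @ x for w, blk in worker_tasks.items()}

# Suppose worker 1 straggles: its share A2 @ x is recovered from the parity result.
y1 = results[0]
y2 = results[2] - results[0]                   # (A1 + A2) @ x - A1 @ x = A2 @ x
assert np.allclose(np.concatenate([y1, y2]), A @ x)
```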

    Sequential Gradient Coding For Straggler Mitigation

    In distributed computing, slower nodes (stragglers) usually become a bottleneck. Gradient Coding (GC), introduced by Tandon et al., is an efficient technique that uses principles of error-correcting codes to distribute gradient computation in the presence of stragglers. In this paper, we consider the distributed computation of a sequence of gradients {g(1), g(2), …, g(J)}, where processing of each gradient g(t) starts in round t and finishes by round t+T. Here T ≥ 0 denotes a delay parameter. For the GC scheme, coding is only across computing nodes, and this results in a solution where T = 0. On the other hand, having T > 0 allows for designing schemes which exploit the temporal dimension as well. In this work, we propose two schemes that demonstrate improved performance compared to GC. Our first scheme combines GC with selective repetition of previously unfinished tasks and achieves improved straggler mitigation. In our second scheme, which constitutes our main contribution, we apply GC to a subset of the tasks and repetition to the remainder of the tasks. We then multiplex these two classes of tasks across workers and rounds in an adaptive manner, based on past straggler patterns. Using theoretical analysis, we demonstrate that our second scheme achieves a significant reduction in the computational load. In our experiments, we study a practical setting of concurrently training multiple neural networks over an AWS Lambda cluster involving 256 worker nodes, where our framework naturally applies. We demonstrate that the latter scheme can yield a 16% improvement in runtime over the baseline GC scheme in the presence of naturally occurring, non-simulated stragglers.
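
    For reference, the sketch below illustrates the baseline gradient coding idea that the paper builds on, using the standard 3-worker, 1-straggler encoding matrix from Tandon et al.; the random partial gradients and the least-squares decoder are illustrative choices and do not reproduce the sequential schemes proposed in the paper.

```python
import numpy as np

# Baseline gradient coding (GC) with n = 3 workers, k = 3 data partitions and
# tolerance for s = 1 straggler. Row i of B tells worker i how to linearly
# combine the partial gradients of its assigned partitions; any 2 of the 3
# coded results recover the full gradient. B is the classic example from
# Tandon et al.; the gradients here are random placeholders.
B = np.array([[0.5, 1.0,  0.0],
              [0.0, 1.0, -1.0],
              [0.5, 0.0,  1.0]])

rng = np.random.default_rng(0)
partial_grads = rng.standard_normal((3, 5))   # g_1, g_2, g_3, each of dimension 5
full_grad = partial_grads.sum(axis=0)
coded = B @ partial_grads                     # the message each worker would send

def decode(available):
    """Recover the full gradient from the coded results of the available workers."""
    # Find combining coefficients a supported on `available` with a @ B = all-ones.
    a, *_ = np.linalg.lstsq(B[available].T, np.ones(3), rcond=None)
    return a @ coded[available]

for straggler in range(3):
    available = [w for w in range(3) if w != straggler]
    assert np.allclose(decode(available), full_grad)
```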

    Randomized Polar Codes for Anytime Distributed Machine Learning

    We present a novel distributed computing framework that is robust to slow compute nodes and is capable of both approximate and exact computation of linear operations. The proposed mechanism integrates the concepts of randomized sketching and polar codes in the context of coded computation. We propose a sequential decoding algorithm designed to handle real-valued data while maintaining low computational complexity for recovery. Additionally, we provide an anytime estimator that can generate provably accurate estimates even when the set of available node outputs is not decodable. We demonstrate the potential applications of this framework in various contexts, such as large-scale matrix multiplication and black-box optimization. We present the implementation of these methods on a serverless cloud computing system and provide numerical results to demonstrate their scalability in practice, including ImageNet-scale computations.
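
    The sketch below illustrates only the anytime-estimation idea under simplifying assumptions: the columns of the matrix are randomly partitioned across workers, and any subset of returned partial products gives an unbiased estimate of the product that becomes exact once all workers respond. The paper's actual construction, which combines randomized sketching with polar codes and sequential decoding, is not reproduced here.

```python
import numpy as np

# Anytime estimation toy example: the columns of A are randomly partitioned
# across 4 workers, each worker returns its partial product, and any subset
# of returned results gives an unbiased estimate of A @ x that becomes exact
# once every worker has finished. (Plain column partitioning stands in for the
# paper's randomized-sketching-plus-polar-code construction.)
rng = np.random.default_rng(0)
n_workers, n_cols = 4, 20
A = rng.standard_normal((8, n_cols))
x = rng.standard_normal(n_cols)

col_groups = np.array_split(rng.permutation(n_cols), n_workers)  # random column split
partials = [A[:, g] @ x[g] for g in col_groups]                  # one result per worker

def anytime_estimate(returned):
    """Unbiased estimate of A @ x from any non-empty subset of worker results."""
    return (n_workers / len(returned)) * sum(partials[i] for i in returned)

print(np.linalg.norm(anytime_estimate([0, 2]) - A @ x))      # coarse estimate from 2 workers
assert np.allclose(anytime_estimate([0, 1, 2, 3]), A @ x)    # exact once all 4 return
```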

    Preserving Sparsity and Privacy in Straggler-Resilient Distributed Matrix Computations

    Existing approaches to distributed matrix computations involve allocating coded combinations of submatrices to worker nodes to build resilience to stragglers and/or enhance privacy. In this study, we consider the challenge of preserving input sparsity in such approaches in order to retain the associated gains in computational efficiency. First, we derive a lower bound on the weight of the coding, i.e., the number of submatrices that must be combined to obtain a coded submatrix, required to provide resilience to the maximum possible number of stragglers (for a given number of nodes and their storage constraints). Next, we propose a distributed matrix computation scheme that meets this lower bound on the weight of the coding exactly. Furthermore, we develop a controllable trade-off between worker computation time and the privacy constraint for sparse input matrices in settings where the worker nodes are honest but curious. Numerical experiments conducted on Amazon Web Services (AWS) validate our assertions regarding straggler mitigation and computation speed for sparse matrices.
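
    As a rough illustration of why the weight of the coding matters for sparse inputs (this is not the paper's scheme), the sketch below compares the density of a coded submatrix formed from 2 versus 10 sparse blocks; the block sizes and densities are arbitrary assumptions.

```python
import numpy as np
from scipy import sparse

# Toy illustration of the coding "weight" vs. sparsity trade-off: a coded
# submatrix formed as a linear combination of w sparse blocks is roughly w
# times denser than a single block, so the worker multiplying it does
# proportionally more work. Block sizes and densities are arbitrary.
rng = np.random.default_rng(0)
blocks = [sparse.random(500, 500, density=0.01, format="csr", random_state=i)
          for i in range(10)]

def coded_density(weight):
    """Fraction of nonzeros in one coded submatrix built from `weight` blocks."""
    coded = blocks[0].copy()
    for b in blocks[1:weight]:
        coded = coded + rng.standard_normal() * b   # random linear combination
    return coded.nnz / (coded.shape[0] * coded.shape[1])

print(f"weight  2: density ~ {coded_density(2):.4f}")   # about 0.02
print(f"weight 10: density ~ {coded_density(10):.4f}")  # about 0.10
```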

    Coded matrix computation with gradient coding

    Polynomial-based approaches, such as the Mat-Dot and entangled polynomial (EP) codes, have been used extensively within coded matrix computations to obtain schemes with good recovery thresholds. However, these schemes are well known to suffer from poor numerical stability in decoding. Moreover, the encoding process in these schemes involves linearly combining a large number of input submatrices, i.e., the encoding weight is high. For the practically relevant case of sparse input matrices, this can have the undesirable effect of significantly increasing the worker node computation time. In this work, we propose a generalization of the EP scheme that combines the idea of gradient coding with the basic EP encoding. Our scheme allows us to reduce the weight of the encoding and arrive at schemes that exhibit much better numerical stability; this is achieved at the expense of a worse recovery threshold. By appropriately setting the parameters of our scheme, we recover several well-known schemes in the literature. Simulation results show that our scheme provides excellent numerical stability and fast computation speed (for sparse input matrices) compared to EP and Mat-Dot codes.
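
    For context, the following sketch shows a baseline Mat-Dot-style encoding of the kind this paper generalizes (not the proposed GC+EP hybrid): each worker evaluates two matrix polynomials at its own point, and the product A·B is read off as the middle coefficient of the interpolated product polynomial from any three of four workers. The matrix sizes and evaluation points are illustrative assumptions.

```python
import numpy as np

# Mat-Dot-style coded matrix multiplication with p = 2 blocks and 4 workers:
# A is split column-wise into A1, A2 and B row-wise into B1, B2, so that
# A @ B = A1 @ B1 + A2 @ B2. Worker i evaluates pA(z) = A1 + z*A2 and
# pB(z) = z*B1 + B2 at its own point z_i and returns pA(z_i) @ pB(z_i).
# The product A @ B is the coefficient of z in the degree-2 product
# polynomial, recoverable from any 3 of the 4 workers (recovery threshold 3).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))
B = rng.standard_normal((6, 4))
A1, A2 = A[:, :3], A[:, 3:]
B1, B2 = B[:3, :], B[3:, :]

points = np.array([1.0, 2.0, 3.0, 4.0])            # one evaluation point per worker
worker_out = [(A1 + z * A2) @ (z * B1 + B2) for z in points]

avail = [0, 2, 3]                                   # pretend worker 1 straggled
V = np.vander(points[avail], 3, increasing=True)    # rows: [1, z_i, z_i^2]
coeffs = np.linalg.solve(V, np.stack([worker_out[i].ravel() for i in avail]))
AB = coeffs[1].reshape(A.shape[0], B.shape[1])      # coefficient of z (i.e., z^1)
assert np.allclose(AB, A @ B)
```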