371 research outputs found

    Balanced allocation: Memory performance tradeoffs

    Full text link
    Suppose we sequentially put nn balls into nn bins. If we put each ball into a random bin then the heaviest bin will contain āˆ¼logā”n/logā”logā”n{\sim}\log n/\log\log n balls with high probability. However, Azar, Broder, Karlin and Upfal [SIAM J. Comput. 29 (1999) 180--200] showed that if each time we choose two bins at random and put the ball in the least loaded bin among the two, then the heaviest bin will contain only āˆ¼logā”logā”n{\sim}\log\log n balls with high probability. How much memory do we need to implement this scheme? We need roughly logā”logā”logā”n\log\log\log n bits per bin, and nlogā”logā”logā”nn\log\log\log n bits in total. Let us assume now that we have limited amount of memory. For each ball, we are given two random bins and we have to put the ball into one of them. Our goal is to minimize the load of the heaviest bin. We prove that if we have n1āˆ’Ī“n^{1-\delta} bits then the heaviest bin will contain at least Ī©(Ī“logā”n/logā”logā”n)\Omega(\delta\log n/\log\log n) balls with high probability. The bound is tight in the communication complexity model.Comment: Published in at http://dx.doi.org/10.1214/11-AAP804 the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Load Balancing with Dynamic Set of Balls and Bins

    Full text link
    In dynamic load balancing, we wish to distribute balls into bins in an environment where both balls and bins can be added and removed. We want to minimize the maximum load of any bin but we also want to minimize the number of balls and bins affected when adding or removing a ball or a bin. We want a hashing-style solution where we given the ID of a ball can find its bin efficiently. We are given a balancing parameter c=1+Ļµc=1+\epsilon, where Ļµāˆˆ(0,1)\epsilon\in (0,1). With nn and mm the current numbers of balls and bins, we want no bin with load above C=āŒˆcn/māŒ‰C=\lceil c n/m\rceil, referred to as the capacity of the bins. We present a scheme where we can locate a ball checking 1+O(logā”1/Ļµ)1+O(\log 1/\epsilon) bins in expectation. When inserting or deleting a ball, we expect to move O(1/Ļµ)O(1/\epsilon) balls, and when inserting or deleting a bin, we expect to move O(C/Ļµ)O(C/\epsilon) balls. Previous bounds were off by a factor 1/Ļµ1/\epsilon. These bounds are best possible when C=O(1)C=O(1) but for larger CC, we can do much better: Let f=ĻµCf=\epsilon C if Cā‰¤logā”1/ĻµC\leq \log 1/\epsilon, f=ĻµCā‹…logā”(1/(ĻµC))f=\epsilon\sqrt{C}\cdot \sqrt{\log(1/(\epsilon\sqrt{C}))} if logā”1/Ļµā‰¤C<12Ļµ2\log 1/\epsilon\leq C<\tfrac{1}{2\epsilon^2}, and C=1C=1 if Cā‰„12Ļµ2C\geq \tfrac{1}{2\epsilon^2}. We show that we expect to move O(1/f)O(1/f) balls when inserting or deleting a ball, and O(C/f)O(C/f) balls when inserting or deleting a bin. For the bounds with larger CC, we first have to resolve a much simpler probabilistic problem. Place nn balls in mm bins of capacity CC, one ball at the time. Each ball picks a uniformly random non-full bin. We show that in expectation and with high probability, the fraction of non-full bins is Ī˜(f)\Theta(f). Then the expected number of bins that a new ball would have to visit to find one that is not full is Ī˜(1/f)\Theta(1/f). As it turns out, we obtain the same complexity in our more complicated scheme where both balls and bins can be added and removed.Comment: Accepted at STOC'2

    Communication Patterns for Randomized Algorithms

    Get PDF
    Examples of large scale networks include the Internet, peer-to-peer networks, parallel computing systems, cloud computing systems, sensor networks, and social networks. Efficient dissemination of information in large networks such as these is a fundamental problem. In many scenarios the gathering of information by a centralised controller can be impractical. When designing and analysing distributed algorithms we must consider the limitations imposed by the heterogeneity of devices in the networks. Devices may have limited computational ability or space. This makes randomised algorithms attractive solutions. Randomised algorithms can often be simpler and easier to implement than their deterministic counterparts. This thesis analyses the effect of communication patterns on the performance of distributed randomised algorithms. We study randomized algorithms with application to three different areas. Firstly, we study a generalization of the balls-into-bins game. Balls-into-bins games have been used to analyse randomised load balancing. Under the Greedy[d] allocation scheme, each ball queries the load of d random bins and is then allocated to the least loaded of them. We consider an infinite, parallel setting where an expected λn balls are allocated in parallel according to the Greedy[d] allocation scheme into n bins, and subsequently each non-empty bin removes a ball. Our results show that for d = 1, 2 the Greedy[d] allocation scheme is self-stabilizing, and that in any round the maximum system load for high arrival rates is exponentially smaller for d = 2 compared to d = 1 (w.h.p.). Secondly, we introduce protocols that solve the plurality consensus problem on arbitrary graphs for arbitrarily small bias. Typically, protocols depend heavily on the employed communication mechanism. Our protocols are based on an interesting relationship between plurality consensus and distributed load balancing. This relationship allows us to design protocols that are both time and space efficient and generalize the state of the art for a large range of problem parameters. Finally, we investigate the effect of restricting the communication of the classical PULL algorithm for randomised rumour spreading. Rumour spreading (broadcast) is a fundamental task in distributed computing. Under the classical PULL algorithm, a node with the rumour that receives multiple requests is able to respond to all of them in a given round. Our model restricts nodes such that they can respond to at most one request per round. Our results show that the restricted PULL algorithm is optimal for several graph classes such as complete graphs, expanders, random graphs and several Cayley graphs.
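
    A rough sketch of the parallel Greedy[d] setting described above, under one possible reading of the model (the thesis may define the details differently): each round a Binomial(n, λ) batch of balls arrives (mean λn), allocations are made against the load snapshot at the start of the round, and then every non-empty bin removes one ball.

```python
import random

def greedy_d(n=2_000, lam=0.7, d=2, rounds=1_000, seed=0):
    """One round: a Binomial(n, lam) batch of balls arrives (mean lam*n);
    each ball picks d bins uniformly at random and joins the one that was
    least loaded at the start of the round; then every non-empty bin
    removes one ball. Returns the maximum load after the last round."""
    rng = random.Random(seed)
    loads = [0] * n
    for _ in range(rounds):
        snapshot = list(loads)                    # loads visible to this parallel batch
        arrivals = sum(rng.random() < lam for _ in range(n))
        for _ in range(arrivals):
            picks = [rng.randrange(n) for _ in range(d)]
            loads[min(picks, key=lambda b: snapshot[b])] += 1
        loads = [max(0, x - 1) for x in loads]    # each non-empty bin serves one ball
    return max(loads)

if __name__ == "__main__":
    print("Greedy[1] max load:", greedy_d(d=1))
    print("Greedy[2] max load:", greedy_d(d=2))
```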

    Oblivious RAM with O((log N)^3) worst-case cost

    Get PDF
    LNCS v. 7073: Advances in Cryptology – ASIACRYPT 2011. Oblivious RAM is a useful primitive that allows a client to hide its data access patterns from an untrusted server in storage outsourcing applications. Until recently, most prior works on Oblivious RAM aim to optimize its amortized cost, while suffering from linear or even higher worst-case cost. Such poor worst-case behavior renders these schemes impractical in realistic settings, since a data access request can occasionally be blocked waiting for an unreasonably large number of operations to complete. This paper proposes novel Oblivious RAM constructions that achieve poly-logarithmic worst-case cost, while consuming constant client-side storage. To achieve the desired worst-case asymptotic performance, we propose a novel technique in which we organize the O-RAM storage into a binary tree over data buckets, while moving data blocks obliviously along tree edges.
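
    The tree layout is the key structural idea. Below is a toy position-map sketch in Python, under simplifying assumptions (no stored data, no bucket contents, no eviction), showing how an access touches only the O(log N) buckets on one root-to-leaf path and then remaps the block to a fresh random leaf. It sketches the general tree-ORAM idea, not the authors' full construction.

```python
import random

def path_buckets(leaf, depth):
    """Bucket indices (heap numbering, root = 1) on the path from the root
    down to `leaf` in a complete binary tree of the given depth."""
    node = (1 << depth) + leaf            # leaves occupy indices 2^depth .. 2^(depth+1)-1
    path = []
    while node >= 1:
        path.append(node)
        node //= 2
    return path[::-1]                     # root first

class TreeORAMSketch:
    """Toy position-map client: every block is mapped to a random leaf;
    an access touches every bucket on that leaf's path and then remaps
    the block to a fresh random leaf (real storage and eviction omitted)."""
    def __init__(self, num_blocks, seed=0):
        self.depth = max(1, (num_blocks - 1).bit_length())
        self.rng = random.Random(seed)
        self.pos = {b: self.rng.randrange(1 << self.depth) for b in range(num_blocks)}

    def access(self, block):
        leaf = self.pos[block]
        touched = path_buckets(leaf, self.depth)                # O(log N) buckets per access
        self.pos[block] = self.rng.randrange(1 << self.depth)   # remap for obliviousness
        return touched

if __name__ == "__main__":
    oram = TreeORAMSketch(num_blocks=1024)
    print(oram.access(42))   # the bucket indices the client would read for this access
```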

    Device Tracking via Linux's New TCP Source Port Selection Algorithm (Extended Version)

    Full text link
    We describe a tracking technique for Linux devices, exploiting a new TCP source port generation mechanism recently introduced to the Linux kernel. This mechanism is based on an algorithm, standardized in RFC 6056, for boosting security by better randomizing port selection. Our technique detects collisions in a hash function used in the said algorithm, based on sampling TCP source ports generated in an attacker-prescribed manner. These hash collisions depend solely on a per-device key, and thus the set of collisions forms a device ID that allows tracking devices across browsers, browser privacy modes, containers, and IPv4/IPv6 networks (including some VPNs). It can distinguish among devices with identical hardware and software, and lasts until the device restarts. We implemented this technique and then tested it using tracking servers in two different locations and with Linux devices on various networks. We also tested it on an Android device that we patched to introduce the new port selection algorithm. The tracking technique works in real-life conditions, and we report detailed findings about it, including its dwell time, scalability, and success rate in different network types. We worked with the Linux kernel team to mitigate the exploit, resulting in a security patch introduced in May 2022 to the Linux kernel, and we provide recommendations for better securing the port selection algorithm in the paper.Comment: This is an extended version of a paper with the same name that will be presented in the 32nd Usenix Security Symposium (USENIX 2023). UPDATE (2022-10-08): We revised some bibliography entries and clarified some aspects of the mathematical analysi

    Rethinking Distributed Caching Systems Design and Implementation

    Get PDF
    Distributed caching systems based on in-memory key-value stores have become a crucial aspect of fast and efficient content delivery in modern web applications. However, due to the dynamic and skewed execution environments and workloads under which such systems typically operate, several problems arise in the form of load imbalance. This thesis addresses the sources of load imbalance in caching systems, mainly: i) data placement, which relates to the distribution of data items across servers, and ii) data item access frequency, which describes the number of requests each server has to process and how well each server is able to cope with it. We provide several strategies to overcome each of these sources of imbalance in isolation. As a use case, we analyse Memcached and its variants, and propose a novel solution for distributed caching systems. Our solution revolves around increasing parallelism through load segregation, together with mechanisms to overcome load discrepancies in high-saturation scenarios, mostly through access re-arrangement and internal replication.
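
    To illustrate the access-frequency source of imbalance, the following sketch (generic and illustrative; it does not use the thesis's Memcached setup) replays a Zipf-skewed workload against hash-based placement over 8 servers and prints each server's share of requests; the servers holding the hottest keys end up with a disproportionate share.

```python
import hashlib
import random
from collections import Counter

def server_for(key, n_servers):
    """Basic hash-based placement of a key onto one of n_servers."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_servers

def zipf_workload(n_keys=10_000, n_requests=200_000, s=1.1, seed=0):
    """Requests drawn with Zipf-like popularity: rank r has weight 1/r^s."""
    rng = random.Random(seed)
    weights = [1 / (rank ** s) for rank in range(1, n_keys + 1)]
    keys = [f"key-{i}" for i in range(n_keys)]
    return rng.choices(keys, weights=weights, k=n_requests)

if __name__ == "__main__":
    n_servers = 8
    hits = Counter(server_for(k, n_servers) for k in zipf_workload())
    total = sum(hits.values())
    for srv in range(n_servers):
        print(f"server {srv}: {100 * hits[srv] / total:5.1f}% of requests")
```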

    Efficient data reconfiguration for today's cloud systems

    Get PDF
    Performance of big data systems largely relies on efficient data reconfiguration techniques. Data reconfiguration operations deal with changing configuration parameters that affect data layout in a system. They can be user-initiated, like changing the shard key or block size in NoSQL databases, or system-initiated, like changing replication in a distributed interactive analytics engine. Current data reconfiguration schemes are heuristics at best and often do not scale well as data volume grows. As a result, system performance suffers. In this thesis, we show that data reconfiguration mechanisms can be done in the background by coupling new optimal or near-optimal algorithms with performant system designs. We explore four different data reconfiguration operations affecting three popular types of systems: storage, real-time analytics, and batch analytics. In NoSQL databases (storage), we explore new strategies for changing table-level configuration and for compaction, as they improve read/write latencies. In distributed interactive analytics engines, a good replication algorithm can save costs by judiciously using just enough memory to provide the highest throughput and low latency for queries. Finally, in batch processing systems, we explore prefetching and caching strategies that can improve the number of production jobs meeting their SLOs. All these operations happen in the background without affecting the fast path. Our contributions to each of the problems are two-fold: 1) we model the problem and design algorithms inspired by well-known theoretical abstractions, and 2) we design and build a system on top of popular open-source systems used in companies today. Finally, using real-life workloads, we evaluate the efficacy of our solutions. Morphus and Parqua provide several 9s of availability while changing table-level configuration parameters in databases. By halving memory usage in a distributed interactive analytics engine, Getafix reduces the cost of deploying the system by 10 million dollars annually and improves query throughput. We are the first to model the problem of compaction and to provide formal bounds on its runtime. Finally, NetCachier helps 30% more production jobs meet their SLOs compared to the existing state of the art.
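
    As a generic illustration of why reconfiguration cost matters (this is not one of the thesis's algorithms), the sketch below compares how many items must move when a shard is added under plain modulo placement versus a minimal consistent-hash ring; a reconfiguration scheme that moves less data has less background work to hide.

```python
import hashlib

def h(value):
    """64-bit hash of a string (SHA-1 truncated; the choice is illustrative)."""
    return int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")

def modulo_placement(key, n_shards):
    return h(key) % n_shards

def build_ring(shard_ids):
    """One point per shard on a hash ring (virtual nodes omitted for brevity)."""
    return sorted((h(f"shard-{s}"), s) for s in shard_ids)

def consistent_placement(key, ring):
    """The key goes to the first shard at or after its hash, wrapping around."""
    point = h(key)
    for pos, shard in ring:
        if point <= pos:
            return shard
    return ring[0][1]

if __name__ == "__main__":
    keys = [f"item-{i}" for i in range(50_000)]
    ring16, ring17 = build_ring(range(16)), build_ring(range(17))
    moved_mod = sum(modulo_placement(k, 16) != modulo_placement(k, 17) for k in keys)
    moved_ch = sum(consistent_placement(k, ring16) != consistent_placement(k, ring17) for k in keys)
    print(f"modulo placement  : {100 * moved_mod / len(keys):.1f}% of items move")  # roughly 16/17
    print(f"consistent hashing: {100 * moved_ch / len(keys):.1f}% of items move")   # roughly 1/17 in expectation
```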