
    Design of a Multi-Host Shared Memory Services System

    A memory cache stores data and objects in memory, reducing database access time and hard-disk I/O; this acceleration technique is widely applied in large-scale web systems. In this paper, we design Memcached Helper (MH), a scalable distributed memory cache system built on a set of memcached instances and suited to cloud environments. The experimental results show that this system uses memory more efficiently and provides better performance and speed.
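    A hedged sketch of the general idea behind such a multi-host cache pool, not the MH system's actual API: keys are hashed to pick one of several cache nodes, so the aggregate memory of the hosts behaves like one larger cache. Plain dictionaries stand in for memcached hosts, and all names are illustrative.

    import hashlib

    class ShardedCache:
        """Illustrative key-sharded cache pool; dicts stand in for memcached hosts."""

        def __init__(self, num_nodes):
            self.nodes = [dict() for _ in range(num_nodes)]

        def _node_for(self, key):
            # Hash the key and map it to one of the cache nodes.
            digest = hashlib.md5(key.encode()).hexdigest()
            return self.nodes[int(digest, 16) % len(self.nodes)]

        def set(self, key, value):
            self._node_for(key)[key] = value

        def get(self, key):
            return self._node_for(key).get(key)

    pool = ShardedCache(num_nodes=4)
    pool.set("user:42", {"name": "Ada"})
    print(pool.get("user:42"))  # {'name': 'Ada'}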

    Efficient Identification of Equivalences in Dynamic Graphs and Pedigree Structures

    We propose a new framework for designing test and query functions for complex structures that vary across a given parameter such as genetic marker position. The operations we are interested in include equality testing, set operations, isolating unique states, duplication counting, or finding equivalence classes under identifiability constraints. A motivating application is locating equivalence classes in identity-by-descent (IBD) graphs, graph structures in pedigree analysis that change over genetic marker location. The nodes of these graphs are unlabeled and identified only by their connecting edges, a constraint easily handled by our approach. The general framework introduced is powerful enough to build a range of testing functions for IBD graphs, dynamic populations, and other structures using a minimal set of operations. The theoretical and algorithmic properties of our approach are analyzed and proved. Computational results on several simulations demonstrate the effectiveness of our approach. (Comment: code for the paper is available at http://www.stat.washington.edu/~hoytak/code/hashreduc)
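    A purely illustrative sketch, not the paper's algorithm, of grouping label-free graphs into candidate equivalence classes: each graph is reduced to a signature that ignores node labels, and graphs are bucketed by that signature. The signature used here (sorted degrees plus neighbour-degree multisets) is a simplifying assumption; graphs with different signatures are certainly in different classes, while equal signatures only suggest equivalence.

    from collections import defaultdict

    def signature(edges):
        """Label-free invariant of an undirected graph given as a set of edges."""
        adj = defaultdict(set)
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
        # For each node, record its degree together with its neighbours' degrees.
        profile = sorted(
            (len(nbrs), tuple(sorted(len(adj[n]) for n in nbrs)))
            for nbrs in adj.values()
        )
        return tuple(profile)

    def equivalence_classes(graphs):
        """Group graphs (e.g. IBD graphs at marker positions) by their signature."""
        classes = defaultdict(list)
        for position, edges in graphs.items():
            classes[signature(edges)].append(position)
        return list(classes.values())

    graphs = {
        0.0: {(1, 2), (2, 3)},
        0.5: {("a", "b"), ("b", "c")},   # same shape, different labels
        1.0: {(1, 2), (2, 3), (3, 1)},   # a triangle: different class
    }
    print(equivalence_classes(graphs))   # [[0.0, 0.5], [1.0]]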

    A Hybrid Web Caching Design Model for Internet-Content Delivery

    The need to share and distribute online content (or resources) across large and sophisticated networks of users, the geographically dispersed locations of servers and their clients, and the time taken to fulfil client requests pose major challenges, so the choice of a suitable architecture for Internet-based content delivery (ICD) technologies is critical. To this end, the Akamai and Gnutella web technologies, both widely used around the world for delivering content, are reviewed extensively to identify their strengths and weaknesses. The new design for Internet-based content distribution is called AkaGnu because of the extra layer (a Gnutella network) inserted into the Akamai architecture, which gives it an edge over either technology deployed independently. The paper presents a new ICD technology that performs better than the Akamai system as a result of new features and behaviours that reduce network traffic, improve client Internet connectivity, increase file sharing, speed up content delivery, and enhance network security. Keywords/Index Terms: ICD, Akamai, Gnutella, peer-to-peer, AkaGnu, network traffic, security, architecture, technology
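    A speculative sketch, not drawn from the paper, of the lookup order such a hybrid CDN-plus-peer design implies: a request is served from the local edge cache if possible, then from peers in the overlay, and only then from the origin. All names and data below are hypothetical.

    def fetch(key, edge_cache, peers, origin):
        """Resolve a content request through the hybrid path: edge -> peers -> origin."""
        if key in edge_cache:                      # 1. local edge cache (CDN-style)
            return edge_cache[key]
        for peer in peers:                         # 2. peer overlay (Gnutella-style)
            if key in peer:
                edge_cache[key] = peer[key]        # cache at the edge for next time
                return peer[key]
        value = origin[key]                        # 3. fall back to the origin server
        edge_cache[key] = value
        return value

    edge, peers = {}, [{"video.mp4": b"..."}]
    origin = {"video.mp4": b"...", "page.html": b"<html>"}
    print(fetch("video.mp4", edge, peers, origin))  # served by a peer, then cached at the edge
    print(fetch("page.html", edge, peers, origin))  # served by the origin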

    FlipHash: A Constant-Time Consistent Range-Hashing Algorithm

    Consistent range-hashing is a technique used in distributed systems, either directly or as a subroutine for consistent hashing, commonly to realize an even and stable data distribution over a variable number of resources. We introduce FlipHash, a consistent range-hashing algorithm with constant time complexity and low memory requirements. Like Jump Consistent Hash, FlipHash is intended for applications where resources can be indexed sequentially. Under this condition, it ensures that keys are hashed evenly across resources and that changing the number of resources only causes keys to be remapped from a removed resource or to an added one, but never shuffled across persisted ones. FlipHash differentiates itself with its low computational cost, achieving constant-time complexity. We show that FlipHash beats Jump Consistent Hash's cost, which is logarithmic in the number of resources, both theoretically and in experiments over practical settings. (Comment: 16 pages, 3 figures, 4 tables.)
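    For reference, a minimal Python transcription of Jump Consistent Hash, the logarithmic-time baseline FlipHash is compared against; the FlipHash algorithm itself is not reproduced here. The remapping check at the end illustrates the stability property both algorithms share: growing the cluster only moves keys onto the new bucket.

    def jump_consistent_hash(key: int, num_buckets: int) -> int:
        """Map a 64-bit key to a bucket in [0, num_buckets) (Lamping & Veach, 2014)."""
        b, j = -1, 0
        while j < num_buckets:
            b = j
            key = (key * 2862933555777941757 + 1) % (1 << 64)   # 64-bit LCG step
            j = int((b + 1) * (1 << 31) / ((key >> 33) + 1))
        return b

    # Growing the cluster from 10 to 11 buckets only remaps keys onto the new bucket.
    before = {k: jump_consistent_hash(k, 10) for k in range(1000)}
    after = {k: jump_consistent_hash(k, 11) for k in range(1000)}
    moved = {k for k in before if before[k] != after[k]}
    assert all(after[k] == 10 for k in moved)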

    UbiCrawler: A Scalable Fully Distributed Web Crawler

    We present the design and implementation of UbiCrawler, a scalable distributed web crawler, and we analyze its performance. The main features of UbiCrawler are platform independence, fault tolerance, a very effective assignment function for partitioning the domain to crawl, and, more generally, the complete decentralization of every task.
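    A hedged sketch of one standard way to realize such an assignment function, using consistent hashing: each crawling agent owns several points on a hash ring, and a host is assigned to the agent owning the first point clockwise from the host's hash, so adding or removing an agent only reassigns the hosts it owned. Names and replica counts below are illustrative, not UbiCrawler's actual implementation.

    import bisect
    import hashlib

    def _h(s: str) -> int:
        return int(hashlib.sha1(s.encode()).hexdigest(), 16)

    class HostAssigner:
        """Consistent-hashing assignment of hosts to crawler agents (illustrative)."""

        def __init__(self, agents, replicas=100):
            # Each agent gets `replicas` points on the ring for smoother balance.
            self.ring = sorted((_h(f"{a}#{i}"), a) for a in agents for i in range(replicas))
            self.keys = [k for k, _ in self.ring]

        def agent_for(self, host: str) -> str:
            i = bisect.bisect(self.keys, _h(host)) % len(self.ring)
            return self.ring[i][1]

    assigner = HostAssigner(["agent-0", "agent-1", "agent-2"])
    print(assigner.agent_for("www.example.org"))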

    Asymptotic Miss Ratio of LRU Caching with Consistent Hashing

    To efficiently scale data caching infrastructure to support emerging big data applications, many caching systems rely on consistent hashing to group a large number of servers into a cooperative cluster. These servers are organized together according to a random hash function. They jointly provide a unified but distributed hash table to serve swift and voluminous data item requests. Different from the single least-recently-used (LRU) server that has already been extensively studied, theoretically characterizing a cluster that consists of multiple LRU servers remains yet to be explored. These servers are not simply added together; the random hashing complicates the behavior. To this end, we derive the asymptotic miss ratio of data item requests on an LRU cluster with consistent hashing. We show that the individual cache spaces on different servers can be effectively viewed as if they were pooled together to form a single virtual LRU cache space parametrized by an appropriate cache size. This equivalence can be established rigorously under the condition that the cache sizes of the individual servers are large, which is common for typical data caching systems. Our theoretical framework provides a convenient abstraction that allows results from the simpler single LRU cache to be applied directly to the more complex LRU cluster with consistent hashing. (Comment: 11 pages, 4 figures.)
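    A rough simulation of the equivalence the paper establishes, under assumed skewed traffic: m LRU servers of size C behind a hash-based partition behave roughly like one LRU of size m*C. The request distribution, sizes, and simple modular hashing (standing in for a full consistent-hashing ring, which makes no difference here since no servers join or leave) are all arbitrary choices for illustration.

    import hashlib
    import random
    from collections import OrderedDict

    class LRUCache:
        def __init__(self, capacity):
            self.capacity, self.data, self.misses = capacity, OrderedDict(), 0

        def request(self, key):
            if key in self.data:
                self.data.move_to_end(key)           # hit: refresh recency
            else:
                self.misses += 1                      # miss: insert, evict LRU if full
                self.data[key] = True
                if len(self.data) > self.capacity:
                    self.data.popitem(last=False)

    def shard(key, m):
        return int(hashlib.md5(str(key).encode()).hexdigest(), 16) % m

    random.seed(0)
    m, cache_size, n_requests = 8, 500, 100_000
    keys = [int(random.paretovariate(1.2)) for _ in range(n_requests)]  # skewed popularity

    single = LRUCache(m * cache_size)                    # one pooled LRU of size m*C
    cluster = [LRUCache(cache_size) for _ in range(m)]   # m hashed LRU servers of size C
    for k in keys:
        single.request(k)
        cluster[shard(k, m)].request(k)

    print("pooled miss ratio :", single.misses / n_requests)
    print("cluster miss ratio:", sum(c.misses for c in cluster) / n_requests)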

    Revisiting Consistent Hashing with Bounded Loads

    Dynamic load balancing lies at the heart of distributed caching. Here, the goal is to assign objects (load) to servers (computing nodes) in a way that provides load balancing while at the same time dynamically adjusting to the addition or removal of servers. One essential requirement is that the addition or removal of a small number of servers should not require us to recompute the complete assignment. A popular and widely adopted solution is the two-decade-old Consistent Hashing (CH). Recently, an elegant extension was provided to account for server load bounds. In this paper, we identify that existing methodologies for CH and its variants suffer from cascaded overflow, leading to poor load balancing. This cascading effect degrades the performance of the hashing procedure as the load increases. To overcome it, we propose a simple solution to CH based on recent advances in fast minwise hashing. We show, both theoretically and empirically, that our proposed solution is significantly superior for load balancing and is optimal in many senses. On the AOL search dataset and the Indiana University Clicks dataset with real user activity, our proposed solution reduces cache misses by several orders of magnitude.
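    For context, a minimal sketch of the classical "consistent hashing with bounded loads" rule that the paper revisits: each server's load is capped at a constant factor c times the average, and an object that would overflow its server walks clockwise to the next server with spare capacity. This is the baseline scheme whose cascaded overflow the authors criticize; their minwise-hashing-based alternative is not reproduced here, and all names below are illustrative.

    import bisect
    import hashlib
    from collections import Counter
    from math import ceil

    def _h(s: str) -> int:
        return int(hashlib.sha1(s.encode()).hexdigest(), 16)

    def assign_bounded(objects, servers, c=1.25, replicas=50):
        """Consistent hashing with per-server capacity ceil(c * n / m) (illustrative)."""
        ring = sorted((_h(f"{s}#{i}"), s) for s in servers for i in range(replicas))
        keys = [k for k, _ in ring]
        capacity = ceil(c * len(objects) / len(servers))
        load, placement = Counter(), {}
        for obj in objects:
            i = bisect.bisect(keys, _h(obj)) % len(ring)
            while load[ring[i][1]] >= capacity:       # overflow: probe the next ring point
                i = (i + 1) % len(ring)
            placement[obj] = ring[i][1]
            load[ring[i][1]] += 1
        return placement, load

    placement, load = assign_bounded([f"obj{i}" for i in range(10_000)],
                                     [f"server{i}" for i in range(10)])
    print(load)  # no server exceeds ceil(1.25 * 10000 / 10) = 1250 objects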