Reliable Messaging to Millions of Users with MigratoryData
Web-based notification services are used by a large range of businesses to
selectively distribute live updates to customers, following the
publish/subscribe (pub/sub) model. Typical deployments can involve millions of
subscribers expecting ordering and delivery guarantees together with low
latencies. Notification services must be vertically and horizontally scalable,
and adopt replication to provide a reliable service. We report our experience
building and operating MigratoryData, a highly-scalable notification service.
We discuss the typical requirements of MigratoryData customers, and describe
the architecture and design of the service, focusing on scalability and fault
tolerance. Our evaluation demonstrates the ability of MigratoryData to handle
millions of concurrent connections and support a reliable notification service
despite server failures and network disconnections.
DOH: A Content Delivery Peer-to-Peer Network
Many SMEs and non-profit organizations suffer when their Web
servers become unavailable due to flash crowd effects when their web site
becomes popular. One of the solutions to the flash-crowd problem is to place
the web site on a scalable CDN (Content Delivery Network) that replicates
the content and distributes the load in order to improve its response time.
In this paper, we present our approach to building a scalable Web Hosting
environment as a CDN on top of a structured peer-to-peer system of collaborative
web-servers integrated to share the load and to improve the overall
system performance, scalability, availability and robustness. Unlike cluster-based
solutions, it can run on heterogeneous hardware, over geographically
dispersed areas. To validate and evaluate our approach, we have developed a
system prototype called DOH (DKS Organized Hosting) that is a CDN implemented
on top of the DKS (Distributed K-nary Search) structured P2P
system with DHT (Distributed Hash table) functionality [9]. The prototype
is implemented in Java, using the DKS middleware, the Jetty web-server, and
a modified JavaFTP server. The proposed CDN design has been evaluated
by simulation and by experiments on the prototype.
Distributed paged Hash tables
In this paper we present the design and implementation of DPH, a storage layer for cluster environments. DPH is a Distributed Data Structure (DDS) based on the distribution of a paged hash table. It combines main memory with file system resources across the cluster in order to implement a distributed dictionary that can be used for the storage of very large data sets with key based addressing techniques. The DPH storage layer is supported by a collection of cluster-aware utilities and services. Access to the DPH interface is provided by a user-level API. A preliminary performance evaluation shows promising results. Supported by PRODEP III (grant 5.3/N/199.006/00) and SAPIENS (grant 41739/CHS/2001)
Scalable Storage for Digital Libraries
I propose a storage system optimised for digital libraries. Its key features are its heterogeneous scalability; its integration and exploitation of rich semantic metadata associated with digital objects; its use of a name space; and its aggressive performance optimisation in the digital library domain.
Design of a Multi-Host Shared Memory Services System
A memory cache stores data and objects in memory, reducing the time spent on database access and hard disk I/O; it is widely used to accelerate large-scale web systems. In this paper, we design Memcached Helper (MH), a distributed memory cache system built on a set of memcached instances and designed to scale in cloud environments. The experimental results show that the system uses memory more efficiently and delivers better performance and speed.
Scalable and Adaptive Load Balancing on IBM PowerNP
Web and other Internet-based server farms are a critical company resource. A solution to the increased complexity of server farms and to the need to improve the server performance in terms of scalability, fault tolerance and management is to implement a load balancing technique. It consists of a front-end machine which intelligently redirects the traffic to several Real Servers. We discuss the feasibility of implementing adaptive load balancing with minimal flow disruption on the IBM PowerNP Network Processor. We focus our attention on the steady-state part of the algorithm and propose a PowerNP-tailored mapping algorithm derived from Robust Hash Mapping. We propose and show a fast algorithm solution (despite the simple arithmetical logic of the PowerNP), as well as a scalable approach (aiming at minimizing the packet processing time) and, finally, we present some initial performance results.
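The abstract above does not spell out the mapping algorithm itself. As a rough illustration of what "load balancing with minimal flow disruption" can mean, the following sketch uses plain highest-random-weight (rendezvous) hashing. This is an assumed stand-in, not the PowerNP-tailored algorithm from the paper; the names `weight` and `pick_server` and the flow and server identifiers are hypothetical.

```python
import hashlib


def weight(flow_id: str, server: str) -> int:
    """Pseudo-random weight for a (flow, server) pair (illustrative hash choice)."""
    digest = hashlib.sha256(f"{server}|{flow_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def pick_server(flow_id: str, servers: list[str]) -> str:
    """Send the flow to the server with the highest weight for that flow."""
    return max(servers, key=lambda s: weight(flow_id, s))


if __name__ == "__main__":
    servers = ["rs1", "rs2", "rs3", "rs4"]  # hypothetical Real Servers
    flow = "10.0.0.7:443"                   # hypothetical flow identifier
    print(flow, "->", pick_server(flow, servers))
```

With this scheme, removing one server remaps only the flows that were assigned to it; every other flow keeps its server, because its highest-weight candidate is still present.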
Revisiting Consistent Hashing with Bounded Loads
Dynamic load balancing lies at the heart of distributed caching. Here, the
goal is to assign objects (load) to servers (computing nodes) in a way that
provides load balancing while at the same time dynamically adjusts to the
addition or removal of servers. One essential requirement is that the addition
or removal of small servers should not require us to recompute the complete
assignment. A popular and widely adopted solution is the two-decade-old
Consistent Hashing (CH). Recently, an elegant extension was provided to account
for server bounds. In this paper, we identify that existing methodologies for
CH and its variants suffer from cascaded overflow, leading to poor load
balancing. This cascading effect leads to decreasing performance of the hashing
procedure with increasing load. To overcome the cascading effect, we propose a
simple solution to CH based on recent advances in fast minwise hashing. We
show, both theoretically and empirically, that our proposed solution is
significantly superior for load balancing and is optimal in many senses. On the
AOL search dataset and Indiana University Clicks dataset with real user
activity, our proposed solution reduces cache misses by several orders of magnitude.
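For orientation, here is a minimal sketch of the baseline this abstract builds on: consistent hashing with a per-server load bound, where an object whose server is full moves clockwise to the next server with spare capacity. It is not the minwise-hashing solution the paper proposes, and the class name `BoundedRing`, the `epsilon` parameter, and the capacity formula are assumptions made for illustration.

```python
import bisect
import hashlib
import math


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class BoundedRing:
    """Consistent hashing ring with a (1 + epsilon) load bound per server."""

    def __init__(self, servers, epsilon=0.25):
        self.epsilon = epsilon
        self.ring = sorted((_hash(s), s) for s in servers)
        self.load = {s: 0 for s in servers}
        self.total = 0

    def _capacity(self) -> int:
        # Bound computed as if the next object were already placed.
        average = (self.total + 1) / len(self.ring)
        return math.ceil((1 + self.epsilon) * average)

    def assign(self, obj: str) -> str:
        cap = self._capacity()
        start = bisect.bisect_left(self.ring, (_hash(obj), ""))
        # Walk clockwise from the object's position; skip full servers.
        for step in range(len(self.ring)):
            _, server = self.ring[(start + step) % len(self.ring)]
            if self.load[server] < cap:
                self.load[server] += 1
                self.total += 1
                return server
        raise RuntimeError("no server below capacity")


if __name__ == "__main__":
    ring = BoundedRing([f"server-{i}" for i in range(10)])
    for i in range(1000):
        ring.assign(f"object-{i}")
    print(sorted(ring.load.values()))
```

The clockwise walk is where the cascading described in the abstract comes from: when one server fills up, its overflow lands on its neighbour, which then fills faster and overflows in turn.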
DotSlash: A Scalable and Efficient Rescue System for Handling Web Hotspots
This paper describes DotSlash, a scalable and efficient rescue system for handling web hotspots. DotSlash allows different web sites to form a mutual-aid community, and use spare capacity in the community to relieve web hotspots experienced by any individual site. As a rescue system, DotSlash intervenes when a web site becomes heavily loaded, and is phased out once the workload returns to normal. It aims to complement existing web server infrastructure such as CDNs to handle short-term load spikes effectively, but is not intended to support a request load constantly higher than a web site's planned capacity. DotSlash is scalable, cost-effective, easy to use, self-configuring, and transparent to clients. It targets small web sites, although large web sites can also benefit from it. We have implemented a prototype of DotSlash on top of Apache. Experiments show that DotSlash can provide an order of magnitude improvement for a web server in terms of the request rate supported and the data rate delivered to clients even if only HTTP redirect is used. Parts of this work may be applicable to other services such as the Grid computational services and media streaming.
Scalable Persistent Storage for Erlang
The many-core revolution makes scalability a key property. The RELEASE project aims to improve the scalability of Erlang on emergent commodity architectures with 100,000 cores. Such architectures require scalable and available persistent storage on up to 100 hosts. We enumerate the requirements for scalable and available persistent storage, and evaluate four popular Erlang DBMSs against these requirements. This analysis shows that Mnesia and CouchDB are not suitable persistent storage at our target scale, but Dynamo-like NoSQL DataBase Management Systems (DBMSs) such as Cassandra and Riak potentially are. We investigate the current scalability limits of the Riak 1.1.1 NoSQL DBMS in practice on a 100-node cluster. We establish for the first time scientifically the scalability limit of Riak as 60 nodes on the Kalkyl cluster, thereby confirming developer folklore. We show that resources like memory, disk, and network do not limit the scalability of Riak. By instrumenting Erlang/OTP and Riak libraries we identify a specific Riak functionality that limits scalability. We outline how later releases of Riak are refactored to eliminate the scalability bottlenecks. We conclude that Dynamo-style NoSQL DBMSs provide scalable and available persistent storage for Erlang in general, and for our RELEASE target architecture in particular.