8,720 research outputs found
DOH: A Content Delivery Peer-to-Peer Network
Many SMEs and non-pro¯t organizations su®er when their Web
servers become unavailable due to °ash crowd e®ects when their web site
becomes popular. One of the solutions to the °ash-crowd problem is to place
the web site on a scalable CDN (Content Delivery Network) that replicates
the content and distributes the load in order to improve its response time.
In this paper, we present our approach to building a scalable Web Hosting
environment as a CDN on top of a structured peer-to-peer system of collaborative
web-servers integrated to share the load and to improve the overall
system performance, scalability, availability and robustness. Unlike clusterbased
solutions, it can run on heterogeneous hardware, over geographically
dispersed areas. To validate and evaluate our approach, we have developed a
system prototype called DOH (DKS Organized Hosting) that is a CDN implemented
on top of the DKS (Distributed K-nary Search) structured P2P
system with DHT (Distributed Hash table) functionality [9]. The prototype
is implemented in Java, using the DKS middleware, the Jetty web-server, and
a modi¯ed JavaFTP server. The proposed design of CDN has been evaluated
by simulation and by evaluation experiments on the prototype
ElasTraS: An Elastic Transactional Data Store in the Cloud
Over the last couple of years, "Cloud Computing" or "Elastic Computing" has
emerged as a compelling and successful paradigm for internet scale computing.
One of the major contributing factors to this success is the elasticity of
resources. In spite of the elasticity provided by the infrastructure and the
scalable design of the applications, the elephant (or the underlying database),
which drives most of these web-based applications, is not very elastic and
scalable, and hence limits scalability. In this paper, we propose ElasTraS
which addresses this issue of scalability and elasticity of the data store in a
cloud computing environment to leverage from the elastic nature of the
underlying infrastructure, while providing scalable transactional data access.
This paper aims at providing the design of a system in progress, highlighting
the major design choices, analyzing the different guarantees provided by the
system, and identifying several important challenges for the research community
striving for computing in the cloud.Comment: 5 Pages, In Proc. of USENIX HotCloud 200
Boosting Multi-Core Reachability Performance with Shared Hash Tables
This paper focuses on data structures for multi-core reachability, which is a
key component in model checking algorithms and other verification methods. A
cornerstone of an efficient solution is the storage of visited states. In
related work, static partitioning of the state space was combined with
thread-local storage and resulted in reasonable speedups, but left open whether
improvements are possible. In this paper, we present a scaling solution for
shared state storage which is based on a lockless hash table implementation.
The solution is specifically designed for the cache architecture of modern
CPUs. Because model checking algorithms impose loose requirements on the hash
table operations, their design can be streamlined substantially compared to
related work on lockless hash tables. Still, an implementation of the hash
table presented here has dozens of sensitive performance parameters (bucket
size, cache line size, data layout, probing sequence, etc.). We analyzed their
impact and compared the resulting speedups with related tools. Our
implementation outperforms two state-of-the-art multi-core model checkers (SPIN
and DiVinE) by a substantial margin, while placing fewer constraints on the
load balancing and search algorithms.Comment: preliminary repor
Scalable Persistent Storage for Erlang
The many core revolution makes scalability a key property. The RELEASE project aims to improve the scalability of Erlang on emergent commodity architectures with 100,000 cores. Such architectures require scalable and available persistent storage on up to 100 hosts. We enumerate the requirements for scalable and available persistent storage, and evaluate four popular Erlang DBMSs against these requirements. This analysis shows that Mnesia and CouchDB are not suitable persistent storage at our target scale, but Dynamo-like NoSQL DataBase Management Systems (DBMSs) such as Cassandra and Riak potentially are. We investigate the current scalability limits of the Riak 1.1.1 NoSQL DBMS in practice on a 100-node cluster. We establish for the first time scientifically the scalability limit of Riak as 60 nodes on the Kalkyl cluster, thereby confirming developer folklore. We show that resources like memory, disk, and network do not limit the scalability of Riak. By instrumenting Erlang/OTP and Riak libraries we identify a specific Riak functionality that limits scalability. We outline how later releases of Riak are refactored to eliminate the scalability bottlenecks. We conclude that Dynamo-style NoSQL DBMSs provide scalable and available persistent storage for Erlang in general, and for our RELEASE target architecture in particular
Observations on Factors Affecting Performance of MapReduce based Apriori on Hadoop Cluster
Designing fast and scalable algorithm for mining frequent itemsets is always
being a most eminent and promising problem of data mining. Apriori is one of
the most broadly used and popular algorithm of frequent itemset mining.
Designing efficient algorithms on MapReduce framework to process and analyze
big datasets is contemporary research nowadays. In this paper, we have focused
on the performance of MapReduce based Apriori on homogeneous as well as on
heterogeneous Hadoop cluster. We have investigated a number of factors that
significantly affects the execution time of MapReduce based Apriori running on
homogeneous and heterogeneous Hadoop Cluster. Factors are specific to both
algorithmic and non-algorithmic improvements. Considered factors specific to
algorithmic improvements are filtered transactions and data structures.
Experimental results show that how an appropriate data structure and filtered
transactions technique drastically reduce the execution time. The
non-algorithmic factors include speculative execution, nodes with poor
performance, data locality & distribution of data blocks, and parallelism
control with input split size. We have applied strategies against these factors
and fine tuned the relevant parameters in our particular application.
Experimental results show that if cluster specific parameters are taken care of
then there is a significant reduction in execution time. Also we have discussed
the issues regarding MapReduce implementation of Apriori which may
significantly influence the performance.Comment: 8 pages, 8 figures, International Conference on Computing,
Communication and Automation (ICCCA2016
- …