23,558 research outputs found
Object Replication Algorithms for World Wide Web
Object replication is a well-known technique to improve the accessibility of the Web sites. It generally offers reduced client latencies and increases a site's availability. However, applying replication techniques is not trivial and a large number of heuristics have been proposed to decide the number of replicas of an object and their placement in a distributed web server system. This paper presents three object placement and replication algorithms. The first two heuristics are centralized in the sense that a central site determines the number of replicas and their placement. Due to the dynamic nature of the Internet traffic and the rapid change in the access pattern of the World-Wide Web, we also propose a distributed algorithm where each site relies on some locally collected information to decide what objects should be replicated at that site. The performance of the proposed algorithms is evaluated through a simulation study. Also, the performance of the proposed algorithms has been compared with that of three other well-known algorithms and the results are presented. The simulation results demonstrate the effectiveness and superiority of the proposed algorithms
A Literature Survey of Cooperative Caching in Content Distribution Networks
Content distribution networks (CDNs) which serve to deliver web objects
(e.g., documents, applications, music and video, etc.) have seen tremendous
growth since its emergence. To minimize the retrieving delay experienced by a
user with a request for a web object, caching strategies are often applied -
contents are replicated at edges of the network which is closer to the user
such that the network distance between the user and the object is reduced. In
this literature survey, evolution of caching is studied. A recent research
paper [15] in the field of large-scale caching for CDN was chosen to be the
anchor paper which serves as a guide to the topic. Research studies after and
relevant to the anchor paper are also analyzed to better evaluate the
statements and results of the anchor paper and more importantly, to obtain an
unbiased view of the large scale collaborate caching systems as a whole.Comment: 5 pages, 5 figure
On the Evaluation of RDF Distribution Algorithms Implemented over Apache Spark
Querying very large RDF data sets in an efficient manner requires a
sophisticated distribution strategy. Several innovative solutions have recently
been proposed for optimizing data distribution with predefined query workloads.
This paper presents an in-depth analysis and experimental comparison of five
representative and complementary distribution approaches. For achieving fair
experimental results, we are using Apache Spark as a common parallel computing
framework by rewriting the concerned algorithms using the Spark API. Spark
provides guarantees in terms of fault tolerance, high availability and
scalability which are essential in such systems. Our different implementations
aim to highlight the fundamental implementation-independent characteristics of
each approach in terms of data preparation, load balancing, data replication
and to some extent to query answering cost and performance. The presented
measures are obtained by testing each system on one synthetic and one
real-world data set over query workloads with differing characteristics and
different partitioning constraints.Comment: 16 pages, 3 figure
Simulating Wde-area Replication
We describe our experiences with simulating replication algorithms for use in far flung distributed systems. The algorithms under scrutiny mimic epidemics. Epidemic algorithms seem to scale and adapt to change (such as varying replica sets) well. The loose consistency guarantees they make seem more useful in applications where availability strongly outweighs correctness; e.g., distributed name service
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also helps to provide an easy way for new practitioners to understand
this complex area of research.Comment: 46 pages, 16 figures, Technical Repor
Cooperative Caching for Multimedia Streaming in Overlay Networks
Traditional data caching, such as web caching, only focuses on how to boost the hit rate of requested objects in caches, and therefore, how to reduce the initial delay for object retrieval. However, for multimedia objects, not only reducing the delay of object retrieval, but also provisioning reasonably stable network bandwidth to clients, while the fetching of the cached objects goes on, is important as well. In this paper, we propose our cooperative caching scheme for a multimedia delivery scenario, supporting a large number of peers over peer-to-peer overlay networks. In order to facilitate multimedia streaming and downloading service from servers, our caching scheme (1) determines the appropriate availability of cached stream segments in a cache community, (2) determines the appropriate peer for cache replacement, and (3) performs bandwidth-aware and availability-aware cache replacement. By doing so, it achieves (1) small delay of stream retrieval, (2) stable bandwidth provisioning during retrieval session, and (3) load balancing of clients' requests among peers
Reproducible Econometric Research. A Critical Review of the State of the Art.
Recent software developments are reviewed from the vantage point of reproducible econometric research. We argue that the emergence of new tools, particularly in the open-source community, have greatly eased the burden of documenting and archiving both empirical and simulation work in econometrics. Some of these tools are highlighted in the discussion of three small replication exercises.Series: Research Report Series / Department of Statistics and Mathematic
- …