
    Scalable Reliable SD Erlang Design

    This technical report presents the design of Scalable Distributed (SD) Erlang: a set of language-level changes that aims to enable Distributed Erlang to scale for server applications on commodity hardware with at most 100,000 cores. We cover a number of aspects, specifically the anticipated architecture, anticipated failures, scalable data structures, and scalable computation. Two other components that guided us in the design of SD Erlang are design principles and typical Erlang applications. The design principles summarise the types of modification we aim to make to enable Erlang scalability. The Erlang exemplars help us to identify the main Erlang scalability issues and to provisionally validate the SD Erlang design.
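
    As a back-of-the-envelope illustration of the scalability problem this report targets (not a description of its actual design), the sketch below compares the number of inter-node connections in a fully meshed Distributed Erlang cluster with a cluster partitioned into node groups; the node counts, group size, and chain-of-gateways topology are hypothetical.

    # Illustration only: connection counts in a fully meshed cluster vs. a
    # cluster partitioned into groups. Node and group sizes are hypothetical.

    def full_mesh_connections(n_nodes: int) -> int:
        """Every node connects to every other node: n*(n-1)/2 links."""
        return n_nodes * (n_nodes - 1) // 2

    def grouped_connections(n_nodes: int, group_size: int) -> int:
        """Nodes are fully meshed only inside their own group; groups are
        bridged by a single gateway link per pair of adjacent groups."""
        n_groups = -(-n_nodes // group_size)   # ceiling division
        intra = n_groups * full_mesh_connections(group_size)
        inter = n_groups - 1                   # a simple chain of gateways
        return intra + inter

    if __name__ == "__main__":
        for n in (1_000, 10_000, 100_000):
            print(f"{n:>7} nodes: full mesh {full_mesh_connections(n):>13,} links, "
                  f"grouped (size 50) {grouped_connections(n, 50):>9,} links")

    At 100,000 nodes the full mesh needs roughly five billion links, while the grouped topology stays in the low millions, which is the kind of gap that motivates restricting connectivity at the language level.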

    Exploring heterogeneity of unreliable machines for p2p backup

    P2P architecture is a viable option for enterprise backup. In contrast to dedicated backup servers, which are nowadays the standard solution, making backups directly on an organization's workstations should be cheaper (as existing hardware is used), more efficient (as there is no single bottleneck server), and more reliable (as the machines are geographically dispersed). We present the architecture of a p2p backup system that uses pairwise replication contracts between a data owner and a replicator. In contrast to standard p2p storage systems that use a DHT directly, the contracts allow our system to optimize replica placement according to a specific optimization strategy, and thus to take advantage of the heterogeneity of the machines and the network. Such optimization is particularly appealing in the context of backup: replicas can be geographically dispersed, the load sent over the network can be minimized, or the optimization goal can be to minimize the backup/restore time. However, managing the contracts, keeping them consistent, and adjusting them in response to a dynamically changing environment is challenging. We built a scientific prototype and ran experiments on 150 workstations in the university's computer laboratories and, separately, on 50 PlanetLab nodes. We found that the main factor affecting the quality of the system is the availability of the machines. Yet our main conclusion is that it is possible to build an efficient and reliable backup system on highly unreliable machines (our computers had just 13% average availability).
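
    A minimal sketch of how pairwise replication contracts with a pluggable placement strategy might look; the class names, the availability-plus-dispersion score, and the greedy selection are illustrative assumptions, not the authors' implementation.

    # Sketch of pairwise replication contracts with a pluggable placement
    # strategy. All names and the scoring rule are illustrative assumptions.
    from dataclasses import dataclass, field

    @dataclass
    class Machine:
        name: str
        availability: float   # fraction of time the machine is online, 0..1
        site: str             # coarse location, used for geographic dispersion

    @dataclass
    class Contract:
        owner: "Machine"
        replicator: "Machine"

    @dataclass
    class Owner:
        machine: Machine
        contracts: list = field(default_factory=list)

        def place_replicas(self, candidates, n_replicas=3):
            """Greedy placement: prefer highly available machines on sites
            other than the owner's, so replicas end up dispersed."""
            def score(m: Machine) -> float:
                dispersion_bonus = 0.2 if m.site != self.machine.site else 0.0
                return m.availability + dispersion_bonus

            ranked = sorted((m for m in candidates if m is not self.machine),
                            key=score, reverse=True)
            self.contracts = [Contract(self.machine, m) for m in ranked[:n_replicas]]
            return self.contracts

    if __name__ == "__main__":
        pool = [Machine("lab-a-01", 0.13, "lab-a"),
                Machine("lab-b-07", 0.45, "lab-b"),
                Machine("lab-c-02", 0.30, "lab-c"),
                Machine("lab-a-09", 0.60, "lab-a")]
        owner = Owner(Machine("lab-a-03", 0.20, "lab-a"))
        for c in owner.place_replicas(pool, n_replicas=2):
            print(f"{c.owner.name} -> {c.replicator.name}")

    Because the contract list is explicit per owner (rather than implied by a DHT key space), the scoring function can be swapped for one that minimizes network load or expected restore time, which is the flexibility the abstract argues for.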

    A Holistic Approach to Lowering Latency in Geo-distributed Web Applications

    User-perceived end-to-end latency of web applications has a huge impact on the revenue of many businesses. The end-to-end latency of web applications is determined by: (i) user-to-application-server (front-end) latency, which includes downloading and parsing web pages and retrieving further objects requested by JavaScript execution; and (ii) application- and storage-server (back-end) latency, which includes retrieving the metadata required for an initial rendering and subsequent content based on user actions. Improving the user-perceived performance of web applications is challenging, given their complex operating environments involving user-facing web servers, content distribution network (CDN) servers, multi-tiered application servers, and storage servers. Further, the application and storage servers are often deployed on multi-tenant cloud platforms that show high performance variability. While many novel approaches, such as SPDY and geo-replicated datastores, have been developed to improve performance, many of these solutions are specific to certain layers and may have different impacts on user-perceived performance. The primary goal of this thesis is to address the above challenges in a holistic manner, focusing specifically on improving the end-to-end latency of geo-distributed multi-tiered web applications. This thesis makes the following contributions: (i) First, it reduces user-facing latency by helping CDNs identify the objects that are most critical for page-load latency and map them to the faster CDN cache layers. Through controlled experiments on real-world web pages, we show the potential of our approach to shave hundreds of milliseconds off latency without affecting overall CDN miss rates. (ii) Next, it reduces back-end latency by optimally adapting the datastore replication policies (including the number and location of replicas) to the heterogeneity in workloads. We show the benefits of our replication models using real-world traces of Twitter, Wikipedia, and Gowalla on an 8-datacenter Cassandra cluster deployed on EC2. (iii) Finally, it makes multi-tier applications resilient to the inherent performance variability of the cloud through fine-grained request redirection. We highlight the benefits of our approach by deploying three real-world applications on commercial cloud platforms.
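
    A minimal sketch of latency-aware, fine-grained request redirection across back-end replicas, in the spirit of contribution (iii); the replica names, sliding-window size, and tail-latency percentile are assumptions for illustration, not the thesis's actual mechanism.

    # Sketch of fine-grained request redirection: each request is sent to the
    # replica with the lowest recently observed tail latency. Replica names,
    # window size, and percentile are illustrative assumptions.
    import random
    from collections import defaultdict, deque

    class Redirector:
        def __init__(self, replicas, window=50, percentile=0.95):
            self.replicas = list(replicas)
            self.percentile = percentile
            self.samples = defaultdict(lambda: deque(maxlen=window))

        def record(self, replica, latency_ms):
            """Feed back the observed latency of a completed request."""
            self.samples[replica].append(latency_ms)

        def tail_latency(self, replica):
            obs = sorted(self.samples[replica])
            if not obs:
                return 0.0                      # no data yet: try this replica first
            idx = min(len(obs) - 1, int(self.percentile * len(obs)))
            return obs[idx]

        def choose(self):
            """Pick the replica whose recent tail latency is lowest."""
            return min(self.replicas, key=self.tail_latency)

    if __name__ == "__main__":
        r = Redirector(["us-east", "eu-west", "ap-south"])
        for _ in range(200):
            target = r.choose()
            # Simulated latencies: eu-west is occasionally very slow,
            # mimicking multi-tenant performance variability.
            base = {"us-east": 40, "eu-west": 30, "ap-south": 80}[target]
            spike = 100 if target == "eu-west" and random.random() < 0.1 else 0
            r.record(target, base + random.expovariate(1 / 20) + spike)
        print("preferred replica:", r.choose())

    Steering each request by recent tail latency, rather than by a static mapping, is one simple way to absorb the performance variability of multi-tenant cloud platforms that the abstract highlights.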