
    Bounded version vectors

    Version vectors play a central role in update tracking in optimistic distributed systems, allowing the detection of obsolete or inconsistent versions of replicated data. Version vectors do not have a bounded representation; they are based on integer counters that grow indefinitely as updates occur. Existing approaches to this problem are scarce; the mechanisms proposed are either unbounded or operate only under specific settings. This paper examines version vectors as a mechanism for data causality tracking and clarifies their role with respect to vector clocks. It then introduces bounded stamps and proves them to be a correct alternative to integer counters in version vectors. The resulting mechanism, bounded version vectors, represents the first bounded solution to data causality tracking between replicas subject to local updates and pairwise symmetrical synchronization. Funded by FCT project POSI/ICHS/44304/2002 and FCT grant BSAB/390/2003.
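    For context, here is a minimal sketch of the classic, unbounded mechanism the paper improves on: version vectors keyed by replica id, with integer counters that grow on every local update and merge by component-wise maximum during pairwise synchronization. The replica ids and method names are illustrative, not the paper's notation.

    ```python
    # A minimal sketch of classic version vectors with unbounded integer
    # counters -- the baseline whose counters the paper replaces with
    # bounded stamps.

    class VersionVector:
        def __init__(self, replica_ids):
            self.clock = {r: 0 for r in replica_ids}

        def local_update(self, replica_id):
            # Counters grow without bound as updates occur.
            self.clock[replica_id] += 1

        def dominates(self, other):
            # self covers other's history: other is obsolete or equal.
            return all(self.clock.get(r, 0) >= c for r, c in other.clock.items())

        def concurrent(self, other):
            # Neither dominates the other: the versions are inconsistent.
            return not self.dominates(other) and not other.dominates(self)

        def merge(self, other):
            # Pairwise symmetrical synchronization: component-wise maximum.
            for r, c in other.clock.items():
                self.clock[r] = max(self.clock.get(r, 0), c)

    # Two replicas update concurrently, detect the conflict, then sync.
    a, b = VersionVector(["A", "B"]), VersionVector(["A", "B"])
    a.local_update("A")
    b.local_update("B")
    assert a.concurrent(b)
    a.merge(b)
    assert a.dominates(b)
    ```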

    Decentralized Access Control in Networked File Systems

    The Internet enables global sharing of data across organizational boundaries. Traditional access control mechanisms are intended for one machine, or a small number of machines, under common administrative control, and rely on maintaining a centralized database of user identities. They fail to scale to a large user base distributed across multiple organizations. This survey provides a taxonomy of decentralized access control mechanisms intended for systems that are large in both the number of administrative domains and the number of users. We identify essential properties of such access control mechanisms and analyze popular networked file systems in the context of our taxonomy.

    A Dependency Tracking Storage System for Optimistic Execution of Serverless Applications

    Serverless computing has become an increasingly popular paradigm for building cloud applications. There has been a recent trend of building stateful applications on top of serverless platforms in the form of workflows composed of individual functions. As functions are short-lived and state is not recoverable across function invocations, these applications typically store state that is used between functions in an external storage system. Such storage systems should enforce concurrency control, as different workflow instances may update overlapping state simultaneously. However, existing concurrency control algorithms typically incur significant latency due to locking or read/write set validation. This is undesirable, since execution latency is an important performance metric for workflow applications, as each stage is executed sequentially. Furthermore, they can abort transactions in a manner that is oblivious to application preferences. In this thesis, we present Arbor, a sharded dependency-tracking storage system designed for optimistic execution of serverless workflows while ensuring serializability. Arbor introduces a two-round commit model where submitted client transactions are organized in a dependency graph. Transactions are then processed in batches, off the critical path of client execution, allowing clients to continue executing quickly without having to wait for Arbor to validate each transaction. As Arbor processes transactions, it organizes them into a tree where each branch is a serialized execution and conflicts result in new branches being created. It then commits one branch from this tree and prunes the rest. To minimize re-executions, Arbor chooses the longest branch by default, but application developers can implement their own policies. Pruning branches is simple with Arbor, since it can re-execute the corresponding transactions by invoking the respective functions from the serverless platform. Furthermore, Arbor is designed to be scalable. Data is partitioned by key, but the metadata of its dependency graph is replicated. This design allows single-shard transactions in each batch to be processed independently, while multi-shard transactions are replicated and processed by each shard. Our evaluation on a cluster of machines shows that Arbor's two-round commit model reduces median transaction execution latency by 1.26x compared to a system that uses OCC and commits transactions synchronously.
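    The branch-selection step described above lends itself to a short sketch: conflicting transactions fork a tree of serialized executions, one branch is committed, and transactions on pruned branches are re-executed. The tree API and the pluggable policy hook below are assumptions for illustration, not Arbor's actual data structures.

    ```python
    # A minimal sketch of tree-of-branches selection: each root-to-leaf
    # path is one candidate serial order; the chosen branch commits and
    # the rest are pruned and re-executed on the serverless platform.

    class TxNode:
        def __init__(self, tx_id, parent=None):
            self.tx_id, self.parent, self.children = tx_id, parent, []
            if parent is not None:
                parent.children.append(self)

    def branches(node):
        # Enumerate root-to-leaf paths, each a candidate serial order.
        if not node.children:
            return [[node]]
        return [[node] + rest for c in node.children for rest in branches(c)]

    def choose_branch(root, policy=len):
        # Default policy: the longest branch minimizes re-executions;
        # applications may substitute their own scoring function.
        return max(branches(root), key=policy)

    # t2 and t3 conflict after t1, so they sit on separate branches.
    t1 = TxNode("t1")
    t2 = TxNode("t2", parent=t1)
    t3 = TxNode("t3", parent=t1)
    t4 = TxNode("t4", parent=t2)
    committed = choose_branch(t1)   # [t1, t2, t4]; t3 is pruned, re-run
    ```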

    User-activity aware strategies for mobile information access

    Information access suffers tremendously in wireless networks because of the low correlation between content transferred across low-bandwidth wireless links and the actual data used to serve user requests. As a result, conventional content access mechanisms face problems such as unnecessary bandwidth consumption and large response times, and users experience significant performance degradation. In this dissertation, we analyze the cause of those problems and find that the major reason for inefficient information access in wireless networks is the absence of any user-activity awareness in current mechanisms. To solve these problems, we propose three user-activity aware strategies for mobile information access. Through simulations and implementations, we show that our strategies can outperform conventional information access schemes in terms of bandwidth consumption and user-perceived response times. Ph.D. Committee Chair: Raghupathy Sivakumar; Committee Members: Chuanyi Ji, George Riley, Magnus Egerstedt, Umakishore Ramachandran.

    Decoupling Consistency Determination and Trust from the Underlying Distributed Data Stores

    Building applications on cloud services is cost-effective and allows for rapid development and release cycles. However, relying on cloud services can severely limit applications' ability to control their own consistency policies and their ability to control data visibility during replication. To understand the tension between strong consistency and security guarantees on one hand, and high availability, flexible replication, and performance on the other, it helps to consider two questions. First, is it possible for an application to achieve stricter consistency guarantees than what the cloud provider offers? If we rely solely on the provider's service interface, the answer is no. However, if we allow applications to determine the implementation and execution of the consistency protocols, then we can achieve much more. The second question is, can an application relay updates over untrusted replicas without revealing sensitive information while maintaining the desired consistency guarantees? Simply encrypting the data is not enough. Encryption does not eliminate information leakage that comes from the metadata needed for the execution of any consistency protocol. The alternative to encryption, allowing the flow of updates only through trusted replicas, leads to predefined communication patterns. This approach is prone to failures that can cause partitioning in the system. One way to answer "yes" to this question is to allow trust relationships, defined at the application level, to guide the synchronization protocol. My goal in this thesis is to build systems that take advantage of the performance, scalability, and availability of cloud storage services while, at the same time, bypassing the limitations imposed by cloud service providers' design choices. The key to achieving this is pushing application-specific decisions where they belong: the application. I defend the following thesis statement: By decoupling consistency determination and trust from the underlying distributed data store, it is possible to (1) support application-specific consistency guarantees; (2) allow for topology-independent replication protocols that do not compromise application privacy. First, I design and implement Shell, a system architecture for supporting strict consistency guarantees over eventually consistent data stores. Shell is a software layer designed to isolate consistency implementations and cloud-provider APIs from the application code. Shell consists of four internal modules and an application store, which together abstract consistency-related operations and encapsulate communication with the underlying storage layers. Apart from consistency protocols tailored to application needs, Shell provides application-aware conflict resolution without relying on generic heuristics such as "last write wins." Shell does not require the application to maintain dependency-tracking information for the execution of the consistency protocols, as other existing approaches do. I experimentally evaluate Shell over two different data stores using real-application traces. I found that using Shell can reduce inconsistent updates by 10%. I also measure and show the overheads that come from introducing the Shell layer.
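    The interposition pattern Shell's description suggests can be sketched briefly: a layer between application code and an eventually consistent store that applies an application-supplied conflict resolver rather than a generic "last write wins" heuristic. The backend interface (get_all_versions) and the resolver signature are assumptions for illustration, not Shell's actual API.

    ```python
    # A minimal sketch of a consistency layer interposed between the
    # application and an eventually consistent store, with application-
    # aware conflict resolution. The store interface is an assumption.

    class ConsistencyShell:
        def __init__(self, store, resolve):
            self.store = store        # eventually consistent backend
            self.resolve = resolve    # application-supplied resolver

        def put(self, key, value):
            self.store.put(key, value)

        def get(self, key):
            versions = self.store.get_all_versions(key)  # assumed API
            if len(versions) == 1:
                return versions[0]
            # Divergent replicas: resolve per application policy rather
            # than a generic heuristic, then write the merged value back.
            merged = self.resolve(key, versions)
            self.store.put(key, merged)
            return merged

    # Example policy for set-valued data: merge by union instead of
    # silently discarding one writer's update.
    def union_resolver(key, versions):
        merged = set()
        for v in versions:
            merged |= set(v)
        return merged
    ```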
    Second, I design and implement T.Rex, a system for supporting topology-independent replication without assuming trust between all the participating replicas. T.Rex uses role-based access control to enable flexible and secure sharing among users with widely varying collaboration types: both users and data items are assigned roles, and a user can access a data item only if they share at least one role. Building on top of this abstraction, T.Rex includes several novel mechanisms: I introduce role proofs to prove role membership to others in the role without leaking information to those not in the role. Additionally, I introduce role coherence to prevent updates from leaking across roles. Finally, I use Bloom filters as opaque digests to enable querying of remote cache state without being able to enumerate it. I combine these mechanisms to develop a novel, cryptographically secure, and efficient anti-entropy protocol, T.Rex-Sync. I evaluate T.Rex on a local test-bed and show that it achieves security with modest computational and storage overheads.
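    The Bloom-filter-as-opaque-digest idea admits a short sketch: a peer's digest can be tested for membership but not enumerated, so during anti-entropy a replica sends only the updates the digest does not already cover. The filter parameters and hashing scheme below are illustrative, not T.Rex's construction.

    ```python
    # A minimal sketch of an opaque digest for anti-entropy: membership
    # is queryable, enumeration is not.

    import hashlib

    class BloomDigest:
        def __init__(self, m=1024, k=4):
            self.m, self.k, self.bits = m, k, 0

        def _positions(self, item):
            for i in range(self.k):
                h = hashlib.sha256(f"{i}:{item}".encode()).digest()
                yield int.from_bytes(h[:8], "big") % self.m

        def add(self, item):
            for p in self._positions(item):
                self.bits |= 1 << p

        def might_contain(self, item):
            # False positives are possible; false negatives are not.
            return all((self.bits >> p) & 1 for p in self._positions(item))

    # Anti-entropy step: push only updates the peer's digest lacks.
    def updates_to_send(local_updates, peer_digest):
        return [u for u in local_updates if not peer_digest.might_contain(u)]
    ```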

    Device-transparent personal storage

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 83-87). Users increasingly store data collections such as digital photographs on multiple personal devices, each of which typically presents the user with a storage management interface isolated from the contents of all other devices. The result is that collections easily become disorganized and drift out of sync. This thesis presents Eyo, a novel personal storage system that provides device transparency: a user can think in terms of "file X", rather than "file X on device Y", and will see the same set of files on all personal devices. Eyo allows a user to view and manage the entire collection of objects from any of their devices, even from disconnected devices and devices with too little storage to hold all the object content. Eyo separates metadata (application-specific attributes of objects) from the content of objects, allowing even storage-limited devices to store all metadata and thus provide device transparency. Fully replicated metadata allows any set of Eyo devices to efficiently synchronize updates. Applications can specify flexible placement rules to guide Eyo's partial replication of object contents across devices. Eyo's application interface provides first-class access to object version history. If multiple disconnected devices update an object concurrently, Eyo preserves each resulting divergent version of that object. Applications can then examine the history and either coalesce the conflicting versions without user direction, or incorporate these versions naturally into their existing user interfaces. Experiments using Eyo for storage in several example applications (media players, a photo editor, a podcast manager, and an email interface) show that device transparency can be had with minor application changes, and within the storage and bandwidth capabilities of typical portable devices. By Jacob Alo Strauss. Ph.D.
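    The metadata/content split described above can be sketched compactly: every device replicates all object metadata, including version history, while object content is placed only on devices whose placement rule accepts it. The field names and rule interface below are assumptions for illustration, not Eyo's actual API.

    ```python
    # A minimal sketch of device transparency via full metadata
    # replication plus rule-driven partial content replication.

    class ObjectMeta:
        def __init__(self, object_id, attrs):
            self.object_id = object_id
            self.attrs = attrs        # application-specific attributes
            self.versions = []        # first-class version history

        def add_version(self, version_id, parents):
            # Concurrent updates on disconnected devices yield multiple
            # heads; applications examine and coalesce them later.
            self.versions.append({"id": version_id, "parents": parents})

    class Device:
        def __init__(self, name, placement_rule):
            self.name = name
            self.rule = placement_rule   # assumed rule interface
            self.metadata = {}           # always fully replicated
            self.content = {}            # partially replicated by rule

        def sync_metadata(self, meta):
            self.metadata[meta.object_id] = meta
            # Fetch content only if the placement rule asks for it.
            return self.rule(meta)

    # Example rule: a storage-limited player keeps only audio content,
    # yet still sees metadata for every object in the collection.
    player = Device("player", lambda m: m.attrs.get("type") == "audio")
    ```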

    LSFS: a fault-tolerant file system for large-scale storage

    Integrated master's dissertation in Informatics Engineering. The need to store ever-increasing amounts of data has become more pronounced in current days. Concepts such as the Internet of Things (IoT) and Big Data, currently in vogue, usually come associated with large amounts of data, motivating the search for new ways to store and access it. Today, thousands of applications use file system interfaces to ensure persistence and fast access to the data they manage. However, existing file systems present centralized solutions or are targeted at a reduced number of nodes in controlled networks, limiting scale and availability. In order to tackle these challenges, this dissertation proposes LSFS, Large Scale Filesystem, a distributed file system, compatible with the POSIX interface, capable of scaling to networks of hundreds to thousands of heterogeneous nodes while ensuring high resilience to node failures. These properties derive from its completely decentralized, peer-to-peer architecture and its use of epidemic protocols. The application of these protocols in the context of a file system is new and constitutes the main contribution of this dissertation. As further contributions, we propose a prototype of the system and an extensive experimental evaluation conducted with a real case study in an environment of 500 network nodes. The results show that LSFS can tolerate catastrophic failures (e.g., affecting 25% of all nodes) while maintaining stable storage performance over time.
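    The epidemic protocols the dissertation builds on can be illustrated with a generic push-gossip round: each node forwards what it holds to a few random peers, so an update spreads in roughly O(log n) rounds and dissemination survives many node failures. The fanout and round structure below are generic, not LSFS's exact protocol.

    ```python
    # A minimal sketch of epidemic (gossip) dissemination: per round,
    # every node pushes its updates to a small random sample of peers.

    import random

    def gossip_round(nodes, fanout=3):
        # nodes: dict name -> set of updates currently held
        new_state = {n: set(s) for n, s in nodes.items()}
        for name, updates in nodes.items():
            peers = random.sample([p for p in nodes if p != name],
                                  min(fanout, len(nodes) - 1))
            for p in peers:
                new_state[p] |= updates
        return new_state

    # One seeded write reaches nearly all 500 nodes within ~10 rounds,
    # even though no node has a global view of the network.
    nodes = {f"n{i}": set() for i in range(500)}
    nodes["n0"].add("write:/docs/report.txt")
    for _ in range(10):
        nodes = gossip_round(nodes)
    coverage = sum("write:/docs/report.txt" in s for s in nodes.values())
    ```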

    Low-overhead distributed transaction coordination

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 167-173). This thesis presents Granola, a transaction coordination infrastructure for building reliable distributed storage applications. Granola provides a strong consistency model while significantly reducing transaction coordination overhead. Granola supports general atomic operations, enabling it to be used as a platform on which to build various storage systems, e.g., databases or object stores. We introduce specific support for independent transactions, a new type of distributed transaction that we can serialize with no locking overhead and no aborts due to write conflicts. Granola uses a novel timestamp-based coordination mechanism to serialize distributed transactions, offering lower latency and higher throughput than previous systems that offer strong consistency. Our experiments show that Granola has low overhead, is scalable, and has high throughput. We used Granola to deploy an existing single-node database application, creating a distributed database application with minimal code modifications. We run the TPC-C benchmark on this platform and achieve 3x the throughput of existing lock-based approaches. By James Cowling. Ph.D.
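    The timestamp-based coordination of independent transactions admits a compact sketch: each participant proposes a timestamp, all adopt the maximum, and executing transactions in timestamp order yields the same serial order at every participant with no locks and no write-conflict aborts. The message flow and tie-breaking by transaction id below are simplified assumptions, not Granola's full protocol.

    ```python
    # A minimal sketch of timestamp voting for independent transactions:
    # propose, take the maximum, execute in timestamp order everywhere.

    class Participant:
        def __init__(self):
            self.clock = 0
            self.pending = []   # (timestamp, tx_id) pairs

        def propose(self, tx_id):
            self.clock += 1
            return self.clock

        def commit(self, tx_id, final_ts):
            # Adopt the agreed timestamp; keep the local clock ahead of it.
            self.clock = max(self.clock, final_ts)
            self.pending.append((final_ts, tx_id))

        def run_ready(self, apply_fn):
            # Execute in timestamp order (ties broken by tx id): the same
            # serial order emerges at every participant, with no locking.
            for ts, tx in sorted(self.pending):
                apply_fn(tx)
            self.pending.clear()

    def coordinate(tx_id, participants):
        # Vote phase: every participant proposes a timestamp.
        final_ts = max(p.propose(tx_id) for p in participants)
        # Commit phase: all adopt the maximum as the final timestamp.
        for p in participants:
            p.commit(tx_id, final_ts)
        return final_ts

    shards = [Participant(), Participant()]
    coordinate("tx-1", shards)
    coordinate("tx-2", shards)
    for s in shards:
        s.run_ready(lambda tx: None)   # apply each tx deterministically
    ```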