2,077 research outputs found

    Transactional Consistency and Automatic Management in an Application Data Cache

    Get PDF
    http://www.usenix.org/events/osdi10/tech/techAbstracts.html#PortsDistributed in-memory application data caches like memcached are a popular solution for scaling database-driven web sites. These systems are easy to add to existing deployments, and increase performance significantly by reducing load on both the database and application servers. Unfortunately, such caches do not integrate well with the database or the application. They cannot maintain transactional consistency across the entire system, violating the isolation properties of the underlying database. They leave the application responsible for locating data in the cache and keeping it up to date, a frequent source of application complexity and programming errors. Addressing both of these problems, we introduce a transactional cache, TxCache, with a simple programming model. TxCache ensures that any data seen within a transaction, whether it comes from the cache or the database, reflects a slightly stale but consistent snapshot of the database. TxCache makes it easy to add caching to an application by simply designating functions as cacheable; it automatically caches their results, and invalidates the cached data as the underlying database changes. Our experiments found that adding TxCache increased the throughput of a web application by up to 5.2×, only slightly less than a non-transactional cache, showing that consistency does not have to come at the price of performance

    Cache Serializability: Reducing Inconsistency in Edge Transactions

    Full text link
    Read-only caches are widely used in cloud infrastructures to reduce access latency and load on backend databases. Operators view coherent caches as impractical at genuinely large scale and many client-facing caches are updated in an asynchronous manner with best-effort pipelines. Existing solutions that support cache consistency are inapplicable to this scenario since they require a round trip to the database on every cache transaction. Existing incoherent cache technologies are oblivious to transactional data access, even if the backend database supports transactions. We propose T-Cache, a novel caching policy for read-only transactions in which inconsistency is tolerable (won't cause safety violations) but undesirable (has a cost). T-Cache improves cache consistency despite asynchronous and unreliable communication between the cache and the database. We define cache-serializability, a variant of serializability that is suitable for incoherent caches, and prove that with unbounded resources T-Cache implements this new specification. With limited resources, T-Cache allows the system manager to choose a trade-off between performance and consistency. Our evaluation shows that T-Cache detects many inconsistencies with only nominal overhead. We use synthetic workloads to demonstrate the efficacy of T-Cache when data accesses are clustered and its adaptive reaction to workload changes. With workloads based on the real-world topologies, T-Cache detects 43-70% of the inconsistencies and increases the rate of consistent transactions by 33-58%.Comment: Ittay Eyal, Ken Birman, Robbert van Renesse, "Cache Serializability: Reducing Inconsistency in Edge Transactions," Distributed Computing Systems (ICDCS), IEEE 35th International Conference on, June~29 2015--July~2 201

    Lock-free Concurrent Data Structures

    Full text link
    Concurrent data structures are the data sharing side of parallel programming. Data structures give the means to the program to store data, but also provide operations to the program to access and manipulate these data. These operations are implemented through algorithms that have to be efficient. In the sequential setting, data structures are crucially important for the performance of the respective computation. In the parallel programming setting, their importance becomes more crucial because of the increased use of data and resource sharing for utilizing parallelism. The first and main goal of this chapter is to provide a sufficient background and intuition to help the interested reader to navigate in the complex research area of lock-free data structures. The second goal is to offer the programmer familiarity to the subject that will allow her to use truly concurrent methods.Comment: To appear in "Programming Multi-core and Many-core Computing Systems", eds. S. Pllana and F. Xhafa, Wiley Series on Parallel and Distributed Computin

    HeTM: Transactional Memory for Heterogeneous Systems

    Full text link
    Modern heterogeneous computing architectures, which couple multi-core CPUs with discrete many-core GPUs (or other specialized hardware accelerators), enable unprecedented peak performance and energy efficiency levels. Unfortunately, though, developing applications that can take full advantage of the potential of heterogeneous systems is a notoriously hard task. This work takes a step towards reducing the complexity of programming heterogeneous systems by introducing the abstraction of Heterogeneous Transactional Memory (HeTM). HeTM provides programmers with the illusion of a single memory region, shared among the CPUs and the (discrete) GPU(s) of a heterogeneous system, with support for atomic transactions. Besides introducing the abstract semantics and programming model of HeTM, we present the design and evaluation of a concrete implementation of the proposed abstraction, which we named Speculative HeTM (SHeTM). SHeTM makes use of a novel design that leverages on speculative techniques and aims at hiding the inherently large communication latency between CPUs and discrete GPUs and at minimizing inter-device synchronization overhead. SHeTM is based on a modular and extensible design that allows for easily integrating alternative TM implementations on the CPU's and GPU's sides, which allows the flexibility to adopt, on either side, the TM implementation (e.g., in hardware or software) that best fits the applications' workload and the architectural characteristics of the processing unit. We demonstrate the efficiency of the SHeTM via an extensive quantitative study based both on synthetic benchmarks and on a porting of a popular object caching system.Comment: The current work was accepted in the 28th International Conference on Parallel Architectures and Compilation Techniques (PACT'19

    Providing Transaction Class-Based QoS in In-Memory Data Grids via Machine Learning

    Get PDF
    Elastic architectures and the ”pay-as-you-go” resource pricing model offered by many cloud infrastructure providers may seem the right choice for companies dealing with data centric applications characterized by high variable workload. In such a context, in-memory transactional data grids have demonstrated to be particularly suited for exploiting advantages provided by elastic computing platforms, mainly thanks to their ability to be dynamically (re-)sized and tuned. Anyway, when specific QoS requirements have to be met, this kind of architectures have revealed to be complex to be managed by humans. Particularly, their management is a very complex task without the stand of mechanisms supporting run-time automatic sizing/tuning of the data platform and the underlying (virtual) hardware resources provided by the cloud. In this paper, we present a neural network-based architecture where the system is constantly and automatically re-configured, particularly in terms of computing resources

    Software caching techniques and hardware optimizations for on-chip local memories

    Get PDF
    Despite the fact that the most viable L1 memories in processors are caches, on-chip local memories have been a great topic of consideration lately. Local memories are an interesting design option due to their many benefits: less area occupancy, reduced energy consumption and fast and constant access time. These benefits are especially interesting for the design of modern multicore processors since power and latency are important assets in computer architecture today. Also, local memories do not generate coherency traffic which is important for the scalability of the multicore systems. Unfortunately, local memories have not been well accepted in modern processors yet, mainly due to their poor programmability. Systems with on-chip local memories do not have hardware support for transparent data transfers between local and global memories, and thus ease of programming is one of the main impediments for the broad acceptance of those systems. This thesis addresses software and hardware optimizations regarding the programmability, and the usage of the on-chip local memories in the context of both single-core and multicore systems. Software optimizations are related to the software caching techniques. Software cache is a robust approach to provide the user with a transparent view of the memory architecture; but this software approach can suffer from poor performance. In this thesis, we start optimizing traditional software cache by proposing a hierarchical, hybrid software-cache architecture. Afterwards, we develop few optimizations in order to speedup our hybrid software cache as much as possible. As the result of the software optimizations we obtain that our hybrid software cache performs from 4 to 10 times faster than traditional software cache on a set of NAS parallel benchmarks. We do not stop with software caching. We cover some other aspects of the architectures with on-chip local memories, such as the quality of the generated code and its correspondence with the quality of the buffer management in local memories, in order to improve performance of these architectures. Therefore, we run our research till we reach the limit in software and start proposing optimizations on the hardware level. Two hardware proposals are presented in this thesis. One is about relaxing alignment constraints imposed in the architectures with on-chip local memories and the other proposal is about accelerating the management of local memories by providing hardware support for the majority of actions performed in our software cache.Malgrat les memòries cau encara son el component basic pel disseny del subsistema de memòria, les memòries locals han esdevingut una alternativa degut a les seves característiques pel que fa a l’ocupació d’àrea, el seu consum energètic i el seu rendiment amb un temps d’accés ràpid i constant. Aquestes característiques son d’especial interès quan les properes arquitectures multi-nucli estan limitades pel consum de potencia i la latència del subsistema de memòria.Les memòries locals pateixen de limitacions respecte la complexitat en la seva programació, fet que dificulta la seva introducció en arquitectures multi-nucli, tot i els avantatges esmentats anteriorment. Aquesta tesi presenta un seguit de solucions basades en programari i maquinari específicament dissenyat per resoldre aquestes limitacions.Les optimitzacions del programari estan basades amb tècniques d'emmagatzematge de memòria cau suportades per llibreries especifiques. La memòria cau per programari és un sòlid mètode per proporcionar a l'usuari una visió transparent de l'arquitectura, però aquest enfocament pot patir d'un rendiment deficient. En aquesta tesi, es proposa una estructura jeràrquica i híbrida. Posteriorment, desenvolupem optimitzacions per tal d'accelerar l’execució del programari que suporta el disseny de la memòria cau. Com a resultat de les optimitzacions realitzades, obtenim que el nostre disseny híbrid es comporta de 4 a 10 vegades més ràpid que una implementació tradicional de memòria cau sobre un conjunt d’aplicacions de referencia, com son els “NAS parallel benchmarks”.El treball de tesi inclou altres aspectes de les arquitectures amb memòries locals, com ara la qualitat del codi generat i la seva correspondència amb la qualitat de la gestió de memòria intermèdia en les memòries locals, per tal de millorar el rendiment d'aquestes arquitectures. La tesi desenvolupa propostes basades estrictament en el disseny de nou maquinari per tal de millorar el rendiment de les memòries locals quan ja no es possible realitzar mes optimitzacions en el programari. En particular, la tesi presenta dues propostes de maquinari: una relaxa les restriccions imposades per les memòries locals respecte l’alineament de dades, l’altra introdueix maquinari específic per accelerar les operacions mes usuals sobre les memòries locals

    A trigger-based middleware cache for ORMs

    Get PDF
    ACM/IFIP/USENIX 12th International Middleware Conference, Lisbon, Portugal, December 12-16, 2011. ProceedingsCaching is an important technique in scaling storage for high-traffic web applications. Usually, building caching mechanisms involves significant effort from the application developer to maintain and invalidate data in the cache. In this work we present CacheGenie, a caching middleware which makes it easy for web application developers to use caching mechanisms in their applications. CacheGenie provides high-level caching abstractions for common query patterns in web applications based on Object-RelationalMapping (ORM) frameworks. Using these abstractions, the developer does not have to worry about managing the cache (e.g., insertion and deletion) or maintaining consistency (e.g., invalidation or updates) when writing application code. We design and implement CacheGenie in the popular Django web application framework, with PostgreSQL as the database backend and memcached as the caching layer. To automatically invalidate or update cached data, we use triggers inside the database. CacheGenie requires no modifications to PostgreSQL or memcached. To evaluate our prototype, we port several Pinax web applications to use our caching abstractions. Our results show that it takes little effort for application developers to use CacheGenie, and that CacheGenie improves throughput by 2-2.5× for read-mostly workloads in Pinax.Quanta Computer (Firm

    Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

    Full text link
    Big data systems development is full of challenges in view of the variety of application areas and domains that this technology promises to serve. Typically, fundamental design decisions involved in big data systems design include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies for optimized solution to a specific real world problem, big data system are not an exception to any such rule. As far as the storage aspect of any big data system is concerned, the primary facet in this regard is a storage infrastructure and NoSQL seems to be the right technology that fulfills its requirements. However, every big data application has variable data characteristics and thus, the corresponding data fits into a different data model. This paper presents feature and use case analysis and comparison of the four main data models namely document oriented, key value, graph and wide column. Moreover, a feature analysis of 80 NoSQL solutions has been provided, elaborating on the criteria and points that a developer must consider while making a possible choice. Typically, big data storage needs to communicate with the execution engine and other processing and visualization technologies to create a comprehensive solution. This brings forth second facet of big data storage, big data file formats, into picture. The second half of the research paper compares the advantages, shortcomings and possible use cases of available big data file formats for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage and its challenges and future prospects have also been discussed
    corecore