20 research outputs found

    Efficient Management of Short-Lived Data

    Full text link
    Motivated by the increasing prominence of loosely coupled systems, such as mobile and sensor networks, which are characterised by intermittent connectivity and volatile data, we study the tagging of data with so-called expiration times. More specifically, when data are inserted into a database, they may be tagged with time values indicating when they expire, i.e., when they are regarded as stale or invalid and thus no longer considered part of the database. In a number of applications, expiration times are known and can be assigned at insertion time. We present data structures and algorithms for online management of data tagged with expiration times. The algorithms are based on fully functional, persistent treaps, which combine binary search trees with respect to a primary attribute and heaps with respect to a secondary attribute. The primary attribute implements primary keys, and the secondary attribute stores expiration times in a minimum heap, thus maintaining a priority queue of tuples to expire. A detailed and comprehensive experimental study demonstrates the good behaviour and scalability of the approach, as well as its efficiency with respect to a number of competitors. (Comment: switched to TimeCenter LaTeX style.)
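    To make the treap-based design concrete, here is a minimal ephemeral sketch in Python (the paper uses fully functional, persistent treaps, which this sketch does not attempt to reproduce): the structure is a binary search tree on the primary key and a min-heap on expiration time, so the tuple that expires next always sits at the root.

    ```python
    import time

    class Node:
        """Treap node: a BST on `key` and a min-heap on `exp` (expiration time)."""
        def __init__(self, key, value, exp):
            self.key, self.value, self.exp = key, value, exp
            self.left = self.right = None

    def rotate_right(t):
        l = t.left
        t.left, l.right = l.right, t
        return l

    def rotate_left(t):
        r = t.right
        t.right, r.left = r.left, t
        return r

    def insert(t, key, value, exp):
        """Insert a tuple tagged with expiration time `exp` (duplicates go right)."""
        if t is None:
            return Node(key, value, exp)
        if key < t.key:
            t.left = insert(t.left, key, value, exp)
            if t.left.exp < t.exp:        # restore the min-heap property
                t = rotate_right(t)
        else:
            t.right = insert(t.right, key, value, exp)
            if t.right.exp < t.exp:
                t = rotate_left(t)
        return t

    def delete_root(t):
        """Rotate the root down towards a leaf, always lifting the child
        with the smaller expiration time, then drop it."""
        if t.left is None and t.right is None:
            return None
        if t.right is None or (t.left is not None and t.left.exp < t.right.exp):
            t = rotate_right(t)
            t.right = delete_root(t.right)
        else:
            t = rotate_left(t)
            t.left = delete_root(t.left)
        return t

    def expire(t, now=None):
        """Pop expired tuples; the minimum expiration time is always at the root."""
        now = time.time() if now is None else now
        while t is not None and t.exp <= now:
            t = delete_root(t)
        return t

    # Example: insert three tuples, then expire everything stale at time 102.
    root = None
    for k, v, e in [("a", 1, 105.0), ("b", 2, 101.0), ("c", 3, 110.0)]:
        root = insert(root, k, v, e)
    root = expire(root, now=102.0)    # removes "b"; "a" and "c" remain
    ```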

    Towards an Efficient Evaluation of General Queries

    Get PDF
    Database applications often need to evaluate queries containing quantifiers or disjunctions, e.g., for handling general integrity constraints. Existing efficient methods for processing quantifiers depart from the relational model, as they rely on non-algebraic procedures. Looking at quantified query evaluation from a new angle, we propose an approach to processing quantifiers that uses relational algebra operators only. Our approach works in two phases. The first phase normalizes the queries, producing a canonical form. This form permits an improved translation into relational algebra, performed during the second phase. The improved translation relies on a new operator - the complement-join - that generalizes the set difference, on algebraic expressions of universal quantifiers that avoid the expensive division operator in many cases, and on special processing of disjunctions by means of constrained outer-joins. Our method achieves efficiency at least comparable with that of previous proposals, and better in most cases. Furthermore, it is considerably simpler to implement, as it relies entirely on relational data structures and operators.
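    As an illustration of one ingredient of the approach - expressing a universal quantifier without the division operator - the following Python sketch evaluates the classic "suppliers who supply all parts" query by double negation over set difference. The complement-join operator itself is not reproduced here, and the relation and attribute names are invented for the example.

    ```python
    # Relations as Python sets of tuples; names are invented for the example.
    supplies = {("s1", "p1"), ("s1", "p2"), ("s2", "p1")}   # Supplies(sno, pno)
    parts = {"p1", "p2"}                                     # Parts(pno)

    suppliers = {s for (s, _) in supplies}                   # project sno

    # Double negation instead of division: a supplier qualifies unless there
    # exists some part it does not supply (computed with set difference only).
    missing = {(s, p) for s in suppliers for p in parts} - supplies
    supplies_all_parts = suppliers - {s for (s, _) in missing}

    assert supplies_all_parts == {"s1"}
    ```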

    LogBase: A Scalable Log-structured Database System in the Cloud

    Full text link
    Numerous applications, such as financial transactions (e.g., stock trading), are write-heavy in nature. The shift from reads to writes in web applications has also been accelerating in recent years. Write-ahead logging is a common approach for providing recovery capability while improving performance in most storage systems. However, the separation of log and application data incurs write overheads in write-heavy environments and hence adversely affects the write throughput and recovery time of the system. In this paper, we introduce LogBase - a scalable log-structured database system that adopts log-only storage to remove the write bottleneck and support fast system recovery. LogBase is designed to be dynamically deployed on commodity clusters to take advantage of the elastic scaling properties of cloud environments. LogBase provides in-memory multiversion indexes to support efficient access to data maintained in the log. LogBase also supports transactions that bundle read and write operations spanning multiple records. We implemented the proposed system and compared it with HBase and a disk-based log-structured record-oriented system modeled after RAMCloud. The experimental results show that LogBase is able to provide sustained write throughput, efficient data access out of the cache, and effective system recovery. (Comment: VLDB2012.)
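    The following Python sketch illustrates the general log-only idea under simplified assumptions (single node, no transactions, pickle-framed records - none of which reflects LogBase's actual implementation): every write is an append to the log, and an in-memory multiversion index maps each key to the log offsets of its versions, so no separate write-ahead log is needed.

    ```python
    import os, pickle, time

    class LogStore:
        """Sketch: the append-only log is the only storage; an in-memory
        multiversion index maps key -> [(timestamp, log offset), ...]."""

        def __init__(self, path="data.log"):
            self.log = open(path, "ab+")
            self.index = {}

        def put(self, key, value):
            ts = time.time()
            offset = self.log.seek(0, os.SEEK_END)   # record starts here
            record = pickle.dumps((key, ts, value))
            self.log.write(len(record).to_bytes(4, "big") + record)
            self.log.flush()                         # the log write is the commit
            self.index.setdefault(key, []).append((ts, offset))

        def get(self, key, ts=None):
            """Latest version, or the latest version with timestamp <= ts."""
            for vts, offset in reversed(self.index.get(key, [])):
                if ts is None or vts <= ts:
                    self.log.seek(offset)
                    n = int.from_bytes(self.log.read(4), "big")
                    return pickle.loads(self.log.read(n))[2]
            return None

    # Recovery amounts to a single scan of the log that replays
    # (key, ts, offset) entries back into `index`.
    ```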

    Advance of the Access Methods

    Get PDF
    The goal of this paper is to outline the advances in access methods over the last ten years, as well as to review all the methods available in the accessible bibliography.

    Yellow Tree: A Distributed Main-memory Spatial Index Structure for Moving Objects

    Get PDF
    Mobile devices equipped with wireless technologies to communicate and positioning systems to locate objects of interest are commonplace today, providing the impetus to develop location-aware applications. At the heart of location-aware applications are moving objects, i.e., objects that continuously change location over time, such as cars in transportation networks, pedestrians, or postal packages. Location-aware applications tend to support the tracking of very large numbers of such moving objects, as well as many users interested in finding out the locations of other moving objects. Such applications rely on support from database management systems to model, store, and query moving object data. The management of moving object data exposes the limitations of traditional (spatial) database management systems as well as their index structures designed to keep track of objects' locations. Spatial index structures designed for geographic objects in the past primarily assume that data are largely static (e.g., land parcels, road networks, or airport locations), thus requiring a limited amount of index structure updates and reorganization over a period of time. When handling moving objects, however, there is a need for continuous reorganization of spatial index structures to remain up to date with constantly and rapidly changing object locations. This research addresses some of the key issues surrounding the efficient database management of moving objects whose location update rate to the database system varies from 1 to 30 minutes. Furthermore, we address the design of a highly scalable and efficient spatial index structure to support location tracking and querying of large numbers of moving objects. We explore the architectural and data structure level changes required to handle large numbers of moving objects, focusing specifically on the index structures needed to process spatial range queries and object-based queries on constantly changing moving object data. We argue for main-memory spatial index structures that dynamically adapt to continuously changing moving object data while concurrently answering spatial range queries efficiently. A proof-of-concept implementation called the yellow tree, a distributed main-memory index structure, and a simulated environment to generate moving objects are demonstrated. Using experiments conducted on simulated moving object data, we conclude that a distributed main-memory based spatial index structure is required to handle dynamic location updates and efficiently answer spatial range queries on moving objects. Future work on enhancing the query processing performance of the yellow tree is also discussed.
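    As a rough illustration of a main-memory index that absorbs frequent location updates while answering range queries - a uniform grid rather than the yellow tree itself, whose internal structure the abstract does not detail - consider the following single-node Python sketch.

    ```python
    from collections import defaultdict

    class GridIndex:
        """Uniform-grid index over object positions; an update touches at
        most two cells, which keeps frequent location updates cheap."""

        def __init__(self, cell=1.0):
            self.cell = cell
            self.cells = defaultdict(set)   # (cx, cy) -> ids of objects in cell
            self.where = {}                 # id -> (x, y)

        def _key(self, x, y):
            return (int(x // self.cell), int(y // self.cell))

        def update(self, oid, x, y):
            """Insert `oid` or move it, relocating between cells if needed."""
            if oid in self.where:
                self.cells[self._key(*self.where[oid])].discard(oid)
            self.where[oid] = (x, y)
            self.cells[self._key(x, y)].add(oid)

        def range_query(self, xlo, ylo, xhi, yhi):
            """Ids of all objects inside the rectangle [xlo, xhi] x [ylo, yhi]."""
            (cx0, cy0), (cx1, cy1) = self._key(xlo, ylo), self._key(xhi, yhi)
            hits = []
            for cx in range(cx0, cx1 + 1):
                for cy in range(cy0, cy1 + 1):
                    for oid in self.cells.get((cx, cy), ()):
                        x, y = self.where[oid]
                        if xlo <= x <= xhi and ylo <= y <= yhi:
                            hits.append(oid)
            return hits
    ```

    A distributed deployment would partition the grid cells across nodes; the sketch above only shows the per-node data structure.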

    Long-term Information Preservation and Access

    Get PDF
    An unprecedented amount of information encompassing almost every facet of human activities across the world is generated daily in the form of zeros and ones, and that is often the only form in which such information is recorded. A good fraction of this information needs to be preserved for periods of time ranging from a few years to centuries. Consequently, the problem of preserving digital information over the long term has attracted the attention of many organizations, including libraries, government agencies, scientific communities, and individual researchers. In this dissertation, we address three issues that are critical to ensuring long-term information preservation and access. The first concerns the core requirement of how to guarantee the integrity of preserved contents. Digital information is in general very fragile because of the many ways errors can be introduced, such as errors due to hardware and media degradation, hardware and software malfunction, operational errors, security breaches, and malicious alterations. To address this problem, we develop a new approach based on efficient and rigorous cryptographic techniques, which guarantees the integrity of preserved contents with extremely high probability even in the presence of malicious attacks. Our prototype implementation of this approach has been deployed and actively used over the past years in several organizations, including the San Diego Supercomputer Center, the Chronopolis Consortium, North Carolina State University, and, more recently, the Government Printing Office. Second, we consider another crucial component in any preservation system - searching and locating information. The ever-growing size of a long-term archive and the temporality of each preserved item introduce a new set of challenges to providing fast retrieval of content based on a temporal query. The widely used cataloguing scheme has serious scalability problems. The standard full-text search approach has serious limitations, since it does not deal appropriately with the temporal dimension and, in particular, is incapable of performing relevancy scoring according to the temporal context. To address these problems, we introduce two types of indexing schemes - a location indexing scheme and a full-text search indexing scheme. Our location indexing scheme provides optimal operations for inserting and locating a specific version of a preserved item given an item ID and a time point, and our full-text search indexing scheme efficiently handles the scalability problem while supporting relevancy scoring within the temporal context. Finally, we address the problem of organizing inter-related data so that future accesses and data exploration can be performed quickly. In particular, we consider web contents, where we combine a link-analysis scheme with a graph partitioning scheme to place closely related contents in the same standard web archive container. We conduct experiments that simulate random browsing of preserved contents and show that our data organization scheme greatly reduces the number of containers accessed during a random browsing session. Our schemes have been tested against real-world data of significant scale and validated through extensive empirical evaluations.
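    As a sketch of the first issue only - integrity checking of preserved contents - the following Python fragment registers a SHA-256 digest per item and re-audits later. The dissertation's actual cryptographic scheme is more elaborate (among other things, the digest registry itself must be protected against tampering), and the file and registry names here are invented.

    ```python
    import hashlib, json

    def digest(path):
        """SHA-256 of a file, streamed in 1 MiB chunks."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def register(items, registry="registry.json"):
        """Record a digest for every preserved item (paths are hypothetical)."""
        with open(registry, "w") as f:
            json.dump({p: digest(p) for p in items}, f)

    def audit(registry="registry.json"):
        """Return the items whose current contents no longer match."""
        with open(registry) as f:
            expected = json.load(f)
        return [p for p, h in expected.items() if digest(p) != h]
    ```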