
    A database system with amnesia

    Big Data comes with huge challenges. Its volume and velocity make handling, curating, and analytical processing a costly affair. Even simply “looking at” the data within an a priori defined budget and with a guaranteed interactive response time might be impossible. Commonly applied scale-out approaches will soon hit technological and monetary walls, if they have not done so already. Likewise, blindly rejecting data when the channels are full, or reducing the data resolution at the source, might lead to the loss of valuable observations. An army of well-educated database administrators or full software-stack architects might deal with these challenges, albeit at substantial cost. This calls for a mostly knobless DBMS with a fundamental change in database management. Data rotting has been proposed as a direction towards a solution [10, 11]. For the sake of storage management and responsiveness, it lets the DBMS semi-autonomously rot away data. Rotting is based on the system’s own unwillingness to keep old data as easily accessible as fresh data. This paper sheds more light on the opportunities and potential impacts of this radical departure in data management. Specifically, we study the case where a DBMS selectively forgets tuples (by marking them inactive) under various amnesia scenarios and with different implementation strategies. Our ultimate goal is to use the findings of this study to morph an existing data management engine to serve demanding big-data scientific applications with well-chosen built-in data amnesia.
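    The abstract describes rotting only at a high level; the sketch below is a hypothetical illustration, not the paper's design. It assumes an LRU-style policy in which, once the number of active tuples exceeds a storage budget, the least recently accessed tuples are marked inactive rather than deleted, so they stop being "easily accessible" without being lost.

```python
import heapq
import time
from dataclasses import dataclass, field


@dataclass
class Record:
    key: int
    payload: bytes
    last_access: float = field(default_factory=time.time)
    active: bool = True  # inactive == "rotted away", not physically deleted


class AmnesicStore:
    """Toy key-value store that forgets (deactivates) its coldest records.

    Hypothetical sketch of one possible amnesia policy; the paper studies
    several amnesia scenarios and implementation strategies, not this one.
    """

    def __init__(self, budget):
        self.budget = budget       # maximum number of *active* records
        self.records = {}

    def put(self, key, payload):
        self.records[key] = Record(key, payload)
        self._rot_if_needed()

    def get(self, key):
        rec = self.records.get(key)
        if rec is None or not rec.active:
            return None            # rotted data is no longer easily accessible
        rec.last_access = time.time()
        return rec.payload

    def _rot_if_needed(self):
        active = [r for r in self.records.values() if r.active]
        excess = len(active) - self.budget
        if excess > 0:
            # mark the least recently accessed records inactive first
            for rec in heapq.nsmallest(excess, active, key=lambda r: r.last_access):
                rec.active = False
```

    Marking rather than deleting keeps the door open for later recall or archival strategies, which is one of the implementation choices such a study would have to weigh.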

    A Learning Based Framework for Improving Querying on Web Interfaces of Curated Knowledge Bases

    Knowledge Bases (KBs) are widely used as one of the fundamental components in Semantic Web applications, as they provide facts and relationships that can be automatically understood by machines. Curated knowledge bases usually use the Resource Description Framework (RDF) as the data representation model. To query the RDF-represented knowledge in curated KBs, Web interfaces are built via SPARQL Endpoints. Currently, querying SPARQL Endpoints suffers from problems like network instability and latency, which affect query efficiency. To address these issues, we propose a client-side caching framework, the SPARQL Endpoint Caching Framework (SECF), aimed at accelerating the overall querying speed over SPARQL Endpoints. SECF identifies queries that are likely to be issued by leveraging querying patterns learned from clients’ historical queries, and prefetches/caches these queries. In particular, we develop a distance function based on graph edit distance to measure the similarity of SPARQL queries. We propose a feature modelling method to transform SPARQL queries into vector representations that are fed into machine-learning algorithms. A time-aware smoothing-based method, Modified Simple Exponential Smoothing (MSES), is developed for cache replacement. Extensive experiments performed on real-world queries showcase the effectiveness of our approach, which outperforms the state-of-the-art work in terms of overall querying speed.
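    MSES itself is not specified in this abstract; the fragment below is only a sketch of the general idea of smoothing-based cache replacement, using plain simple exponential smoothing and made-up names (SmoothingCache, alpha), so the "modified" and time-aware aspects of the actual method are not reproduced.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    result: object
    score: float = 0.0  # smoothed estimate of how often this query keeps hitting


class SmoothingCache:
    """Toy cache whose eviction decision is driven by exponential smoothing.

    Sketch only: S_t = alpha * x_t + (1 - alpha) * S_{t-1}, where x_t is 1 if
    the cached query was hit in the current interval and 0 otherwise; the
    entry with the lowest smoothed score is evicted when the cache is full.
    """

    def __init__(self, capacity, alpha=0.3):
        self.capacity = capacity
        self.alpha = alpha
        self.entries = {}  # query text -> CacheEntry

    def observe_interval(self, hit_queries):
        for query, entry in self.entries.items():
            x = 1.0 if query in hit_queries else 0.0
            entry.score = self.alpha * x + (1 - self.alpha) * entry.score

    def put(self, query, result):
        if query not in self.entries and len(self.entries) >= self.capacity:
            victim = min(self.entries, key=lambda q: self.entries[q].score)
            del self.entries[victim]
        self.entries[query] = CacheEntry(result)

    def get(self, query):
        entry = self.entries.get(query)
        return entry.result if entry is not None else None
```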

    Growth of relational model: Interdependence and complementary to big data

    A database management system is a long-standing application of computer science that provides a platform for the creation, movement, and use of voluminous data. The area has witnessed a series of developments and technological advancements, from its conventional structured database to the recent buzzword, big data. This paper aims to provide a complete model of the relational database, which is still widely used because of its well-known ACID properties, namely atomicity, consistency, isolation, and durability. Specifically, the objective of this paper is to highlight the adoption of relational-model approaches by big data techniques. Towards explaining the reasons for this incorporation, this paper qualitatively studies the advancements made over time to the relational data model. First, the variations in the data storage layout are illustrated based on the needs of the application. Second, quick data retrieval techniques like indexing, query processing, and concurrency control methods are reviewed. The paper provides vital insights to appraise the efficiency of the structured database in the unstructured environment, particularly when both consistency and scalability become an issue in the working of a hybrid transactional and analytical database management system.

    Fast transactions for multicore in-memory databases

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (p. 55-57). Though modern multicore machines have sufficient RAM and processors to manage very large in-memory databases, it is not clear what the best strategy for dividing work among cores is. Should each core handle a data partition, avoiding the overhead of concurrency control for most transactions (at the cost of increasing it for cross-partition transactions)? Or should cores access a shared data structure instead? We investigate this question in the context of a fast in-memory database. We describe a new transactionally consistent database storage engine called MAFLINGO. Its cache-centered data structure design provides excellent base key-value store performance, to which we add a new, cache-friendly serializable protocol and support for running large, read-only transactions on a recent snapshot. On a key-value workload, the resulting system introduces negligible performance overhead as compared to a version of our system with transactional support stripped out, while achieving linear scalability versus the number of cores. It also exhibits linear scalability on TPC-C, a popular transactional benchmark. In addition, we show that a partitioning-based approach ceases to be beneficial if the database cannot be partitioned such that only a small fraction of transactions access multiple partitions, making our shared-everything approach more relevant. Finally, based on a survey of results from the literature, we argue that our implementation substantially outperforms previous main-memory databases on TPC-C benchmarks. By Stephen Lyle Tu. S.M.
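    The partition-versus-shared tradeoff in this abstract can be made concrete with a back-of-the-envelope cost model. The numbers below are invented purely for illustration (they are not measurements from the thesis): a single-partition transaction is cheap, a cross-partition transaction pays a coordination penalty, and the shared-everything design pays a flat concurrency-control overhead on every transaction.

```python
def partitioned_cost(p_cross, local_cost=1.0, cross_cost=8.0):
    """Expected per-transaction cost under partitioning.

    Illustrative figures only: a transaction confined to one partition skips
    concurrency control (cost 1.0), while a cross-partition transaction pays
    for coordination across cores (cost 8.0, chosen arbitrarily).
    """
    return (1 - p_cross) * local_cost + p_cross * cross_cost


# Hypothetical flat overhead of a shared-everything engine with concurrency
# control always on (again, an assumed number, not a measured one).
SHARED_EVERYTHING_COST = 1.6

for p in (0.0, 0.05, 0.10, 0.25, 0.50):
    cost = partitioned_cost(p)
    winner = "partitioned" if cost < SHARED_EVERYTHING_COST else "shared-everything"
    print(f"{p:5.0%} cross-partition transactions -> expected cost {cost:.2f} ({winner} wins)")
```

    Under these made-up numbers the crossover lands below a 10% cross-partition rate; the exact break-even point depends entirely on the assumed costs, but the qualitative shape matches the argument the abstract makes.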

    Enabling efficient OS paging for main-memory OLTP databases
