2,562 research outputs found

    Querying Visible and Invisible Information

    We provide a wide-ranging study of the scenario where a subset of the relations in the schema are visible - that is, their complete contents are known - while the remaining relations are invisible. We also have integrity constraints (invariants given by logical sentences) which may relate the visible relations to the invisible ones. We want to determine which information about a query (a positive existential sentence) can be inferred from the visible instance and the constraints. We consider both positive and negative query information, that is, whether the query or its negation holds. We consider the instance-level version of the problem, where both the query and the visible instance are given, as well as the schema-level version, where we want to know whether the truth or falsity of the query can be inferred in some instance of the schema.
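
    A minimal worked example may help fix ideas; the schema, constraint and query below are hypothetical, not taken from the paper. Take one visible relation V, one invisible relation H, the inclusion constraint "every V-tuple is also an H-tuple", and the positive existential query Q = "there exists x with H(x)". Chasing the constraint over the visible instance yields the tuples H must contain in every legal instance, which settles whether Q is certainly true:

        # Toy sketch (not the paper's algorithm): certain answers for
        # Q = "exists x. H(x)" under the inclusion dependency V(x) -> H(x),
        # where V is visible (completely known) and H is invisible.

        visible_V = {("a",), ("b",)}

        # Chase step: every visible V-tuple forces the same H-tuple to exist
        # in any instance satisfying the constraint.
        forced_H = set(visible_V)

        def q_certainly_true():
            # Q holds in every legal instance iff it holds in the chased
            # (minimal) instance, since H can only grow from forced_H.
            return len(forced_H) > 0

        print(q_certainly_true())  # True: V is nonempty, so H cannot be empty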

    A Critical Comparison of NOSQL Databases in the Context of Acid and Base

    This starred paper will discuss two major types of databases – Relational and NOSQL – and analyze the different models used by these databases. In particular, it will focus on whether the ACID or the BASE model is more appropriate for NOSQL databases. NOSQL databases use the BASE model because they do not usually comply with the ACID model used by relational databases. However, some NOSQL databases adopt additional approaches and techniques to make the database comply with the ACID model. In this light, this paper will explore some of these approaches and explain why NOSQL databases cannot simply follow the ACID model. What are the reasons behind the extensive use of the BASE model? What are some of the advantages and disadvantages of not using ACID? Particular attention will be paid to analyzing whether one model is superior to the other. These questions will be answered by reviewing existing research conducted on some of the NOSQL databases, such as Cassandra, DynamoDB, MongoDB and Neo4j.
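
    The ACID/BASE trade-off discussed above can be made concrete with a small, self-contained sketch; this is plain Python with no real database client, and the function names are illustrative only. The ACID-style transfer applies both writes atomically behind a lock, while the BASE-style transfer accepts a write on one replica immediately ("basically available") and leaves the second replica stale ("soft state") until an anti-entropy pass makes them converge ("eventually consistent"):

        import copy
        import threading

        replica_a = {"alice": 100, "bob": 0}
        replica_b = copy.deepcopy(replica_a)
        lock = threading.Lock()

        def acid_transfer(store, src, dst, amount):
            with lock:                    # atomicity + isolation
                if store[src] < amount:   # consistency check before commit
                    raise ValueError("insufficient funds")
                store[src] -= amount
                store[dst] += amount      # both writes take effect together

        def base_transfer(src, dst, amount):
            replica_a[src] -= amount      # accepted immediately: available
            replica_a[dst] += amount      # replica_b is now stale

        def anti_entropy():
            replica_b.update(replica_a)   # replicas converge eventually

        base_transfer("alice", "bob", 25)
        print(replica_a != replica_b)     # True: replicas briefly diverge
        anti_entropy()
        print(replica_a == replica_b)     # True: state has converged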

    The Next Generation of EMPRESS: A Metadata Management System For Accelerated Scientific Discovery at Exascale

    Scientific data sets have grown rapidly in recent years, outpacing the growth in memory and network bandwidths. This I/O bottleneck has made it increasingly difficult for scientists to read and search output datasets in an attempt to find features of interest. In this paper, we present the next generation of EMPRESS, a scalable metadata management service that offers the following solution: users can tag features of interest and search these tags without having to read in the associated datasets. EMPRESS provides, in essence, a digital scientific notebook where scientists can write down observations and highlight interesting results, along with an efficient way to search these annotations. EMPRESS also provides storage-system-independent physical metadata, giving users a portable way to read both metadata and the associated data. EMPRESS offers scalability through two deployment modes: local, which runs on the compute nodes, and dedicated, which uses a set of dedicated, shared-nothing servers. EMPRESS also provides robust fault tolerance and transaction management, which is crucial to supporting workflows.
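
    The core idea, that tag searches never have to touch the datasets themselves, can be sketched with a tiny separate metadata index; this is an illustration in plain Python with SQLite, and names such as tag and find are hypothetical rather than EMPRESS's actual API:

        import sqlite3

        db = sqlite3.connect(":memory:")
        db.execute("""CREATE TABLE tags (
                          dataset  TEXT,     -- path to the simulation output
                          variable TEXT,     -- e.g. 'temperature'
                          timestep INTEGER,
                          label    TEXT,     -- scientist's annotation
                          note     TEXT)""")

        def tag(dataset, variable, timestep, label, note=""):
            db.execute("INSERT INTO tags VALUES (?, ?, ?, ?, ?)",
                       (dataset, variable, timestep, label, note))

        def find(label):
            # Returns where to look, without reading any dataset bytes.
            return db.execute(
                "SELECT dataset, variable, timestep FROM tags WHERE label = ?",
                (label,)).fetchall()

        tag("/scratch/run42/out.bp", "temperature", 300, "hotspot",
            "possible ignition front")
        print(find("hotspot"))  # [('/scratch/run42/out.bp', 'temperature', 300)]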

    Public Commons for Geospatial Data: A Conceptual Model

    A wide variety of spatial data collection efforts are ongoing throughout local, state and federal agencies, private firms and non-profit organizations. Each effort is established for a different purpose, but organizations and individuals often collect and maintain the same or similar information. The United States federal government has undertaken many initiatives, such as the National Spatial Data Infrastructure, the National Map and Geospatial One-Stop, to reduce duplicative spatial data collection and promote the coordinated use, sharing, and dissemination of spatial data nationwide. A key premise in most of these initiatives is that no national government will be able to gather and maintain more than a small percentage of the geographic data that users want. Thus, national initiatives typically depend on the cooperation of those already gathering spatial data, and those using GIS to meet specific needs, to help construct and maintain these spatial data infrastructures and geo-libraries for their nations (Onsrud 2001). Some of the impediments to widespread spatial data sharing are well known from directly asking GIS data producers why they are not currently involved in creating datasets in common or compatible formats, documenting their datasets in a standardized metadata format, or making their datasets more readily available to others through Data Clearinghouses or geo-libraries. The research described in this thesis addresses the impediments to wide-scale spatial data sharing faced by GIS data producers and explores a new conceptual data-sharing approach, the Public Commons for Geospatial Data, which supports user-friendly metadata creation, open access licenses, archival services and documentation of the parent lineage of the contributors and value-adders of digital spatial data sets.

    S-Store: Streaming Meets Transaction Processing

    Stream processing addresses the needs of real-time applications. Transaction processing addresses the coordination and safety of short atomic computations. Heretofore, these two modes of operation existed in separate, stove-piped systems. In this work, we attempt to fuse the two computational paradigms in a single system called S-Store. In this way, S-Store can simultaneously accommodate OLTP and streaming applications. We present a simple transaction model for streams that integrates seamlessly with a traditional OLTP system. We chose to build S-Store as an extension of H-Store, an open-source, in-memory, distributed OLTP database system. By implementing S-Store in this way, we can make use of the transaction processing facilities that H-Store already supports, and we can concentrate on the additional implementation features that are needed to support streaming. Similar implementations could be done using other main-memory OLTP platforms. We show that we can achieve higher throughput for streaming workloads in S-Store than with an equivalent deployment in H-Store alone. We also show how this can be achieved within H-Store with the addition of a modest amount of new functionality. Furthermore, we compare S-Store to two state-of-the-art streaming systems, Spark Streaming and Storm, and show that S-Store matches and sometimes exceeds their performance while providing stronger transactional guarantees.
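
    The transaction model for streams can be illustrated with a short sketch; this is a plain-Python illustration of the idea, not S-Store's engine. Each window of the stream executes as one atomic, isolated unit against the same state that OLTP requests would touch, and replaying an already-committed window is a no-op, so a crash can never expose a half-applied window:

        import threading

        state = {"count": 0, "total": 0.0}
        state_lock = threading.Lock()
        processed = set()               # committed window ids, for exactly-once

        def stream_txn(window_id, readings):
            with state_lock:                    # isolation from OLTP txns
                if window_id in processed:      # idempotent replay on recovery
                    return
                snapshot = dict(state)          # undo image for atomicity
                try:
                    for r in readings:
                        state["count"] += 1
                        state["total"] += r
                    processed.add(window_id)    # commit point
                except Exception:
                    state.clear()
                    state.update(snapshot)      # rollback on failure
                    raise

        stream_txn(1, [3.0, 4.5])
        stream_txn(1, [3.0, 4.5])       # duplicate delivery is ignored
        print(state)                    # {'count': 2, 'total': 7.5}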

    Confidential remote computing

    Since their market launch in late 2015, trusted hardware enclaves have revolutionised the computing world with data-in-use protections. Their security features of confidentiality, integrity and attestation attract many application developers, who move valuable assets such as cryptographic keys, password managers, private data, secret algorithms and mission-critical operations into them. The potential security issues have not yet been well explored, and the rush to integrate these widely available hardware technologies has created new problems. Today, system and application designers utilise enclave-based protections for critical assets; however, gaps in hardware-software co-design cause these applications to fail to benefit fully from the strong hardware features. This research presents hands-on experiences, techniques and models for the correct utilisation of hardware enclaves in real-world systems. We begin by designing a generic template for scalable many-party applications that process private data with mutually agreed public code. Many-party applications range from smart-grid systems and electronic voting infrastructures to blockchain smart contracts and internet-of-things deployments. Next, our research extensively examines private algorithms executing inside trusted hardware enclaves. We present practical use cases for protecting intellectual property, valuable algorithms and business or game logic, in addition to private data. Our mechanisms allow querying private algorithms on rental services, querying private data through privacy filters such as differential-privacy budgets, and offering integrity-protected computing power as a service. These experiences lead us to consolidate the disparate research into a unified Confidential Remote Computing (CRC) model. CRC consists of three main areas: the trusted hardware, software development and attestation domains. It resolves the ambiguity of trust in relevant fields and provides a systematic view of the field from past to future. Lastly, we examine questions and misconceptions about malicious software profiting from the security features offered by the hardware. The more popular idea of confidential computing focuses on servers managed by major technology vendors and cloud infrastructures; CRC, in contrast, focuses on practices in a more decentralised setting for end-users, system designers and developers.
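
    One of the mechanisms mentioned above, querying private data through a differential-privacy budget filter, can be sketched in a few lines; the class below is a standard Laplace-mechanism illustration under assumed parameters, not code from the thesis, and the enclave attestation that CRC would wrap around it is not modelled here:

        import random

        class BudgetedQuerier:
            def __init__(self, data, total_epsilon):
                self._data = data                # private: never returned raw
                self._remaining = total_epsilon  # global privacy budget

            def _laplace(self, scale):
                # The difference of two exponentials is Laplace-distributed.
                return (random.expovariate(1 / scale)
                        - random.expovariate(1 / scale))

            def noisy_sum(self, epsilon, sensitivity=1.0):
                if epsilon > self._remaining:
                    raise PermissionError("privacy budget exhausted")
                self._remaining -= epsilon       # charge before answering
                return sum(self._data) + self._laplace(sensitivity / epsilon)

        q = BudgetedQuerier([3, 1, 4, 1, 5], total_epsilon=1.0)
        print(q.noisy_sum(epsilon=0.5))          # noisy answer, budget now 0.5
        print(q.noisy_sum(epsilon=0.5))          # second answer, budget now 0.0
        # q.noisy_sum(epsilon=0.1)               # would raise: budget exhausted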

    Antidote SQL: SQL for Weakly Consistent Databases

    Distributed storage systems, such as NoSQL databases, employ weakly consistent semantics to provide good scalability and availability for web services and applications. NoSQL offers scalability for applications, but it lacks functionality that programmers could use to reason about the correctness of their applications. On the other hand, SQL databases provide strongly consistent semantics that hurt availability. The goal of this work is to provide efficient support for SQL features on top of AntidoteDB, a weakly consistent NoSQL data store. To that end, we extend AQL, a SQL interface for the AntidoteDB database, with newly designed solutions for supporting common SQL features. We focus on improving support for the SQL invariant known as referential integrity, by providing solutions with relaxed and strict consistency models, the latter at the cost of requiring coordination among replicas; on allowing tables to be partitioned; and on developing an indexing system that manages primary and secondary indexes on the database and improves query processing, with a syntax inspired by SQL. We evaluate our solution using the Basho Bench benchmarking tool and compare the performance of both systems, AQL and AntidoteDB, with regard to the cost introduced by the referential-integrity mechanism, the benefits of using secondary indexes in query processing, and the performance of partitioning the database. Our results show that our referential-integrity solution performs optimally when delete operations are not issued to the database, although it is noticeably slower with cascading deletes, and that implementing secondary indexes on the NoSQL data store improves query performance. We also show that partitioning does not harm the performance of the system.
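
    The two referential-integrity modes contrasted above can be sketched briefly; this plain-Python illustration is hypothetical, not AQL's implementation. The strict mode coordinates at delete time by checking referencing children synchronously, while the relaxed mode merely tombstones the parent and repairs orphaned children lazily, which mirrors cascading deletes being the expensive case in the evaluation:

        parents = {1: {"name": "dept-a", "deleted": False}}
        children = {10: {"parent": 1}}

        def delete_parent_strict(pid):
            # Coordination point: refuse while referencing children exist.
            if any(c["parent"] == pid for c in children.values()):
                raise ValueError("restrict: child rows still reference parent")
            del parents[pid]

        def delete_parent_relaxed(pid):
            parents[pid]["deleted"] = True   # cheap, coordination-free tombstone

        def lazy_cascade():
            # Background repair: drop children of tombstoned parents.
            for cid, c in list(children.items()):
                if parents.get(c["parent"], {}).get("deleted"):
                    del children[cid]

        delete_parent_relaxed(1)
        lazy_cascade()
        print(children)                      # {}: cascade applied lazily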

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication, and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems, not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems in order to better understand their goals and their methodology, which helps evaluate their applicability to similar problems. The taxonomy also provides a "gap analysis" of the area, through which researchers can identify new issues for investigation. We hope that the proposed taxonomy and mapping also provide an easy way for new practitioners to understand this complex area of research.

    Routing information and presentation
