2,562 research outputs found
Querying Visible and Invisible Information
We provide a wide-ranging study of the scenario where a subset of the relations in the schema are visible - that is, their complete contents are known - while the remaining relations are invisible. We also have integrity constraints (invariants given by logical sentences) which may relate the visible relations to the invisible ones. We want to determine which information about a query (a positive existential sentence) can be inferred from the visible instance and the constraints. We consider both positive and negative query information, that is, whether the query or its negation holds. We consider the instance-level version of the problem, where both the query and the visible instance are given, as well as the schema-level version, where we want to know whether truth or falsity of the query can be inferred in some instance of the schema
A Critical Comparison of NOSQL Databases in the Context of Acid and Base
This starred paper will discuss two major types of databases – Relational and NOSQL – and analyze the different models used by these databases. In particular, it will focus on the choice of the ACID or BASE model to be more appropriate for the NOSQL databases. NOSQL databases use the BASE model because they do not usually comply with ACID model, something used by relational databases. However, some NOSQL databases adopt additional approaches and techniques to make the database comply with ACID model. In this light, this paper will explore some of these approaches and explain why NOSQL databases cannot simply follow the ACID model. What are the reasons behind the extensive use of the BASE model? What are some of the advantages and disadvantages of not using ACID? Particular attention will be paid to analyze if one model is better or superior to the other. These questions will be answered by reviewing existing research conducted on some of the NOSQL databases such as Cassandra, DynamoDB, MongoDB and Neo4j
The Next Generation of EMPRESS: A Metadata Management System For Accelerated Scientific Discovery at Exascale
Scientific data sets have grown rapidly in recent years, outpacing the growth in memory and network bandwidths. This I/O bottleneck has made it increasingly difficult for scientists to read and search outputted datasets in an attempt to find features of interest. In this paper, we will present the next generation of EMPRESS, a scalable metadata management service that offers the following solution: users can tag features of interest and search these tags without having to read in the associated datasets. EMPRESS provides, in essence, a digital scientific notebook where scientists can write down observations and highlight interesting results, and an efficient way to search these annotations. EMPRESS also provides storage-system independent physical metadata, providing a portable way for users to read both metadata and the associated data. EMPRESS offers scalability through two different deployment modes: local , which runs on the compute nodes and dedicated, which uses a set of dedicated, shared-nothing servers. EMPRESS also provides robust fault tolerance and transaction management, which is crucial to supporting workflows
Public Commons for Geospatial Data: A Conceptual Model
A wide variety of spatial data collection efforts are ongoing throughout local, state and federal agencies, private firms and non-profit organizations. Each effort is established for a different purpose but organizations and individuals often collect and maintain the same or similar information. The United States federal government has undertaken many initiatives such as the National Spatial Data Infrastructure, the National Map and Geospatial One-Stop to reduce duplicative spatial data collection and promote the coordinated use, sharing, and dissemination of spatial data nationwide. A key premise in most of these initiatives is that no national government will be able to gather and maintain more than a small percentage of the geographic data that users want and desire. Thus, national initiatives depend typically on the cooperation of those already gathering spatial data and those using GIs to meet specific needs to help construct and maintain these spatial data infrastructures and geo-libraries for their nations (Onsrud 2001). Some of the impediments to widespread spatial data sharing are well known from directly asking GIs data producers why they are not currently involved in creating datasets that are of common or compatible formats, documenting their datasets in a standardized metadata format or making their datasets more readily available to others through Data Clearinghouses or geo-libraries. The research described in this thesis addresses the impediments to wide-scale spatial data sharing faced by GIs data producers and explores a new conceptual data-sharing approach, the Public Commons for Geospatial Data, that supports user-friendly metadata creation, open access licenses, archival services and documentation of parent lineage of the contributors and value- adders of digital spatial data sets
S-Store: Streaming Meets Transaction Processing
Stream processing addresses the needs of real-time applications. Transaction
processing addresses the coordination and safety of short atomic computations.
Heretofore, these two modes of operation existed in separate, stove-piped
systems. In this work, we attempt to fuse the two computational paradigms in a
single system called S-Store. In this way, S-Store can simultaneously
accommodate OLTP and streaming applications. We present a simple transaction
model for streams that integrates seamlessly with a traditional OLTP system. We
chose to build S-Store as an extension of H-Store, an open-source, in-memory,
distributed OLTP database system. By implementing S-Store in this way, we can
make use of the transaction processing facilities that H-Store already
supports, and we can concentrate on the additional implementation features that
are needed to support streaming. Similar implementations could be done using
other main-memory OLTP platforms. We show that we can actually achieve higher
throughput for streaming workloads in S-Store than an equivalent deployment in
H-Store alone. We also show how this can be achieved within H-Store with the
addition of a modest amount of new functionality. Furthermore, we compare
S-Store to two state-of-the-art streaming systems, Spark Streaming and Storm,
and show how S-Store matches and sometimes exceeds their performance while
providing stronger transactional guarantees
Confidential remote computing
Since their market launch in late 2015, trusted hardware enclaves have revolutionised the computing world with data-in-use protections. Their security features of confidentiality, integrity and attestation attract many application developers to move their valuable assets, such as cryptographic keys, password managers, private data, secret algorithms and mission-critical operations, into them. The potential security issues have not been well explored yet, and the quick integration movement into these widely available hardware technologies has created emerging problems. Today system and application designers utilise enclave-based protections for critical assets; however, the gap within the area of hardware-software co-design causes these applications to fail to benefit from strong hardware features. This research presents hands-on experiences, techniques and models on the correct utilisation of hardware enclaves in real-world systems.
We begin with designing a generic template for scalable many-party applications processing private data with mutually agreed public code. Many-party applications can vary from smart-grid systems to electronic voting infrastructures and block-chain smart contracts to internet-of-things deployments. Next, our research extensively examines private algorithms executing inside trusted hardware enclaves. We present practical use cases for protecting intellectual property, valuable algorithms and business or game logic besides private data. Our mechanisms allow querying private algorithms on rental services, querying private data with privacy filters such as differential privacy budgets, and integrity-protected computing power as a service. These experiences lead us to consolidate the disparate research into a unified Confidential Remote Computing (CRC) model. CRC consists of three main areas: the trusted hardware, the software development and the attestation domains. It resolves the ambiguity of trust in relevant fields and provides a systematic view of the field from past to future. Lastly, we examine the questions and misconceptions about malicious software profiting from security features offered by the hardware.
The more popular idea of confidential computing focuses on servers managed by major technology vendors and cloud infrastructures. In contrast, CRC focuses on practices in a more decentralised setting for end-users, system designers and developers
Antidote SQL: SQL for Weakly Consistent Databases
Distributed storage systems, such as NoSQL databases, employ weakly consistent
semantics for providing good scalability and availability forweb services and applications.
NoSQL offers scalability for the applications but it lacks functionality that could be used
by programmers for reasoning about the correctness of their applications. On the other
hand, SQL databases provide strongly consistent semantics that hurt availability.
The goal of this work is to provide efficient support for SQL features on top of AntidoteDB
NoSQL database, a weakly consistent data store. To that end, we extend AQL, a
SQL interface for the AntidoteDB database, with new designed solutions for supporting
common SQL features. We focus on improving support for the SQL invariant known as
referential integrity, by providing solutions with relaxed and strict consistency models,
this latter at the cost of requiring coordination among replicas; on allowing to apply
partitioning to tables; and on developing an indexing system for managing primary and
secondary indexes on the database, and for improving query processing inspired on SQL
syntax.
We evaluate our solution using the Basho Bench benchmarking tool and compare
the performance of both systems, AQL and AntidoteDB, regarding the cost introduced
by the referential integrity mechanism, the benefits of using secondary indexes on the
query processing, and the performance of partitioning the database. Our results show
that our referential integrity solution is optimal in performance when delete operations
are not issued to the database, although noticeably slower with deletes on cascade, and
implementing secondary indexes on the NoSQL data store shows improvements on query
performance. We also show that partitioning does not harm the performance of the
system
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also helps to provide an easy way for new practitioners to understand
this complex area of research.Comment: 46 pages, 16 figures, Technical Repor
- …