7,825 research outputs found
AsterixDB: A Scalable, Open Source BDMS
AsterixDB is a new, full-function BDMS (Big Data Management System) with a
feature set that distinguishes it from other platforms in today's open source
Big Data ecosystem. Its features make it well-suited to applications like web
data warehousing, social data storage and analysis, and other use cases related
to Big Data. AsterixDB has a flexible NoSQL style data model; a query language
that supports a wide range of queries; a scalable runtime; partitioned,
LSM-based data storage and indexing (including B+-tree, R-tree, and text
indexes); support for external as well as natively stored data; a rich set of
built-in types; support for fuzzy, spatial, and temporal types and queries; a
built-in notion of data feeds for ingestion of data; and transaction support
akin to that of a NoSQL store.
Development of AsterixDB began in 2009 and led to a mid-2013 initial open
source release. This paper is the first complete description of the resulting
open source AsterixDB system. Covered herein are the system's data model, its
query language, and its software architecture. Also included are a summary of
the current status of the project and a first glimpse into how AsterixDB
performs when compared to alternative technologies, including a parallel
relational DBMS, a popular NoSQL store, and a popular Hadoop-based SQL data
analytics platform, for things that both technologies can do. Also included is
a brief description of some initial trials that the system has undergone and
the lessons learned (and plans laid) based on those early "customer"
engagements
S-Store: Streaming Meets Transaction Processing
Stream processing addresses the needs of real-time applications. Transaction
processing addresses the coordination and safety of short atomic computations.
Heretofore, these two modes of operation existed in separate, stove-piped
systems. In this work, we attempt to fuse the two computational paradigms in a
single system called S-Store. In this way, S-Store can simultaneously
accommodate OLTP and streaming applications. We present a simple transaction
model for streams that integrates seamlessly with a traditional OLTP system. We
chose to build S-Store as an extension of H-Store, an open-source, in-memory,
distributed OLTP database system. By implementing S-Store in this way, we can
make use of the transaction processing facilities that H-Store already
supports, and we can concentrate on the additional implementation features that
are needed to support streaming. Similar implementations could be done using
other main-memory OLTP platforms. We show that we can actually achieve higher
throughput for streaming workloads in S-Store than an equivalent deployment in
H-Store alone. We also show how this can be achieved within H-Store with the
addition of a modest amount of new functionality. Furthermore, we compare
S-Store to two state-of-the-art streaming systems, Spark Streaming and Storm,
and show how S-Store matches and sometimes exceeds their performance while
providing stronger transactional guarantees
06472 Abstracts Collection - XQuery Implementation Paradigms
From 19.11.2006 to 22.11.2006, the Dagstuhl Seminar 06472 ``XQuery Implementation Paradigms'' was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available
MonetDB/XQuery: a fast XQuery processor powered by a relational engine
Relational XQuery systems try to re-use mature relational data management infrastructures to create fast and scalable XML database technology. This paper describes the main features, key contributions, and lessons learned while implementing such a system. Its architecture consists of (i) a range-based encoding of XML documents into relational tables, (ii) a compilation technique that translates XQuery into a basic relational algebra, (iii) a restricted (order) property-aware peephole relational query optimization strategy, and (iv) a mapping from XML update statements into relational updates. Thus, this system implements all essential XML database functionalities (rather than a single feature) such that we can learn from the full consequences of our architectural decisions. While implementing this system, we had to extend the state-of-the-art with a number of new technical contributions, such as loop-lifted staircase join and efficient relational query evaluation strategies for XQuery theta-joins with existential semantics. These contributions as well as the architectural lessons learned are also deemed valuable for other relational back-end engines. The performance and scalability of the resulting system is evaluated on the XMark benchmark up to data sizes of 11GB. The performance section also provides an extensive benchmark comparison of all major XMark results published previously, which confirm that the goal of purely relational XQuery processing, namely speed and scalability, was met
An introduction to Graph Data Management
A graph database is a database where the data structures for the schema
and/or instances are modeled as a (labeled)(directed) graph or generalizations
of it, and where querying is expressed by graph-oriented operations and type
constructors. In this article we present the basic notions of graph databases,
give an historical overview of its main development, and study the main current
systems that implement them
An Evaluation of Physical Disk I/Os for Complex Object Processing
In order to obtain the performance required for nonstandard database environments, a hierarchical complex object model with object references is used as a storage structure for complex objects. Several storage models for these complex objects, as well as a benchmark to evaluate their performance, are described. A cost model for analytical performance evaluation is developed, and the analytical results are validated by means of measurements on the DASDBS, complex object storage system. The results show which storage structures for complex objects are the most efficient under which circumstance
Four Lessons in Versatility or How Query Languages Adapt to the Web
Exposing not only human-centered information, but machine-processable data on the Web is one of the commonalities of recent Web trends. It has enabled a new kind of applications and businesses where the data is used in ways not foreseen by the data providers. Yet this exposition has fractured the Web into islands of data, each in different Web formats: Some providers choose XML, others RDF, again others JSON or OWL, for their data, even in similar domains. This fracturing stifles innovation as application builders have to cope not only with one Web stack (e.g., XML technology) but with several ones, each of considerable complexity. With Xcerpt we have developed a rule- and pattern based query language that aims to give shield application builders from much of this complexity: In a single query language XML and RDF data can be accessed, processed, combined, and re-published. Though the need for combined access to XML and RDF data has been recognized in previous work (including the W3C’s GRDDL), our approach differs in four main aspects: (1) We provide a single language (rather than two separate or embedded languages), thus minimizing the conceptual overhead of dealing with disparate data formats. (2) Both the declarative (logic-based) and the operational semantics are unified in that they apply for querying XML and RDF in the same way. (3) We show that the resulting query language can be implemented reusing traditional database technology, if desirable. Nevertheless, we also give a unified evaluation approach based on interval labelings of graphs that is at least as fast as existing approaches for tree-shaped XML data, yet provides linear time and space querying also for many RDF graphs. We believe that Web query languages are the right tool for declarative data access in Web applications and that Xcerpt is a significant step towards a more convenient, yet highly efficient data access in a “Web of Data”
- …