MonetDB/XQuery: a fast XQuery processor powered by a relational engine
Relational XQuery systems try to re-use mature relational data management infrastructures to create fast and scalable XML database technology. This paper describes the main features, key contributions, and lessons learned while implementing such a system. Its architecture consists of (i) a range-based encoding of XML documents into relational tables, (ii) a compilation technique that translates XQuery into a basic relational algebra, (iii) a restricted (order) property-aware peephole relational query optimization strategy, and (iv) a mapping from XML update statements into relational updates. Thus, this system implements all essential XML database functionalities (rather than a single feature), so that we can learn from the full consequences of our architectural decisions. While implementing this system, we had to extend the state of the art with a number of new technical contributions, such as loop-lifted staircase join and efficient relational query evaluation strategies for XQuery theta-joins with existential semantics. These contributions, as well as the architectural lessons learned, are also deemed valuable for other relational back-end engines. The performance and scalability of the resulting system are evaluated on the XMark benchmark for data sizes up to 11GB. The performance section also provides an extensive comparison with all major XMark results published previously, which confirms that the goal of purely relational XQuery processing, namely speed and scalability, was met.
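The range-based encoding in (i) can be illustrated with a minimal sketch. It assumes a pre/size/level scheme (preorder rank, subtree size, depth), one common range-based encoding; the abstract does not spell out the exact columns, and the helper names `encode` and `descendants` are hypothetical. Only element nodes are encoded here; text, attribute, and other node kinds are ignored for brevity.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple


class Row(NamedTuple):
    # Assumed pre/size/level encoding (illustrative, not the system's exact schema).
    pre: int    # preorder rank (document order)
    size: int   # number of descendant elements
    level: int  # depth, root = 0
    tag: str


def encode(xml_text: str) -> List[Row]:
    """Shred an XML string into (pre, size, level, tag) rows, elements only."""
    rows: List[Row] = []

    def visit(elem: ET.Element, level: int) -> int:
        pre = len(rows)
        rows.append(Row(pre, 0, level, elem.tag))  # size is filled in after the children
        size = 0
        for child in elem:
            size += 1 + visit(child, level + 1)  # the child plus its own descendants
        rows[pre] = rows[pre]._replace(size=size)
        return size

    visit(ET.fromstring(xml_text), 0)
    return rows


def descendants(rows: List[Row], ctx: Row) -> List[Row]:
    """Descendant axis as a range selection: ctx.pre < pre <= ctx.pre + ctx.size."""
    return [r for r in rows if ctx.pre < r.pre <= ctx.pre + ctx.size]


rows = encode("<a><b><c/></b><d/></a>")
for r in rows:
    print(r)
print([r.tag for r in descendants(rows, rows[1])])  # ['c']
```

Because every subtree occupies a contiguous `pre` range, the descendant axis turns into a plain range predicate over a table, which is the kind of selection a relational engine can evaluate with ordinary indexes.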
A Generalization of Band Joins and the Merge-Purge Problem
The problem of merging multiple databases of information about common entities is frequently encountered in large commercial and government organizations. The problem we study is often called the Merge/Purge problem and is difficult to solve both in scale and in accuracy. Large repositories of data always contain numerous duplicate entries about the same entities, which are difficult to cull without an intelligent "equational theory" that identifies equivalent items through a complex, domain-dependent matching process. We have developed a system that accomplishes this task for lists of names of potential customers in a direct-marketing-type application. Our results on statistically generated data show that the system is accurate and effective when the data are processed multiple times, each time sorted on a different key. The system provides a rule programming module that is easy to program and quite good at finding duplicates, especially in environments with massive amounts of data.
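The multi-pass idea in this abstract can be sketched as a sorted-neighborhood scheme: sort the records on one key, compare each record only with its neighbors inside a fixed window, repeat with other keys, and merge the matches transitively. This is a minimal sketch under stated assumptions, not the authors' system: the function name `multi_pass_merge_purge`, the window size, the two sort keys, and the toy matching rule `same_person` (same ZIP code and same first name) are hypothetical stand-ins for a real equational theory, and the sample records are invented for illustration.

```python
from typing import Callable, Dict, List, Sequence

Record = Dict[str, str]


def _find(parent: List[int], i: int) -> int:
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def multi_pass_merge_purge(
    records: Sequence[Record],
    keys: Sequence[Callable[[Record], tuple]],
    is_match: Callable[[Record, Record], bool],
    window: int = 3,
) -> List[List[int]]:
    """Group record indices into duplicate clusters using several sorted passes (hypothetical helper)."""
    parent = list(range(len(records)))
    for key_fn in keys:
        # One pass: sort on this key, then compare only records inside the sliding window.
        order = sorted(range(len(records)), key=lambda i: key_fn(records[i]))
        for pos, i in enumerate(order):
            for j in order[max(0, pos - window + 1):pos]:
                if is_match(records[i], records[j]):
                    ri, rj = _find(parent, i), _find(parent, j)
                    if ri != rj:
                        parent[ri] = rj  # union: matches found in any pass are merged transitively
    clusters: Dict[int, List[int]] = {}
    for i in range(len(records)):
        clusters.setdefault(_find(parent, i), []).append(i)
    return sorted(clusters.values())


# Hypothetical toy data: record 3 repeats record 0 with a misspelled surname.
records = [
    {"first": "John", "last": "Smith", "zip": "10027"},
    {"first": "Anne", "last": "Miller", "zip": "10027"},
    {"first": "Paul", "last": "Miller", "zip": "94301"},
    {"first": "John", "last": "Xmith", "zip": "10027"},
    {"first": "Tom", "last": "Taylor", "zip": "60601"},
]


def by_name(r: Record) -> tuple:
    return (r["last"], r["first"])


def by_zip(r: Record) -> tuple:
    return (r["zip"], r["first"])


def same_person(a: Record, b: Record) -> bool:
    # Toy matching rule; a real equational theory would be far richer.
    return a["zip"] == b["zip"] and a["first"] == b["first"]


print(multi_pass_merge_purge(records, [by_name], same_person, window=2))
# [[0], [1], [2], [3], [4]]
print(multi_pass_merge_purge(records, [by_name, by_zip], same_person, window=2))
# [[0, 3], [1], [2], [4]]
```

With only the surname pass, the misspelled record 3 never lands inside record 0's window, so the duplicate is missed; the second pass, sorted on ZIP code, brings the two together. Combining several cheap windowed passes with different keys is one way to recover matches that any single sort order misses.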
AsterixDB: A Scalable, Open Source BDMS
AsterixDB is a new, full-function BDMS (Big Data Management System) with a feature set that distinguishes it from other platforms in today's open source Big Data ecosystem. Its features make it well suited to applications such as web data warehousing, social data storage and analysis, and other Big Data use cases. AsterixDB has a flexible NoSQL-style data model; a query language that supports a wide range of queries; a scalable runtime; partitioned, LSM-based data storage and indexing (including B+-tree, R-tree, and text indexes); support for external as well as natively stored data; a rich set of built-in types; support for fuzzy, spatial, and temporal types and queries; a built-in notion of data feeds for ingesting data; and transaction support akin to that of a NoSQL store.
Development of AsterixDB began in 2009 and led to an initial open source release in mid-2013. This paper is the first complete description of the resulting open source AsterixDB system. It covers the system's data model, its query language, and its software architecture. It also summarizes the current status of the project and gives a first glimpse of how AsterixDB performs compared with alternative technologies (a parallel relational DBMS, a popular NoSQL store, and a popular Hadoop-based SQL data analytics platform) on tasks that both AsterixDB and the compared system can perform. Finally, it briefly describes some initial trials that the system has undergone and the lessons learned (and plans laid) based on those early "customer" engagements.
Research Reports: 1984 NASA/ASEE Summer Faculty Fellowship Program
A NASA/ASEE Summer Faculty Fellowship Program was conducted at the Marshall Space Flight Center (MSFC). The basic objectives of the program are: (1) to further the professional knowledge of qualified engineering and science faculty members; (2) to stimulate an exchange of ideas between participants and NASA; (3) to enrich and refresh the research and teaching activities of the participants' institutions; and (4) to contribute to the research objectives of the NASA Centers. The Faculty Fellows spent ten weeks at MSFC engaged in a research project compatible with their interests and background and worked in collaboration with a NASA/MSFC colleague. This document is a compilation of the Fellows' reports on their research during the summer of 1984. Topics covered include: (1) data base management; (2) computational fluid dynamics; (3) space debris; (4) X-ray gratings; (5) atomic oxygen exposure; (6) protective coatings for SSME; (7) cryogenics; (8) thermal analysis measurements; (9) solar wind modelling; and (10) binary systems.
Optimiser-based recommendations of physical database design
Today's relational database management systems are made up of many complex components, and managing them presents a growing challenge for database administrators. Every runtime environment can require a different configuration to deliver adequate performance. Even within the same environment, demands can shift over time as workloads change. Keeping up with these demands requires continuous, intensive effort from the DBA. The goal of a modern DBMS must be to support the DBA with automated processes and workflows that allow him to make quick and precise decisions. This work aims at describing and partially implementing a supportive system that analyses the current DBMS configuration together with its workload and gives the DBA recommendations on how to improve the system's performance and efficiency.
Ilmenau, Techn. Univ., Diploma thesis, 200
Exploiting CAFS-ISP
In the summer of 1982, the ICLCUA CAFS Special Interest Group defined three subject areas for working party activity: 1) interfaces with compilers and databases, 2) end-user language facilities and display methods, and 3) text handling and office automation. The CAFS SIG convened one working party to address the first subject, with the following terms of reference: 1) review facilities and map requirements onto them, 2) "Database or CAFS" or "Database on CAFS", 3) training needs for users to bridge to new techniques, and 4) repair specifications to cover gaps in software. The working party interpreted the topic broadly as the data processing (DP) professional's, rather than the end user's, view of and relationship with CAFS. This report is the result of the working party's activities. For good reasons, its content goes beyond the terms of reference in their strictest sense. For example, we examine QUERYMASTER, which ICL regards as an end-user tool, from both the DP and end-user perspectives. First, it is the only interface to CAFS in the current SV201. Second, the DP department needs to understand the end user's interface to CAFS. Third, the other subject areas have not yet been addressed by other active working parties.
Working Together Toward Better Health Outcomes
Healthcare organizations and community-based organizations (CBOs) that provide human services are partnering in shared pursuit of better health outcomes. The Partnership for Healthy Outcomes – Nonprofit Finance Fund (NFF), the Center for Health Care Strategies (CHCS), and the Alliance for Strong Families and Communities (Alliance), with support from the Robert Wood Johnson Foundation (RWJF) – set out to capture and analyze the lessons emerging in this dynamic space. Information from more than 200 partnerships serving all 50 US states provides important lessons from, and for, partnerships that hope to improve access to care, address health inequities, and make progress on social issues such as food, education, and housing.