9,144 research outputs found
Relational Approach to Logical Query Optimization of XPath
To be able to handle the ever growing volumes of XML documents, effective and efficient data management solutions are needed. Managing XML data in a relational DBMS has great potential. Recently, effective relational storage schemes and index structures have been proposed as well as special-purpose join operators to speed up querying of XML data using XPath/XQuery. In this paper, we address the topic of query plan construction and logical query optimization. The claim of this paper is that standard relational algebra extended with special-purpose join operators suffices for logical query optimization. We focus on the XPath accelerator storage scheme and associated staircase join operators, but the approach can be generalized easily
Recommended from our members
STELLAR (Semantic Technologies Enhancing the Lifecycle of Learning Resources): Jisc Final Report
[Project Summary]
As one of the earliest distance learning providers The Open University (OU) has a rich heritage of archived learning materials. An ever increasing amount of that is in digital form and is being deposited with the University Archive. This growth has been driven by digitisation activity from projects such as AVA (Access to Video Assets) and the Fedora-based Open University Digital Library āa place to discover digital and digitised archival content from the OU Library, from videos and images to digitised documentsā. Other digital content is being captured from web archiving activities, such as work to preserve Moodle Virtual Learning Environment course websites. An evidence based understanding is required to inform digital preservation policies, curation strategy and investment in digital library development.
Following the Pre-enhancement, Enhancement and Post-enhancement methodology set out by Jisc, STELLAR adopted the model of a balanced scorecard to ascertain the value ascribed to the non-current learning materials. Four aspects were considered: Personal and professional perspectives of value; Value to the Higher Educational and academic communities; Value to internal processes and cultures; Financial perspectives of value. The outcomes of the survey indicated that stakeholders place a high value on the materials, and that they perceived them to have value in all areas evaluated.
Three OU courses were chosen from the digital library for the transformation stage. These materials were enhanced and transformed into RDF, a process that required more extensive metadata expertise and effort than was expected. Following enhancement the RDF was accessed through a tool called DiscOU, created by a member of the project team from the OUās Knowledge Media Institute. DiscOU uses both linked data and a semantic meaning engine to analyse the meaning of the text in a search query. This is matched against the meaning of the content derived from an index of the full-text of the digital library content.
In the final stage stakeholders were asked through a survey and series of workshops to use the DiscOU proof-of-concept tool to assess their perception of the value of this transformation. This has revealed that overall, academics and other stakeholders in the university do believe that the value of the selected materials was positively impacted by the application of semantic technologies
Modeling views in the layered view model for XML using UML
In data engineering, view formalisms are used to provide flexibility to users and user applications by allowing them to extract and elaborate data from the stored data sources. Conversely, since the introduction of Extensible Markup Language (XML), it is fast emerging as the dominant standard for storing, describing, and interchanging data among various web and heterogeneous data sources. In combination with XML Schema, XML provides rich facilities for defining and constraining user-defined data semantics and properties, a feature that is unique to XML. In this context, it is interesting to investigate traditional database features, such as view models and view design techniques for XML. However, traditional view formalisms are strongly coupled to the data language and its syntax, thus it proves to be a difficult task to support views in the case of semi-structured data models. Therefore, in this paper we propose a Layered View Model (LVM) for XML with conceptual and schemata extensions. Here our work is three-fold; first we propose an approach to separate the implementation and conceptual aspects of the views that provides a clear separation of concerns, thus, allowing analysis and design of views to be separated from their implementation. Secondly, we define representations to express and construct these views at the conceptual level. Thirdly, we define a view transformation methodology for XML views in the LVM, which carries out automated transformation to a view schema and a view query expression in an appropriate query language. Also, to validate and apply the LVM concepts, methods and transformations developed, we propose a view-driven application development framework with the flexibility to develop web and database applications for XML, at varying levels of abstraction
Recommended from our members
UC Berkeley's Cory Hall: Evaluation of Challenges and Potential Applications of Building-to-Grid Implementation
From September 2009 through June 2010, a team of researchers developed, installed, and tested instrumentation on the energy flows in Cory Hall on the UC Berkeley campus to create a Building-to-Grid testbed. The UC Berkeley team was headed by Professor David Culler, and assisted by members from EnerNex, Lawrence Berkeley National Laboratory, California State University Sacramento, and the California Institute for Energy & Environment. While the Berkeley team mapped the load tree of the building, EnerNex researched types of meters, submeters, monitors, and sensors to be used (Task 1). Next the UC Berkeley team analyzed building needs and designed the network of metering components and data storage/visualization software (Task 2). After meeting with vendors in January, the UCB team procured and installed the components starting in late March (Task 3). Next, the UCB team tested and demonstrated the system (Task 4). Meanwhile, the CSUS team documented the methodology and steps necessary to implement a testbed (Task 5) and Harold Galicer developed a roadmap for the CSUS Smart Grid Center with results from the testbed (Task 5a) and evaluated the Cory Hall implementation process (Task 5b). The CSUS team also worked with local utilities to develop an approach to the energy information communication link between buildings and the utility (Task 6). The UC Berkeley team then prepared a roadmap to outline necessary technology development for Building-to-Grid, and presented the results of the project in early July (Task 7). Finally, CIEE evaluated the implementation, noting challenges and potential applications of Building-to-Grid (Task 8). These deliverables are available at the i4Energy site: http://i4energy.org/
HUDDL for description and archive of hydrographic binary data
Many of the attempts to introduce a universal hydrographic binary data format have failed or have been only partially successful. In essence, this is because such formats either have to simplify the data to such an extent that they only support the lowest common subset of all the formats covered, or they attempt to be a superset of all formats and quickly become cumbersome. Neither choice works well in practice. This paper presents a different approach: a standardized description of (past, present, and future) data formats using the Hydrographic Universal Data Description Language (HUDDL), a descriptive language implemented using the Extensible Markup Language (XML). That is, XML is used to provide a structural and physical description of a data format, rather than the content of a particular file. Done correctly, this opens the possibility of automatically generating both multi-language data parsers and documentation for format specification based on their HUDDL descriptions, as well as providing easy version control of them. This solution also provides a powerful approach for archiving a structural description of data along with the data, so that binary data will be easy to access in the future. Intending to provide a relatively low-effort solution to index the wide range of existing formats, we suggest the creation of a catalogue of format descriptions, each of them capturing the logical and physical specifications for a given data format (with its subsequent upgrades). A C/C++ parser code generator is used as an example prototype of one of the possible advantages of the adoption of such a hydrographic data format catalogue
MonetDB/XQuery: a fast XQuery processor powered by a relational engine
Relational XQuery systems try to re-use mature relational data management infrastructures to create fast and scalable XML database technology. This paper describes the main features, key contributions, and lessons learned while implementing such a system. Its architecture consists of (i) a range-based encoding of XML documents into relational tables, (ii) a compilation technique that translates XQuery into a basic relational algebra, (iii) a restricted (order) property-aware peephole relational query optimization strategy, and (iv) a mapping from XML update statements into relational updates. Thus, this system implements all essential XML database functionalities (rather than a single feature) such that we can learn from the full consequences of our architectural decisions. While implementing this system, we had to extend the state-of-the-art with a number of new technical contributions, such as loop-lifted staircase join and efficient relational query evaluation strategies for XQuery theta-joins with existential semantics. These contributions as well as the architectural lessons learned are also deemed valuable for other relational back-end engines. The performance and scalability of the resulting system is evaluated on the XMark benchmark up to data sizes of 11GB. The performance section also provides an extensive benchmark comparison of all major XMark results published previously, which confirm that the goal of purely relational XQuery processing, namely speed and scalability, was met
AsterixDB: A Scalable, Open Source BDMS
AsterixDB is a new, full-function BDMS (Big Data Management System) with a
feature set that distinguishes it from other platforms in today's open source
Big Data ecosystem. Its features make it well-suited to applications like web
data warehousing, social data storage and analysis, and other use cases related
to Big Data. AsterixDB has a flexible NoSQL style data model; a query language
that supports a wide range of queries; a scalable runtime; partitioned,
LSM-based data storage and indexing (including B+-tree, R-tree, and text
indexes); support for external as well as natively stored data; a rich set of
built-in types; support for fuzzy, spatial, and temporal types and queries; a
built-in notion of data feeds for ingestion of data; and transaction support
akin to that of a NoSQL store.
Development of AsterixDB began in 2009 and led to a mid-2013 initial open
source release. This paper is the first complete description of the resulting
open source AsterixDB system. Covered herein are the system's data model, its
query language, and its software architecture. Also included are a summary of
the current status of the project and a first glimpse into how AsterixDB
performs when compared to alternative technologies, including a parallel
relational DBMS, a popular NoSQL store, and a popular Hadoop-based SQL data
analytics platform, for things that both technologies can do. Also included is
a brief description of some initial trials that the system has undergone and
the lessons learned (and plans laid) based on those early "customer"
engagements
- ā¦