Search CORE

240 research outputs found

An Empirical Evaluation of XQuery Processors

Author: Manegold S. (Stefan)
Publication venue: Blackwell Scientific
Publication date: 01/04/2008
Field of study

This paper presents an extensive and detailed experimental evaluation of XQuery processors. The study consists of running five publicly available XQuery benchmarks --- the Michigan benchmark (MBench), XBench, XMach-1, XMark and X007 --- on six XQuery processors, three stand-alone (file-based) XQuery processors (Galax, Qizx/Open, Saxon-B) and three XML/XQuery database systems (BerkeleyDB/XML, MonetDB/XQuery, X-Hive/DB). Next to assessing and comparing the functionality, performance and scalability for the various systems, the major focus of this work is to report in detail about the experiences made while performing such an exhaustive study, to discuss all the problems that we encountered and how we solved them, and hence to hopefully provide some guidelines (or even a recipe) for performing reproducible large-scale experime

CWI's Institutional Repository

An Empirical Evaluation of XQuery Processors

Author: Manegold S. (Stefan)
Publication venue: CWI
Publication date: 01/01/2007
Field of study

CWI's Institutional Repository

Portable and Accurate Collection of Calling-Context-Sensitive Bytecode Metrics for the Java Virtual Machine

Author: Binder Walter
Mezini Mira
Moret Philippe
Sarimbekov Aibek
Schoeberl Martin
Sewe Andreas
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/01/2011
Field of study

Calling-context profiles and dynamic metrics at the bytecode level are important for profiling, workload characterization, program comprehension, and reverse engineering. Prevailing tools for collecting calling-context profiles or dynamic bytecode metrics often provide only incomplete information or suffer from limited compatibility with standard JVMs. However, completeness and accuracy of the profiles is essential for tasks such as workload characterization, and compatibility with standard JVMs is important to ensure that complex workloads can be executed. In this paper, we present the design and implementation of JP2, a new tool that profiles both the inter- and intra-procedural control flow of workloads on standard JVMs. JP2 produces calling-context profiles preserving callsite information, as well as execution statistics at the level of individual basic blocks of code. JP2 is complemented with scripts that compute various dynamic bytecode metrics from the profiles. As a case-study and tutorial on the use of JP2, we use it for cross-profiling for an embedded Java processor

TUbiblio

Crossref

Online Research Database In Technology

Pathfinder: relational XQuery over multi-gigabyte XML inputs in interactive time

Author: Boncz P.A. (Peter)
Grust T.
Manegold S. (Stefan)
Rittinger J.
Teubner J. (Jens)
Publication venue: CWI
Publication date: 01/01/2005
Field of study

Using a relational DBMS as back-end engine for an XQuery processing system leverages relational query optimization and scalable query processing strategies provided by mature DBMS engines in the XML domain. Though a lot of theoretical work has been done in this area and various solutions have been proposed, no complete systems have been made available so far to give the practical evidence that this is a viable approach. In this paper, we describe the ourely relational XQuery processor Pathfinder that has been built on top of the extensible RDBMS MonetDB. Performance results indicate that the system is capable of evaluating XQuery queries efficiently, even if the input XML documents become huge. We additionally present further contributions such as loop-lifted staircase join, techniques to derive order properties and to reduce sorting effort in the generated relational algebra plans, as well as methods for optimizing XQuery joins, which, taken together, enabled us to reach our performance and scalability goal

CWI's Institutional Repository

Dagstuhl News January - December 2006

Author: Wilhelm Reinhard
Publication venue: Dagstuhl Publications. Dagstuhl News
Publication date: 01/01/2006
Field of study

"Dagstuhl News" is a publication edited especially for the members of the Foundation "Informatikzentrum Schloss Dagstuhl" to thank them for their support. The News give a summary of the scientific work being done in Dagstuhl. Each Dagstuhl Seminar is presented by a small abstract describing the contents and scientific highlights of the seminar as well as the perspectives or challenges of the research topic

Dagstuhl Research Online Publication Server

A node partitioning strategy for optimising the performance of XML queries

Author: Marks Gerard
Publication venue: Dublin City University. School of Computing
Publication date: 01/11/2011
Field of study

For ease of communication between heterogeneous systems, the eXtensible Markup Language (XML) has been widely adopted as a data storage format. However, XML query processing presents issues both in terms of query performance and updatability. Thus, many are choosing to shred XML data into relational databases in order to benet from its mature technology. The problem with this approach is that (often complex and time consuming) data transformation processes are required to transform XML data to relational tables and vice versa. Additionally, many of the benets of XML data can be lost during these processes. In this dissertation, we present a process that partitions nodes within an XML document into disjoint subsets. Briefly, as there are fewer partitions than there are nodes, a more efficient join operation can be performed between partitions, thus reducing the number of inefficient node comparisons. The number and size of partitions varies depending on the structure and layout in the XML document, and the number of partitions impacts query performance. Therefore, we also provide a partition classication process, which signicantly reduces the number of partitions because each partition class represents many equivalent partitions within the XML document. In this dissertation, we will demonstrate that our approach outperforms similar approaches for a large subset of XML queries by eliminating complex join operations (where possible) during the query process

DCU Online Research Access Service

Online Integration of Semistructured Data

Author: Handoko
Publication venue: School of Computing and Information Technology
Publication date: 01/01/2017
Field of study

Data integration systems play an important role in the development of distributed multi-database systems. Data integration collects data from heterogeneous and distributed sources, and provides a global view of data to the users. Systems need to process user\u27s applications in the shortest possible time. The virtualization approach to data integration systems ensures that the answers to user requests are the most up-to-date ones. In contrast, the materialization approach reduces data transmission time at the expense of data consistency between the central and remote sites. The virtualization approach to data integration systems can be applied in either batch or online mode. Batch processing requires all data to be available at a central site before processing is started. Delays in transmission of data over a network contribute to a longer processing time. On the other hand, in an online processing mode data integration is performed piece-by-piece as soon as a unit of data is available at the central site. An online processing mode presents the partial results to the users earlier. Due to the heterogeneity of data models at the remote sites, a semistructured global view of data is required. The performance of data integration systems depends on an appropriate data model and the appropriate data integration algorithms used. This thesis presents a new algorithm for immediate processing of data collected from remote and autonomous database systems. The algorithm utilizes the idle processing states while the central site waits for completion of data transmission to produce instant partial results. A decomposition strategy included in the algorithm balances of the computations between the central and remote sites to force maximum resource utilization at both sites. The thesis chooses the XML data model for the representation of semistructured data, and presents a new formalization of the XML data model together with a set of algebraic operations. The XML data model is used to provide a virtual global view of semistructured data. The algebraic operators are consistent with operations of relational algebra, such that any existing syntax based query optimization technique developed for the relational model of data can be directly applied. The thesis shows how to optimize online processing by generating one online integration plan for several data increments. Further, the thesis shows how each independent increment expression can be processed in a parallel mode on a multi core processor system. The dynamic scheduling system proposed in the thesis is able to defer or terminate a plan such that materialization updates and unnecessary computations are minimized. The thesis shows that processing data chunks of fragmented XML documents allows for data integration in a shorter period of time. Finally, the thesis provides a clear formalization of the semistructured data model, a set of algorithms with high-level descriptions, and running examples. These formal backgrounds show that the proposed algorithms are implementable

Research Online