20 research outputs found

    UTPB: A Benchmark for Scientific Workflow Provenance Storage and Querying Systems

    A crucial challenge for scientific workflow management systems is to support the efficient and scalable storage and querying of large provenance datasets that record the history of in silico experiments. As new provenance management systems are being developed, it is important to have benchmarks that can evaluate these systems and provide an unbiased comparison. In this paper, based on the requirements for scientific workflow provenance systems, we design an extensible benchmark that features a collection of techniques and tools for workload generation, query selection, performance measurement, and experimental result interpretation

    A Look Back on the XML Benchmark Project

    The XML Benchmark Project was started to provide a framework for evaluating the interplay of XML technologies and Database Management Systems. The benchmark lays emphasis on engineering aspects as well as on performance of the query processor. In this chapter the authors present a quick overview of the benchmark and point at some of the experience they gathered during the design of the benchmark and while running it on a variety of platforms. Since the benchmark was designed early in the evolution of XML, our experiences also reflect how the perception of XML changed during the three years that have passed since we started working on the subject. The chapter comprises an overview of the benchmark as well as discussions of some lessons learned

    A Scheme for Evaluating XML Engine on RDBMS

    Benchmark de base de dados de suporte a serviços de informação (Database benchmark for supporting information services)

    Master's dissertation in Information Systems. Online information services can be regarded as applications geared specifically to collecting, storing, processing and disseminating information; they are easily associated with the Internet (or a similar network) and are used by varied audiences, from the general public to more restricted groups. The growing use of the Internet as a means of communication, and the resulting need to make ever larger amounts of information available, requires efficient applications that can respond to requests from a potentially large number of users. A crucial factor in the performance of applications such as online information services is therefore how they store information, and more precisely the model they use to do so. Databases are the most common way to store the information that applications need, and the relational model, with many years of use, is the best known. XML is a newer approach to representing information that has been gaining acceptance. Because both approaches to information storage are highly relevant, this work carries out a comparative performance analysis by defining a benchmark, with the goal of identifying the situations in which each approach is more suitable in the context of online information services. The analysis uses a test system inspired by existing systems but built from scratch to meet the needs of this work. The results show superior performance for the relational model. However, the work concludes that there are situations more favourable to XML, in which the relational model could perform worse, so the model should be selected with awareness of the situations in which each is potentially better suited.

    Clustering-based Labelling Scheme - A Hybrid Approach for Efficient Querying and Updating XML Documents

    Extensible Markup Language (XML) has become a dominant technology for transferring data through the World Wide Web. XML labelling schemes play a key role in handling XML data efficiently and robustly, and many such schemes have been proposed. However, these labelling schemes have limitations and shortcomings. The aim of this research was therefore to investigate the existing XML labelling schemes and their limitations in order to address the efficiency of XML query performance. This thesis investigated the existing labelling schemes and classified them into three categories based on certain criteria, in order to identify their limitations and challenges. Based on the outcomes of this investigation, this thesis proposed a state-of-the-art labelling scheme, called the clustering-based labelling scheme, to resolve or improve key limitations such as the efficiency of XML query processing, the cost of labelling XML nodes, and the cost of XML updates. This thesis argued that using certain existing labelling schemes to label nodes, combined with clustering-based techniques, can improve the efficiency of both querying and node labelling. Theoretically, the proposed scheme is based on dividing the nodes of an XML document into clusters. Two existing labelling schemes, the Dewey and LLS labelling schemes, were selected for labelling these clusters and their nodes. Subsequently, the proposed scheme was designed and implemented. In addition, the Dewey and LLS labelling schemes were implemented for the purpose of evaluating the proposed scheme. Four experiments were then designed to test the proposed scheme against the Dewey and LLS labelling schemes. The results of these experiments suggest that the proposed scheme achieved better results than the Dewey and LLS schemes. Consequently, the research hypothesis was accepted overall, with few exceptions, and the proposed scheme showed an improvement in performance and in all the targeted features and aspects.
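
    To make the idea of a prefix-based label concrete, the following is a minimal sketch of Dewey-style labelling in general; it is not the thesis's clustering-based scheme or its implementation. The sample document `DOC` and the helper names `dewey_labels`, `is_ancestor`, `is_parent` and `precedes` are hypothetical and chosen only for illustration. Each element receives its parent's label extended by its 1-based position among its siblings, so structural relationships can be tested on the labels alone.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
DOC = "<library><book><title/><author/></book><book><title/></book></library>"


def dewey_labels(root):
    """Map each Dewey-style label (a tuple of ints) to the element's tag.

    The root is labelled (1,); every child gets its parent's label
    extended by its 1-based position among the parent's children.
    """
    labels = {}

    def visit(node, label):
        labels[label] = node.tag
        for position, child in enumerate(node, start=1):
            visit(child, label + (position,))

    visit(root, (1,))
    return labels


def is_ancestor(a, b):
    """True if label a is a proper prefix of label b."""
    return len(a) < len(b) and b[:len(a)] == a


def is_parent(a, b):
    """True if label b extends label a by exactly one component."""
    return len(b) == len(a) + 1 and b[:len(a)] == a


def precedes(a, b):
    """True if a comes before b in document order.

    Python compares tuples lexicographically, and a prefix sorts before
    its extensions, which matches a pre-order traversal of the tree.
    """
    return a < b


labels = dewey_labels(ET.fromstring(DOC))
for label in sorted(labels):
    print(".".join(map(str, label)), labels[label])
```

    In this sketch the relationship tests never touch the XML document itself, only the labels. The weakness of plain prefix labels is also visible: inserting a new first child would shift the positions, and hence the labels, of all following siblings and their descendants.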

    Automatic mapping of XML documents into relational database

    Extensible Markup Language (XML) is now one of the most important standard formats for exchanging and representing data over the Internet. Storing, updating and retrieving the huge amount of web services data held as XML is an attractive area of research for researchers and database vendors. In this thesis, we propose and develop a new mapping model, called MAXDOR, for storing, rebuilding, updating and querying XML documents using a relational database without making use of any XML schemas in the mapping process. The model addresses the structural gap between ordered, hierarchical XML and unordered, tabular relational databases, enabling relational database systems to be used for storing, updating and querying XML data. A multiple linked list is used to maintain the XML document structure, manage the process of updating document contents and retrieve document contents efficiently. Experiments are carried out to evaluate the MAXDOR model. MAXDOR is compared with other well-known models available in the literature (Tatarinov et al., 2002) and (Torsten et al., 2004), using the total expected value of the execution time for rebuilding an XML document and of the execution time for inserting a token. Source: EThOS - Electronic Theses Online Service, United Kingdom.

    A survey on tree matching and XML retrieval

    With the increasing number of available XML documents, numerous approaches for retrieval have been proposed in the literature. They usually use the tree representation of documents and queries to process them, whether implicitly or explicitly. Although retrieving XML documents can be considered a tree matching problem between the query tree and the document trees, only a few approaches take advantage of the algorithms and methods offered by graph theory. In this paper, we study the theoretical approaches proposed in the literature for tree matching and examine how these approaches have been adapted to XML querying and retrieval, from both an exact and an approximate matching perspective. This study allows us to highlight theoretical aspects of graph theory that have not yet been explored in XML retrieval.

    Labelling Dynamic XML Documents: A GroupBased Approach

    Documents that comply with the XML standard are characterised by inherent ordering, and they are usually modelled as a tree. Nowadays, applications generate massive amounts of XML data, which requires accurate and efficient queryable XML database systems. XML querying depends on XML labelling in much the same way as relational databases rely on indexes. Labelling schemes encode document order and structural information, so that queries can use them without having to access the original XML document. Dynamic XML data, that is, data which changes, complicates the labelling scheme. As demonstrated by many research efforts, it is difficult to allocate unique labels to nodes in a dynamic XML tree so that all structural relationships between the nodes are encoded by the labels. Static XML documents are generally managed with labelling schemes that use simple labels. By contrast, dynamic labelling schemes incur extra labelling costs and lower query performance in order to allow random updates, irrespective of the document update frequency. Given that static and dynamic XML documents are often not clearly distinguished, a labelling scheme whose efficiency does not depend on update frequency would be useful. The GroupBased labelling scheme proposed in this thesis is compatible with static as well as dynamic XML documents. In particular, this scheme performs well when processing dynamic XML data updates. What differentiates it from other dynamic labelling schemes is its uniform behaviour irrespective of whether the document is static or dynamic, its ability to determine all structural relationships between nodes, and its improved query performance for both types of document. The advantages of the GroupBased scheme in comparison to earlier schemes are highlighted by the experimental results.