9 research outputs found

    On the use of query-driven XML auto-indexing

    Indexes in XML databases (Indeksit XML-tietokannoissa)

    The XML data model has become increasingly common in, among other things, structured documents, the implementation of web applications, and data transfer over the Internet. As a result, the need for persistent storage of XML data has grown. For this purpose, XML-based databases have been developed that use the XML data model as their format for storing and processing data. XML documents are often structurally diverse and large. An XML database management system must therefore be designed and implemented to be efficient, so that it can execute even large numbers of database queries and updates with reasonable hardware resources and short response times; these queries and updates may also be varied and concurrent and may touch a large amount of data at once. This work shows how the performance of an XML database management system can be significantly improved by indexing documents. In indexing, unique identifiers are created for the elements of XML documents, and various index structures are built on the basis of these identifiers. With indexing, data can be located efficiently on the database's data pages and moved between the data pages and the buffer of the database management system, which speeds up the system and increases its ability to handle concurrent read and write requests. Indexing can also make the system's query-processing algorithms more efficient by enabling an efficient implementation of the set join operations they use. BaseX and eXist are XML-based database management systems that offer several different kinds of indexes. The implementation of indexes in these systems is described, and the efficiency of these systems in executing database queries on XML documents is measured and evaluated using the XMark benchmark, which was developed for this purpose.

    A compact and scalable encoding for updating XML based on node labeling schemes

    The eXtensible Markup Language (XML) has been adopted as the new standard for data exchange on the World Wide Web. As the rate of adoption increases, there is an ever more pressing need to store, query and update XML in its native format, thereby eliminating the overhead of parsing and transforming XML in and out of various data formats. However, the hierarchical, ordered and semi-structured properties of the tree structure underlying the XML data model present many challenges to updating XML. In particular, many of the tree labeling schemes were designed to solve a particular problem or provide a particular feature, often at the expense of other important features. In this dissertation, we identify the core properties that are representative of the desirable characteristics of a good dynamic labeling scheme for XML. We focus on four features central to the outstanding problems in existing dynamic labeling schemes, namely a compact label encoding, scalability, deleted node label reuse and a label storage scheme for binary-encoded bit-string node labels. At present there is no dynamic labeling scheme that integrates support for all four features. We present a novel compact and scalable adaptive encoding method that keeps the growth rate of label size highly constrained under arbitrary node insertion and deletion scenarios and that scales efficiently. We deploy our encoding method in two novel dynamic labeling schemes for XML that can completely avoid node relabeling, process frequently skewed insertions gracefully and reuse deleted node labels.

    The Fourth International VLDB Workshop on Management of Uncertain Data

    Distributed XML Query Processing

    While centralized query processing over collections of XML data stored at a single site is a well understood problem, centralized query evaluation techniques are inherently limited in their scalability when presented with large collections (or a single, large document) and heavy query workloads. In the context of relational query processing, similar scalability challenges have been overcome by partitioning data collections, distributing them across the sites of a distributed system, and then evaluating queries in a distributed fashion, usually in a way that ensures locality between (sub-)queries and their relevant data. This thesis presents a suite of query evaluation techniques for XML data that follow a similar approach to address the scalability problems encountered by XML query evaluation. Due to the significant differences in data and query models between relational and XML query processing, it is not possible to directly apply distributed query evaluation techniques designed for relational data to the XML scenario. Instead, new distributed query evaluation techniques need to be developed. Thus, in this thesis, an end-to-end solution to the scalability problems encountered by XML query processing is proposed. Based on a data partitioning model that supports both horizontal and vertical fragmentation steps (or any combination of the two), XML collections are fragmented and distributed across the sites of a distributed system. Then, a suite of distributed query evaluation strategies is proposed. These query evaluation techniques ensure locality between each fragment of the collection and the parts of the query corresponding to the data in this fragment. Special attention is paid to scalability and query performance, which is achieved by ensuring a high degree of parallelism during distributed query evaluation and by avoiding access to irrelevant portions of the data. 
    For maximum flexibility, the suite of distributed query evaluation techniques proposed in this thesis provides several alternative approaches for evaluating a given query over a given distributed collection. Thus, to achieve the best performance, it is necessary to predict and compare the expected performance of each of these alternatives. In this work, this is accomplished through a query optimization technique based on a distribution-aware cost model. The same cost model is also used to fine-tune the way a collection is fragmented to the demands of the query workload evaluated over this collection. To evaluate the performance impact of the distributed query evaluation techniques proposed in this thesis, the techniques were implemented within a production-quality XML database system. Based on this implementation, a thorough experimental evaluation was performed. The results of this evaluation confirm that the distributed query evaluation techniques introduced here lead to significant improvements in query performance and scalability both when compared to centralized techniques and when compared to existing distributed query evaluation techniques.

    Exploring a striped XML world

    EXtensible Markup Language, XML, was designed as a markup language for structuring, storing and transporting data on the World Wide Web. The focus of XML is on data content; arbitrary markup is used to describe data. This versatile, self-describing data representation has established XML as the universal data format and the de facto standard for information exchange on the Web. This has gradually given rise to the need for efficient storage and querying of large XML repositories. To that end, we propose a new model for building a native XML store which is based on a generalisation of vertical decomposition. Nodes of a document that satisfy the same label-path are extracted and stored together in a single container, a Stripe. Stripes make use of a labelling scheme allowing us to maintain full structural information. Over this new representation, we introduce various evaluation techniques, which allow us to handle a large fragment of XPath 2.0. We also focus on the optimisation opportunities that arise from our decomposition model during any query evaluation phase. During query validation, we present an input minimisation process that exploits the proposed model for identifying, in terms of Stripes, only the input that is relevant to the given query. We also define query equivalence rules for query rewriting over our proposed model. Finally, during query optimisation, we deal with whether and under which circumstances certain evaluation algorithms can be replaced by others having lower I/O and/or CPU cost. We propose three storage schemes under our general decomposition technique. The schemes differ in the compression method imposed on the structural part of the XML document. The first storage scheme imposes no compression. The second storage scheme exploits structural regularities of the document to minimise storage and, thus, I/O cost during query evaluation. Finally, the third storage scheme performs structure-agnostic compression of the document structure, which results in minimised storage regardless of the actual XML structure. We experiment on XML repositories of varying size, recursion and structural regularity. We consider query input size, execution plan size and query response time as metrics for our experimental results. We process query workloads by applying each of the proposed optimisations in isolation and then all of their combinations. In addition, we apply the same execution pipeline for all proposed storage schemes. As a reference point for our proposed query evaluation pipeline, we use the current state-of-the-art system for XML query processing. Our results demonstrate that:
    • Our proposed data model provides the infrastructure for efficiently selecting the parts of the document that are relevant to a given query.
    • The application of query rewriting, combined with input minimisation, reduces query input size as well as the number of physical operators used. In addition, when evaluation algorithms are specialised to the decomposition method, query response time is further reduced.
    • Query evaluation performance is largely affected by the storage schemes, which are closely related to the structural properties of the data. The achieved compression ratio greatly affects storage size and, therefore, query response times.
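    The decomposition described in this abstract, where nodes sharing a label-path are stored together in one Stripe, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function name `build_stripes`, the use of Dewey-style position tuples as the labelling scheme, and the in-memory dictionary standing in for a storage container. The abstract does not specify any of these details.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_stripes(xml_text):
    """Group element nodes by label-path (hypothetical helper).

    Returns a dict mapping each label-path, e.g. "/lib/book/title",
    to a list of (label, text) pairs in document order. The label is
    a Dewey-style tuple of child positions, an assumed stand-in for
    the labelling scheme that preserves structural information.
    """
    root = ET.fromstring(xml_text)
    stripes = defaultdict(list)

    def visit(elem, parent_path, label):
        path = parent_path + "/" + elem.tag
        stripes[path].append((label, (elem.text or "").strip()))
        # Children get the parent's label extended by their position.
        for position, child in enumerate(elem, start=1):
            visit(child, path, label + (position,))

    visit(root, "", (1,))
    return dict(stripes)


doc = "<lib><book><title>A</title></book><book><title>B</title></book></lib>"
stripes = build_stripes(doc)
for path in sorted(stripes):
    print(path, stripes[path])
```

    Under these assumptions, a query over `/lib/book/title` only needs to read the one stripe stored under that path, which is the intuition behind selecting only the input relevant to a query.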

    A component framework for personalized multimedia applications

    So far, neither industrial solutions nor research approaches offer practicable support for the dynamic creation of personalized multimedia presentations. The software-engineering approach of the MM4U framework ("MultiMedia For You") provides, for the first time, generic yet practicable support for the dynamic creation process. The goal of the MM4U framework is to offer application developers extensive, application-independent support for creating personalized multimedia content and thereby to make the development of such applications considerably easier. Reaching the goal of a software framework that generically supports the development of personalized multimedia applications raises the question of suitable software-engineering support for developing such a framework. Since the introduction of object-oriented frameworks, their development has remained laborious and difficult. To reduce the development risks, suitable process models and development methods have been created. With component technology, so-called component frameworks have also emerged. In contrast to object-oriented frameworks, however, a suitable process model for component frameworks is currently lacking. To improve the development process for component frameworks, a novel approach, ProMoCF ("Process Model for Component Frameworks"), has been developed. It is a lightweight process model and development methodology for component frameworks. The process model was created alongside the development of the MM4U framework, to the mutual benefit of both. The MM4U framework does not reinvent the adaptation of multimedia content; rather, it aims to unify and embed existing research approaches and solutions in the field of multimedia personalization. With such a framework at hand, application developers can, for the first time, efficiently and easily realize the dynamic creation of their personalized multimedia content.

    DeweyIDs -- The Key to Fine-Grained Management of XML Documents

    Because XML documents tend to be very large and are more and more collaboratively processed, their fine-grained storage and management is a must for which, in turn, a flexible tree representation is mandatory. Performance requirements dictate efficient query and update processing in multi-user environments. For this reason, three aspects are of particular importance: index support to directly access each internal document node if needed; navigation along the parent, child, and sibling axes; and selective and direct locking of minimal document granules. The key to effectively accelerating all three is DeweyIDs. They identify the tree nodes, avoid relabeling them even under heavy node insertions and deletions, and allow, at the same time, the derivation of all ancestor node IDs without accessing the document. In this paper, we explore the concept of DeweyIDs, refine the ORDPATH addressing scheme, illustrate its implementation, and give an exhaustive performance evaluation of its practical use.
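    The property that all ancestor IDs can be derived from a node's own ID, without touching the document, follows from prefix-based labels. The sketch below uses a simplified Dewey-style label, a tuple of integers written as dotted text such as "1.3.5.7". It is an assumption for illustration only: it leaves out the odd/even division rules that ORDPATH-style schemes use to avoid relabeling on insertion, and the helper names are hypothetical.

```python
from typing import List, Tuple

# Simplified Dewey-style label: one integer per tree level (assumption).
DeweyID = Tuple[int, ...]


def parse(label: str) -> DeweyID:
    """Parse dotted text such as "1.3.5" into a tuple of integers."""
    return tuple(int(part) for part in label.split("."))


def ancestors(node: DeweyID) -> List[DeweyID]:
    """Return all proper ancestors, root first, using only the label."""
    return [node[:depth] for depth in range(1, len(node))]


def is_ancestor(candidate: DeweyID, node: DeweyID) -> bool:
    """True if candidate is a proper prefix of node."""
    return len(candidate) < len(node) and node[:len(candidate)] == candidate


node = parse("1.3.5.7")
print(ancestors(node))
print(is_ancestor(parse("1.3"), node))
# Tuple comparison orders these simplified labels in document order.
print(sorted([parse("1.3.5"), parse("1.3"), parse("1.1.9")]))
```

    Because ancestors are plain prefixes, a lock manager or index could, under these assumptions, compute the IDs of every ancestor granule from the target node's ID alone.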