1,060 research outputs found

    Investigation into Indexing XML Data Techniques

    Get PDF
    The rapid development of XML technology improves the WWW, since the XML data has many advantages and has become a common technology for transferring data cross the internet. Therefore, the objective of this research is to investigate and study the XML indexing techniques in terms of their structures. The main goal of this investigation is to identify the main limitations of these techniques and any other open issues. Furthermore, this research considers most common XML indexing techniques and performs a comparison between them. Subsequently, this work makes an argument to find out these limitations. To conclude, the main problem of all the XML indexing techniques is the trade-off between the size and the efficiency of the indexes. So, all the indexes become large in order to perform well, and none of them is suitable for all users’ requirements. However, each one of these techniques has some advantages in somehow

    Content-Aware DataGuides for Indexing Large Collections of XML Documents

    Get PDF
    XML is well-suited for modelling structured data with textual content. However, most indexing approaches perform structure and content matching independently, combining the retrieved path and keyword occurrences in a third step. This paper shows that retrieval in XML documents can be accelerated significantly by processing text and structure simultaneously during all retrieval phases. To this end, the Content-Aware DataGuide (CADG) enhances the wellknown DataGuide with (1) simultaneous keyword and path matching and (2) a precomputed content/structure join. Extensive experiments prove the CADG to be 50-90% faster than the DataGuide for various sorts of query and document, including difficult cases such as poorly structured queries and recursive document paths. A new query classification scheme identifies precise query characteristics with a predominant influence on the performance of the individual indices. The experiments show that the CADG is applicable to many real-world applications, in particular large collections of heterogeneously structured XML documents

    XIST: An XML Index Selection Tool

    Full text link

    Intuitionistic fuzzy XML query matching and rewriting

    Get PDF
    With the emergence of XML as a standard for data representation, particularly on the web, the need for intelligent query languages that can operate on XML documents with structural heterogeneity has recently gained a lot of popularity. Traditional Information Retrieval and Database approaches have limitations when dealing with such scenarios. Therefore, fuzzy (flexible) approaches have become the predominant. In this thesis, we propose a new approach for approximate XML query matching and rewriting which aims at achieving soft matching of XML queries with XML data sources following different schemas. Unlike traditional querying approaches, which require exact matching, the proposed approach makes use of Intuitionistic Fuzzy Trees to achieve approximate (soft) query matching. Through this new approach, not only the exact answer of a query, but also approximate answers are retrieved. Furthermore, partial results can be obtained from multiple data sources and merged together to produce a single answer to a query. The proposed approach introduced a new tree similarity measure that considers the minimum and maximum degrees of similarity/inclusion of trees that are based on arc matching. New techniques for soft node and arc matching were presented for matching queries against data sources with highly varied structures. A prototype was developed to test the proposed ideas and it proved the ability to achieve approximate matching for pattern queries with a number of XML schemas and rewrite the original query so that it obtain results from the underlying data sources. This has been achieved through several novel algorithms which were tested and proved efficiency and low CPU/Memory cost even for big number of data sources

    The ViP2P Platform: XML Views in P2P

    Get PDF
    The growing volumes of XML data sources on the Web or produced by enterprises, organizations etc. raise many performance challenges for data management applications. In this work, we are concerned with the distributed, peer-to-peer management of large corpora of XML documents, based on distributed hash table (or DHT, in short) overlay networks. We present ViP2P (standing for Views in Peer-to-Peer), a distributed platform for sharing XML documents based on a structured P2P network infrastructure (DHT). At the core of ViP2P stand distributed materialized XML views, defined by arbitrary XML queries, filled in with data published anywhere in the network, and exploited to efficiently answer queries issued by any network peer. ViP2P allows user queries to be evaluated over XML documents published by peers in two modes. First, a long-running subscription mode, when a query can be registered in the system and receive answers incrementally when and if published data matches the query. Second, queries can also be asked in an ad-hoc, snapshot mode, where results are required immediately and must be computed based on the results of other long-running, subscription queries. ViP2P innovates over other similar DHT-based XML sharing platforms by using a very expressive structured XML query language. This expressivity leads to a very flexible distribution of XML content in the ViP2P network, and to efficient snapshot query execution. ViP2P has been tested in real deployments of hundreds of computers. We present the platform architecture, its internal algorithms, and demonstrate its efficiency and scalability through a set of experiments. Our experimental results outgrow by orders of magnitude similar competitor systems in terms of data volumes, network size and data dissemination throughput.Comment: RR-7812 (2011

    Reasoning & Querying – State of the Art

    Get PDF
    Various query languages for Web and Semantic Web data, both for practical use and as an area of research in the scientific community, have emerged in recent years. At the same time, the broad adoption of the internet where keyword search is used in many applications, e.g. search engines, has familiarized casual users with using keyword queries to retrieve information on the internet. Unlike this easy-to-use querying, traditional query languages require knowledge of the language itself as well as of the data to be queried. Keyword-based query languages for XML and RDF bridge the gap between the two, aiming at enabling simple querying of semi-structured data, which is relevant e.g. in the context of the emerging Semantic Web. This article presents an overview of the field of keyword querying for XML and RDF

    AT-GIS: highly parallel spatial query processing with associative transducers

    Get PDF
    Users in many domains, including urban planning, transportation, and environmental science want to execute analytical queries over continuously updated spatial datasets. Current solutions for largescale spatial query processing either rely on extensions to RDBMS, which entails expensive loading and indexing phases when the data changes, or distributed map/reduce frameworks, running on resource-hungry compute clusters. Both solutions struggle with the sequential bottleneck of parsing complex, hierarchical spatial data formats, which frequently dominates query execution time. Our goal is to fully exploit the parallelism offered by modern multicore CPUs for parsing and query execution, thus providing the performance of a cluster with the resources of a single machine. We describe AT-GIS, a highly-parallel spatial query processing system that scales linearly to a large number of CPU cores. ATGIS integrates the parsing and querying of spatial data using a new computational abstraction called associative transducers(ATs). ATs can form a single data-parallel pipeline for computation without requiring the spatial input data to be split into logically independent blocks. Using ATs, AT-GIS can execute, in parallel, spatial query operators on the raw input data in multiple formats, without any pre-processing. On a single 64-core machine, AT-GIS provides 3Ă— the performance of an 8-node Hadoop cluster with 192 cores for containment queries, and 10Ă— for aggregation queries
    • …
    corecore