255 research outputs found

    Intuitionistic fuzzy XML query matching and rewriting

    With the emergence of XML as a standard for data representation, particularly on the web, interest in intelligent query languages that can operate on structurally heterogeneous XML documents has grown considerably. Traditional Information Retrieval and Database approaches have limitations when dealing with such scenarios. Therefore, fuzzy (flexible) approaches have become predominant. In this thesis, we propose a new approach for approximate XML query matching and rewriting which aims at achieving soft matching of XML queries with XML data sources that follow different schemas. Unlike traditional querying approaches, which require exact matching, the proposed approach makes use of Intuitionistic Fuzzy Trees to achieve approximate (soft) query matching. Through this new approach, not only the exact answer to a query but also approximate answers are retrieved. Furthermore, partial results can be obtained from multiple data sources and merged to produce a single answer to a query. The proposed approach introduces a new tree similarity measure that considers the minimum and maximum degrees of similarity/inclusion of trees, based on arc matching. New techniques for soft node and arc matching are presented for matching queries against data sources with highly varied structures. A prototype was developed to test the proposed ideas; it demonstrated the ability to achieve approximate matching of pattern queries against a number of XML schemas and to rewrite the original query so that it obtains results from the underlying data sources. This was achieved through several novel algorithms, which were tested and shown to be efficient, with low CPU and memory cost even for a large number of data sources

    Bounded repairability for regular tree languages

    We study the problem of bounded repairability of a given restriction tree language R into a target tree language T. More precisely, we say that R is bounded repairable w.r.t. T if there exists a bound on the number of standard tree editing operations necessary to apply to any tree in R in order to obtain a tree in T. We consider a number of possible specifications for tree languages: bottom-up tree automata (on curry encoding of unranked trees) that capture the class of XML Schemas and DTDs. We also consider a special case when the restriction language R is universal, i.e., contains all trees over a given alphabet. We give an effective characterization of bounded repairability between pairs of tree languages represented with automata. This characterization introduces two tools, synopsis trees and a coverage relation between them, allowing one to reason about tree languages that undergo a bounded number of editing operations. We then employ this characterization to provide upper bounds to the complexity of deciding bounded repairability and we show that these bounds are tight. In particular, when the input tree languages are specified with arbitrary bottom-up automata, the problem is coNEXPTIME-complete. The problem remains coNEXPTIME-complete even if we use deterministic non-recursive DTDs to specify the input languages. The complexity of the problem can be reduced if we assume that the alphabet, the set of node labels, is fixed: the problem becomes PSPACE-complete for non-recursive DTDs and coNP-complete for deterministic non-recursive DTDs. Finally, when the restriction tree language R is universal, we show that the bounded repairability problem becomes EXPTIME-complete if the target language is specified by an arbitrary bottom-up tree automaton and becomes tractable (PTIME-complete, in fact) when a deterministic bottom-up automaton is used

    XML Schema Clustering with Semantic and Hierarchical Similarity Measures

    With the growing popularity of XML as a data representation language, collections of XML data have grown rapidly in number. Methods are needed to manage these collections and to discover useful information in them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into groups. The methodology considers not only the linguistic properties and the context of the elements but also their hierarchical structural similarity. We support our findings with experiments and analysis

    Bounded repairability for regular tree languages

    We consider the problem of repairing unranked trees (e.g., XML documents) satisfying a given restriction specification R (e.g., a DTD) into unranked trees satisfying a given target specification T. Specifically, we focus on the question of whether one can get from any tree in a regular language R to some tree in another regular language T with a finite, uniformly bounded, number of edit operations (i.e., deletions and insertions of nodes). We give effective characterizations of the pairs of specifications R and T for which such a uniform bound exists, and we study the complexity of the problem under different representations of the regular tree languages (e.g., non-deterministic stepwise automata, deterministic stepwise automata, DTDs). Finally, we point out some connections with the analogous problem for regular languages of words

    Similarity of XML Data

    XML is becoming an increasingly important format for storing and exchanging data. Evaluating the similarity of XML data plays a crucial role in storing, processing and manipulating data efficiently. This work deals with the possibility of evaluating the similarity of DTDs. First, a suitable tree representation of DTDs is designed. Next, an algorithm based on the tree edit distance technique is proposed. Finally, we focus on various aspects of similarity, such as structural and linguistic information, and integrate them into our method. Department of Software Engineering, Faculty of Mathematics and Physics
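    To make the tree edit distance idea concrete, here is a minimal sketch of a simplified top-down variant (in the style of Selkow's distance), in which a node may be relabelled and whole subtrees may be inserted or deleted. It is not the algorithm proposed in the work above; the tuple-based tree representation, the unit costs, and the names `size` and `distance` are assumptions made for illustration only.

```python
from functools import lru_cache

# Hypothetical representation: a tree is (label, (child, child, ...)).
# Nested tuples are hashable, so distances between subtrees can be cached.

def size(tree):
    """Number of nodes in the tree; used as the cost of inserting or deleting it."""
    _label, children = tree
    return 1 + sum(size(child) for child in children)

@lru_cache(maxsize=None)
def distance(a, b):
    """Top-down edit distance: relabel the roots, then align the child lists."""
    (label_a, kids_a), (label_b, kids_b) = a, b
    relabel = 0 if label_a == label_b else 1

    rows, cols = len(kids_a) + 1, len(kids_b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):          # delete the first i subtrees of a
        d[i][0] = d[i - 1][0] + size(kids_a[i - 1])
    for j in range(1, cols):          # insert the first j subtrees of b
        d[0][j] = d[0][j - 1] + size(kids_b[j - 1])
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + size(kids_a[i - 1]),                # delete subtree
                d[i][j - 1] + size(kids_b[j - 1]),                # insert subtree
                d[i - 1][j - 1] + distance(kids_a[i - 1], kids_b[j - 1]),
            )
    return relabel + d[-1][-1]

# Toy DTD-like trees (assumed example data, not taken from the work).
book = ("book", (("title", ()), ("author", ())))
book_with_year = ("book", (("title", ()), ("author", ()), ("year", ())))
publication = ("publication", (("title", ()),))
```

    A linguistic component, as mentioned in the abstract, could be approximated by replacing the 0/1 relabel cost with a score derived from label similarity; that extension is left out of this sketch.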

    Exploring and visualizing the “Alma” of XML documents

    In this paper we introduce eXVisXML, a visual tool for exploring documents annotated with the mark-up language XML, in order to easily perform tasks such as knowledge extraction or document engineering over them. eXVisXML was designed mainly for two kinds of users. The first group consists of those who want to analyze an annotated document to explore the information it contains; for them, a visual inspection tool can be of great help, and a slicing functionality can be an effective complement. The other target group is composed of document engineers who might be interested in assessing the quality of the annotation created. This can be achieved by measuring parameters that allow the elements and attributes of the DTD/Schema to be compared against those effectively used in the document instances. Both functionalities, and the way they were designed and implemented, are discussed throughout the paper.FC
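    The comparison of declared against effectively used elements can be illustrated with a short sketch. This is not how eXVisXML is implemented; the function names `used_names` and `coverage`, the sample declarations and the sample document are all hypothetical, and the set of declared names is assumed to have been extracted from the DTD/Schema beforehand.

```python
import xml.etree.ElementTree as ET

def used_names(xml_text):
    """Collect the element names and (element, attribute) pairs used in a document."""
    root = ET.fromstring(xml_text)
    elements, attributes = set(), set()
    for node in root.iter():              # iter() visits the root and all descendants
        elements.add(node.tag)
        attributes.update((node.tag, name) for name in node.attrib)
    return elements, attributes

def coverage(declared, used):
    """Return (share of declared names in use, unused names, undeclared names)."""
    unused = declared - used
    undeclared = used - declared
    ratio = len(declared & used) / len(declared) if declared else 1.0
    return ratio, unused, undeclared

# Assumed example data: names a DTD might declare, and one document instance.
declared_elements = {"article", "title", "author", "abstract", "keyword"}
sample_doc = (
    "<article><title>T</title><author id='a1'>A</author>"
    "<note>n</note></article>"
)

elements, attributes = used_names(sample_doc)
ratio, unused, undeclared = coverage(declared_elements, elements)
```

    Here `unused` lists declared elements that never occur in the instance, while `undeclared` flags elements that occur without a declaration; both are simple indicators of how closely the annotation follows its schema.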

    An Algorithm for Detecting and Correcting XSLT Rules Affected by Schema Updates

    Thesis (Master of Science in Informatics)--University of Tsukuba, no. 37776, 2017.3.2

    Visual exploration and retrieval of XML document collections with the generic system X2

    This article reports on the XML retrieval system X2 which has been developed at the University of Munich over the last five years. In a typical session with X2, the user first browses a structural summary of the XML database in order to select interesting elements and keywords occurring in documents. Using this intermediate result, queries combining structure and textual references are composed semiautomatically. After query evaluation, the full set of answers is presented in a visual and structured way. X2 largely exploits the structure found in documents, queries and answers to enable new interactive visualization and exploration techniques that support mixed IR and database-oriented querying, thus bridging the gap between these three views on the data to be retrieved. Another salient characteristic of X2 which distinguishes it from other visual query systems for XML is that it supports various degrees of detailedness in the presentation of answers, as well as techniques for dynamically reordering and grouping retrieved elements once the complete answer set has been computed