5,802 research outputs found

    XML Schema Clustering with Semantic and Hierarchical Similarity Measures

    Get PDF
    With the growing popularity of XML as the data representation language, collections of the XML data are exploded in numbers. The methods are required to manage and discover the useful information from them for the improved document handling. We present a schema clustering process by organising the heterogeneous XML schemas into various groups. The methodology considers not only the linguistic and the context of the elements but also the hierarchical structural similarity. We support our findings with experiments and analysis

    Querying XML data streams from wireless sensor networks: an evaluation of query engines

    Get PDF
    As the deployment of wireless sensor networks increase and their application domain widens, the opportunity for effective use of XML filtering and streaming query engines is ever more present. XML filtering engines aim to provide efficient real-time querying of streaming XML encoded data. This paper provides a detailed analysis of several such engines, focusing on the technology involved, their capabilities, their support for XPath and their performance. Our experimental evaluation identifies which filtering engine is best suited to process a given query based on its properties. Such metrics are important in establishing the best approach to filtering XML streams on-the-fly

    From Regular Expression Matching to Parsing

    Full text link
    Given a regular expression RR and a string QQ, the regular expression parsing problem is to determine if QQ matches RR and if so, determine how it matches, e.g., by a mapping of the characters of QQ to the characters in RR. Regular expression parsing makes finding matches of a regular expression even more useful by allowing us to directly extract subpatterns of the match, e.g., for extracting IP-addresses from internet traffic analysis or extracting subparts of genomes from genetic data bases. We present a new general techniques for efficiently converting a large class of algorithms that determine if a string QQ matches regular expression RR into algorithms that can construct a corresponding mapping. As a consequence, we obtain the first efficient linear space solutions for regular expression parsing

    Static and dynamic semantics of NoSQL languages

    Get PDF
    We present a calculus for processing semistructured data that spans differences of application area among several novel query languages, broadly categorized as "NoSQL". This calculus lets users define their own operators, capturing a wider range of data processing capabilities, whilst providing a typing precision so far typical only of primitive hard-coded operators. The type inference algorithm is based on semantic type checking, resulting in type information that is both precise, and flexible enough to handle structured and semistructured data. We illustrate the use of this calculus by encoding a large fragment of Jaql, including operations and iterators over JSON, embedded SQL expressions, and co-grouping, and show how the encoding directly yields a typing discipline for Jaql as it is, namely without the addition of any type definition or type annotation in the code

    Towards a query language for annotation graphs

    Get PDF
    The multidimensional, heterogeneous, and temporal nature of speech databases raises interesting challenges for representation and query. Recently, annotation graphs have been proposed as a general-purpose representational framework for speech databases. Typical queries on annotation graphs require path expressions similar to those used in semistructured query languages. However, the underlying model is rather different from the customary graph models for semistructured data: the graph is acyclic and unrooted, and both temporal and inclusion relationships are important. We develop a query language and describe optimization techniques for an underlying relational representation.Comment: 8 pages, 10 figure

    Compressed materialised views of semi-structured data

    Get PDF
    Query performance issues over semi-structured data have led to the emergence of materialised XML views as a means of restricting the data structure processed by a query. However preserving the conventional representation of such views remains a significant limiting factor especially in the context of mobile devices where processing power, memory usage and bandwidth are significant factors. To explore the concept of a compressed materialised view, we extend our earlier work on structural XML compression to produce a combination of structural summarisation and data compression techniques. These techniques provide a basis for efficiently dealing with both structural queries and valuebased predicates. We evaluate the effectiveness of such a scheme, presenting results and performance measures that show advantages of using such structures

    Enhancing the EAST-ADL error model with HiP-HOPS semantics

    Get PDF
    EAST-ADL is a domain-specific modelling language for the engineering of automotive embedded systems. The language has abstractions that enable engineers to capture a variety of information about design in the course of the lifecycle — from requirements to detailed design of hardware and software architectures. The specification of the EAST-ADL language includes an error model extension which documents language structures that allow potential failures of design elements to be specified locally. The effects of these failures are then later assessed in the context of the architecture design. To provide this type of useful assessment, a language and a specification are not enough; a compiler-like tool that can read and operate on a system specification together with its error model is needed. In this paper we integrate the error model of EAST-ADL with the precise semantics of HiP-HOPS — a state-of-the-art tool that enables dependability analysis and optimization of design models. We present the integration concept between EAST-ADL structure and HiP-HOPS error propagation logic and its transformation into the HiP-HOPS model. Source and destination models are represented using the corresponding XML formats. The connection of these two models at tool level enables practical EAST-ADL designs of embedded automotive systems to be analysed in terms of dependability, i.e. safety, reliability and availability. In addition, the information encoded in the error model can be re-used across different contexts of application with the associated benefits for cost reduction, simplification, and rationalisation of dependability assessments in complex engineering designs

    Fast and Compact Regular Expression Matching

    Get PDF
    We study 4 problems in string matching, namely, regular expression matching, approximate regular expression matching, string edit distance, and subsequence indexing, on a standard word RAM model of computation that allows logarithmic-sized words to be manipulated in constant time. We show how to improve the space and/or remove a dependency on the alphabet size for each problem using either an improved tabulation technique of an existing algorithm or by combining known algorithms in a new way

    Web Queries: From a Web of Data to a Semantic Web?

    Get PDF
    • 

    corecore