5,802 research outputs found
XML Schema Clustering with Semantic and Hierarchical Similarity Measures
With the growing popularity of XML as the data representation language, collections of the XML data are exploded in numbers. The methods are required to manage and discover the useful information from them for the improved document handling. We present a schema clustering process by organising the heterogeneous XML schemas into various groups. The methodology considers not only the linguistic and the context of the elements but also the hierarchical structural similarity. We support our findings with experiments and analysis
Querying XML data streams from wireless sensor networks: an evaluation of query engines
As the deployment of wireless sensor networks increase and their application domain widens, the opportunity for effective use of XML filtering and streaming query engines is ever more present. XML filtering engines aim to provide efficient real-time querying of streaming XML encoded data. This paper provides a detailed analysis of several such engines, focusing on the technology involved, their capabilities, their support for XPath and their performance. Our experimental evaluation identifies which filtering engine is best suited to process a given query based on its properties. Such metrics are important in establishing the best approach to filtering XML streams on-the-fly
From Regular Expression Matching to Parsing
Given a regular expression and a string , the regular expression
parsing problem is to determine if matches and if so, determine how it
matches, e.g., by a mapping of the characters of to the characters in .
Regular expression parsing makes finding matches of a regular expression even
more useful by allowing us to directly extract subpatterns of the match, e.g.,
for extracting IP-addresses from internet traffic analysis or extracting
subparts of genomes from genetic data bases. We present a new general
techniques for efficiently converting a large class of algorithms that
determine if a string matches regular expression into algorithms that
can construct a corresponding mapping. As a consequence, we obtain the first
efficient linear space solutions for regular expression parsing
Static and dynamic semantics of NoSQL languages
We present a calculus for processing semistructured data that spans
differences of application area among several novel query languages, broadly
categorized as "NoSQL". This calculus lets users define their own operators,
capturing a wider range of data processing capabilities, whilst providing a
typing precision so far typical only of primitive hard-coded operators. The
type inference algorithm is based on semantic type checking, resulting in type
information that is both precise, and flexible enough to handle structured and
semistructured data. We illustrate the use of this calculus by encoding a large
fragment of Jaql, including operations and iterators over JSON, embedded SQL
expressions, and co-grouping, and show how the encoding directly yields a
typing discipline for Jaql as it is, namely without the addition of any type
definition or type annotation in the code
Towards a query language for annotation graphs
The multidimensional, heterogeneous, and temporal nature of speech databases
raises interesting challenges for representation and query. Recently,
annotation graphs have been proposed as a general-purpose representational
framework for speech databases. Typical queries on annotation graphs require
path expressions similar to those used in semistructured query languages.
However, the underlying model is rather different from the customary graph
models for semistructured data: the graph is acyclic and unrooted, and both
temporal and inclusion relationships are important. We develop a query language
and describe optimization techniques for an underlying relational
representation.Comment: 8 pages, 10 figure
Compressed materialised views of semi-structured data
Query performance issues over semi-structured data have led to the emergence of materialised XML views as a means of restricting the data structure processed by a query. However preserving the conventional representation of such views remains a significant limiting factor especially in the context of mobile devices where processing power, memory usage and bandwidth are significant factors. To explore the concept of a compressed materialised view, we extend our earlier work on structural XML compression to produce a combination of structural summarisation and data compression techniques. These techniques provide a basis for efficiently dealing with both structural queries and valuebased predicates. We evaluate the effectiveness of such a scheme, presenting results and performance measures that show advantages of using such structures
Enhancing the EAST-ADL error model with HiP-HOPS semantics
EAST-ADL is a domain-specific modelling language for the engineering of automotive embedded systems. The language has abstractions that enable engineers to capture a variety of information about design in the course of the lifecycle â from requirements to detailed design of hardware and software architectures. The specification of the EAST-ADL language includes an error model extension which documents language structures that allow potential failures of design elements to be specified locally. The effects of these failures are then later assessed in the context of the architecture design. To provide this type of useful assessment, a language and a specification are not enough; a compiler-like tool that can read and operate on a system specification together with its error model is needed. In this paper we integrate the error model of EAST-ADL with the precise semantics of HiP-HOPS â a state-of-the-art tool that enables dependability analysis and optimization of design models. We present the integration concept between EAST-ADL structure and HiP-HOPS error propagation logic and its transformation into the HiP-HOPS model. Source and destination models are represented using the corresponding XML formats. The connection of these two models at tool level enables practical EAST-ADL designs of embedded automotive systems to be analysed in terms of dependability, i.e. safety, reliability and availability. In addition, the information encoded in the error model can be re-used across different contexts of application with the associated benefits for cost reduction, simplification, and rationalisation of dependability assessments in complex engineering designs
Fast and Compact Regular Expression Matching
We study 4 problems in string matching, namely, regular expression matching,
approximate regular expression matching, string edit distance, and subsequence
indexing, on a standard word RAM model of computation that allows
logarithmic-sized words to be manipulated in constant time. We show how to
improve the space and/or remove a dependency on the alphabet size for each
problem using either an improved tabulation technique of an existing algorithm
or by combining known algorithms in a new way
- âŠ