1,114 research outputs found

    Tree Signatures for XML Querying and Navigation

    Full text link

    Content-Aware DataGuides for Indexing Large Collections of XML Documents

    Get PDF
    XML is well-suited for modelling structured data with textual content. However, most indexing approaches perform structure and content matching independently, combining the retrieved path and keyword occurrences in a third step. This paper shows that retrieval in XML documents can be accelerated significantly by processing text and structure simultaneously during all retrieval phases. To this end, the Content-Aware DataGuide (CADG) enhances the wellknown DataGuide with (1) simultaneous keyword and path matching and (2) a precomputed content/structure join. Extensive experiments prove the CADG to be 50-90% faster than the DataGuide for various sorts of query and document, including difficult cases such as poorly structured queries and recursive document paths. A new query classification scheme identifies precise query characteristics with a predominant influence on the performance of the individual indices. The experiments show that the CADG is applicable to many real-world applications, in particular large collections of heterogeneously structured XML documents

    Matching Subsequences in Trees

    Full text link
    Given two rooted, labeled trees PP and TT the tree path subsequence problem is to determine which paths in PP are subsequences of which paths in TT. Here a path begins at the root and ends at a leaf. In this paper we propose this problem as a useful query primitive for XML data, and provide new algorithms improving the previously best known time and space bounds.Comment: Minor correction of typos, et

    An extended preorder index for optimising XPath expressions

    Get PDF
    Many of the problems with native XML databases relate to query performance and subsequently, it can be difficult to convince traditional database users of the benefits of using semi- or unstructured databases. Presently, there still lacks an index structure providing efficient support for structural queries and the traditional data-centric and content queries. This paper presents an extended index structure based on the preorder traversal rank and the level (or depth) rank of each node in a document tree. The extended index fully supports the navigation of all XPath axes while efficiently supporting data-centric queries. The ability to start path traversals from arbitrary nodes in a document tree also enables the extended index to support the evaluation of path traversals embedded in XQuery expressions. Furthermore, an encoding technique is presented where properties of the level ranking may be exploited to provide efficient and optimised level-based XPath evaluations

    TREE-D-SEEK: A Framework for Retrieving Three-Dimensional Scenes

    Get PDF
    In this dissertation, a strategy and framework for retrieving 3D scenes is proposed. The strategy is to retrieve 3D scenes based on a unified approach for indexing content from disparate information sources and information levels. The TREE-D-SEEK framework implements the proposed strategy for retrieving 3D scenes and is capable of indexing content from a variety of corpora at distinct information levels. A semantic annotation model for indexing 3D scenes in the TREE-D-SEEK framework is also proposed. The semantic annotation model is based on an ontology for rapid prototyping of 3D virtual worlds. With ongoing improvements in computer hardware and 3D technology, the cost associated with the acquisition, production and deployment of 3D scenes is decreasing. As a consequence, there is a need for efficient 3D retrieval systems for the increasing number of 3D scenes in corpora. An efficient 3D retrieval system provides several benefits such as enhanced sharing and reuse of 3D scenes and 3D content. Existing 3D retrieval systems are closed systems and provide search solutions based on a predefined set of indexing and matching algorithms Existing 3D search systems and search solutions cannot be customized for specific requirements, type of information source and information level. In this research, TREE-D-SEEK—an open, extensible framework for retrieving 3D scenes—is proposed. The TREE-D-SEEK framework is capable of retrieving 3D scenes based on indexing low level content to high-level semantic metadata. The TREE-D-SEEK framework is discussed from a software architecture perspective. The architecture is based on a common process flow derived from indexing disparate information sources. Several indexing and matching algorithms are implemented. Experiments are conducted to evaluate the usability and performance of the framework. Retrieval performance of the framework is evaluated using benchmarks and manually collected corpora. A generic, semantic annotation model is proposed for indexing a 3D scene. The primary objective of using the semantic annotation model in the TREE-D-SEEK framework is to improve retrieval relevance and to support richer queries within a 3D scene. The semantic annotation model is driven by an ontology. The ontology is derived from a 3D rapid prototyping framework. The TREE-D-SEEK framework supports querying by example, keyword based and semantic annotation based query types for retrieving 3D scenes

    Internet based molecular collaborative and publishing tools

    No full text
    The scientific electronic publishing model has hitherto been an Internet based delivery of electronic articles that are essentially replicas of their paper counterparts. They contain little in the way of added semantics that may better expose the science, assist the peer review process and facilitate follow on collaborations, even though the enabling technologies have been around for some time and are mature. This thesis will examine the evolution of chemical electronic publishing over the past 15 years. It will illustrate, which the help of two frameworks, how publishers should be exploiting technologies to improve the semantics of chemical journal articles, namely their value added features and relationships with other chemical resources on the Web. The first framework is an early exemplar of structured and scalable electronic publishing where a Web content management system and a molecular database are integrated. It employs a test bed of articles from several RSC journals and supporting molecular coordinate and connectivity information. The value of converting 3D molecular expressions in chemical file formats, such as the MOL file, into more generic 3D graphics formats, such as Web3D, is assessed. This exemplar highlights the use of metadata management for bidirectional hyperlink maintenance in electronic publishing. The second framework repurposes this metadata management concept into a Semantic Web application called SemanticEye. SemanticEye demonstrates how relationships between chemical electronic articles and other chemical resources are established. It adapts the successful semantic model used for digital music metadata management by popular applications such as iTunes. Globally unique identifiers enable relationships to be established between articles and other resources on the Web and SemanticEye implements two: the Document Object Identifier (DOI) for articles and the IUPAC International Chemical Identifier (InChI) for molecules. SemanticEye’s potential as a framework for seeding collaborations between researchers, who have hitherto never met, is explored using FOAF, the friend-of-a-friend Semantic Web standard for social networks

    Preface of the Proceedings of WRAP 2004

    Get PDF
    • 

    corecore