    Reconstructing Provenance

    BSML: A Binding Schema Markup Language for Data Interchange in Problem Solving Environments (PSEs)

    We describe a binding schema markup language (BSML) for describing data interchange between scientific codes. Such a facility is an important constituent of scientific problem solving environments (PSEs). BSML is designed to integrate with a PSE or application composition system that treats model specification and execution as a problem of managing semistructured data. It addresses the data interchange problem with three techniques for processing semistructured data: validation, binding, and conversion. We present BSML and describe its application to a PSE for wireless communications system design.
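To make the three interchange steps concrete, here is a minimal Python sketch of validation, binding, and conversion. It does not use BSML syntax: the schema dictionary, the field names (`frequency_hz`, `num_antennas`, `label`), and the MHz conversion are hypothetical stand-ins for what a binding schema and a consuming code might specify.

```python
import xml.etree.ElementTree as ET

# Hypothetical binding schema (NOT BSML syntax): field name -> (type converter, required?).
SCHEMA = {
    "frequency_hz": (float, True),
    "num_antennas": (int, True),
    "label": (str, False),
}


def validate(elem, schema):
    """Validation: reject input that lacks any required field."""
    missing = [name for name, (_, required) in schema.items()
               if required and elem.find(name) is None]
    if missing:
        raise ValueError(f"missing required fields: {missing}")


def bind(elem, schema):
    """Binding: map each present field's text onto a typed Python value."""
    bound = {}
    for name, (convert, _) in schema.items():
        child = elem.find(name)
        if child is not None and child.text is not None:
            bound[name] = convert(child.text.strip())
    return bound


def convert_for_consumer(bound):
    """Conversion: reshape bound values for a hypothetical downstream code
    that expects (frequency in MHz, antenna count)."""
    return (bound["frequency_hz"] / 1e6, bound["num_antennas"])


doc = ET.fromstring(
    "<config><frequency_hz>2400000000</frequency_hz>"
    "<num_antennas>4</num_antennas></config>"
)
validate(doc, SCHEMA)
values = bind(doc, SCHEMA)
print(convert_for_consumer(values))  # (2400.0, 4)
```

Keeping the three steps as separate functions mirrors the separation in the abstract: a document can be validated without being bound, and one bound record can feed several conversions for different consumers.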

    Storing XML Documents in Databases

    The authors introduce concepts for loading large numbers of XML documents into databases where the documents are stored and maintained. The goal is to make XML databases as unobtrusive as possible in multi-tier systems while providing as many of the services defined by the XML standards as possible. The ubiquity of XML has sparked great interest in deploying concepts known from relational database management systems, such as declarative query languages, transactions, indexes, and integrity constraints. This chapter presents how bulkloading is done in Monet XML, a main-memory XML database system, and evaluates the cost of bulkloading and bulk deletion against strategies based on inserting and deleting individual nodes. Additionally, we survey the applicability of the techniques to a wider class of XML storage schemas.
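The chapter's Monet XML storage schema is not reproduced here. As an illustration only, the sketch below assumes a generic pre/size/level node encoding and uses SQLite (Python's `sqlite3`) as a stand-in relational store: a document is shredded in one pass and loaded with a single batched `executemany` call inside one transaction, rather than with one insert per node. `shred` and `bulkload` are hypothetical helpers.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(root):
    """Flatten an XML tree into (pre, size, level, tag, text) rows.

    Assumed encoding: pre = preorder rank, size = number of descendants,
    level = depth. This is a generic scheme, not Monet XML's own schema.
    """
    rows = []

    def visit(elem, level):
        pre = len(rows)
        rows.append(None)  # placeholder until the subtree size is known
        for child in elem:
            visit(child, level + 1)
        size = len(rows) - pre - 1
        rows[pre] = (pre, size, level, elem.tag, (elem.text or "").strip())

    visit(root, 0)
    return rows


def bulkload(conn, xml_text):
    """Shred once, then load all rows with one batched INSERT in one transaction."""
    rows = shred(ET.fromstring(xml_text))
    with conn:
        conn.executemany(
            "INSERT INTO node(pre, size, level, tag, text) VALUES (?, ?, ?, ?, ?)",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node(pre INTEGER, size INTEGER, level INTEGER, "
             "tag TEXT, text TEXT)")
print(bulkload(conn, "<lib><book><title>XML</title></book><book/></lib>"))  # 4
```

Issuing a separate `INSERT` (and commit) per node instead would correspond roughly to the node-at-a-time strategy the chapter uses as its baseline.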

    Novel Method for Measuring Structure and Semantic Similarity of XML Documents Based on Extended Adjacency Matrix

    Similarity measurement of XML documents is crucial for approximate search and document classification in XML-oriented applications. Several methods have been proposed for this purpose; nevertheless, few of them can elegantly capture both structural and semantic information and thus effectively measure the similarity of XML documents. In this paper, we present a new method for computing the structural and semantic similarity of XML documents based on an extended adjacency matrix (EAM). Unlike a general adjacency matrix, an EAM stores structural information not only between adjacent layers but also between ancestor-descendant layers. To measure the similarity of two XML documents, the proposed method first stores their structural and semantic information in two extended adjacency matrices (M1, M2) and then computes the similarity of the two documents as cos(M1, M2). Experimental results on benchmark data show that the method achieves high efficiency and accuracy.
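A minimal sketch of an EAM-style similarity, under stated assumptions: matrix rows and columns are element tag names, and a tag pair at ancestor-descendant distance d contributes weight 1/d, so parent-child pairs weigh most. This 1/d weighting and the tag-only notion of semantics are hypothetical choices, not the paper's definitions. The cosine is taken over the flattened matrices, i.e. the Frobenius inner product divided by the product of Frobenius norms.

```python
import math
import xml.etree.ElementTree as ET
from collections import defaultdict


def extended_adjacency(root):
    """Sparse EAM-like matrix {(ancestor_tag, descendant_tag): weight}.

    Assumption (hypothetical weighting): a pair at ancestor-descendant
    distance d adds 1/d, so parent-child pairs (d = 1) weigh most while
    deeper ancestor-descendant pairs still contribute.
    """
    m = defaultdict(float)

    def visit(elem, ancestors):
        # ancestors[-1] is the parent (d = 1), ancestors[-2] the grandparent (d = 2), ...
        for d, anc in enumerate(reversed(ancestors), start=1):
            m[(anc, elem.tag)] += 1.0 / d
        for child in elem:
            visit(child, ancestors + [elem.tag])

    visit(root, [])
    return m


def cosine(m1, m2):
    """cos(M1, M2) = <M1, M2>_F / (||M1||_F * ||M2||_F) over the flattened matrices."""
    dot = sum(w * m2.get(k, 0.0) for k, w in m1.items())
    n1 = math.sqrt(sum(w * w for w in m1.values()))
    n2 = math.sqrt(sum(w * w for w in m2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


doc_a = ET.fromstring("<a><b><c/></b></a>")
doc_b = ET.fromstring("<a><b/></a>")
print(round(cosine(extended_adjacency(doc_a), extended_adjacency(doc_b)), 3))  # 0.667
```

Using a sparse dictionary keyed by tag pairs avoids allocating a full |tags| x |tags| matrix; only pairs that actually occur in either document contribute to the dot product and norms.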

    Bulkloading and Maintaining XML Documents

    The popularity of XML as an exchange and storage format brings about massive amounts of documents to be stored, maintained, and analyzed, a challenge that has traditionally been tackled with database management systems (DBMS). To open up the content of XML documents to analysis with declarative query languages, efficient bulk loading techniques are necessary. Database technology has traditionally supported these tasks, yet it falls short of providing efficient automation techniques for the challenges that large collections of XML data raise. Many applications rely on relational databases, which are designed for large data volumes, as their storage back-end. This paper studies bulk load and update algorithms for XML data stored in relational format and outlines opportunities and problems. We investigate both (1) bulk insertion and deletion and (2) updates in the form of edit scripts, which rely heavily on pointer-chasing techniques that are often considered orthogonal to the algebraic operations relational databases are optimized for. To get the most out of relational database systems, we show that one should use edit scripts carefully and replace them with bulk operations if more than a very small portion of the database is updated. We implemented our ideas on top of the Monet Database System and benchmarked their performance.
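The contrast between edit scripts and bulk operations can be illustrated with subtree deletion. Assuming a pre/size/level node encoding (an assumption; the paper's Monet schema may differ), a subtree occupies a contiguous range of `pre` values, so three set-oriented SQL statements replace a node-by-node, pointer-chasing edit script. SQLite stands in for the relational back-end, and `delete_subtree_bulk` is a hypothetical helper.

```python
import sqlite3


def delete_subtree_bulk(conn, pre):
    """Delete node `pre` and its whole subtree with set-oriented SQL.

    Assumes a pre/size/level encoding (hypothetical here): the descendants of
    node n are exactly the rows with n.pre < pre <= n.pre + n.size.
    """
    (size,) = conn.execute("SELECT size FROM node WHERE pre = ?", (pre,)).fetchone()
    removed = size + 1
    with conn:  # one transaction
        # 1. The subtree is a contiguous pre range: one DELETE, no pointer chasing.
        conn.execute("DELETE FROM node WHERE pre BETWEEN ? AND ?", (pre, pre + size))
        # 2. Every ancestor's subtree shrinks by the number of removed nodes.
        conn.execute(
            "UPDATE node SET size = size - ? WHERE pre < ? AND pre + size >= ?",
            (removed, pre, pre + size),
        )
        # 3. Close the gap so following nodes keep contiguous pre ranks.
        conn.execute("UPDATE node SET pre = pre - ? WHERE pre > ?", (removed, pre + size))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node(pre INTEGER, size INTEGER, level INTEGER, tag TEXT)")
# Rows for <lib><book><title/></book><book/></lib>
conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?)",
                 [(0, 3, 0, "lib"), (1, 1, 1, "book"), (2, 0, 2, "title"), (3, 0, 1, "book")])
delete_subtree_bulk(conn, 1)
print(conn.execute("SELECT pre, size, level, tag FROM node ORDER BY pre").fetchall())
# [(0, 1, 0, 'lib'), (1, 0, 1, 'book')]
```

The cost of the three statements does not depend on how deep the deleted subtree is, whereas an edit script would visit and delete each descendant individually.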

    Novel Techniques For Model-Code Synchronization

    Current software development practice calls for efficient, model-based iterative solutions. The high costs of maintenance and evolution during the software life cycle can be reduced by tool-aided iterative development. This paper presents how model-based iterative software development can be supported through efficient model-code change propagation. The presented approach enables bi-directional synchronization between the modified source code and the refined initial models. The synchronization technique is based on three-way abstract syntax tree (AST) differencing and merging. The AST-based solution enables syntactically correct merge operations. OMG's Model-Driven Architecture describes a proposal for platform-specific model creation and source code generation. We extend this vision with a synchronization feature to support iterative development. A case study is also provided.
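The core of three-way merging can be sketched on nested Python dictionaries standing in for ASTs. This is a simplified illustration, not the paper's algorithm: nodes are matched by dictionary key (assumed to act as a stable node identity), and a node changed differently on the model side and the code side is reported as a conflict. `merge3` and `MergeConflict` are hypothetical names.

```python
class MergeConflict(Exception):
    """Raised when model and code changed the same node in different ways."""


_MISSING = object()  # marks a node absent from one version


def merge3(base, model, code, path=()):
    """Three-way merge of nested dicts standing in for ASTs (simplified sketch).

    Assumption: dict keys act as stable node identities; leaves are compared by value.
    """
    if model == code:      # same result on both sides (including both unchanged)
        return model
    if model == base:      # only the code side changed this node
        return code
    if code == base:       # only the model side changed this node
        return model
    if isinstance(model, dict) and isinstance(code, dict):
        # Both sides changed something inside this node: merge child by child.
        base_children = base if isinstance(base, dict) else {}
        merged = {}
        for key in sorted(set(base_children) | set(model) | set(code)):
            value = merge3(base_children.get(key, _MISSING),
                           model.get(key, _MISSING),
                           code.get(key, _MISSING),
                           path + (key,))
            if value is not _MISSING:  # _MISSING means the merged result deletes the node
                merged[key] = value
        return merged
    raise MergeConflict("/".join(path) or "<root>")


base = {"class Foo": {"bar": "int", "baz": "str"}}
model = {"class Foo": {"bar": "long", "baz": "str"}}              # model refined
code = {"class Foo": {"bar": "int", "baz": "str", "qux": "void"}}  # code edited
print(merge3(base, model, code))
# {'class Foo': {'bar': 'long', 'baz': 'str', 'qux': 'void'}}
```

Comparing each side against the common base is what distinguishes a three-way merge from a two-way diff: it tells which side made a change, so non-overlapping edits from the model and the code can both be kept.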

    Managing and Querying Multi-Version XML Data with Update Logging

    With the increasing popularity of storing content on the WWW and intranets in XML form, there is a growing need to control and manage this data. As this data is constantly evolving, users want to query previous versions, query changes in documents, and retrieve a particular document version efficiently. This paper proposes a version management system for XML data that can manage and query changes in an effective and meaningful manner.
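One way to picture version management with update logging is a base snapshot plus an append-only log of updates: a past version is rebuilt by replaying the log, and change queries scan a version range. The sketch below is hypothetical throughout; documents are flattened to `{node_id: text}` maps, and the `VersionedDoc` class and its operation names are illustrative, not the paper's design.

```python
class VersionedDoc:
    """Hypothetical versioned store: a base snapshot plus an append-only update log.

    Assumption: a document is flattened to {node_id: text}; each log entry is
    (version, op, node_id, text) with op in {"insert", "update", "delete"}.
    """

    OPS = ("insert", "update", "delete")

    def __init__(self, base):
        self.base = dict(base)
        self.log = []
        self.version = 0

    def commit(self, ops):
        """Record a batch of (op, node_id, text) updates as the next version."""
        for op, _, _ in ops:
            if op not in self.OPS:
                raise ValueError(f"unknown operation: {op}")
        self.version += 1
        for op, node_id, text in ops:
            self.log.append((self.version, op, node_id, text))
        return self.version

    def checkout(self, version):
        """Rebuild an earlier version by replaying the log onto the base snapshot."""
        doc = dict(self.base)
        for v, op, node_id, text in self.log:
            if v > version:
                break  # the log is in version order
            if op == "delete":
                doc.pop(node_id, None)
            else:
                doc[node_id] = text
        return doc

    def changes(self, since, until):
        """Query the logged changes in versions since < v <= until."""
        return [entry for entry in self.log if since < entry[0] <= until]


d = VersionedDoc({"n1": "Intro", "n2": "Draft"})
d.commit([("update", "n2", "Final"), ("insert", "n3", "Appendix")])
d.commit([("delete", "n1", None)])
print(d.checkout(1))    # {'n1': 'Intro', 'n2': 'Final', 'n3': 'Appendix'}
print(d.changes(1, 2))  # [(2, 'delete', 'n1', None)]
```

Replaying from the base makes old versions cheap to store but costlier to retrieve as the log grows; periodic snapshots would be one common way to bound that cost.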

    CaSePer: An efficient model for personalized web page change detection based on segmentation

    Users who visit a web page repeatedly at frequent intervals are more interested in the recent changes to the page than in its entire contents. Because web pages are increasingly dynamic, it is difficult for users to identify the changes manually. This paper proposes an enhanced model for detecting changes in web pages, called CaSePer (Change detection based on Segmentation with Personalization). Web page segmentation lets change detection operate at a finer granularity. The change detection process is made efficient by performing it in two steps. The proposed method reduces the complexity of change detection by focusing only on the segments in which changes have occurred. User-specific, personalized change detection is also incorporated into the model. The model is validated with a prototype implementation, and experiments on the prototype show a 77.8% improvement and a 97.45% accuracy rate.
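The abstract does not spell out CaSePer's two steps, so the following Python sketch only illustrates the general idea under assumed details: pages are already segmented into `{segment_id: text}` maps, step one compares per-segment hashes to skip unchanged segments, step two diffs only the changed ones with `difflib.ndiff`, and personalization is modeled as an optional set of segment IDs the user cares about. The function names are hypothetical.

```python
import difflib
import hashlib


def fingerprint(text):
    """Hash of a segment's text; cheap to store and compare between visits."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_changes(old_segments, new_segments, interests=None):
    """Segment-level change detection (illustrative two-step sketch).

    old_segments / new_segments: {segment_id: text} for two visits (assumed
    to come from some earlier segmentation step).
    interests: optional set of segment ids the user cares about (personalization).
    """
    report = {}
    for seg_id in sorted(set(old_segments) | set(new_segments)):
        if interests is not None and seg_id not in interests:
            continue  # personalization: ignore segments the user did not select
        old = old_segments.get(seg_id, "")
        new = new_segments.get(seg_id, "")
        # Step 1: hash comparison filters out unchanged segments.
        if fingerprint(old) == fingerprint(new):
            continue
        # Step 2: a line-level diff runs only on the changed segment.
        diff = difflib.ndiff(old.splitlines(), new.splitlines())
        report[seg_id] = [line for line in diff if line.startswith(("- ", "+ "))]
    return report


old = {"header": "Welcome", "news": "Item A\nItem B", "ads": "Buy X"}
new = {"header": "Welcome", "news": "Item A\nItem C", "ads": "Buy Y"}
print(detect_changes(old, new, interests={"header", "news"}))
# {'news': ['- Item B', '+ Item C']}
```

The expensive diff runs only on segments whose fingerprints differ, which is the kind of complexity reduction the abstract attributes to focusing on changed segments.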

    What Makes Data Possible? A Sociotechnical View on Structured Data Innovations

    Drawing from the theory of digital objects, this paper examines the distinction between structured and unstructured data as carriers of facts. We argue that data do not ‘have’ a structure but are made by a structure that confers on data their capacity to represent contextual facts. We employ a case vignette involving XBRL (eXtensible Business Reporting Language) and its use in statutory financial reporting to illustrate and explore the sociotechnical nature of data and to describe what we call data innovations: new, valuable ways to render phenomena as data. We find that data structure is best viewed as relative to a purpose in a context. Theorizing data from a sociotechnical perspective could, in effect, evolve into the material science of the digital economy.