2,106 research outputs found

    The pragmatic proof: hypermedia API composition and execution

    Get PDF
    Machine clients are increasingly making use of the Web to perform tasks. While Web services traditionally mimic remote procedure calling interfaces, a new generation of so-called hypermedia APIs works through hyperlinks and forms, in a way similar to how people browse the Web. This means that existing composition techniques, which determine a procedural plan upfront, are not sufficient to consume hypermedia APIs, which need to be navigated at runtime. Clients instead need a more dynamic plan that allows them to follow hyperlinks and use forms with a preset goal. Therefore, in this paper, we show how compositions of hypermedia APIs can be created by generic Semantic Web reasoners. This is achieved through the generation of a proof based on semantic descriptions of the APIs' functionality. To pragmatically verify the applicability of compositions, we introduce the notion of pre-execution and post-execution proofs. The runtime interaction between a client and a server is guided by proofs but driven by hypermedia, allowing the client to react to the application's actual state indicated by the server's response. We describe how to generate compositions from descriptions, discuss a computer-assisted process to generate descriptions, and verify reasoner performance on various composition tasks using a benchmark suite. The experimental results lead to the conclusion that proof-based consumption of hypermedia APIs is a feasible strategy at Web scale.Peer ReviewedPostprint (author's final draft

    On-Demand Big Data Integration: A Hybrid ETL Approach for Reproducible Scientific Research

    Full text link
    Scientific research requires access, analysis, and sharing of data that is distributed across various heterogeneous data sources at the scale of the Internet. An eager ETL process constructs an integrated data repository as its first step, integrating and loading data in its entirety from the data sources. The bootstrapping of this process is not efficient for scientific research that requires access to data from very large and typically numerous distributed data sources. a lazy ETL process loads only the metadata, but still eagerly. Lazy ETL is faster in bootstrapping. However, queries on the integrated data repository of eager ETL perform faster, due to the availability of the entire data beforehand. In this paper, we propose a novel ETL approach for scientific data integration, as a hybrid of eager and lazy ETL approaches, and applied both to data as well as metadata. This way, Hybrid ETL supports incremental integration and loading of metadata and data from the data sources. We incorporate a human-in-the-loop approach, to enhance the hybrid ETL, with selective data integration driven by the user queries and sharing of integrated data between users. We implement our hybrid ETL approach in a prototype platform, Obidos, and evaluate it in the context of data sharing for medical research. Obidos outperforms both the eager ETL and lazy ETL approaches, for scientific research data integration and sharing, through its selective loading of data and metadata, while storing the integrated data in a scalable integrated data repository.Comment: Pre-print Submitted to the DMAH Special Issue of the Springer DAPD Journa

    AONS II: continuing the trend towards preservation software ‘Nirvana’

    Get PDF
    File format obsolescence is a major risk factor threatening the sustainability of and access to digital information. While the preservation community has become increasingly interested in tools for migration and transformation of file formats, the National Library of Australia is developing mechanisms specifically focused on monitoring and assessing the risks of file format obsolescence. This paper reports on the AONS II project, undertaken by the National Library of Australia (NLA) in conjunction with the Australian Partnership for Sustainable Repositories (APSR). The project aimed to develop a software tool which allows users to automatically monitor the status of file formats in their repositories, make risk assessments based on a core set of obsolescence risk questions, and receive notifications when file format risks change or other related events occur. This paper calls for the preservation community to develop a co-operating file format obsolescence community which includes registries, software tool creators and end users to effectively curate digital content in order to maintain long-term access
    • …
    corecore