219,354 research outputs found

    Dynamic Provenance for SPARQL Update

    Get PDF
    While the Semantic Web currently can exhibit provenance information by using the W3C PROV standards, there is a "missing link" in connecting PROV to storing and querying for dynamic changes to RDF graphs using SPARQL. Solving this problem would be required for such clear use-cases as the creation of version control systems for RDF. While some provenance models and annotation techniques for storing and querying provenance data originally developed with databases or workflows in mind transfer readily to RDF and SPARQL, these techniques do not readily adapt to describing changes in dynamic RDF datasets over time. In this paper we explore how to adapt the dynamic copy-paste provenance model of Buneman et al. [2] to RDF datasets that change over time in response to SPARQL updates, how to represent the resulting provenance records themselves as RDF in a manner compatible with W3C PROV, and how the provenance information can be defined by reinterpreting SPARQL updates. The primary contribution of this paper is a semantic framework that enables the semantics of SPARQL Update to be used as the basis for a 'cut-and-paste' provenance model in a principled manner.Comment: Pre-publication version of ISWC 2014 pape

    Merging Special Collections with GIS Technology to Enhance the User Experience

    Get PDF
    This analysis evaluates how PhillyHistory.org merged their unique special collection materials with geospatial-based progressive technology to challenge and educate the global community. A new generation of technologically savvy researchers has emerged that expect a more enhanced user experience than earlier generations. To meet these needs, collection managers are collaborating with community and local institutions to increase online access to materials; mixing best metadata practices with custom elements to create map mashups; and merging progressive GIS technology and geospatial based applications with their collections to enhance the user experience. The PhillyHistory.org website was analyzed to explore how they used various geospatial technology to create a new type of digital content management system based on geographical information and make their collections accessible via online software and mobile applications

    SLIM : Scalable Linkage of Mobility Data

    Get PDF
    We present a scalable solution to link entities across mobility datasets using their spatio-temporal information. This is a fundamental problem in many applications such as linking user identities for security, understanding privacy limitations of location based services, or producing a unified dataset from multiple sources for urban planning. Such integrated datasets are also essential for service providers to optimise their services and improve business intelligence. In this paper, we first propose a mobility based representation and similarity computation for entities. An efficient matching process is then developed to identify the final linked pairs, with an automated mechanism to decide when to stop the linkage. We scale the process with a locality-sensitive hashing (LSH) based approach that significantly reduces candidate pairs for matching. To realize the effectiveness and efficiency of our techniques in practice, we introduce an algorithm called SLIM. In the experimental evaluation, SLIM outperforms the two existing state-of-the-art approaches in terms of precision and recall. Moreover, the LSH-based approach brings two to four orders of magnitude speedup

    CC-interop : COPAC/Clumps Continuing Technical Cooperation. Final Project Report

    Get PDF
    As far as is known, CC-interop was the first project of its kind anywhere in the world and still is. Its basic aim was to test the feasibility of cross-searching between physical and virtual union catalogues, using COPAC and the three functioning "clumps" or virtual union catalogues (CAIRNS, InforM25, and RIDING), all funded or part-funded by JISC in recent years. The key issues investigated were technical interoperability of catalogues, use of collection level descriptions to search union catalogues dynamically, quality of standards in cataloguing and indexing practices, and usability of union catalogues for real users. The conclusions of the project were expected to, and indeed do, contribute to the development of the JISC Information Environment and to the ongoing debate as to the feasibility and desirability of creating a national UK catalogue. They also inhabit the territory of collection level descriptions (CLDs) and the wider services of JISC's Information Environment Services Registry (IESR). The results of this project will also have applicability for the common information environment, particularly through the landscaping work done via SCONE/CAIRNS. This work is relevant not just to HE and not just to digital materials, but encompasses other sectors and domains and caters for print resources as well. Key findings are thematically grouped as follows: System performance when inter-linking COPAC and the Z39.50 clumps. The various individual Z39.50 configurations permit technical interoperability relatively easily but only limited semantic interoperability is possible. Disparate cataloguing and indexing practices are an impairment to semantic interoperability, not just for catalogues but also for CLDs and descriptions of services (like those constituting JISC's IESR). Creating dynamic landscaping through CLDs: routines can be written to allow collection description databases to be output in formats that other UK users of CLDs, including developers of the JISC information environment. Searching a distributed (virtual) catalogue or clump via Z39.50: use of Z39.50 to Z39.50 middleware permits a distributed catalogue to be searched via Z39.50 from such disparate user services as another virtual union catalogue or clump, a physical union catalogue like COPAC, an individual Z client and other IE services. The breakthrough in this Z39.50 to Z39.50 conundrum came with the discovery that the JISC-funded JAFER software (a result of the 5/99 programme) meets many of the requirements and can be used by the current clumps services. It is technically possible for the user to select all or a sub-set of available end destination Z39.50 servers (we call this "landscaping") within this middleware. Comparing results processing between COPAC and clumps. Most distributed services (clumps) do not bring back complete results sets from associated Z servers (in order to save time for users). COPAC on-the-fly routines could feasibly be applied to the clumps services. An automated search set up to repeat its query of 17 catalogues in a clump (InforM25) hourly over nearly 3 months returned surprisingly good results; for example, over 90% of responses were received in less than one second, and no servers showed slower response times in periods of traditionally heavy OPAC use (mid-morning to early evening). User behaviour when cross-searching catalogues: the importance to users of a number of on-screen features, including the ability to refine a search and clear indication that a search is processing. The importance to users of information about the availability of an item as well as the holdings data. The impact of search tools such as Google and Amazon on user behaviour and the expectations of more information than is normally available from a library catalogue. The distrust of some librarians interviewed of the data sources in virtual union catalogues, thinking that there was not true interoperability

    Noise-tolerant approximate blocking for dynamic real-time entity resolution

    Get PDF
    Entity resolution is the process of identifying records in one or multiple data sources that represent the same real-world entity. This process needs to deal with noisy data that contain for example wrong pronunciation or spelling errors. Many real world applications require rapid responses for entity queries on dynamic datasets. This brings challenges to existing approaches which are mainly aimed at the batch matching of records in static data. Locality sensitive hashing (LSH) is an approximate blocking approach that hashes objects within a certain distance into the same block with high probability. How to make approximate blocking approaches scalable to large datasets and effective for entity resolution in real-time remains an open question. Targeting this problem, we propose a noise-tolerant approximate blocking approach to index records based on their distance ranges using LSH and sorting trees within large sized hash blocks. Experiments conducted on both synthetic and real-world datasets show the effectiveness of the proposed approach

    Collaborative recommendations with content-based filters for cultural activities via a scalable event distribution platform

    Get PDF
    Nowadays, most people have limited leisure time and the offer of (cultural) activities to spend this time is enormous. Consequently, picking the most appropriate events becomes increasingly difficult for end-users. This complexity of choice reinforces the necessity of filtering systems that assist users in finding and selecting relevant events. Whereas traditional filtering tools enable e.g. the use of keyword-based or filtered searches, innovative recommender systems draw on user ratings, preferences, and metadata describing the events. Existing collaborative recommendation techniques, developed for suggesting web-shop products or audio-visual content, have difficulties with sparse rating data and can not cope at all with event-specific restrictions like availability, time, and location. Moreover, aggregating, enriching, and distributing these events are additional requisites for an optimal communication channel. In this paper, we propose a highly-scalable event recommendation platform which considers event-specific characteristics. Personal suggestions are generated by an advanced collaborative filtering algorithm, which is more robust on sparse data by extending user profiles with presumable future consumptions. The events, which are described using an RDF/OWL representation of the EventsML-G2 standard, are categorized and enriched via smart indexing and open linked data sets. This metadata model enables additional content-based filters, which consider event-specific characteristics, on the recommendation list. The integration of these different functionalities is realized by a scalable and extendable bus architecture. Finally, focus group conversations were organized with external experts, cultural mediators, and potential end-users to evaluate the event distribution platform and investigate the possible added value of recommendations for cultural participation

    Using geographical information systems for management of back-pain data

    Get PDF
    This is the post-print version of the Article. The official published version can be accessed from the link below - Copyright @ 2002 MCB UP LtdIn the medical world, statistical visualisation has largely been confined to the realm of relatively simple geographical applications. This remains the case, even though hospitals have been collecting spatial data relating to patients. In particular, hospitals have a wealth of back pain information, which includes pain drawings, usually detailing the spatial distribution and type of pain suffered by back-pain patients. Proposes several technological solutions, which permit data within back-pain datasets to be digitally linked to the pain drawings in order to provide methods of computer-based data management and analysis. In particular, proposes the use of geographical information systems (GIS), up till now a tool used mainly in the geographic and cartographic domains, to provide novel and powerful ways of visualising and managing back-pain data. A comparative evaluation of the proposed solutions shows that, although adding complexity and cost, the GIS-based solution is the one most appropriate for visualisation and analysis of back-pain datasets
    • 

    corecore