
    Reconstructing human-generated provenance through similarity-based clustering

    In this paper, we revisit our method for reconstructing the primary sources of documents, which make up an important part of their provenance. Our method is based on the assumption that if two documents are semantically similar, there is a high chance that they also share a common source. We previously evaluated this assumption on an excerpt from a news archive, achieving 68.2% precision and 73% recall when reconstructing the primary sources of all articles. However, because we could not release this dataset to the public, our results were hard to compare with others. In this work, we extend the flexibility of our method by adding a new parameter, and re-evaluate it on the human-generated dataset created for the 2014 Provenance Reconstruction Challenge. The extended method achieves up to 86% precision and 59% recall, and is now directly comparable to any approach that uses the same dataset.
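The core assumption of the abstract above (semantically similar documents likely share a source) can be sketched as a similarity search with a tunable threshold. This is a minimal illustration, not the authors' implementation: the feature choice (bag-of-words vectors with cosine similarity) and the function names are assumptions.

```python
# Hypothetical sketch: find candidate sources for a document by
# cosine similarity over simple word-count vectors. The threshold is
# the kind of tunable parameter that trades precision against recall.
import math
from collections import Counter

def word_vector(text):
    """Bag-of-words count vector for a document."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def candidate_sources(doc, corpus, threshold=0.5):
    """Return corpus documents whose similarity to `doc` exceeds the
    threshold; these are the candidate primary sources."""
    v = word_vector(doc)
    return [other for other in corpus
            if other != doc and cosine(v, word_vector(other)) >= threshold]
```

Raising the threshold makes the method stricter (fewer, more reliable candidates), which is consistent with the precision/recall trade-off reported in the abstract.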

    Reconstructing Provenance


    Automatic discovery of high-level provenance using semantic similarity

    As interest in provenance grows among the Semantic Web community, it is recognized as a useful tool across many domains. However, existing automatic provenance collection techniques are not universally applicable. Most existing methods either rely on (low-level) observed provenance, or require that the user discloses formal workflows. In this paper, we propose a new approach for automatic discovery of provenance, at multiple levels of granularity. To accomplish this, we detect entity derivations, relying on clustering algorithms, linked data and semantic similarity. The resulting derivations are structured in compliance with the Provenance Data Model (PROV-DM). While the proposed approach is purposely kept general, allowing adaptation in many use cases, we provide an implementation for one of these use cases, namely discovering the sources of news articles. With this implementation, we were able to detect 73% of the original sources of 410 news stories, at 68% precision. Lastly, we discuss possible improvements and future work.
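The derivation-structuring step described above can be illustrated with a toy function that turns a cluster of similar entities into PROV-style `wasDerivedFrom` statements. This is a sketch under stated assumptions: the IRIs, the use of publication date to pick the presumed source, and the PROV-N-style string output are all illustrative, not the paper's actual pipeline.

```python
# Illustrative only: given a cluster of semantically similar entities,
# assume the earliest-published one is the source and derive the rest
# from it, emitting PROV-N-style wasDerivedFrom statements.
from datetime import date

def cluster_derivations(cluster):
    """cluster: list of (iri, publication_date) pairs judged similar.
    Returns one wasDerivedFrom statement per later entity."""
    ordered = sorted(cluster, key=lambda entity: entity[1])
    source_iri = ordered[0][0]
    return [f"wasDerivedFrom({iri}, {source_iri})" for iri, _ in ordered[1:]]
```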

    Reproducibility of scientific workflows execution using cloud-aware provenance (ReCAP)

    © 2018, Springer-Verlag GmbH Austria, part of Springer Nature. Provenance of scientific workflows has been considered a means of providing workflow reproducibility. However, the provenance approaches adopted so far are not applicable in the context of the Cloud because the provenance trace lacks Cloud-specific information. This paper presents a novel approach that collects Cloud-aware provenance and represents it as a graph. Workflow execution reproducibility on the Cloud is determined by comparing the workflow provenance at three levels, i.e., workflow structure, execution infrastructure and workflow outputs. The experimental evaluation shows that the implemented approach can detect changes in the provenance traces and in the outputs produced by the workflow.
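The three-level comparison described in this abstract can be sketched as a simple check over two provenance records. The field names below are hypothetical stand-ins, not ReCAP's actual graph schema; the point is only the structure of the comparison.

```python
# Sketch of a three-level reproducibility check, assuming provenance
# traces captured as flat dicts (the real system uses graphs).
def same_execution(trace_a, trace_b):
    """Compare two traces at three levels (workflow structure,
    execution infrastructure, workflow outputs) and report which
    levels match and whether the execution is deemed reproduced."""
    checks = {
        "structure": trace_a["jobs"] == trace_b["jobs"],
        "infrastructure": trace_a["vm_flavor"] == trace_b["vm_flavor"],
        "outputs": trace_a["output_hashes"] == trace_b["output_hashes"],
    }
    return all(checks.values()), checks
```

Returning the per-level results, not just a boolean, mirrors the abstract's claim that the approach can localize where a re-execution diverged.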

    Enabling automatic provenance-based trust assessment of web content


    Hydrologic Information Systems: Advancing Cyberinfrastructure for Environmental Observatories

    Recently, community initiatives have emerged for the establishment of large-scale environmental observatories. Cyberinfrastructure is the backbone upon which these observatories will be built, and scientists' ability to access and use the data collected within observatories to address research questions will depend on the successful implementation of cyberinfrastructure. The research described in this dissertation advances the cyberinfrastructure available for supporting environmental observatories. This has been accomplished through both development of new cyberinfrastructure components as well as through the demonstration and application of existing tools, with a specific focus on point observations data. The cyberinfrastructure that was developed and deployed to support collection, management, analysis, and publication of data generated by an environmental sensor network in the Little Bear River environmental observatory test bed is described, as is the sensor network design and deployment. Results of several analyses that demonstrate how high-frequency data enable identification of trends and analysis of physical, chemical, and biological behavior that would be impossible using traditional, low-frequency monitoring data are presented. This dissertation also illustrates how the cyberinfrastructure components demonstrated in the Little Bear River test bed have been integrated into a data publication system that is now supporting a nationwide network of 11 environmental observatory test bed sites, as well as other research sites within and outside of the United States. Enhancements to the infrastructure for research and education that are enabled by this research are impacting a diverse community, including the national community of researchers involved with prospective Water and Environmental Research Systems (WATERS) Network environmental observatories as well as other observatory efforts, research watersheds, and test beds. The results of this research provide insight into and potential solutions for some of the bottlenecks associated with design and implementation of cyberinfrastructure for observatory support.

    Retrieving haystacks: a data driven information needs model for faceted search.

    The research aim was to develop an understanding of information need characteristics for word co-occurrence-based search result filters (facets). No prior research has been identified into what enterprise searchers may find useful for exploratory search and why. Various word co-occurrence techniques were applied to results from sample queries performed on industry membership content. The results were used in an international survey of 54 practising petroleum engineers from 32 organizations. Subject familiarity, job role, personality and query specificity are possible causes for survey response variation. An information needs model is presented: Broad, Rich, Intriguing, Descriptive, General, Expert and Situational (BRIDGES). This may help professionals to more effectively meet their information needs and stimulate new needs, improving a system's ability to facilitate serendipity. This research has implications for faceted search in enterprise search and digital library deployments.

    Systems Biology Knowledgebase for a New Era in Biology: A Genomics:GTL Report from the May 2008 Workshop


    Proceedings of the 10th International Conference on Ecological Informatics: translating ecological data into knowledge and decisions in a rapidly changing world: ICEI 2018

    The Conference Proceedings are an impressive display of the current scope of Ecological Informatics. Whilst Data Management, Analysis, Synthesis and Forecasting have been lasting popular themes over the past nine biannual ICEI conferences, ICEI 2018 addresses distinctively novel developments in Data Acquisition enabled by cutting-edge in situ and remote sensing technology. The ICEI 2018 abstracts presented here capture well the current trends and challenges of Ecological Informatics towards:
    • regional, continental and global sharing of ecological data,
    • thorough integration of complementing monitoring technologies including DNA-barcoding,
    • sophisticated pattern recognition by deep learning,
    • advanced exploration of valuable information in ‘big data’ by means of machine learning and process modelling,
    • decision-informing solutions for biodiversity conservation and sustainable ecosystem management in light of global changes
    • …