5,487 research outputs found

    Challenges in the Evaluation of Observational Data Trustworthiness From a Data Producer's Viewpoint (FAIR+)

    Recent discussions in many scientific disciplines stress the necessity of "FAIR" data. FAIR data, however, does not necessarily include information on data trustworthiness, where trustworthiness comprises reliability, validity and provenience/provenance. This opens up the risk of misinterpreting scientific data even though all "FAIR" criteria are fulfilled. Applications such as secondary data processing, data blending, and joint interpretation or visualization efforts are especially affected. This paper intends to start a discussion in the scientific community about how to evaluate, describe, and implement trustworthiness in a standardized data evaluation approach and in its metadata description following the FAIR principles. As an example, it discusses different assessment tools regarding soil moisture measurements, data processing and visualization, and elaborates on which additional (metadata) information is required to increase the trustworthiness of data for secondary usage. Taking into account the perspectives of data collectors, providers and users, the authors identify three aspects of data trustworthiness that promote efficient data sharing: 1) trustworthiness of the measurement, 2) trustworthiness of the data processing, and 3) trustworthiness of the data integration and visualization. The paper should be seen as the basis for a community discussion on data trustworthiness for scientifically correct secondary use of data. We do not intend to replace existing procedures, nor do we claim completeness of the reliable tools and approaches described. Our intention is to discuss several important aspects of assessing data trustworthiness based on the data life cycle of soil moisture data as an example.
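
    To make the three trustworthiness aspects concrete, the following sketch shows how such information could sit alongside FAIR descriptive metadata for a soil moisture record. It is a minimal Python illustration; all class names, field names and values (FairPlusMetadata, TrustworthinessRecord, the placeholder identifier and URL) are assumptions for illustration, not a schema proposed by the paper.

        # A minimal sketch (not from the paper) of attaching trustworthiness
        # information to FAIR descriptive metadata for a soil moisture record.
        # All class names, field names and values are illustrative assumptions.
        from dataclasses import dataclass
        from typing import List

        @dataclass
        class TrustworthinessRecord:
            """Hypothetical block covering the three aspects named in the paper:
            measurement, data processing, and integration/visualization."""
            measurement_method: str      # sensor type and calibration state
            processing_steps: List[str]  # ordered record of applied corrections
            integration_notes: str       # caveats for joint interpretation or visualization

        @dataclass
        class FairPlusMetadata:
            """FAIR descriptive metadata plus an explicit trustworthiness section."""
            identifier: str              # Findable: persistent identifier
            access_url: str              # Accessible: where the data can be retrieved
            variable: str                # Interoperable: standardized variable name
            license: str                 # Reusable: usage conditions
            trustworthiness: TrustworthinessRecord

        example = FairPlusMetadata(
            identifier="doi:10.xxxx/example-soil-moisture",       # placeholder
            access_url="https://example.org/data/soil-moisture",  # placeholder
            variable="volumetric_soil_water_content",
            license="CC-BY-4.0",
            trustworthiness=TrustworthinessRecord(
                measurement_method="TDR probe, field-calibrated 2023-04",
                processing_steps=["despiking", "gap filling", "unit conversion"],
                integration_notes="hourly aggregates; rescale before blending with remote sensing products",
            ),
        )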

    Utilising Provenance to Enhance Social Computation


    From Data Fusion to Knowledge Fusion

    The task of data fusion is to identify the true values of data items (e.g., the true date of birth for Tom Cruise) among multiple observed values drawn from different sources (e.g., Web sites) of varying and unknown reliability. A recent survey [LDL+12] has provided a detailed comparison of various fusion methods on Deep Web data. In this paper, we study the applicability and limitations of different fusion techniques on a more challenging problem: knowledge fusion. Knowledge fusion identifies true subject-predicate-object triples extracted by multiple information extractors from multiple information sources. These extractors perform the tasks of entity linkage and schema alignment, thus introducing an additional source of noise that is quite different from that traditionally considered in the data fusion literature, which focuses only on factual errors in the original sources. We adapt state-of-the-art data fusion techniques and apply them to a knowledge base with 1.6B unique knowledge triples extracted by 12 extractors from over 1B Web pages, which is three orders of magnitude larger than the data sets used in previous data fusion papers. We show great promise of the data fusion approaches in solving the knowledge fusion problem, and suggest interesting research directions through a detailed error analysis of the methods.
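
    As a rough illustration of the kind of data fusion technique the paper adapts, the sketch below resolves conflicting (subject, predicate, object) claims by accuracy-weighted voting, alternating between scoring candidate values and re-estimating source accuracies. The example claims, the 0.8 prior accuracy and the fixed iteration count are toy assumptions; this is not the authors' algorithm.

        # A toy illustration of accuracy-weighted voting over extracted triples,
        # in the spirit of the data fusion methods the paper adapts. The claims,
        # the 0.8 prior accuracy and the fixed iteration count are assumptions;
        # this is not the authors' algorithm.
        from collections import defaultdict

        # Each claim: (source, (subject, predicate), object value).
        claims = [
            ("extractor_A", ("Tom Cruise", "date_of_birth"), "1962-07-03"),
            ("extractor_B", ("Tom Cruise", "date_of_birth"), "1962-07-03"),
            ("extractor_C", ("Tom Cruise", "date_of_birth"), "1963-07-03"),
        ]

        accuracy = defaultdict(lambda: 0.8)  # assumed prior accuracy per source

        for _ in range(10):
            # 1. Score each candidate value by the summed accuracy of its supporters.
            scores = defaultdict(float)
            for source, item, value in claims:
                scores[(item, value)] += accuracy[source]

            # 2. Keep the highest-scoring value per (subject, predicate) item.
            best = {}
            for (item, value), score in scores.items():
                if item not in best or score > scores[(item, best[item])]:
                    best[item] = value

            # 3. Re-estimate each source's accuracy as its agreement with the winners.
            votes = defaultdict(list)
            for source, item, value in claims:
                votes[source].append(1.0 if best[item] == value else 0.0)
            accuracy = defaultdict(lambda: 0.8,
                                   {s: sum(v) / len(v) for s, v in votes.items()})

        print(best)  # {('Tom Cruise', 'date_of_birth'): '1962-07-03'}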

    Users' trust in information resources in the Web environment: a status report

    This study has three aims: to provide an overview of the ways in which trust is either assessed or asserted in relation to the use and provision of resources in the Web environment for research and learning; to assess what solutions might be worth further investigation and whether establishing ways to assert trust in academic information resources could assist the development of information literacy; and to help increase understanding of how perceptions of trust influence the behaviour of information users.

    Trust and Risk Relationship Analysis on a Workflow Basis: A Use Case

    Trust and risk are often seen as proportional to each other; high trust may imply low risk and vice versa. However, recent research argues that the relationship between trust and risk is implicit rather than proportional. Taking this view, this paper proposes, for the first time, an approach that views trust and risk on the basis of the W3C PROV provenance data model applied in a healthcare domain. We argue that, in the healthcare domain, high trust can be placed in data despite its high risk, and that low-trust data can have low risk, depending on data quality attributes and its provenance. This is demonstrated by our trust and risk models applied to the BII case study data. The proposed theoretical approach first calculates risk values at each workflow step, taking PROV concepts into account, and second, aggregates the final risk score for the whole provenance chain. Unlike the risk model, the trust of a workflow is derived by applying the DS/AHP method. The results support our assumption that the relationship between trust and risk is implicit.
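
    The sketch below illustrates the general idea of computing a risk value per workflow step and aggregating it over the whole provenance chain. The step names, risk values and the probabilistic-OR aggregation rule are assumptions made for illustration; the paper's model is defined over W3C PROV concepts and derives trust separately with the DS/AHP method.

        # A toy illustration of aggregating per-step risk along a provenance chain.
        # The step names, risk values and the probabilistic-OR aggregation rule are
        # assumptions; the paper's model is defined over W3C PROV concepts and
        # derives trust separately with the DS/AHP method.

        # Risk assigned to each workflow step, ordered as in the provenance chain.
        step_risks = {
            "collect_patient_data": 0.10,
            "anonymise_records": 0.05,
            "merge_with_registry": 0.20,
            "visualise_results": 0.02,
        }

        def aggregate_risk(risks):
            """Chain-level risk: the chance that at least one step introduces a
            problem, assuming (for illustration) that steps fail independently."""
            ok_probability = 1.0
            for r in risks:
                ok_probability *= (1.0 - r)
            return 1.0 - ok_probability

        chain_risk = aggregate_risk(step_risks.values())
        print(f"aggregated risk for the whole chain: {chain_risk:.3f}")  # 0.330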

    Provenance analysis for Instagram photos

    As a feasible device fingerprint, sensor pattern noise (SPN) has been proven effective in the provenance analysis of digital images. However, with the rise of social media, millions of images are being uploaded to and shared through social media sites every day. An image downloaded from a social network may have gone through a series of unknown image manipulations. Consequently, the trustworthiness of SPN has been challenged in the provenance analysis of images downloaded from social media platforms. In this paper, we investigate the effects of the pre-defined Instagram image filters on SPN-based image provenance analysis. We identify two groups of filters that affect the SPN in quite different ways, with Group I consisting of the filters that severely attenuate the SPN and Group II consisting of the filters that largely preserve the SPN in the images. We further propose a CNN-based classifier to perform filter-oriented image categorization, aiming to exclude the images manipulated by the filters in Group I and thus improve the reliability of SPN-based provenance analysis. The results on about 20,000 images and 18 filters are very promising, with an accuracy higher than 96% in differentiating the filters in Group I and Group II.
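
    As background for the SPN-based analysis discussed above, the sketch below shows a simplified matching step: estimate an image's noise residual and correlate it with a camera's reference pattern. The Gaussian-blur denoiser, the plain normalized correlation and the synthetic data are assumptions for illustration; practical systems typically use wavelet-based denoising and peak-to-correlation-energy, and the paper's CNN filter classifier is a separate component not shown here.

        # A toy illustration of the SPN matching step that provenance analysis
        # relies on. The Gaussian-blur denoiser, the plain normalized correlation
        # and the synthetic data are assumptions; practical systems typically use
        # wavelet denoising and peak-to-correlation-energy, and the paper's CNN
        # filter classifier is not shown here.
        import numpy as np
        from scipy.ndimage import gaussian_filter

        def noise_residual(image):
            """Approximate the sensor pattern noise as image minus a denoised version."""
            return image - gaussian_filter(image, sigma=1.0)

        def normalized_correlation(a, b):
            """Similarity between a noise residual and a reference SPN, in [-1, 1]."""
            a = a - a.mean()
            b = b - b.mean()
            return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

        # Synthetic example: an image carrying a known pattern should correlate with it.
        rng = np.random.default_rng(0)
        reference_spn = rng.normal(0.0, 1.0, (256, 256))                      # camera fingerprint
        scene = gaussian_filter(rng.normal(0.0, 5.0, (256, 256)), sigma=3.0)  # smooth content
        query_image = scene + 0.5 * reference_spn

        score = normalized_correlation(noise_residual(query_image), reference_spn)
        print(f"correlation with reference SPN: {score:.2f}")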