    Requirements for Provenance on the Web

    From where did this tweet originate? Was this quote from the New York Times modified? Daily, we rely on data from the Web, but often it is difficult or impossible to determine where it came from or how it was produced. This lack of provenance is particularly evident when people and systems deal with Web information or with any environment where information comes from sources of varying quality. Provenance is not captured pervasively in information systems. There are major technical, social, and economic impediments that stand in the way of using provenance effectively. This paper synthesizes requirements for provenance on the Web across a number of dimensions, focusing on three key aspects of provenance: the content of provenance, the management of provenance records, and the uses of provenance information. To illustrate these requirements, we use three synthesized scenarios that encompass provenance problems faced by Web users today.
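
    The scenarios above turn on recording where a piece of Web content came from and who produced it. As a rough illustration of the kind of provenance record the requirements call for (the paper itself does not prescribe a vocabulary), the sketch below uses the W3C PROV-O terms via rdflib; all example.org resources are hypothetical.

```python
# Illustrative sketch only: the paper gathers requirements rather than
# prescribing a vocabulary. W3C PROV-O terms are used here, and every
# example.org resource is hypothetical.
from rdflib import Graph, Namespace, Literal, RDF
from rdflib.namespace import XSD

PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("http://example.org/")

g = Graph()
g.bind("prov", PROV)

quote = EX["quote/123"]          # the quoted passage as republished
article = EX["nyt-article/456"]  # the original source article
blogger = EX["agent/blog-author"]

# Content of provenance: what the quote was derived from, who produced it, when.
g.add((quote, RDF.type, PROV.Entity))
g.add((article, RDF.type, PROV.Entity))
g.add((quote, PROV.wasDerivedFrom, article))
g.add((quote, PROV.wasAttributedTo, blogger))
g.add((quote, PROV.generatedAtTime,
       Literal("2010-06-01T12:00:00Z", datatype=XSD.dateTime)))

print(g.serialize(format="turtle"))
```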

    Citation and peer review of data: moving towards formal data publication

    This paper discusses many of the issues associated with formally publishing data in academia, focusing primarily on the structures that need to be put in place for peer review and formal citation of datasets. Data publication is becoming increasingly important to the scientific community, as it will provide a mechanism for those who create data to receive academic credit for their work and will allow the conclusions arising from an analysis to be more readily verifiable, thus promoting transparency in the scientific process. Peer review of data will also provide a mechanism for ensuring the quality of datasets, and we provide suggestions on the types of activities one expects to see in the peer review of data. A simple taxonomy of data publication methodologies is presented and evaluated, and the paper concludes with a discussion of dataset granularity, transience and semantics, along with a recommended human-readable citation syntax.

    A Framework for the Analysis and User-Driven Evaluation of Trust on the Semantic Web

    This project will examine the area of trust on the Semantic Web and develop a framework for publishing and verifying trusted Linked Data. Linked Data describes a method of publishing structured data, automatically readable by computers, which can be linked to other heterogeneous data with the purpose of becoming more useful. Trust plays a significant role in the adoption of new technologies, and even more so in a sphere with such vast amounts of publicly-created data. Trust is paramount to the effective sharing and communication of tacit knowledge (Hislop, 2013). Up to now, the area of trust in Linked Data has not been adequately addressed, despite the Semantic Web stack having included a trust layer from the very beginning (Artz and Gil, 2007). Some of the most accurate data on the Semantic Web lies practically unused, while some of the most used linked data has high numbers of errors (Zaveri et al., 2013). Many of the datasets and links that exist on the Semantic Web are out of date and/or invalid, and this undermines the credibility and validity, and ultimately the trustworthiness, of both the dataset and the data provider (Rajabi et al., 2012). This research will examine a number of datasets to determine the quality metrics that a dataset is required to meet to be considered ‘trusted’. The key findings will be assessed and utilized in the creation of a learning tool and a framework for creating trusted Linked Data.
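
    To make the notion of dataset quality metrics concrete, the sketch below computes one hypothetical indicator, the fraction of object URIs that still dereference, using rdflib; the metric choice and the sample data are illustrative assumptions, not the metrics the project defines.

```python
# A minimal sketch of one possible dataset quality metric; the metric choice
# and the sample data are illustrative assumptions, not those defined by the
# project described above.
import urllib.request
from rdflib import Graph, URIRef

def link_health(graph: Graph, sample_limit: int = 20) -> float:
    """Fraction of sampled object URIs that still dereference (HTTP status < 400)."""
    uris = [o for o in graph.objects() if isinstance(o, URIRef)][:sample_limit]
    if not uris:
        return 1.0
    ok = 0
    for uri in uris:
        try:
            req = urllib.request.Request(str(uri), method="HEAD")
            with urllib.request.urlopen(req, timeout=5) as resp:
                ok += resp.status < 400
        except Exception:
            pass  # unreachable or malformed links count against the dataset
    return ok / len(uris)

# Hypothetical sample data: one live link and one dead one.
sample = """
@prefix ex: <http://example.org/> .
ex:dataset ex:seeAlso <https://www.w3.org/>, <http://example.invalid/gone> .
"""
g = Graph().parse(data=sample, format="turtle")
print(f"dereferenceable links: {link_health(g):.0%}")  # one input to a 'trusted' judgement
```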

    To share or not to share: Publication and quality assurance of research data outputs. A report commissioned by the Research Information Network

    A study on current practices with respect to data creation, use, sharing and publication in eight research disciplines (systems biology, genomics, astronomy, chemical crystallography, rural economy and land use, classics, climate science and social and public health science). The study looked at data creation and care, motivations for sharing data, discovery, access and usability of datasets and quality assurance of data in each discipline.

    Challenges of connecting chemistry to pharmacology: perspectives from curating the IUPHAR/BPS Guide to PHARMACOLOGY

    Connecting chemistry to pharmacology (c2p) has been an objective of GtoPdb and its precursor IUPHAR-DB since 2003. This has been achieved by populating our database with expert-curated relationships between documents, assays, quantitative results, chemical structures, their locations within the documents and the protein targets in the assays (D-A-R-C-P). A wide range of challenges associated with this are described in this perspective, using illustrative examples from GtoPdb entries. Our selection process begins with judgements of pharmacological relevance and scientific quality. Even though we have a stringent focus for our small-data extraction, we note that assessing the quality of papers has become more difficult over the last 15 years. We discuss ambiguity issues with the resolution of authors’ descriptions of A-R-C-P entities to standardised identifiers. We also describe developments over the same period that have made this somewhat easier, both in the publication ecosystem and through enhancements of our internal processes in recent years. This perspective concludes with a look at challenges for the future, including the wider capture of mechanistic nuances and possible impacts of text mining on automated entity extraction.
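
    As a rough illustration of the D-A-R-C-P relationship described above (not GtoPdb's actual schema), the sketch below models one curated record as a Python dataclass with placeholder identifier values.

```python
# Not GtoPdb's internal schema: a minimal illustration of the kind of
# expert-curated D-A-R-C-P record the perspective describes, with
# placeholder identifier values throughout.
from dataclasses import dataclass

@dataclass
class DARCPRecord:
    document_doi: str          # D: source publication, resolved to a DOI
    assay_description: str     # A: assay as described by the authors
    result_value: float        # R: quantitative result (e.g. a pKi)
    result_type: str
    chemical_inchikey: str     # C: chemical structure, standardised identifier
    target_uniprot: str        # P: protein target, standardised identifier
    location_in_document: str  # where in the paper the result was curated from

record = DARCPRecord(
    document_doi="10.1000/placeholder",               # placeholder, not a real DOI
    assay_description="radioligand binding assay",
    result_value=8.2,
    result_type="pKi",
    chemical_inchikey="XXXXXXXXXXXXXX-XXXXXXXXXX-X",  # placeholder InChIKey
    target_uniprot="P00000",                          # placeholder UniProt accession
    location_in_document="Table 2",
)
```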

    Connections in Music

    PhD thesis. This work is copyright (c) 2010 Kurt Jacobson and is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported Licence; to view a copy of this licence, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. Connections between music artists or songs provide a context and lineage for music and form the basis for recommendation, playlist generation, and general navigation of the musical universe. We examine the structure of the connections between music artists found on the web. It is shown that different methods of finding associations between artists yield different network structures: the details of associations and how these associations are discovered impact the global structure of the artist network. This realization informs our associations framework, based on semantic web technologies and centered around a small RDF/OWL ontology that emphasizes the provenance and transparency of association statements. We develop the MuSim Similarity Ontology and show how, combined with the concepts of linked data, it can be used to create a distributed web-scale ecosystem for music similarity. The Similarity Ontology is evaluated against psychological models for similarity and shown to be flexible enough to accommodate each model examined. Several applications are developed based on the visualization of music artist network structures and the utilization of our associations framework along with other music-related linked data.
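
    As a rough illustration of an association statement whose provenance is made explicit, the sketch below builds one such statement with rdflib; the sim: namespace and term names are assumptions for illustration rather than a verified rendering of the MuSim Similarity Ontology, and all example.org resources are hypothetical.

```python
# A hedged sketch of an association statement carrying its own provenance;
# the sim: namespace and terms are assumptions, not a verified rendering of
# the MuSim Similarity Ontology, and all example.org resources are hypothetical.
from rdflib import Graph, Namespace, Literal, RDF, BNode
from rdflib.namespace import DC, XSD

SIM = Namespace("http://purl.org/ontology/similarity/")  # assumed namespace
EX = Namespace("http://example.org/")

g = Graph()
g.bind("sim", SIM)

assoc = BNode()
g.add((assoc, RDF.type, SIM.Association))
g.add((assoc, SIM.subject, EX["artist/A"]))   # hypothetical artist resources
g.add((assoc, SIM.object, EX["artist/B"]))
g.add((assoc, SIM.weight, Literal(0.8, datatype=XSD.decimal)))
# Provenance and transparency: how the association was derived and by whom.
g.add((assoc, SIM.method, EX["method/collaborative-filtering"]))
g.add((assoc, DC.creator, EX["agent/crawler-1"]))

print(g.serialize(format="turtle"))
```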

    Micropublications: a Semantic Model for Claims, Evidence, Arguments and Annotations in Biomedical Communications

    The Micropublications semantic model for scientific claims, evidence, argumentation and annotation in biomedical publications is a metadata model of scientific argumentation, designed to support several key requirements for exchange and value-addition of semantic metadata across the biomedical publications ecosystem. Micropublications allow formalizing the argument structure of scientific publications so that (a) their internal structure is semantically clear and computable; (b) citation networks can be easily constructed across large corpora; (c) statements can be formalized in multiple useful abstraction models; (d) statements in one work may cite statements in another, individually; (e) support, similarity and challenge of assertions can be modelled across corpora; and (f) scientific assertions, particularly in review articles, may be transitively closed to supporting evidence and methods. The model supports natural language statements; data; methods and materials specifications; discussion and commentary; as well as challenge and disagreement. A detailed analysis of nine use cases is provided, along with an implementation in OWL 2 and SWRL and several example instantiations in RDF.
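
    As a rough illustration of the claim/evidence structure the model formalizes, the sketch below encodes one claim, a supporting data item and the enclosing micropublication with rdflib; the mp: namespace and term names are assumptions here, not a verified excerpt of the OWL 2 implementation.

```python
# A minimal sketch of the claim/evidence structure; the mp: namespace and the
# term names used here are assumptions for illustration, not a verified
# excerpt of the OWL 2 model, and the statement text is invented.
from rdflib import Graph, Namespace, Literal, RDF

MP = Namespace("http://purl.org/mp/")    # assumed namespace
EX = Namespace("http://example.org/")

g = Graph()
g.bind("mp", MP)

claim = EX["claim/1"]
evidence = EX["data/figure2"]
micropub = EX["micropublication/1"]

g.add((claim, RDF.type, MP.Claim))
g.add((claim, RDF.value, Literal("Drug X inhibits enzyme Y")))  # hypothetical claim text
g.add((evidence, RDF.type, MP.Data))
g.add((evidence, MP.supports, claim))         # evidence supports the claim
g.add((micropub, RDF.type, MP.Micropublication))
g.add((micropub, MP.argues, claim))           # the micropublication argues the claim

print(g.serialize(format="turtle"))
```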

    High Quality Research Environments

    A major challenge facing all research communities is creating and sustaining high quality research environments. A model describing strategic social structures that constrain knowledge production suggests that targeting these structures will have greater impact than addressing issues surrounding individual lab cultures, as important as these are. A literature search identified five common themes underlying bioscience research environments, comprising collaboration, data processing, confidence in data and scientists, trust, user-led development, and a deep commitment to public benefit. Club theory was used to develop a model describing the social structures that constrain and contextualise research environments. It is argued that collaboration underlies impactful science and that it is hindered by high transaction costs and by the benefits associated with competition. These factors, combined with poorly defined property rights surrounding publicly funded data, limit the ability of data markets to operate efficiently. Although the science community is best placed to provide solutions for these issues, incentivisation by funding agencies to increase the benefits of collaboration will be an accelerator. Given the complexity of emerging datasets and the collaborations needed to exploit them, trust-by-design solutions are suggested. The underlying ‘glue’ that holds this activity together is the aesthetic and ethical value-base underlying good science.
