
20,982 research outputs found

Management of data quality when integrating data with known provenance

Author: Angeles Maria del Pilar
Publication venue: Mathematical and Computer Sciences
Publication date: 01/01/2007

Abstract unavailable; please refer to the PDF.

ROS: The Research Output Service. Heriot-Watt University Edinburgh

OpenGrey Repository

e-Social Science and Evidence-Based Policy Assessment : Challenges and Solutions

Author: Alison H. Chorley
Anderson A.H.
Bernstein A.
Chorley A.
Chris Mellish
De Roure D.
Edoardo Pignotti
Edwards P.
Feikje Hielkema
Guy M.
Hielkema F.
HM Treasury.
J. Gary Polhill
John H. Farrington
Lorna J. Philip
Nick M. Gotts
Peter Edwards
Pignotti E.
Polhill J.G.
Power R.
Richard Reid
Schwitter R.
UK Cabinet Office Strategy Unit.
UK Cabinet Office.
Publication venue: 'SAGE Publications'
Publication date: 01/11/2009

Peer reviewed Preprint

Aberdeen University Research

Crossref

Provenance in Linked Data Integration

Author: Gibbins Nicholas
Omitola Temitope
Shadbolt Nigel
Publication date: 16/12/2010

The open world of the (Semantic) Web is a global information space offering diverse materials of disparate qualities, and the opportunity to re-use, aggregate, and integrate these materials in novel ways. The advent of Linked Data brings the potential to expose data on the Web, creating new challenges for data consumers who want to integrate these data. One challenge is the ability of users to assess the reliability and/or the accuracy of the data they come across. In this paper, we describe a light-weight provenance extension for the voiD vocabulary that allows data publishers to add provenance metadata to their datasets. These provenance metadata can be queried by consumers and used as contextual information for integration and inter-operation of information resources on the Semantic Web.

Southampton (e-Prints Soton)

Structurally Tractable Uncertain Data

Author: Abiteboul S.
Agrawal R.
Amarilli A.
Carlson A.
Courcelle B.
Deutch D.
Dong X.
Galárraga L.
Gottlob G.
Lauritzen S. L.
Maniu S.
Raedt L. D.
Robertson N.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 17/07/2015

Many data management applications must deal with data which is uncertain, incomplete, or noisy. However, on existing uncertain data representations, we cannot tractably perform the important query evaluation tasks of determining query possibility, certainty, or probability: these problems are hard on arbitrary uncertain input instances. We thus ask whether we could restrict the structure of uncertain data so as to guarantee the tractability of exact query evaluation. We present our tractability results for tree and tree-like uncertain data, and a vision for probabilistic rule reasoning. We also study uncertainty about order, proposing a suitable representation, and study uncertain data conditioned by additional observations.

Comment: 11 pages, 1 figure, 1 table. To appear in SIGMOD/PODS PhD Symposium 201

arXiv.org e-Print Archive

Crossref

Chemical information matters: an e-Research perspective on information and data sharing in the chemical sciences

Author: Bird Colin
Frey Jeremy G.
Publication venue: 'Royal Society of Chemistry (RSC)'
Publication date: 01/01/2013

Recently, a number of organisations have called for open access to scientific information and especially to the data obtained from publicly funded research, among which the Royal Society report and the European Commission press release are particularly notable. It has long been accepted that building research on the foundations laid by other scientists is both effective and efficient. Regrettably, some disciplines, chemistry being one, have been slow to recognise the value of sharing and have thus been reluctant to curate their data and information in preparation for exchanging it. The very significant increases in both the volume and the complexity of the datasets produced have encouraged the expansion of e-Research, and stimulated the development of methodologies for managing, organising, and analysing "big data". We review the evolution of cheminformatics, the amalgam of chemistry, computer science, and information technology, and assess the wider e-Science and e-Research perspective. Chemical information does matter, as do matters of communicating data and collaborating with data. For chemistry, unique identifiers, structure representations, and property descriptors are essential to the activities of sharing and exchange. Open science entails the sharing of more than mere facts: for example, the publication of negative outcomes can facilitate better understanding of which synthetic routes to choose, an aspiration of the Dial-a-Molecule Grand Challenge. The protagonists of open notebook science go even further and exchange their thoughts and plans. We consider the concepts of preservation, curation, provenance, discovery, and access in the context of the research lifecycle, and then focus on the role of metadata, particularly the ontologies on which the emerging chemical Semantic Web will depend. Among our conclusions, we present our choice of the "grand challenges" for the preservation and sharing of chemical information.

Southampton (e-Prints Soton)

Linked Data - the story so far

Author: Berners-Lee Tim
Bizer Christian
Heath Tom
Publication venue: 'IGI Global'
Publication date: 01/01/2009

The term “Linked Data” refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the last three years, leading to the creation of a global data space containing billions of assertions: the Web of Data. In this article, the authors present the concept and technical principles of Linked Data, and situate these within the broader context of related technological developments. They describe progress to date in publishing Linked Data on the Web, review applications that have been developed to exploit the Web of Data, and map out a research agenda for the Linked Data community as it moves forward.

Southampton (e-Prints Soton)

MAnnheim DOCument Server

Requirements for Provenance on the Web

Author: Cheney J
Gil Y
Groth P.T.
Miles S
Publication date: 01/01/2012

From where did this tweet originate? Was this quote from the New York Times modified? Daily, we rely on data from the Web but often it is difficult or impossible to determine where it came from or how it was produced. This lack of provenance is particularly evident when people and systems deal with Web information or with any environment where information comes from sources of varying quality. Provenance is not captured pervasively in information systems. There are major technical, social, and economic impediments that stand in the way of using provenance effectively. This paper synthesizes requirements for provenance on the Web for a number of dimensions focusing on three key aspects of provenance: the content of provenance, the management of provenance records, and the uses of provenance information. To illustrate these requirements, we use three synthesized scenarios that encompass provenance problems faced by Web users today.

CiteSeerX

Crossref

VU Research Portal

Directory of Open Access Journals

Edinburgh Research Explorer

King's Research Portal

International Journal of Digital Curation

Towards Exascale Scientific Metadata Management

Author: Blanas Spyros
Byna Surendra
Publication date: 29/03/2015

Advances in technology and computing hardware are enabling scientists from all areas of science to produce massive amounts of data using large-scale simulations or observational facilities. In this era of data deluge, effective coordination between the data production and the analysis phases hinges on the availability of metadata that describe the scientific datasets. Existing workflow engines have been capturing a limited form of metadata to provide provenance information about the identity and lineage of the data. However, much of the data produced by simulations, experiments, and analyses still need to be annotated manually in an ad hoc manner by domain scientists. Systematic and transparent acquisition of rich metadata becomes a crucial prerequisite to sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and domain-agnostic metadata management infrastructure that can meet the demands of extreme-scale science is notable by its absence. To address this gap in scientific data management research and practice, we present our vision for an integrated approach that (1) automatically captures and manipulates information-rich metadata while the data is being produced or analyzed and (2) stores metadata within each dataset to permeate metadata-oblivious processes and to query metadata through established and standardized data access interfaces. We motivate the need for the proposed integrated approach using applications from plasma physics, climate modeling and neuroscience, and then discuss research challenges and possible solutions.

arXiv.org e-Print Archive

eScholarship - University of California