12 research outputs found

    Quality Assurance in the Ingestion of Data into the CDS VizieR Catalogue and Data Services

    VizieR is a reference service provided by the CDS for astronomical catalogues and tables published in academic journals (Ochsenbein et al. 2000), and also for associated data. Quality assurance is a key factor that guides the operation, development and maintenance of the data ingestion procedures. The catalogue ingestion pipeline involves a number of validation steps, which must be implemented with high efficiency to process the 1200 catalogues per year from the major astronomy journals. These processes involve integrated teams of software engineers, specialized data librarians (documentalists) and astronomers, and various levels of interaction with the original authors and data providers. Procedures for the ingestion of associated data (Landais 2016) have recently been improved with semi-automatic mapping of metadata into the IVOA ObsCore standard, and with an interactive tool to help authors submit their data (images, spectra, time series, etc.). We present an overview of the quality assurance procedures in place for the operation of the VizieR pipelines, and identify the future challenges posed by increasing volumes and complexity of data. We highlight the lessons learned from implementing the FITS metadata mapping tools for authors and data providers. We show how quality assurance is an essential part of making VizieR data comply with the FAIR (Findable, Accessible, Interoperable and Reusable) principles, and why it is necessary for the operational aspects of supporting more than 300,000 VizieR queries per day through multiple interactive and programmatic interfaces.

    Associated data: Indexation, discovery, challenges and roles

    Astronomers are nowadays required by their funding agencies to make the data obtained through publicly financed means (ground and space observatories and labs) available to the public and the community at large. This is a fundamental step in enabling the open science paradigm the astronomical community is striving for. As a result, tabular data (catalogs) arriving at CDS for ingestion into its databases, in particular VizieR, are increasingly accompanied by the reduced observational datasets (spectra, images, data cubes, time series). While the benefits of making this associated data available are obvious, the task is very challenging: in this context "big data" takes on the meaning of "extremely heterogeneous data", with a diversity of formats and practices among astronomers, even within the FITS standard. Providing librarians with efficient tools to index this data and generate the relevant metadata is therefore paramount.
