11 research outputs found

    Quality Assurance in the Ingestion of Data into the CDS VizieR Catalogue and Data Services

    No full text
    VizieR is a reference service provided by the CDS for astronomical catalogues and tables published in academic journals (Ochsenbein et al. 2000), and also for associated data. Quality assurance is a key factor that guides the operations, development and maintenance of the data ingestion procedures. The catalogue ingestion pipeline involves a number of validation steps, which must be implemented with high efficiency to process the 1200 catalogues per year from the major astronomy journals. These processes involve integrated teams of software engineers, specialized data librarians (documentalists) and astronomers, and various levels of interaction with the original authors and data providers. Procedures for the ingestion of associated data (Landais 2016) have recently been improved with semi-automatic mapping of metadata into the IVOA ObsCore standard, with an interactive tool to help authors submit their data (images, spectra, time series etc.). We present an overview of the quality assurance procedures in place for the operation of the VizieR pipelines, and identify the future challenges of increasing volumes and complexity of data. We highlight the lessons learned from implementing the FITS metadata mapping tools for authors and data providers. We show how quality assurance is an essential part of making the VizieR data comply with FAIR (Findable, Accessible, Interoperable and Re-useable) principles, and the necessity of quality assurance for the operational aspects of supporting more than 300,000 VizieR queries per day through multiple interactive and programmatic interfaces.

    COSIM: The necessary evolution of a cross-identification tool along with data evolution

    No full text
    SIMBAD is a bibliographic added-value database on astronomical objects, where the data on individual objects are cross-identified as far as possible. The data come exclusively from what has been published by the scientific community. To treat large tables, the work is done semi-automatically with the help of customized software. Since 2014, we have been using a new tool called COSIM (Comparison of Objects for SIMBAD). It meets the new requirements that result from the evolution of the available astronomical data, which have increased in number, accuracy and diversity. On the basis of the data presented in a published table, COSIM searches for objects that are already known in SIMBAD, by name or by coordinates. A combination of scores based on the available and comparable parameters, such as the main object type, coordinates, velocity and magnitudes, suggests whether the candidate is good for cross-identification or not. As soon as the result of the search is clear, indicating that there is either no matching candidate or only one good candidate, COSIM creates the commands necessary for updating the SIMBAD database. The documentalists can adjust the method of calculation of each score according to the nature of the objects in the table. Thus, with COSIM the documentalists obtain a good cross-identification level, with a minimal risk of omitted or false cross-identifications, in a short time relative to the volume of data treated.
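
    The score-and-decide logic described in this abstract can be illustrated with a minimal sketch. It is not COSIM's actual code: the class `Obj`, the functions `score` and `decide`, the equal weighting of the partial scores, the 5 arcsec search radius, the 1 mag tolerance and the 0.7 "good candidate" threshold are all assumptions chosen for illustration, and velocity is left out for brevity.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Obj:
    """Hypothetical minimal record for a table entry or a known object."""
    name: str
    ra_deg: float
    dec_deg: float
    otype: Optional[str] = None
    vmag: Optional[float] = None


def separation_arcsec(a: Obj, b: Obj) -> float:
    """Angular separation between two positions (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (a.ra_deg, a.dec_deg, b.ra_deg, b.dec_deg))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600


def score(entry: Obj, cand: Obj, radius_arcsec: float = 5.0, mag_tol: float = 1.0) -> float:
    """Mean of the partial scores for the parameters both objects provide."""
    parts = [max(0.0, 1.0 - separation_arcsec(entry, cand) / radius_arcsec)]
    if entry.otype and cand.otype:
        parts.append(1.0 if entry.otype == cand.otype else 0.0)
    if entry.vmag is not None and cand.vmag is not None:
        parts.append(max(0.0, 1.0 - abs(entry.vmag - cand.vmag) / mag_tol))
    return sum(parts) / len(parts)


def decide(entry: Obj, known: list, good: float = 0.7, radius_arcsec: float = 5.0):
    """Return ("create", None), ("match", candidate) or ("review", None)."""
    nearby = [c for c in known if separation_arcsec(entry, c) <= radius_arcsec]
    if not nearby:
        return ("create", None)      # clear case: no matching candidate
    good_ones = [c for c in nearby if score(entry, c, radius_arcsec) >= good]
    if len(good_ones) == 1:
        return ("match", good_ones[0])  # clear case: only one good candidate
    return ("review", None)          # unclear: left to a documentalist


known = [Obj("A", 10.0, 20.0, "Star", 12.0), Obj("B", 50.0, -30.0, "Galaxy", 15.0)]
action, match = decide(Obj("x", 10.0001, 20.0, "Star", 12.1), known)
```

    In this sketch only the two clear outcomes would lead to an automatic update; anything ambiguous is returned as "review", mirroring the documentalists' role described above.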

    Associated data: Indexation, discovery, challenges and roles

    No full text
    Astronomers are nowadays required by their funding agencies to make the data obtained through publicly financed means (ground and space observatories and labs) available to the public and the community at large. This is a fundamental step in enabling the open science paradigm the astronomical community is striving for. As a result, tabular data (catalogs) arriving at CDS for ingestion into its databases, in particular VizieR, are more and more frequently accompanied by the reduced observed dataset (spectra, images, data cubes, time series). While the benefits of making these associated data available are obvious, the task is very challenging: in this context "big data" takes the meaning of "extremely heterogeneous data", with a diversity of formats and practices among astronomers, even within the FITS standard. Providing librarians with efficient tools to index these data and generate the relevant metadata is therefore paramount.

    Management of Catalogs at CDS

    No full text
    VizieR (Ochsenbein et al. 2000) provides access to the most complete library of published astronomical catalogs (data tables and associated data) available online and organized in a self-documented database. (There were 11769 catalogs in November 2013.) Indexing the metadata in the VizieR search engine requires the expertise of scientists and documentalists for each catalog ingested. The metadata go into an efficient position search engine that is adapted to big data. (For instance, the GAIA simulation catalog has more than two billion objects.) Information in VizieR tables is well described and can be retrieved easily. The search results provide visibility to catalogs with tools and protocols to disseminate data to the Virtual Observatory, thus giving scientists data that are reusable by dedicated tools (e.g. image visualization tools). Also, new functionality allows users to extract all photometric data in catalogs for a given position. Finally, it is also through cross-identification tools that the CDS becomes a partner in producing large data sets, such as GAIA.

    TWiki: A Collaborative Space of Internal Documentation, an Efficient Way to Work Together

    No full text
    The documentalists at Strasbourg astronomical Data Center (CDS) mine publications in order to update the SIMBAD and VizieR databases with astronomical data. The process of mining publications is quite complex and, over time, the databases and tools used evolve as the field of astronomy evolves. The ingest process needs to be agreed upon, well described, and shared by all involved. This requires specific knowledge and mutual support among the documentalists in interaction with computer engineers and astronomers. The documentalists at CDS have therefore organized and enriched their internal documentation; the wiki collaborative tool is an efficient framework to do so. For more than a decade, the CDS has been developing a "TWiki" collaborative space. Recently, we created a working group to refurbish the collaborative space; it is now better structured and clearer, and provides new functionality that gives the user a better experience.

    A New Bibliographical Feature for SIMBAD: Highlighting the Most Relevant Papers for One Astronomical Object

    No full text
    The number of bibliographical references attached to an astronomical object in SIMBAD has been growing continuously over the years. It is important for astronomers to retrieve the most relevant papers, those that give important information about the object of study. This is not easy since there can be many references attached to one object. For instance, in 2014, more than 15,000 objects each had more than 50 references attached. The location of the object's citations inside the paper and the number of occurrences are important criteria for extracting the most relevant papers. Since 2008, this information has been collected thanks to the DJIN application (a semi-automatic tool to search for object names in full text). For each article associated with an astronomical object, we know where the object is cited, how many times, and under which name it appears. Since September 2013, the users of the SIMBAD web site can choose to retrieve the most relevant references for an astronomical object depending on its location in the publication. A new formula to sort references by combining all locations, number of occurrences, total number of objects studied, citation count, and year is presented in this paper.
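
    The abstract names the ingredients of the sorting formula but not the formula itself. The sketch below only illustrates how such ingredients could be combined into one score; the location weights, the logarithmic damping, the 0.5 citation factor, the recency term and all names (`Ref`, `relevance`, `LOCATION_WEIGHT`) are hypothetical and are not the formula presented in the paper.

```python
import math
from dataclasses import dataclass

# Hypothetical weights: a name in the title counts more than one in a table.
LOCATION_WEIGHT = {"title": 4.0, "abstract": 3.0, "text": 1.5, "table": 1.0}


@dataclass
class Ref:
    """Hypothetical record of one reference attached to one object."""
    bibcode: str
    locations: frozenset  # where the object name appears in the paper
    occurrences: int      # how many times the name appears
    n_objects: int        # total number of objects studied in the paper
    citations: int
    year: int


def relevance(ref: Ref, current_year: int = 2014) -> float:
    """Illustrative score: higher means more relevant for this object."""
    location = sum(LOCATION_WEIGHT.get(loc, 0.0) for loc in ref.locations)
    occurrence = math.log1p(ref.occurrences)
    # Papers that study many objects say less about each one of them.
    focus = 1.0 / math.log2(1 + max(ref.n_objects, 1))
    citation = 0.5 * math.log1p(ref.citations)
    recency = 1.0 / (1.0 + max(current_year - ref.year, 0) / 10.0)
    return (location + occurrence) * focus + citation + recency


refs = [
    Ref("refB", frozenset({"table"}), 1, 5000, 10, 2012),
    Ref("refA", frozenset({"title", "abstract"}), 20, 1, 10, 2012),
]
ranked = sorted(refs, key=relevance, reverse=True)
```

    With these made-up weights, a paper dedicated to the object and naming it in its title ranks above a large survey paper that lists the object once in a table, which is the behaviour the abstract describes qualitatively.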

    How Documentalists Update SIMBAD

    No full text
    The Strasbourg astronomical Data Center (CDS) was created in 1972 and has had a major role in astronomy for more than forty years. CDS develops a service called SIMBAD that provides basic data, cross-identifications, bibliography, and measurements for astronomical objects outside the solar system. It brings added value to the scientific community through content that is updated daily by a team of documentalists working in close collaboration with astronomers and IT specialists. We explain how the CDS staff updates SIMBAD with object citations in the main astronomical journals, as well as with astronomical data and measurements. We also explain how the identification is made between the objects found in the literature and those already existing in SIMBAD. We show the steps followed by the documentalist team to update the database using different tools developed at CDS, such as the sky visualizer Aladin and the large catalogue and survey database VizieR. As a direct result of this teamwork, SIMBAD integrates almost 10,000 bibliographic references per year. The service receives more than 400,000 queries per day.

    Working Together at CDS: The Symbiosis Between Astronomers, Documentalists, and IT Specialists

    No full text
    Since the CDS (Centre de Données astronomiques de Strasbourg) began a little more than forty years ago, astronomers, documentalists, and information technology (IT) specialists have been working together. The synergy between these three professional groups supports the core of the work and is becoming more and more crucial with the increasing volume and complexity of data handled. The astronomers use their understanding of the subject and of users' needs to help maintain the accuracy and the relevance of data. The computer engineers enhance these data by maintaining the database framework and continuing to add useful tools to retrieve and reuse this content. Finally, the documentalists, by definition, manage the content. They do so with the help of IT tools developed at CDS; they analyze the publications, extract the relevant information, verify the data, make comparisons with existing data, add the useful information to VizieR and SIMBAD, and confer with astronomers to make corrections, if needed. After a historical review of the evolution in data and the way data have been provided at CDS, we further discuss the fundamental roles of the three professional groups in supporting the mission of the CDS.