CDS homogenisation of metadata from publishers
The mission of the CDS is to collect, add value to, and distribute the data published in astrophysics journals. CDS is renewing its pipeline dedicated to the analysis of articles published in the main astrophysics journals. This pipeline is the entry point for information that is then processed, for example, for the CDS SIMBAD service (Wenger et al. 2000) to detect new and known astronomical sources in articles. For efficiency, the CDS pipeline needs to download entire volumes or issues of a journal. The text analysis software must behave in the same way regardless of the journal. However, publishers provide articles in different formats, such as PDF and XML (often with different schemas). The CDS pipeline therefore requires a pre-processing step that converts all of these into a single format better suited to our analysis. Here, we describe recent efforts to convert all the articles provided by different publishers into a single, homogeneous XML format.
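The idea of one converter per publisher, all emitting the same target structure, can be sketched as below. This is a minimal illustration under stated assumptions: the two input layouts (`article-title`/`name` for a hypothetical publisher "A", `ArticleTitle`/`Author` for a hypothetical publisher "B") and the target `<article><title/><authors/></article>` format are invented for the example and are not the actual CDS schema. PDF inputs would need a separate text-extraction step that is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical extractors, one per publisher schema. Each returns a dict with
# the same keys, whatever the input layout looks like.
def extract_publisher_a(root):
    # Assumed layout: <article><front><article-title/><contrib><name/></contrib>...
    return {
        "title": root.findtext(".//article-title", default="").strip(),
        "authors": [n.text.strip() for n in root.iter("name") if n.text],
    }

def extract_publisher_b(root):
    # Assumed layout: <Article><ArticleTitle/><Author/>...</Article>
    return {
        "title": root.findtext("ArticleTitle", default="").strip(),
        "authors": [a.text.strip() for a in root.findall("Author") if a.text],
    }

EXTRACTORS = {"A": extract_publisher_a, "B": extract_publisher_b}

def to_common_xml(xml_text, publisher):
    """Convert one publisher article into the (hypothetical) common XML format."""
    record = EXTRACTORS[publisher](ET.fromstring(xml_text))
    out = ET.Element("article")
    ET.SubElement(out, "title").text = record["title"]
    authors = ET.SubElement(out, "authors")
    for name in record["authors"]:
        ET.SubElement(authors, "author").text = name
    return ET.tostring(out, encoding="unicode")
```

With this design, supporting a new publisher means adding one extractor; the downstream text analysis only ever sees the common format.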
Bibliographical references: From publishers to SIMBAD
The SIMBAD astronomical database hosted by the CDS provides basic data, cross-identifications, bibliography and measurements for astronomical objects outside the solar system.
The CDS receives the bibliographic metadata of the articles published in the main astronomical journals directly from the publishers. How we receive the data, and in which format, varies from one publisher to the next. These data are first extracted and stored in files with a standardised format. Then, to catch errors or misprints, we run several tests on these data:
- Author names are compared to a reference list maintained at CDS, and the keywords are compared with the AAS keyword list
- Astronomical objects are verified by checking their name in the SIMBAD database
- A completion test checks that all of the articles of a journal volume are present
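The three checks listed above can be sketched as simple set lookups over the standardised records. In this minimal sketch, the `ArticleRecord` fields, the use of first pages to identify articles, and the `expected_pages` list (assumed to come from something like the publisher's table of contents) are all hypothetical; the `known_objects` set is only a stand-in for a real SIMBAD name lookup.

```python
from dataclasses import dataclass, field

@dataclass
class ArticleRecord:
    # Hypothetical standardised record; field names are illustrative only.
    first_page: int
    authors: list[str]
    keywords: list[str]
    objects: list[str] = field(default_factory=list)

def check_records(records, known_authors, aas_keywords, known_objects, expected_pages):
    """Return human-readable warnings for the records of one journal volume."""
    warnings = []
    for r in records:
        for a in r.authors:
            if a not in known_authors:  # CDS reference list of author names
                warnings.append(f"p.{r.first_page}: unknown author {a!r}")
        for k in r.keywords:
            if k not in aas_keywords:  # AAS keyword list
                warnings.append(f"p.{r.first_page}: keyword not in AAS list {k!r}")
        for o in r.objects:
            if o not in known_objects:  # stand-in for a SIMBAD name lookup
                warnings.append(f"p.{r.first_page}: object not found {o!r}")
    # Completion test: every expected article of the volume must be present.
    present = {r.first_page for r in records}
    for page in sorted(set(expected_pages) - present):
        warnings.append(f"missing article starting at p.{page}")
    return warnings
```

The function only reports problems; deciding whether a warning is a misprint or a genuinely new author, keyword, or object remains a human step.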
The next step identifies whether an astronomical object appears in a title, a keyword or an abstract, and if so, we add a link to the object in SIMBAD. Once all verifications and corrections have been made, we add the metadata to SIMBAD. We also add other information, such as the number of distinct astronomical objects studied in the paper, the presence of tables and their links to VizieR, any new acronyms, and some comments. Work is in progress to automatically extract the data from tables in articles that have not been processed by, or provided to, VizieR. In addition, automatic checks run each night to list the new data and to test their coherence in SIMBAD.
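Detecting whether an object appears in the title, keywords or abstract can be sketched as a whole-token search for each verified name. This is a minimal sketch, not the CDS tool: `object_names` is a hypothetical list standing in for names already verified against SIMBAD, and exact string matching ignores naming variants (for example "M31" versus "M 31") that the real process must handle.

```python
import re

def find_object_mentions(record, object_names):
    """Return {field: [names]} for verified object names that appear in the
    title, keywords or abstract of one article record (a plain dict)."""
    fields = {
        "title": record.get("title", ""),
        "keywords": " ; ".join(record.get("keywords", [])),
        "abstract": record.get("abstract", ""),
    }
    hits = {}
    for field_name, text in fields.items():
        # (?<!\w) and (?!\w) stop "M 3" from matching inside "M 31" or "M 33".
        found = [name for name in object_names
                 if re.search(r"(?<!\w)" + re.escape(name) + r"(?!\w)", text)]
        if found:
            hits[field_name] = found
    return hits
```

For a record titled "Cepheids in M 31", the name "M 31" is reported under `title`, while "M 3" is not, because it is immediately followed by another word character.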
TWiki: A Collaborative Space of Internal Documentation, an Efficient Way to Work Together
The documentalists at the Strasbourg astronomical Data Center (CDS) mine publications to update the SIMBAD and VizieR databases with astronomical data. The process of mining publications is quite complex, and over time the databases and tools evolve along with the field of astronomy. The ingest process needs to be agreed upon, well described, and shared by everyone involved. This requires specific knowledge and mutual support among the documentalists, in interaction with computer engineers and astronomers. The documentalists at CDS have therefore organized and enriched their internal documentation, and a wiki is an efficient collaborative framework for doing so. For more than a decade, the CDS has been developing a "TWiki" collaborative space. Recently, we created a working group to refurbish this space: it is now better structured and clearer, and it provides new functionality that gives users a better experience.
The cost-saving effect of centralized histological reviews with soft tissue and visceral sarcomas, GIST, and desmoid tumors: The experiences of the pathologists of the French Sarcoma Group
A New Bibliographical Feature for SIMBAD: Highlighting the Most Relevant Papers for One Astronomical Object
The number of bibliographical references attached to an astronomical object in SIMBAD has been growing continuously over the years. It is important for astronomers to retrieve the most relevant papers, those that give important information about the object of study. This is not easy, since many references can be attached to a single object: in 2014, for instance, more than 15,000 objects each had more than 50 references attached. The location of an object's citations inside a paper and the number of its occurrences are important criteria for identifying the most relevant papers. This information has been collected since 2008 with the DJIN application, a semi-automatic tool that searches for object names in full text. For each article associated with an astronomical object, we know where the object is cited, how many times, and under which name. Since September 2013, users of the SIMBAD website can retrieve the most relevant references for an astronomical object according to where it appears in the publication. This paper presents a new formula that sorts references by combining all locations, the number of occurrences, the total number of objects studied, the citation count, and the year.
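The abstract lists the ingredients of the sorting formula but not the formula itself, so the sketch below is a hypothetical illustration of how such ingredients could be combined, not the formula from the paper. The location weights, the logarithmic damping of occurrences and citations, the `focus` penalty for papers studying many objects, and the 0.05-per-year recency decay are all assumptions chosen for readability.

```python
import math
from dataclasses import dataclass

# Hypothetical weights per location of the object in the paper.
LOCATION_WEIGHTS = {"title": 4.0, "abstract": 3.0, "keywords": 2.0, "table": 1.5, "text": 1.0}

@dataclass
class Citation:
    bibcode: str
    locations: dict[str, int]  # location -> occurrences of the object there
    n_objects: int             # total number of objects studied in the paper
    citation_count: int
    year: int

def relevance(c: Citation, current_year: int) -> float:
    """Illustrative relevance score (NOT the formula from the paper)."""
    # Occurrences count with diminishing returns, weighted by location.
    location_score = sum(LOCATION_WEIGHTS.get(loc, 1.0) * math.log1p(n)
                         for loc, n in c.locations.items())
    # A paper about many objects says less about each one.
    focus = 1.0 / math.log2(1 + max(c.n_objects, 1))
    impact = 1.0 + math.log1p(c.citation_count)
    recency = 1.0 / (1.0 + 0.05 * max(current_year - c.year, 0))
    return location_score * focus * impact * recency

def rank(citations, current_year):
    """Sort the citations of one object, most relevant first."""
    return sorted(citations, key=lambda c: relevance(c, current_year), reverse=True)
```

A multiplicative combination was chosen here so that a paper where the object barely appears cannot be rescued by citation count alone; an additive weighting would be an equally plausible design.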
Discordant diagnoses in sarcoma, GIST and desmoide tumour in France: results from the network RREPS
How Documentalists Update SIMBAD
The Strasbourg astronomical Data Center (CDS) was created in 1972 and has played a major role in astronomy for more than forty years. CDS develops SIMBAD, a service that provides basic data, cross-identifications, bibliography, and measurements for astronomical objects outside the solar system. SIMBAD brings added value to the scientific community through content that is updated daily by a team of documentalists working in close collaboration with astronomers and IT specialists. We explain how the CDS staff update SIMBAD with object citations from the main astronomical journals, as well as with astronomical data and measurements. We also explain how objects found in the literature are identified with those already in SIMBAD. We show the steps the documentalist team follows to update the database using different tools developed at CDS, such as the sky visualizer Aladin and VizieR, the database of large catalogues and surveys. As a direct result of this teamwork, SIMBAD integrates almost 10,000 bibliographic references per year. The service receives more than 400,000 queries per day.
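The abstract does not detail how objects from the literature are identified with existing SIMBAD entries. One common ingredient of such identification in general is a positional comparison, sketched below with the haversine formula for angular separation. This is an assumed, generic technique rather than the documented CDS procedure; the in-memory `catalogue` of `(name, ra, dec)` tuples and the 5-arcsecond default radius are hypothetical, and a positional match only yields candidates that a documentalist would still confirm using names, magnitudes, or other data.

```python
import math

def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation between two (RA, Dec) positions given in
    degrees, returned in arcseconds (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # min() guards against rounding pushing h slightly above 1.
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h)))) * 3600.0

def candidate_matches(ra, dec, catalogue, radius_arcsec=5.0):
    """Return names of catalogue entries (name, ra, dec) within the search
    radius of the given position, closest first."""
    hits = []
    for name, cra, cdec in catalogue:
        sep = angular_separation_arcsec(ra, dec, cra, cdec)
        if sep <= radius_arcsec:
            hits.append((sep, name))
    return [name for _, name in sorted(hits)]
```

The haversine form is used because it stays numerically accurate for the very small separations typical of cross-identification.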