Improving Domain Repository Connectivity
Domain repositories, i.e. repositories that store, manage, and persist data pertaining to a specific scientific domain, are common and growing in the research landscape. Many of these repositories develop close, long-term communities made up of individuals and organizations that collect, analyze, and publish results based on the data in the repositories. Connections between these datasets, papers, people, and organizations are an important part of the knowledge infrastructure surrounding the repository. All these research objects, people, and organizations can now be identified using various unique and persistent identifiers (PIDs), and it is possible for domain repositories to build on their existing communities to facilitate and accelerate the identifier adoption process. As community members contribute to multiple datasets and articles, identifiers for them, once found, can be used multiple times. We explore this idea by defining a connectivity metric and applying it to datasets collected and papers published by members of the UNAVCO community. Finding identifiers in DataCite and Crossref metadata and spreading those identifiers through the UNAVCO DataCite metadata can increase connectivity from less than 10% to close to 50% for people and organizations.
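As a concrete illustration, a connectivity metric of this kind can be read as the fraction of creator and affiliation entries in a repository's DataCite metadata that already carry a PID (an ORCID for people, a ROR ID for organizations). The minimal sketch below uses the public DataCite REST API; the client identifier, the single-page sampling, and the exact counting rule are simplifying assumptions for illustration, not the method used in the paper.

```python
import requests

DATACITE_API = "https://api.datacite.org/dois"

def connectivity(client_id: str, page_size: int = 100) -> dict:
    """Estimate people/organization connectivity for one DataCite client:
    the fraction of creator and affiliation entries that carry a PID
    (ORCID nameIdentifier, ROR affiliationIdentifier)."""
    people = orgs = people_with_id = orgs_with_id = 0
    # affiliation=true asks DataCite to return affiliations as objects;
    # only the first page of results is sampled in this sketch.
    params = {"client-id": client_id, "page[size]": page_size, "affiliation": "true"}
    resp = requests.get(DATACITE_API, params=params, timeout=30)
    resp.raise_for_status()
    for doi in resp.json()["data"]:
        for creator in doi["attributes"].get("creators", []):
            people += 1
            if any(n.get("nameIdentifierScheme") == "ORCID"
                   for n in creator.get("nameIdentifiers", [])):
                people_with_id += 1
            for aff in creator.get("affiliation", []):
                orgs += 1
                if isinstance(aff, dict) and aff.get("affiliationIdentifierScheme") == "ROR":
                    orgs_with_id += 1
    return {"people": people_with_id / people if people else 0.0,
            "organizations": orgs_with_id / orgs if orgs else 0.0}

# Hypothetical usage: connectivity("client.example") for one repository's DOIs.
```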
Mapping ISO 19115-1 geographic metadata standards to CodeMeta
The CodeMeta Project recently proposed a vocabulary for software metadata. ISO Technical Committee 211 has published a set of metadata standards for geographic data and many kinds of related resources, including software. In order for ISO metadata creators and users to take advantage of the CodeMeta recommendations, a mapping from ISO elements to the CodeMeta vocabulary must exist. This mapping is complicated by differences in the approaches used by ISO and CodeMeta, primarily the difference between hard and soft typing of metadata elements. These differences are described in detail, and a mapping is proposed that includes sixty-four of the sixty-eight CodeMeta V2 terms. The CodeMeta terms have also been mapped to dialects used by twenty-one software repositories, registries, and archives. The average number of terms mapped in these cases is 11.2. The disparity between these numbers reflects the fact that many of the dialects that have been mapped to CodeMeta are focused on citation or dependency identification and management, while ISO and CodeMeta share additional targets that include access, use, and understanding. Addressing this broader set of use cases requires more metadata elements.
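To make the crosswalk idea concrete, a mapping like the one proposed can be represented as a simple lookup from ISO element paths to CodeMeta terms, with unmapped elements kept for manual review (a consequence of ISO's soft typing, where one element may serve several CodeMeta terms). The paths and pairings below are illustrative assumptions, not an excerpt of the published sixty-four-term mapping.

```python
# Illustrative excerpt of an ISO 19115-1 -> CodeMeta crosswalk.
# Paths and pairings are assumptions for demonstration only.
ISO_TO_CODEMETA = {
    "MD_Metadata/identificationInfo/citation/title": "name",
    "MD_Metadata/identificationInfo/abstract": "description",
    "MD_Metadata/identificationInfo/citation/identifier": "identifier",
    "MD_Metadata/identificationInfo/pointOfContact": "author",
    "MD_Metadata/identificationInfo/resourceConstraints/reference": "license",
    "MD_Metadata/distributionInfo/transferOptions/onLine/linkage": "downloadUrl",
}

def map_record(iso_record: dict) -> dict:
    """Translate a flattened ISO record (path -> value) into CodeMeta terms.
    Soft-typed ISO elements may serve several CodeMeta terms, so unmapped
    paths are collected for manual review rather than silently dropped."""
    codemeta, unmapped = {}, []
    for path, value in iso_record.items():
        term = ISO_TO_CODEMETA.get(path)
        if term:
            codemeta[term] = value
        else:
            unmapped.append(path)
    return {"codemeta": codemeta, "unmapped": unmapped}
```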
Connecting Repositories to the Global Research Community: A Re-Curation Process
Over the last decade, significant changes have affected the work that data repositories of all kinds do. First, the emergence of globally unique and persistent identifiers (PIDs) has created new opportunities for repositories to engage with the global research community by connecting existing repository resources to the global research infrastructure. Second, repository use cases have evolved from data discovery to data discovery and reuse, significantly increasing metadata requirements. To respond to these evolving requirements, we need retrospective and ongoing curation, i.e. re-curation, processes that 1) find identifiers and add them to existing metadata to connect datasets to a wider range of communities, and 2) add elements that support reuse to globally connected metadata. The goal of this work is to introduce the concept of re-curation with representative examples that are generally applicable to many repositories: 1) increasing completeness of affiliations and identifiers for organizations and funders in the Dryad Repository and 2) measuring and increasing FAIRness of DataCite metadata beyond required fields for institutional repositories. These re-curation efforts are a critical part of reshaping existing metadata and repository processes so they can take advantage of new connections, engage with global research communities, and facilitate data reuse.
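One representative re-curation step, matching free-text affiliation strings to ROR identifiers so they can be added back into existing metadata, can be sketched against the public ROR affiliation-matching API. The score threshold and acceptance rule below are assumptions for illustration, not the curation rules used for Dryad.

```python
import requests

ROR_API = "https://api.ror.org/organizations"

def recurate_affiliations(affiliations: list[str], min_score: float = 0.9) -> dict:
    """Retrospectively match free-text affiliation strings to ROR IDs.
    Only candidates the ROR matcher marks as 'chosen' with a score above
    min_score are accepted; everything else is left for manual review."""
    matches = {}
    for text in affiliations:
        resp = requests.get(ROR_API, params={"affiliation": text}, timeout=30)
        resp.raise_for_status()
        for item in resp.json().get("items", []):
            if item.get("chosen") and item.get("score", 0) >= min_score:
                matches[text] = item["organization"]["id"]
                break
    return matches

# Hypothetical usage:
# recurate_affiliations(["Helmholtz-Zentrum Berlin", "British Oceanographic Data Centre"])
```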
Collection Evaluation and Evolution
We will review metadata evaluation tools and share results from our most recent analysis of the Common Metadata Repository (CMR). We will demonstrate results using Google spreadsheets and present new results in terms of the number of records that include specific content. We will show the evolution of UMM (Unified Metadata Model) compliance over time and also show results of comparing various CMR collections (NASA, non-NASA, and SciOps).
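The per-field counting behind "number of records that include specific content" can be sketched as below, assuming the collection has already been retrieved as a list of flattened metadata dictionaries; the field names shown are placeholders, not UMM element names.

```python
from collections import Counter

def field_completeness(records: list[dict], fields: list[str]) -> Counter:
    """Count how many records in a collection include each field of interest.
    A field counts as present when it exists and is non-empty."""
    counts = Counter()
    for record in records:
        for field in fields:
            if record.get(field):
                counts[field] += 1
    return counts

# Hypothetical usage with placeholder field names:
# field_completeness(sample_records, ["Abstract", "DOI", "RelatedUrls"])
```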
Wie FAIR sind unsere Metadaten? (How FAIR Are Our Metadata?)
In this experience report we present a metadata analysis that examines the metadata quality of 144 repositories of the TIB DOI Service with respect to compliance with the FAIR Data Principles, consistency, and completeness. The results show that the repositories examined focus primarily on the findability of the resources described by the metadata, and that on average only a few metadata elements beyond the mandatory fields are provided. In particular with regard to better reusability and stronger linking to other related persistent identifiers, such as ORCID, ROR IDs, or DOI-to-DOI relations with cited or citing resources, there is untapped potential that should be exploited in the spirit of open, forward-looking science. At the same time, our analysis also identifies individual repositories with extensive metadata as best-practice examples that other repositories can follow. Overall, the metadata analysis makes it possible to derive recommendations for tailored guidance to repositories that want to improve their metadata quality.
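The per-record check behind such an analysis can be sketched as follows, assuming DataCite REST API JSON as input; the selection of fields is an illustrative subset, not the full set of criteria used in the report.

```python
def beyond_mandatory(attributes: dict) -> dict:
    """Check one DataCite record (REST API 'attributes' dict) for metadata
    beyond the mandatory fields: recommended/optional elements and PID links
    (ORCID for creators, DOI-to-DOI relations to cited or citing resources)."""
    creators = attributes.get("creators", [])
    return {
        "has_description": bool(attributes.get("descriptions")),
        "has_subjects": bool(attributes.get("subjects")),
        "has_rights": bool(attributes.get("rightsList")),
        "has_related_dois": any(
            r.get("relatedIdentifierType") == "DOI"
            for r in attributes.get("relatedIdentifiers", [])
        ),
        "has_orcid": any(
            n.get("nameIdentifierScheme") == "ORCID"
            for c in creators for n in c.get("nameIdentifiers", [])
        ),
    }
```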
Big Earth Data Initiative: Metadata Improvement: Case Studies
The Big Earth Data Initiative (BEDI) invests in standardizing and optimizing the collection, management, and delivery of the U.S. Government's civil Earth observation data to improve discovery, access, use, and understanding of Earth observations by the broader user community. Complete and consistent standard metadata helps address all of these goals.
Task 28: Web Accessible APIs in the Cloud Trade Study
This study explored three candidate architectures for serving NASA Earth Science Hierarchical Data Format Version 5 (HDF5) data via Hyrax running on Amazon Web Services (AWS). We studied the cost and performance of each architecture using several representative use cases. The objectives of the project were to: 1) conduct a trade study to identify one or more high-performance integrated solutions for storing and retrieving NASA HDF5 and Network Common Data Format Version 4 (netCDF4) data in a cloud (web object store) environment, with the AWS Simple Storage Service (S3) as the target environment; 2) conduct the level of software development needed to properly evaluate solutions in the trade study and to obtain the benchmarking metrics required as input to a government decision on potential follow-on prototyping; and 3) develop a cloud cost model for the preferred data storage solution (or solutions) that accounts for different granulation and aggregation schemes as well as cost and performance trades.
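A cloud cost model of the kind described can be sketched as a simple function of stored volume and request traffic; the price constants, parameterization, and example numbers below are placeholder assumptions, not current AWS rates or the study's actual model.

```python
def monthly_s3_cost(n_granules: int, granule_mb: float,
                    granules_accessed: int, requests_per_access: float,
                    storage_price_per_gb: float = 0.023,
                    get_price_per_1000: float = 0.0004) -> float:
    """Rough monthly cost of serving HDF5/netCDF-4 granules from an S3-like
    object store: storage plus GET requests. Coarser aggregation lowers
    n_granules but typically raises requests_per_access for subsetting
    workloads, which is the trade this toy model exposes."""
    storage_gb = n_granules * granule_mb / 1024
    storage_cost = storage_gb * storage_price_per_gb
    request_cost = granules_accessed * requests_per_access / 1000 * get_price_per_1000
    return storage_cost + request_cost

# Hypothetical example: 500,000 granules of 100 MB, 200,000 granule accesses
# per month averaging 8 range-GET requests each.
# monthly_s3_cost(500_000, 100, 200_000, 8)
```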
Recommended from our members
Persistent Identification of Instruments
Instruments play an essential role in creating research data. Given the importance of instruments and associated metadata to the assessment of data quality and data reuse, globally unique, persistent, and resolvable identification of instruments is crucial. The Research Data Alliance Working Group on Persistent Identification of Instruments (PIDINST) developed a community-driven solution for persistent identification of instruments, which we present and discuss in this paper. Based on an analysis of 10 use cases, PIDINST developed a metadata schema and prototyped the schema implementation with DataCite and ePIC as representative persistent identifier infrastructures and with HZB (Helmholtz-Zentrum Berlin für Materialien und Energie) and BODC (British Oceanographic Data Centre) as representative institutional instrument providers. These implementations demonstrate the viability of the proposed solution in practice. Moving forward, PIDINST will further catalyse adoption and consolidate the schema by addressing new stakeholder requirements.
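For illustration, an instrument description in the spirit of the PIDINST metadata schema might look like the following; the property names follow the published schema only loosely, and the values, nesting, and handle are hypothetical.

```python
# Hypothetical instrument record sketched after the PIDINST schema.
# All identifiers, names, and the exact structure are illustrative only.
instrument = {
    "Identifier": {"identifierValue": "https://hdl.handle.net/21.T00000/example",
                   "identifierType": "Handle"},
    "LandingPage": "https://example.org/instruments/ctd-42",
    "Name": "Example CTD profiler",
    "Owner": [{"ownerName": "BODC"}],
    "Manufacturer": [{"manufacturerName": "Example Instruments Ltd."}],
    "Model": {"modelName": "CTD-42"},
    "Description": "Conductivity-temperature-depth profiler used on research cruises.",
    "MeasuredVariable": ["conductivity", "temperature", "pressure"],
}
```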