
    Interoperability and FAIRness through a novel combination of Web technologies

    Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories, ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not. The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task with no scalability. Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings. We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and can therefore represent an exemplar implementation of those principles. The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.
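
    A central resource-oriented pattern of the kind described above is resolving a single identifier with HTTP content negotiation, so that machines receive machine-readable metadata while browsers receive HTML. The minimal Python sketch below illustrates that pattern; the record URI is hypothetical and the formats actually served depend on the repository.

# Minimal sketch of identifier resolution with content negotiation:
# one URI, several possible representations. The URI is a placeholder,
# not a real repository record.
import requests

record_uri = "https://example.org/repository/dataset/42"  # hypothetical identifier

# Prefer machine-readable metadata, fall back to HTML.
for accept in ("text/turtle", "application/ld+json", "text/html"):
    response = requests.get(record_uri, headers={"Accept": accept}, timeout=10)
    if response.ok:
        print(f"Served as {response.headers.get('Content-Type')}:")
        print(response.text[:500])
        break
else:
    print("No representation retrieved for", record_uri)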

    The OBAA Standard for Developing Repositories of Learning Objects: the Case of Ocean Literacy in Azores

    This paper describes existing web resources of learning objects that promote ocean literacy. Several projects and sites are explored and their shortcomings revealed. The limitations identified include insufficient metadata about registered learning objects and a lack of support for intelligent applications. As a solution, we promote the seaThings project, which relies on a multi-disciplinary approach to promoting literacy in the marine environment by implementing a specific Learning Object Repository (LOR) and a federation of repositories (FED), supported by OBAA, a versatile and innovative standard that provides the necessary support for intelligent educational applications to be used in schools and other educational institutions.

    Constructing a biodiversity terminological inventory.

    The increasing growth of literature in biodiversity presents challenges to users who need to discover pertinent information in an efficient and timely manner. In response, text mining techniques offer solutions by facilitating the automated discovery of knowledge from large textual data. An important step in text mining is the recognition of concepts via their linguistic realisation, i.e., terms. However, a given concept may be referred to in text using various synonyms or term variants, making search systems likely to overlook documents mentioning lesser-known variants, which are nevertheless relevant to a query term. Domain-specific terminological resources, which include term variants, synonyms and related terms, are thus important in supporting semantic search over large textual archives. This article describes the use of text mining methods for the automatic construction of a large-scale biodiversity term inventory. The inventory consists of names of species, amongst which naming variations are prevalent. We apply a number of distributional semantic techniques to all of the titles in the Biodiversity Heritage Library to compute semantic similarity between species names and support the automated construction of the resource. With the construction of our biodiversity term inventory, we demonstrate that distributional semantic models are able to identify semantically similar names that are not yet recorded in existing taxonomies. Such methods can thus be used to update existing taxonomies semi-automatically by deriving semantically related taxonomic names from a text corpus and allowing expert curators to validate them. We also evaluate our inventory as a means to improve search by facilitating automatic query expansion. Specifically, we developed a visual search interface that suggests semantically related species names, which are available in our inventory but not always in other repositories, to incorporate into the search query. An assessment of the interface by domain experts reveals that our query expansion based on related names is useful for increasing the number of relevant documents retrieved. Its exploitation can benefit both users and developers of search engines and text mining applications.
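
    As a rough illustration of the distributional-semantics step described above, the sketch below trains a small word-embedding model on a handful of invented title strings and queries it for neighbours of a species epithet. The titles are illustrative stand-ins for the Biodiversity Heritage Library corpus; the actual inventory relies on far larger data, stronger models, and expert validation.

# Toy sketch: learn embeddings from (made-up) title text and use nearest
# neighbours of a name as query-expansion candidates.
from gensim.models import Word2Vec

titles = [
    "notes on puma concolor in the southern andes",
    "distribution of felis concolor in north america",
    "observations of the cougar puma concolor",
    "the mountain lion felis concolor of california",
]
sentences = [t.split() for t in titles]

model = Word2Vec(sentences, vector_size=32, window=3, min_count=1, epochs=200, seed=1)

# Names occurring in similar contexts end up close in the vector space,
# which is how naming variants can be surfaced for query expansion.
for name, score in model.wv.most_similar("concolor", topn=3):
    print(f"{name}\t{score:.2f}")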

    The ORKG R Package and Its Use in Data Science

    Research infrastructures and services provide access to (meta)data via user interfaces and APIs. The more advanced services also support access through (Python, R, etc.) packages that users can use in computational environments. For scientific information as a particular kind of research data, the Open Research Knowledge Graph (ORKG) is an example of an advanced service that also supports accessing data from Python scripts. Since many research communities use R as the statistical language of choice, we have developed the ORKG R package to support accessing and processing ORKG data directly from R scripts. Inspired by the Python library, the ORKG R package supports a comparable set of features through a similar programmatic interface. Having developed the ORKG R package, we demonstrate its use in various applications grounded in the life science and soil science research fields. As an additional key contribution of this work, we show how the ORKG R package can be used in combination with ORKG templates to support the pre-publication production and publication of machine-readable scientific information, during the data analysis phase of the research life cycle and directly in the scripts that produce scientific information. This new mode of machine-readable scientific information production complements the post-publication crowdsourcing-based manual and NLP-based automated approaches, with the major advantages of high accuracy and fine granularity.
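
    The package described above is an R library; as a language-neutral companion sketch, the Python snippet below retrieves a few ORKG resources over HTTP. The API base URL, endpoint path, query parameters and response shape shown here are assumptions made for illustration, not details taken from the paper or the package documentation.

# Hedged sketch of programmatic ORKG access over an assumed REST endpoint.
import requests

ORKG_API = "https://orkg.org/api"  # assumed base URL of the ORKG REST API

def search_resources(query: str, size: int = 5) -> dict:
    """Return a page of ORKG resources whose labels match the query string."""
    response = requests.get(
        f"{ORKG_API}/resources/",
        params={"q": query, "size": size},  # parameter names are assumptions
        headers={"Accept": "application/json"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    # "content" assumes a paged response; adjust to the actual response shape.
    for item in search_resources("biodiversity").get("content", []):
        print(item.get("id"), item.get("label"))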

    User-centered semantic dataset retrieval

    Finding relevant research data is an increasingly important but time-consuming task in daily research practice. Several studies report difficulties in dataset search, e.g., scholars retrieve only partially pertinent data, and important information cannot be displayed in the user interface. Overcoming these problems has motivated a number of research efforts in computer science, such as text mining and semantic search. In particular, the emergence of the Semantic Web opens a variety of novel research perspectives. Motivated by these challenges, the overall aim of this work is to analyze the current obstacles in dataset search and to propose and develop a novel semantic dataset search. The studied domain is biodiversity research, a domain that explores the diversity of life, habitats and ecosystems. This thesis has three main contributions: (1) We evaluate the current situation in dataset search in a user study, and we compare a semantic search with a classical keyword search to explore the suitability of semantic web technologies for dataset search. (2) We generate a question corpus and develop an information model to determine which scientific topics scholars in biodiversity research are interested in. Moreover, we also analyze the gap between current metadata and scholarly search interests, and we explore whether metadata and user interests match. (3) We propose and develop an improved dataset search based on three components: (A) a text mining pipeline, enriching metadata and queries with semantic categories and URIs, (B) a retrieval component with a semantic index over categories and URIs and (C) a user interface that enables a search within categories and a search including further hierarchical relations. Following user-centered design principles, we ensure user involvement in various user studies during the development process.
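
    To make the three components named above concrete, the toy sketch below maps query terms to category URIs (standing in for the text-mining pipeline), looks those URIs up in a miniature semantic index, and returns matching dataset identifiers. All terms, URIs and records here are invented for illustration and are far simpler than the thesis's actual pipeline.

# (A) enrich the query, (B) consult a semantic index, (C) return hits.
FOREST_CATEGORY = "https://example.org/terms/Forest"  # illustrative URI

# Stand-in for the text-mining pipeline: surface form -> category URI.
TERM_TO_CATEGORY = {
    "forest": FOREST_CATEGORY,
    "woodland": FOREST_CATEGORY,
}

# Stand-in for the semantic index: category URI -> dataset identifiers.
SEMANTIC_INDEX = {
    FOREST_CATEGORY: ["dataset-0042", "dataset-0108"],
}

def semantic_search(query: str) -> list[str]:
    """Expand query terms to category URIs and collect indexed datasets."""
    hits: list[str] = []
    for token in query.lower().split():
        category = TERM_TO_CATEGORY.get(token)
        if category:
            hits.extend(SEMANTIC_INDEX.get(category, []))
    return sorted(set(hits))

print(semantic_search("woodland soil samples"))  # -> ['dataset-0042', 'dataset-0108']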

    Library and Information Science Education and eScience: The Current State of ALA Accredited MLS/MLIS Programs in Preparing Librarians and Information Professionals for eScience Needs

    The purpose of this study is multifaceted: 1) to describe eScience research in a comprehensive way; 2) to help library and information specialists understand the realm of eScience research and the information needs of the community, and demonstrate the importance of LIS professionals within the eScience domain; and 3) to explore the current state of curricular content of ALA accredited MLS/MLIS programs to understand the extent to which they prepare new professionals for eScience librarianship. The literature review focuses heavily on eScientists' and other data-driven researchers' information service needs, in addition to demonstrating how and why librarians and information specialists can and should fill these service gaps and information needs within eScience research. By looking at the current curriculum of American Library Association (ALA) accredited MLS/MLIS programs, we can identify potential gaps in knowledge and where to improve in order to prepare and train new MLS/MLIS graduates to fulfill the needs of eScientists. This investigation is meant to be informative and can be used as a tool for LIS programs to assess their curricula in comparison to the needs of eScience and other data-driven and networked research. Finally, this investigation will provide the LIS profession with awareness of, and insight into, the services needed to support a thriving eScience and data-driven research community.

    Discovery and publishing of primary biodiversity data associated with multimedia resources: The Audubon Core strategies and approaches

    The Audubon Core Multimedia Resource Metadata Schema is a representation-free vocabulary for the description of biodiversity multimedia resources and collections, now in the final stages as a proposed Biodiversity Information Standards (TDWG) standard. By defining only six terms as mandatory, it seeks to lighten the burden of providing or using multimedia useful for biodiversity science. At the same time, it offers rich optional metadata terms that can help curators of multimedia collections provide authoritative media that document species occurrence, ecosystems, identification tools, ontologies, and many other kinds of biodiversity documents or data. About half of the vocabulary is re-used from other relevant controlled vocabularies that are often already in use for multimedia metadata, thereby reducing the mapping burden on existing repositories. A central design goal is to allow consuming applications to have a high likelihood of discovering suitable resources, reducing the human examination effort that might be required to decide if a resource is fit for the purpose of the application.
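
    As a rough illustration of what a lightweight multimedia description in the spirit of Audubon Core might look like, the record below mixes a few Dublin Core and Audubon Core style terms. The choice of terms and the values are illustrative assumptions only and should not be read as the standard's normative mandatory set; the TDWG documentation defines the actual terms.

# Illustrative (not normative) multimedia metadata record; term selection
# and values are examples only.
import json

media_record = {
    "dcterms:identifier": "https://example.org/media/IMG_0001",  # placeholder URI
    "dc:type": "StillImage",
    "dcterms:rights": "CC BY 4.0",
    "ac:metadataLanguage": "en",
    "ac:caption": "Adult specimen photographed in situ (illustrative example).",
    "ac:associatedSpecimenReference": "https://example.org/occurrence/42",  # placeholder
}

print(json.dumps(media_record, indent=2))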

    Producing Linked Open Dataset from Bibliographic Data with Integration of External Data Sources for Academic Libraries

    This paper focuses on the transformation of bibliographic data into linked open data (LOD) in the RDF (Resource Description Framework) triple model, with integration of external resources. Library and information centres and knowledge centres deal with various types of databases, such as bibliographic databases, full-text databases, archival databases, statistical databases, CD/DVD-ROM databases and more. Web technology is rapidly changing how such data are stored, processed, and disseminated. Semantic web technology is an advanced layer of the web platform that provides structured data on the web, which organizations and institutions can use to describe and retrieve resources and to connect users with additional information from external sources. The main objective of this paper is the transformation of library bibliographic data, based on MARC21, into RDF triples published as LOD and enriched with external LOD datasets such as OpenLibrary, VIAF, Wikidata, DBpedia and GeoNames. We propose a workflow model (Figure 1) that visualizes the detailed steps, activities and components for transforming bibliographic data into an LOD dataset. The methodology describes the various methods and steps for conducting this work. We use the open source tool OpenRefine (version 3.2), formerly known as Google Refine. OpenRefine is used for managing and organizing messy data, with features such as row and column manipulation, reconciliation, and export to formats such as XML, JSON, N-Triples and RDF. In this work, OpenRefine was used for inserting URI columns, generating links, reconciling data against external sources, and converting the source format into RDF. After conversion, the complete set of bibliographic data is available as an RDF file that constitutes the LOD dataset. This LOD dataset may further be used by organizations or institutions for their advanced bibliographic services.
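
    As a small sketch of the final transformation step in the workflow described above, the code below turns one already-extracted bibliographic record into RDF triples with rdflib and attaches a placeholder VIAF link of the kind the reconciliation step would supply. The record values and URIs are placeholders, not reconciled data, and the paper's own workflow uses OpenRefine rather than a script.

# Sketch: bibliographic fields -> RDF triples, serialized as N-Triples.
from rdflib import Graph, Literal, Namespace, RDF, URIRef

DCTERMS = Namespace("http://purl.org/dc/terms/")
BIBO = Namespace("http://purl.org/ontology/bibo/")

record = {
    "id": "https://example.org/bib/000123",          # local record URI (placeholder)
    "title": "An Introduction to Linked Data",
    "creator_viaf": "http://viaf.org/viaf/0000000",  # placeholder VIAF URI
}

g = Graph()
work = URIRef(record["id"])
g.add((work, RDF.type, BIBO.Book))
g.add((work, DCTERMS.title, Literal(record["title"])))
g.add((work, DCTERMS.creator, URIRef(record["creator_viaf"])))

# One of the output formats mentioned above (N-Triples).
print(g.serialize(format="nt"))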