
    The HyperBagGraph DataEdron: An Enriched Browsing Experience of Multimedia Datasets

    Traditional verbatim browsers return information linearly, according to a ranking performed by a search engine that may not be optimal for the surfer. The surfer may need to assess the relevance of the retrieved information, particularly when s/he wants to explore other facets of a multi-faceted information space. For instance, in a multimedia dataset, facets such as keywords, authors, publication category, organisations and figures can be of interest. Visualising these facets simultaneously can help the surfer gain insights into the retrieved information and prompt further searches. Facets are co-occurrence networks, modelled by HyperBag-Graphs (families of multisets), and are linked not only to the publication itself but to any chosen reference. These references make it possible to navigate inside the dataset and perform visual queries. Here we explore the case of scientific publications retrieved through arXiv searches. Comment: Extension of the hypergraph framework briefly presented in arXiv:1809.00164 (possible small overlaps); uses the theoretical framework of hb-graphs presented in arXiv:1809.0019
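
    The abstract models each facet as a HyperBag-Graph, a family of multisets. The sketch below illustrates that structure under loudly stated assumptions. It is not the authors' implementation. Each hb-edge (one per publication) is taken to be a `collections.Counter` mapping a facet value, such as a keyword, to its multiplicity. The helper names `vertex_multiplicity_degree` and `co_occurrence` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Assumption: a hb-graph is stored as a family of multisets, one hb-edge per
# reference (here, per publication). Each hb-edge is a Counter that maps a
# facet value (e.g. a keyword) to its multiplicity in that hb-edge.
HbGraph = dict[str, Counter]


def vertex_multiplicity_degree(hb: HbGraph, vertex: str) -> int:
    """Hypothetical helper: sum the vertex's multiplicities over all hb-edges."""
    # A Counter returns 0 for missing keys, so absent vertices add nothing.
    return sum(edge[vertex] for edge in hb.values())


def co_occurrence(hb: HbGraph) -> Counter:
    """Hypothetical helper: count the hb-edges that contain each pair of values."""
    pairs: Counter = Counter()
    for edge in hb.values():
        # Only values with positive multiplicity belong to the edge's support.
        support = sorted(v for v, m in edge.items() if m > 0)
        for a, b in combinations(support, 2):
            pairs[(a, b)] += 1
    return pairs


keywords: HbGraph = {
    "paper1": Counter({"hypergraph": 3, "visualisation": 1}),
    "paper2": Counter({"hypergraph": 1, "multiset": 2, "visualisation": 2}),
}
print(vertex_multiplicity_degree(keywords, "hypergraph"))        # 4
print(co_occurrence(keywords)[("hypergraph", "visualisation")])  # 2
```

    Keeping multiplicities, rather than reducing each hb-edge to a plain set, preserves how strongly a facet value is tied to a reference. That is the extra information a multiset-based model carries over an ordinary hypergraph.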

    Finding scientific articles in a large digital archive: BioStor and the Biodiversity Heritage Library

    The Biodiversity Heritage Library (BHL) is a large digital archive of legacy biological literature, comprising over 31 million pages scanned from books, monographs, and journals. During the digitisation process, basic metadata about the scanned items is recorded, but not article-level metadata. Given that the article is the standard unit of citation, this makes it difficult to locate cited literature in BHL. Adding the ability to easily find articles in BHL would greatly enhance the value of the archive. A service was developed to locate articles in BHL based on matching article metadata to BHL metadata using approximate string matching, regular expressions, and string alignment. This article-finding service is exposed as a standard OpenURL resolver on the BioStor web site at http://biostor.org/openurl/. This resolver can be used on the web, or called by bibliographic tools that support OpenURL. BioStor provides tools for extracting, annotating, and visualising articles from the Biodiversity Heritage Library. BioStor is available from http://biostor.org/.
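
    The abstract says cited metadata is matched to BHL metadata using approximate string matching and regular expressions. The sketch below is a minimal illustration of that idea applied to titles. It is not BioStor's code and omits the string-alignment step. It uses Python's standard `difflib.SequenceMatcher` as the similarity measure. The helpers `normalise`, `similarity` and `best_match` and the 0.8 threshold are all assumptions.

```python
import re
from difflib import SequenceMatcher
from typing import Optional


def normalise(title: str) -> str:
    """Lower-case, replace punctuation via a regular expression, collapse spaces."""
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def similarity(a: str, b: str) -> float:
    """Approximate string similarity in [0, 1] using difflib's ratio()."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def best_match(cited: str, candidates: list[str],
               threshold: float = 0.8) -> Optional[tuple[str, float]]:
    """Return (candidate, score) for the closest title, or None below threshold.

    The threshold value is an illustrative assumption, not a BioStor setting.
    """
    scored = [(similarity(cited, c), c) for c in candidates]
    if not scored:
        return None
    score, candidate = max(scored)
    return (candidate, score) if score >= threshold else None


bhl_titles = [
    "Proc. Zool. Soc. London",
    "Proceedings of the zoological society of London.",
    "Annals and Magazine of Natural History",
]
print(best_match("Proceedings of the Zoological Society of London", bhl_titles))
```

    Normalising case and punctuation before comparing means trivial formatting differences do not lower the score. Abbreviated forms such as "Proc. Zool. Soc." would still need a separate expansion step or a lower threshold.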

    IVOA Recommendation: Data Model for Astronomical DataSet Characterisation

    This document defines the high-level metadata necessary to describe the physical parameter space of observed or simulated astronomical data sets, such as 2D images, data cubes, X-ray event lists, IFU data, etc. The Characterisation data model is an abstraction that can be used to derive a structured description of any relevant data and thus to facilitate its discovery and scientific interpretation. The model aims at facilitating the manipulation of heterogeneous data in any VO framework or portal. A VO Characterisation instance can include descriptions of the data axes, the range of coordinates covered by the data, and details of the data sampling and resolution on each axis. These descriptions should be in terms of physical variables, independent of instrumental signatures as far as possible. Implementations of this model have been described in the IVOA Note available at: http://www.ivoa.net/Documents/latest/ImplementationCharacterisation.html. Utypes derived from this version of the UML model are listed and commented on in the following IVOA Note: http://www.ivoa.net/Documents/latest/UtypeListCharacterisationDM.html. An XML schema has been built from the UML model and is available at: http://www.ivoa.net/xml/Characterisation/Characterisation-v1.11.xsd. Comment: http://www.ivoa.ne
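
    The abstract describes a Characterisation instance as per-axis descriptions of coverage, sampling and resolution in physical variables. The sketch below shows how such a description could support discovery queries. It is a loose illustration only. The class and field names (`Axis`, `lower`, `upper`, `sample_extent`, `resolution`, `covers`) are hypothetical. They are not the Utypes or the XML schema referenced above.

```python
from dataclasses import dataclass


@dataclass
class Axis:
    """Hypothetical per-axis description; field names are not IVOA Utypes."""
    name: str             # physical variable, e.g. "spectral" or "time"
    unit: str             # unit of the coverage bounds
    lower: float          # lower coverage bound
    upper: float          # upper coverage bound
    sample_extent: float  # sampling: size of one sample (pixel or bin)
    resolution: float     # smallest separation the data can resolve


@dataclass
class Characterisation:
    axes: list[Axis]

    def covers(self, **point: float) -> bool:
        """True if every given coordinate lies within its axis's bounds.

        A coordinate for an axis the data set does not describe counts as
        not covered.
        """
        by_name = {axis.name: axis for axis in self.axes}
        return all(
            name in by_name and by_name[name].lower <= value <= by_name[name].upper
            for name, value in point.items()
        )


spectrum = Characterisation(axes=[
    Axis("spectral", "m", 4.0e-7, 7.0e-7, sample_extent=1.0e-10, resolution=2.0e-10),
    Axis("time", "d", 58000.0, 58010.0, sample_extent=0.5, resolution=0.5),
])
print(spectrum.covers(spectral=5.0e-7, time=58005.0))  # True
print(spectrum.covers(spectral=9.0e-7))                # False
```

    Stating bounds in physical units rather than pixel indices lets one query run against heterogeneous data sets, which is the kind of instrument-independent description the abstract calls for.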

    Towards Exascale Scientific Metadata Management

    Advances in technology and computing hardware are enabling scientists from all areas of science to produce massive amounts of data using large-scale simulations or observational facilities. In this era of data deluge, effective coordination between the data production and analysis phases hinges on the availability of metadata that describe the scientific datasets. Existing workflow engines have captured a limited form of metadata to provide provenance information about the identity and lineage of the data. However, much of the data produced by simulations, experiments, and analyses still needs to be annotated manually, in an ad hoc manner, by domain scientists. Systematic and transparent acquisition of rich metadata is becoming a crucial prerequisite for sustaining and accelerating the pace of scientific innovation. Yet a ubiquitous, domain-agnostic metadata management infrastructure that can meet the demands of extreme-scale science is conspicuously absent. To address this gap in scientific data management research and practice, we present our vision for an integrated approach that (1) automatically captures and manipulates information-rich metadata while the data is being produced or analyzed, and (2) stores metadata within each dataset so that it permeates metadata-oblivious processes and can be queried through established, standardized data access interfaces. We motivate the need for the proposed integrated approach using applications from plasma physics, climate modeling and neuroscience, and then discuss research challenges and possible solutions.