
    Looking at a digital research data archive - Visual interfaces to EASY

    In this paper we explore visually the structure of the collection of a digital research data archive in terms of the metadata of deposited datasets. We look into the distribution of datasets over different scientific fields, the role of main depositors (persons and institutions) in different fields, and the main access choices for the deposited datasets. We argue that visual analytics of collection metadata can be used in multiple ways: to inform the archive about the structure and growth of its collection, to foster collection strategies, and to check metadata consistency. We combine visual analytics and visually enhanced browsing, introducing a set of web-based, interactive visual interfaces to the archive's collection. We discuss how text-based search combined with visually enhanced browsing enhances data access, navigation, and reuse. Comment: Submitted to the TPDL 201
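    The aggregations behind such collection overviews are simple to sketch. The records and field names below are hypothetical examples, not the actual EASY metadata schema:

    ```python
    from collections import Counter

    # Hypothetical metadata records; the field names are illustrative,
    # not taken from the EASY schema.
    datasets = [
        {"field": "Archaeology", "depositor": "Inst A", "access": "open"},
        {"field": "Archaeology", "depositor": "Inst B", "access": "restricted"},
        {"field": "Social Sciences", "depositor": "Inst A", "access": "open"},
        {"field": "History", "depositor": "Inst C", "access": "open"},
    ]

    # Distribution of datasets over scientific fields
    by_field = Counter(d["field"] for d in datasets)

    # Distribution of access choices for the deposited datasets
    by_access = Counter(d["access"] for d in datasets)

    print(by_field.most_common())  # fields ranked by number of datasets
    print(by_access)
    ```

    Tables like these feed directly into the kind of interactive charts the paper describes.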

    Visual exploration of the attribute space of DANS EASY metadata

    Study of the metadata of the Electronic Archiving System (EASY) of Data Archiving and Networked Services (DANS), for the purpose of gaining insight into the internal structure of the collection. The visualization contains a dump of the EASY metadata set and all important data files that were generated during this analysis and used for the interactive website. It contains metadata extracted from EASY version I (before January 1, 2012) and from EASY II (extracted January 20, 2012).

    FAIR Digital Objects in Official Statistics

    Introduction

    Statistical offices at national and international scale provide statistics on demography, labour, income, society, economy, the environment and other domains. Their collective output is usually referred to as 'Official Statistics'. These offices have a long tradition of publishing data fairly and openly, which is often part of their mission statement. For decades they have been providing websites with articles, press releases, graphs and tables of data for free, for research, for policy-making, and for common understanding. However, for users it is often not so easy to find the data needed, to (re)use it in data-driven work, or to refer to the right (sub)set of data in a sustainable way. Therefore, in this article we take a closer look at Official Statistics from a findability, accessibility, interoperability, and reusability (FAIR) perspective.

    Digital Objects in Statistics

    Digital objects in official statistics can be identified on multiple levels. The core concept is the statistical fact: a number describing a certain estimate of a certain phenomenon in a certain population over a certain period of time. For example, the estimated number of elderly inhabitants in the province of Friesland (the Netherlands) on January 1, 2020, and the inflation in Belgium for fruits in 2021 are both statistical facts. Each of these statistical facts is uniquely defined and published as a digital object in the online statistical databases of Statistics Netherlands and Eurostat respectively. Statistical facts may have a production status (provisional, final, revised) and are typically visualized as a number in a table cell or in a chart.

    Data without metadata are without meaning. A statistical fact refers to metadata (region, time, subject, population, uncertainty, quality, etc.) which are essential to understand the context of the fact. We make a distinction here between structural or conceptual metadata, i.e. the structure and definitions of the concepts, dimensions and types of data used, and referential metadata, i.e. descriptive information on the dataset. The metadata are of the utmost importance for the data consumer to understand the data. Metadata have their own dynamics, e.g. classifications change over time. They are published as digital objects too, for example the statistical classification of economic activities (NACE).

    Statistical facts and their metadata form the foundation for higher-level statistical products, such as news releases and thematic articles that explain statistics in a broader context. This higher-level content can be seen as digital objects too, as it is usually the main entry point for the general public and search engines and enables their findability and accessibility.

    Standards and FAIR

    Each digital object in official statistics has its own structure, dynamics, dissemination channels and standards. This can sometimes make it hard to work with data from official statistics. Statistical databases differ among statistical organizations, both technically and in the metadata and the APIs they offer for automated access. The main standards in this field are the Statistical Data and Metadata eXchange (SDMX), JSON-stat, OData, and simple formats such as CSV. Commonly agreed structural metadata is organized in SDMX registries (the global registry and the Eurostat registry), which provide automated access to statistical metadata, which is good for accessibility.

    The SDMX standard is targeted at statistical and financial data, which may hinder wider reusability. Therefore some statistical offices are moving to semantic standards. An example is the vocabularies and classifications published as linked open data by Statistics Netherlands. Publishing metadata this way makes it possible to reuse and link data across organizations and gives it a semantic structure that is machine readable. Another example is from the statistical office of the European Union, Eurostat, which is converting the statistical classifications and correspondence tables from its current metadata system into Linked Open Data on the EU Vocabularies website. The representation is based on XKOS, an ontology for modelling statistical classifications, offering machine-readable access for reusing objects as well as facilitating linking among classifications at national, EU or international level. Yet another initiative is from the United Nations Economic Commission for Europe (UNECE), where statistical organizations collectively develop a Core Ontology for Official Statistics (COOS) describing the statistical production process. All in all, for structural metadata, statistical organizations are increasingly moving towards linked data standards to better align with non-statistical communities.

    In the field of referential metadata the Single Integrated Metadata Structure (SIMS) is used. It offers machine-readable descriptive metadata such as unit of measure, reference period, confidentiality, quality, accuracy, etc. Some of these elements are also covered by the widely used RDF-based Data Catalog Vocabulary (DCAT) and its statistical variant (STAT-DCAT), which raises the question whether a further integration of these could improve the FAIR-ness of statistical referential metadata.

    With respect to higher-level digital objects, such as statistical articles, semantic web ontologies such as schema.org and Dublin Core are increasingly being used to annotate statistical output in common terms. The use of Digital Object Identifiers (DOIs), where applicable, makes it easier to refer to statistical output. From the above we can see that the use of different standards at different levels creates various ways to identify statistical content, such as Uniform Resource Names (URNs), SDMX identifiers, Digital Object Identifiers (DOIs), Uniform Resource Identifiers (URIs) and organization-specific identifiers. Although they probably all satisfy FAIR principle A1, from a user perspective it would be good to minimize variety here.

    Wrap-up

    Although official statistics have a long tradition of and experience in publishing open data, the FAIR principles are an excellent vehicle to further improve findability and enable data-driven work. Openness is not enough: the facts, the structural and referential metadata, and the higher-level statistical digital objects should ideally all be optimized from a FAIR point of view. The mix of standards used at various levels and the distributed nature of the official statistical system may hinder reusability. The move to semantic interoperability via generally accepted linked data standards is ongoing and promises to increase the reusability of statistics within a broader web of (meta)data. This makes trustworthy statistics more FAIR: better searchable, findable and interpretable, which is necessary for a further integration of official statistics into wider communities.
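    The notion of a statistical fact carrying references to its structural and referential metadata can be sketched as a small data structure. All field names, URIs and values below are illustrative assumptions, not an actual statistical publication:

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class StatisticalFact:
        """A statistical fact as a digital object: a value plus the
        metadata references needed to interpret it."""
        value: float
        # Structural metadata: dimensions defined in shared classifications
        dimensions: dict
        # Referential metadata: descriptive context (SIMS-style elements)
        referential: dict = field(default_factory=dict)
        status: str = "final"  # provisional / final / revised

    # Illustrative fact in the spirit of the Friesland example; the value
    # and classification URIs are placeholders, not real estimates.
    fact = StatisticalFact(
        value=135000.0,
        dimensions={
            "region": "https://example.org/id/region/friesland",
            "period": "2020-01-01",
            "topic": "population-65plus",
        },
        referential={"unit": "persons", "confidentiality": "public"},
        status="provisional",
    )
    print(fact.status, fact.dimensions["period"])
    ```

    Resolving the dimension URIs against shared classifications (NACE, regional codes) is what gives such an object its semantic interoperability.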

    Design Flow Management: More than Convenient Tool Invocation

    The term design flow management is sometimes used for facilities that hardly offer more functionality than showing a graph of tool icons that describes the preferred order in which these tools should be executed. In this paper, we argue that a design flow management system (design flow system, for short) can and should be much more than that. If a design flow system supports the definition and visualization of data dependencies between tools, distinguishes between different design functions of a tool and guarantees the correctness of a design with respect to a configured design flow, it is more suited for use in a real-world design environment. In addition, a powerful design flow system should take the hierarchical decomposition and the different representations of a design into consideration and employ an intuitive mechanism for user-interaction. In this paper, we highlight these aspects of design flow management, based on the experiences gained by building design flow management in t..
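    The dependency tracking the paper argues for can be illustrated with a topological ordering of a tool graph; the tool names and dependencies below are illustrative, not taken from a specific framework:

    ```python
    from graphlib import TopologicalSorter

    # A design flow as a graph of tools with data dependencies between
    # them: each tool maps to the set of tools whose output it consumes.
    flow = {
        "synthesis":    {"rtl_entry"},
        "placement":    {"synthesis"},
        "routing":      {"placement"},
        "verification": {"synthesis", "routing"},
    }

    # A flow manager that models data dependencies can derive a valid
    # execution order, rather than only displaying a picture of tool
    # icons in a preferred order.
    order = list(TopologicalSorter(flow).static_order())
    print(order)
    ```

    The same graph lets a flow manager check design correctness: if a tool's input data is regenerated, every downstream tool in the ordering is known to be out of date.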

    An observational method for determining daily and regional photovoltaic solar energy statistics

    This paper presents a classical estimation problem for calculating the energy generated by photovoltaic solar energy systems, on a daily, annual, regional and national basis. Our methodology relies on two data sources: PVOutput, an online portal with solar energy production measurements, and modelled irradiance data available for large parts of Africa and Europe, from the Royal Netherlands Meteorological Institute. Combining these, we obtain probability functions of observing energy production, given the irradiation. These are applied to a PV systems database, using Monte Carlo sampling, allowing daily and annual solar energy production to be calculated. These are, in turn, used to calculate solar energy production per municipality. As a case study, we apply this methodology to one country in particular, namely the Netherlands. By examining the variation in our estimates as a result of taking different subsets of PVOutput systems with certain specifications such as azimuth, tilt and inverter loading ratio, we obtain specific annual energy yields in the range of 877-946 kWh/kWp and 838-899 kWh/kWp for 2016 and 2017 respectively. The current method used at Statistics Netherlands assumes this to be 875 kWh/kWp, irrespective of irradiation, meaning the yields were underestimated in 2016 and overestimated in 2017. In the case of the Netherlands, this research demonstrates that an irradiation-based measure of solar energy generation is necessary. More generally, this research shows that different types of open data sources may be combined to develop models that calculate the energy production of PV system populations.
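    The core Monte Carlo step, sampling energy production conditional on irradiation and aggregating over a systems database, can be sketched as follows. The performance-ratio model and all numbers are illustrative placeholders, not the fitted PVOutput/KNMI distributions from the paper:

    ```python
    import random

    # Hypothetical conditional model: specific yield (kWh/kWp) given daily
    # irradiation (kWh/m^2). The 0.8 performance ratio and 10% spread are
    # assumptions for illustration only.
    def sample_daily_yield(irradiation, rng):
        mean = 0.8 * irradiation
        return max(0.0, rng.gauss(mean, 0.1 * mean + 1e-9))

    systems_kwp = [2.5, 4.0, 3.2]        # toy PV systems database (kWp)
    daily_irradiation = [1.2, 3.4, 5.0]  # three example days (kWh/m^2)

    rng = random.Random(42)
    n_draws = 2000
    estimates = []
    for _ in range(n_draws):
        # One Monte Carlo draw: sample a yield for every system and day,
        # then sum to get total energy over the period.
        total = sum(
            kwp * sample_daily_yield(g, rng)
            for kwp in systems_kwp
            for g in daily_irradiation
        )
        estimates.append(total)

    mean_energy = sum(estimates) / n_draws
    print(round(mean_energy, 1), "kWh")
    ```

    Repeating the draws yields a distribution of total production, so the spread of `estimates` also quantifies the uncertainty of the aggregate figure.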

    The need for timely official statistics. The COVID-19 pandemic as a driver for innovation

    This paper discusses how Statistics Netherlands managed to respond quickly with a range of new outputs to the sudden increase in the need for statistical information following the outbreak of the COVID-19 pandemic. It describes the innovation process already in place, as well as the innovations in response to the pandemic. This is followed by a discussion of what made speedy innovation and implementation possible, after which lessons are drawn in order to maintain the ability to react quickly to future policy questions. One important success factor is the combination of new data sources with already existing statistics for calibration. The developments at Statistics Netherlands can be seen as a case study. Several other NSIs also accelerated innovation after the outbreak of the pandemic, such as the Australian Bureau of Statistics and the British Office for National Statistics

    A Flexible Access Control Mechanism for CAD Frameworks

    In this paper we present a configurable and unobtrusive access control mechanism for CAD frameworks that is flexible enough to support a wide range of access control policies. Furthermore, we present the realization of the access control mechanism in the context of the Nelsis CAD Framework.
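    The idea of a configurable, policy-driven access check can be sketched minimally. The rule format, roles and object types here are assumptions for illustration, not the actual Nelsis mechanism:

    ```python
    # A policy is just data: reconfiguring access control means editing
    # this table, not changing framework code.
    POLICY = [
        # (role, operation, object_type, allow)
        ("designer",  "edit", "cell",    True),
        ("designer",  "edit", "library", False),
        ("librarian", "edit", "library", True),
    ]

    def allowed(role, operation, obj_type, policy=POLICY):
        """Return the first matching rule's decision; deny by default."""
        for r, op, t, allow in policy:
            if (r, op, t) == (role, operation, obj_type):
                return allow
        return False

    print(allowed("designer", "edit", "cell"))     # True
    print(allowed("designer", "edit", "library"))  # False
    ```

    Because the mechanism only consults the policy table, it stays unobtrusive: tools call one check function and policies evolve independently.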

    A Markov Chain Monte Carlo approach for the estimation of photovoltaic system parameters

    Knowledge of the installation parameters of photovoltaic systems is essential in the context of grid management: by relating these parameters to performance data, forecasting models may be optimised to improve the management of power flow into the grid. In the case of small residential systems, these parameters are often not available. We present a novel method for determining the azimuth (ϕ), tilt (θ) and rated power (P) of photovoltaic systems, using openly available data from 2016-2018 for 12 photovoltaic systems in PVOutput. This method consists of two steps: firstly, we identify a candidate list of clear days by computing descriptive statistics of a larger set of 80 PVOutput system profiles. In the second step we compare the observed clear-day profiles of the aforementioned 12 systems with modelled clear-sky profiles from the PVLib library. The fits are performed employing a Markov Chain Monte Carlo (MCMC) approach, implemented with the Emcee package: the most favoured parameters and their associated uncertainties, for any given day, are obtained by sampling from the posterior assuming a Gaussian sampling distribution. The results for our 12 systems are in good agreement with the PVOutput metadata.
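    The fitting step can be illustrated with a basic Metropolis sampler standing in for Emcee's ensemble sampler. The bell-shaped clear-day profile and all parameter values below are simplifications for illustration, not PVLib's clear-sky model:

    ```python
    import math
    import random

    # Illustrative clear-day power profile: a bell curve in time, with two
    # free parameters (peak power in kW, hour of peak).
    def profile(hours, peak_kw, peak_hour, width=3.0):
        return [peak_kw * math.exp(-((h - peak_hour) / width) ** 2) for h in hours]

    # Gaussian sampling distribution for the observations, as in the paper.
    def log_likelihood(params, hours, observed, sigma=0.05):
        peak_kw, peak_hour = params
        model = profile(hours, peak_kw, peak_hour)
        return -sum((o - m) ** 2 for o, m in zip(observed, model)) / (2 * sigma ** 2)

    rng = random.Random(0)
    hours = [h / 2 for h in range(12, 37)]   # 06:00 to 18:00, half-hourly
    truth = (3.2, 13.0)                      # hypothetical system parameters
    observed = [p + rng.gauss(0, 0.05) for p in profile(hours, *truth)]

    # Metropolis random walk over (peak_kw, peak_hour)
    current = (2.0, 12.0)
    ll = log_likelihood(current, hours, observed)
    samples = []
    for _ in range(20000):
        prop = (current[0] + rng.gauss(0, 0.05), current[1] + rng.gauss(0, 0.05))
        ll_prop = log_likelihood(prop, hours, observed)
        if math.log(rng.random()) < ll_prop - ll:  # accept/reject step
            current, ll = prop, ll_prop
        samples.append(current)

    # Posterior means after burn-in estimate the parameters; the sample
    # spread gives the associated uncertainties.
    burned = samples[5000:]
    est_kw = sum(s[0] for s in burned) / len(burned)
    est_hour = sum(s[1] for s in burned) / len(burned)
    print(round(est_kw, 2), round(est_hour, 2))
    ```

    The same scheme extends to the full (ϕ, θ, P) problem by swapping the bell curve for a modelled clear-sky profile and widening the parameter vector.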