
    metajelo: A Metadata Package for Journals to Support External Linked Objects

    We propose a metadata package that is intended to provide academic journals with a lightweight means of registering, at the time of publication, the existence and disposition of supplementary materials. Information about the supplementary materials is, in most cases, critical for the reproducibility and replicability of scholarly results. In many instances, these materials are curated by a third party, which may or may not follow developing standards for the identification and description of those materials. As such, the vocabulary described here complements existing initiatives that specify vocabularies to describe the supplementary materials or the repositories and archives in which they have been deposited. Where possible, it reuses elements of other relevant vocabularies, facilitating coexistence with them. Furthermore, it provides an “at publication” record of the reproducibility characteristics of a particular article that has been selected for publication. The proposed metadata package documents the key characteristics that journals care about in the case of supplementary materials that are held by third parties: existence, accessibility, and permanence. It does so in a robust, time-invariant fashion at the time of publication, when the editorial decisions are made. It also allows for better documentation of less accessible (non-public) data, by treating it symmetrically from the point of view of the journal, thereby increasing the transparency of what up until now has been very opaque.

    Utility Cost of Formal Privacy for Releasing National Employer-Employee Statistics

    National statistical agencies around the world publish tabular summaries based on combined employer-employee (ER-EE) data. The privacy of both individuals and business establishments that feature in these data is protected by law in most countries. These data are currently released using a variety of statistical disclosure limitation (SDL) techniques that do not reveal the exact characteristics of particular employers and employees, but lack provable privacy guarantees limiting inferential disclosures. In this work, we present novel algorithms for releasing tabular summaries of linked ER-EE data with formal, provable guarantees of privacy. We show that state-of-the-art differentially private algorithms add too much noise for the output to be useful. Instead, we identify the privacy requirements mandated by current interpretations of the relevant laws, and formalize them using the Pufferfish framework. We then develop new privacy definitions that are customized to ER-EE data and satisfy the statutory privacy requirements. We implement the experiments in this paper on production data gathered by the U.S. Census Bureau. An empirical evaluation of utility for these data shows that for reasonable values of the privacy-loss parameter ϵ≥1, the additive error introduced by our provably private algorithms is comparable to, and in some cases better than, the error introduced by existing SDL techniques that have no provable privacy guarantees. For some complex queries currently published, however, our algorithms do not have utility comparable to the existing traditiona

    Security-oriented infrastructures for social simulation

    The JISC-funded National e-Infrastructure for Social Simulation (NeISS) project aims to develop and provide new services to social scientists and public/private sector policymakers interested in “what-if” questions that have an impact upon society and can be tackled through social simulation. For the first what-if question, a traffic simulation modelling how congestion will affect routes within a city or region projected across a time-span of decades has been identified. This paper describes the work that has been done in implementing a secure, user-oriented environment that provides seamless access to relevant nationally significant data sets, such as the 2001 Census and demographic transition statistics from the British Household Panel Survey (BHPS), and to a Population Reconstruction Model (PRM) simulator, which simulates a population of individuals or households based upon these data sets.

    Commuting to School in Leeds : How useful is the PLASC?

    Children's daily travel behaviour is dominated by the journey to school. In some cases, this movement takes only a few minutes and involves no means of transport other than foot; in other instances, the journey can be over substantial distances, be extensive in duration and involve some form of public or private transport. The combination of journeys taking place is likely to have a substantial impact on traffic congestion, particularly since the morning peak coincides with that associated with the journey to work. What datasets exist that allow us to measure and understand this behaviour?

    Supporting security-oriented, inter-disciplinary research: crossing the social, clinical and geospatial domains

    How many people have had a chronic disease for longer than 5 years in Scotland? How has this impacted upon their choices of employment? Are there any geographical clusters in Scotland where a high incidence of patients with such long-term illness can be found? How does the life expectancy of such individuals compare with the national averages? Such questions are important to understand the health of nations and the best ways in which health care should be delivered and measured for its impact and success. In tackling such research questions, e-Infrastructures need to provide tailored, secure access to an extensible range of distributed resources including primary and secondary e-Health clinical data, social science data, and geospatial data sets, amongst numerous others. In this paper we describe the security models underlying these e-Infrastructures and demonstrate their implementation in supporting secure, federated access to a variety of distributed and heterogeneous data sets, exploiting the results of a variety of projects at the National e-Science Centre (NeSC) at the University of Glasgow.

    CEDAR: The Dutch Historical Censuses as Linked Open Data

    In this document we describe the CEDAR dataset, a five-star Linked Open Data representation of the Dutch historical censuses, conducted in the Netherlands once every 10 years from 1795 to 1971. We produce a linked dataset from a digitized sample of 2,288 tables. The dataset contains more than 6.8 million statistical observations about the demography, labour and housing of Dutch society in the 18th, 19th and 20th centuries. The dataset is modeled using the RDF Data Cube vocabulary for multidimensional data, uses Open Annotation to express rules of data harmonization, and keeps track of the provenance of every single data point and its transformations using PROV. We link these observations to well-known standard classification systems in social history, such as the Historical International Standard Classification of Occupations (HISCO) and the Amsterdamse Code (AC), which in turn link to DBpedia and GeoNames. The two main contributions of the dataset are the improvement of data integration and access for historical research, and the emergence of new historical data hubs, like classifications of historical religions and historical house types, in the Linked Open Data cloud.

    The Scottish school leavers cohort: linkage of education data to routinely collected records for mortality, hospital discharge and offspring birth characteristics

    Purpose: The Scottish school leavers cohort provides population-wide prospective follow-up of local authority secondary school leavers in Scotland through linkage of comprehensive education data with hospital and mortality records. It considers educational attainment as a proxy for socioeconomic position in young adulthood and enables the study of associations and causal relationships between educational attainment and health outcomes in young adulthood. Participants: Education data for 284 621 individuals who left a local authority secondary school during 2006/2007–2010/2011 were linked with birth, death and hospital records, including general/acute and mental health inpatient and day case records. Individuals were followed up from date of school leaving until September 2012. Age range during follow-up was 15 years to 24 years. Findings to date: Education data included all formal school qualifications attained by date of school leaving; sociodemographic information; indicators of student needs, educational or non-educational support received and special school unit attendance; attendance, absence and exclusions over time; and school leaver destination. Area-based measures of school and home deprivation were provided. Health data included dates of admission/discharge from hospital; principal/secondary diagnoses; maternal-related, birth-related and baby-related variables; and, where relevant, date and cause of death. This paper presents crude rates for all-cause and cause-specific deaths and general/acute and psychiatric hospital admissions, as well as birth outcomes for children of female cohort members. Future plans: This study is the first in Scotland to link education and health data for the population of local authority secondary school leavers and provides access to a large, representative cohort with the ability to study rare health outcomes. There is the potential to study health outcomes over the life course through linkage with future hospital and death records for cohort members. The cohort may also be expanded by adding data from future school leavers. There is scope for linkage to the Prescribing Information System and the Scottish Primary Care Information Resource.