    The re-identification risk of Canadians from longitudinal demographics

    Abstract. Background: The public is less willing to allow their personal health information to be disclosed for research purposes if they do not trust researchers and how researchers manage their data. However, the public is more comfortable with their data being used for research if the risk of re-identification is low. There are few studies on the risk of re-identification of Canadians from their basic demographics, and no studies on their risk from their longitudinal data. Our objective was to estimate the risk of re-identification from the basic cross-sectional and longitudinal demographics of Canadians. Methods: Uniqueness is a common measure of re-identification risk. Demographic data on a 25% random sample of the population of Montreal were analyzed to estimate population uniqueness on postal code, date of birth, and gender as well as their generalizations, for periods ranging from 1 year to 11 years. Results: Almost 98% of the population was unique on full postal code, date of birth and gender: these three variables are effectively a unique identifier for Montrealers. Uniqueness increased for longitudinal data. Considerable generalization was required to reach acceptably low uniqueness levels, especially for longitudinal data. Detailed guidelines and disclosure policies on how to ensure that the re-identification risk is low are provided. Conclusions: A large percentage of Montreal residents are unique on basic demographics. For non-longitudinal data sets, the three character postal code, gender, and month/year of birth represent sufficiently low re-identification risk. Data custodians need to generalize their demographic information further for longitudinal data sets.
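
    The uniqueness measure described in the Methods can be illustrated with a minimal sketch. It computes the share of records that are unique on a set of quasi-identifiers, before and after generalization to a three-character postal code and month/year of birth. The field names, the `generalize` helper, and the four toy records are assumptions made up for illustration. The study estimated population uniqueness from a 25% sample, which this sketch does not attempt.

```python
from collections import Counter

def uniqueness(records, keys):
    """Fraction of records whose combination of `keys` values occurs exactly once."""
    if not records:
        return 0.0
    combos = [tuple(r[k] for k in keys) for r in records]
    counts = Counter(combos)
    return sum(1 for c in combos if counts[c] == 1) / len(records)

def generalize(record):
    """Hypothetical generalization: 3-character postal code, month/year of birth."""
    return {
        "postal_code": record["postal_code"][:3],
        "birth": record["birth"][:7],  # "YYYY-MM-DD" -> "YYYY-MM"
        "gender": record["gender"],
    }

# Made-up toy records, not data from the study.
records = [
    {"postal_code": "H2X1Y4", "birth": "1980-03-14", "gender": "F"},
    {"postal_code": "H2X3K2", "birth": "1980-03-02", "gender": "F"},
    {"postal_code": "H3A0G4", "birth": "1975-11-30", "gender": "M"},
    {"postal_code": "H3A2B1", "birth": "1975-11-30", "gender": "M"},
]
keys = ["postal_code", "birth", "gender"]

full = uniqueness(records, keys)                             # every record is unique
coarse = uniqueness([generalize(r) for r in records], keys)  # no record is unique
```

    In this toy set, generalization moves every record into a group of two, so uniqueness drops from 1.0 to 0.0.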

    Carbon dioxide reduction in the building life cycle: a critical review

    The construction industry is known to be a major contributor to environmental pressures due to its high energy consumption and carbon dioxide generation. The growing amount of carbon dioxide emissions over buildings’ life cycles has prompted academics and professionals to initiate various studies relating to this problem. Researchers have been exploring carbon dioxide reduction methods for each phase of the building life cycle – from planning and design, materials production, materials distribution and construction process, maintenance and renovation, deconstruction and disposal, to the material reuse and recycling phase. This paper aims to present the state of the art in carbon dioxide reduction studies relating to the construction industry. Studies of carbon dioxide reduction throughout the building life cycle are reviewed and discussed, including those relating to green building design, innovative low carbon dioxide materials, green construction methods, energy efficiency schemes, life cycle energy analysis, construction waste management, reuse and recycling of materials and the cradle-to-cradle concept. The review provides building practitioners and researchers with a better understanding of carbon dioxide reduction potential and approaches worldwide. Opportunities for carbon dioxide reduction can thereby be maximised over the building life cycle by creating environmentally benign designs and using low carbon dioxide materials.

    De-identifying a public use microdata file from the Canadian national discharge abstract database

    Abstract. Background: The Canadian Institute for Health Information (CIHI) collects hospital discharge abstract data (DAD) from Canadian provinces and territories. There are many demands for the disclosure of this data for research and analysis to inform policy making. To expedite the disclosure of data for some of these purposes, the construction of a DAD public use microdata file (PUMF) was considered. Such purposes include: confirming some published results, providing broader feedback to CIHI to improve data quality, training students and fellows, providing an easily accessible data set for researchers to prepare for analyses on the full DAD data set, and serving as a large health data set for computer scientists and statisticians to evaluate analysis and data mining techniques. The objective of this study was to measure the probability of re-identification for records in a PUMF, and to de-identify a national DAD PUMF consisting of 10% of records. Methods: Plausible attacks on a PUMF were evaluated. Based on these attacks, the 2008-2009 national DAD was de-identified. A new algorithm was developed to minimize the amount of suppression while maximizing the precision of the data. The acceptable threshold for the probability of correct re-identification of a record was set at between 0.04 and 0.05. Information loss was measured in terms of the extent of suppression and entropy. Results: Two different PUMF files were produced, one with geographic information, and one with no geographic information but more clinical information. At a threshold of 0.05, the maximum proportion of records with the diagnosis code suppressed was 20%, but these suppressions represented only 8-9% of all values in the DAD. Our suppression algorithm has less information loss than a more traditional approach to suppression. Smaller regions, patients with longer stays, and age groups that are infrequently admitted to hospitals tend to be the ones with the highest rates of suppression. Conclusions: The strategies we used to maximize data utility and minimize information loss can result in a PUMF that would be useful for the specific purposes noted earlier. However, to create a more detailed file with less information loss suitable for more complex health services research, the risk would need to be mitigated by requiring the data recipient to commit to a data sharing agreement.
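
    To make the threshold idea concrete, here is a minimal sketch. It assumes the commonly used model in which a record's re-identification probability is 1 divided by the number of records sharing its quasi-identifier values. Under that assumption a threshold of 0.05 corresponds to groups of at least 20 records, and 0.04 to groups of at least 25. This is not the suppression algorithm developed in the study. The field names `region` and `age_group` and the toy records are made up for illustration.

```python
from collections import Counter
import math

def min_class_size(threshold):
    """Smallest group size k with 1/k <= threshold (small tolerance for float error)."""
    return math.ceil(1 / threshold - 1e-9)

def record_risks(records, keys):
    """Per-record risk, assumed here to be 1 / (size of the record's group)."""
    combos = [tuple(r[k] for k in keys) for r in records]
    counts = Counter(combos)
    return [1 / counts[c] for c in combos]

def fraction_over_threshold(records, keys, threshold):
    """Share of records whose assumed risk exceeds the threshold."""
    risks = record_risks(records, keys)
    return sum(1 for p in risks if p > threshold) / len(risks)

# Made-up toy records; a real file would need far larger groups to meet 0.05.
records = [
    {"region": "ON", "age_group": "60-69"},
    {"region": "ON", "age_group": "60-69"},
    {"region": "ON", "age_group": "60-69"},
    {"region": "YT", "age_group": "20-29"},
]
keys = ["region", "age_group"]

risks = record_risks(records, keys)
flagged = fraction_over_threshold(records, keys, threshold=0.5)
```

    Records flagged this way are the candidates for suppression or further generalization. In the toy set only the single record from the small region exceeds the illustrative 0.5 threshold.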

    A systematic review of the incidence of schizophrenia: the distribution of rates and the influence of sex, urbanicity, migrant status and methodology

    BACKGROUND: Understanding variations in the incidence of schizophrenia is a crucial step in unravelling the aetiology of this group of disorders. The aims of this review are to systematically identify studies related to the incidence of schizophrenia, to describe the key features of these studies, and to explore the distribution of rates derived from these studies. METHODS: Studies with original data related to the incidence of schizophrenia (published 1965–2001) were identified via searching electronic databases, reviewing citations and writing to authors. These studies were divided into core studies, migrant studies, cohort studies and studies based on Other Special Groups. Between- and within-study filters were applied in order to identify discrete rates. Cumulative plots of these rates were made and these distributions were compared when the underlying rates were sorted according to sex, urbanicity, migrant status and various methodological features. RESULTS: We identified 100 core studies, 24 migrant studies, 23 cohort studies and 14 studies based on Other Special Groups. These studies, which were drawn from 33 countries, generated a total of 1,458 rates. Based on discrete core data for persons (55 studies and 170 rates), the distribution of rates was asymmetric and had a median value (10%–90% quantile) of 15.2 (7.7–43.0) per 100,000. The distribution of rates was significantly higher in males compared to females; the male/female rate ratio median (10%–90% quantile) was 1.40 (0.9–2.4). Those studies conducted in urban versus mixed urban-rural catchment areas generated significantly higher rate distributions. The distribution of rates in migrants was significantly higher compared to native-born; the migrant/native-born rate ratio median (10%–90% quantile) was 4.6 (1.0–12.8). Apart from the finding that older studies reported higher rates, other study features were not associated with significantly different rate distributions (e.g. overall quality, methods related to case finding, diagnostic confirmation and criteria, the use of age-standardization and age range). CONCLUSIONS: There is a wealth of data available on the incidence of schizophrenia. The width and skew of the rate distribution, and the significant impact of sex, urbanicity and migrant status on these distributions, indicate substantial variations in the incidence of schizophrenia.
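
    The "median (10%–90% quantile)" summary used in the Results can be computed as in the short sketch below. The rates are made-up values, not data from the review. The choice of the `inclusive` interpolation method is an assumption, since the abstract does not say how the quantiles were calculated.

```python
import statistics

def median_and_decile_range(rates):
    """Return (median, 10% quantile, 90% quantile) for a list of rates."""
    # n=10 yields nine cut points: index 0 is the 10% quantile, index 8 the 90%.
    deciles = statistics.quantiles(rates, n=10, method="inclusive")
    return statistics.median(rates), deciles[0], deciles[8]

# Made-up incidence rates per 100,000, skewed to the right.
rates = [4.0, 8.0, 10.0, 12.0, 14.0, 15.0, 17.0, 21.0, 28.0, 40.0, 70.0]

summary = median_and_decile_range(rates)
```

    Reporting the 10% and 90% quantiles rather than the minimum and maximum keeps a few extreme rates, such as the 70.0 above, from dominating the description of an asymmetric distribution.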