236 research outputs found

    Quality and complexity measures for data linkage and deduplication

    Summary. Deduplicating one data set or linking several data sets are increasingly important tasks in the data preparation steps of many data mining projects. The aim of such linkages is to match all records relating to the same entity. Research interest in this area has increased in recent years, with techniques originating from statistics, machine learning, information retrieval, and database research being combined and applied to improve the linkage quality, as well as to increase performance and efficiency when linking or deduplicating very large data sets. Different measures have been used to characterise the quality and complexity of data linkage algorithms, and several new metrics have been proposed. An overview of the issues involved in measuring data linkage and deduplication quality and complexity is presented in this chapter. It is shown that measures in the space of record pair comparisons can produce deceptive quality results. Various measures are discussed and recommendations are given on how to assess data linkage and deduplication quality and complexity. Key words: data or record linkage, data integration and matching, deduplication, data mining pre-processing, quality and complexity measures
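    The point that pair-space measures can be deceptive can be illustrated with a small sketch. The figures below are invented for illustration and are not taken from the chapter: two data sets of 1,000 records each give 1,000,000 candidate pairs, of which at most 1,000 are true matches, so true non-matches dominate any measure that counts them. The function name `pair_quality` is likewise hypothetical.

```python
def pair_quality(tp: int, fp: int, fn: int, total_pairs: int):
    """Return (accuracy, precision, recall) computed over record pairs.

    tp, fp, fn are counts of true-positive, false-positive and
    false-negative pairs; every remaining pair is a true negative.
    """
    tn = total_pairs - tp - fp - fn
    accuracy = (tp + tn) / total_pairs
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Illustrative (assumed) numbers: 1,000 x 1,000 records, 1,000 true matches,
# and a linkage that finds only half of them while adding 500 wrong links.
accuracy, precision, recall = pair_quality(
    tp=500, fp=500, fn=500, total_pairs=1_000 * 1_000
)
```

    With these assumed counts, accuracy comes out very close to 1 although only half of the true matches are found and only half of the declared matches are correct. Measures that ignore true negatives, such as precision and recall, do not share this problem.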

    Should patients with abnormal liver function tests in primary care be tested for chronic viral hepatitis: cost minimisation analysis based on a comprehensively tested cohort

    Background: Liver function tests (LFTs) are ordered in large numbers in primary care, and the Birmingham and Lambeth Liver Evaluation Testing Strategies (BALLETS) study was set up to assess their usefulness in patients with no pre-existing or self-evident liver disease. All patients were tested for chronic viral hepatitis, thereby providing an opportunity to compare various strategies for detection of this serious treatable disease. Methods: This study uses data from the BALLETS cohort to compare various testing strategies for viral hepatitis in patients who had received an abnormal LFT result. The aim was to inform a strategy for identification of patients with chronic viral hepatitis. We used a cost-minimisation analysis to define a base case and then calculated the incremental cost per case detected to inform a strategy that could guide testing for chronic viral hepatitis. Results: Of the 1,236 study patients with an abnormal LFT, 13 had chronic viral hepatitis (nine hepatitis B and four hepatitis C). The strategy advocated by the current guidelines (repeating the LFT with a view to testing for specific disease if it remained abnormal) was less efficient (more expensive per case detected) than a simple policy of testing all patients for viral hepatitis without repeating LFTs. A more selective strategy of testing for viral hepatitis those patients who were born in countries where viral hepatitis was prevalent provided high efficiency with little loss of sensitivity. A notably high alanine aminotransferase (ALT) level (greater than twice the upper limit of normal) on the initial ALT test had high predictive value, but was insensitive, missing half the cases of viral infection. Conclusions: Based on this analysis and on widely accepted clinical principles, a "fast and frugal" heuristic was produced to guide general practitioners with respect to diagnosing cases of viral hepatitis in asymptomatic patients with abnormal LFTs. It recommends testing all patients where a clear clinical indication of infection is present (e.g. evidence of intravenous drug use), followed by testing all patients who originated from countries where viral hepatitis is prevalent, and finally testing those who have a notably raised ALT level (more than twice the upper limit of normal). Patients not picked up by this efficient algorithm had a risk of chronic viral hepatitis that is lower than that of the general population.
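    The three-step heuristic described in this abstract can be sketched as a short decision function. This is only an illustration of the ordering stated in the abstract, not the authors' published tool and not clinical guidance. The `Patient` fields and the function name are hypothetical, and the ALT upper limit of normal is assumed to be supplied by the reporting laboratory.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # All field names are assumptions made for this sketch.
    clinical_indication: bool        # e.g. evidence of intravenous drug use
    from_prevalent_country: bool     # originated where viral hepatitis is prevalent
    alt: float                       # initial ALT result
    alt_upper_limit: float           # laboratory's upper limit of normal for ALT


def should_test_for_viral_hepatitis(p: Patient) -> bool:
    """Apply the abstract's three checks in the order it lists them."""
    if p.clinical_indication:
        return True
    if p.from_prevalent_country:
        return True
    # "Notably raised" ALT: more than twice the upper limit of normal.
    return p.alt > 2 * p.alt_upper_limit


# Example: no indication, not from a prevalent country, ALT 90 against a limit of 40.
example = should_test_for_viral_hepatitis(Patient(False, False, 90.0, 40.0))
```

    Each check short-circuits, so a patient is flagged by the first rule that applies and the remaining rules are not evaluated.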

    Asymptomatic bacteriuria in sickle cell disease: a cross-sectional study

    BACKGROUND: It is known that there is significant morbidity associated with urinary tract infection and with renal dysfunction in sickle cell disease (SCD). However, it is not known if there are potential adverse outcomes associated with asymptomatic bacteriuria (ASB) infections in sickle cell disease if left untreated. This study was undertaken to determine the prevalence of ASB in a cohort of patients with SCD. METHODS: This is a cross-sectional study of patients in the Jamaican Sickle Cell Cohort. Aseptically collected mid-stream urine (MSU) samples were obtained from 266 patients for urinalysis, culture and sensitivity analysis. Proteinuria was measured by urine dipsticks. Individuals with abnormal urine culture results had repeat urine culture. Serum creatinine was measured and steady state haematology and uric acid concentrations were obtained from clinical records. This was completed at a primary care health clinic dedicated to sickle cell diseases in Kingston, Jamaica. There were 133 males and 133 females in the sample studied. The mean age (mean ± sd) of participants was 26.6 ± 2.5 years. The main outcome measures were the culture of ≥ 10^5 colony forming units of a urinary tract pathogen per milliliter of urine from a MSU specimen on a single occasion (probable ASB) or on consecutive occasions (confirmed ASB). RESULTS: Of the 266 urines collected, 234 were sterile and 29 had significant bacteriuria yielding a prevalence of probable ASB of 10.9% (29/266). Fourteen patients had confirmed ASB (prevalence 5.3%) of which 13 had pyuria. Controlling for genotype, females were 14.7 times more likely to have confirmed ASB compared to males (95% CI 1.8 to 121.0). The number of recorded visits for symptomatic UTI was increased by a factor of 2.5 (95% CI 1.4 to 4.5, p < 0.005) but serum creatinine, uric acid and haematology values were not different in patients with confirmed ASB compared with those with sterile urine. There was no association with history of gram negative sepsis. CONCLUSION: ASB is a significant problem in individuals with SCD and may be the source of pathogens in UTI. However, further research is needed to determine the clinical significance of ASB in SCD.

    Sociodemographic differences in linkage error: An examination of four large-scale datasets

    © 2018 The Author(s). Background: Record linkage is an important tool for epidemiologists and health planners. Record linkage studies will generally contain some level of residual record linkage error, where individual records are either incorrectly marked as belonging to the same individual, or incorrectly marked as belonging to separate individuals. A key question is whether errors in linkage quality are distributed evenly throughout the population, or whether certain subgroups will exhibit higher rates of error. Previous investigations of this issue have typically compared linked and un-linked records, which can conflate bias caused by record linkage error, with bias caused by missing records (data capture errors). Methods: Four large administrative datasets were individually de-duplicated, with results compared to an available 'gold-standard' benchmark, allowing us to avoid methodological issues with comparing linked and un-linked records. Results were compared by gender, age, geographic remoteness (major cities, regional or remote) and socioeconomic status. Results: Results varied between datasets, and by sociodemographic characteristic. The most consistent findings were worse linkage quality for younger individuals (seen in all four datasets) and worse linkage quality for those living in remote areas (seen in three of four datasets). The linkage quality within sociodemographic categories varied between datasets, with the associations with linkage error reversed across different datasets due to quirks of the specific data collection mechanisms and data sharing practices. Conclusions: These results suggest caution should be taken both when linking younger individuals and those in remote areas, and when analysing linked data from these subgroups. Further research is required to determine the ramifications of worse linkage quality in these subpopulations on research outcomes

    The Populus holobiont: dissecting the effects of plant niches and genotype on the microbiome

    Background: Microorganisms serve important functions within numerous eukaryotic host organisms. An understanding of the variation in the plant niche-level microbiome, from rhizosphere soils to plant canopies, is imperative to gain a better understanding of how both the structural and functional processes of microbiomes impact the health of the overall plant holobiome. Using Populus trees as a model ecosystem, we characterized the archaeal/bacterial and fungal microbiome across 30 different tissue-level niches within replicated Populus deltoides and hybrid Populus trichocarpa × deltoides individuals using 16S and ITS2 rRNA gene analyses. Results: Our analyses indicate that archaeal/bacterial and fungal microbiomes varied primarily across broader plant habitat classes (leaves, stems, roots, soils) regardless of plant genotype, except for fungal communities within leaf niches, which were greatly impacted by the host genotype. Differences between tree genotypes are evident in the elevated presence of two potential fungal pathogens, Marssonina brunnea and Septoria sp., on hybrid P. trichocarpa × deltoides trees which may in turn be contributing to divergence in overall microbiome composition. Archaeal/bacterial diversity increased from leaves, to stem, to root, and to soil habitats, whereas fungal diversity was the greatest in stems and soils. Conclusions: This study provides a holistic understanding of microbiome structure within a bioenergy relevant plant host, one of the most complete niche-level analyses of any plant. As such, it constitutes a detailed atlas or map for further hypothesis testing on the significance of individual microbial taxa within specific niches and habitats of Populus and a baseline for comparisons to other plant species

    Recent developments in multiple sclerosis therapeutics

    Multiple sclerosis, the most common neurologic disorder of young adults, is traditionally considered to be an inflammatory, autoimmune, demyelinating disease of the central nervous system. Based on this understanding, the initial therapeutic strategies were directed at immune modulation and inflammation control. These approaches, including high-dose corticosteroids for acute relapses and long-term use of parenteral interferon-β, glatiramer acetate or natalizumab for disease modification, are at best moderately effective. Growing evidence supports that, while an inflammatory pathology characterizes the early relapsing stage of multiple sclerosis, neurodegenerative pathology dominates the later progressive stage of the disease. Multiple sclerosis disease-modifying therapies currently in development attempt to specifically target the underlying pathology at each stage of the disease, while avoiding frequent self-injection. These include a variety of oral medications and monoclonal antibodies to reduce inflammation in relapsing multiple sclerosis and agents intended to promote neuroprotection and neurorepair in progressive multiple sclerosis. Although newer therapies for relapsing MS have the potential to be more effective and easier to administer than current therapies, they also carry greater risks. Effective treatments for progressive multiple sclerosis are still being sought.

    Equitable representation in councils: theory and an application to the United Nations Security Council

    We analyze democratic equity in council voting games (CVGs). In a CVG, a voting body containing all members delegates decision-making to a (time-varying) subset of its members, as is the case, e.g., for the relationship between the United Nations General Assembly and the United Nations Security Council (UNSC). We develop a theoretical framework for analyzing democratic equitability in CVGs at both the country and region levels, and for different assumptions regarding preference correlation. We apply the framework to evaluate the equitability of the UNSC, and the claims of those who seek to reform it. We find that the individual permanent members are overrepresented by between 21.3 times (United Kingdom) and 3.8 times (China) from a country-level perspective, while from a region perspective Eastern Europe is the most heavily overrepresented region with more than twice its equitable representation, and Africa the most heavily underrepresented. Our equity measures do not preclude some UNSC members from exercising veto rights, however.