
    Medical record linkage in health information systems by approximate string matching and clustering

    BACKGROUND: The proliferation of data sources within heterogeneous healthcare information systems always results in redundant information split across multiple databases. Our objective is to detect exact and approximate duplicates within identity records, in order to improve information quality and to permit cross-linkage between stand-alone and clustered databases. We also need to assist human decision-making by computing a value that reflects identity proximity. METHODS: The proposed method has three steps. The first standardises and indexes elementary identity fields, using blocking variables to speed up analysis. The second matches pairs of similar records, relying on a global similarity value derived from the Porter-Jaro-Winkler algorithm. The third creates clusters of coherent related records, using graph drawing, agglomerative clustering methods and partitioning methods. RESULTS: Batch analysis of 300,000 "supposedly" distinct identities isolates 240,000 truly unique records, 24,000 duplicates (clusters of 2 records) and 3,000 clusters of 3 or more records. CONCLUSION: Duplicate-free databases, used in conjunction with relevant indexes and similarity values, allow immediate (i.e., real-time) proximity detection when a new identity is inserted.
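    The three steps can be illustrated with a minimal Python sketch. It rests on simplifying assumptions that are not taken from the paper: records are hypothetical `(record_id, surname, birth_year)` tuples, the blocking key is the first letter of the surname plus the birth year, pairs are scored with the standard Jaro-Winkler similarity (not the Porter-Jaro-Winkler variant the authors use), and clusters are formed as connected components of above-threshold pairs, a single-linkage shortcut rather than the agglomerative and partitioning methods described.

```python
from collections import defaultdict
from itertools import combinations


def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity between two strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        # Characters match only if equal and within the search window.
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order, k = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2  # number of transpositions
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro score for strings sharing a common prefix (up to 4 chars)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def link_and_cluster(records, threshold=0.9):
    """Block, compare pairs within blocks, and return connected components."""
    # Step 1: standardise (upper-case) and index records by a hypothetical blocking key.
    blocks = defaultdict(list)
    for rec_id, surname, birth_year in records:
        blocks[(surname[:1].upper(), birth_year)].append((rec_id, surname.upper()))

    # Union-find structure over record identifiers.
    parent = {rec_id: rec_id for rec_id, _, _ in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Step 2: compare only pairs that share a block.
    for members in blocks.values():
        for (id_a, name_a), (id_b, name_b) in combinations(members, 2):
            if jaro_winkler(name_a, name_b) >= threshold:
                parent[find(id_a)] = find(id_b)

    # Step 3: group records by their root to obtain clusters.
    clusters = defaultdict(list)
    for rec_id in parent:
        clusters[find(rec_id)].append(rec_id)
    return sorted(sorted(c) for c in clusters.values())
```

    For example, with the records `[(1, "Martha", 1950), (2, "Marhta", 1950), (3, "Jones", 1950), (4, "Martha", 1951)]`, `link_and_cluster` returns `[[1, 2], [3], [4]]`: records 1 and 2 score about 0.96, while record 4 is never compared with them because it falls in a different block. This shows the usual trade-off of blocking: far fewer comparisons, at the cost of missing matches whose blocking fields disagree.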

    Estimating parameters for probabilistic linkage of privacy-preserved datasets

    Background: Probabilistic record linkage is a process used to bring together person-based records from within the same dataset (de-duplication) or from disparate datasets, using pairwise comparisons and matching probabilities. The linkage strategy and associated match probabilities are often estimated through investigations into data quality and manual inspection. However, because privacy-preserved datasets comprise encrypted data, such methods are not possible. In this paper, we present a method for estimating the probabilities and threshold values for probabilistic privacy-preserved record linkage using Bloom filters. Methods: Our method was tested through a simulation study using synthetic data, followed by an application to real-world administrative data. Synthetic datasets were generated with error rates from 0% to 20%. Our method was used to estimate parameters (probabilities and thresholds) for de-duplication linkages. Linkage quality was measured with the F-measure. Each dataset was privacy-preserved using a separate Bloom filter for each field. Match probabilities were estimated using the expectation-maximisation (EM) algorithm on the privacy-preserved data. Threshold cut-off values were determined by an extension to the EM algorithm that allows linkage quality to be estimated for each possible threshold. De-duplication linkages of each privacy-preserved dataset were performed using both estimated and calculated probabilities. The F-measure at the estimated threshold values was also compared with the highest F-measure. Three large administrative datasets were used to demonstrate the applicability of the probability and threshold estimation technique to real-world data. Results: Linkage of the synthetic datasets using the estimated probabilities produced an F-measure comparable to that obtained with calculated probabilities, even with up to 20% error. Linkage of the administrative datasets using estimated probabilities produced a higher F-measure than linkage using calculated probabilities. Further, the threshold estimation yielded F-measures only slightly below the highest possible for those probabilities. Conclusions: The method appears highly accurate across a spectrum of datasets with varying degrees of error. As there are few alternatives for parameter estimation, the approach is a major step towards a complete operational approach for probabilistic linkage of privacy-preserved datasets.
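    The two ingredients of the method, field-level Bloom filters and EM estimation of match probabilities, can be sketched in Python. The sketch rests on assumptions not stated in the abstract: each field is split into padded bigrams and hashed into a 1,000-bit Bloom filter using 20 keyed HMAC double-hashing functions (a common construction; the authors' parameters are not given), encoded fields are compared with the Dice coefficient and counted as agreeing above a hypothetical cut-off of 0.8, and match (m) and non-match (u) probabilities are estimated with a textbook Fellegi-Sunter EM on binary agreement patterns, assuming fields are conditionally independent. The function names and the shared `key` are illustrative, and the threshold-estimation extension is not shown.

```python
import hashlib
import hmac
from typing import List, Sequence, Set, Tuple


def bigrams(value: str) -> Set[str]:
    """Split a field value into padded character bigrams."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value: str, key: str, size: int = 1000, k: int = 20) -> Set[int]:
    """Positions set to 1 in a field-level Bloom filter (keyed double hashing)."""
    positions = set()
    for gram in bigrams(value):
        h1 = int.from_bytes(hmac.new(key.encode(), gram.encode(), hashlib.sha256).digest(), "big")
        h2 = int.from_bytes(hmac.new(key.encode(), gram.encode(), hashlib.sha1).digest(), "big")
        for i in range(k):
            positions.add((h1 + i * h2) % size)
    return positions


def dice(a: Set[int], b: Set[int]) -> float:
    """Dice coefficient of two Bloom filters given as sets of 1-bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def agreement_pattern(rec_a, rec_b, cutoff: float = 0.8) -> Tuple[int, ...]:
    """Binary agreement vector for two records of encoded fields (hypothetical cut-off)."""
    return tuple(int(dice(fa, fb) >= cutoff) for fa, fb in zip(rec_a, rec_b))


def em_match_probabilities(
    patterns: Sequence[Tuple[int, ...]], n_iter: int = 200, eps: float = 1e-6
) -> Tuple[float, List[float], List[float]]:
    """Estimate p (match proportion) and per-field m and u probabilities by EM."""

    def clamp(x: float) -> float:
        return min(max(x, eps), 1 - eps)  # keep likelihoods strictly positive

    n_fields = len(patterns[0])
    p, m, u = 0.1, [0.9] * n_fields, [0.1] * n_fields  # starting values
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a match.
        g = []
        for gamma in patterns:
            lm, lu = p, 1 - p
            for j, agree in enumerate(gamma):
                lm *= m[j] if agree else 1 - m[j]
                lu *= u[j] if agree else 1 - u[j]
            g.append(lm / (lm + lu))
        # M-step: re-estimate parameters from the weighted agreement counts.
        total = sum(g)
        n_unmatched = len(patterns) - total
        p = total / len(patterns)
        m = [clamp(sum(w for w, gam in zip(g, patterns) if gam[j]) / total) for j in range(n_fields)]
        u = [clamp(sum(1 - w for w, gam in zip(g, patterns) if gam[j]) / n_unmatched) for j in range(n_fields)]
    return p, m, u
```

    In practice `patterns` would come from applying `agreement_pattern` to candidate record pairs; the fitted m and u can then be turned into the usual Fellegi-Sunter weights, log2(m/u) for agreement and log2((1 - m)/(1 - u)) for disagreement, whose sums are compared against a threshold.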

    Lessons learnt from a discontinued randomised controlled trial: Adalimumab injection compared with placebo for patients receiving physiotherapy treatment for sciatica (Subcutaneous Injection of Adalimumab Trial compared with Control: SCIATiC)

    Background: Adalimumab, a biological treatment targeting tumour necrosis factor α, might be useful in sciatica. This paper describes the challenges faced when developing a new treatment pathway for a randomised controlled trial of adalimumab for people with sciatica, and the reasons why the trial was stopped early. Methods: A pragmatic, parallel-group, randomised controlled trial with blinded (masked) participants, clinicians, outcome assessment and statistical analysis was conducted at six UK sites. Participants were identified and recruited from general practices, musculoskeletal services and outpatient physiotherapy clinics. They were adults with persistent symptoms of sciatica of 1 to 6 months' duration and a moderate to high level of disability. Eligibility was assessed by research physiotherapists according to clinical criteria, and participants were randomised to receive two doses of adalimumab (80 mg, then 40 mg 2 weeks later) or saline placebo, given as subcutaneous injections into the posterior lateral thigh. Both groups were referred for a course of physiotherapy. Outcomes were measured at baseline and at 6-week, 6-month and 12-month follow-up. The main outcome measure was disability, measured using the Oswestry Disability Index. The planned sample size was 332, with the first 50 in an internal pilot phase. Results: The internal pilot phase was discontinued 10 months after opening owing to low recruitment (two of the six sites active, eight participants recruited). The challenges included contractual delays (one site did not complete contract negotiations, and two sites signed contracts shortly before trial closure); site withdrawal owing to patient safety concerns; difficulties obtaining excess treatment costs; and, at the two sites that did recruit, slower than planned recruitment because of operational issues and low uptake by potential participants. Conclusions: Improved patient care requires robust clinical research within contexts in which treatments can realistically be provided. Step changes in treatment, such as the introduction of biologic treatments for severe sciatica, raise complex issues that can delay trial initiation and retard recruitment. Additional preparatory work might be required before testing novel treatments. A randomised controlled trial of tumour necrosis factor-α blockade is still needed to determine its cost-effectiveness in severe sciatica.

    Building an XML document warehouse

    Data warehouses and OLAP (On-Line Analytical Processing) technologies are dedicated to analyzing structured data originating from organizations' OLTP (On-Line Transaction Processing) systems. Furthermore, to enhance their decision support systems, these organizations need to explore XML (eXtensible Markup Language) documents as an additional and important source of unstructured data. In this context, this paper addresses the warehousing of document-centric XML documents. More specifically, we propose a two-method approach to building conceptual schemas for document warehouses. The first method unifies XML document structures; it aims to produce a global, generic view of a set of XML documents belonging to the same domain. The second method designs multidimensional galaxy schemas for document warehouses.
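    As a rough illustration of the first method, the Python sketch below derives a unified structure for a set of XML documents by collecting every root-to-element path and counting how many documents contain it. This path-union approach is an assumption chosen for brevity; the abstract does not specify the authors' unification algorithm, and the sample documents in the usage note are invented.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, Iterable, Set


def element_paths(xml_text: str) -> Set[str]:
    """Return every root-to-element tag path in one XML document."""
    root = ET.fromstring(xml_text)
    paths: Set[str] = set()

    def walk(elem: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{elem.tag}"
        paths.add(path)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return paths


def unify_structures(documents: Iterable[str]) -> Dict[str, int]:
    """Map each path seen in any document to the number of documents containing it."""
    support: Counter = Counter()
    for doc in documents:
        # element_paths returns a set, so each path counts once per document.
        support.update(element_paths(doc))
    return dict(sorted(support.items()))
```

    For the two documents `<article><title/><body><section/></body></article>` and `<article><title/><author/></article>`, the result contains `/article` and `/article/title` with support 2 and the other paths with support 1. Paths present in every document form the shared core of the generic view; low-support paths are the optional elements it must also accommodate.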

    How good is probabilistic record linkage to reconstruct reproductive histories? Results from the Aberdeen children of the 1950s study

    BACKGROUND: Probabilistic record linkage is widely used in epidemiology, but studies of its validity are rare. Our aim was to validate its use to identify births to a cohort of women drawn from a large cohort of people born in Scotland in the early 1950s. METHODS: The Children of the 1950s cohort includes 5868 females born in Aberdeen in 1950–56 who were in primary schools in the city in 1962. In 2001 a postal questionnaire requesting information on offspring was sent to cohort members resident in the UK. Probabilistic record linkage (based on surname, maiden name, initials, date of birth and postcode) was used to link the females in the cohort to birth records held by the Scottish Maternity Record System (SMR 2). RESULTS: We attempted to mail a total of 5540 women; 3752 (68%) returned a completed questionnaire. Of these, 86% reported having had at least one birth. Linkage to SMR 2 was attempted for 5634 women, and one or more maternity records were found for 3743. There were 2604 women who reported at least one birth in the questionnaire and who were linked to one or more SMR 2 records. When judged against the questionnaire information, the linkage correctly identified 4930 births and missed 601 others. The missed births occurred outside Scotland (147) or before SMR 2 reached full coverage (454). There were 134 births incorrectly linked to SMR 2. CONCLUSION: Probabilistic record linkage to routine maternity records, applied to a population-based cohort using name, date of birth and place of residence, can have high specificity and may therefore be reliably used in epidemiological research.
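    The reported counts make it possible to quantify the linkage's accuracy. The short Python sketch below computes sensitivity (the share of questionnaire-reported births that were linked) and positive predictive value (the share of linked births that were correct), treating the questionnaire as the reference standard, as the abstract does. Specificity itself cannot be computed from these figures, because the abstract does not report true negatives.

```python
def linkage_accuracy(correct: int, missed: int, false_links: int) -> tuple:
    """Sensitivity and positive predictive value of a linkage against a reference."""
    sensitivity = correct / (correct + missed)
    ppv = correct / (correct + false_links)
    return sensitivity, ppv


# Counts from the abstract: 4930 correct, 601 missed (147 + 454), 134 false links.
missed = 147 + 454
sensitivity, ppv = linkage_accuracy(4930, missed, 134)
print(f"sensitivity = {sensitivity:.3f}, PPV = {ppv:.3f}")  # sensitivity = 0.891, PPV = 0.974
```

    Roughly 11% of reported births were missed, all of them outside Scotland or before full SMR 2 coverage, while fewer than 3% of linked births were wrong.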

    Varicella susceptibility and transmission dynamics in Slovenia

    Background: A cross-sectional, age-stratified study was conducted to determine varicella-zoster virus (VZV) seroprevalence and force of infection in Slovenia. Methods: 3689 serum samples were tested for VZV IgG antibodies with an enzyme immunoassay. Semiparametric and parametric modelling were used to estimate the force of infection. Results: Overall, 85.6% of serum samples were seropositive. Age-specific prevalence rose rapidly in preschool children, and over 90% of 8-year-olds tested positive for VZV. However, 2.8% of serum samples among women of childbearing age were seronegative. Semiparametric modelling yielded force of infection estimates of 0.182 (95% CI 0.158-0.206), 0.367 (95% CI 0.285-0.448) and 0.008 (95% CI 0.0-0.032) for age groups 0.5 to <6, 6-11 and ≥12 years, respectively, and 0.175 (95% CI 0.147-0.202), 0.391 (95% CI 0.303-0.480) and 0.025 (95% CI 0.003-0.046) for age groups 0.5 to <5, 5-9 and ≥10 years, respectively. Conclusions: Regardless of the age grouping used, the highest transmission occurred in children in their first years of school.
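    The force of infection λ links directly to age-specific seroprevalence. Under the simple catalytic model, assuming lifelong immunity after infection, no vaccination, and (as an assumption here) susceptibility starting at 0.5 years, the expected proportion seropositive at age a is 1 - exp(-Λ(a)), where Λ(a) is λ integrated from 0.5 to a. The Python sketch below evaluates this for a piecewise-constant λ such as the semiparametric estimates reported above, reading the 6-11 group as ages 6 to under 12 (an interpretation). It is not the authors' fitting procedure, which estimates λ from the data rather than the other way round.

```python
import math
from typing import Sequence, Tuple


def seroprevalence(
    age: float,
    groups: Sequence[Tuple[float, float]],
    foi: Sequence[float],
    start_age: float = 0.5,
) -> float:
    """Expected proportion seropositive at `age` for a piecewise-constant force of infection.

    `groups` are consecutive [lower, upper) age intervals in years, `foi` the
    per-year force of infection in each; susceptibility begins at `start_age`.
    """
    cumulative_hazard = 0.0
    for (lower, upper), lam in zip(groups, foi):
        lower = max(lower, start_age)
        if age <= lower:
            break
        # Add the hazard accumulated within this interval up to `age`.
        cumulative_hazard += lam * (min(age, upper) - lower)
    return 1.0 - math.exp(-cumulative_hazard)


# Semiparametric estimates for the 0.5 to <6, 6-11 and >=12 year groups.
GROUPS = [(0.5, 6.0), (6.0, 12.0), (12.0, math.inf)]
FOI = [0.182, 0.367, 0.008]
```

    Calling `seroprevalence(8, GROUPS, FOI)` gives the model-implied proportion of 8-year-olds who are seropositive; comparing such values with the observed age-specific prevalence is one way to judge how well a set of force-of-infection estimates fits.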

    What low back pain is and why we need to pay attention

    Low back pain is a very common symptom. It occurs in high-income, middle-income, and low-income countries and in all age groups, from children to the elderly. Globally, years lived with disability caused by low back pain increased by 54% between 1990 and 2015, mainly because of population increase and ageing, with the biggest increase seen in low-income and middle-income countries. Low back pain is now the leading cause of disability worldwide. For nearly all people with low back pain, it is not possible to identify a specific nociceptive cause. Only a small proportion of people have a well-understood pathological cause, e.g., a vertebral fracture, malignancy, or infection. People with physically demanding jobs or with physical and mental comorbidities, smokers, and obese individuals are at greatest risk of reporting low back pain. Disabling low back pain is over-represented among people with low socioeconomic status. Most people with new episodes of low back pain recover quickly; however, recurrence is common, and in a small proportion of people low back pain becomes persistent and disabling. High initial pain intensity, psychological distress, and accompanying pain at multiple body sites increase the risk of persistent disabling low back pain. Increasing evidence shows that central pain-modulating mechanisms and pain cognitions have important roles in the development of persistent disabling low back pain. Cost, health-care use, and disability from low back pain vary substantially between countries and are influenced by local culture and social systems, as well as by beliefs about cause and effect. Disability and costs attributed to low back pain are projected to increase in coming decades, in particular in low-income and middle-income countries, where health and other systems are often fragile and not equipped to cope with this growing burden. Intensified research efforts and global initiatives are clearly needed to address the burden of low back pain as a public health problem.

    Accuracy and completeness of patient pathways – the benefits of national data linkage in Australia

    Background - The technical challenges associated with national data linkage, and the extent of cross-border population movements, are explored as part of a pioneering research project. The project involved linking state-based hospital admission records and death registrations across Australia for a national study of hospital-related deaths. Methods - The project used probabilistic methods to link over 44 million morbidity and mortality records from four Australian states, covering the period from 1 July 1999 to 31 December 2009. The accuracy of the linkage was measured by comparison with jurisdictional keys sourced from the individual states. The extent of cross-border population movement between these states was also assessed. Results - Data matching identified almost twelve million individuals across the four Australian states. The percentage of individuals from one state with records found in another ranged from 3% to 5%. Using jurisdictional keys to measure linkage quality, results indicate high matching efficiency (F-measure of 97% to 99%), with linkage processing taking only a matter of days. Conclusions - The results demonstrate the feasibility and accuracy of undertaking cross-jurisdictional linkage for national research. The benefits are substantial, particularly in capturing the full complement of records in patient pathways that arises from cross-border population movements. The project identified a sizeable ‘mobile’ population with hospital records in more than one state. Research studies that focus on a single jurisdiction will under-enumerate the extent of hospital usage by individuals in the population, and researchers need to be aware of the impact of this missing hospital activity on their studies. The project highlights the need for an efficient and accurate data linkage system to support national research across Australia.
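    The F-measure used to assess linkage quality can be computed by comparing the person identifiers assigned by the linkage with a reference grouping such as the jurisdictional keys. The Python sketch below assumes evaluation at the level of record pairs, a common convention; the abstract does not say exactly how the project counted true and false matches, and `predicted` and `reference` are hypothetical mappings from record ID to person ID.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Set


def linked_pairs(assignment: Dict[Hashable, Hashable]) -> Set[FrozenSet[Hashable]]:
    """All unordered record pairs that share a person identifier."""
    groups = defaultdict(list)
    for record, person in assignment.items():
        groups[person].append(record)
    return {frozenset(pair) for members in groups.values() for pair in combinations(members, 2)}


def pairwise_f_measure(predicted: Dict, reference: Dict) -> float:
    """Harmonic mean of pairwise precision and recall."""
    pred, gold = linked_pairs(predicted), linked_pairs(reference)
    true_positives = len(pred & gold)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

    The number of pairs grows quadratically with cluster size, so at the scale of 44 million records one would count pairs per cluster arithmetically, as n(n - 1)/2, rather than enumerate them; the enumeration here only keeps the sketch short.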