
    Data quality and patient characteristics in European ANCA-associated vasculitis registries: data retrieval by federated querying

    Objectives: This study aims to describe the data structure and harmonisation process, explore data quality and define the characteristics, treatment and outcomes of patients across six federated antineutrophil cytoplasmic antibody-associated vasculitis (AAV) registries. Methods: Through the creation of the vasculitis-specific Findable, Accessible, Interoperable, Reusable, VASCulitis ontology, we harmonised the registries and enabled semantic interoperability. We assessed data quality across the domains of uniqueness, consistency, completeness and correctness. Aggregated data were retrieved using the semantic query language SPARQL Protocol and Resource Description Framework Query Language (SPARQL), and outcome rates were assessed through random effects meta-analysis. Results: A total of 5282 cases of AAV were identified. Uniqueness and data-type consistency were 100% across all assessed variables. Completeness ranged from 49% to 100% and correctness from 60% to 100%. Of these cases, 2754 (52.1%) were classified as granulomatosis with polyangiitis (GPA), 1580 (29.9%) as microscopic polyangiitis and 937 (17.7%) as eosinophilic GPA. Organ involvement included the lung in 3281 (65.1%), ear-nose-throat in 2860 (56.7%) and kidney in 2534 (50.2%) cases. Intravenous cyclophosphamide was used as remission induction therapy in 982 (50.7%) and rituximab in 505 (17.7%), while the use of pulsed intravenous glucocorticoids was highly variable (11%–91%). The overall mortality rate and the incidence rate of end-stage kidney disease were 28.8 (95% CI 19.7 to 42.2) and 24.8 (95% CI 19.7 to 31.1) per 1000 patient-years, respectively. Conclusions: In the largest reported AAV cohort study, we federated patient registries using semantic web technologies and highlighted concerns about data quality. The comparison of patient characteristics, treatment and outcomes was hampered by heterogeneous recruitment settings.
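The abstract says outcome rates per 1000 patient-years were pooled by random effects meta-analysis but does not name the estimator. The sketch below assumes the DerSimonian-Laird estimator applied to log Poisson rates, with the variance of each registry's log rate approximated as 1/events. The function name `pooled_rate_dl` and the example counts are hypothetical and are not the registries' data.

```python
import math
from typing import List, Tuple


def pooled_rate_dl(events: List[int], person_years: List[float],
                   per: float = 1000.0, z: float = 1.96) -> Tuple[float, float, float, float]:
    """DerSimonian-Laird random-effects pooling of Poisson rates on the log scale.

    Returns (pooled rate, lower CI, upper CI, tau^2), with rates per `per` person-years.
    Assumes every registry has at least one event, since log(0) and 1/0 are undefined.
    """
    if len(events) != len(person_years) or len(events) < 2:
        raise ValueError("need matching lists covering at least two registries")
    y = [math.log(d / t) for d, t in zip(events, person_years)]  # log rate per registry
    v = [1.0 / d for d in events]                                 # approx. variance of a log rate
    w = [1.0 / vi for vi in v]                                    # fixed-effect (inverse-variance) weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))   # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c)                       # between-registry variance
    w_re = [1.0 / (vi + tau2) for vi in v]                        # random-effects weights
    sw_re = sum(w_re)
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sw_re
    se = math.sqrt(1.0 / sw_re)
    return (per * math.exp(y_re),
            per * math.exp(y_re - z * se),
            per * math.exp(y_re + z * se),
            tau2)


# Hypothetical per-registry counts, for illustration only (not FAIRVASC data).
deaths = [40, 25, 60]
patient_years = [1500.0, 1200.0, 1800.0]
rate, lo, hi, tau2 = pooled_rate_dl(deaths, patient_years)
print(f"{rate:.1f} per 1000 patient-years (95% CI {lo:.1f} to {hi:.1f}), tau^2 = {tau2:.4f}")
```

Pooling on the log scale keeps the confidence interval positive and asymmetric, consistent with the shape of the intervals reported above. When the registries agree closely, tau^2 collapses to zero and the result reduces to a fixed-effect estimate.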

    Data quality in ANCA-associated vasculitis: an analysis of the FAIRVASC registries

    Background: The FAIRVASC project seeks to federate the data of seven ANCA-associated vasculitis registries across Europe using semantic web technology. A high standard of data quality (DQ) is required for the types of data analysis planned for the FAIRVASC architecture. We sought to design and implement a DQ assessment of the FAIRVASC registries. Methods: A Data Quality Group was established within the consortium. This group consisted of individuals from a variety of specialist backgrounds, including clinician scientists, health informaticians, statisticians and computer scientists. The DQ domains selected for evaluation were Uniqueness, Consistency, Completeness and Correctness. These dimensions were prioritised by investigator consensus from a pool of nine candidate dimensions drawn from the literature, and were assessed using statistical methods and tools developed in previously published research. A DQ worksheet was designed using an iterative approach. A representative at each registry used the worksheet to evaluate their local registry's DQ. Results: Registry participant identification numbers were 100% unique across all seven registries. Consistency of data class was 100% across all measured variables. Consistency on logic testing was 99.9% across all registries. Completeness was 94.3% across all registries. Correctness was still under assessment at the time of this report. Missing values arising because an assessed variable was absent from a registry dataset were removed prior to analysis. Percentages represent the mean of the summary percentages reported for each registry as a whole and were not adjusted for registry size. Conclusions: This analysis demonstrated a high level of DQ across the initial seven FAIRVASC registries. The registry data were therefore deemed highly suited to FAIRVASC objectives, including epidemiological analysis of European data and cluster analysis to determine novel disease phenotypes. Future work will include a DQ improvement process with multiple potential objectives, such as removal of duplicates, selection of the highest-quality records, imputation of missing values, re-entry of data and increased specificity of registry metadata. Disclosures: None
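The abstract does not publish the worksheet's formulas. The following is a minimal sketch that assumes simple percentage definitions: uniqueness as distinct participant IDs over records, completeness as non-missing cells over assessed cells, and data-class consistency as non-missing values of the expected type. As the abstract describes, variables absent from a registry's dataset are dropped rather than counted as missing, and the cross-registry summary is an unweighted mean of per-registry percentages. The field names (`patient_id`, `age`, `bvas`, `egfr`) and the function names are hypothetical.

```python
from statistics import mean
from typing import Any, Dict, List

Record = Dict[str, Any]


def registry_dq(records: List[Record], expected_types: Dict[str, type],
                id_field: str = "patient_id") -> Dict[str, float]:
    """Percent uniqueness, completeness and data-type consistency for one registry."""
    if not records:
        raise ValueError("registry has no records")
    # Uniqueness: distinct participant IDs as a share of all records.
    ids = [r.get(id_field) for r in records]
    uniqueness = 100.0 * len(set(ids)) / len(ids)
    # Variables the registry never collects are dropped, not counted as missing.
    present = [v for v in expected_types if any(v in r for r in records)]
    cells = [(v, r.get(v)) for r in records for v in present]
    filled = [(v, x) for v, x in cells if x is not None]
    completeness = 100.0 * len(filled) / len(cells) if cells else 100.0
    # Data-type consistency: non-missing values whose Python type matches the expected one.
    typed_ok = sum(isinstance(x, expected_types[v]) for v, x in filled)
    consistency = 100.0 * typed_ok / len(filled) if filled else 100.0
    return {"uniqueness": uniqueness, "completeness": completeness,
            "consistency": consistency}


def unweighted_mean(per_registry: List[Dict[str, float]]) -> Dict[str, float]:
    """Mean of each registry's summary percentage, not adjusted for registry size."""
    return {k: mean(d[k] for d in per_registry) for k in per_registry[0]}


# Hypothetical variables and records, for illustration only.
EXPECTED_TYPES = {"age": int, "bvas": int, "egfr": float}
registry_a = [
    {"patient_id": 1, "age": 64, "bvas": 12},
    {"patient_id": 2, "age": None, "bvas": 9},      # missing age; no eGFR collected at all
]
registry_b = [
    {"patient_id": 7, "age": 58, "egfr": 31.5},
    {"patient_id": 7, "age": "58", "egfr": 40.0},   # duplicate ID, age stored as text
    {"patient_id": 9, "age": 71, "egfr": None},     # missing eGFR; no BVAS collected at all
]
scores = [registry_dq(r, EXPECTED_TYPES) for r in (registry_a, registry_b)]
print(scores)
print(unweighted_mean(scores))
```

Because the summary is unweighted, a small registry counts as much as a large one. Under these assumptions the type check is strict, so an integer stored where a float is expected would count as inconsistent.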