
    Data quality and patient characteristics in European ANCA-associated vasculitis registries: data retrieval by federated querying

    Objectives: This study aims to describe the data structure and harmonisation process, explore data quality and define characteristics, treatment, and outcomes of patients across six federated antineutrophil cytoplasmic antibody-associated vasculitis (AAV) registries. Methods: Through creation of the vasculitis-specific Findable, Accessible, Interoperable, Reusable, VASCulitis ontology, we harmonised the registries and enabled semantic interoperability. We assessed data quality across the domains of uniqueness, consistency, completeness and correctness. Aggregated data were retrieved using the semantic query language SPARQL Protocol and Resource Description Framework Query Language (SPARQL) and outcome rates were assessed through random effects meta-analysis. Results: A total of 5282 cases of AAV were identified. Uniqueness and data-type consistency were 100% across all assessed variables. Completeness ranged from 49% to 100% and correctness from 60% to 100%. There were 2754 (52.1%) cases classified as granulomatosis with polyangiitis (GPA), 1580 (29.9%) as microscopic polyangiitis and 937 (17.7%) as eosinophilic GPA. The pattern of organ involvement included: lung in 3281 (65.1%), ear-nose-throat in 2860 (56.7%) and kidney in 2534 (50.2%). Intravenous cyclophosphamide was used as remission induction therapy in 982 (50.7%), rituximab in 505 (17.7%) and pulsed intravenous glucocorticoid use was highly variable (11%–91%). Overall mortality and incidence rates of end-stage kidney disease were 28.8 (95% CI 19.7 to 42.2) and 24.8 (95% CI 19.7 to 31.1) per 1000 patient-years, respectively. Conclusions: In the largest reported AAV cohort study, we federated patient registries using semantic web technologies and highlighted concerns about data quality. The comparison of patient characteristics, treatment and outcomes was hampered by heterogeneous recruitment settings.
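The abstract above reports outcome rates pooled by random effects meta-analysis but does not name the estimator. As an illustration only, the sketch below pools per-registry incidence rates on the log scale with the DerSimonian-Laird estimator, which is one common choice and an assumption here. The function name and the event and patient-year counts in the usage note are invented for the example and are not taken from the study.

```python
import math


def pool_rates_random_effects(events, person_years, per=1000.0):
    """Pool incidence rates across registries (DerSimonian-Laird, log scale).

    Returns (pooled_rate, ci_low, ci_high, tau2), with rates expressed per
    `per` patient-years. Every registry needs at least one event.
    """
    # Log incidence rate per registry and its approximate variance (1/events).
    y = [math.log(e / t) for e, t in zip(events, person_years)]
    v = [1.0 / e for e in events]

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))

    # Between-registry variance tau^2, truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-registry variance.
    w_star = [1.0 / (vi + tau2) for vi in v]
    sws = sum(w_star)
    mu = sum(wi * yi for wi, yi in zip(w_star, y)) / sws
    se = math.sqrt(1.0 / sws)

    # Back-transform the estimate and its 95% interval to the rate scale.
    return (
        per * math.exp(mu),
        per * math.exp(mu - 1.96 * se),
        per * math.exp(mu + 1.96 * se),
        tau2,
    )
```

A call such as `pool_rates_random_effects([12, 40, 25], [800.0, 1500.0, 700.0])` (made-up numbers) returns a pooled rate per 1000 patient-years with its interval. Only event counts and patient-years per registry are needed, which fits a setting where registries release aggregates rather than patient rows.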

    Clustering of anti-neutrophil cytoplasmic antibody-associated vasculitis - using a pre-processed harmonised dataset

    Background: The sub-classification of anti-neutrophil cytoplasmic antibody (ANCA)-associated vasculitis (AAV) has been a long-standing debate. Unsupervised learning has previously been used for partitioning of phenotypic groups, but as AAV is a rare disease, small sample sizes have been a limiting factor. Here we attempt clustering of a small dataset harmonised to the FAIRVASC ontology, allowing potential future inclusion of an additional 6000 AAV patients from the FAIRVASC collaboration registries to the cluster model. FAIRVASC is a research project seeking to federate AAV registries across Europe using semantic web technologies (https://fairvasc.eu). Methods: This study used a dataset of 292 patients from southern Sweden, classified as granulomatosis with polyangiitis (GPA) or microscopic polyangiitis (MPA), according to the European Medicines Agency algorithm. The dataset was pre-processed from a relational database format to a Resource Description Framework (RDF) graph-based data model, harmonising the dataset to a FAIRVASC standard. Factor analysis of mixed data (FAMD) and agglomerative hierarchical cluster analysis on principal components (HCPC) were used to develop a cluster model, including organ pattern, ANCA status, serum creatinine, C-reactive protein, gender, and age at diagnosis. The generated clusters were evaluated by baseline characteristics, mortality, and renal outcome. Results: The analyses involved data for 163 subjects with GPA and 129 with MPA. The clustering model resulted in two larger clusters and three smaller ones. The larger clusters were a predominantly anti-PR3 positive cluster of young (mean 57.5 years at diagnosis) patients with ear-nose-throat involvement and a favourable outcome (Cluster 1), and a predominantly anti-MPO positive cluster with severe kidney involvement and high rates of mortality and end-stage kidney disease (Cluster 5). The three smaller clusters differed in terms of organ involvement and ANCA status at diagnosis: one with severe lung and renal involvement and a poor outcome (Cluster 3), and two with similar outcomes, one ANCA negative (Cluster 4) and one with peripheral nerve involvement (Cluster 2). The descriptive characteristics of the clusters are presented in table 1. Conclusions: Our analysis suggests five clusters of AAV patients based on baseline features, associated with different mortality and renal outcome. The investigation acts as a proof of concept of the FAIRVASC ontology and infrastructure for the harmonisation of heterogeneous AAV datasets. The cluster model may in the future readily include an unprecedented number of European AAV patients. Disclosures: None
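To make the agglomerative step of the clustering concrete, the sketch below implements a naive Ward-style hierarchical clustering in plain Python. It assumes the patients have already been projected onto numeric component coordinates (the FAMD step is not reproduced), and Ward's criterion is an assumption, since the abstract does not state which linkage was used. The function name and the toy coordinates in the usage note are illustrative and are not the study's data.

```python
def ward_clusters(points, n_clusters):
    """Naive agglomerative clustering with Ward's merge criterion.

    `points` is a list of equal-length numeric tuples (for example, component
    scores). Returns one integer cluster label per point.
    """
    # Start with every point in its own cluster (stored as index lists).
    clusters = [[i] for i in range(len(points))]
    dim = len(points[0])

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members) for d in range(dim)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b were merged.
        ca, cb = centroid(a), centroid(b)
        dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * dist2

    while len(clusters) > n_clusters:
        # Find the pair of clusters that is cheapest to merge.
        _, i, j = min(
            (merge_cost(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    # Convert the cluster membership lists into per-point labels.
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels
```

For example, `ward_clusters([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)], 2)` groups the first two points together and the last two together. The cubic running time is acceptable for a cohort of a few hundred patients; a library implementation would be preferable at larger scale.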

    FAIRVASC: A semantic web approach to rare disease registry integration

    Rare disease data is often fragmented within multiple heterogeneous siloed regional disease registries, each containing a small number of cases. These data are particularly sensitive, as low subject counts make the identification of patients more likely, meaning registries are not inclined to share subject level data outside their registries. At the same time, access to multiple rare disease datasets is important as it will lead to new research opportunities and analysis over larger cohorts. To enable this, two major challenges must be overcome. The first is to integrate data at a semantic level, so that it is possible to query over registries and return results which are comparable. The second is to enable queries which do not take subject level data from the registries. To meet the first challenge, this paper presents the FAIRVASC ontology to manage data related to the rare disease anti-neutrophil cytoplasmic antibody (ANCA) associated vasculitis (AAV), which is based on the harmonisation of terms in seven European data registries. It has been built upon a set of key clinical questions developed by a team of experts in vasculitis selected from the registry sites and makes use of several standard classifications, such as Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT) and Orphacode. It also presents the method for adding semantic meaning to AAV data across the registries using the declarative Relational to Resource Description Framework Mapping Language (R2RML). To meet the second challenge, a federated querying approach is presented for accessing aggregated and pseudonymized data, which supports analysis of AAV data in a manner that protects patient privacy. For additional security, the federated querying approach is augmented with a method for auditing queries (and the uplift process) using the provenance ontology (PROV-O) to track when queries and changes occur and by whom. The main contribution of this work is the successful application of semantic web technologies and federated queries to provide a novel infrastructure that can readily incorporate additional registries, thus providing access to harmonised data relating to unprecedented numbers of patients with rare disease, while also meeting data privacy and security concerns.
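The federated, aggregate-only querying idea described above can be sketched in a few lines. The example below is a toy simulation and not the FAIRVASC implementation: the registries are in-memory lists, `local_aggregate` stands in for an aggregate query executed inside each registry, and the audit entries only mimic the kind of agent and activity information a PROV-O record might carry. All names and data are hypothetical.

```python
from collections import Counter

# Hypothetical stand-ins for two registries. In a real deployment these rows
# would stay behind each registry's own query endpoint.
REGISTRIES = {
    "registry_a": ["GPA", "GPA", "MPA", "EGPA"],
    "registry_b": ["MPA", "GPA", "MPA"],
}


def local_aggregate(diagnoses):
    """Runs inside a registry: returns counts per diagnosis, never patient rows."""
    return Counter(diagnoses)


def federated_counts(registries, user, audit_log):
    """Combine per-registry aggregates and record who queried which registry."""
    total = Counter()
    for name, rows in registries.items():
        # Only the aggregated counts leave the registry.
        total.update(local_aggregate(rows))
        # Simplified audit entry (assumed shape, loosely inspired by PROV-O).
        audit_log.append(
            {"agent": user, "registry": name, "activity": "aggregate_count"}
        )
    return dict(total)
```

Calling `federated_counts(REGISTRIES, "analyst_1", log)` with an empty list `log` returns combined counts per diagnosis and appends one audit entry per registry queried. The design point is that the coordinator only ever sees aggregates, so subject-level data never has to be shared.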