13 research outputs found

    Secure and scalable deduplication of horizontally partitioned health data for privacy-preserving distributed statistical computation

    Background: Techniques have been developed to compute statistics on distributed datasets without revealing private information except the statistical results. However, duplicate records in a distributed dataset may lead to incorrect statistical results. Therefore, to increase the accuracy of the statistical analysis of a distributed dataset, secure deduplication is an important preprocessing step. Methods: We designed a secure protocol for the deduplication of horizontally partitioned datasets with deterministic record linkage algorithms. We provided a formal security analysis of the protocol in the presence of semi-honest adversaries. The protocol was implemented and deployed across three microbiology laboratories located in Norway, and we ran experiments on the datasets in which the number of records for each laboratory varied. Experiments were also performed on simulated microbiology datasets and data custodians connected through a local area network. Results: The security analysis demonstrated that the protocol protects the privacy of individuals and data custodians under a semi-honest adversarial model. More precisely, the protocol remains secure with the collusion of up to N − 2 corrupt data custodians. The total runtime for the protocol scales linearly with the addition of data custodians and records. One million simulated records distributed across 20 data custodians were deduplicated within 45 s. The experimental results showed that the protocol is more efficient and scalable than previous protocols for the same problem. Conclusions: The proposed deduplication protocol is efficient and scalable for practical uses while protecting the privacy of patients and data custodians
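    To make the terms concrete, the following is a minimal sketch of deterministic record linkage used for cross-custodian deduplication. It is not the protocol from the paper: it runs in one place and offers none of the security guarantees described above. The field names (`national_id`, `birth_date`), the shared key, and both function names are hypothetical and chosen only for illustration.

```python
import hashlib
import hmac


def linkage_token(record, key):
    """Keyed hash over normalized identifying fields (a deterministic linkage key).

    The field names are assumptions for this sketch, not taken from the paper.
    """
    fields = (record["national_id"].strip(), record["birth_date"].strip())
    message = "|".join(fields).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def deduplicate(custodians, key):
    """Keep the first occurrence of each linkage token across all custodians.

    `custodians` maps a custodian name to its list of records.
    Returns a dict mapping each custodian to the records it retains.
    """
    seen = set()
    kept = {name: [] for name in custodians}
    for name, records in custodians.items():
        for record in records:
            token = linkage_token(record, key)
            if token not in seen:
                seen.add(token)
                kept[name].append(record)
    return kept
```

    Two records are treated as duplicates only when their normalized fields match exactly, which is what distinguishes deterministic from probabilistic linkage.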

    Snow integrated communicable disease prediction service

    Objective: This thesis focused on constructing an integrated infectious disease prediction service that produces predictions and visualizes them in time and space. Methods: We used weekly aggregated, laboratory-confirmed cases of various diseases collected from the Snow system, an infectious disease surveillance system that covers the Troms and Finnmark counties of northern Norway. An influenza A dataset was used to fit an SIR(S) model, and datasets for various diseases were applied to a Bayesian model. The prediction service prototype was constructed following an iterative and incremental approach in which the development process comprised four activities. Results: The prediction service framework facilitates the process of integrating various models and allows their evaluation. Currently, the system contains two mathematical models that demonstrate the effectiveness of the architecture in integrating new models. Conclusion: The framework can significantly improve the state of disease prediction systems and reduce the investment and time needed for development. It also speeds up mathematical modeling through its integrated environment for testing and evaluating different mathematical models against other existing models. Thus, the project contributes to improving overall disease prediction accuracy and increasing the benefits of prediction. Keywords: Infectious disease, Influenza, Mathematical model, Prediction, Mathematical model evaluation, Spatiotemporal Epidemiological Modeler, Visualization, Integrated infectious disease prediction
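    For readers unfamiliar with the SIR(S) family of models mentioned above, the sketch below simulates a basic SIRS compartmental model with a simple forward-Euler step. It illustrates the model class only; the parameter names (`beta`, `gamma`, `xi`), the step size, and the function name are assumptions for this example and do not reflect how the thesis fitted its model.

```python
def simulate_sirs(s0, i0, r0, beta, gamma, xi, steps, dt=1.0):
    """Forward-Euler simulation of an SIRS model.

    beta:  transmission rate
    gamma: recovery rate
    xi:    rate at which immunity wanes (xi = 0 gives a plain SIR model)
    Returns a list of (S, I, R) tuples, one per time step, including the start.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i / n * dt
        recoveries = gamma * i * dt
        waned = xi * r * dt
        s += waned - new_infections
        i += new_infections - recoveries
        r += recoveries - waned
        history.append((s, i, r))
    return history
```

    Setting `xi` to zero removes the flow from recovered back to susceptible, which is the difference between the SIR and SIRS variants.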

    F-CBR: An Architecture for Federated Case-Based Reasoning

    Case-based reasoning (CBR) is a problem-solving methodology in artificial intelligence that attempts to solve new problems using past experiences known as cases. Experiences collected in a single case base from an institution or geographical region are seldom sufficient to solve diverse problems, especially in rare situations. Additionally, many institutions do not promote peer-to-peer (p2p) communication or encourage data sharing through such networks to retain autonomy. The paper proposes a federated CBR (F-CBR) architecture to address these challenges. F-CBR enables solving new problems based on similar cases from multiple autonomous CBR systems without p2p communication. We also designed an algorithm to minimize (irrelevant or unsolicited) data sharing in an F-CBR system. We extend the F-CBR design to support institutions with organizational or geographical hierarchies. The F-CBR architecture was implemented and evaluated on two public datasets and a private real-world (non-specific musculoskeletal disorder patient) dataset. The findings demonstrate that the retrieval quality of F-CBR systems is comparable to or better than a single CBR system that persists all the cases on a centralized case base. F-CBR systems address data privacy by incorporating the data minimization principle. We foresee F-CBR as a viable real-world design that can aid in federating legacy CBR systems with minimal or no changes. The CBR systems used in this study are shared on GitHub to support reproducibility
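    The idea of retrieving similar cases from several autonomous case bases without pooling them can be sketched as follows. This is an illustrative toy, not the F-CBR algorithm: the similarity measure, the case layout (`problem` and `solution` keys), and all function names are assumptions. Each site returns only its own top-k scores and solutions, which loosely mirrors the data minimization goal described above.

```python
import heapq


def similarity(query, case):
    """Fraction of query attributes whose values match the case (toy measure)."""
    if not query:
        return 0.0
    matches = sum(1 for k, v in query.items() if case["problem"].get(k) == v)
    return matches / len(query)


def local_retrieve(case_base, query, k):
    """Run at each site: return only the k best (score, solution) pairs."""
    scored = [(similarity(query, c), c["solution"]) for c in case_base]
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


def federated_retrieve(case_bases, query, k):
    """Run at the coordinator: merge the per-site results into a global top-k."""
    candidates = []
    for name, case_base in case_bases.items():
        for score, solution in local_retrieve(case_base, query, k):
            candidates.append((score, name, solution))
    return heapq.nlargest(k, candidates, key=lambda item: item[0])
```

    In a real deployment `local_retrieve` would run inside each institution and only its return value would cross the network; here everything runs in one process for simplicity.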

    De-identifying Swedish EHR text using public resources in the general domain

    Sensitive data is normally required to develop rule-based or train machine learning-based models for de-identifying electronic health record (EHR) clinical notes, which presents important problems for patient privacy. In this study, we add non-sensitive public datasets to EHR training data: (i) scientific medical text and (ii) Wikipedia word vectors. The data, all in Swedish, are used to train a deep learning model using recurrent neural networks. Tests on pseudonymized Swedish EHR clinical notes showed that precision and recall improved from 55.62% and 80.02% with the base EHR embedding layer to 85.01% and 87.15% when Wikipedia word vectors were added. These results suggest that non-sensitive text from the general domain can be used to train robust models for de-identifying Swedish clinical text, which could be useful in cases where the data is both sensitive and in a low-resource language

    Privacy-preserving audit and feedback on the antibiotic prescribing of General Practitioners

    Background: Antibiotic resistance is a worldwide public health problem that is accelerated by the misuse and overuse of antibiotics. Studies have shown that audits and feedback enable clinicians to compare their personal clinical performance with that of their peers and are effective in reducing the inappropriate prescribing of antibiotics. However, privacy concerns make audits and feedback hard to implement in clinical settings. To solve this problem, we developed a privacy-preserving audit and feedback (A&F) system. Objective: This study aims to evaluate a privacy-preserving A&F system in clinical settings. Methods: A privacy-preserving A&F system was deployed at three primary care practices in Norway to generate feedback for 20 general practitioners (GPs) on their prescribing of antibiotics for selected respiratory tract infections. The GPs were asked to participate in a survey shortly after using the system. Results: A total of 14 GPs responded to the questionnaire, representing a 70% (14/20) response rate. The participants were generally satisfied with the usefulness of the feedback and the comparisons with peers, as well as the protection of privacy. The majority of the GPs (9/14, 64%) valued the protection of their own privacy as well as that of their patients. Conclusions: The system overcomes important privacy and scaling challenges that are commonly associated with the secondary use of electronic health record data and has the potential to improve antibiotic prescribing behavior; however, further study is required to assess its actual effect

    Electronic Health Use in a Representative Sample of 18,497 Respondents in Norway (The Seventh Tromsø Study - Part 1): Population-Based Questionnaire Study

    Background: Electronic health (eHealth) services may help people obtain information and manage their health, and they are gaining attention as technology improves, and as traditional health services are placed under increasing strain. We present findings from the first representative, large-scale, population-based study of eHealth use in Norway. Objective: The objectives of this study were to examine the use of eHealth in a population above 40 years of age, the predictors of eHealth use, and the predictors of taking action following the use of these eHealth services. Methods: Data were collected through a questionnaire given to participants in the seventh survey of the Tromsø Study (Tromsø 7). The study involved a representative sample of the Norwegian population aged above 40 years old. A subset of the more extensive questionnaire was explicitly related to eHealth use. Data were analyzed using logistic regression analyses. Results: Approximately half (52.7%; 9752/18,497) of the respondents had used some form of eHealth services during the last year. About 58% (5624/9698) of the participants who had responded to a question about taking some type of action based on information gained from using eHealth services had done so. The variables of being a woman (OR 1.58; 95% CI 1.47-1.68), of younger age (40-49 year age group: OR 4.28, 95% CI 3.63-5.04), with a higher education (tertiary/long: OR 3.77, 95% CI 3.40-4.19), and a higher income (>1 million kr [US $100,000]: OR 2.19, 95% CI 1.77-2.70) all positively predicted the use of eHealth services. Not living with a spouse (OR 1.14, 95% CI 1.04-1.25), having seen a general practitioner (GP) in the last year (OR 1.66, 95% CI 1.53-1.80), and having had some disease (such as heart disease, cancer, asthma, etc; OR 1.29, 95% CI 1.18-1.41) also positively predicted eHealth use. Self-rated health status did not significantly influence eHealth use. 
    Taking some action following eHealth use was predicted by being a woman (OR 1.16, 95% CI 1.07-1.27), being younger (40-49 year age group: OR 1.72, 95% CI 1.34-2.22), having a higher education (tertiary/long: OR 1.65, 95% CI 1.42-1.92), having seen a GP in the last year (OR 1.58, 95% CI 1.41-1.77), and having ever had a disease (such as heart disease, cancer or asthma; OR 1.26, 95% CI 1.14-1.39). Conclusions: eHealth appears to be an essential supplement to traditional health services for those aged over 40 years, and especially so for the more resourceful. Being a woman, being younger, having higher education, having had a disease, and having seen a GP in the last year all positively predicted using the internet to get health information and taking some action based on this information
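    As a reading aid for the odds ratios (OR) and confidence intervals (CI) reported in this abstract, the standard relation between a logistic regression coefficient and its odds ratio is shown below. The Wald-type interval is an assumption for illustration; the abstract does not state how its intervals were computed.

```latex
\[
\mathrm{OR}_j = e^{\hat{\beta}_j},
\qquad
95\%\ \mathrm{CI} =
\left[\, e^{\hat{\beta}_j - 1.96\,\mathrm{SE}(\hat{\beta}_j)},\;
        e^{\hat{\beta}_j + 1.96\,\mathrm{SE}(\hat{\beta}_j)} \,\right]
\]
```

    For example, the reported OR of 1.58 for women using eHealth services corresponds to a coefficient of about ln 1.58 ≈ 0.46 on the log-odds scale. An interval that excludes 1 indicates an association that is statistically significant at the 5% level.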

    Impact of Illness on Electronic Health Use (The Seventh Tromsø Study - Part 2): Population-Based Questionnaire Study

    Background: Patients who suffer from different diseases may use different electronic health (eHealth) resources. Thus, those who plan eHealth interventions should take into account which eHealth resources are used most frequently by patients that suffer from different diseases. Objective: The aim of this study was to understand the associations between different groups of chronic diseases and the use of different eHealth resources. Methods: Data from the seventh survey of the Tromsø Study (Tromsø 7) were analyzed to determine how different diseases influence the use of different eHealth resources. Specifically, the eHealth resources considered were use of apps, search engines, video services, and social media. The analysis contained data from 21,083 participants in the age group older than 40 years. A total of 15,585 (15,585/21,083; 73.92%) participants reported to have suffered some disease, 10,604 (10,604/21,083; 50.29%) participants reported to have used some kind of eHealth resource in the last year, and 7854 (7854/21,083; 37.25%) participants reported to have used some kind of eHealth resource in the last year and suffered (or had suffered) from some kind of specified disease. Logistic regression was used to determine which diseases significantly predicted the use of each eHealth resource. Results: The use of apps was increased among those individuals that (had) suffered from psychological problems (odds ratio [OR] 1.39, 95% CI 1.23-1.56) and cardiovascular diseases (OR 1.12, 95% CI 1.01-1.24) and those part-time workers that (had) suffered from any of the diseases classified as others (OR 2.08, 95% CI 1.35-3.32). The use of search engines for accessing health information increased among individuals who suffered from psychological problems (OR 1.39, 95% CI 1.25-1.55), cancer (OR 1.26, 95% CI 1.11-1.44), or any of the diseases classified as other diseases (OR 1.27, 95% CI 1.13-1.42). 
    Regarding video services, their use for accessing health information was more likely when the participant was a man (OR 1.31, 95% CI 1.13-1.53), (had) suffered from psychological problems (OR 1.70, 95% CI 1.43-2.01), or (had) suffered from other diseases (OR 1.43, 95% CI 1.20-1.71). The factors associated with an increase in the use of social media for accessing health information were as follows: (had) suffered from psychological problems (OR 1.65, 95% CI 1.42-1.91), working part time (OR 1.35, 95% CI 0.62-2.63), receiving disability benefits (OR 1.42, 95% CI 1.14-1.76), having received an upper secondary school education (OR 1.20, 95% CI 1.03-1.38), being a man with a high household income (OR 1.67, 95% CI 1.07-2.60), suffering from cardiovascular diseases and having a high household income (OR 3.39, 95% CI 1.62-8.16), and suffering from respiratory diseases while being retired (OR 1.95, 95% CI 1.28-2.97). Conclusions: Our findings show that different diseases are currently associated with the use of different eHealth resources. This knowledge is useful for those who plan eHealth interventions as they can take into account which type of eHealth resource may be used for gaining the attention of the different user groups

    Privacy-preserving architecture for providing feedback to clinicians on their clinical performance

    Background - Learning from routine healthcare data is important for improving the quality of care. Providing feedback on clinicians’ performance in comparison to their peers has been shown to be more efficient for quality improvements. However, the current methods for providing feedback do not fully address the privacy concerns of stakeholders. Methods - The paper proposes a distributed architecture for providing feedback to clinicians on their clinical performance while protecting their privacy. The indicators for the clinical performance of a clinician are computed within a healthcare institution based on pseudonymized data extracted from the electronic health record (EHR) system. Group-level indicators of clinicians across healthcare institutions are computed using privacy-preserving distributed data-mining techniques. A clinician receives feedback reports that compare his or her personal indicators with the aggregated indicators of his or her peers. Indicators aggregated across different geographical levels are the basis for monitoring changes in the quality of care. The feasibility of the architecture was evaluated in practice in three general practitioner (GP) offices in Norway that serve about 20,245 patients. The architecture was used to provide feedback reports to 21 GPs on their antibiotic prescriptions for selected respiratory tract infections (RTIs). Each GP received one feedback report that covered antibiotic prescriptions between 2015 and 2018, stratified yearly. We assessed the privacy protection and computation time of the architecture. Results - Our evaluation indicates that the proposed architecture is feasible for practical use and protects the privacy of the patients, clinicians, and healthcare institutions. The architecture also maintains the physical access control of healthcare institutions over the patient data. We sent a single feedback report to each of the 21 GPs. 
    A total of 14,396 cases were diagnosed with the selected RTIs during the study period across the institutions. Of these cases, 2924 (20.3%) were treated with antibiotics, and 40.8% (1194) of the antibiotic prescriptions were for narrow-spectrum antibiotics. Conclusions - It is feasible to provide feedback to clinicians on their clinical performance in comparison to peers across healthcare institutions while protecting privacy. The architecture also enables monitoring changes in the quality of care following interventions
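    The abstract does not say which privacy-preserving techniques compute the group-level indicators. One common building block for this kind of aggregation is a secure sum based on additive secret sharing, sketched below as an assumption rather than as the paper's method. The modulus, the function names, and the example counts in the usage note are all hypothetical.

```python
import secrets

# Assumed modulus; it only needs to exceed any possible true total.
MODULUS = 2**61 - 1


def make_shares(value, n_parties):
    """Split `value` into n random shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_values):
    """Simulate a secure sum in one process.

    share_matrix[i][j] is the share that party i would send to party j.
    Each party j publishes only the sum of the shares it received, so no
    single message reveals any party's private value.
    """
    n = len(private_values)
    share_matrix = [make_shares(v, n) for v in private_values]
    partial_sums = [
        sum(share_matrix[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    return sum(partial_sums) % MODULUS
```

    For instance, `secure_sum([10, 20, 12])` returns 42 even though each simulated institution's individual count is hidden behind random shares.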

    Impact of the use of electronic health tools on the psychological and emotional well-being of electronic health service users (The Seventh Tromsø Study - Part 3): Population-based questionnaire study

    Background: In future strategies and visions for modern health care, electronic health (eHealth) has been described as a silver bullet, a technological solution to the challenges of the current health care system. However, the evidence of its effects on service quality and cost effectiveness remains unclear. In addition, patients’ psychological and emotional reactions to using eHealth tools are rarely addressed by the scientific literature. Objective: This study aimed to assess how the psychological and emotional well-being of eHealth service users is affected by the use of eHealth tools. Methods: We analyzed data from a population-based survey in Norway, conducted in the years 2015-2016 and representing 10,604 eHealth users aged over 40 years, to identify how the use of eHealth tools was associated with feeling anxious, confused, knowledgeable, or reassured. Associations between these four emotional outcomes and the use of four types of eHealth services (Web search engines, video search engines, health apps, and social media) were analyzed using logistic regression models. Results: The use of eHealth tools made 72.41% (6740/9308) of the participants feel more knowledgeable and 47.49% (4421/9308) of the participants feel more reassured about their health status. However, 25.69% (2392/9308) reported feeling more anxious and 27.88% (2595/9308) reported feeling more confused using eHealth tools. A high level of education and not having a full-time job were associated with positive reactions and emotions (feeling more knowledgeable and reassured), whereas low self-reported health status and not having enough friends who could provide help and support predicted negative reactions and emotions (ie, feeling anxious and confused). 
    Overall, the positive emotional effects of eHealth use (feeling knowledgeable and reassured) were relatively more prevalent among users aged over 40 years than the negative emotional effects (ie, feeling anxious and confused). About one-fourth of eHealth users reported being more confused and anxious after using eHealth services. Conclusions: The search for health information on the internet can be motivated by a range of factors and needs (not examined in this study), and people may experience a range of reactions and feelings following health information searching on the Web. Drawing on prior studies, we categorized reactions as positive or negative. Some participants had negative reactions, which are challenging to resolve and should be taken into consideration by eHealth service providers when designing services (ie, by including concrete information about how users can get more help and support). There is a need for more studies examining a greater range of reactions to online health information and factors that might predict negative reactions to health information on the Web