
    Collaborative Cloud Computing Framework for Health Data with Open Source Technologies

    The proliferation of sensor technologies and advances in data collection methods have enabled the accumulation of very large amounts of data, and these datasets are increasingly used for scientific research. However, designing a system architecture that performs well in terms of parallelization, query processing time, and aggregation of heterogeneous data types (e.g., time series, images, and structured data), while keeping scientific research reproducible, remains a major challenge. This is especially true for health sciences research, where systems must be (i) easy to use while allowing data manipulation at the most granular level, (ii) agnostic of the programming language kernel, (iii) scalable, and (iv) compliant with the HIPAA privacy law. In this paper, we review the existing literature on such big data systems for scientific research in the health sciences and identify gaps in the current system landscape. We propose a novel architecture for a software-hardware-data ecosystem built on open source technologies such as Apache Hadoop, Kubernetes, and JupyterHub in a distributed environment. We also evaluate the system using a large clinical dataset of 69M patients. Comment: This paper has been accepted at ACM-BCB 202

    Electronic System to Support Clinical Trials with the Possibility of AI-Based Data Processing (Elektronický systém pro podporu provádění klinických studií s možností zpracování dat pomocí umělé inteligence)

    An increasing amount of data is collected through wearable devices during ambulatory and long-term monitoring of biological signals. Together with the adoption of persuasive technology and the dynamics of information sharing in clinical trials, this changes how clinical interventions can be conducted. Moreover, a growing number of smartphone apps are entering the market and becoming everyday tools for many people around the globe. These applications generate a tremendous amount of data that is difficult to process using traditional methods and calls for advanced data processing methods. For patient recruitment, this calls for a shift from traditional methods of engaging patients to modern communication platforms such as social media, which provide easy, everyday access to up-to-date information. These factors make running a clinical study demanding in terms of unified participant management and processing of connected digital resources. Some clinical trials put a strong emphasis on remote sensing data and on engaging patients through their smartphones. To facilitate this, direct participant messaging needs to be implemented, in which researchers provide support, guidance, and troubleshooting on a personal level through communication channels that participants already use. Since the...
Centre for Practical Applications Support and Spin-off Companies of the 1st Faculty of Medicine, Charles University; First Faculty of Medicine

    Cloud-based genomics pipelines for ophthalmology: Reviewed from research to clinical practice

    Aim: To familiarize clinicians with clinical genomics, and to describe the potential of cloud computing for enabling the future routine use of genomics in eye hospital settings. Design: Review article exploring the potential for cloud-based genomic pipelines in eye hospitals. Methods: Narrative review of the literature relevant to clinical genomics and cloud computing, using PubMed and Google Scholar. A broad overview of these fields is provided, followed by key examples of their integration. Results: Cloud computing could benefit clinical genomics through scalable resources, potentially lower costs, and ease of data sharing between multiple institutions. Challenges include complex pricing of services, costs from mistakes or experimentation, data security, and privacy concerns. Conclusions and future perspectives: Clinical genomics is likely to become more routinely used in clinical practice. Currently, it is delivered in highly specialized centers. In the future, cloud computing could enable delivery of clinical genomics services in non-specialist hospital settings in a fast, cost-effective way, while enhancing collaboration between clinical and research teams.

    Identifying risk patterns for suicide attempts in individuals with diabetes: a data-driven approach using LASSO regression

    Diabetes is a major health concern in the United States, with 34.2 million Americans affected in 2020. Unfortunately, the risk of suicide is also elevated in individuals with diabetes, with around 90,000 people with diabetes committing suicide each year. People with type 1 diabetes are three to four times more likely to attempt suicide, and those with newly diagnosed type 2 diabetes are twice as likely to attempt suicide, compared to the general population. However, poor mental health comorbidity is still neglected, and more recommendations are needed to support people with diabetes. Comorbid depression is widely acknowledged to increase the risk of suicide attempts in people with diabetes. Previous studies have used logistic regression to identify risk factors for suicide attempts in individuals with diabetes, but this technique can be prone to overfitting when the number of variables is high. To address this issue, we used LASSO (Least Absolute Shrinkage and Selection Operator), a regularization technique, to reduce overfitting in a logistic regression model. It works by penalizing the log-likelihood with the sum of the absolute values of the coefficients, scaled by a tuning parameter λ, which shrinks the coefficient estimates toward zero. This allows LASSO to act as a feature selection method, because the coefficients of predictors that contribute little to the model are set exactly to zero. Because few studies have focused on understanding the relationship between suicide attempts and diabetes, we used association rule mining (ARM), an explainable rule-based machine learning technique, for knowledge discovery to reveal previously unknown relationships between suicide attempts and diabetes. This approach has already proved useful in the medical field, where it has been applied to electronic health record (EHR) data to discover associations such as disease co-occurrences, drug-disease associations, and symptomatic patterns of disease.
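To make the ARM step more concrete, here is a minimal Apriori-style sketch. It finds itemsets present in at least a minimum share of records (support), keeps rules whose confidence (the estimated probability of the right-hand side given the left-hand side) clears a threshold, and reports lift (confidence divided by the support of the right-hand side; values above 1 suggest a positive association). The patient records, item names such as `suicide_attempt`, and both thresholds are hypothetical, and this is not presented as the dissertation's actual ARM software or variable set.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search; returns {itemset: support} for itemsets with support >= min_support."""
    rows = [frozenset(t) for t in transactions]
    n = len(rows)

    def support(itemset):
        # Fraction of records that contain every item in the itemset.
        return sum(1 for row in rows if itemset <= row) / n

    items = sorted({item for row in rows for item in row})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    freq = {s: support(s) for s in level}
    k = 2
    while level:
        # Join: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a frequent itemset must itself be frequent.
        candidates = [c for c in candidates
                      if all(frozenset(s) in freq for s in combinations(c, k - 1))]
        level = [c for c in candidates if support(c) >= min_support]
        freq.update((c, support(c)) for c in level)
        k += 1
    return freq


def association_rules(freq, min_confidence):
    """Return (lhs, rhs, support, confidence, lift) for every rule meeting min_confidence."""
    rules = []
    for itemset, supp in freq.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                rhs = itemset - lhs
                confidence = supp / freq[lhs]      # estimate of P(rhs | lhs)
                if confidence >= min_confidence:
                    lift = confidence / freq[rhs]  # > 1 suggests positive association
                    rules.append((lhs, rhs, supp, confidence, lift))
    return rules


# Hypothetical records: each set lists the attributes observed for one patient.
patients = [
    {"depression", "insulin", "suicide_attempt"},
    {"depression", "suicide_attempt"},
    {"depression", "insulin"},
    {"insulin"},
    {"depression", "suicide_attempt"},
]

for lhs, rhs, s, c, l in association_rules(frequent_itemsets(patients, 0.4), 0.7):
    print(f"{sorted(lhs)} -> {sorted(rhs)}: support={s:.2f}, confidence={c:.2f}, lift={l:.2f}")
```

With these five toy records, the search keeps two rules: `depression` → `suicide_attempt` (confidence 0.75, lift 1.25) and `suicide_attempt` → `depression` (confidence 1.00, lift 1.25). Lift is symmetric in the two sides, while confidence is not, so rule direction matters when ranking rules by confidence.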
However, no previous studies have used ARM to determine risk factors and predict suicide attempts in people with diabetes. The aim of this dissertation is to identify patterns of risk factors for suicide attempts in individuals with diabetes, with the long-term goal of developing a clinical decision support system that can be integrated into EHRs. This system would allow healthcare providers to identify patients with diabetes at high risk of suicide attempts and provide appropriate preventive measures during outpatient clinic visits. To achieve this goal, we have three specific aims: (1) to identify potential risk factors for suicide attempts in individuals with diabetes through a literature review; (2) to investigate risk factors for suicide attempts in individuals with diabetes using LASSO regression; and (3) to identify risk patterns for suicide attempts in individuals with diabetes using association rule mining. In this dissertation, we reviewed the literature and compiled a list of data elements related to suicide attempts in people with diabetes. We then retrieved data on patients with diabetes from Cerner Real-World Data™. LASSO regression was used for feature selection, and ARM was used to investigate the risk patterns. We discovered risk patterns that are understandable and practical for healthcare providers. The findings of this research can inform suicide prevention efforts for people with diabetes and contribute to improved mental health outcomes.
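The LASSO feature-selection step can be illustrated with a minimal sketch that fits an L1-penalized logistic regression by proximal gradient descent (ISTA). Each iteration takes a gradient step on the average negative log-likelihood and then soft-thresholds every coefficient by step × λ, which is what drives weak predictors exactly to zero. The solver, the simulated data (two informative predictors out of six), and λ = 0.1 are illustrative assumptions; neither the dissertation's software and tuning procedure nor its Cerner data are represented here.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(z, t):
    """Proximal operator of t*|b|: shrinks z toward zero and returns exactly 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_logistic(X, y, lam, step=1.0, iters=500):
    """Minimize mean negative log-likelihood + lam * sum(|beta_j|) by proximal gradient descent (ISTA).

    The intercept b0 is left unpenalized.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        # Residual = predicted probability - observed outcome.
        r = [sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - yi for row, yi in zip(X, y)]
        grad0 = sum(r) / n
        grad = [sum(r[i] * X[i][j] for i in range(n)) / n for j in range(p)]
        b0 -= step * grad0
        # Gradient step on the smooth part, then soft-thresholding for the L1 penalty.
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return b0, beta


# Simulated (hypothetical) data: only the first two of six predictors affect the outcome.
random.seed(0)
n, p = 300, 6
true_beta = [3.0, -3.0, 0.0, 0.0, 0.0, 0.0]
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [1 if random.random() < sigmoid(sum(b * x for b, x in zip(true_beta, row))) else 0 for row in X]

b0_hat, beta_hat = lasso_logistic(X, y, lam=0.1)
print("intercept:", round(b0_hat, 3))
print("coefficients:", [round(b, 3) for b in beta_hat])
print("selected predictors:", [j for j, b in enumerate(beta_hat) if b != 0.0])
```

In practice λ is usually chosen by cross-validation; larger values set more coefficients to zero and therefore select fewer predictors.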

    A Novel Correction for the Adjusted Box-Pierce Test — New Risk Factors for Emergency Department Return Visits within 72 hours for Children with Respiratory Conditions — General Pediatric Model for Understanding and Predicting Prolonged Length of Stay

    This thesis presents the results of three research projects that underline the breadth and depth of my interests. Firstly, I worked on the well-known Box-Pierce goodness-of-fit tests for time series models, which have been an important research topic over the last few decades. All previously proposed tests focus on modifying the test statistic. Instead, I adopted a different approach: take the best-performing test and modify its rejection region. Thus, I developed a semiparametric correction of the Adjusted Box-Pierce test that attains the best Type I error rates for all sample sizes and lags and outperforms all previous global time series goodness-of-fit approaches. Secondly, I aimed to identify novel risk factors significantly associated with 72-hour return visits to emergency departments (EDs). I queried data on 185,000 ED visits by patients younger than 18 years in the United States from the Cerner® Health Facts Database. I built a nested mixed-effects logistic regression model to provide statistical inference on associated risk factors and selected a representative set of machine learning algorithms for the predictive modeling task. Respiratory conditions, including acute bronchiolitis, pneumonia, and asthma, were identified as new risk factors for ED return visits. Thirdly, I aimed to design and implement a comprehensive study to identify new clinical and demographic factors associated with prolonged length of stay (longer than two weeks) among pediatric patients (aged 18 years and under) in a number of free-standing pediatric and mixed medical facilities. I implemented a mixed-effects model to assess the statistical significance and effect sizes of age, race/ethnicity, number of medications, medical family history, presence of infectious agents (fungi, bacteria, viruses), cancer diagnoses, and other conditions, as well as some clinical variables. A stochastic gradient boosting model was also implemented for prediction.
From the mixed-effects model, 11 main-effect predictors were found to be statistically significantly associated with increased odds of prolonged length of stay. The area under the receiver operating characteristic curve (AUROC) was 0.887 (0.885, 0.889) for the mixed-effects model and 0.931 (0.930, 0.933) for the extreme gradient boosting model.
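For readers unfamiliar with the Box-Pierce family, the classic statistic is Q = n Σ_{k=1..m} r_k², where r_k are the first m sample autocorrelations of the model residuals; it is compared with a chi-square distribution on m minus the number of fitted ARMA parameters degrees of freedom. The Ljung-Box variant reweights each term by (n + 2)/(n - k). The sketch below computes both on simulated series. Treating the thesis's "Adjusted Box-Pierce test" as the Ljung-Box form is an assumption, the semiparametric rejection-region correction itself is not reproduced, and the function names and data are hypothetical.

```python
import math
import random


def autocorrelations(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a series (e.g. model residuals)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    denom = sum(v * v for v in d)
    return [sum(d[t] * d[t - k] for t in range(k, n)) / denom for k in range(1, max_lag + 1)]


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square(df) with integer df >= 1, via a two-step recurrence in df."""
    if x <= 0:
        return 1.0
    nu = 1 if df % 2 else 2
    q = math.erfc(math.sqrt(x / 2)) if nu == 1 else math.exp(-x / 2)
    while nu < df:
        # Q(x; nu + 2) = Q(x; nu) + (x/2)^(nu/2) * exp(-x/2) / Gamma(nu/2 + 1)
        q += math.exp((nu / 2) * math.log(x / 2) - x / 2 - math.lgamma(nu / 2 + 1))
        nu += 2
    return q


def portmanteau(x, max_lag, fitted_params=0):
    """Box-Pierce and Ljung-Box statistics with chi-square(max_lag - fitted_params) p-values."""
    n = len(x)
    r = autocorrelations(x, max_lag)
    q_bp = n * sum(rk * rk for rk in r)
    q_lb = n * (n + 2) * sum(rk * rk / (n - k) for k, rk in enumerate(r, start=1))
    df = max_lag - fitted_params
    return {"box_pierce": (q_bp, chi2_sf(q_bp, df)), "ljung_box": (q_lb, chi2_sf(q_lb, df))}


random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(500)]
ar1 = [0.0]
for e in noise[1:]:
    ar1.append(0.8 * ar1[-1] + e)  # AR(1) with phi = 0.8: strongly autocorrelated

print("white noise:", portmanteau(noise, max_lag=10))
print("AR(1):      ", portmanteau(ar1, max_lag=10))
```

Because every Ljung-Box weight (n + 2)/(n - k) exceeds 1, its statistic is always larger than the Box-Pierce one on the same residuals, which counteracts the Box-Pierce test's tendency to under-reject in small samples.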