11 research outputs found

    Diabetes and the direct secondary use of electronic health records : using routinely collected and stored data to drive research and understanding

    Introduction: Electronic health records provide an unparalleled opportunity for the use of patient data that is routinely collected and stored, in order to drive research and develop an epidemiological understanding of disease. Diabetes, in particular, stands to benefit, being a data-rich, chronic-disease state. This article aims to provide an understanding of the extent to which the healthcare sector is using routinely collected and stored data to inform research and epidemiological understanding of diabetes mellitus. Methods: Narrative literature review of articles, published in both the medical- and engineering-based informatics literature. Results: There has been a significant increase in the number of papers published, which utilise electronic health records as a direct data source for diabetes research. These articles consider a diverse range of research questions. Internationally, the secondary use of electronic health records, as a research tool, is most prominent in the USA. The barriers most commonly described in research studies include missing values and misclassification, alongside challenges of establishing the generalisability of results. Discussion: Electronic health record research is an important and expanding area of healthcare research. Much of the research output remains in the form of conference abstracts and proceedings, rather than journal articles. There is enormous opportunity within the United Kingdom to develop these research methodologies, due to national patient identifiers. Such a healthcare context may enable UK researchers to overcome many of the barriers encountered elsewhere and thus to truly unlock the potential of electronic health records.

    Building Data-Driven Pathways From Routinely Collected Hospital Data:A Case Study on Prostate Cancer

    Background: Routinely collected data in hospitals is complex, typically heterogeneous, and scattered across multiple Hospital Information Systems (HIS). This big data, created as a byproduct of health care activities, has the potential to provide a better understanding of diseases, unearth hidden patterns, and improve services and cost. The extent and uses of such data rely on its quality, which is not consistently checked, nor fully understood. Nevertheless, using routine data for the construction of data-driven clinical pathways, describing processes and trends, is a key topic receiving increasing attention in the literature. Traditional algorithms do not cope well with unstructured processes or data, and do not produce clinically meaningful visualizations. Supporting systems that provide additional information, context, and quality assurance inspection are needed. Objective: The objective of the study is to explore how routine hospital data can be used to develop data-driven pathways that describe the journeys that patients take through care, and their potential uses in biomedical research; it proposes a framework for the construction, quality assessment, and visualization of patient pathways for clinical studies and decision support using a case study on prostate cancer. Methods: Data pertaining to prostate cancer patients were extracted from a large UK hospital from eight different HIS, validated, and complemented with information from the local cancer registry. Data-driven pathways were built for each of the 1904 patients and an expert knowledge base, containing rules on the prostate cancer biomarker, was used to assess the completeness and utility of the pathways for a specific clinical study. Software components were built to provide meaningful visualizations for the constructed pathways. 
Results: The proposed framework and pathway formalism enable the summarization, visualization, and querying of complex patient-centric clinical information, as well as the computation of quality indicators and dimensions. A novel graphical representation of the pathways allows the synthesis of such information. Conclusions: Clinical pathways built from routinely collected hospital data can unearth information about patients and diseases that may otherwise be unavailable or overlooked in hospitals. Data-driven clinical pathways allow for heterogeneous data (ie, semistructured and unstructured data) to be collated over a unified data model and for data quality dimensions to be assessed. This work has enabled further research on prostate cancer and its biomarkers, and on the development and application of methods to mine, compare, analyze, and visualize pathways constructed from routine data. This is an important development for the reuse of big data in hospitals

    Predictability Bounds of Electronic Health Records

    The ability to intervene in disease progression given a person’s disease history has the potential to solve one of society’s most pressing issues: advancing health care delivery and reducing its cost. Controlling disease progression is inherently associated with the ability to predict possible future diseases given a patient’s medical history. We invoke an information-theoretic methodology to quantify the level of predictability inherent in disease histories of a large electronic health records dataset with over half a million patients. In our analysis, we progress from zeroth-order through temporally informed statistics, both from an individual patient’s standpoint and also considering the collective effects. Our findings confirm our intuition that knowledge of common disease progressions results in higher predictability bounds than treating disease histories independently. We complement this result by showing the point at which the temporal dependence structure vanishes with increasing orders of the time-correlated statistic. Surprisingly, we also show that shuffling individual disease histories only marginally degrades the predictability bounds. This apparent contradiction with respect to the importance of time-ordered information is indicative of the complexities involved in capturing the health-care process and the difficulties associated with utilising this information in universal prediction algorithms. General Electric Company; AT&T Foundation; National Science Foundation (U.S.); American Society for Engineering Education. National Defense Science and Engineering Graduate Fellowship; Audi Volkswage
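The progression from zeroth-order to temporally informed statistics described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the three estimators below are standard textbook entropy measures, and the function names and the toy `history` sequence are hypothetical, chosen only to show how conditioning on the previous code can lower the uncertainty about the next one.

```python
import math
from collections import Counter


def zeroth_order_entropy(seq):
    """Bits needed if every distinct code were equally likely."""
    return math.log2(len(set(seq)))


def first_order_entropy(seq):
    """Shannon entropy of the empirical code frequencies (order ignored)."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def conditional_entropy(seq):
    """Entropy of the next code given the previous one, H(X_t | X_{t-1})."""
    pairs = Counter(zip(seq, seq[1:]))   # counts of (previous, next) pairs
    prev = Counter(seq[:-1])             # counts of each code as a predecessor
    n = len(seq) - 1                     # number of transitions
    return -sum((c / n) * math.log2(c / prev[a]) for (a, _b), c in pairs.items())


# Hypothetical toy history: two codes that strictly alternate.
history = ["A", "B", "A", "B", "A", "B", "A", "B"]

h0 = zeroth_order_entropy(history)
h1 = first_order_entropy(history)
h_cond = conditional_entropy(history)
print(h0, h1, h_cond)
```

In this toy sequence the frequency-based estimates both equal one bit, while the conditional estimate is zero because each code fully determines the next. Lower entropy corresponds to a higher achievable predictability.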

    Necessity of Analytics in Today’s Healthcare Revenue Cycle

    Because of growing pressure to improve quality and reduce costs, healthcare organizations are rapidly adopting IT in order to improve their operations and clinical care. As a result, vast amounts of data are becoming available for use, and healthcare organizations need to make use of them. Strome (2010) states that healthcare analytics is the application of statistical tools and techniques to healthcare-related data in order to study past situations (i.e., operational performance or clinical outcomes) to improve the quality and efficiency of clinical and business processes and performance. With the introduction of healthcare analytical tools, can the healthcare industry take its huge and exponentially growing amounts of data and learn from it? The purpose of this paper is to review the available literature on the use of analytical tools in the healthcare industry with a focus on the revenue cycle. Most of the literature available for review centers on discussions and theories about the use of analytical tools in the industry. A survey of revenue cycle leaders was conducted to determine the prevalence and importance of analytical tools in conjunction with the revenue cycle. This information will be valuable to revenue cycle leaders in determining if others in the industry are adopting these tools and the potential benefits of using analytical tools in their own departments.

    Population-aware Hierarchical Bayesian Domain Adaptation via Multiple-component Invariant Learning

    While machine learning is rapidly being developed and deployed in health settings such as influenza prediction, there are critical challenges in using data from one environment in another due to variability in features; even within disease labels there can be differences (e.g. "fever" may mean something different reported in a doctor's office versus in an online app). Moreover, models are often built on passive, observational data which contain different distributions of population subgroups (e.g. men or women). Thus, there are two forms of instability between environments in this observational transport problem. We first harness knowledge from health to conceptualize the underlying causal structure of this problem in a health outcome prediction task. Based on sources of stability in the model, we posit that for human-sourced data and health prediction tasks we can combine environment and population information in a novel population-aware hierarchical Bayesian domain adaptation framework that harnesses multiple invariant components through population attributes when needed. We study the conditions under which invariant learning fails, leading to reliance on the environment-specific attributes. Experimental results for an influenza prediction task on four datasets gathered from different contexts show the model can improve prediction in the case of largely unlabelled target data from a new environment and different constituent population, by harnessing both environment and population invariant information. This work represents a novel, principled way to address a critical challenge by blending domain (health) knowledge and algorithmic innovation. The proposed approach will have a significant impact in many social settings wherein who and where the data comes from matters

    Genetics of Diabetes Subtypes. Characterization of novel cluster-based diabetes subtypes.

    BACKGROUND: Type 2 diabetes (T2D) has been reproducibly clustered into five subtypes based on six clinical variables: age at diabetes onset, body mass index (BMI), glutamic acid decarboxylase autoantibodies (GADA), glycated hemoglobin (HbA1c), and insulin secretion and resistance estimated as HOMA2B and HOMA2IR derived from fasting glucose and C-peptide. These subtypes have different disease progression and risk of complications. The newly defined subtypes are called Severe Autoimmune Diabetes (SAID), Severe Insulin-Deficient Diabetes (SIDD), Severe Insulin-Resistant Diabetes (SIRD), Mild Obesity-related Diabetes (MOD), and Mild Age-Related Diabetes (MARD). AIM: The main aim of the thesis was to characterize the subtypes using genetics and biomarkers to investigate potential etiological differences, identify subtype-specific genetic associations and determine the underlying mechanisms of kidney complications in the subtypes. METHODS: The project included individuals with diabetes (cases) from the Swedish cohort All New Diabetics In Scania (ANDIS, n=10927) and the Finnish cohort Diabetes Registry Vasa (DIREVA, n=4754) as well as diabetes-free individuals (controls) from the Swedish Malmö Diet and Cancer cohort (MDC, n=2744) and the Finnish Botnia cohort (n=1683). Clusters defined in Ahlqvist et al, 2018, were used for all analyses. The numbers of individuals in the subtypes were as follows: SAID (n=452, n=327), SIDD (n=1193, n=394), SIRD (n=1130, n=453), MOD (n=1374, n=596) and MARD (n=2861, n=1178), in ANDIS and DIREVA respectively. In Paper I and III, genome-wide association studies (GWAS) and genetic risk score (GRS) analyses were performed to compare underlying genetic drivers in the Swedish cohorts and replicated in the Finnish cohorts. In Paper III, the primary phenotype was estimated glomerular filtration rate (eGFR) reflecting chronic kidney disease.
In Paper II, epidemiological and genetic analysis was performed using clustering, Cox regression models and GRS to compare GADA-negative individuals with diabetes of Iraqi (n=286) and Swedish origin (n=10641) with respect to the new diabetes subclassification and complications. In Paper IV, the proteomic profiles of the subtypes were studied using 1161 biomarkers measured on Olink panels. Machine learning algorithms were applied to prioritize biomarkers, followed by Mendelian randomization. RESULTS: In Paper I, the HLA rs9273368 variant was significantly associated with SAID (OR=2.89, P=6.5×10^-40), and the TCF7L2 rs7903146 variant was significantly associated with SIDD (OR=1.56, P=8.6×10^-15), MOD (OR=1.40, P=3.1×10^-10) and MARD (OR=1.42, P=6.1×10^-16). The rs10824307 variant near the LRMDA gene was uniquely associated with MOD (OR=1.35, P=1.3×10^-9). GRS for fasting insulin showed a unique association with SIRD (OR=1.855, P=5.91×10^-9). GRSs for BMI were associated with SIDD, SIRD and MOD but not MARD (OR=1.046, P=0.099). Paper II concluded that individuals with diabetes from Iraq present with a more insulin-deficient subtype than native Swedes. They have a higher risk of coronary events but a lower risk of CKD. In Paper III, in ANDIS, eGFR was strongly associated with the A allele of rs77924615 in the well-established PDILT-UMOD locus (beta=0.126, p=6.61×10^-13) in all T2D, MARD and SIDD, but not in MOD or SIRD (p>0.05). In the SIRD subtype, eGFR was associated with the C allele of rs3770382 in the CTNNA2 gene at near genome-wide significance (beta=-0.219, p=5.5×10^-8), but was not associated in any of the other subtypes. In DIREVA, the PDILT-UMOD locus replicated in T2D, MARD, and SIDD, and was also associated in SIRD (beta=0.24, p=0.001) but not in MOD (beta=0.076, p=0.109). The CTNNA2 locus did not replicate in DIREVA. In Paper IV, the diabetes subtypes were shown to have different proteomic profiles and a list of prioritized biomarkers was generated for future follow-up.
CONCLUSION: The newly defined subtypes are partially distinct with genetically different backgrounds and SIRD is suggested to have more beta-cell independent pathogenesis. There is some suggestive support for different genetic backgrounds of DKD in diabetes subtypes. Biomarkers could be valuable for better discrimination of subtypes and cross cohort comparisons in larger datasets. The diabetes subclassification approach paves the way for individualized patient management and the development of new therapeutic targets

    Analysis of Family-Health-Related Topics on Wikipedia

    New concepts, terms, and topics always emerge; and meanings of existing terms and topics keep changing all the time. These phenomena occur more frequently on social media than on conventional media because social media allows a huge number of users to generate information online. Retrieving relevant results in different time periods of a fast-changing topic becomes one of the most difficult challenges in the information retrieval field. Among numerous topics discussed on social media, health-related topics are a major category which attracts increasing attention from the general public. This study investigated and explored the evolution patterns of family-health-related topics on Wikipedia. Three family-health-related topics (Child Maltreatment, Family Planning, and Women’s Health) were selected from the World Health Organization Website and their associated entries were retrieved on Wikipedia. Historical numeric and text data of the entries from 2010 to 2017 were collected from a Wikipedia data dump and the Wikipedia Web pages. Four periods were defined: 2010 to 2011, 2012 to 2013, 2014 to 2015, and 2016 to 2017. Coding, subject analysis, descriptive statistical analysis, inferential statistical analysis, SOM approach, and n-gram approach were employed to explore the internal characteristics and external popularity evolutions of the topics. The findings illustrate that the external popularities of the family-health-related topics declined from 2010 to 2017, although their content on Wikipedia kept increasing. The emerged entries had three features: specialization, summarization, and internationalization. The subjects derived from the entries became increasingly diverse during the investigated periods. Meanwhile, the developing trajectories of the subjects varied from one to another. According to the developing trajectories, the subjects were grouped into three categories: growing subject, diminishing subject, and fluctuating subject. 
The popularity of the topics was consistent among Wikipedia viewers but not among editors. For each topic, the popularity trends among editors and viewers differed. Child Maltreatment was the most popular of the three topics, Women’s Health the second most popular, and Family Planning the least popular. The implications of this study include: (1) helping health professionals and general users get a more comprehensive understanding of the investigated topics; (2) contributing to the developments of health ontologies and consumer health vocabularies; (3) assisting Website designers in organizing online health information and helping them identify popular family-health-related topics; (4) providing a new approach for query recommendation in information retrieval systems; (5) supporting temporal information retrieval by presenting the temporal changes of family-health-related topics; and (6) providing a new combination of data collection and analysis methods for researchers.