8 research outputs found

    Design and implementation of a standardized framework to generate and evaluate patient-level prediction models using observational healthcare data

    Get PDF
    Objective: To develop a conceptual prediction model framework containing standardized steps and describe the corresponding open-source software developed to consistently implement the framework across computational environments and observational healthcare databases to enable model sharing and reproducibility. Methods: Based on existing best practices we propose a 5 step standardized framework for: (1) transparently defining the problem; (2) selecting suitable datasets; (3) constructing variables from the observational data; (4) learning the predictive model; and (5) validating the model performance. We implemented this framework as open-source software utilizing the Observational Medical Outcomes Partnership Common Data Model to enable convenient sharing of models and reproduction of model evaluation across multiple observational datasets. The software implementation contains default covariates and classifiers but the framework enables customization and extension. Results: As a proof-of-concept, demonstrating the transparency and ease of model dissemination using the software, we developed prediction models for 21 different outcomes within a target population of people suffering from depression across 4 observational databases. All 84 models are available in an accessible online repository to be implemented by anyone with access to an observational database in the Common DataModel format. Conclusions: The proof-of-concept study illustrates the framework's ability to develop reproducible models that can be readily shared and offers the potential to perform extensive external validation of models, and improve their likelihood of clinical uptake. In future work the framework will be applied to perform an "all-by-all" prediction analysis to assess the observational data prediction domain across numerous target populations, outcomes and time, and risk settings

    Bayesian inference reveals host-specific contributions to the epidemic expansion of influenza A H5N1

    No full text
    Since its first isolation in 1996 in Guangdong, China, the highly pathogenic avian influenza virus (HPAIV) H5N1 has circulated in avian hosts for almost two decades and spread to more than 60 countries worldwide. The role of different avian hosts and the domestic-wild bird interface has been critical in shaping the complex HPAIV H5N1 disease ecology, but remains difficult to ascertain. To shed light on the large-scale H5N1 transmission patterns and disentangle the contributions of different avian hosts on the tempo and mode of HPAIV H5N1 dispersal, we apply Bayesian evolutionary inference techniques to comprehensive sets of hemagglutinin and neuraminidase gene sequences sampled between 1996 and 2011 throughout Asia and Russia. Our analyses demonstrate that the large-scale H5N1 transmission dynamics are structured according to different avian flyways, and that the incursion of the Central Asian flyway specifically was driven by Anatidae hosts coinciding with rapid rate of spread and an epidemic wavefront acceleration. This also resulted in longdistance dispersal that is likely to be explained by wild bird migration. We identify a significant degree of asymmetry in the large-scale transmission dynamics between Anatidae and Phasianidae, with the latter largely representing poultry as an evolutionary sink. A joint analysis of host dynamics and continuous spatial diffusion demonstrates that the rate of viral dispersal and host diffusivity is significantly higher for Anatidae compared with Phasianidae. These findings complement risk modeling studies and satellite tracking of wild birds in demonstrating a continental-scale structuring into areas of H5N1 persistence that are connected through migratory waterfowl.SCOPUS: ar.jinfo:eu-repo/semantics/publishe

    Interpreting observational studies: Why empirical calibration is needed to correct p-values

    Get PDF
    Often the literature makes assertions of medical product effects on the basis of ' p<0.05'. The underlying premise is that at this threshold, there is only a 5% probability that the observed effect would be seen by chance when in reality there is no effect. In observational studies, much more than in randomized trials, bias and confounding may undermine this premise. To test this premise, we selected three exemplar drug safety studies from literature, representing a case-control, a cohort, and a self-controlled case series design. We attempted to replicate these studies as best we could for the drugs studied in the original articles. Next, we applied the same three designs to sets of negative controls: drugs that are not believed to cause the outcome of interest. We observed how often p<0.05 when the null hypothesis is true, and we fitted distributions to the effect estimates. Using these distributions, we compute calibrated p-values that reflect the probability of observing the effect estimate under the null hypothesis, taking both random and systematic error into account. An automated analysis of scientific literature was performed to evaluate the potential impact of such a calibration. Our experiment provides evidence that the majority of observational studies would declare statistical significance when no effect is present. Empirical calibration was found to reduce spurious results to the desired 5% level. Applying these adjustments to literature suggests that at least 54% of findings with p<0.05 are not actually statistically significant and should be reevaluated

    Phylogeography Reveals Association between Swine Trade and the Spread of Porcine Epidemic Diarrhea Virus in China and across the World

    No full text
    The ongoing SARS (severe acute respiratory syndrome)-CoV (coronavirus)-2 pandemic has exposed major gaps in our knowledge on the origin, ecology, evolution, and spread of animal coronaviruses. Porcine epidemic diarrhea virus (PEDV) is a member of the genus Alphacoronavirus in the family Coronaviridae that may have originated from bats and leads to significant hazards and widespread epidemics in the swine population. The role of local and global trade of live swine and swine-related products in disseminating PEDV remains unclear, especially in developing countries with complex swine production systems. Here, we undertake an in-depth phylogeographic analysis of PEDV sequence data (including 247 newly sequenced samples) and employ an extension of this inference framework that enables formally testing the contribution of a range of predictor variables to the geographic spread of PEDV. Within China, the provinces of Guangdong and Henan were identified as primary hubs for the spread of PEDV, for which we estimate live swine trade to play a very important role. On a global scale, the United States and China maintain the highest number of PEDV lineages. We estimate that, after an initial introduction out of China, the United States acted as an important source of PEDV introductions into Japan, Korea, China, and Mexico. Live swine trade also explains the dispersal of PEDV on a global scale. Given the increasingly global trade of live swine, our findings have important implications for designing prevention and containment measures to combat a wide range of livestock coronaviruses.SCOPUS: ar.jinfo:eu-repo/semantics/publishe

    Genomic epidemiology, evolution, and transmission dynamics of porcine deltacoronavirus

    No full text
    The emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has shown once again that coronavirus (CoV) in animals are potential sources for epidemics in humans. Porcine deltacoronavirus (PDCoV) is an emerging enteropathogen of swine with a worldwide distribution. Here, we implemented and described an approach to analyze the epidemiology of PDCoV following its emergence in the pig population. We performed an integrated analysis of full genome sequence data from 21 newly sequenced viruses, along with comprehensive epidemiological surveillance data collected globally over the last 15 years. We found four distinct phylogenetic lineages of PDCoV, which differ in their geographic circulation patterns. Interestingly, we identified more frequent intra- and interlineage recombination and higher virus genetic diversity in the Chinese lineages compared with the USA lineage where pigs are raised in different farming systems and ecological environments. Most recombination breakpoints are located in the ORF1ab gene rather than in genes encoding structural proteins. We also identified five amino acids under positive selection in the spike protein suggesting a role for adaptive evolution. According to structural mapping, three positively selected sites are located in the N-terminal domain of the S1 subunit, which is the most likely involved in binding to a carbohydrate receptor, whereas the other two are located in or near the fusion peptide of the S2 subunit and thus might affect membrane fusion. Finally, our phylogeographic investigations highlighted notable South-North transmission as well as frequent long-distance dispersal events in China that could implicate human-mediated transmission. Our findings provide new insights into the evolution and dispersal of PDCoV that contribute to our understanding of the critical factors involved in CoVs emergence.SCOPUS: ar.jinfo:eu-repo/semantics/publishe

    Comparative safety and effectiveness of alendronate versus raloxifene in women with osteoporosis

    Get PDF
    Alendronate and raloxifene are among the most popular anti-osteoporosis medications. However, there is a lack of head-to-head comparative effectiveness studies comparing the two treatments. We conducted a retrospective large-scale multicenter study encompassing over 300 million patients across nine databases encoded in the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). The primary outcome was the incidence of osteoporotic hip fracture, while secondary outcomes were vertebral fracture, atypical femoral fracture (AFF), osteonecrosis of the jaw (ONJ), and esophageal cancer. We used propensity score trimming and stratification based on an expansive propensity score model with all pre-treatment patient characteritistcs. We accounted for unmeasured confounding using negative control outcomes to estimate and adjust for residual systematic bias in each data source. We identified 283,586 alendronate patients and 40,463 raloxifene patients. There were 7.48 hip fracture, 8.18 vertebral fracture, 1.14 AFF, 0.21 esophageal cancer and 0.09 ONJ events per 1,000 person-years in the alendronate cohort and 6.62, 7.36, 0.69, 0.22 and 0.06 events per 1,000 person-years, respectively, in the raloxifene cohort. Alendronate and raloxifene have a similar hip fracture risk (hazard ratio [HR] 1.03, 95% confidence interval [CI] 0.94–1.13), but alendronate users are more likely to have vertebral fractures (HR 1.07, 95% CI 1.01–1.14). Alendronate has higher risk for AFF (HR 1.51, 95% CI 1.23–1.84) but similar risk for esophageal cancer (HR 0.95, 95% CI 0.53–1.70), and ONJ (HR 1.62, 95% CI 0.78–3.34). We demonstrated substantial control of measured confounding by propensity score adjustment, and minimal residual systematic bias through negative control experiments, lending credibility to our effect estimates. Raloxifene is as effective as alendronate and may remain an option in the prevention of osteoporotic fracture

    Virus genomes reveal factors that spread and sustained the Ebola epidemic

    No full text
    The 2013-2016 West African epidemic caused by the Ebola virus was of unprecedented magnitude, duration and impact. Here we reconstruct the dispersal, proliferation and decline of Ebola virus throughout the region by analysing 1,610 Ebola virus genomes, which represent over 5% of the known cases. We test the association of geography, climate and demography with viral movement among administrative regions, inferring a classic 'gravity' model, with intense dispersal between larger and closer populations. Despite attenuation of international dispersal after border closures, cross-border transmission had already sown the seeds for an international epidemic, rendering these measures ineffective at curbing the epidemic. We address why the epidemic did not spread into neighbouring countries, showing that these countries were susceptible to substantial outbreaks but at lower risk of introductions. Finally, we reveal that this large epidemic was a heterogeneous and spatially dissociated collection of transmission clusters of varying size, duration and connectivity. These insights will help to inform interventions in future epidemics
    corecore