
    Penalized regression calibration: a method for the prediction of survival outcomes using complex longitudinal and high-dimensional data

    Longitudinal and high-dimensional measurements have become increasingly common in biomedical research. However, methods to predict survival outcomes using covariates that are both longitudinal and high-dimensional are currently missing. In this article, we propose penalized regression calibration (PRC), a method that can be employed to predict survival in such situations. PRC comprises three modeling steps. First, the trajectories described by the longitudinal predictors are flexibly modeled through the specification of multivariate mixed effects models. Second, subject-specific summaries of the longitudinal trajectories are derived from the fitted mixed models. Third, the time-to-event outcome is predicted using the subject-specific summaries as covariates in a penalized Cox model. To ensure a proper internal validation of the fitted PRC models, we furthermore develop a cluster bootstrap optimism correction procedure (CBOCP) that corrects for the optimistic bias of apparent measures of predictiveness. PRC and the CBOCP are implemented in the R package pencal, available from CRAN. After studying the behavior of PRC via simulations, we conclude by illustrating an application of PRC to data from an observational study of patients affected by Duchenne muscular dystrophy, where the goal is to predict time to loss of ambulation using longitudinal blood biomarkers.
    Comment: The article is now published in Statistics in Medicine (with Open Access
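A minimal sketch of steps two and three, under simplifying assumptions that are not part of PRC itself: a single biomarker, a random-intercept linear mixed model with known mean and variance components (PRC fits multivariate mixed models), predicted random effects used as the subject-specific summaries, and a ridge penalty on the Cox partial log-likelihood with no tied event times. The function names are hypothetical and do not reflect the pencal API; a real fit would also maximize the penalized log-likelihood over `beta` and tune `lam`, for example by cross-validation.

```python
import math


def predicted_random_intercepts(y_by_subject, mu, sigma2_b, sigma2_e):
    """Predicted random intercepts (BLUPs) under the ASSUMED model
    y_ij = mu + b_i + e_ij, b_i ~ N(0, sigma2_b), e_ij ~ N(0, sigma2_e),
    with mu and both variance components treated as known."""
    summaries = {}
    for subject, ys in y_by_subject.items():
        n_i = len(ys)
        # Shrinkage towards the population mean: fewer visits -> stronger shrinkage
        shrinkage = sigma2_b / (sigma2_b + sigma2_e / n_i)
        summaries[subject] = shrinkage * (sum(ys) / n_i - mu)
    return summaries


def ridge_cox_loglik(beta, X, times, events, lam):
    """Cox partial log-likelihood minus a ridge penalty lam * ||beta||^2.
    Assumes no tied event times (no Breslow/Efron correction)."""
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i:  # only observed events contribute a term
            # Risk set: subjects whose follow-up time is at least t_i
            risk = [eta[j] for j, t_j in enumerate(times) if t_j >= t_i]
            loglik += eta[i] - math.log(sum(math.exp(e) for e in risk))
    return loglik - lam * sum(b * b for b in beta)


# Hypothetical toy data: one biomarker measured at 3, 2 and 1 visits
visits = {"s1": [1.0, 1.2, 0.8], "s2": [2.0, 2.2], "s3": [0.5]}
blups = predicted_random_intercepts(visits, mu=1.0, sigma2_b=1.0, sigma2_e=0.5)
X = [[blups[s]] for s in ("s1", "s2", "s3")]
print(ridge_cox_loglik([0.5], X, times=[4.0, 2.0, 6.0], events=[1, 1, 0], lam=0.1))
```

Subjects with fewer visits have their summaries shrunk more strongly towards the population mean, which is one reason mixed-model summaries can be preferable to raw per-subject averages.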

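The optimism correction idea can be sketched generically: the model is refitted on bootstrap samples, and the average gap between its performance on the bootstrap sample and on the original data estimates the optimism of the apparent performance. Because each subject contributes several repeated measurements, the sketch resamples whole subjects (clusters) rather than individual rows. This is an illustration under stated assumptions, not the CBOCP as implemented in pencal: `fit` and `score` are hypothetical callables, and in PRC each bootstrap refit would presumably repeat all three modeling steps, with a survival-oriented measure such as a concordance index in place of the toy mean model and negative mean squared error used here.

```python
import random
import statistics


def cluster_bootstrap_optimism(data, fit, score, n_boot=200, seed=1):
    """Optimism-corrected performance via a cluster bootstrap (sketch).

    data:  dict mapping subject id -> list of that subject's records (one cluster)
    fit:   hypothetical callable, fit(records) -> model
    score: hypothetical callable, score(model, records) -> performance (higher is better)
    """
    rng = random.Random(seed)
    ids = list(data)
    original = [r for i in ids for r in data[i]]
    apparent = score(fit(original), original)
    optimisms = []
    for _ in range(n_boot):
        # Resample whole subjects so that repeated measurements stay together
        boot_ids = [rng.choice(ids) for _ in ids]
        boot = [r for i in boot_ids for r in data[i]]
        model = fit(boot)
        # Optimism = performance on the bootstrap sample minus on the original data
        optimisms.append(score(model, boot) - score(model, original))
    return apparent - statistics.mean(optimisms)


def fit_mean(records):
    """Toy 'model': predict every record by the sample mean."""
    return statistics.mean(records)


def neg_mse(model, records):
    """Toy performance measure: negative mean squared error."""
    return -statistics.mean((r - model) ** 2 for r in records)


data = {"a": [1.0, 1.5], "b": [3.0], "c": [2.0, 2.5, 2.2], "d": [0.5]}
corrected = cluster_bootstrap_optimism(data, fit_mean, neg_mse)
print(corrected)
```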

    Population genomics of cardiometabolic traits: design of the University College London-London School of Hygiene and Tropical Medicine-Edinburgh-Bristol (UCLEB) Consortium.

    Substantial advances have been made in identifying common genetic variants that influence cardiometabolic traits and disease outcomes through genome-wide association studies. Nevertheless, gaps in knowledge remain, and new questions have arisen regarding their population relevance, mechanisms, and applications for healthcare. Using a new high-resolution custom single nucleotide polymorphism (SNP) array (Metabochip) with dense coverage of genomic regions linked to cardiometabolic disease, the University College London-London School of Hygiene and Tropical Medicine-Edinburgh-Bristol (UCLEB) consortium of highly phenotyped, population-based prospective studies aims to: (1) fine-map functionally relevant SNPs; (2) precisely estimate individual absolute risks and population attributable risks based on individual SNPs and their combinations; (3) investigate mechanisms leading to altered risk factor profiles and cardiovascular disease (CVD) events; and (4) use Mendelian randomisation to study the causal role in CVD of a range of cardiovascular biomarkers, in order to inform public health policy and help develop new preventative therapies.
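Aim (2) refers to population attributable risks. As a hedged illustration of the simplest version of that quantity, Levin's formula gives the population attributable fraction for a single binary exposure with prevalence p and relative risk RR as PAF = p(RR - 1) / (1 + p(RR - 1)). The sketch assumes a binary exposure (for example, carrying a risk allele) and an unconfounded relative risk; estimates for individual SNPs and their combinations, as described above, would require more elaborate models. `levin_paf` is a hypothetical helper.

```python
def levin_paf(prevalence, relative_risk):
    """Population attributable fraction for ONE binary exposure (Levin's formula).

    Assumes the relative risk is unconfounded and the exposure is binary.
    """
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical example: half the population exposed, exposure doubles risk
print(levin_paf(0.5, 2.0))
```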

    The Role of Host Genetics in Susceptibility to Influenza: A Systematic Review

    Background: The World Health Organization has identified studies of the role of host genetics in susceptibility to severe influenza as a priority. A systematic review was conducted to summarize the current evidence on the role of host genetics in susceptibility to influenza (PROSPERO registration number: CRD42011001380). Methods and Findings: PubMed, Web of Science, the Cochrane Library, and OpenSIGLE were searched using a predefined strategy for all entries up to the date of the search. Two reviewers independently screened the titles and abstracts of 1,371 unique articles, and 72 full-text publications were selected for inclusion. Mouse models clearly demonstrate that host genetics plays a critical role in susceptibility to a range of human and avian influenza viruses. The Mx genes encoding interferon-inducible proteins are the best studied, but their relevance to susceptibility in humans is unknown. Although the MxA gene should be considered a candidate gene for further study in humans, over 100 other candidate genes have been proposed. There are, however, no data associating any of these candidate genes with susceptibility in humans, and the only published study in humans was underpowered. One genealogy study presents moderate evidence of a heritable component to the risk of influenza-associated death, and while the marked familial aggregation of H5N1 cases is suggestive of host genetic factors, this remains unproven. Conclusion: The fundamental question "Is susceptibility to severe influenza in humans heritable?" remains unanswered. No

    A Conserved Developmental Patterning Network Produces Quantitatively Different Output in Multiple Species of Drosophila

    Differences in the level, timing, or location of gene expression can contribute to alternative phenotypes at the molecular and organismal level. Understanding the origins of expression differences is complicated by the fact that organismal morphology and gene regulatory networks could potentially vary even between closely related species. To assess the scope of such changes, we used high-resolution imaging methods to measure mRNA expression in blastoderm embryos of Drosophila yakuba and Drosophila pseudoobscura and assembled these data into cellular-resolution atlases, in which expression levels for 13 genes in the segmentation network are averaged into species-specific, cellular-resolution morphological frameworks. We demonstrate that the blastoderm embryos of these species differ in size, shape, and number of nuclei. We present an approach to compare cellular gene expression patterns between species while accounting for varying embryo morphology, and apply it to our data and to an equivalent dataset for Drosophila melanogaster. Our analysis reveals that all individual genes differ quantitatively in their spatio-temporal expression patterns between these species, primarily in their relative position and dynamics. Despite many small quantitative differences, cellular gene expression profiles for the whole set of genes examined are largely similar. This suggests that cell types at this stage of development are conserved, though they can differ in their relative position by up to 3–4 cell widths and in their relative proportion between species by as much as 5-fold. Quantitative differences in the dynamics and relative level of a subset of genes between corresponding cell types may reflect altered regulatory functions between species. Our results emphasize that transcriptional networks can diverge over short evolutionary timescales and that even small changes can lead to distinct output in terms of the placement and number of equivalent cells.

    Characterisation of barley resistance to rhynchosporium on chromosome 6HS

    Key message: A major resistance gene to rhynchosporium, Rrs18, maps close to the telomere on the short arm of chromosome 6H in barley. Rhynchosporium, or barley scald, caused by the fungal pathogen Rhynchosporium commune, is one of the most destructive and economically important diseases of barley worldwide. Testing of the Steptoe × Morex and CIho 3515 × Alexis doubled haploid populations revealed a large-effect QTL for resistance to R. commune close to the telomere on the short arm of chromosome 6H, present in both populations. Mapping markers flanking the QTL from both populations onto the 2017 Morex genome assembly revealed a rhynchosporium resistance locus independent of Rrs13, which we named Rrs18. The causal gene was fine-mapped to a 660 kb interval using Steptoe × Morex backcross 1 S₂ and S₃ lines with molecular markers developed from Steptoe exome capture variant calling. Sequencing RNA from CIho 3515 and Alexis revealed that only four genes within the Rrs18 interval were transcribed in leaf tissue, with a serine/threonine protein kinase being the most likely candidate for Rrs18.
    Max Coulter, Bianca Büttner, Kerstin Hofmann, Micha Bayer, Luke Ramsay, Günther Schweizer, Robbie Waugh, Mark E. Looseley, Anna Avrov

    Unity in defence: honeybee workers exhibit conserved molecular responses to diverse pathogens

    This is the final version of the article. Available from the publisher via the DOI in this record.
    Background: Organisms typically face infection by diverse pathogens, and hosts are thought to have developed specific responses to each type of pathogen they encounter. The advent of transcriptomics now makes it possible to test this hypothesis and compare host gene expression responses to multiple pathogens at a genome-wide scale. Here, we performed a meta-analysis of multiple published and new transcriptomes using a newly developed bioinformatics approach that filters genes based on their expression profiles across datasets. We thereby identified common and unique molecular responses of a model host species, the honey bee (Apis mellifera), to its major pathogens and parasites: the microsporidia Nosema apis and Nosema ceranae, RNA viruses, and the ectoparasitic mite Varroa destructor, which transmits viruses. Results: We identified a common suite of genes and conserved molecular pathways that respond to all investigated pathogens, a result that suggests shared response mechanisms to diverse pathogens. Genes differentially expressed after infection exhibited a higher evolutionary rate than non-differentially expressed genes. Using our new bioinformatics approach, we also uncovered pathogen-specific responses of honey bees: apoptosis appeared to be an important response following microsporidian infection, while genes from the immune signalling pathways Toll and Imd were differentially expressed after Varroa/virus infection. Finally, we applied our bioinformatics approach and generated a gene co-expression network to identify highly connected (hub) genes that may be important mediators and regulators of anti-pathogen responses. Conclusions: Our meta-analysis generated a comprehensive overview of the host metabolic and other biological processes that mediate interactions between insects and their pathogens. We identified key host genes and pathways that respond to phylogenetically diverse pathogens, representing an important source for future functional studies as well as offering new routes to identify or generate pathogen-resilient honey bee stocks. The statistical and bioinformatics approaches developed for this study are broadly applicable to synthesizing information across transcriptomic datasets and will likely be useful for addressing a variety of biological questions.
    This article is a joint effort of the working group TRANSBEE and an outcome of two workshops kindly supported by sDiv, the Synthesis Centre for Biodiversity Sciences within the German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, funded by the German Science Foundation (FZT 118). New datasets were generated thanks to the Insect Pollinators Initiative (IPI grants BB/I000100/1 and BB/I000151/1), with participation of the UK-USA exchange funded by the BBSRC BB/I025220/1 (datasets #4, 11 and 14). The IPI is funded jointly by the Biotechnology and Biological Sciences Research Council, the Department for Environment, Food and Rural Affairs, the Natural Environment Research Council, the Scottish Government and the Wellcome Trust, under the Living with Environmental Change Partnershi
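The abstract does not describe how the co-expression network was built, so the following is only a sketch of one common construction, stated here as an assumption: genes are connected when the absolute Pearson correlation of their expression profiles reaches a hard threshold, and hub genes are those with the highest degree. The helper names and toy expression values are hypothetical.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def hub_genes(expr, threshold=0.8, top=1):
    """Build an unweighted co-expression network (edge if |r| >= threshold)
    and return the `top` highest-degree genes together with all degrees."""
    degree = {gene: 0 for gene in expr}
    for g1, g2 in itertools.combinations(expr, 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            degree[g1] += 1
            degree[g2] += 1
    # Sort by decreasing degree, breaking ties alphabetically
    ranked = sorted(degree, key=lambda g: (-degree[g], g))
    return ranked[:top], degree


# Hypothetical profiles across four samples: geneA tracks both geneB and geneC
expr = {
    "geneA": [2.0, 0.0, 0.0, -2.0],
    "geneB": [1.0, -1.0, 1.0, -1.0],
    "geneC": [1.0, 1.0, -1.0, -1.0],
    "geneD": [1.0, -1.0, -1.0, 1.0],
}
hubs, degrees = hub_genes(expr, threshold=0.7)
print(hubs, degrees)
```

Here geneA is the only gene correlated with both geneB and geneC, so it is ranked as the hub; in practice the threshold, or a weighted alternative, would need to be chosen with care.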

    Quantitative Models of the Mechanisms That Control Genome-Wide Patterns of Transcription Factor Binding during Early Drosophila Development

    Transcription factors that drive complex patterns of gene expression during animal development bind to thousands of genomic regions, with quantitative differences in binding across bound regions mediating their activity. While we now have tools to characterize the DNA affinities of these proteins and to precisely measure their genome-wide distribution in vivo, our understanding of the forces that determine where, when, and to what extent they bind remains primitive. Here we use a thermodynamic model of transcription factor binding to evaluate the contribution of different biophysical forces to the binding of five regulators of early embryonic anterior-posterior patterning in Drosophila melanogaster. Predictions based on DNA sequence and in vitro protein-DNA affinities alone achieve a correlation of ∼0.4 with experimental measurements of in vivo binding. Incorporating cooperativity and competition among the five factors, and accounting for spatial patterning by modeling binding in every nucleus independently, had little effect on prediction accuracy. A major source of error was the prediction of binding events that do not occur in vivo, which we hypothesized reflected reduced chromatin accessibility. To test this, we incorporated experimental measurements of genome-wide DNA accessibility into our model, effectively restricting predicted binding to regions of open chromatin. This dramatically improved our predictions, to correlations of 0.6–0.9 for various factors across known target genes. Finally, we used our model to quantify the roles of DNA sequence, accessibility, and binding competition and cooperativity. Our results show that, in regions of open chromatin, binding can be predicted almost exclusively by the sequence specificity of individual factors, with a minimal role for protein interactions. We suggest that a combination of experimentally determined chromatin accessibility data and simple computational models of transcription factor binding may be used to predict the binding landscape of any animal transcription factor with significant precision.
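To illustrate the basic idea of a thermodynamic binding model and of restricting predictions to open chromatin, the sketch below uses the textbook equilibrium expression for a single site, occupancy = Kc / (1 + Kc), for association constant K and free factor concentration c, sums it over independent sites in a region, and scales the result by an accessibility value between 0 and 1. This is deliberately simpler than the model described in the abstract: it ignores cooperativity, competition, and per-nucleus concentrations, and the linear accessibility scaling and function names are assumptions.

```python
def site_occupancy(k_assoc, conc):
    """Equilibrium probability that one isolated site is bound: Kc / (1 + Kc)."""
    w = k_assoc * conc  # statistical weight of the bound state
    return w / (1.0 + w)


def predicted_binding(site_affinities, conc, accessibility=1.0):
    """Expected number of bound sites in a region, assuming independent sites
    and (hypothetically) scaling linearly by accessibility in [0, 1]."""
    return accessibility * sum(site_occupancy(k, conc) for k in site_affinities)


# Same two sites, once in open chromatin and once fully inaccessible
open_region = predicted_binding([2.0, 6.0], conc=0.5, accessibility=1.0)
closed_region = predicted_binding([2.0, 6.0], conc=0.5, accessibility=0.0)
print(open_region, closed_region)
```

Setting accessibility to zero removes predicted binding from closed regions entirely, which mirrors, in a crude way, how accessibility data can suppress false-positive predictions.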

    Assessing interactions between the associations of common genetic susceptibility variants, reproductive history and body mass index with breast cancer risk in the breast cancer association consortium: a combined case-control study.

    INTRODUCTION: Several common breast cancer genetic susceptibility variants have recently been identified. We aimed to determine how these variants combine with a subset of other known risk factors to influence breast cancer risk in white women of European ancestry, using case-control studies participating in the Breast Cancer Association Consortium. METHODS: We evaluated two-way interactions between each of five risk factors (age at menarche, ever having had a live birth, number of live births, age at first birth, and body mass index (BMI)) and each of 12 single nucleotide polymorphisms (SNPs) (10q26-rs2981582 (FGFR2), 8q24-rs13281615, 11p15-rs3817198 (LSP1), 5q11-rs889312 (MAP3K1), 16q12-rs3803662 (TOX3), 2q35-rs13387042, 5p12-rs10941679 (MRPS30), 17q23-rs6504950 (COX11), 3p24-rs4973768 (SLC4A7), CASP8-rs17468277, TGFB1-rs1982073 and ESR1-rs3020314). Interactions were tested by fitting logistic regression models including per-allele and linear-trend main effects for SNPs and risk factors, respectively, and single-parameter interaction terms for linear departure from independent multiplicative effects. RESULTS: These analyses were applied to data for up to 26,349 invasive breast cancer cases and up to 32,208 controls from 21 case-control studies. No statistical evidence of interaction was observed beyond that expected by chance. Analyses were repeated using data from 11 population-based studies, and the results were very similar. CONCLUSIONS: The relative risks for breast cancer associated with the common susceptibility variants identified to date do not appear to vary across women with different reproductive histories or BMI. The assumption of multiplicative combined effects for these established genetic and other risk factors in risk prediction models appears justified.
    RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license, which is similar to the 'Creative Commons Attribution Licence'. In brief, you may copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit, and for any reuse or distribution, it must be made clear to others what the license terms of this work are.
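The interaction test described under METHODS can be written, for one SNP and one risk factor, as the following logistic model, where g_i is the number of risk alleles carried by woman i (per-allele coding), x_i is her risk factor value (linear trend), and Y_i indicates case status. This is a sketch: any adjustment covariates used in the original analysis (for example, study indicators) are omitted.

```latex
\begin{aligned}
\operatorname{logit} \Pr(Y_i = 1 \mid g_i, x_i) &= \beta_0 + \beta_G\, g_i + \beta_E\, x_i + \beta_{GE}\, g_i x_i \\
\mathrm{OR}(g, x) &= \exp\left(\beta_G\, g + \beta_E\, x + \beta_{GE}\, g x\right) \\
\mathrm{OR}(g, x)\big|_{\beta_{GE} = 0} &= e^{\beta_G g} \, e^{\beta_E x}
\end{aligned}
```

Here OR(g, x) is the odds ratio relative to g = 0, x = 0. Under the null hypothesis β_GE = 0 the joint odds ratio factorizes into the product of the SNP and risk-factor odds ratios, which is the "independent multiplicative effects" assumption; the single parameter β_GE measures linear departure from it on the log-odds scale.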