
    Comparison of techniques for handling missing covariate data within prognostic modelling studies: a simulation study

    Background: There is no consensus on the most appropriate approach to handle missing covariate data within prognostic modelling studies. Therefore a simulation study was performed to assess the effects of different missing data techniques on the performance of a prognostic model. Methods: Datasets were generated to resemble the skewed distributions seen in a motivating breast cancer example. Multivariate missing data were imposed on four covariates using four different mechanisms: missing completely at random (MCAR), missing at random (MAR), missing not at random (MNAR) and a combination of all three mechanisms. Five amounts of incomplete cases from 5% to 75% were considered. Complete case analysis (CC), single imputation (SI) and five multiple imputation (MI) techniques available within the R statistical software were investigated: a) data augmentation (DA) approach assuming a multivariate normal distribution, b) DA assuming a general location model, c) regression switching imputation, d) regression switching with predictive mean matching (MICE-PMM) and e) flexible additive imputation models. A Cox proportional hazards model was fitted and appropriate estimates for the regression coefficients and model performance measures were obtained. Results: Performing a CC analysis produced unbiased regression estimates, but inflated standard errors, which affected the significance of the covariates in the model with 25% or more missingness. Using SI underestimated the variability, resulting in poor coverage even with 10% missingness. Of the MI approaches, applying MICE-PMM produced, in general, the least biased estimates and better coverage for the incomplete covariates and better model performance for all mechanisms. However, this MI approach still produced biased regression coefficient estimates for the incomplete skewed continuous covariates when 50% or more cases had missing data imposed with a MCAR, MAR or combined mechanism. When the missingness depended on the incomplete covariates, i.e. MNAR, estimates were biased with more than 10% incomplete cases for all MI approaches. Conclusion: The results from this simulation study suggest that performing MICE-PMM may be the preferred MI approach provided that less than 50% of the cases have missing data and the missing data are not MNAR.

    A monodisperse transmembrane α-helical peptide barrel

    The fabrication of monodisperse transmembrane barrels formed from short synthetic peptides has not been demonstrated previously. This is in part because of the complexity of the interactions between peptides and lipids within the hydrophobic environment of a membrane. Here we report the formation of a transmembrane pore through the self-assembly of 35 amino acid α-helical peptides. The design of the peptides is based on the C-terminal D4 domain of the Escherichia coli polysaccharide transporter Wza. By using single-channel current recording, we define discrete assembly intermediates and show that the pore is most probably a helix barrel that contains eight D4 peptides arranged in parallel. We also show that the peptide pore is functional and capable of conducting ions and binding blockers. Such α-helix barrels engineered from peptides could find applications in nanopore technologies such as single-molecule sensing and nucleic-acid sequencing.

    Mitotic Illegitimate Recombination Is a Mechanism for Novel Changes in High-Molecular-Weight Glutenin Subunits in Wheat-Rye Hybrids

    Wide hybrids can have novel traits or changed expression of a quantitative trait that their parents do not have. These phenomena have long been noticed, yet the mechanisms are poorly understood. High-molecular-weight glutenin subunits (HMW-GS) are seed storage proteins encoded by Glu-1 genes that are expressed only in the endosperm in wheat and its related species. Novel HMW-GS compositions have been observed in their hybrids. This research elucidated the molecular mechanisms by investigating the causative factors of novel HMW-GS changes in wheat-rye hybrids. HMW-GS compositions in the endosperm and their coding sequences in the leaves of F1 and F2 hybrids between wheat landrace Shinchunaga and rye landrace Qinling were investigated. Missing and/or additional novel HMW-GSs were observed in the endosperm of 0.5% of the 2078 F1 and 22% of 36 F2 hybrid seeds. The wildtype Glu-1Ax null allele was found to have 42 types of short repeat sequences of 3-60 bp long that appeared 2 to 100 times. It also has an in-frame stop codon in the central repetitive region. Analyzing cloned allele sequences of HMW-GS coding gene Glu-1 revealed that deletions involving the in-frame stop codon had happened, resulting in novel ∼1.8-kb Glu-1Ax alleles in some F1 and F2 plants. The cloned mutant Glu-1Ax alleles were expressed in Escherichia coli, and the HMW-GSs produced matched the novel HMW-GSs found in the hybrids. The differential changes between the endosperm and the plant of the same hybrids and the data of E. coli expression of the cloned deletion alleles both suggested that mitotic illegitimate recombination between two copies of a short repeat sequence had resulted in the deletions and thus the changed HMW-GS compositions. Our experiments have provided the first direct evidence to show that mitotic illegitimate recombination is a mechanism that produces novel phenotypes in wide hybrids.

    Recurrent Signature Patterns in HIV-1 B Clade Envelope Glycoproteins Associated with either Early or Chronic Infections

    Here we have identified HIV-1 B clade Envelope (Env) amino acid signatures from early in infection that may be favored at transmission, as well as patterns of recurrent mutation in chronic infection that may reflect common pathways of immune evasion. To accomplish this, we compared thousands of sequences derived by single genome amplification from several hundred individuals that were sampled either early in infection or were chronically infected. Samples were divided at the outset into hypothesis-forming and validation sets, and we used phylogenetically corrected statistical strategies to identify signatures, systematically scanning all of Env. Signatures included single amino acids, glycosylation motifs, and multi-site patterns based on functional or structural groupings of amino acids. We identified signatures near the CCR5 co-receptor-binding region, near the CD4 binding site, and in the signal peptide and cytoplasmic domain, which may influence Env expression and processing. Two signature patterns associated with transmission were particularly interesting. The first was the most statistically robust signature, located in position 12 in the signal peptide. The second was the loss of an N-linked glycosylation site at positions 413–415; the presence of this site has been recently found to be associated with escape from potent and broad neutralizing antibodies, consistent with enabling a common pathway for immune escape during chronic infection. Its recurrent loss in early infection suggests it may impact fitness at the time of transmission or during early viral expansion. The signature patterns we identified implicate Env expression levels in selection at viral transmission or in early expansion, and suggest that immune evasion patterns that recur in many individuals during chronic infection when antibodies are present can be selected against when the infection is being established prior to the adaptive immune response.

    Pan-cancer analysis of whole genomes

    Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale(1-3). Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4-5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter(4); identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation(5,6); analyses timings and patterns of tumour evolution(7); describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity(8,9); and evaluates a range of more-specialized features of cancer genomes(8,10-18).