16 research outputs found

    Positive Selection Differs between Protein Secondary Structure Elements in Drosophila

    Different protein secondary structure elements have different physicochemical properties and roles in the protein, which may determine their evolutionary flexibility. However, it is not clear to what extent protein structure affects the way Darwinian selection acts at the amino acid level. Using phylogeny-based likelihood tests for positive selection, we have examined the relationship between protein secondary structure and selection across six species of Drosophila. We find that amino acids that form disordered regions, such as random coils, are far more likely to be under positive selection than expected from their proportion in the proteins, and residues in helices and β-structures are subject to less positive selection than predicted. In addition, it appears that sites undergoing positive selection are more likely than expected to occur close to one another in the protein sequence. Finally, on a genome-wide scale, we have determined that positively selected sites are found more frequently toward the gene ends. Our results demonstrate that protein structures with a greater degree of organization and strong hydrophobicity, represented here as helices and β-structures, are less tolerant to molecular adaptation than disordered, hydrophilic regions, across a diverse set of proteins.

    Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel

    Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low depth (average 7x), aiming to exhaustively characterize genetic variation down to 0.1% minor allele frequency in the British population. Here we demonstrate the value of this resource for improving imputation accuracy at rare and low-frequency variants in both a UK and an Italian population. We show that large increases in imputation accuracy can be achieved by re-phasing WGS reference panels after initial genotype calling. We also present a method for combining WGS panels to improve variant coverage and downstream imputation accuracy, which we illustrate by integrating 7,562 WGS haplotypes from the UK10K project with 2,184 haplotypes from the 1000 Genomes Project. Finally, we introduce a novel approximation that maintains speed without sacrificing imputation accuracy for rare variants.

    Whole-genome sequencing of chronic lymphocytic leukemia identifies subgroups with distinct biological and clinical features

    The value of genome-wide over targeted driver analyses for predicting clinical outcomes of cancer patients is debated. Here, we report the whole-genome sequencing of 485 chronic lymphocytic leukemia patients enrolled in clinical trials as part of the United Kingdom's 100,000 Genomes Project. We identify an extended catalog of recurrent coding and noncoding genetic mutations that represents a source for future studies and provide the most complete high-resolution map of structural variants, copy number changes and global genome features including telomere length, mutational signatures and genomic complexity. We demonstrate the relationship of these features with clinical outcome and show that integration of 186 distinct recurrent genomic alterations defines five genomic subgroups that associate with response to therapy, refining conventional outcome prediction. While requiring independent validation, our findings highlight the potential of whole-genome sequencing to inform future risk stratification in chronic lymphocytic leukemia.

    Genome-wide analysis of selection in insects, mammals and fungi

    Characterising and understanding factors that affect the rate of molecular evolution in proteins has played a major part in the development of evolutionary theory. The early analyses of amino acid substitutions stimulated the development of the neutral theory of molecular evolution, which later evolved into the nearly neutral theory. More recent work has led to a better understanding of the role selection plays at the molecular level, but there is still limited understanding of how higher levels of protein organisation affect the way natural selection acts. The investigation of this question is the central aim of this thesis, which is addressed via the analysis of selective pressures in secondary protein structures in insects, mammals and fungi. The analyses for the first two groups were conducted using publicly available datasets. To conduct the analyses in fungi, genome sequence data from the fungal genus Microbotryum (sequenced in our laboratory) was assembled and annotated, resulting in the development of a number of bioinformatics tools which are described here. The fungal, insect and mammalian datasets were interrogated with regard to a number of structural features, such as protein secondary structure, position of a site with regard to adaptively evolving sites, hydropathy and solvent accessibility. These features were correlated with the signals of positive and purifying selection detected using phylogenetic maximum likelihood and Bayesian approaches. I conclude that all of the factors examined can have an effect on the rate of molecular evolution. In particular, disordered and hydrophilic regions of the protein are found to experience fewer physicochemical constraints and contain a higher proportion of adaptively evolving sites. It is also revealed that positively selected residues are ‘clustered’ together spatially, and these trends persist in the three taxa. Finally, I show that this variation in adaptive evolution is a result of both selective events and physicochemical constraint.

    Genome-wide analysis of selection in mammals, insects and fungi

    Characterising and understanding factors that affect the rate of molecular evolution in proteins has played a major part in the development of evolutionary theory. The early analyses of amino acid substitutions stimulated the development of the neutral theory of molecular evolution, which later evolved into the nearly neutral theory. More recent work has led to a better understanding of the role selection plays at the molecular level, but there is still limited understanding of how higher levels of protein organisation affect the way natural selection acts. The investigation of this question is the central aim of this thesis, which is addressed via the analysis of selective pressures in secondary protein structures in insects, mammals and fungi. The analyses for the first two groups were conducted using publicly available datasets. To conduct the analyses in fungi, genome sequence data from the fungal genus Microbotryum (sequenced in our laboratory) was assembled and annotated, resulting in the development of a number of bioinformatics tools which are described here. The fungal, insect and mammalian datasets were interrogated with regard to a number of structural features, such as protein secondary structure, position of a site with regard to adaptively evolving sites, hydropathy and solvent accessibility. These features were correlated with the signals of positive and purifying selection detected using phylogenetic maximum likelihood and Bayesian approaches. I conclude that all of the factors examined can have an effect on the rate of molecular evolution. In particular, disordered and hydrophilic regions of the protein are found to experience fewer physicochemical constraints and contain a higher proportion of adaptively evolving sites. It is also revealed that positively selected residues are ‘clustered’ together spatially, and these trends persist in the three taxa. Finally, I show that this variation in adaptive evolution is a result of both selective events and physicochemical constraint.
    EThOS - Electronic Theses Online Service, United Kingdom

    Early Sex-Chromosome Evolution in the Diploid Dioecious Plant Mercurialis annua

    Suppressed recombination allows divergence between homologous sex chromosomes and the functionality of their genes. Here, we reveal patterns of the earliest stages of sex-chromosome evolution in the diploid dioecious herb Mercurialis annua on the basis of cytological analysis, de novo genome assembly and annotation, genetic mapping, exome resequencing of natural populations, and transcriptome analysis. The genome assembly contained 34,105 expressed genes, of which 10,076 were assigned to linkage groups. Genetic mapping and exome resequencing of individuals across the species range both identified the largest linkage group, LG1, as the sex chromosome. Although the sex chromosomes of M. annua are karyotypically homomorphic, we estimate that about one-third of the Y chromosome, containing 568 transcripts and spanning 22.3 cM in the corresponding female map, has ceased recombining. Nevertheless, we found limited evidence for Y-chromosome degeneration in terms of gene loss and pseudogenization, and most X- and Y-linked genes appear to have diverged in the period subsequent to speciation between M. annua and its sister species M. huetii, which shares the same sex-determining region. Taken together, our results suggest that the M. annua Y chromosome has at least two evolutionary strata: a small old stratum shared with M. huetii, and a more recent larger stratum that is probably unique to M. annua and that stopped recombining approximately 1 MYA. Patterns of gene expression within the nonrecombining region are consistent with the idea that sexually antagonistic selection may have played a role in favoring suppressed recombination.
    Peer reviewed