    A Model-Based Analysis of GC-Biased Gene Conversion in the Human and Chimpanzee Genomes

    GC-biased gene conversion (gBGC) is a recombination-associated process that favors the fixation of G/C alleles over A/T alleles. In mammals, gBGC is hypothesized to contribute to variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations, but its prevalence and general functional consequences remain poorly understood. gBGC is difficult to incorporate into models of molecular evolution and so far has primarily been studied using summary statistics from genomic comparisons. Here, we introduce a new probabilistic model that captures the joint effects of natural selection and gBGC on nucleotide substitution patterns, while allowing for correlations along the genome in these effects. We implemented our model in a computer program, called phastBias, that can accurately detect gBGC tracts about 1 kilobase or longer in simulated sequence alignments. When applied to real primate genome sequences, phastBias predicts gBGC tracts that cover roughly 0.3% of the human and chimpanzee genomes and account for 1.2% of human-chimpanzee nucleotide differences. These tracts fall in clusters, particularly in subtelomeric regions; they are enriched for recombination hotspots and fast-evolving sequences; and they display an ongoing fixation preference for G and C alleles. They are also significantly enriched for disease-associated polymorphisms, suggesting that they contribute to the fixation of deleterious alleles. The gBGC tracts provide a unique window into historical recombination processes along the human and chimpanzee lineages. They supply additional evidence of long-term conservation of megabase-scale recombination rates accompanied by rapid turnover of hotspots. Together, these findings shed new light on the evolutionary, functional, and disease implications of gBGC. The phastBias program and our predicted tracts are freely available. © 2013 Capra et al.

    Data incongruence and the problem of avian louse phylogeny

    Recent studies based on different types of data (i.e. morphological and molecular) have supported conflicting phylogenies for the genera of avian feather lice (Ischnocera: Phthiraptera). We analyse new and published data from morphology and from mitochondrial (12S rRNA and COI) and nuclear (EF1-α) genes to explore the sources of this incongruence and explain these conflicts. Character convergence, multiple substitutions at high divergences, and ancient radiation over a short period of time have contributed to the problem of resolving louse phylogeny with the data currently available. We show that apparent incongruence between the molecular datasets is largely attributable to rate variation and nonstationarity of base composition. In contrast, highly significant character incongruence leads to topological incongruence between the molecular and morphological data. We consider ways in which biases in the sequence data could be misleading, using several maximum likelihood models and LogDet corrections. The hierarchical structure of the data is explored using likelihood mapping and SplitsTree methods. Ultimately, we concede there is strong discordance between the molecular and morphological data and apply the conditional combination approach in this case. We conclude that higher level phylogenetic relationships within avian Ischnocera remain extremely problematic. However, consensus between datasets is beginning to converge on a stable phylogeny for avian lice, at and below the familial rank.

    Steps in Metagenomics: Let’s Avoid Garbage in and Garbage Out

    Is metagenomics a revolution or a new fad? Metagenomics is tightly associated with the availability of next-generation sequencing in all its implementations. The key feature of these new technologies, moving beyond the Sanger-based DNA sequencing approach, is the depth of nucleotide sequencing per sample.1 Knowing much more about a sample changes the traditional paradigms of “What is the most abundant?” or “What is the most significant?” to “What is present and potentially significant that might influence the situation and outcome?” Let’s take the case of identifying proper biomarkers of disease state in the context of chronic disease prevention. Prevention has been deemed as a viable option to avert human chronic diseases and to curb healthcare management costs.2 The actual implementation of any effective preventive measures has proven to be rather difficult. In addition to the typically poor compliance of the general public, the vagueness of the successful validation of habit modification on the long-term risk, points to the need of defining new biomarkers of disease state. Scientists and the public are accepting the fact that humans are super-organisms, harboring both a human genome and a microbial genome, the latter being much bigger in size and diversity, and key for the health of individuals.3,4 It is time to investigate the intricate relationship between humans and their associated microbiota and how this relationship modulates or affects both partners.5 These remarks can be expanded to the animal and plant kingdoms, and holistically to the Earth’s biome. By its nature, the evolution and function of all the Earth’s biomes are influenced by a myriad of interactions between and among microbes (planktonic, in biofilms or host associated) and the surrounding physical environment.
The general definition of metagenomics is the cultivation-independent analysis of the genetic information of the collective genomes of the microbes within a given environment based on its sampling. It focuses on the collection of genetic information through sequencing that can target DNA, RNA, or both. The subsequent analyses can be solely focused on sequence conservation, phylogenetic, phylogenomic, function, or genetic diversity representation including yet-to-be annotated genes. The diversity of hypotheses, questions, and goals to be accomplished is endless. The primary design is based on the nature of the material to be analyzed and its primary function.

    METHODS FOR HIGH-THROUGHPUT COMPARATIVE GENOMICS AND DISTRIBUTED SEQUENCE ANALYSIS

    High-throughput sequencing has accelerated applications of genomics throughout the world. The increased production and decentralization of sequencing has also created bottlenecks in computational analysis. In this dissertation, I provide novel computational methods to improve analysis throughput in three areas: whole genome multiple alignment, pan-genome annotation, and bioinformatics workflows. To aid in the study of populations, tools are needed that can quickly compare multiple genome sequences, millions of nucleotides in length. I present a new multiple alignment tool for whole genomes, named Mugsy, that implements a novel method for identifying syntenic regions. Mugsy is computationally efficient, does not require a reference genome, and is robust in identifying a rich complement of genetic variation including duplications, rearrangements, and large-scale gain and loss of sequence in mixtures of draft and completed genome data. Mugsy is evaluated on the alignment of several dozen bacterial chromosomes on a single computer and was the fastest program evaluated for the alignment of assembled human chromosome sequences from four individuals. A distributed version of the algorithm is also described and provides increased processing throughput using multiple CPUs. Numerous individual genomes are sequenced to study diversity, evolution and classify pan-genomes. Pan-genome annotations contain inconsistencies and errors that hinder comparative analysis, even within a single species. I introduce a new tool, Mugsy-Annotator, that identifies orthologs and anomalous gene structure across a pan-genome using whole genome multiple alignments. Identified anomalies include inconsistently located translation initiation sites and disrupted genes due to draft genome sequencing or pseudogenes. An evaluation of pan-genomes indicates that such anomalies are common and alternative annotations suggested by the tool can improve annotation consistency and quality. 
Finally, I describe the Cloud Virtual Resource, CloVR, a desktop application for automated sequence analysis that improves usability and accessibility of bioinformatics software and cloud computing resources. CloVR is installed on a personal computer as a virtual machine and requires minimal installation, addressing challenges in deploying bioinformatics workflows. CloVR also seamlessly accesses remote cloud computing resources for improved processing throughput. In a case study, I demonstrate the portability and scalability of CloVR and evaluate the costs and resources for microbial sequence analysis

    Conserved noncoding sequences highlight shared components of regulatory networks in dicotyledonous plants

    Conserved noncoding sequences (CNSs) in DNA are reliable pointers to regulatory elements controlling gene expression. Using a comparative genomics approach with four dicotyledonous plant species (Arabidopsis thaliana, papaya [Carica papaya], poplar [Populus trichocarpa], and grape [Vitis vinifera]), we detected hundreds of CNSs upstream of Arabidopsis genes. Distinct positioning, length, and enrichment for transcription factor binding sites suggest these CNSs play a functional role in transcriptional regulation. The enrichment of transcription factors within the set of genes associated with CNS is consistent with the hypothesis that together they form part of a conserved transcriptional network whose function is to regulate other transcription factors and control development. We identified a set of promoters where regulatory mechanisms are likely to be shared between the model organism Arabidopsis and other dicots, providing areas of focus for further research

    Molecular studies on intraspecific diversity and phylogenetic position of Coniothyrium minitans

    Simple sequence repeat (SSR)-PCR amplification using a microsatellite primer (GACA)4 and ribosomal RNA gene sequencing were used to examine the intraspecific diversity in the mycoparasite Coniothyrium minitans based on 48 strains, representing eight colony types, from 17 countries world-wide. Coniothyrium cerealis, C. fuckelii and C. sporulosum were used for interspecific comparison. The SSR-PCR technique revealed a relatively low level of polymorphism within C. minitans but did allow some differentiation between strains. While there was no relationship between SSR-PCR profiles and colony type, there was some limited correlation between these profiles and country of origin. Sequences of the ITS 1 and ITS 2 regions and the 5.8S gene of rRNA genes were identical in all twenty-four strains of C. minitans examined irrespective of colony type and origin. These results indicate that C. minitans is genetically not very variable despite phenotypic differences. ITS and 5.8S rRNA gene sequence analyses showed that C. minitans had similarities of 94% with C. fuckelii and C. sporulosum (which were identical to each other) and only 64% with C. cerealis. Database searches failed to show any similarity with the ITS 1 sequence for C. minitans although the 5.8S rRNA gene and ITS 2 sequences revealed an 87% similarity with Aporospora terricola. The ITS sequence including the 5.8S rRNA gene sequence of Coniothyrium cerealis showed 91% similarity to Phaeosphaeria microscopica. Phylogenetic analyses using database information suggest that C. minitans, C. sporulosum, C. fuckelii and A. terricola cluster in one clade, grouping with Helminthosporium species and 'Leptosphaeria' bicolor. Coniothyrium cerealis grouped with Ampelomyces quisqualis and formed a major cluster with members of the Phaeosphaeriaceae and Phaeosphaeria microscopica.