72 research outputs found

    The mysterious orphans of Mycoplasmataceae

    Background: The length of a protein sequence is largely determined by its function, i.e. each functional group is associated with an optimal size. However, comparative genomics has revealed that protein length may be affected by additional factors. In 2002 it was shown that in the bacterium Escherichia coli and the archaeon Archaeoglobus fulgidus, protein sequences with no homologs are, on average, shorter than those with homologs. Most experts now agree that the length distributions are distinctly different between protein sequences with and without homologs in bacterial and archaeal genomes. In this study, we examine this postulate through a comprehensive analysis of all annotated prokaryotic genomes, focusing on certain exceptions. Results: We compared the length distributions of proteins having homologs (HHPs) and proteins without homologs (orphans or ORFans) in all currently annotated, completely sequenced prokaryotic genomes. As expected, the HHPs and ORFans have strikingly different length distributions in almost all genomes. As previously established, the HHPs are, on average, longer than the ORFans, and the length distributions for the ORFans have a relatively narrow peak, in contrast to the HHPs, whose lengths spread over a wider range of values. However, about thirty genomes do not obey these rules. Practically all genomes of Mycoplasma and Ureaplasma have atypical ORFan distributions, with the mean length of ORFans larger than the mean length of HHPs. These genera constitute over 80% of atypical genomes. Conclusions: We confirmed on a ubiquitous set of genomes the previous observation that HHPs and ORFans have different gene length distributions. We also showed that Mycoplasmataceae genomes have distinctive distributions of ORFan lengths. We offer several possible biological explanations of this phenomenon.
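The criterion for an atypical genome described in this abstract (mean ORFan length larger than mean HHP length) can be illustrated with a minimal sketch. The function name `is_atypical_genome` and the input format (two lists of protein lengths in amino acids) are assumptions made for illustration; the abstract does not say how homologs were detected or how the authors implemented the comparison.

```python
from statistics import mean


def is_atypical_genome(hhp_lengths, orfan_lengths):
    """Return True when ORFans are, on average, longer than HHPs.

    Both arguments are lists of protein lengths for one genome.
    The names and input format are hypothetical, not from the study.
    """
    if not hhp_lengths or not orfan_lengths:
        raise ValueError("both length lists must be non-empty")
    # Typical genomes have longer HHPs; atypical ones reverse the order.
    return mean(orfan_lengths) > mean(hhp_lengths)


typical = is_atypical_genome([300, 400, 500], [100, 150, 200])
atypical = is_atypical_genome([200, 250], [400, 500])
```

Comparing means mirrors only the stated criterion; the abstract also describes differences in the shape of the distributions, which this sketch does not capture.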

    GC3 biology in corn, rice, sorghum and other grasses

    Background: The third, or wobble, position in a codon provides a high degree of possible degeneracy and is an elegant fault-tolerance mechanism. Nucleotide biases between organisms at the wobble position have been documented and correlated with the abundances of the complementary tRNAs. We and others have noticed a bias for cytosine and guanine at the third position in a subset of transcripts within a single organism. The bias is present in some plant species and warm-blooded vertebrates but not in all plants, or in invertebrates or cold-blooded vertebrates. Results: Here we demonstrate that in certain organisms the amount of GC at the wobble position (GC3) can be used to distinguish two classes of genes. We highlight the following features of genes with high GC3 content: they (1) provide more targets for methylation, (2) exhibit more variable expression, (3) more frequently possess upstream TATA boxes, (4) are predominant in certain classes of genes (e.g., stress responsive genes) and (5) have a GC3 content that increases from 5' to 3'. These observations led us to formulate a hypothesis to explain GC3 bimodality in grasses. Conclusions: Our findings suggest that high levels of GC3 typify a class of genes whose expression is regulated through DNA methylation or are a legacy of accelerated evolution through gene conversion. We discuss the three most probable explanations for GC3 bimodality: biased gene conversion, transcriptional and translational advantage and gene methylation.
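As an illustration of the quantity this abstract relies on, GC3 can be computed as the fraction of third codon positions occupied by G or C. The sketch below assumes an in-frame coding sequence given as a plain string; the function name `gc3` and the handling of a trailing partial codon are choices made here, not details taken from the paper.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C at third codon positions of an in-frame CDS."""
    seq = cds.upper()
    # Ignore a trailing partial codon, if any (an assumption of this sketch).
    usable = len(seq) - len(seq) % 3
    thirds = seq[2:usable:3]
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


# Codons ATG, GCC, AAA have third positions G, C, A.
example = gc3("ATGGCCAAA")
```

Applying such a function to every transcript of a genome and plotting a histogram of the values would be one way to look for the two gene classes the abstract describes.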

    Local ancestry prediction with PyLAE

    We developed PyLAE, a new tool for determining local ancestry along a genome using whole-genome sequencing data or high-density genotyping experiments. PyLAE can process an arbitrarily large number of ancestral populations (with or without an informative prior). Since PyLAE does not involve estimating many parameters, it can process thousands of genomes within a day. PyLAE can run on phased or unphased genomic data. We have shown how PyLAE can be applied to the identification of differentially enriched pathways between populations. The local ancestry approach results in higher enrichment scores than whole-genome approaches. We benchmarked PyLAE using the 1000 Genomes dataset, comparing the aggregated predictions with the global admixture results and the current gold standard program RFMix. Computational efficiency, minimal requirements for data pre-processing, straightforward presentation of results, and ease of installation make PyLAE a valuable tool to study admixed populations.

    benchNGS : An approach to benchmark short reads alignment tools

    In the last decade a number of algorithms and associated software have been developed to align next generation sequencing (NGS) reads with relevant reference genomes. The accuracy of these programs may vary significantly, especially when the NGS reads are quite different from the available reference genome. In this paper we propose a benchmark to assess the accuracy of short-read mapping based on the pre-computed global alignment of closely related genome sequences. We outline the method and also present a short report of an experiment performed on five popular alignment tools, based on the pairwise alignments of the Escherichia coli O6 CFT073 genome with genomes of seven other bacteria.
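One way to picture a benchmark built on a pre-computed global alignment is to turn the alignment into a coordinate map and check each reported read position against it. The sketch below is an assumption about how such a check could look, not the benchNGS implementation: the function names, the gapped-string input format, and the `tolerance` parameter are all hypothetical.

```python
def coordinate_map(aligned_ref: str, aligned_query: str) -> dict:
    """Map 0-based query positions to 0-based reference positions.

    Inputs are the two rows of a gapped pairwise alignment ('-' is a gap).
    Query positions aligned to a gap get no entry.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("alignment rows must have equal length")
    mapping = {}
    ref_pos = query_pos = 0
    for r, q in zip(aligned_ref, aligned_query):
        if r != "-" and q != "-":
            mapping[query_pos] = ref_pos
        if r != "-":
            ref_pos += 1
        if q != "-":
            query_pos += 1
    return mapping


def is_correctly_mapped(mapping, read_start, reported_start, tolerance=0):
    """Compare an aligner's reported start with the alignment-derived one."""
    expected = mapping.get(read_start)
    return expected is not None and abs(expected - reported_start) <= tolerance


positions = coordinate_map("ACG-TA", "A-GCTA")
```

Here a read sampled at `read_start` in the related genome counts as correctly mapped when the tool reports the reference position that the global alignment assigns to it, within the chosen tolerance.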

    Genome-wide analysis of genetic diversity and artificial selection in Large White pigs in Russia

    Breeding practices adopted at different farms are aimed at maximizing the profitability of pig farming. In this work, we have analyzed the genetic diversity of Large White pigs in Russia. We compared genomes of historic and modern Large White Russian breeds using 271 pig samples. We have identified 120 candidate regions associated with the differentiation of modern and historic pigs and analyzed genomic differences between the modern farms. The identified genes were associated with height, fitness, conformation, reproductive performance, and meat quality

    Toward high-resolution population genomics using archaeological samples

    The term ‘ancient DNA’ (aDNA) is coming of age, with over 1,200 hits in the PubMed database, beginning in the early 1980s with the studies of ‘molecular paleontology’. Rooted in cloning and limited sequencing of DNA from ancient remains during the pre-PCR era, the field has made incredible progress since the introduction of PCR and next-generation sequencing. Over the last decade, aDNA analysis ushered in a new era in genomics and became the method of choice for reconstructing the history of organisms, their biogeography, and migration routes, with applications in evolutionary biology, population genetics, archaeogenetics, paleoepidemiology, and many other areas. This change was brought about by the development of new strategies for coping with the challenges in studying aDNA due to damage and fragmentation, scarce samples, significant historical gaps, and limited applicability of population genetics methods. In this review, we describe the state-of-the-art achievements in aDNA studies, with particular focus on human evolution and demographic history. We present the current experimental and theoretical procedures for handling and analysing highly degraded aDNA. We also review the challenges in the rapidly growing field of ancient epigenomics. Advancement of aDNA tools and methods signifies a new era in population genetics and evolutionary medicine research.

    Evidence-based gene models for structural and functional annotations of the oil palm genome

    The advent of rapid and inexpensive DNA sequencing has led to an explosion of data waiting to be transformed into knowledge about genome organization and function. Gene prediction is customarily the starting point for genome analysis. This paper presents a bioinformatics study of the oil palm genome, including comparative genomics analysis, database and tools development, and mining of biological data for genes of interest. We have annotated 26,059 oil palm genes integrated from two independent gene-prediction pipelines, Fgenesh++ and Seqping. This integrated annotation constitutes a significant improvement in comparison to the preliminary annotation published in 2013. We conducted a comprehensive analysis of intronless, resistance and fatty acid biosynthesis genes, and demonstrated the high quality of the current genome annotation. We identified 3,658 intronless genes in the oil palm genome, an important resource for evolutionary study. Further analysis of the oil palm genes revealed 210 candidate resistance genes involved in pathogen defense. Fatty acids have diverse applications ranging from food to industrial feedstocks, and we identified 42 key genes involved in fatty acid biosynthesis in oil palm. These results provide an important resource for studies of plant genomes and a theoretical foundation for marker-assisted breeding of oil palm and related crops.