The mysterious orphans of Mycoplasmataceae
Background: The length of a protein sequence is largely determined by its
function, i.e. each functional group is associated with an optimal size.
However, comparative genomics revealed that protein length may be affected by
additional factors. In 2002 it was shown that in the bacterium Escherichia coli
and the archaeon Archaeoglobus fulgidus, protein sequences with no homologs are,
on average, shorter than those with homologs. Most experts now agree that the
length distributions are distinctly different between protein sequences with
and without homologs in bacterial and archaeal genomes. In this study, we
examine this postulate by a comprehensive analysis of all annotated prokaryotic
genomes, focusing on certain exceptions.
Results: We compared the length distributions of proteins having homologs (HHPs)
and proteins without homologs (orphans, or ORFans) in all currently annotated,
completely sequenced prokaryotic genomes. As expected, HHPs and ORFans have
strikingly different length distributions in almost all genomes. As previously
established, HHPs are, on average, longer than ORFans, and the length
distribution of ORFans has a relatively narrow peak, in contrast to that of
HHPs, whose lengths spread over a wider range of values. However, about thirty
genomes do not obey these rules. Practically all genomes of Mycoplasma and
Ureaplasma have atypical ORFan distributions, with the mean ORFan length
exceeding the mean HHP length. These two genera constitute over 80% of the
atypical genomes.
Conclusions: We confirmed on a comprehensive set of genomes the previous
observation that HHPs and ORFans have different gene length distributions. We
also showed that Mycoplasmataceae genomes have distinctive distributions of
ORFan lengths. We offer several possible biological explanations of this
phenomenon.
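The length comparison described above can be illustrated with a minimal Python sketch. It assumes the protein lengths of each genome have already been split into HHPs and ORFans (the homology search itself is not shown) and treats a genome as "atypical" when its mean ORFan length exceeds its mean HHP length, the pattern reported for Mycoplasma and Ureaplasma. The function names and toy lengths are hypothetical, not taken from the study.

```python
from statistics import mean

def length_summary(hhp_lengths, orfan_lengths):
    """Mean protein lengths of HHPs and ORFans for one genome.

    Assumption (for illustration): a genome is flagged 'atypical' when its
    mean ORFan length exceeds its mean HHP length.
    """
    hhp_mean = mean(hhp_lengths)
    orfan_mean = mean(orfan_lengths)
    return {"hhp_mean": hhp_mean, "orfan_mean": orfan_mean,
            "atypical": orfan_mean > hhp_mean}

def genus_share(atypical_genomes, genera):
    """Fraction of atypical genomes whose genus (first word of the name) is in `genera`."""
    if not atypical_genomes:
        return 0.0
    hits = sum(name.split()[0] in genera for name in atypical_genomes)
    return hits / len(atypical_genomes)

# Toy lengths in amino acids (hypothetical values, not data from the study).
typical = length_summary([300, 450, 520, 610], [90, 110, 120])
atypical = length_summary([200, 250, 300], [400, 500])
print(typical["atypical"], atypical["atypical"])  # False True
```

Comparing means is the simplest summary; the narrower ORFan peak mentioned in the abstract could be captured the same way with a spread statistic such as `statistics.pstdev`.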
GC3 biology in corn, rice, sorghum and other grasses
Background: The third, or wobble, position in a codon provides a high degree of possible degeneracy and is an elegant fault-tolerance mechanism. Nucleotide biases at the wobble position have been documented between organisms and correlated with the abundances of the complementary tRNAs. We and others have noticed a bias for cytosine and guanine at the third position in a subset of transcripts within a single organism. The bias is present in some plant species and in warm-blooded vertebrates, but not in all plants, nor in invertebrates or cold-blooded vertebrates.
Results: Here we demonstrate that in certain organisms the amount of GC at the wobble position (GC3) can be used to distinguish two classes of genes. We highlight the following features of genes with high GC3 content: they (1) provide more targets for methylation, (2) exhibit more variable expression, (3) more frequently possess upstream TATA boxes, (4) are predominant in certain classes of genes (e.g., stress-responsive genes) and (5) have a GC3 content that increases from 5' to 3'. These observations led us to formulate a hypothesis to explain GC3 bimodality in grasses.
Conclusions: Our findings suggest that high levels of GC3 typify a class of genes whose expression is regulated through DNA methylation or that are a legacy of accelerated evolution through gene conversion. We discuss the three most probable explanations for GC3 bimodality: biased gene conversion, transcriptional and translational advantage, and gene methylation.
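GC3, the quantity used above to separate the two gene classes, is the fraction of codons carrying G or C at the third position. The minimal Python sketch below computes it for one coding sequence, plus a simple 5'-to-3' profile of the kind that would reveal the within-gene gradient in feature (5). It assumes an in-frame CDS string; whether start, Trp or stop codons are excluded varies between analyses, so that choice is left to a hypothetical `exclude` parameter.

```python
def codons(cds):
    """Split a coding sequence into complete codons (a trailing partial codon is dropped)."""
    seq = cds.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def gc3(cds, exclude=frozenset()):
    """Fraction of codons with G or C at the third (wobble) position.

    `exclude` lets the caller drop codons such as ATG, TGG or stop codons,
    which some analyses omit; by default every complete codon is counted.
    """
    cs = [c for c in codons(cds) if c not in exclude]
    if not cs:
        raise ValueError("no codons to score")
    return sum(c[2] in "GC" for c in cs) / len(cs)

def gc3_profile(cds, bins=3):
    """GC3 in `bins` consecutive, non-empty codon windows ordered from 5' to 3'."""
    cs = codons(cds)
    if len(cs) < bins:
        raise ValueError("fewer codons than bins")
    profile = []
    for b in range(bins):
        # Integer boundaries guarantee every window holds at least one codon.
        chunk = cs[b * len(cs) // bins:(b + 1) * len(cs) // bins]
        profile.append(sum(c[2] in "GC" for c in chunk) / len(chunk))
    return profile

print(gc3("ATGGCCGCGTAA"))                  # 0.75
print(gc3_profile("ATGGCCGCGTAA", bins=2))  # [1.0, 0.5]
```

Ambiguous bases such as N are simply counted as non-GC here; a real pipeline would likely filter them first.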
Local ancestry prediction with PyLAE
We developed PyLAE, a new tool for determining local ancestry along a genome using whole-genome sequencing data or high-density genotyping experiments. PyLAE can process an arbitrarily large number of ancestral populations (with or without an informative prior). Because PyLAE does not involve estimating many parameters, it can process thousands of genomes within a day. PyLAE can run on phased or unphased genomic data. We have shown how PyLAE can be applied to identify differentially enriched pathways between populations; the local ancestry approach yields higher enrichment scores than whole-genome approaches. We benchmarked PyLAE on the 1000 Genomes dataset, comparing its aggregated predictions with global admixture results and with the current gold-standard program, RFMix. Computational efficiency, minimal data pre-processing requirements, straightforward presentation of results, and ease of installation make PyLAE a valuable tool for studying admixed populations.
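The abstract does not describe PyLAE's internals, so the following is only a generic illustration of windowed local ancestry inference, not PyLAE's algorithm or API. It scores non-overlapping windows of SNPs from one unphased individual against the allele frequencies of candidate ancestral populations, using Hardy-Weinberg genotype probabilities and an optional prior, and returns a posterior per window. It assumes independent SNPs and that both chromosomes in a window share one ancestry (a simplification real tools relax); all names and toy frequencies are hypothetical.

```python
import math

def genotype_loglik(g, p, eps=1e-6):
    """log P(g | p) for an unphased diploid genotype g (0, 1 or 2 copies of the
    alternative allele) under Hardy-Weinberg equilibrium with allele frequency p."""
    p = min(max(p, eps), 1 - eps)  # clamp so log() never sees 0
    return math.log(math.comb(2, g)) + g * math.log(p) + (2 - g) * math.log(1 - p)

def local_ancestry(genotypes, freqs, window, prior=None):
    """Posterior ancestry probabilities for consecutive, non-overlapping SNP windows.

    genotypes: alternative-allele counts for one individual.
    freqs: {population: [alternative-allele frequency per SNP]}.
    prior: {population: prior probability > 0}; uniform if None.
    """
    pops = list(freqs)
    if prior is None:
        prior = {k: 1 / len(pops) for k in pops}
    posteriors = []
    for start in range(0, len(genotypes), window):
        idx = range(start, min(start + window, len(genotypes)))
        logpost = {k: math.log(prior[k])
                   + sum(genotype_loglik(genotypes[i], freqs[k][i]) for i in idx)
                   for k in pops}
        top = max(logpost.values())  # log-sum-exp normalisation for numerical stability
        z = sum(math.exp(v - top) for v in logpost.values())
        posteriors.append({k: math.exp(v - top) / z for k, v in logpost.items()})
    return posteriors

# Hypothetical example: two source populations, 20 SNPs, two windows of 10.
freqs = {"A": [0.9] * 20, "B": [0.1] * 20}
geno = [2] * 10 + [0] * 10
post = local_ancestry(geno, freqs, window=10)
print([max(w, key=w.get) for w in post])  # ['A', 'B']
```

Because each window needs only per-population allele frequencies, the number of estimated parameters stays small, which is in the spirit of the efficiency argument made above.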
benchNGS: An approach to benchmark short reads alignment tools
In the last decade a number of algorithms and associated software have been
developed to align next-generation sequencing (NGS) reads to relevant
reference genomes. The accuracy of these programs may vary significantly,
especially when the NGS reads are quite different from the available reference
genome. In this paper we propose a benchmark to assess the accuracy of
short-read mapping based on the pre-computed global alignment of closely
related genome sequences. We outline the method and also present a short
report of an experiment performed on five popular alignment tools, based on
the pairwise alignments of the Escherichia coli O6 CFT073 genome with the
genomes of seven other bacteria.
Comment: 1 figure
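The core idea of the benchmark, judging a mapper against a pre-computed global alignment, can be sketched as follows. This is a minimal illustration under stated assumptions, not benchNGS's actual scoring: reads are assumed to be drawn from a related (query) genome, the global alignment of that genome with the reference defines where each read should land, and a reported mapping counts as correct if its start lies within a tolerance of that expected position. Strand, soft-clipping and multi-mapping are ignored, and all names are hypothetical.

```python
def coordinate_map(aln_ref, aln_query):
    """Map each 0-based position of the query genome to its aligned 0-based
    reference position, or None where the query base is opposite a gap.

    aln_ref and aln_query are the two rows of a pairwise global alignment,
    with gaps written as '-'.
    """
    if len(aln_ref) != len(aln_query):
        raise ValueError("alignment rows must have equal length")
    mapping, ref_pos = [], 0
    for r, q in zip(aln_ref, aln_query):
        if q != "-":
            mapping.append(ref_pos if r != "-" else None)
        if r != "-":
            ref_pos += 1
    return mapping

def mapping_accuracy(true_starts, reported_starts, mapping, tol=0):
    """Fraction of assessable reads whose reported reference start lies within
    `tol` bases of the start implied by the global alignment.

    true_starts: {read_id: 0-based start on the query genome}
    reported_starts: {read_id: 0-based reference start from the aligner, or None}
    Reads whose true start is opposite a gap cannot be judged and are skipped.
    """
    correct = assessable = 0
    for read_id, q_start in true_starts.items():
        expected = mapping[q_start]
        if expected is None:
            continue
        assessable += 1
        got = reported_starts.get(read_id)
        if got is not None and abs(got - expected) <= tol:
            correct += 1
    return correct / assessable if assessable else float("nan")

mapping = coordinate_map("ACGT-A", "A-GTTA")
print(mapping)  # [0, 2, 3, None, 4]
acc = mapping_accuracy({"r1": 0, "r2": 1, "r3": 3, "r4": 4},
                       {"r1": 0, "r2": 5, "r3": 3, "r4": 4}, mapping)
print(acc)  # 0.6666666666666666
```

Deriving the expected positions from a global alignment, rather than from simulated reads placed on the reference itself, is what lets such a benchmark reflect divergence between the read source and the reference.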
Genome-wide analysis of genetic diversity and artificial selection in Large White pigs in Russia
Breeding practices adopted at different farms are aimed at maximizing the profitability of pig farming. In this work, we have analyzed the genetic diversity of Large White pigs in Russia. We compared the genomes of historic and modern Russian Large White pigs using 271 samples. We identified 120 candidate regions associated with the differentiation of modern and historic pigs and analyzed genomic differences between modern farms. The identified genes were associated with height, fitness, conformation, reproductive performance, and meat quality.
Evidence-based gene models for structural and functional annotations of the oil palm genome
The advent of rapid and inexpensive DNA sequencing has led to an explosion of
data waiting to be transformed into knowledge about genome organization and
function. Gene prediction is customarily the starting point for genome
analysis. This paper presents a bioinformatics study of the oil palm genome,
including comparative genomics analysis, database and tools development, and
mining of biological data for genes of interest. We have annotated 26,059 oil
palm genes integrated from two independent gene-prediction pipelines, Fgenesh++
and Seqping. This integrated annotation constitutes a significant improvement
in comparison to the preliminary annotation published in 2013. We conducted a
comprehensive analysis of intronless, resistance and fatty acid biosynthesis
genes, and demonstrated the high quality of the current genome annotation. We
identified 3,658 intronless genes in the oil palm genome, an important
resource for evolutionary studies. Further analysis of the oil palm genes
revealed 210 candidate resistance genes involved in pathogen defense. Fatty
acids have diverse applications ranging from food to industrial feedstocks, and
we identified 42 key genes involved in fatty acid biosynthesis in oil palm.
These results provide an important resource for studies of plant genomes and a
theoretical foundation for marker-assisted breeding of oil palm and related
crops.
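Finding intronless genes, as in the analysis above, amounts to finding gene models with a single exon. The minimal Python sketch below assumes the models are available as GFF3, with exon features linked to their transcripts through the Parent attribute; the paper's actual pipeline and file formats are not described here. It works per transcript, does not decode GFF3 percent-escapes, and uses hypothetical function names and toy records.

```python
from collections import defaultdict

def exons_by_transcript(gff3_text):
    """Collect (start, end) exon coordinates per parent transcript from GFF3 text."""
    exons = defaultdict(list)
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].strip().split(";") if "=" in kv)
        for parent in attrs.get("Parent", "").split(","):  # an exon may have several parents
            if parent:
                exons[parent].append((int(cols[3]), int(cols[4])))
    return dict(exons)

def intronless(exon_map):
    """IDs of transcripts with exactly one exon, i.e. no introns."""
    return sorted(t for t, ex in exon_map.items() if len(ex) == 1)

def intron_lengths(exons):
    """Intron lengths between consecutive exons (GFF3: 1-based, inclusive coordinates)."""
    ordered = sorted(exons)
    return [nxt[0] - prev[1] - 1 for prev, nxt in zip(ordered, ordered[1:])]

gff = "\n".join([
    "##gff-version 3",
    "\t".join(["chr1", "demo", "exon", "100", "500", ".", "+", ".", "ID=e1;Parent=t1"]),
    "\t".join(["chr1", "demo", "exon", "1000", "1200", ".", "+", ".", "ID=e2;Parent=t2"]),
    "\t".join(["chr1", "demo", "exon", "1400", "1600", ".", "+", ".", "ID=e3;Parent=t2"]),
])
exon_map = exons_by_transcript(gff)
print(intronless(exon_map))            # ['t1']
print(intron_lengths(exon_map["t2"]))  # [199]
```

To count genes rather than transcripts, the transcript IDs would still have to be mapped to their parent genes, for example by also reading the mRNA features.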
Toward high-resolution population genomics using archaeological samples
The term ‘ancient DNA’ (aDNA) is coming of age, with over 1,200 hits in the PubMed database.
The field began in the early 1980s with studies of ‘molecular paleontology’. Rooted in the cloning
and limited sequencing of DNA from ancient remains during the pre-PCR era, it has made
incredible progress since the introduction of PCR and next-generation sequencing. Over the last
decade, aDNA analysis has ushered in a new era in genomics and become the method of choice for
reconstructing the history of organisms, their biogeography, and migration routes, with
applications in evolutionary biology, population genetics, archaeogenetics, paleoepidemiology,
and many other areas. This change was brought about by the development of new strategies for
coping with the challenges of studying aDNA: damage and fragmentation, scarce samples,
significant historical gaps, and limited applicability of population genetics methods. In this review, we describe the state-of-the-art achievements in aDNA studies, with particular focus
on human evolution and demographic history. We present the current experimental and theoretical
procedures for handling and analysing highly degraded aDNA. We also review the challenges
in the rapidly growing field of ancient epigenomics. Advancement of aDNA tools and
methods signifies a new era in population genetics and evolutionary medicine research.