73 research outputs found

The mysterious orphans of Mycoplasmataceae

Author: Bolshoy Alexander
Lysnyansky Inna
Nikolsky Yuri V.
Tatarinova Tatiana V.
Publication date: 28/08/2015

Background: The length of a protein sequence is largely determined by its function, i.e. each functional group is associated with an optimal size. However, comparative genomics has revealed that protein length may be affected by additional factors. In 2002 it was shown that in the bacterium Escherichia coli and the archaeon Archaeoglobus fulgidus, protein sequences with no homologs are, on average, shorter than those with homologs. Most experts now agree that the length distributions are distinctly different between protein sequences with and without homologs in bacterial and archaeal genomes. In this study, we examine this postulate through a comprehensive analysis of all annotated prokaryotic genomes, focusing on certain exceptions. Results: We compared the length distributions of proteins having homologs (HHPs) and proteins having no homologs (orphans or ORFans) in all currently annotated, completely sequenced prokaryotic genomes. As expected, the HHPs and ORFans have strikingly different length distributions in almost all genomes. As previously established, the HHPs are, on average, longer than the ORFans, and the length distributions for the ORFans have a relatively narrow peak, in contrast to the HHPs, whose lengths spread over a wider range of values. However, about thirty genomes do not obey these rules. Practically all genomes of Mycoplasma and Ureaplasma have atypical ORFan length distributions, with the mean length of ORFans larger than the mean length of HHPs. These genera constitute over 80% of atypical genomes. Conclusions: We confirmed on a comprehensive set of genomes the previous observation that HHPs and ORFans have different gene length distributions. We also showed that Mycoplasmataceae genomes have distinctive distributions of ORFan lengths. We offer several possible biological explanations of this phenomenon.
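The comparison described in this abstract can be illustrated with a minimal Python sketch. It computes the mean protein length of HHPs and ORFans for one genome and flags the genome as atypical when the ORFans are longer on average. The function name, the input layout, and the toy lengths are assumptions made for illustration; they are not taken from the paper, which analyses full length distributions rather than means alone.

```python
from statistics import mean


def is_atypical(hhp_lengths, orfan_lengths):
    """Return True when ORFans are, on average, longer than HHPs.

    Both arguments are sequences of protein lengths (amino acids)
    for a single genome. This is a hypothetical helper, not the
    authors' procedure.
    """
    return mean(orfan_lengths) > mean(hhp_lengths)


# Toy lengths, invented for illustration only.
typical_genome = {"hhp": [310, 420, 280, 515], "orfan": [95, 120, 88]}
atypical_genome = {"hhp": [300, 350, 280], "orfan": [410, 520, 390]}

for name, genome in [("typical", typical_genome), ("atypical", atypical_genome)]:
    print(name, is_atypical(genome["hhp"], genome["orfan"]))
```

A comparison of means is the simplest possible criterion; a fuller analysis would also compare the spread of the two distributions, since the abstract notes that ORFan lengths have a narrower peak.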

arXiv.org e-Print Archive

Springer - Publisher Connector

GC3 biology in corn, rice, sorghum and other grasses

Author: Alexandrov Nickolai N
Bouck John B
Feldmann Kenneth A
Tatarinova Tatiana V
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: The third, or wobble, position in a codon provides a high degree of possible degeneracy and is an elegant fault-tolerance mechanism. Nucleotide biases between organisms at the wobble position have been documented and correlated with the abundances of the complementary tRNAs. We and others have noticed a bias for cytosine and guanine at the third position in a subset of transcripts within a single organism. The bias is present in some plant species and warm-blooded vertebrates but not in all plants, or in invertebrates or cold-blooded vertebrates. Results: Here we demonstrate that in certain organisms the amount of GC at the wobble position (GC3) can be used to distinguish two classes of genes. We highlight the following features of genes with high GC3 content: they (1) provide more targets for methylation, (2) exhibit more variable expression, (3) more frequently possess upstream TATA boxes, (4) are predominant in certain classes of genes (e.g., stress-responsive genes) and (5) have a GC3 content that increases from 5' to 3'. These observations led us to formulate a hypothesis to explain GC3 bimodality in grasses. Conclusions: Our findings suggest that high levels of GC3 typify a class of genes whose expression is regulated through DNA methylation or that are a legacy of accelerated evolution through gene conversion. We discuss the three most probable explanations for GC3 bimodality: biased gene conversion, transcriptional and translational advantage, and gene methylation.
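GC3, as defined in this abstract, is the share of G and C at the third codon position. A minimal Python sketch of that calculation follows. It assumes the input is an in-frame coding sequence starting at the first codon position; the function name and the example sequences are hypothetical and are not drawn from the paper.

```python
def gc3(cds):
    """Fraction of complete codons whose third base is G or C.

    Assumes `cds` is an in-frame coding sequence; trailing bases that
    do not form a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3      # drop an incomplete last codon
    third_bases = cds[2:usable:3]         # every third base: indices 2, 5, 8, ...
    if not third_bases:
        raise ValueError("sequence has no complete codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)


# Invented example sequences.
print(gc3("ATGGCCGAC"))   # third bases: G, C, C
print(gc3("ATGAAATTT"))   # third bases: G, A, T
```

Computing this value for every gene in a genome and plotting a histogram would be one way to look for the two gene classes the abstract describes.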

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

University of South Wales Research Explorer

PubMed Central

Local ancestry prediction with PyLAE

Author: Moshkov Nikita
Smetanin Aleksandr
Tatarinova Tatiana V.
Publication venue: PeerJ
Publication date: 01/01/2021

We developed PyLAE, a new tool for determining local ancestry along a genome using whole-genome sequencing data or high-density genotyping experiments. PyLAE can process an arbitrarily large number of ancestral populations (with or without an informative prior). Since PyLAE does not involve estimating many parameters, it can process thousands of genomes within a day. PyLAE can run on phased or unphased genomic data. We have shown how PyLAE can be applied to the identification of differentially enriched pathways between populations. The local ancestry approach results in higher enrichment scores compared to whole-genome approaches. We benchmarked PyLAE using the 1000 Genomes dataset, comparing the aggregated predictions with the global admixture results and the current gold standard program RFMix. Computational efficiency, minimal requirements for data pre-processing, straightforward presentation of results, and ease of installation make PyLAE a valuable tool to study admixed populations.

PubMed Central

Repository of the Academy's Library

benchNGS: An approach to benchmark short reads alignment tools

Author: Alexandrov Nickolai
Dubchak Inna
Hassan Mehedi
Kryshchenko Alona
Rahman Farzana
Tatarinova Tatiana V.
Publication date: 24/04/2015

In the last decade a number of algorithms and associated software have been developed to align next generation sequencing (NGS) reads with relevant reference genomes. The accuracy of these programs may vary significantly, especially when the NGS reads are quite different from the available reference genome. In this paper we propose a benchmark to assess the accuracy of short-read mapping based on the pre-computed global alignment of closely related genome sequences. We outline the method and also present a short report of an experiment performed on five popular alignment tools based on the pairwise alignments of the Escherichia coli O6 CFT073 genome with genomes of seven other bacteria. Comment: 1 figure
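One way to picture the kind of accuracy score such a benchmark might report is sketched below in Python. It assumes the global alignment has already been turned into an expected reference coordinate for each read, and it counts a read as correct when the aligner's reported coordinate falls within a chosen tolerance. The function name, the dictionary layout, and the tolerance parameter are all assumptions for illustration; the abstract does not specify how benchNGS scores mappings.

```python
def mapping_accuracy(expected, reported, tolerance=0):
    """Fraction of reads placed within `tolerance` bases of the expected position.

    expected: read id -> reference coordinate implied by the global alignment
    reported: read id -> coordinate reported by the aligner
              (reads the aligner left unmapped are simply absent)
    Hypothetical scoring rule, not the one defined in the paper.
    """
    if not expected:
        raise ValueError("no reads to evaluate")
    correct = sum(
        1
        for read_id, position in expected.items()
        if read_id in reported and abs(reported[read_id] - position) <= tolerance
    )
    return correct / len(expected)


# Invented coordinates: r3 is unmapped, r2 is off by three bases.
expected = {"r1": 100, "r2": 250, "r3": 900}
reported = {"r1": 100, "r2": 253}

print(mapping_accuracy(expected, reported))
print(mapping_accuracy(expected, reported, tolerance=5))
```

Unmapped reads count as errors here, so the score penalises both wrong placements and missed reads; a real benchmark might report the two separately.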

arXiv.org e-Print Archive

Kingston University Research Repository

Genome-wide analysis of genetic diversity and artificial selection in Large White pigs in Russia

Author: Alexander Usatov
Lyubov Getmantseva
Nekruz Bakoev
Olga Kostyunina
Siroj Bakoev
Tatiana V. Tatarinova
Yuri Prytkov
Publication venue: PeerJ
Publication date: 01/07/2021

Breeding practices adopted at different farms are aimed at maximizing the profitability of pig farming. In this work, we have analyzed the genetic diversity of Large White pigs in Russia. We compared genomes of historic and modern Large White Russian breeds using 271 pig samples. We have identified 120 candidate regions associated with the differentiation of modern and historic pigs and analyzed genomic differences between the modern farms. The identified genes were associated with height, fitness, conformation, reproductive performance, and meat quality.

Directory of Open Access Journals

Editorial: Population and ancestry specific variation in disease susceptibility

Author: Ekaterina A. Savina
Ranajit Das
Tatiana V. Tatarinova
Yuriy L. Orlov
Publication venue: Frontiers Media S.A.
Publication date: 01/09/2023

Directory of Open Access Journals

Evidence-based gene models for structural and functional annotations of the oil palm genome

Author: Amiruddin Nadzirah
Azizi Norazah
Firdaus-Raih Mohd
Halim Mohd Amin Ab
Lim Chan Kuang
Murphy Denis
Nagappan Jayanthi
Ponomarenko Petr
Rosli Rozana
Sambanthamurthi Ravigadevi
Sanusi Nik Shazana Nik Mohd
Solovyev Victor
Tatarinova Tatiana V.
Ti Leslie Low Eng
Triska Martin
Publication venue: Springer Science and Business Media LLC
Publication date: 05/04/2017

The advent of rapid and inexpensive DNA sequencing has led to an explosion of data waiting to be transformed into knowledge about genome organization and function. Gene prediction is customarily the starting point for genome analysis. This paper presents a bioinformatics study of the oil palm genome, including comparative genomics analysis, database and tools development, and mining of biological data for genes of interest. We have annotated 26,059 oil palm genes integrated from two independent gene-prediction pipelines, Fgenesh++ and Seqping. This integrated annotation constitutes a significant improvement in comparison to the preliminary annotation published in 2013. We conducted a comprehensive analysis of intronless, resistance and fatty acid biosynthesis genes, and demonstrated the high quality of the current genome annotation. We identified 3,658 intronless genes in the oil palm genome, an important resource for evolutionary study. Further analysis of the oil palm genes revealed 210 candidate resistance genes involved in pathogen defense. Fatty acids have diverse applications ranging from food to industrial feedstocks, and we identified 42 key genes involved in fatty acid biosynthesis in oil palm. These results provide an important resource for studies of plant genomes and a theoretical foundation for marker-assisted breeding of oil palm and related crops.

arXiv.org e-Print Archive

Crossref

University of South Wales Research Explorer

Directory of Open Access Journals

FigShare

Toward high-resolution population genomics using archaeological samples

Author: Alexander S. Mikheyev
Ancha Baranova
Egor Prokhortchouk
Eran Elhaik
Evgeny Rogaev
GaneshPrasad ArunKumar
Hosseinali Asgharian
Irina Morozova
Pavel Flegontov
Petr Ponomarenko
Sergey Bruskin
Tatiana V. Tatarinova
Vladimir Klyuchnikov
Yuri Nikolsky
Yuriy Gankin
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2016

The term ‘ancient DNA’ (aDNA) is coming of age, with over 1,200 hits in the PubMed database, beginning in the early 1980s with the studies of ‘molecular paleontology’. Rooted in cloning and limited sequencing of DNA from ancient remains during the pre-PCR era, the field has made incredible progress since the introduction of PCR and next-generation sequencing. Over the last decade, aDNA analysis ushered in a new era in genomics and became the method of choice for reconstructing the history of organisms, their biogeography, and migration routes, with applications in evolutionary biology, population genetics, archaeogenetics, paleoepidemiology, and many other areas. This change was brought about by the development of new strategies for coping with the challenges in studying aDNA due to damage and fragmentation, scarce samples, significant historical gaps, and limited applicability of population genetics methods. In this review, we describe the state-of-the-art achievements in aDNA studies, with particular focus on human evolution and demographic history. We present the current experimental and theoretical procedures for handling and analysing highly degraded aDNA. We also review the challenges in the rapidly growing field of ancient epigenomics. Advancement of aDNA tools and methods signifies a new era in population genetics and evolutionary medicine research.

Lund University Publications

Crossref

PubMed Central

ZORA

White Rose Research Online