31 research outputs found

The parasite Trichomonas vaginalis expresses thousands of pseudogenes and long non-coding RNAs independently from functional neighbouring genes

Author: Christian Woehle
Claudia Radine
Dan Graur
Gary Kusdian
Giddy Landan
Sven B Gould
Publication venue: Springer Nature
Publication date: 01/01/2014

BACKGROUND: The human pathogen Trichomonas vaginalis is a parabasalian flagellate that is estimated to infect 3% of the world’s population annually. With a 160 megabase genome and up to 60,000 genes residing in six chromosomes, the parasite has the largest genome among sequenced protists. Although the genome size and unusually large coding capacity are thought to be owed to genome duplication events, the exact reason and its consequences are less well studied. RESULTS: Among transcriptome data we found thousands of instances in which reads mapped onto genomic loci not annotated as genes, some reaching up to several kilobases in length. At first sight these appear to represent long non-coding RNAs (lncRNAs); however, about half of these lncRNAs have significant sequence similarities to genomic loci annotated as protein-coding genes. This provides evidence for the transcription of hundreds of pseudogenes in the parasite. Conventional lncRNAs and pseudogenes are expressed in Trichomonas through their own transcription start sites and independently from flanking genes. Expression of several representative lncRNAs was verified through reverse-transcriptase PCR in different T. vaginalis strains, and case studies exclude the use of alternative start codons or stop codon suppression for the genes analysed. CONCLUSION: Our results demonstrate that T. vaginalis expresses thousands of intergenic loci, including numerous transcribed pseudogenes. In contrast to yeast, these are expressed independently from neighbouring genes. Our results furthermore illustrate the effect genome duplication events can have on the transcriptome of a protist. The parasite’s genome is in a steady state of change, and we hypothesize that the numerous lncRNAs could offer a large pool for potential innovation from which novel proteins or regulatory RNA units could evolve.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-906) contains supplementary material, which is available to authorized users

Springer - Publisher Connector

PubMed Central

Estimates of Positive Darwinian Selection Are Inflated by Errors in Sequencing, Annotation, and Alignment

Author: Adrian Schneider
Alexander Souvorov
Dan Graur
Gaston H. Gonnet
Giddy Landan
Niv Sabath
Publication venue: Oxford University Press
Publication date: 01/01/2009

Published estimates of the proportion of positively selected genes (PSGs) in human vary over three orders of magnitude. In mammals, estimates of the proportion of PSGs cover an even wider range of values. We used 2,980 orthologous protein-coding genes from human, chimpanzee, macaque, dog, cow, rat, and mouse as well as an established phylogenetic topology to infer the fraction of PSGs in all seven terminal branches. The inferred fraction of PSGs ranged from 0.9% in human through 17.5% in macaque to 23.3% in dog. We found three factors that influence the fraction of genes that exhibit telltale signs of positive selection: the quality of the sequence, the degree of misannotation, and ambiguities in the multiple sequence alignment. The inferred fraction of PSGs in sequences that are deficient in all three criteria of coverage, annotation, and alignment is 7.2 times higher than that in genes with high trace sequencing coverage, “known” annotation status, and perfect alignment scores. We conclude that some estimates of the prevalence of positive Darwinian selection in the literature may be inflated and should be treated with caution.

Repository for Publications and Research Data

Crossref

PubMed Central

Identifying compositionally homogeneous and nonhomogeneous domains within the human genome using a novel segmentation algorithm

Author: Dan Graur
Eran Elhaik
Giddy Landan
Krešimir Josić
Publication venue: Oxford University Press
Publication date: 22/06/2010

It has been suggested that the mammalian genome is composed mainly of long compositionally homogeneous domains. Such domains are frequently identified using recursive segmentation algorithms based on the Jensen–Shannon divergence. However, a common difficulty with such methods is deciding when to halt the recursive partitioning and what criteria to use in deciding whether a detected boundary between two segments is real or not. We demonstrate that commonly used halting criteria are intrinsically biased, and propose IsoPlotter, a parameter-free segmentation algorithm that overcomes such biases by using a simple dynamic halting criterion and testing the homogeneity of the inferred domains. IsoPlotter was compared with an alternative segmentation algorithm, DJS, using two sets of simulated genomic sequences. Our results show that IsoPlotter was able to infer both long and short compositionally homogeneous domains with low GC content dispersion, whereas DJS failed to identify short compositionally homogeneous domains and sequences with low compositional dispersion. By segmenting the human genome with IsoPlotter, we found that one-third of the genome is composed of compositionally nonhomogeneous domains and the remainder is a mixture of many short compositionally homogeneous domains and relatively few long ones.
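
The abstract above refers to segmentation based on the Jensen–Shannon divergence. As a rough illustration only, the sketch below scores every candidate split point of a toy sequence by the length-weighted Jensen–Shannon divergence between the two resulting segments. It is not IsoPlotter or DJS: the halting criterion and homogeneity test are omitted, and the function names and the two-letter S/W alphabet (S = G or C, W = A or T) are assumptions made for this example.

```python
import math
from collections import Counter

def shannon_entropy(seq):
    """Shannon entropy (in bits) of the symbol frequencies in seq."""
    n = len(seq)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())

def js_divergence(seq, i):
    """Length-weighted Jensen-Shannon divergence between seq[:i] and seq[i:]."""
    n = len(seq)
    left, right = seq[:i], seq[i:]
    return (shannon_entropy(seq)
            - (len(left) / n) * shannon_entropy(left)
            - (len(right) / n) * shannon_entropy(right))

def best_split(seq):
    """Return (position, divergence) for the split with the largest divergence."""
    return max(((i, js_divergence(seq, i)) for i in range(1, len(seq))),
               key=lambda pair: pair[1])

# Hypothetical toy input: a GC-rich block followed by an AT-rich block.
toy = "S" * 8 + "W" * 8
pos, div = best_split(toy)
print(pos, div)
```

A recursive segmenter would apply `best_split` again to each side of the chosen boundary; deciding when to stop that recursion is the difficulty the abstract describes.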

Crossref

Lund University Publications

PubMed Central

White Rose Research Online

Chaperones Divide Yeast Proteins into Classes of Expression Level and Evolutionary Rate

Author: David Bogumil
Giddy Landan
Judith Ilhan
Tal Dagan
Publication venue: Oxford University Press

Crossref

PubMed Central

A Method for the Simultaneous Estimation of Selection Intensities in Overlapping Genes

Author: Dan Graur
Giddy Landan
Niv Sabath
Oliver G. Pybus
Publication venue: Public Library of Science
Publication date: 01/01/2008

Inferring the intensity of positive selection in protein-coding genes is important since it is used to shed light on the process of adaptation. Recently, it has been reported that overlapping genes, which are ubiquitous in all domains of life, seem to exhibit inordinate degrees of positive selection. Here, we present a new method for the simultaneous estimation of selection intensities in overlapping genes. We show that the appearance of positive selection is caused by assuming that selection operates independently on each gene in an overlapping pair, thereby ignoring the unique evolutionary constraints on overlapping coding regions. Our method uses an exact evolutionary model, thereby voiding the need for approximation or intensive computation. We test the method by simulating the evolution of overlapping genes of different types as well as under diverse evolutionary scenarios. Our results indicate that the independent estimation approach leads to the false appearance of positive selection even though the gene is in reality subject to negative selection. Finally, we use our method to estimate selection in two influenza A genes for which positive selection was previously inferred. We find no evidence for positive selection in either case.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Characterization of pairwise and multiple sequence alignment errors

Author: Dan Graur
Giddy Landan
Publication venue: Gene 441:141–147
Publication date: 01/01/2009

We characterize pairwise and multiple sequence alignment (MSA) errors by comparing true alignments from simulations of sequence evolution with reconstructed alignments. The vast majority of reconstructed alignments contain many errors. Error rates increase rapidly with sequence divergence; thus, even for intermediate degrees of sequence divergence, more than half of the columns of a reconstructed alignment may be expected to be erroneous. In closely related sequences, most errors consist of the erroneous positioning of a single indel event and their effect is local. As sequences diverge, errors become more complex as a result of the simultaneous mis-reconstruction of many indel events, and the lengths of the affected MSA segments increase dramatically. We found a systematic bias towards underestimation of the number of gaps, which leads to the reconstructed MSA being on average shorter than the true one. Alignment errors are unavoidable even when the evolutionary parameters are known in advance. Correct reconstruction can only be guaranteed when the likelihood of the true alignment is uniquely optimal. However, true alignment features are very frequently sub-optimal or co-optimal, with the result that optimal albeit erroneous features are incorporated into the reconstructed MSA. Progressive MSA utilizes a guide-tree in the reconstruction of MSAs. The quality of the guide-tree was found to affect MSA error levels only marginally.
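
The comparison of true and reconstructed alignments described above can be illustrated with a simple column-based measure. The sketch below is an assumption-laden example, not the authors' procedure: it encodes each alignment column by the residue index it holds in every sequence and reports the fraction of reconstructed columns absent from the true alignment. The function names and the toy alignments are hypothetical.

```python
def columns(alignment):
    """Convert an alignment (list of equal-length gapped strings) to a set of columns.

    Each column is a tuple giving, per sequence, the index of the residue
    placed in that column, or None where the sequence has a gap.
    """
    counters = [0] * len(alignment)
    cols = set()
    for j in range(len(alignment[0])):
        col = []
        for s, row in enumerate(alignment):
            if row[j] == "-":
                col.append(None)
            else:
                col.append(counters[s])
                counters[s] += 1
        cols.add(tuple(col))
    return cols

def column_error_rate(true_aln, reconstructed_aln):
    """Fraction of reconstructed columns that do not occur in the true alignment."""
    true_cols = columns(true_aln)
    rec_cols = columns(reconstructed_aln)
    return len(rec_cols - true_cols) / len(rec_cols)

# Hypothetical example: one indel placed one position too far to the right.
true_aln = ["AC-GT", "ACTGT"]
rec_aln = ["ACG-T", "ACTGT"]
rate = column_error_rate(true_aln, rec_aln)
print(rate)
```

In this toy case a single misplaced gap corrupts two of the five columns, which matches the abstract's remark that errors in closely related sequences are local.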

CiteSeerX

Same-strand overlapping genes in bacteria: compositional determinants of phase bias

Author: Graur Dan
Landan Giddy
Sabath Niv
Publication venue: BMC
Publication date: 01/01/2008

Background: Same-strand overlapping genes may occur in frameshifts of one (phase 1) or two nucleotides (phase 2). In previous studies of bacterial genomes, long phase-1 overlaps were found to be more numerous than long phase-2 overlaps. This bias was explained by either genomic location or an unspecified selection advantage. Models that focused on the ability of the two genes to evolve independently did not predict this phase bias. Here, we propose that a purely compositional model explains the phase bias in a more parsimonious manner. Same-strand overlapping genes may arise through either a mutation at the termination codon of the upstream gene or a mutation at the initiation codon of the downstream gene. We hypothesized that given these two scenarios, the frequencies of initiation and termination codons in the two phases may determine the number of overlapping genes. Results: We examined the frequencies of initiation and termination codons in the two phases, and found that termination codons do not significantly differ between the two phases, whereas initiation codons are more abundant in phase 1. We found that the primary factors explaining the phase inequality are the frequencies of amino acids whose codons may combine to form start codons in the two phases. We show that the frequencies of start codons in each of the two phases, and, hence, the potential for the creation of overlapping genes, are determined by a universal amino-acid frequency and species-specific codon usage, leading to a correlation between long phase-1 overlaps and genomic GC content. Conclusion: Our model explains the phase bias in same-strand overlapping genes by compositional factors without invoking selection. Therefore, it can be used as a null model of neutral evolution to test selection hypotheses concerning the evolution of overlapping genes. Reviewers: This article was reviewed by Bill Martin, Itai Yanai, and Mikhail Gelfand.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Data from: Concatenated alignments and the case of the disappearing tree

Author: Landan Giddy
Martin William F.
Thiergart Thorsten
Publication date: 30/12/2014

Background: Analyzed individually, gene trees for a given taxon set tend to harbour incongruent or conflicting signals. One popular approach to deal with this circumstance is to use concatenated data. But especially in prokaryotes, where lateral gene transfer (LGT) is a natural mechanism of generating genetic diversity, there are open questions as to whether concatenation amplifies or averages phylogenetic signals residing in individual genes. Here we investigate concatenations of prokaryotic and eukaryotic datasets to investigate possible sources of incongruence in phylogenetic trees and to examine the level of overlap between individual and concatenated alignments. Results: We analyzed prokaryotic datasets comprising 248 individual gene trees from 315 genomes at three taxonomic depths spanning gammaproteobacteria, proteobacteria, and prokaryotes (bacteria plus archaea), and eukaryotic datasets comprising 279 individual gene trees from 85 genomes at two taxonomic depths: across plants-animals-fungi and within fungi. Consistent with previous findings, the branches in trees made from concatenated alignments are, in general, not supported by any of their underlying individual gene trees, even though the concatenation trees tend to possess high bootstrap proportion values. For the prokaryote data, this observation is independent of phylogenetic depth and sequence conservation. The eukaryotic data show much better agreement between concatenation and single gene trees. LGT frequencies in trees were estimated using established methods. Sequence length in individual alignments, but not sequence divergence, was found to correlate with the generation of branches that correspond to the concatenated tree. Conclusions: The weak correspondence of concatenation trees with single gene trees gives rise to the question of where the phylogenetic signal in concatenated trees is coming from. The eukaryote data reveal a better correspondence between individual and concatenation trees than the prokaryote data. The question of whether the lack of correspondence between individual genes and the concatenation tree in the prokaryotic data is due to LGT or phylogenetic artefacts remains unanswered. If LGT is the cause of incongruence between concatenation and individual trees, we would have expected to see greater degrees of incongruence for more divergent prokaryotic data sets, which was not observed, although estimated rates of LGT suggest that LGT is responsible for at least some of the observed incongruence.

ZENODO

Dryad Digital Repository (Duke University)

Electronic Archiving System

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

resources

Author: Giddy Landan (375435)
Thorsten Thiergart (3311754)
William Martin (2830)
Publication date: 03/01/2015

The zip file includes 8 folders, each containing phylogenetic trees and alignments for one dataset.

Dryad Digital Repository (Duke University)

FigShare

Estimation of selection intensity () by independent and simultaneous estimation.

Author: Dan Graur (230446)
Giddy Landan (375435)
Niv Sabath (375434)

(a) Number of pairwise alignments of NS1 – NS2 overlaps is 10,569 and 8,745 for H5N1 and H9N2 subtypes, respectively; number of pairwise alignments of PB1-F2 – PB1 overlaps is 16,112 and 33,720 for H5N1 and H9N2 subtypes, respectively.
(b) Median over all pairwise comparisons. Lower and upper quartiles are noted in parentheses.
(c) Values of selection intensity in PB1-F2 and NS1 genes that appear as positive selection by independent estimation are bolded.

FigShare