<i>d<sub>N</sub>/d<sub>S</sub></i> Cumulative Frequency Distribution for Orthologues, Paralogues, and Pseudogenes Predicted by PhyOP
<p>Predicted pseudogenes exhibit median <i>d<sub>N</sub>/d<sub>S</sub></i> ratios of 0.22 when compared with their orthologues, 0.55 with functional in-paralogues, and 0.65 with in-paralogues that are themselves also candidate pseudogenes. The 1:1 orthologues have a median <i>d<sub>N</sub>/d<sub>S</sub></i> of 0.11. Assuming a constant mutation rate, the <i>d<sub>N</sub>/d<sub>S</sub></i> after loss of function in pseudogenes should relax towards approximately 0.55 (the average of 1.00 for no selection and 0.11 for purifying selection) when compared with a functional homologue, and towards 1.00 when compared with a homologue which is also a pseudogene. The <i>d<sub>N</sub>/d<sub>S</sub></i> distribution between in-paralogues (dashed lines) is greatly shifted upwards, suggesting that the changes in selective constraints for both functional and pseudogene paralogues tend to be much more recent than the dog–human divergence.</p>
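<p>The expected values quoted in the caption can be reproduced with a short calculation. This is a minimal sketch, assuming (as the caption's "average" implies) that the pairwise ratio is the unweighted mean of the ratios on the two lineages; the function and constant names are illustrative and do not come from PhyOP.</p>

```python
def expected_pairwise_ratio(omega_a: float, omega_b: float) -> float:
    """Unweighted mean of the dN/dS ratios of two lineages.

    Assumption: both lineages contribute equally to the pairwise estimate,
    which is how the caption arrives at "the average of 1.00 and 0.11".
    """
    return (omega_a + omega_b) / 2.0


PURIFYING = 0.11  # median dN/dS of 1:1 orthologues, taken from the caption
NEUTRAL = 1.00    # no selection, i.e. after loss of function

# Pseudogene compared with a functional homologue.
pseudo_vs_functional = expected_pairwise_ratio(NEUTRAL, PURIFYING)

# Pseudogene compared with a homologue that is also a pseudogene.
pseudo_vs_pseudo = expected_pairwise_ratio(NEUTRAL, NEUTRAL)

print(round(pseudo_vs_functional, 3), pseudo_vs_pseudo)
```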
Calculating Minimum Syntenic Distance for Orthologues
<p>The minimum syntenic distance for a gene is the smallest difference in gene order between its orthologue and the orthologues of its neighbours in the other species. Starting from human gene H<sub>1</sub>, the chromosomal location of its dog orthologue D<sub>1</sub> is noted (step 1). The flanking genes (within a window of 20 sets of orthologues) are searched for the nearest neighbouring human gene with an orthologue on the same chromosome as D<sub>1</sub>. Thus, the immediate neighbour to the right of H<sub>1</sub> can be ignored because it does not have an orthologue on the same chromosome as D<sub>1</sub> (step 2). The subsequent gene H<sub>2</sub> has a dog orthologue (step 3) D<sub>2</sub> on the same chromosome as D<sub>1</sub>. The syntenic distance for gene H<sub>1</sub> in the downstream direction is calculated to be four genes, by counting the number of intervening genes (using Ensembl gene loci) between D<sub>1</sub> and D<sub>2</sub> (step 4). Upstream of H<sub>1</sub> and D<sub>1</sub>, however, no genes have been inserted after the next orthologous genes H<sub>3</sub> and D<sub>3</sub>. The minimum syntenic distance for H<sub>1</sub> is thus 1.</p>
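<p>The procedure in this caption can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the distance is taken as the difference in dog gene-order index (so adjacent orthologues give 1, matching the upstream case), the window counts only flanking human genes that have a dog orthologue, and all names and the toy coordinates are hypothetical.</p>

```python
from typing import Dict, List, Optional, Tuple

# (dog chromosome, gene-order index on that chromosome); hypothetical layout.
DogLocus = Tuple[str, int]


def directional_distance(human_order: List[str], dog_locus: Dict[str, DogLocus],
                         start: int, step: int, window: int = 20) -> Optional[int]:
    """Walk along the human gene order in one direction (step = +1 or -1).

    Returns the gene-order difference between the dog orthologue of the start
    gene and that of the nearest flanking gene whose orthologue lies on the
    same dog chromosome, or None if no such gene is found within the window.
    """
    chrom, pos = dog_locus[human_order[start]]
    seen = 0
    i = start + step
    while 0 <= i < len(human_order) and seen < window:
        locus = dog_locus.get(human_order[i])
        if locus is not None:
            seen += 1  # assumption: only genes with an orthologue use up the window
            if locus[0] == chrom:
                return abs(locus[1] - pos)
        i += step
    return None


def minimum_syntenic_distance(human_order: List[str], dog_locus: Dict[str, DogLocus],
                              gene: str, window: int = 20) -> Optional[int]:
    """Smaller of the downstream and upstream distances, if either exists."""
    start = human_order.index(gene)
    found = [d for d in (directional_distance(human_order, dog_locus, start, s, window)
                         for s in (1, -1)) if d is not None]
    return min(found) if found else None


# Toy data mirroring the caption: Hx has its orthologue on another chromosome.
human_order = ["H3", "H1", "Hx", "H2"]
dog_locus = {"H3": ("CFA1", 9), "H1": ("CFA1", 10),
             "Hx": ("CFA5", 3), "H2": ("CFA1", 14)}

downstream = directional_distance(human_order, dog_locus, 1, +1)
upstream = directional_distance(human_order, dog_locus, 1, -1)
minimum = minimum_syntenic_distance(human_order, dog_locus, "H1")
```

<p>With these toy coordinates the downstream distance is 4, the upstream distance is 1, and the minimum is therefore 1, as in the caption.</p>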
Venn Diagram Comparing Orthology Relationships Predicted by PhyOP, Ensembl, and Inparanoid
<div><p>(A) Most 1:1 orthologue predictions are shared between the three methods: PhyOP (solid rectangle), Ensembl (striped rectangle), and Inparanoid (hollow rectangle).</p><p>(B) Orthology predictions that involve lineage-specific duplications, however, differ markedly between PhyOP and Ensembl. Most Inparanoid predictions are a subset of those from Ensembl.</p><p>(C) The same is true for predicted paralogy relationships.</p></div>
The Assignment of Orthology by Ensembl
<div><p>(A) Shows the true phylogenetic relationships for three dog (D<sub>1–3</sub>) and three human gene homologues (H<sub>1–3</sub>). D<sub>3</sub> and H<sub>3</sub> are 1:1 orthologues, having been derived from a single gene at the last common ancestor (marked “S” for speciation point). D<sub>1</sub>, D<sub>2</sub> and H<sub>1</sub>, H<sub>2</sub> are likewise orthologues of each other but in a many-to-many relationship.</p><p>(B) Shows that D<sub>1</sub> and H<sub>1</sub> and D<sub>3</sub> and H<sub>3</sub> are BLAST reciprocal best hits (solid arrows; “UBRH” in Ensembl terminology). Because the D<sub>2</sub> and H<sub>2</sub> loci are closely linked neighbours of the H<sub>1</sub> loci, their orthology relationships are also predicted by Ensembl on the basis of their BLAST nonreciprocal best hits: H<sub>1</sub> is the best hit for D<sub>2</sub>, and D<sub>2</sub> is the best hit in turn for H<sub>2</sub> (dashed red arrows; “RHS” in Ensembl terminology). Because of this lack of reciprocity, H<sub>1</sub> is simultaneously in a many-to-one relationship with D<sub>2</sub> (and H<sub>2</sub>) and a one-to-many relationship with D<sub>1</sub> and D<sub>2</sub>. As orthology is, by definition, a transitive property between genes of two species, this inconsistency can be reconciled by linking all four genes together into a single set of orthologues, in effect adding the missing link between D<sub>1</sub> and H<sub>2</sub>. Many such inconsistencies can be found in version 27.1 of the Ensembl Compara database; for example, ENSCAFG00000009718, ENSCAFG00000009724, ENSG00000180305, and ENSG00000182931 are found in relationships illustrated by D<sub>1</sub>, D<sub>2</sub>, H<sub>1</sub>, and H<sub>2</sub>, respectively.</p><p>(C) Human gene H<sub>3</sub> has not been predicted. The highest-scoring BLAST alignment for its orphaned orthologue D<sub>3</sub> becomes H<sub>2</sub> (dashed red arrow). This erroneous assignment of orthology for D<sub>3</sub> arises because Ensembl does not distinguish between adjacent in-paralogues such as H<sub>1</sub> and H<sub>2</sub>, and out-paralogues such as H<sub>3</sub>.</p></div>
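<p>The reconciliation described in (B) amounts to taking connected components over all predicted best-hit links. The sketch below illustrates that idea only; it is not Ensembl's or PhyOP's implementation, the best-hit table is transcribed from the caption, and the function names are hypothetical.</p>

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def reciprocal_best_hits(best_hit: Dict[str, str]) -> Set[FrozenSet[str]]:
    """Gene pairs in which each gene is the other's best hit."""
    return {frozenset((a, b)) for a, b in best_hit.items() if best_hit.get(b) == a}


def orthologue_sets(links: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Merge genes joined by any link into sets (union-find), so that the
    resulting orthology relationships are transitive."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for gene in list(parent):
        groups.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(g) for g in groups.values())


# Best hits as drawn in panel (B): solid arrows are reciprocal, dashed are not.
best_hit = {"D1": "H1", "H1": "D1", "D3": "H3", "H3": "D3",
            "D2": "H1", "H2": "D2"}

rbh = reciprocal_best_hits(best_hit)
sets = orthologue_sets(best_hit.items())
```

<p>Here the reciprocal pairs are D1/H1 and D3/H3, while the linked sets are {D1, D2, H1, H2} and {D3, H3}; the first set implicitly contains the D1/H2 link that the best hits alone do not provide.</p>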
PhyOP, Ensembl, and Inparanoid <i>d<sub>S</sub></i> Cumulative Frequency Distributions
<p>These include orthologues which have (“manys”) or have not (1:1) been involved in lineage-specific duplications. The <i>d<sub>S</sub></i> distributions for 1:1 orthologues are similar for the three methods. The distributions for Ensembl and Inparanoid 1:1 orthologues are indistinguishable, and the median <i>d<sub>S</sub></i> for PhyOP 1:1 orthologues is only slightly smaller. This is mainly because most of the predictions are common to all. PhyOP “manys” orthologues have a larger median <i>d<sub>S</sub></i> than do 1:1 orthologues. The <i>d<sub>S</sub></i> distributions for “manys” orthologues predicted by Inparanoid and Ensembl are very much shifted to the right, indicating that a large proportion of these genes may have diverged well before the dog and human lineages separated.</p>
Deriving Orthology via Transcript Phylogeny
<div><p>(A,B) Phylogenetic relationships for a dog (D<sub>1</sub>) and three human (H<sub>1</sub>, H<sub>2</sub>, and H<sub>3</sub>) genes. D<sub>1</sub> is the orthologue to H<sub>1</sub> and H<sub>2</sub>. H<sub>3</sub> has been orphaned by the loss of its dog orthologue. Each gene has two splice variants A and B (B), and their transcripts are subscripted accordingly.</p><p>(C) Phylogenetic relationships for all transcripts. Each group or clade of orthologous transcripts recapitulates the gene orthology in (A). The transcripts A and B for the orphaned gene H<sub>3</sub> are also themselves orphaned on the transcript tree. The transcripts from clade 1 are selected to represent the three genes (D<sub>1</sub>, H<sub>1</sub>, and H<sub>2</sub>) because phylogenetic distance between orthologues (arrow 1) is smaller than that for clade 2 (arrow 2).</p><p>(D) How orthology is predicted when transcripts are missing. D<sub>1A</sub> and H<sub>2A</sub> are selected as the representative transcripts for their genes because the <i>d<sub>S</sub></i> between these orthologues is smaller than that for D<sub>1B</sub> and H<sub>1B</sub>. The transcripts in clade 1 are used to predict orthology between D<sub>1</sub> and H<sub>2</sub>. Though H<sub>1</sub> also has transcripts in orthologous relationships with D<sub>1</sub>, orthology between these two genes is not predicted, leaving H<sub>1</sub> as an orphan. No orthology predictions are made for the gene H<sub>3</sub>, which remains as an orphan.</p></div>
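<p>The selection rule in (D) can be illustrated as picking, for each dog gene, the orthologous transcript pair with the smallest <i>d<sub>S</sub></i>. The sketch below is a deliberate simplification of the caption, not PhyOP itself: the <i>d<sub>S</sub></i> values are invented for illustration, and the convention that the last character of a transcript name is the splice-variant letter is an assumption.</p>

```python
from typing import Dict, Iterable, NamedTuple


class TranscriptPair(NamedTuple):
    dog_transcript: str
    human_transcript: str
    d_s: float


def gene_of(transcript: str) -> str:
    """Hypothetical naming convention: 'D1A' is splice variant A of gene 'D1'."""
    return transcript[:-1]


def pick_representatives(pairs: Iterable[TranscriptPair]) -> Dict[str, TranscriptPair]:
    """For each dog gene, keep the orthologous transcript pair with the lowest dS."""
    best: Dict[str, TranscriptPair] = {}
    for pair in pairs:
        gene = gene_of(pair.dog_transcript)
        if gene not in best or pair.d_s < best[gene].d_s:
            best[gene] = pair
    return best


# Illustrative values only; the caption states just that the first is smaller.
pairs = [
    TranscriptPair("D1A", "H2A", 0.30),
    TranscriptPair("D1B", "H1B", 0.45),
]

chosen = pick_representatives(pairs)
predicted_orthologue = gene_of(chosen["D1"].human_transcript)
```

<p>With these inputs D1 is paired with H2 through transcripts D1A and H2A, and H1 is left without a predicted orthologue, as described in (D).</p>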
Dotplot of PhyOP Orthologues Showing Conserved Synteny in the Dog and Human Chromosomes
<div><p>(A) Synteny between CFAX and HSAX.</p><p>(B) Synteny between CFA9 and HSA17.</p><p>Genes are plotted in consecutive gene order along each chromosome. The two X chromosomes are in a single conserved syntenic block. However, known human-specific paralogues of SSX, MAGE, opsin, and TEX28 families have been highlighted. The sequence containing the opsin and TEX28 families is highly polymorphic in the human population [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0020133#pcbi-0020133-b065" target="_blank">65</a>]. The human X chromosome genome sequence contains two copies of the green-cone photoreceptor pigment gene in the opsin family interdigitated with three full-length copies of TEX28. The plot of orthologous gene positions between CFA9 and HSA17 recapitulates known syntenic rearrangements in the human lineage [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0020133#pcbi-0020133-b006" target="_blank">6</a>].</p></div>
Oxford Grid of PhyOP Orthologues Showing Dog–Human Genomic Synteny
<p>Genes are plotted in consecutive gene order along the dog chromosomes CFA 1–38 and CFA X, and along the human chromosomes HSA 1–22 and HSA X. One-to-one, one-to-many, many-to-one, and many-to-many dog-to-human orthologues are displayed as red, green, blue, and black dots, respectively. Diagonal lines represent genomic segments with conserved synteny.</p>
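<p>The colour coding of the grid follows directly from how many dog and human genes each orthologue set contains. A minimal sketch, assuming that "one-to-many dog-to-human" means one dog gene and several human genes; the function and dictionary names are illustrative.</p>

```python
def relationship(n_dog: int, n_human: int) -> str:
    """Classify an orthologue set by its dog and human gene counts."""
    if n_dog < 1 or n_human < 1:
        raise ValueError("an orthologue set needs at least one gene per species")
    dog = "one" if n_dog == 1 else "many"
    human = "one" if n_human == 1 else "many"
    return f"{dog}-to-{human}"


# Dot colours as listed in the caption (dog-to-human direction).
DOT_COLOUR = {
    "one-to-one": "red",
    "one-to-many": "green",
    "many-to-one": "blue",
    "many-to-many": "black",
}

colour = DOT_COLOUR[relationship(1, 3)]  # one dog gene, three human genes
```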
Distinct Dog Genes from Ensembl that Have Been Mispredicted as a Single Merged Chimera
<div><p>(A) Ten predicted transcripts for a single Ensembl dog gene (ENSCAFG00000017952) on CFA 6. PhyOP orthology predictions suggest that only transcripts 1–4 highlighted in red are correct, and that these represent four distinct nonoverlapping dog in-paralogues (shaded in grey). Resolution of the transcript phylogeny strongly indicates that this one predicted gene is instead a composite of four true paralogous genes (in red; ENSCAFT00000028541, ENSCAFT00000028547, ENSCAFT00000028555, and ENSCAFT00000028561) and one pseudogene. At least five of the transcripts are chimeric constructs of exons from separate genes. In every case we examined, putative merged genes were the result of chimeric predicted transcripts sampling different combinations of exons from adjacent true paralogues.</p><p>(B) The corresponding genomic region on CFA 6 with the distinct genes and their transcriptional orientations indicated by the black pentagons. Below this is the orthologous genomic region from HSA 16 showing five human orthologues (numbered 1–5: ENSG00000005187, ENSG000000166743, ENSG000000166747, ENSG000000066813, and ENSG000000183549). The orthology predictions are indicated with solid black lines. Thus, the dog orthologue for transcript 3 (gene 4) has acquired an extra tandem duplicate (gene 3). Only fragmentary exons on dog CFA6, corresponding to a pseudogene (marked with a cross), can be found for human gene 2, which, therefore, is assigned as an orphan. The human orthologue for the dog gene for transcript 2 unusually appears to have been translocated to HSA 12, as corroborated by BLASTZ [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0020133#pcbi-0020133-b064" target="_blank">64</a>] genome alignments. Apart from this, gene order and strand have been conserved among orthologues of both lineages, including those for an unrelated orthologue pair (hollow triangles) in the middle of the paralogue cluster (ENSCAFG00000017985 and ENSG000000066654).</p></div>
Overview of the PhyOP Orthology Prediction Process
<div><p>(A) Creation of transcript-based phylogenies. An all-versus-all BLASTP search is run for all proteins from two species (step 1) with an <i>E</i> value upper threshold of 10<sup>−5</sup> and an alignment length threshold of 50 residues. Protein pairs are linked together in initial clusters (step 2) if the alignment covers >60% of the residues of both sequences. Any remaining proteins are linked to the initial clusters if they align to >50% of the residues of either sequence (step 3). <i>d<sub>S</sub></i> values are calculated from the pairwise alignments (step 4), and unsaturated transcript pairs (<i>d<sub>S</sub></i> < 5.0) are grouped first by single linkage and then hierarchically clustered using UPGMA (step 5). Phylogenies are created from cluster branches corresponding to <i>d<sub>S</sub></i> < 2.5 by applying a modified version of the Fitch-Margoliash criterion (step 6).</p><p>(B) Prediction of orthology from transcript phylogenies. Transcripts outside of clades of orthologous transcripts are discarded (step 7), and merged genes within orthologous clades are separated (step 8). Transcript clades are separated into three groups: unambiguous clades (step 9) containing genes with no other remaining splice variant; consistent sets of clades (step 10) with identical gene complements; and inconsistent clades (step 11) with different gene orthology relationships suggested by different sets of orthologous transcripts. The inconsistencies are resolved by separating merged genes and choosing transcripts with the lowest <i>d<sub>S</sub></i> to their orthologous transcripts (step 12). Candidate pseudogenes are then discarded to give the final set of orthologous and paralogous genes (step 13).</p></div>
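<p>The thresholds in steps 1 to 5 of panel (A) can be written down as simple filters. The following is a hedged sketch of those filters only, not the PhyOP pipeline: the <code>Hit</code> record and its field names are hypothetical, whether the <i>E</i>-value and length thresholds are inclusive is an assumption, and the clustering and tree-building steps are not reproduced.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One BLASTP alignment between two proteins (hypothetical record layout)."""
    query_len: int        # residues in the query protein
    subject_len: int      # residues in the subject protein
    aligned_query: int    # query residues covered by the alignment
    aligned_subject: int  # subject residues covered by the alignment
    align_len: int        # alignment length in residues
    e_value: float


def passes_blast_filter(hit: Hit, max_e: float = 1e-5, min_len: int = 50) -> bool:
    """Step 1 thresholds; treating both bounds as inclusive is an assumption."""
    return hit.e_value <= max_e and hit.align_len >= min_len


def link_type(hit: Hit) -> str:
    """Coverage rules of steps 2 and 3, applied to a single alignment."""
    if not passes_blast_filter(hit):
        return "rejected"
    cov_query = hit.aligned_query / hit.query_len
    cov_subject = hit.aligned_subject / hit.subject_len
    if cov_query > 0.60 and cov_subject > 0.60:
        return "initial"    # step 2: >60% of both sequences
    if cov_query > 0.50 or cov_subject > 0.50:
        return "secondary"  # step 3: >50% of either sequence
    return "unlinked"


def is_unsaturated(d_s: float) -> bool:
    """Step 5 keeps transcript pairs with dS below 5.0."""
    return d_s < 5.0


full_length = link_type(Hit(100, 100, 70, 65, 70, 1e-10))
partial = link_type(Hit(100, 300, 55, 55, 55, 1e-8))
too_short = link_type(Hit(100, 100, 40, 40, 40, 1e-10))
```

<p>Note that the caption applies the step 3 rule only to proteins left out of the initial clusters; that bookkeeping belongs to the clustering stage and is omitted here.</p>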