    Identification of gene-oriented exon orthology between human and mouse

    <p>Abstract</p> <p>Background</p> <p>Gene orthology has been well studied in evolutionary biology and is thought to have important implications for functional genome annotation. With the accumulation of transcriptomic data, alternative splicing has been taken into account in the assignment of gene orthologs, and it has been suggested that orthology should also be considered at the transcript level. Whether orthology is assessed for genes or transcripts, exons are the basic units that make up the whole gene structure. However, no study has yet reported how to build exon-level orthology on a whole-genome scale. It is therefore essential to establish a gene-oriented exon orthology dataset.</p> <p>Results</p> <p>Using a customized pipeline, we first built exon orthologous relationships from assigned gene ortholog pairs in two well-annotated genomes: human and mouse. More than 92% of non-overlapping exons have at least one ortholog between human and mouse, and only a small portion of them have more than one ortholog. Exons located in the coding region are more conserved in terms of having orthologous counterparts. Within the untranslated regions, the 5' UTR appears more diverse than the 3' UTR according to exon orthology designations. Interestingly, most exons located in the coding region are also conserved in length, but this conservation drops dramatically in untranslated regions. In addition, we allowed multiple assignments in exon orthologs, and after a thorough analysis we defined a subset of exons with possible fusion/split events.</p> <p>Conclusions</p> <p>Identification of orthologs at the exon level is essential for a detailed interrogation of gene orthology and for splicing analysis. It could also be used to extend genome annotation. Besides examining one-to-one orthologous relationships, we handle one-to-many exon pairs to represent complicated exon generation behavior. 
Our results can be applied further in research fields that study intron-exon structure and alternative/constitutive exons in functional genomics.</p>
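The distinction between one-to-one and one-to-many exon orthologs described above can be illustrated with a small classifier. This is a minimal sketch, not the authors' pipeline: it assumes exon ortholog pairs have already been computed (e.g. human to mouse) and are supplied as `(source_exon, target_exon)` tuples, and the function names and exon IDs are hypothetical. Exons whose relationship is not one-to-one are the kind of cases that may reflect fusion/split events.

```python
from collections import defaultdict


def classify_exon_orthology(exons, ortholog_pairs):
    """Label each source exon by the shape of its orthologous relationship.

    exons: iterable of source-species exon IDs (hypothetical, e.g. human exons)
    ortholog_pairs: iterable of (source_exon, target_exon) tuples (assumed precomputed)
    Returns a dict: exon ID -> "unmatched", "one-to-one", "one-to-many" or "many-to-one".
    """
    fwd = defaultdict(set)  # source exon -> set of target exons
    rev = defaultdict(set)  # target exon -> set of source exons
    for src, tgt in ortholog_pairs:
        fwd[src].add(tgt)
        rev[tgt].add(src)

    labels = {}
    for exon in exons:
        targets = fwd.get(exon, set())
        if not targets:
            labels[exon] = "unmatched"
        elif len(targets) > 1:
            # One source exon matches several target exons (candidate fusion/split).
            labels[exon] = "one-to-many"
        elif len(rev[next(iter(targets))]) > 1:
            # The single target exon is shared with other source exons (candidate fusion/split).
            labels[exon] = "many-to-one"
        else:
            labels[exon] = "one-to-one"
    return labels


def fraction_with_ortholog(labels):
    """Fraction of exons that have at least one ortholog."""
    if not labels:
        return 0.0
    matched = sum(1 for label in labels.values() if label != "unmatched")
    return matched / len(labels)


# Toy usage with hypothetical exon IDs:
labels = classify_exon_orthology(["h1", "h2", "h3"], [("h1", "m1"), ("h2", "m2"), ("h2", "m3")])
# labels == {"h1": "one-to-one", "h2": "one-to-many", "h3": "unmatched"}
```

Checking both directions (`fwd` and `rev`) matters: an exon with a single partner is only truly one-to-one if that partner is not also claimed by another exon.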

    Testis-specific glyceraldehyde-3-phosphate dehydrogenase: origin and evolution

    <p>Abstract</p> <p>Background</p> <p>Glyceraldehyde-3-phosphate dehydrogenase (GAPD) catalyses one of the glycolytic reactions and is also involved in a number of non-glycolytic processes, such as endocytosis, DNA excision repair, and induction of apoptosis. Mammals are known to possess two homologous GAPD isoenzymes: GAPD-1, a well-studied protein found in all somatic cells, and GAPD-2, which is expressed solely in testis. GAPD-2 supplies the energy required for the movement of spermatozoa and is tightly bound to the sperm tail cytoskeleton by an additional N-terminal proline-rich domain that is absent in GAPD-1. In this study we investigate the evolutionary history of GAPD and gain some insight into the specialization of GAPD-2 as a testis-specific protein.</p> <p>Results</p> <p>A dataset of GAPD sequences was assembled from public databases and used for phylogeny reconstruction by means of the Bayesian method. Since resolution in some clades of the resulting tree was too low, syntenic analysis was carried out to define the evolutionary history of GAPD more precisely. Selection tests showed that selective pressure varies across lineages and isoenzymes, as well as across different regions of the same sequences.</p> <p>Conclusions</p> <p>The results suggest that GAPD-1 and GAPD-2 emerged by duplication during the early evolution of chordates. GAPD-2 was subsequently lost in most lineages except lizards, mammals, and cartilaginous and bony fishes. In reptiles and mammals, GAPD-2 became specialized as a testis-specific protein and acquired the novel N-terminal proline-rich domain that anchors the protein in the sperm tail cytoskeleton. This domain is likely to have originated by exonization of a microsatellite genomic region. Recognition of the proline-rich domain by cytoskeletal proteins seems to be unspecific. 
Besides testis, GAPD-2 of lizards was also found in some regenerating tissues, but there it lacks the proline-rich domain due to tissue-specific alternative splicing.</p>

    Refining orthologue groups at the transcript level

    <p>Abstract</p> <p>Background</p> <p>Orthologues are genes in different species that are related through divergent evolution from a common ancestor and are expected to have similar functions. Many databases have been created to describe orthologous genes based on existing sequence data. However, alternative splicing (in eukaryotes) is usually disregarded in the determination of orthologue groups, and the functional consequences of alternative splicing have not been considered. Most multi-exon genes can encode multiple protein isoforms, which often have different functions and can be disease-related. Extending the definition of orthologue groups to take account of alternative splicing and the functional differences it causes requires further examination.</p> <p>Results</p> <p>A subset of the orthologous gene groups between human and mouse was selected from the InParanoid database for this study. Each orthologue group was divided into sub-clusters, at the transcript level, using a method based on the sequence similarity of the isoforms. Transcript-based sub-clusters were verified by functional signatures of the cluster members in the InterPro database. Functional similarity was higher within than between transcript-based sub-clusters of a defined orthologous group. In certain cases, cancer-related isoforms of a gene could be distinguished from other isoforms of the gene. Predictions of intrinsic disorder in protein regions were also correlated with the isoform sub-clusters within an orthologue group.</p> <p>Conclusions</p> <p>Sub-clustering of orthologue groups at the transcript level is an important step towards more accurately defining functionally equivalent orthologue groups. This work appears to be the first effort to refine orthologous groupings of genes based on the consequences of alternative splicing on function. 
Further investigation and refinement of the methodology to classify and verify isoform sub-clusters is needed, particularly to extend the technique to more distantly related species.</p>
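The abstract above says only that each orthologue group was split into sub-clusters "using a method based on the sequence similarity of the isoforms" and does not give the algorithm. As a hedged illustration of the general idea, this sketch swaps in simple threshold-based single-linkage clustering (via union-find). The similarity scores, the `threshold` value and the function name are hypothetical assumptions, and the InterPro-based verification step is not modeled.

```python
def subcluster_isoforms(isoforms, similarity, threshold=0.8):
    """Split one orthologue group into isoform sub-clusters by single linkage.

    isoforms: list of isoform IDs in the group (human and mouse together)
    similarity: dict mapping (isoform_a, isoform_b) -> similarity score in [0, 1]
    threshold: hypothetical cutoff; pairs scoring >= threshold are linked
    Returns a sorted list of sorted sub-clusters.
    """
    parent = {iso: iso for iso in isoforms}

    def find(x):
        # Walk to the root of x's set, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in similarity.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two sub-clusters

    clusters = {}
    for iso in isoforms:
        clusters.setdefault(find(iso), []).append(iso)
    return sorted(sorted(members) for members in clusters.values())


# Toy usage with hypothetical isoform IDs and scores:
sim = {("hA1", "mA1"): 0.95, ("hA2", "mA2"): 0.90, ("hA1", "hA2"): 0.40, ("mA1", "mA2"): 0.35}
print(subcluster_isoforms(["hA1", "hA2", "mA1", "mA2"], sim))
# [['hA1', 'mA1'], ['hA2', 'mA2']]
```

Here each human isoform groups with its most similar mouse isoform rather than with its own paralogous splice variant, which is the kind of transcript-level split the abstract describes. Single linkage is a simple choice that can chain weakly related isoforms together, so a real analysis might prefer a stricter scheme.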

    Genic regions of a large salamander genome contain long introns and novel genes

    BACKGROUND: The basis of genome size variation remains an outstanding question because DNA sequence data are lacking for organisms with large genomes. Sixteen BAC clones from the Mexican axolotl (Ambystoma mexicanum: C-value = 32 x 10(9) bp) were isolated and sequenced to characterize the structure of genic regions. RESULTS: Annotation of genes within BACs showed that axolotl introns are on average 10x longer than orthologous vertebrate introns and are predicted to contain more functional elements, including miRNAs and snoRNAs. Loci were discovered within BACs for two novel EST transcripts that are differentially expressed during spinal cord regeneration and skin metamorphosis. Unexpectedly, a third novel gene was also discovered while manually annotating BACs. Analysis of human-axolotl protein-coding sequences suggests there are 2% more lineage-specific genes in the axolotl genome than in the human genome, but the great majority (86%) of genes between axolotl and human are predicted to be 1:1 orthologs. Considering that axolotl genes are on average 5x larger than human genes, the genic component of the salamander genome is estimated to be very large, approximately 2.8 gigabases. CONCLUSION: This study shows that a large salamander genome has a correspondingly large genic component, primarily because genes have extremely long introns. These intronic sequences may harbor novel coding and non-coding sequences that regulate biological processes unique to salamanders.