32 research outputs found

    Non-excitable fluorescent protein orthologs found in ctenophores

    Background: Fluorescent proteins are optically active proteins found across many metazoan clades. A fluorescent protein was recently identified in a ctenophore, but this has been suggested to derive from a cnidarian, raising again the question of the origins of this group of proteins. Results: Through analysis of transcriptome data from 30 ctenophores, we identified a member of an orthologous group of proteins similar to fluorescent proteins in each of them, as well as in the genome of Mnemiopsis leidyi. These orthologs lack canonical residues involved in chromophore formation, suggesting another function. Conclusions: The phylogenetic position of the ctenophore protein family among fluorescent proteins suggests that this gene was present in the common ancestor of all ctenophores and that the fluorescent protein previously found in a ctenophore actually derives from a siphonophore.

    A comparison across non-model animals suggests an optimal sequencing depth for de novo transcriptome assembly

    Background: The lack of genomic resources can present challenges for studies of non-model organisms. Transcriptome sequencing offers an attractive method to gather information about genes and gene expression without the need for a reference genome. However, it is unclear what sequencing depth is adequate to assemble the transcriptome de novo for these purposes. Results: We assembled transcriptomes of animals from six different phyla (Annelids, Arthropods, Chordates, Cnidarians, Ctenophores, and Molluscs) at regular increments of reads using Velvet/Oases and Trinity to determine how read count affects the assembly. This included an assembly of mouse heart reads because we could compare those against the reference genome that is available. We found qualitative differences in the assemblies of whole animals versus tissues. With increasing reads, whole-animal assemblies show a rapid increase in transcripts and discovery of conserved genes, while single-tissue assemblies show a slower discovery of conserved genes, though the assembled transcripts were often longer. A deeper examination of the mouse assemblies shows that with more reads, assembly errors become more frequent, but such errors can be mitigated with more stringent assembly parameters. Conclusions: These assembly trends suggest that representative assemblies are generated with as few as 20 million reads for tissue samples and 30 million reads for whole animals for RNA-level coverage. These depths provide a good balance between coverage and noise. Beyond 60 million reads, the discovery of new genes is low and sequencing errors of highly expressed genes are likely to accumulate. Finally, siphonophores (polymorphic Cnidarians) are an exception and possibly require alternate assembly strategies.
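
As a rough illustration of assembling "at regular increments of reads", the sketch below draws nested read subsets of increasing size from one shuffled pool. The function name `nested_subsamples`, the fixed seed, and the choice to make the subsets nested are assumptions made for illustration; the abstract does not say how the increments were drawn.

```python
import random


def nested_subsamples(reads, step, seed=0):
    """Return read subsets of size step, 2*step, ... drawn from one shuffle.

    Assumption: subsets are nested (each is a prefix of the next), so that
    differences between assemblies reflect added reads only.
    """
    if step <= 0:
        raise ValueError("step must be a positive integer")
    rng = random.Random(seed)      # fixed seed keeps the subsets reproducible
    shuffled = list(reads)
    rng.shuffle(shuffled)
    return [shuffled[:n] for n in range(step, len(shuffled) + 1, step)]


# Hypothetical read identifiers standing in for a real read file.
read_ids = [f"read_{i}" for i in range(10)]
subsets = nested_subsamples(read_ids, step=3)
print([len(s) for s in subsets])
```

Each subset would then be passed to the assembler separately, and metrics such as transcript count or conserved-gene discovery plotted against subset size.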

    Conserved novel ORFs in the mitochondrial genome of the ctenophore Beroe forskalii

    To date, five ctenophore species’ mitochondrial genomes have been sequenced, and each contains open reading frames (ORFs) that, if translated, have no identifiable orthologs. ORFs with no identifiable orthologs are called unidentified reading frames (URFs). If truly protein-coding, ctenophore mitochondrial URFs represent a little-understood path in early-diverging metazoan mitochondrial evolution and metabolism. We sequenced and annotated the mitochondrial genomes of three individuals of the beroid ctenophore Beroe forskalii and found that in addition to sharing the same canonical mitochondrial genes as other ctenophores, the B. forskalii mitochondrial genome contains two URFs. These URFs are conserved among the three individuals but not found in other sequenced species. We developed computational tools called pauvre and cuttlery to determine the likelihood that URFs are protein-coding. There is evidence that the two URFs are under negative selection, and a novel Bayesian hypothesis test of trinucleotide frequency shows that the URFs are more similar to known coding genes than to noncoding intergenic sequence. Protein structure and function prediction of all ctenophore URFs suggests that they all code for transmembrane transport proteins. These findings, along with the presence of URFs in other sequenced ctenophore mitochondrial genomes, suggest that ctenophores may have uncharacterized transmembrane proteins present in their mitochondria.
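
To make the idea of comparing trinucleotide frequencies concrete, here is a minimal sketch that tabulates in-frame trinucleotide frequencies and measures how far apart two sequences are. It is not the Bayesian hypothesis test described in the abstract and does not use pauvre or cuttlery; the function names and the L1 distance are assumptions chosen for illustration.

```python
from collections import Counter


def trinucleotide_frequencies(seq, frame=0):
    """Relative frequencies of non-overlapping trinucleotides in one frame."""
    seq = seq.upper()
    triplets = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    # Skip triplets containing ambiguous bases such as N.
    triplets = [t for t in triplets if set(t) <= set("ACGT")]
    counts = Counter(triplets)
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()} if total else {}


def frequency_distance(freq_a, freq_b):
    """L1 distance between two frequency tables (0 = identical, 2 = disjoint)."""
    keys = set(freq_a) | set(freq_b)
    return sum(abs(freq_a.get(k, 0.0) - freq_b.get(k, 0.0)) for k in keys)


# Toy sequences, not real mitochondrial data.
urf = trinucleotide_frequencies("ATGATGAAA")
coding = trinucleotide_frequencies("ATGAAAATG")
noncoding = trinucleotide_frequencies("CCCGGGCCC")
print(frequency_distance(urf, coding), frequency_distance(urf, noncoding))
```

A candidate URF whose table lies closer to known coding genes than to intergenic sequence would be consistent with a coding role, although a formal test is needed to quantify that.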

    Symplectin evolved from multiple duplications in bioluminescent squid

    The squid Sthenoteuthis oualaniensis, formerly Symplectoteuthis oualaniensis, generates light using the luciferin coelenterazine and a unique enzyme, symplectin. Genetic information is limited for bioluminescent cephalopod species, so many proteins, including symplectin, occur in public databases only as sequence isolates with few identifiable homologs. As the distribution of the symplectin/pantetheinase protein family in Metazoa remains mostly unexplored, we have sequenced the transcriptomes of four additional luminous squid, and make use of publicly available but unanalyzed data of other cephalopods, to examine the occurrence and evolution of this protein family. While the majority of spiralians have one or two copies of this protein family, four well-supported groups of proteins are found in cephalopods, one of which corresponds to symplectin. A cysteine that is critical for symplectin functioning is conserved across essentially all members of the protein family, even those unlikely to be used for bioluminescence. Conversely, active-site residues involved in pantetheinase catalysis are also conserved across essentially all of these proteins, suggesting that symplectin may have multiple functions including hydrolase activity, and that the evolution of the luminous phenotype required other changes in the protein outside of the main binding pocket.

    Occurrence of Isopenicillin-N-Synthase Homologs in Bioluminescent Ctenophores and Implications for Coelenterazine Biosynthesis

    The biosynthesis of the luciferin coelenterazine has remained a mystery for decades. While not all organisms that use coelenterazine appear to make it themselves, it is thought that ctenophores are a likely producer. Here we analyze the transcriptome data of 24 species of ctenophores, two of which have published genomes. The natural precursors of coelenterazine have been shown to be the amino acids L-tyrosine and L-phenylalanine, with the most likely biosynthetic pathway involving cyclization and further modification of the tripeptide Phe-Tyr-Tyr (“FYY”). Therefore, we searched the ctenophore transcriptome data for genes with the short peptide “FYY” as part of their coding sequence. We recovered a group of candidate genes for coelenterazine biosynthesis in the luminous species which encode a set of highly conserved non-heme iron oxidases similar to isopenicillin-N-synthase. These genes were absent in the transcriptomes and genome of the two non-luminous species. Pairwise identities and substitution rates reveal an unusually high degree of identity even between the most distantly related species. Additionally, two related groups of non-heme iron oxidases were found across all ctenophores, including those which are non-luminous, arguing against the involvement of these two gene groups in luminescence. Important residues for iron-binding are conserved across all proteins in the three groups, suggesting this function is still present. Given that other known members of this protein superfamily are involved in heterocycle formation, we consider these genes to be top candidates for laboratory characterization or gene knockouts in the investigation of coelenterazine biosynthesis.
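
The search for the short peptide “FYY” can be sketched as a plain substring scan over translated sequences. This is a minimal illustration, assuming the transcripts have already been translated to protein and stored as FASTA text; the record names are hypothetical and the abstract does not describe the authors' actual search procedure.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {record_id: sequence}."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            current_id = line[1:].split()[0]
            records[current_id] = []
        elif current_id is not None:
            records[current_id].append(line.upper())
    return {rid: "".join(parts) for rid, parts in records.items()}


def find_motif(records, motif="FYY"):
    """Return {record_id: [0-based start positions]} for records with the motif."""
    hits = {}
    for rid, seq in records.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)
        if positions:
            hits[rid] = positions
    return hits


# Hypothetical protein records, not real ctenophore sequences.
example = """>protA hypothetical
MKTLFYYGA
>protB hypothetical
MKTLAAGA
>protC hypothetical
FYYFYY
"""
hits = find_motif(parse_fasta(example))
print(hits)
```

Because a three-residue motif occurs by chance in many proteins, hits from a scan like this are only a starting list that needs further filtering, for example by comparing luminous and non-luminous species as the abstract describes.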