Screening non-coding RNAs in transcriptomes from neglected species using PORTRAIT: case study of the pathogenic fungus Paracoccidioides brasiliensis
Background: Transcriptome sequences complement structural genomic information and provide snapshots of an organism's transcriptional profile. Such sequences also represent an alternative method for characterizing neglected species that are not expected to undergo whole-genome sequencing. One difficulty for transcriptome sequencing of these organisms is the low quality of reads and incomplete coverage of transcripts, both of which compromise further bioinformatics analyses. Another complicating factor is the lack of known protein homologs, which frustrates searches against established protein databases. This lack of homologs may be caused by divergence from well-characterized and over-represented model organisms. Another explanation is that non-coding RNAs (ncRNAs) may be caught during sequencing. NcRNAs are RNA sequences that, unlike messenger RNAs, do not code for protein products and instead perform unique functions by folding into higher-order structural conformations. ncRNA screening software specific to transcriptome sequences is available, but its analyses are optimized for transcriptomes that are well represented in protein databases, and it also assumes that input ESTs are full-length and of high quality. Results: We propose an algorithm called PORTRAIT, which is suitable for ncRNA analysis of transcriptomes from poorly characterized species. Sequences are translated by software that is resistant to sequencing errors, and the predicted putative proteins, along with their source transcripts, are evaluated for coding potential by a support vector machine (SVM). Either of two SVM models may be employed: if a putative protein is found, a protein-dependent SVM model is used; if it is not found, a protein-independent SVM model is used instead. Only ab initio features are extracted, so that no homology information is needed. We illustrate the use of PORTRAIT by predicting ncRNAs from the transcriptome of the pathogenic fungus Paracoccidioides brasiliensis and five other related fungi. Conclusion: PORTRAIT can be integrated into pipelines, and provides a low computational cost solution for ncRNA detection in transcriptome sequencing projects.
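The choice between the two SVM models described in the PORTRAIT abstract can be illustrated with a minimal Python sketch. It is not the PORTRAIT implementation: `longest_orf` is a naive, hypothetical stand-in for the error-resistant translator (it is not tolerant of sequencing errors), `MIN_CODONS` is an assumed cutoff, and `select_model` only returns the name of the model that would be applied rather than running an SVM.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_CODONS = 10  # assumed cutoff for calling a "putative protein" (hypothetical)


def longest_orf(seq: str) -> Optional[str]:
    """Naive stand-in for the translator: return the longest ATG..stop
    stretch (as nucleotides) in the three forward frames, or None."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate reading frame
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # close the frame and keep scanning
    return best if len(best) >= 3 * MIN_CODONS else None


def select_model(transcript: str) -> str:
    """Pick the model by whether a putative protein was found."""
    if longest_orf(transcript) is not None:
        return "protein-dependent"
    return "protein-independent"
```

For example, `select_model("ATG" + "GCT" * 12 + "TAA")` selects the protein-dependent model, while a transcript with no open reading frame falls back to the protein-independent one.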
Examples of sequence conservation analyses capture a subset of mouse long non-coding RNAs sharing homology with fish conserved genomic elements
Background: Long non-coding RNAs (lncRNAs) are a major class of non-coding RNAs. They are involved in diverse intracellular mechanisms such as molecular scaffolding, splicing and DNA methylation. Through these mechanisms they are reported to play a role in cellular differentiation and development. They show enriched expression in the brain, where they are implicated in maintaining cellular identity, homeostasis, stress responses and plasticity. Low sequence conservation and a lack of functional annotations make it difficult to identify homologs of mammalian lncRNAs in other vertebrates. A computational evaluation of the lncRNAs through systematic conservation analyses of both their sequences and their genomic architecture is required. Results: Our results show that a subset of mouse candidate lncRNAs could be distinguished from random sequences based on their alignment with zebrafish phastCons elements. Using ROC analyses we were able to define a measure to select significantly conserved lncRNAs. Indeed, starting from ~2,800 mouse lncRNAs we could predict that between 4 and 11% present conserved sequence fragments in fish genomes. Gene ontology (GO) enrichment analyses of protein-coding genes proximal to the region of conservation highlighted similar GO classes in both organisms, such as regulation of transcription and central nervous system development. The proximal coding genes in both species show enriched expression in the brain. In summary, we show that interesting genomic regions in zebrafish could be marked based on their sequence homology to a mouse lncRNA, overlap with ESTs and proximity to genes involved in nervous system development. Conclusions: Conservation at the sequence level can identify a subset of putative lncRNA orthologs. The similar protein-coding neighborhood and transcriptional information about the conserved candidates support the hypothesis that they share functional homology.
The pipeline presented here is a proof of principle showing that between 4 and 11% of lncRNAs retain regions of conservation between mammals and fish. We believe this study will prove useful as a reference for analyzing the conservation of lncRNAs in newly sequenced genomes and transcriptomes. © 2013 Basu et al.; licensee BioMed Central Ltd.
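The use of ROC analysis to separate conserved lncRNAs from random sequences can be illustrated with a small Python sketch. The abstract does not say how the cutoff was chosen; picking the threshold that maximizes TPR minus FPR (Youden's J) is an assumption made here for illustration, and the scores in the usage note are invented toy values, not data from the study.

```python
from typing import List, Sequence, Tuple


def roc_points(pos: Sequence[float], neg: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, TPR, FPR) for every observed score,
    calling a sequence 'conserved' when its score is >= threshold."""
    points = []
    for t in sorted(set(pos) | set(neg)):
        tpr = sum(s >= t for s in pos) / len(pos)  # lncRNAs kept
        fpr = sum(s >= t for s in neg) / len(neg)  # random sequences kept
        points.append((t, tpr, fpr))
    return points


def best_threshold(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Assumed selection rule: maximize Youden's J = TPR - FPR."""
    return max(roc_points(pos, neg), key=lambda p: p[1] - p[2])[0]
```

With toy alignment scores `lnc = [0.9, 0.8, 0.7, 0.6]` for lncRNAs and `rnd = [0.5, 0.2, 0.1, 0.05]` for random sequences, `best_threshold(lnc, rnd)` picks the lowest score that still excludes every random sequence.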
De novo assembly and transcriptome analysis of the Mediterranean fruit fly Ceratitis capitata early embryos
The agricultural pest Ceratitis capitata, also known as the Mediterranean fruit fly or Medfly, belongs to the Tephritidae family, which includes a large number of other damaging pest species. The Medfly was the first non-drosophilid fly species to be genetically transformed, paving the way for the design of genetic-based pest control strategies. Furthermore, it is an experimentally tractable model in which transient and transgene-mediated RNAi have been used successfully. We applied Illumina sequencing to total RNA preparations of 8-10-hour-old embryos of C. capitata. This developmental window corresponds to the blastoderm cellularization stage. In summary, we assembled 42,614 transcripts, which cluster into 26,319 unique transcripts, of which 11,045 correspond to protein-coding genes; we identified several hundred long ncRNAs; and we found an enrichment of transcripts encoding RNA-binding proteins among the highly expressed transcripts, such as CcTRA-2, known to be necessary to establish and, most likely, to maintain the female sex of C. capitata. Our study is the first de novo assembly performed for Ceratitis capitata based on Illumina NGS technology during embryogenesis, and it adds novel data to the previously published C. capitata EST databases. We expect that it will be useful for a variety of applications such as gene cloning and phylogenetic analyses, as well as to advance genetic research and biotechnological applications in the Medfly and other related Tephritidae.
Importance of the amino-terminal region of the RolA protein from Agrobacterium rhizogenes for its biological activity.
The RolA protein originates from Agrobacterium rhizogenes, a phytopathogenic bacterium that causes the disease known as hairy root. RolA acts in the infection process and is encoded in the T-DNA of the agrobacterial Ri plasmid. Although RolA expression in plants causes severe morphological and physiological alterations, its mechanism of action is unknown. Identifying the region of RolA responsible for its function, as well as its subcellular localization, is important for elucidating its biological role. To address this question, translational fusions of the complete and truncated RolA coding region with the coding region of the enzyme β-glucuronidase were constructed under the control of the CaMV35S promoter. Tobacco plants transformed with these translational fusions show that the chimeric protein containing the complete RolA (RolA(100)::Gus) is able to induce the characteristic rolA phenotype. Plants expressing the protein fusion in which only the first 60 amino acid residues of RolA are present (RolA(N-60)::Gus) show a rolA phenotype that is attenuated relative to the complete protein. The smallest RolA segment able to induce any morphological alteration is the one containing the first 37 amino acid residues (RolA(N-37)::Gus), which induces slight leaf wrinkling in some tobacco plants. When only the 63 amino acid residues of the C-terminal region of RolA are present (RolA(C-63)::Gus), the transgenic plants show no morphological alteration. The attenuated rolA phenotype induced by the RolA(N-60)::Gus protein could be explained by the absence of the 40 amino acid residues of the C-terminal region, by lower accumulation of the protein relative to RolA(100)::Gus, or by both. The N-terminal region of RolA is extremely conserved when compared with three other RolA proteins from different strains of A. rhizogenes, and it has a secondary structure in α-helix conformation, compatible with transmembrane motifs. All the RolA::Gus chimeric proteins have active β-glucuronidase, which makes cytochemical assays aimed at their subcellular localization feasible. These results suggest that the N-terminal portion of RolA plays an important role in its biological function. bitstream/CENARGEN/26827/1/cot142.pd