2 research outputs found

A semi-automatic methodology for localization of short mitochondrial genes in long sequences

Author: Ana L C Bazzan
José Humberto Machado Tambor
Luciana Campos Paulino
Rafael Santos
Publication date: 06/03/2020

Abstract. Identifying short genes in long sequences using similarity measures (e-values and scores in BLAST queries) can be difficult in mitochondrial genomes, since the similarity results for some genes can be overshadowed by neighboring matches with higher similarity values. The same can happen for genes with relatively low similarity that are nevertheless of interest in a particular study. To locate and identify those genes, the similarity search results must be analyzed manually, which can be time-consuming and error-prone. In this report we present a methodology that helps researchers locate those genes by semi-automatically masking subsequences corresponding to genes that have already been identified and limiting subsequent searches to the regions that did not produce any result in previous steps. A tool that implements this methodology was created and used in several database searches with a sequence obtained from a mitochondrial genome. We expected that analysis using this tool would be easier, if not faster, than the manual analysis. Some results obtained with the tool are presented and compared with results obtained by manual analysis of BLAST similarity results. As expected, the proposed tool did not produce new results (i.e., results different from those found in the manual analysis), since both rely on the same search mechanism, input, and parameters, but the results were clearer in the sense of not being cluttered with similar results, and the shorter genes could be located more easily in the final similarity report. Some comments on the classification of this tool as a software agent are also given, and suggestions for improving the methodology and tool are presented.
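The mask-and-search-again idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' tool: the function names `mask_regions` and `unmasked_regions`, the use of `N` as the masking character, and the half-open `(start, end)` hit coordinates are all assumptions made for illustration, and the similarity search itself (for example, a BLAST query) is left out.

```python
def mask_regions(sequence, hits, mask_char="N"):
    """Return a copy of `sequence` with every (start, end) hit masked.

    Hits are assumed to be 0-based, half-open intervals, e.g. taken from
    the coordinates of genes that were already identified.
    """
    chars = list(sequence)
    for start, end in hits:
        # Clamp to the sequence bounds so out-of-range hits are harmless.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


def unmasked_regions(sequence, mask_char="N", min_length=1):
    """Return the (start, end) intervals that are still unmasked.

    Only these regions would be submitted to the next search round.
    """
    regions = []
    start = None
    for i, char in enumerate(sequence):
        if char != mask_char:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    if start is not None and len(sequence) - start >= min_length:
        regions.append((start, len(sequence)))
    return regions


# Hypothetical example: one gene was already identified at positions 4..9.
sequence = "ACGTACGTACGTACGTACGT"
masked = mask_regions(sequence, [(4, 10)])
remaining = unmasked_regions(masked)
```

In a real workflow the sequence may already contain ambiguous `N` bases, so a separate Boolean mask or a different masking convention would be safer than the single-character approach assumed here.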

CiteSeerX

Analysis and Functional Annotation of an Expressed Sequence Tag Collection for Tropical Crop Sugarcane

Author: Angella Aline
Arruda Paulo
Bacci Maurício
Braga Marilia D.V.
Camargo Luis E.A.
Cara Frank A.A.
Carraro Dirce M.
Carrer Helaine
Colombo Carlos A.
Coscrato Virginia E.
Coutinho Luiz L.
da Silva Aline M.
da Silva Felipe R.
de Araujo Paula G.
de Oliveira Regina C.
de Rosa Vicente E.
Di Mauro Sonia M.Z.
Ferro Jesus A.
Ferro Maria Inês T.
França Suzelei C.
Giglioti Éder A.
Goldman Gustavo H.
Goldman Maria Helena S.
Gomes Suely L.
Grivet Laurent
Henrique-Silva Flavio
Kemper Edson L.
Kuramae Eiko E.
Lemos Eliana G.M.
Lemos Manoel V.F.
Lima Marleide M.A.
Machado Marcos A.
Marini Danyelle C.
Marino Celso L.
Martins Vanderlei G.
Meidanis João
Menck Carlos F.M.
Monteiro-Vitorello Claudia B.
Nobrega Francisco G.
Nobrega Marina P.
Nunes Luiz R.
Pedrosa Guilherme
Roberto Patrícia G.
Rossi Magdalena
Santelli Roberto V.
Sculaccio Susana A.
Silveira Henrique C.S.
Siqueira Walter J.
Siviero Fábio
Souza Glaucia M.
Tambor José H.M.
Targon Maria L.P.N.
Telles Guilherme P.
Thiemann Otavio H.
Truffi Daniela
Van Sluys Marie-Anne
Vettore André L.
Publication venue: Cold Spring Harbor Laboratory Press
Publication date: 01/12/2003

To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1,415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembly. This indicated that possibly 33,620 unique genes had been identified and that >90% of the sugarcane expressed genes were tagged.
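The percentages quoted in this abstract can be cross-checked from its own counts. The short Python sketch below only recomputes two of them; the variable names are illustrative, and the assumption that redundancy means the share of transcripts removed by reassembly is ours, not stated in the abstract.

```python
# Counts taken directly from the abstract.
transcripts = 43_141   # putative transcripts after the first assembly
full_length = 14_409   # assembled sequences with a full-length insert
unique_genes = 33_620  # possible unique genes after reassembly

# Share of assembled sequences with at least one full-length clone.
full_length_share = full_length / transcripts

# Redundancy implied by the drop from transcripts to unique genes
# (assumed definition: fraction of transcripts removed by reassembly).
implied_redundancy = 1 - unique_genes / transcripts

print(f"full-length share: {full_length_share:.1%}")
print(f"implied redundancy: {implied_redundancy:.1%}")
```

Both values round to the figures given in the abstract (33% and 22%).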

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

PubMed Central

Repositorio da Producao Cientifica e Intelectual da Unicamp