Data Mining for Simple Sequence Repeats in Oil Palm Expressed Sequence Tags
Expressed Sequence Tags (ESTs) are short pieces of DNA sequence generated by sequencing one or both ends of an expressed gene. ESTs provide researchers with a quick and inexpensive route for discovering new genes, for obtaining data on gene expression and regulation, and for constructing genome maps. Oil palm EST sequences available in the public domain were downloaded, clustered, and assembled into contigs using CAP3 and Phrap. Microsatellite repeats were located using five programs (MISA, TRA, TROLL, SSRIT and SSR Primer). Of the five, MISA performed best, as it can also detect compound repeats. The frequency of SSRs was determined, and a total of 202 SSRs were detected. Mononucleotide repeats, especially A/T repeats, were the most abundant in oil palm. Flanking primers were designed using Primer3 and SSR Primer. The results of the study are provided as an online database, ‘MEMCO’, to help oil palm researchers.
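The repeat-detection step can be illustrated with a minimal Python sketch of MISA-style perfect-SSR search. It is not MISA itself: the minimum repeat counts in MIN_REPEATS and the function name find_perfect_ssrs are assumptions for illustration, and neighbouring repeats are not merged into compound SSRs.

```python
import re
from typing import List, Tuple

# Minimum number of repeat units per motif length (1 = mono ... 6 = hexa).
# These MISA-like thresholds are an assumption, not the settings used for MEMCO.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_perfect_ssrs(seq: str) -> List[Tuple[str, int, int, int]]:
    """Return (motif, start, end, repeat_count) for perfect tandem repeats.

    Motif lengths are tried from shortest to longest. A match that overlaps a
    region already claimed by a shorter motif is skipped entirely; this is a
    simplification, since MISA instead reports such neighbours as compound SSRs.
    """
    seq = seq.upper()
    claimed = [False] * len(seq)
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # One motif of `unit` bases followed by at least (min_rep - 1) copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit (e.g. "ATAT").
            if any(motif == motif[:k] * (unit // k)
                   for k in range(1, unit) if unit % k == 0):
                continue
            start, end = m.start(), m.end()
            if any(claimed[start:end]):
                continue
            for i in range(start, end):
                claimed[i] = True
            hits.append((motif, start, end, (end - start) // unit))
    return sorted(hits, key=lambda h: h[1])


print(find_perfect_ssrs("GC" + "A" * 12 + "GCGT" + "CAG" * 6 + "TTGC"))
```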
Generation and analysis of expressed sequence tags from Botrytis cinerea
Botrytis cinerea is a filamentous plant pathogen that infects a wide range of plant species, and its infection can cause enormous damage both during plant growth and in the post-harvest phase. We have constructed a cDNA library from an isolate of B. cinerea and sequenced 11,482 expressed sequence tags, which were assembled into 1,003 contig sequences and 3,032 singletons. Approximately 81% of the unigenes showed significant similarity to genes coding for proteins with known functions: more than 50% of the sequences encode proteins involved in cellular metabolism, 12% in transport of metabolites, and approximately 10% in cellular organization. Other functional categories include responses to biotic and abiotic stimuli, cell communication, cell homeostasis, and cell development. We carried out pairwise comparisons with fungal databases to identify the subset of B. cinerea unigenes with relevant similarity to genes in other pathogenic fungi. Among the 4,035 non-redundant B. cinerea unigenes, 1,338 (23%) have significant homology with Fusarium verticillioides unigenes. Similar values were obtained for Saccharomyces cerevisiae and Aspergillus nidulans (22% and 24%, respectively). Lower percentages of homology were found with Magnaporthe grisea and Neurospora crassa (13% and 19%, respectively). Several genes with known or putative roles in fungal virulence and general pathogenicity were identified. The results provide important information for future research on this fungal pathogen.
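Percentages such as those above are typically derived by counting the unigenes that have at least one significant hit in a comparison database. The sketch below assumes BLAST+ tabular output (-outfmt 6) and a hypothetical E-value cutoff of 1e-5; the abstract states neither the search settings nor the threshold, and unigenes_with_hits and percent_with_hits are illustrative names.

```python
import csv
from typing import Iterable, Set

# Hypothetical E-value cutoff; the abstract does not state the threshold
# used to call a hit "significant".
EVALUE_CUTOFF = 1e-5


def unigenes_with_hits(blast_tab: Iterable[str], cutoff: float = EVALUE_CUTOFF) -> Set[str]:
    """Collect query IDs with at least one hit at or below `cutoff`.

    Expects BLAST+ tabular lines (-outfmt 6): column 1 is the query ID and
    column 11 is the E-value. Blank lines and '#' comment lines are skipped.
    """
    hits = set()
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        if float(row[10]) <= cutoff:
            hits.add(row[0])
    return hits


def percent_with_hits(n_hits: int, n_unigenes: int) -> float:
    """Share of unigenes with a significant hit, as a percentage."""
    return 100.0 * n_hits / n_unigenes
```

Counting distinct query IDs rather than hit lines matters here: one unigene often matches several database entries, and each should be counted once.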
Single nucleotide polymorphisms from Theobroma cacao expressed sequence tags associated with witches' broom disease in cacao
To improve the efficiency of selecting cacao trees resistant to witches' broom disease, which is caused by Moniliophthora perniciosa (Tricholomataceae), we looked for molecular markers that could help identify resistant cacao genotypes. Among the different markers useful for marker-assisted selection, single nucleotide polymorphisms (SNPs) are the most common type of sequence difference between alleles and can be easily detected by in silico analysis of expressed sequence tag libraries. We report the first bioinformatic detection and analysis of SNPs in expressed sequence tags from the cacao-M. perniciosa interaction. Selection based on analysis of these SNPs should be useful for developing cacao varieties resistant to this devastating disease. (Author's abstract)
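In silico SNP discovery from EST libraries generally relies on redundancy: a base difference seen in several independent ESTs is less likely to be a sequencing error. This minimal sketch assumes the ESTs of one contig are already aligned to equal length; candidate_snps and its min_allele_count threshold of 2 are hypothetical and are not the authors' pipeline.

```python
from collections import Counter
from typing import Dict, List


def candidate_snps(aligned: List[str], min_allele_count: int = 2) -> Dict[int, Dict[str, int]]:
    """Report alignment columns where two or more bases each occur in at
    least `min_allele_count` reads.

    Gaps ('-') and ambiguous bases such as 'N' are ignored when counting.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    snps = {}
    for col in range(length):
        counts = Counter(s[col].upper() for s in aligned if s[col].upper() in "ACGT")
        # Keep only alleles supported by enough reads to rule out single-read errors.
        alleles = {base: n for base, n in counts.items() if n >= min_allele_count}
        if len(alleles) >= 2:
            snps[col] = alleles
    return snps
```

In this scheme a difference present in only one EST (for example a single A among T's) is not reported, which trades some sensitivity for fewer false positives from sequencing errors.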
EST analysis of gene expression in early cleavage-stage sea urchin embryos
A set of 956 expressed sequence tags derived from 7-hour (mid-cleavage) sea urchin embryos was analyzed to assess biosynthetic functions and to illuminate the structure of the message population at this stage. About a quarter of the expressed sequence tags represented repetitive sequence transcripts typical of early embryos, or ribosomal and mitochondrial RNAs, while a majority of the remainder contained significant open reading frames. A total of 232 sequences, including 153 different proteins, produced significant matches when compared against GenBank. The majority of these identified sequences represented ‘housekeeping’ proteins, i.e., cytoskeletal proteins, metabolic enzymes, transporters and proteins involved in cell division. The most interesting finds were components of signaling systems and transcription factors not previously reported in early sea urchin embryos, including components of the Notch and TGF-β signal transduction pathways. As expected from earlier kinetic analyses of the embryo mRNA populations, no very prevalent protein-coding species were encountered; the most highly represented such sequences were cDNAs encoding cyclins A and B. The frequency of occurrence of all sequences within the database was used to construct a sequence prevalence distribution. The result, confirming earlier mRNA population analyses, indicated that the poly(A) RNA of the early embryo consists mainly of a very complex set of low-copy-number transcripts.
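A sequence prevalence distribution of this kind counts how many distinct sequences occur once, twice, and so on in the EST set. The sketch assumes each EST has already been assigned a cluster or sequence identifier; prevalence_distribution is a hypothetical helper, not the authors' code.

```python
from collections import Counter
from typing import Dict, Iterable


def prevalence_distribution(cluster_ids: Iterable[str]) -> Dict[int, int]:
    """Map each occurrence count k to the number of distinct sequences seen k times.

    `cluster_ids` holds one identifier per EST, e.g. from grouping ESTs by
    sequence identity.
    """
    per_sequence = Counter(cluster_ids)  # sequence -> number of ESTs
    # Count how many sequences share each occurrence count, sorted by k.
    return dict(sorted(Counter(per_sequence.values()).items()))


print(prevalence_distribution(["cycB", "cycA", "cycB", "actin", "x1", "x2", "cycB", "cycA"]))
```

A distribution dominated by k = 1 is what a complex population of low-copy-number transcripts would be expected to produce.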
Characterization of Expressed Sequence Tags (ESTs) from Stylophora pistillata
Coral reefs are the most productive marine ecosystems, with the highest species diversity. Over recent decades, reefs have faced multiple global and anthropogenic stressors, leading to bleaching and the death of entire reefs.
Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius
Despite its economic, cultural, and biological importance, there has not been a large-scale sequencing project to date for Camelus dromedarius. With the goal of sequencing the complete genome of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera checking, repeat masking, clustering and assembly, we obtained 23,602 putative gene sequences, of which over 4,500 potentially novel or fast-evolving gene sequences show no homology to other available genomes. Sequences with similarity to entries in nucleotide and protein databases were functionally annotated using Gene Ontology classification. Comparison to available full-length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that are homologous to known genes show that more than 80% of the contigs have an ORF > 300 bp and that ~40% of hits extend to the start codons of full-length cDNAs, suggesting successful characterization of camel genes. Similarity analyses were performed separately for different organisms, including human, mouse, bovine, and rat. An accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools, with the possibility of adding sequences from the public domain. We anticipate that our results will provide a home base for genomic studies of the camel and for other comparative studies, and a starting point for whole-genome sequencing of the organism.
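The ORF criterion above (ORF > 300 bp) can be checked with a short scan of all six reading frames. This sketch assumes the standard genetic code and defines an ORF as running from an ATG to the first in-frame stop codon, with the stop included in the length; longest_orf and has_long_orf are hypothetical names, not CAGBASE tools.

```python
from typing import Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement; characters other than ACGTN are left unchanged."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_orf(seq: str) -> Tuple[int, str, int]:
    """Return (length_bp, strand, frame) of the longest ATG...stop ORF.

    Scans three frames on both strands. Returns (0, "", -1) if no complete
    ORF (start and in-frame stop) exists.
    """
    seq = seq.upper()
    best = (0, "", -1)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    length = i + 3 - start
                    if length > best[0]:
                        best = (length, strand, frame)
                    start = None
    return best


def has_long_orf(seq: str, min_bp: int = 300) -> bool:
    """True if the longest complete ORF is longer than `min_bp`."""
    return longest_orf(seq)[0] > min_bp
```

Open-ended ORFs that run off the end of a contig are ignored here; a real pipeline may want to count them, since ESTs are often truncated.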
Identification of a Novel 81-kDa Component of the Xenopus Origin Recognition Complex
The Xenopus origin recognition complex is essential for chromosomal DNA replication in cell-free extracts. We have immunopurified the Xenopus origin recognition complex with anti-Xorc2 antibodies and analyzed its composition and properties. Xorc2 (p63) is specifically associated with Xorc1 (p115) and up to four additional polypeptides (p81, p78, p45, and p40). The cDNA encoding p81 is highly homologous to various expressed sequence tags from humans and mice encoding a protein of previously unknown function. Immunodepletion of p81 from Xenopus egg extracts, which also results in the removal of Xorc2, completely abolishes chromosomal DNA replication. Thus, p81 appears to play a crucial role in S phase in higher eukaryotes.
Expressed sequence tags from the oomycete fish pathogen Saprolegnia parasitica reveal putative virulence factors
Peer reviewed.
Simple sequence repeats in zebra finch (Taeniopygia guttata) expressed sequence tags: a new resource for evolutionary genetic studies of passerines
Background
Passerines (perching birds) are widely studied across many biological disciplines, including ecology, population biology, neurobiology, behavioural ecology and evolutionary biology. However, understanding the molecular basis of relevant traits is hampered by the paucity of passerine genomic tools. Efforts to address this problem are underway, and the zebra finch (Taeniopygia guttata) will be the first passerine to have its genome sequenced. Here we describe a bioinformatic analysis of zebra finch expressed sequence tag (EST) GenBank entries.
Results
A total of 48,862 ESTs were downloaded from GenBank and assembled into contigs, representing an estimated 17,404 unique sequences. The unique sequence set contained 638 simple sequence repeats (SSRs), or microsatellites, of length ≥20 bp and purity ≥90%, and 144 SSRs of length ≥30 bp. A chromosomal location was predicted for the majority of SSRs by BLAST searches against assembly 2.1 of the chicken genome sequence. The relative exonic location (5' untranslated region, coding region or 3' untranslated region) was predicted for 218 of the SSRs by BLAST search against the ENSEMBL chicken peptide database. Ten loci were examined for polymorphism in two zebra finch populations and in two populations of a distantly related passerine, the house sparrow Passer domesticus. Linkage was confirmed for four loci that were predicted to reside on the passerine homologue of chicken chromosome 7.
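The length and purity thresholds above can be expressed as a simple filter. The abstract does not define purity, so this sketch assumes it is the fraction of bases matching a perfect, ungapped tandem repeat of the motif laid end to end from the first base; under that assumption an indel lowers purity more than a substitution does. repeat_purity and passes_filter are hypothetical helpers.

```python
def repeat_purity(region: str, motif: str) -> float:
    """Fraction of positions in `region` matching a perfect tandem repeat of
    `motif`, laid end to end from the first base with no gaps allowed.

    This ungapped definition of purity is an assumption for illustration.
    """
    region, motif = region.upper(), motif.upper()
    matches = sum(1 for i, base in enumerate(region) if base == motif[i % len(motif)])
    return matches / len(region)


def passes_filter(region: str, motif: str, min_len: int = 20, min_purity: float = 0.9) -> bool:
    """Apply the length (>= 20 bp) and purity (>= 90%) thresholds."""
    return len(region) >= min_len and repeat_purity(region, motif) >= min_purity


# One substitution in a 20 bp (CA)n repeat gives 19/20 = 95% purity.
print(passes_filter("CA" * 4 + "CT" + "CA" * 5, "CA"))
```

Raising min_len to 30 reproduces the stricter second threshold mentioned above.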
Conclusion
We show that SSRs are abundant within zebra finch ESTs, and that their genomic location can be predicted from sequence similarity with the assembled chicken genome sequence. We demonstrate that a useful proportion of zebra finch EST-SSRs are likely to be polymorphic, and that they can be used to build a linkage map. Finally, we show that many zebra finch EST-SSRs are likely to be useful in evolutionary genetic studies of other passerines.
A compression mechanism for sequence databases to improve the efficiency of conventional tools
This paper describes a method to compress molecular biology databases, which contain an increasing proportion of data derived from genome projects. The performance of our tool was tested on various data files from the EMBL nucleotide sequence database. The best compression ratios were achieved on EST (Expressed Sequence Tag) data, typically derived from large-scale sequencing projects. The compression of sequence database updates was tested in combination with the common Unix compression program ‘compress’. Our tool improved the efficiency of ‘compress’ on average by 16
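The abstract does not describe how the tool works. As a generic illustration of why nucleotide data compress well, and not as the authors' method, the sketch below packs A/C/G/T into 2 bits per base, a 4:1 reduction over one ASCII byte per base. Real database entries also contain ambiguity codes such as N, which this sketch rejects; pack and unpack are hypothetical names.

```python
from typing import Tuple

# 2-bit code per base. Only A/C/G/T are supported (assumption): any other
# character, such as the ambiguity code N, raises KeyError in pack().
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_BASE = "ACGT"


def pack(seq: str) -> Tuple[bytes, int]:
    """Pack a nucleotide string into 4 bases per byte; returns (bytes, length)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq.upper()):
        # Base i occupies bits 2*(i % 4) .. 2*(i % 4)+1 of byte i // 4.
        out[i // 4] |= _CODE[base] << (2 * (i % 4))
    return bytes(out), len(seq)


def unpack(data: bytes, length: int) -> str:
    """Inverse of pack(); `length` is needed because the last byte may be partial."""
    return "".join(_BASE[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(length))
```

Storing the length separately is the design choice that lets the final, partially filled byte be decoded unambiguously.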
