2,101 research outputs found

    XenDB: Full length cDNA prediction and cross species mapping in Xenopus laevis

    BACKGROUND: Research using the model system Xenopus laevis has provided critical insights into the mechanisms of early vertebrate development and cell biology. Large scale sequencing efforts have provided an increasingly important resource for researchers. To take full advantage of the available sequence, we have analyzed 350,468 Xenopus laevis Expressed Sequence Tags (ESTs) both to identify full length protein encoding sequences and to develop a unique database system to support comparative approaches between X. laevis and other model systems. DESCRIPTION: Using a suffix array based clustering approach, we have identified 25,971 clusters and 40,877 singleton sequences. Generation of a consensus sequence for each cluster resulted in 31,353 tentative contig and 4,801 singleton sequences. Using both BLASTX and FASTY comparison to five model organisms and the NR protein database, more than 15,000 sequences are predicted to encode full length proteins, and these have been matched to publicly available IMAGE clones where available. Each sequence has been compared to the KOG database, and ~67% of the sequences have been assigned a putative functional category. Based on sequence homology to mouse and human, putative GO annotations have been determined. CONCLUSION: The results of the analysis have been stored in a publicly available database, XenDB. A unique capability of the database is the ability to batch upload cross species queries to identify potential Xenopus homologues and their associated full length clones. Examples are provided, including mapping of microarray results and application of 'in silico' analysis. The ability to quickly translate the results of various species into 'Xenopus-centric' information should greatly enhance comparative embryological approaches. Supplementary material can be found at

    GeneHopper: a web-based search engine to link gene-expression platforms through GenBank accession numbers

    Global gene-expression analysis is carried out using different technologies that are either array- or sequence-tag-based. To compare experiments that are performed on these different platforms, array probes and sequence tags need to be linked. An additional challenge is cross-referencing between species, to compare human profiles with those obtained in a mouse model, for example. We have developed the web-based search engine GeneHopper to link different expression resources based on UniGene clusters and HomoloGene orthologs databases of the National Center for Biotechnology Information (NCBI)
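
The linking idea described in this abstract can be illustrated with a minimal Python sketch. It is not GeneHopper's implementation: the accession names, cluster IDs, ortholog-group IDs, and the function `link_accessions` are all hypothetical, and the two dictionaries stand in for mappings that would in practice be loaded from UniGene and HomoloGene data. Two accessions are linked within a species when they fall in the same cluster, and across species when their clusters belong to the same ortholog group.

```python
from collections import defaultdict

# Hypothetical toy mappings (assumed for illustration only).
ACCESSION_TO_CLUSTER = {
    "ACC_H1": "Hs.1",  # human array probe accession
    "ACC_H2": "Hs.1",  # human sequence-tag accession for the same gene
    "ACC_H3": "Hs.2",
    "ACC_M1": "Mm.7",  # mouse array probe accession
}
CLUSTER_TO_ORTHOLOG_GROUP = {"Hs.1": "HG_10", "Mm.7": "HG_10", "Hs.2": "HG_11"}


def link_accessions(queries, targets, cross_species=False):
    """Map each query accession to the target accessions sharing its cluster.

    With cross_species=True the shared key is the ortholog group instead,
    so accessions from different species can be linked.
    """
    def key(accession):
        cluster = ACCESSION_TO_CLUSTER.get(accession)
        if cluster is None:
            return None
        if cross_species:
            return CLUSTER_TO_ORTHOLOG_GROUP.get(cluster)
        return cluster

    # Index the target platform by its linking key.
    index = defaultdict(list)
    for accession in targets:
        k = key(accession)
        if k is not None:
            index[k].append(accession)

    # Unknown accessions simply map to an empty list.
    return {q: sorted(index.get(key(q), [])) for q in queries}
```

Calling `link_accessions(["ACC_H1"], ["ACC_H2", "ACC_H3"])` links the two human accessions that share a cluster, while `cross_species=True` is needed to reach the mouse accession.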

    Generation and analysis of large-scale expressed sequence tags (ESTs) from a full-length enriched cDNA library of porcine backfat tissue

    BACKGROUND: Genome research in farm animals will expand our basic knowledge of the genetic control of complex traits, and the results will be applied in the livestock industry to improve meat quality and productivity, as well as to reduce the incidence of disease. A combination of quantitative trait locus mapping and microarray analysis is a useful approach to reduce the overall effort needed to identify genes associated with quantitative traits of interest. RESULTS: We constructed a full-length enriched cDNA library from porcine backfat tissue. The estimated average size of the cDNA inserts was 1.7 kb, and the cDNA fullness ratio was 70%. In total, we deposited 16,110 high-quality sequences in the dbEST division of GenBank (accession numbers: DT319652-DT335761). For all the expressed sequence tags (ESTs), approximately 10.9 Mb of porcine sequence were generated with an average length of 674 bp per EST (range: 200–952 bp). Clustering and assembly of these ESTs resulted in a total of 5,008 unique sequences with 1,776 contigs (35.46%) and 3,232 singletons (64.54%). Of the 5,008 unique sequences, 3,154 (62.98%) were similar to other sequences, and 1,854 (37.02%) were identified as having no hit or low identity (<95%) and 60% coverage in The Institute for Genomic Research (TIGR) gene index of Sus scrofa. Gene ontology (GO) annotation of unique sequences showed that approximately 31.7, 32.3, and 30.8% were assigned molecular function, biological process, and cellular component GO terms, respectively. A total of 1,854 putative novel transcripts resulted after comparison and filtering with the TIGR SsGI; these included a large percentage of singletons (80.64%) and a small proportion of contigs (13.36%). CONCLUSION: The sequence data generated in this study will provide valuable information for studying expression profiles using EST-based microarrays and assist in the condensation of current pig TCs into clusters representing longer stretches of cDNA sequences. The isolation of genes expressed in backfat tissue is the first step toward a better understanding of backfat tissue on a genomic basis.
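
The summary figures in this abstract can be re-derived from the counts it reports. The short Python check below is only an arithmetic sketch (the helper name `share` is an assumption, not something from the paper). It confirms that contigs and singletons add up to the 5,008 unique sequences, that the singleton share works out to 64.54%, and that 16,110 ESTs at an average of 674 bp give roughly 10.9 Mb.

```python
def share(part, total):
    """Return part/total as a percentage rounded to two decimals."""
    return round(100 * part / total, 2)


unique_sequences = 5008
contigs, singletons = 1776, 3232
with_hit, novel = 3154, 1854

# The two partitions of the unique sequences must each sum to the total.
assert contigs + singletons == unique_sequences
assert with_hit + novel == unique_sequences

contig_pct = share(contigs, unique_sequences)        # 35.46
singleton_pct = share(singletons, unique_sequences)  # 64.54
hit_pct = share(with_hit, unique_sequences)          # 62.98
novel_pct = share(novel, unique_sequences)           # 37.02

# Total sequence volume from the EST count and the mean read length.
total_mb = round(16110 * 674 / 1e6, 1)               # 10.9
```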

    XHM: A system for detection of potential cross hybridizations in DNA microarrays

    BACKGROUND: Microarrays have emerged as the preferred platform for high throughput gene expression analysis. Cross-hybridization among genes with high sequence similarities can be a source of error reducing the reliability of DNA microarray results. RESULTS: We have developed a tool called XHM (cross hybridization on microarrays) for assessment of the reliability of hybridization signals by detecting potential cross-hybridizations on DNA microarrays. This is done by comparing the sequences of the probes against an extensive database representing the transcriptome of the organism in question. XHM is available online at . CONCLUSIONS: Using XHM with its user-adjustable parameters will enable scientists to check their lists of differentially expressed genes from microarray experiments for potential cross-hybridizations. This provides information that may be useful in the validation of the microarray results
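
The idea of comparing probe sequences against a transcriptome can be sketched in a few lines of Python. This is a toy stand-in, not the XHM method: it uses the longest exactly shared stretch (via the standard library's `difflib.SequenceMatcher`) as a crude similarity measure, whereas a real tool would use a proper alignment search. The function names, the data layout, and the `min_stretch` threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def longest_shared_stretch(probe, transcript):
    """Length of the longest contiguous substring common to both sequences."""
    matcher = SequenceMatcher(None, probe, transcript, autojunk=False)
    match = matcher.find_longest_match(0, len(probe), 0, len(transcript))
    return match.size


def flag_cross_hybridization(probes, transcriptome, min_stretch=15):
    """Report probes that share a long stretch with a non-target transcript.

    probes:        {probe_id: (target_gene, probe_sequence)}
    transcriptome: {gene: transcript_sequence}
    Returns {probe_id: [off-target genes]} for flagged probes only.
    """
    flagged = {}
    for probe_id, (target, sequence) in probes.items():
        off_targets = sorted(
            gene
            for gene, transcript in transcriptome.items()
            if gene != target
            and longest_shared_stretch(sequence, transcript) >= min_stretch
        )
        if off_targets:
            flagged[probe_id] = off_targets
    return flagged
```

A probe that is flagged here would be a candidate for closer inspection before its signal is trusted; the threshold plays the role of the user-adjustable parameters mentioned in the abstract.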

    WormBase 2007

    WormBase (www.wormbase.org) is the major publicly available database of information about Caenorhabditis elegans, an important system for basic biological and biomedical research. Derived from the initial ACeDB database of C. elegans genetic and sequence information, WormBase now includes the genomic, anatomical and functional information about C. elegans, other Caenorhabditis species and other nematodes. As such, it is a crucial resource not only for C. elegans biologists but the larger biomedical and bioinformatics communities. Coverage of core areas of C. elegans biology will allow the biomedical community to make full use of the results of intensive molecular genetic analysis and functional genomic studies of this organism. Improved search and display tools, wider cross-species comparisons and extended ontologies are some of the features that will help scientists extend their research and take advantage of other nematode species genome sequences

    Towards the understanding of the cocoa transcriptome: Production and analysis of an exhaustive dataset of ESTs of Theobroma cacao L. generated from various tissues and under various conditions

    Theobroma cacao L. is a tree originating from the tropical rainforest of South America. It is one of the major cash crops for many tropical countries. T. cacao is mainly produced on smallholdings, providing resources for 14 million farmers. Disease resistance and T. cacao quality improvement are two important challenges for all actors of cocoa and chocolate production. T. cacao is seriously affected by pests and fungal diseases, which are responsible for yield losses of more than 40%; quality improvement, both nutritional and organoleptic, is also important for consumers. An international collaboration was formed to develop an EST genomic resource database for cacao. Fifty-six cDNA libraries were constructed from different organs, different genotypes and different environmental conditions. A total of 149,650 valid EST sequences were generated, corresponding to 48,594 unigenes (12,692 contigs and 35,902 singletons). A total of 29,849 unigenes shared significant homology with public sequences from other species. Gene Ontology (GO) annotation was applied to distribute the ESTs among the main GO categories. A specific information system (ESTtik) was constructed to process, store and manage this EST collection, allowing the user to query a database. To check the representativeness of our EST collection, we looked for the genes known to be involved in two different metabolic pathways extensively studied in other plant species and important for T. cacao qualities: the flavonoid and the terpene pathways. Most of the enzymes described in other crops for these two metabolic pathways were found in our EST collection. A large collection of new genetic markers was provided by this EST collection. This EST collection displays a good representation of the T. cacao transcriptome, suitable for analysis of biochemical pathways based on oligonucleotide microarrays derived from these ESTs. It will provide numerous genetic markers that will allow the construction of a high density gene map of T. cacao. This EST collection represents a unique and important molecular resource for T. cacao study and improvement, facilitating the discovery of candidate genes for important T. cacao trait variation. (Author's abstract)

    Genome-wide analysis of splicing related genes and alternative splicing in plants

    The phenomenon of pre-mRNA splicing in eukaryotes has been mostly studied in mammalian and yeast systems. The splicing machinery in plants is thought to be largely conserved relative to animal and fungal organisms. This thesis encompasses systematic studies of splicing-related genes and alternative splicing (AS) in plants. A total of 74 snRNA genes and 395 genes encoding splicing related proteins were identified in Arabidopsis, including the previously elusive U4atac snRNA gene. About 50% of the splicing related genes are duplicated in plants. The duplication ratios for splicing regulators are even higher, indicating that the splicing mechanism is generally conserved among plants, but that the regulation of splicing may be more variable and flexible. Over 30% of the splicing related genes can be alternatively spliced. Overall, about 22% of the expressed genes in both Arabidopsis and rice are alternatively spliced, and in both species about 55% of AS events are intron retention (IntronR). The consistently high frequency of IntronR suggests prevalence of splice site recognition by intron definition in plants. 40% of Arabidopsis AS genes are also alternatively spliced in rice, with some examples strongly suggesting a role of the AS event as an evolutionarily conserved mechanism of post-transcriptional regulation. U2AF is an essential splicing factor in animals. The two copies of Arabidopsis U2AF1 (AUSa and AUSb) were experimentally characterized as a case study. AUSa is expressed at a higher level than AUSb in most tissues. Altered expression levels of AUSa or AUSb cause pleiotropic phenotypes and splicing pattern changes for some pre-mRNAs, indicating the importance of AUSa/b for correct splice site recognition. A novel C-terminal domain (SERE) is highly conserved in all seed plant U2AF1 homologs, suggesting an important function specific to higher plants. Altogether, similarities as well as differences were revealed between the splicing mechanisms in plants and mammals, demonstrating that organisms have evolved special mechanisms to ensure efficient and accurate splicing in different environments. Two databases (Arabidopsis Splicing Related Genes (ASRG), http://www.plantgdb.org/SRGD/ASRG/, and Alternative Splicing in Plants (ASIP), http://www.plantgdb.org/ASIP/) were constructed for the community to use and will facilitate studies of plant splicing mechanisms.

    Derivation of species-specific hybridization-like knowledge out of cross-species hybridization results

    BACKGROUND: One of the approaches for conducting genomics research in organisms without extant microarray platforms is to profile their expression patterns by using Cross-Species Hybridization (CSH). Several different studies using spotted microarrays and CSH produced contradictory conclusions about the ability of CSH to reflect biological processes described by species-specific hybridization (SSH). RESULTS: We used a tomato-spotted cDNA microarray to examine the ability of CSH to reflect SSH data. Potato RNA was hybridized to spotted cDNA tomato and potato microarrays to generate CSH and SSH data, respectively. Difficulties arose in obtaining transcriptomic data from CSH that reflected those obtained from SSH. Nevertheless, once the data were restricted to matching probe sets by applying appropriate probe-homology cutoffs, the CSH transcriptome data reflected the SSH data more closely. CONCLUSIONS: This study evaluated the relative performance of CSH compared to SSH, and proposes methods to ensure that CSH closely reflects the biological process analyzed by SSH.

    OryzaPG-DB: Rice Proteome Database based on Shotgun Proteogenomics

    BACKGROUND: Proteogenomics aims to utilize experimental proteome information for refinement of genome annotation. Since mass spectrometry-based shotgun proteomics approaches provide large-scale peptide sequencing data with high throughput, a data repository for shotgun proteogenomics would represent a valuable source of gene expression evidence at the translational level for genome re-annotation. DESCRIPTION: Here, we present OryzaPG-DB, a rice proteome database based on shotgun proteogenomics, which incorporates the genomic features of experimental shotgun proteomics data. This version of the database was created from the results of 27 nanoLC-MS/MS runs on a hybrid ion trap-orbitrap mass spectrometer, which offers high accuracy for analyzing tryptic digests from undifferentiated cultured rice cells. Peptides were identified by searching the product ion spectra against the protein, cDNA, transcript and genome databases from Michigan State University, and were mapped to the rice genome. Approximately 3200 genes were covered by these peptides and 40 of them contained novel genomic features. Users can search, download or navigate the database per chromosome, gene, protein, cDNA or transcript and download the updated annotations in standard GFF3 format, with visualization in PNG format. In addition, the database scheme of OryzaPG was designed to be generic and can be reused to host similar proteogenomic information for other species. OryzaPG is the first proteogenomics-based database of the rice proteome, providing peptide-based expression profiles, together with the corresponding genomic origin, including the annotation of novelty for each peptide. CONCLUSIONS: The OryzaPG database was constructed and is freely available at http://oryzapg.iab.keio.ac.jp/.

    Making sense of EST sequences by CLOBBing them

    BACKGROUND: Expressed sequence tags (ESTs) are single pass reads from randomly selected cDNA clones. They provide a highly cost-effective method to access and identify expressed genes. However, they are often prone to sequencing errors and typically define incomplete transcripts. To increase the amount of information obtainable from ESTs and reduce sequencing errors, it is necessary to cluster ESTs into groups sharing significant sequence similarity. RESULTS: As part of our ongoing EST programs investigating 'orphan' genomes, we have developed a clustering algorithm, CLOBB (Cluster on the basis of BLAST similarity), to identify and cluster ESTs. CLOBB may be used incrementally, preserving original cluster designations. It tracks cluster-specific events such as merging, identifies 'superclusters' of related clusters and avoids the expansion of chimeric clusters. Based on the Perl scripting language, CLOBB is highly portable, relying only on a local installation of NCBI's freely available BLAST executable, and can be usefully applied to >95% of the current EST datasets. Analysis of the Danio rerio EST dataset demonstrates that CLOBB compares favourably with two less portable systems, UniGene and TIGR Gene Indices. CONCLUSIONS: CLOBB provides a highly portable EST clustering solution and can be freely downloaded from: http://www.nematodes.org/CLOB
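
Grouping ESTs by pairwise similarity can be illustrated with a minimal single-linkage sketch in Python. This is not the CLOBB algorithm: CLOBB is written in Perl, works incrementally, and includes handling for superclusters and chimeric clusters that is not reproduced here. The sketch assumes that pairwise hits have already been computed (for example by a BLAST run) and are supplied as tuples. The function name `cluster_by_hits` and both thresholds are hypothetical.

```python
def cluster_by_hits(seq_ids, hits, min_identity=95.0, min_overlap=30):
    """Single-linkage clustering of sequences from pairwise similarity hits.

    seq_ids: iterable of sequence identifiers
    hits:    iterable of (id_a, id_b, percent_identity, overlap_length)
    Returns a sorted list of clusters, each a sorted list of identifiers.
    """
    seq_ids = list(seq_ids)
    parent = {s: s for s in seq_ids}  # union-find forest

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, overlap in hits:
        # Only hits passing both (assumed) thresholds join two clusters.
        if identity >= min_identity and overlap >= min_overlap:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for s in seq_ids:
        clusters.setdefault(find(s), []).append(s)
    return sorted(sorted(members) for members in clusters.values())
```

Sequences with no qualifying hit remain as singleton clusters, which mirrors the contig/singleton split reported in EST studies. Plain single linkage will happily chain unrelated sequences through a chimeric read, which is the kind of problem a production tool has to guard against.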