171 research outputs found

    PSP_MCSVM: brainstorming consensus prediction of protein secondary structures using two-stage multiclass support vector machines

    Secondary structure prediction is a crucial task for understanding the variety of protein structures and the biological functions they perform. Prediction of secondary structures for new proteins using their amino acid sequences is of fundamental importance in bioinformatics. We propose a novel technique to predict protein secondary structures based on position-specific scoring matrices (PSSMs) and physicochemical properties of amino acids. It is a two-stage approach involving multiclass support vector machines (SVMs) as classifiers for three different structural conformations, viz., helix, sheet and coil. In the first stage, PSSMs obtained from PSI-BLAST and five specially selected physicochemical properties of amino acids are fed into SVMs as features for sequence-to-structure prediction. Confidence values for forming helix, sheet and coil that are obtained from the first-stage SVM are then used in the second-stage SVM for performing structure-to-structure prediction. The two-stage cascaded classifiers (PSP_MCSVM) are trained with proteins from the RS126 dataset. The classifiers are finally tested on target proteins of the ninth Critical Assessment of protein Structure Prediction experiment (CASP9). PSP_MCSVM with the brainstorming consensus procedure performs better than prediction servers such as Predator, DSC and SIMPA96 on randomly selected proteins from the CASP9 targets. The overall performance is found to be comparable with the current state of the art. PSP_MCSVM source code, train-test datasets and supplementary files are freely available in the public domain at http://sysbio.icm.edu.pl/secstruct and http://code.google.com/p/cmater-bioinfo

    Identification and In Vivo Characterization of NvFP-7R, a Developmentally Regulated Red Fluorescent Protein of Nematostella vectensis

    In recent years, the sea anemone Nematostella vectensis has emerged as a critical model organism for comparative genomics and developmental biology. Although Nematostella is a member of the anthozoan cnidarians (known for producing an abundance of diverse fluorescent proteins (FPs)), endogenous patterns of Nematostella fluorescence have not been described and putative FPs encoded by the genome have not been characterized. We described the spatiotemporal expression of endogenous red fluorescence during Nematostella development. Spatially, there are two patterns of red fluorescence, both restricted to the oral endoderm in developing polyps. One pattern is found in long fluorescent domains associated with the eight mesenteries and the other is found in short fluorescent domains situated between tentacles. Temporally, the long domains appear simultaneously at the 12-tentacle stage. In contrast, the short domains arise progressively between the 12- and 16-tentacle stages. To determine the source of the red fluorescence, we used bioinformatic approaches to identify all possible putative Nematostella FPs and a Drosophila S2 cell culture assay to validate NvFP-7R, a novel red fluorescent protein. We report that both the mRNA expression pattern and spectral signature of purified NvFP-7R closely match those of the endogenous red fluorescence. Strikingly, the red fluorescent pattern of NvFP-7R exhibits asymmetric expression along the directive axis, indicating that the nvfp-7r locus senses the positional information of the body plan. At the tissue level, NvFP-7R exhibits an unexpected subcellular localization and a complex complementary expression pattern in the apposed epithelial sheets comprising each endodermal mesentery. These experiments not only identify NvFP-7R as a novel red fluorescent protein that could be employed as a research tool; they also uncover an unexpected spatiotemporal complexity of gene expression in an adult cnidarian. Perhaps most importantly, our results define Nematostella as a new model organism for understanding the biological function of fluorescent proteins in vivo.

    Using ESTs to improve the accuracy of de novo gene prediction

    BACKGROUND: ESTs are a tremendous resource for determining the exon-intron structures of genes, but even extensive EST sequencing tends to leave many exons and genes untouched. Gene prediction systems based exclusively on EST alignments miss these exons and genes, leading to poor sensitivity. De novo gene prediction systems, which ignore ESTs in favor of genomic sequence, can predict such "untouched" exons, but they are less accurate when predicting exons to which ESTs align. TWINSCAN is the most accurate de novo gene finder available for nematodes and N-SCAN is the most accurate for mammals, as measured by exact CDS gene prediction and exact exon prediction. RESULTS: TWINSCAN_EST is a new system that successfully combines EST alignments with TWINSCAN. On the whole C. elegans genome, TWINSCAN_EST shows a 14% improvement in sensitivity and a 13% improvement in specificity in predicting exact gene structures compared to TWINSCAN without EST alignments. Not only are the structures revealed by EST alignments predicted correctly, but these also constrain the predictions without alignments, improving their accuracy. For the human genome, we used the same approach with N-SCAN, creating N-SCAN_EST. On the whole genome, N-SCAN_EST produced a 6% improvement in sensitivity and a 1% improvement in specificity of exact gene structure predictions compared to N-SCAN. CONCLUSION: TWINSCAN_EST and N-SCAN_EST are more accurate than TWINSCAN and N-SCAN, while retaining their ability to discover novel genes to which no ESTs align. Thus, we recommend using the EST versions of these programs to annotate any genome for which EST information is available. TWINSCAN_EST and N-SCAN_EST are part of the TWINSCAN open source software package.

    Cross-species protein sequence and gene structure prediction with fine-tuned Webscipio 2.0 and Scipio

    BACKGROUND: Obtaining transcripts of homologs from closely related organisms and retrieving the reconstructed exon-intron patterns of the genes is an important step in analysing the evolution of a protein family and in comparing the exon-intron structure of a given gene across species. Due to the ever-increasing speed of genome sequencing, the gap between sequencing and genome annotation is growing. Thus, tools for the correct prediction and reconstruction of genes in related organisms are becoming more and more important. The tool Scipio, which can also be used via the graphical interface WebScipio, performs significant hit processing of the output of the Blat program to account for sequencing errors, missing sequence, and fragmented genome assemblies. However, Scipio has so far been limited to high sequence similarity and unable to reconstruct short exons. RESULTS: Scipio and WebScipio have been fundamentally extended to better reconstruct very short exons and intron splice sites and to be better suited for cross-species gene structure predictions. The Needleman-Wunsch algorithm has been implemented to search for short parts of the query sequence that were not recognized by Blat. Those regions might be short exons, divergent sequence at intron splice sites, or very divergent exons. We have shown the benefit and use of the new parameters with several protein examples from completely different protein families in searches against species from several kingdoms of the eukaryotes. The performance of the new Scipio version has been tested in comparison with several similar tools. CONCLUSIONS: With the new version of Scipio, very short exons, terminal and internal, of even just one amino acid can be correctly reconstructed. Scipio is also able to correctly predict almost all genes in cross-species searches even if the ancestors of the species separated more than 100 Myr ago and if the protein sequence identity is below 80%. For our test cases Scipio outperforms all other software tested. WebScipio has been restructured and provides easy access to the genome assemblies of about 640 eukaryotic species. Scipio and WebScipio are freely accessible at http://www.webscipio.org.
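
The Needleman-Wunsch step mentioned in this abstract is a standard global alignment by dynamic programming. The following is a minimal sketch of the textbook algorithm only, not Scipio's implementation: the function name `needleman_wunsch` and the scoring values (match 1, mismatch -1, linear gap -1) are assumptions chosen for illustration, and the abstract does not state which scoring scheme Scipio uses.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b.

    Returns (score, aligned_a, aligned_b). Scoring values are
    illustrative assumptions, not parameters taken from Scipio.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for a[i-1] against b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

Because the alignment is global, every residue of the short query fragment is placed somewhere in the target, which is the property that makes this kind of search suitable for recovering fragments too short for a heuristic local search to report.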

    TOPAZ1, a Novel Germ Cell-Specific Expressed Gene Conserved during Evolution across Vertebrates

    BACKGROUND: We had previously reported that the Suppression Subtractive Hybridization (SSH) approach was relevant for the isolation of new mammalian genes involved in oogenesis and early follicle development. Some of these transcripts might be potential new oocyte and granulosa cell markers. We have now characterized one of them, named TOPAZ1 for the Testis and Ovary-specific PAZ domain gene. PRINCIPAL FINDINGS: Sheep and mouse TOPAZ1 mRNA have 4,803 bp and 4,962 bp open reading frames (20 exons), respectively, and encode putative TOPAZ1 proteins containing 1,600 and 1,653 amino acids. They possess PAZ and CCCH domains. In sheep, TOPAZ1 mRNA is preferentially expressed in females during fetal life, with a peak during prophase I of meiosis, and in males during adulthood. In the mouse, Topaz1 is a germ cell-specific gene. TOPAZ1 protein is highly conserved in vertebrates and specifically expressed in mouse and sheep gonads. It is localized in the cytoplasm of germ cells from the sheep fetal ovary and mouse adult testis. CONCLUSIONS: We have identified a novel PAZ-domain protein that is abundantly expressed in the gonads during germ cell meiosis. The expression pattern of TOPAZ1, and its high degree of conservation, suggest that it may play an important role in germ cell development. Further characterization of TOPAZ1 may elucidate the mechanisms involved in gametogenesis, and particularly in the RNA silencing process in the germ line.

    Patterns of Sequence Divergence and Evolution of the S1 Orthologous Regions between Asian and African Cultivated Rice Species

    A strong postzygotic reproductive barrier separates the recently diverged Asian and African cultivated rice species, Oryza sativa and O. glaberrima. Recently, a model of genetic incompatibilities between three adjacent, epistatically interacting loci, S1A, S1 and S1B (together called the S1 regions), was postulated to cause the allelic elimination of female gametes in interspecific hybrids. Two candidate factors for the S1 locus (including a putative F-box gene) were proposed, but candidates for S1A and S1B remained undetermined. Here, to better understand the basis of the evolution of regions involved in reproductive isolation, we studied the genic and structural changes accumulated in the S1 regions between orthologous sequences. First, we established an 813 kb genomic sequence in O. glaberrima, completely covering the S1A and S1 regions and the majority of the S1B region, and compared it with the orthologous regions of O. sativa. An overall strong structural conservation was observed, with the exception of three isolated regions of disturbed collinearity: (1) a local invasion of transposable elements around a putative F-box gene within S1, (2) the multiple duplication and subsequent divergence of the same F-box gene within S1A, (3) an interspecific chromosomal inversion in S1B, which restricts recombination in our O. sativa×O. glaberrima crosses. Besides these few structural variations, a uniform conservative pattern of coding sequence divergence was found all along the S1 regions. Hence, the S1 regions have undergone no drastic variation in their recent divergence and evolution between O. sativa and O. glaberrima, suggesting that a small accumulation of genic changes, following a Bateson-Dobzhansky-Muller (BDM) model, might be involved in the establishment of the sterility barrier. In this context, genetic incompatibilities involving the duplicated F-box genes as putative candidates, and a possible strengthening step involving the chromosomal inversion, might contribute to the reproductive barrier between Asian and African rice species.

    Histoplasma capsulatum proteome response to decreased iron availability

    BACKGROUND: A fundamental pathogenic feature of the fungus Histoplasma capsulatum is its ability to evade innate and adaptive immune defenses. Once ingested by macrophages, the organism is faced with several hostile environmental conditions, including iron limitation. H. capsulatum can establish a persistent state within the macrophage. A gap in knowledge exists because the identities and number of proteins regulated by the organism under host conditions have yet to be defined. Lack of such knowledge is an important problem because, until these proteins are identified, it is unlikely that they can be targeted by new and innovative treatments for histoplasmosis. RESULTS: To investigate the proteomic response of H. capsulatum to decreasing iron availability, we have created H. capsulatum protein/genomic databases compatible with current mass spectrometric (MS) search engines. Databases were assembled from the H. capsulatum G217B strain genome using gene prediction programs and expressed sequence tag (EST) libraries. Searching these databases with MS data generated from two-dimensional (2D) in-gel digestions of proteins resulted in over 50% more proteins identified compared to searching the publicly available fungal databases alone. Using 2D gel electrophoresis combined with statistical analysis, we discovered 42 H. capsulatum proteins whose abundance was significantly modulated when iron concentrations were lowered. Altered proteins were identified by mass spectrometry and database searching to be involved in glycolysis, the tricarboxylic acid cycle, lysine metabolism, protein synthesis, and one protein sequence whose function was unknown. CONCLUSION: We have created a bioinformatics platform for H. capsulatum and demonstrated the utility of a proteomic approach by identifying a shift in metabolism that the organism uses to cope with the hostile conditions imposed by the host. We have shown that enzyme transcripts regulated by other fungal pathogens in response to lowered iron availability are also regulated in H. capsulatum at the protein level. We also identified H. capsulatum proteins sensitive to iron level reductions which have yet to be connected to iron availability in other pathogens. These data also indicate the complexity of the response of H. capsulatum to nutritional deprivation. Finally, we demonstrate the importance of a strain-specific gene/protein database for H. capsulatum proteomic analysis.