    Cellular Function Prediction for Hypothetical Proteins Using High-Throughput Data

    We have developed an integrated probabilistic prediction method that combines information from protein-protein interactions, protein complexes, microarray gene-expression profiles and the functional annotations of known proteins. Our approach differs from other approaches that use high-throughput data in several ways. First, we use the GO biological process annotation rather than the MIPS classification adopted by others. Second, we incorporate multiple sources of high-throughput data, including genetic interactions, to build a better model for function prediction; integrating these sources also lets us identify the parameters that matter most for protein function prediction. Third, we designed a new statistical method that estimates the probability that a protein has a function of interest. Fourth, our approach assigns multiple functions to each hypothetical protein and assesses the confidence of each assignment from the supporting evidence in the high-throughput data. Our work demonstrates the power of integrating multiple sources of high-throughput data with biological functional annotations to predict the functions of unknown proteins. We have also developed a Web server for function prediction in yeast and other organisms. Applying our method to the Saccharomyces cerevisiae proteome, we assigned functions to 1,548 of the 2,472 unannotated yeast proteins.
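
    The abstract does not spell out its statistical model. As an illustration only, the sketch below shows one generic way to fold several evidence sources into a single probability per candidate function: a naive Bayes combination of per-source likelihood ratios, which assumes the sources are conditionally independent. This is a swapped-in technique, not the authors' method, and the function labels, priors and likelihood ratios are hypothetical.

```python
from __future__ import annotations

from math import prod


def posterior_probability(prior: float, likelihood_ratios: dict[str, float]) -> float:
    """Return P(function | evidence) from a prior and per-source likelihood ratios.

    Assumes the evidence sources are conditionally independent (naive Bayes),
    so posterior odds = prior odds * product of the likelihood ratios.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios.values())
    return posterior_odds / (1.0 + posterior_odds)


def assign_functions(
    priors: dict[str, float],
    evidence: dict[str, dict[str, float]],
    threshold: float = 0.5,
) -> list[tuple[str, float]]:
    """Score every candidate function and keep those at or above `threshold`.

    Returns (function, probability) pairs sorted by decreasing probability,
    so one protein can receive several functions, each with a confidence value.
    """
    scored = [
        (function, posterior_probability(prior, evidence.get(function, {})))
        for function, prior in priors.items()
    ]
    kept = [(function, p) for function, p in scored if p >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical priors and likelihood ratios for one unannotated protein.
    priors = {"process_A": 0.5, "process_B": 0.1}
    evidence = {
        "process_A": {"interaction": 2.0, "complex": 1.5},
        "process_B": {"interaction": 2.0},
    }
    print(assign_functions(priors, evidence))
```

    With these made-up numbers, process_A reaches a probability of 0.75 and is kept, while process_B only reaches about 0.18 and is dropped. The independence assumption is the main weakness of this design: overlapping sources, such as complexes and pairwise interactions, would be double-counted.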

    Quantitative assessment of relationship between sequence similarity and function similarity

    Background: Comparative sequence analysis is considered the first step in annotating new proteins during genome annotation. However, sequence comparison can create and propagate function assignment errors. It is therefore important to analyze the quality of sequence-based function assignment thoroughly and systematically, using large-scale data.
    Results: We present an analysis of the relationship between sequence similarity and function similarity for proteins in four model organisms: Arabidopsis thaliana, Saccharomyces cerevisiae, Caenorhabditis elegans, and Drosophila melanogaster. Using a measure of functional similarity based on the three Gene Ontology (GO) categories (biological process, molecular function, and cellular component), we quantified the correlation between functional similarity and sequence similarity, measured either by sequence identity or by the statistical significance of the alignment, and compared this correlation with that of randomly chosen protein pairs.
    Conclusion: For each of the three GO categories, we identified different sequence-function relationships when comparing BLAST with PSI-BLAST, sequence identity with expectation value (E-value), GO indices with semantic similarity approaches, and within-genome with between-genome comparisons. Our study provides a benchmark for estimating the confidence of function assignments based purely on sequence similarity.
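
    The abstract does not define its functional similarity measure beyond basing it on GO categories. The sketch below is a simplified stand-in, not the paper's GO-index or semantic-similarity measures: it scores functional similarity as the Jaccard index of two proteins' GO term sets and computes the Pearson correlation between that score and percent sequence identity over a list of protein pairs. Protein names, term labels and identity values are hypothetical.

```python
from __future__ import annotations

from math import sqrt


def go_jaccard(terms_a: set[str], terms_b: set[str]) -> float:
    """Fraction of GO terms shared by two proteins (Jaccard index).

    Returns 0.0 when neither protein has any annotation.
    """
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sd_x * sd_y)


def similarity_correlation(
    pairs: list[tuple[str, str, float]],
    annotations: dict[str, set[str]],
) -> float:
    """Correlate percent identity with GO-based functional similarity.

    Each pair is (protein_a, protein_b, percent_identity); proteins missing
    from `annotations` are treated as having no GO terms.
    """
    identities = [identity for _, _, identity in pairs]
    func_sims = [
        go_jaccard(annotations.get(a, set()), annotations.get(b, set()))
        for a, b, _ in pairs
    ]
    return pearson(identities, func_sims)


if __name__ == "__main__":
    # Hypothetical annotations and aligned pairs.
    annotations = {"p1": {"t1", "t2"}, "p2": {"t1", "t2"}, "p3": {"t1"}, "p4": {"t3"}}
    pairs = [("p1", "p2", 90.0), ("p1", "p3", 50.0), ("p1", "p4", 20.0)]
    print(similarity_correlation(pairs, annotations))
```

    On this toy input the correlation is close to 1. Running the same calculation on randomly chosen protein pairs would give the kind of baseline the abstract compares against.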

    SNP discovery by high-throughput sequencing in soybean

    Background: With the advance of new massively parallel genotyping technologies, quantitative trait loci (QTL) fine mapping and map-based cloning have become more practical for identifying genes underlying important and complex traits. Developing high-density genetic markers in the QTL regions of specific mapping populations is essential for fine mapping and map-based cloning of economically important genes. Single nucleotide polymorphisms (SNPs) are the most abundant form of genetic variation between the diverse genotypes typically used in QTL mapping studies. Massively parallel sequencing technologies (Roche GS/454, Illumina GA/Solexa, and ABI/SOLiD) have been widely applied to identify genome-wide sequence variation. However, it remains unclear whether sequence data at low sequencing depth are sufficient to detect the variation in a given QTL region of a crop genome, and how to prepare sequencing samples for a complex genome such as soybean. With the aims of identifying SNP markers cost-effectively for fine mapping several QTL regions, and of testing the validation rate of putative SNPs predicted from Solexa short reads at low sequencing depth, we evaluated a reduced representation library built from pooled DNA fragments, together with SNP detection methods applied to short reads generated by Solexa high-throughput sequencing.
    Results: A total of 39,022 putative SNPs were identified with the Illumina/Solexa sequencing system using a reduced representation DNA library of the two parental lines of a mapping population. The validation rates of putative SNPs predicted at low and high stringency were 72% and 85%, respectively. One hundred sixty-four SNP markers obtained by validating the putative SNPs were selected to target a known QTL, increasing the marker density of the targeted region to one marker per 42 kb.
    Conclusions: We have demonstrated how to quickly identify large numbers of SNPs for fine mapping of QTL regions by combining massively parallel sequencing with genome complexity reduction techniques. This SNP discovery approach is more efficient for targeting multiple QTL regions in the same genetic population and can be applied to other crops.
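
    The abstract does not describe the SNP detection algorithm itself. The sketch below only illustrates the general idea of calling putative SNPs between two parental lines from per-site allele counts of aligned short reads, at a low and a high stringency. The Stringency thresholds, data layout and positions are hypothetical assumptions, not the values or pipeline used in the study, and real pipelines also weigh base and mapping quality, which this sketch ignores.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Stringency:
    min_depth: int         # minimum reads covering the site in each parent
    min_major_freq: float  # minimum fraction of reads supporting the major allele


# Hypothetical thresholds, chosen only for illustration.
LOW = Stringency(min_depth=2, min_major_freq=0.8)
HIGH = Stringency(min_depth=4, min_major_freq=0.95)


def major_allele(bases: Counter, s: Stringency) -> str | None:
    """Return the site's major allele if depth and allele frequency pass `s`."""
    depth = sum(bases.values())
    if depth < s.min_depth:
        return None
    allele, count = bases.most_common(1)[0]
    return allele if count / depth >= s.min_major_freq else None


def call_snps(
    parent_a: dict[int, Counter],
    parent_b: dict[int, Counter],
    s: Stringency,
) -> list[tuple[int, str, str]]:
    """Report (position, allele_a, allele_b) where the parents' confident alleles differ."""
    calls = []
    for pos in sorted(parent_a.keys() & parent_b.keys()):  # sites covered in both parents
        a = major_allele(parent_a[pos], s)
        b = major_allele(parent_b[pos], s)
        if a is not None and b is not None and a != b:
            calls.append((pos, a, b))
    return calls


if __name__ == "__main__":
    # Hypothetical read bases observed at each position in each parent.
    parent_a = {100: Counter("AAAA"), 200: Counter("GG"), 300: Counter("CCCCT")}
    parent_b = {100: Counter("GGGG"), 200: Counter("TT"), 300: Counter("TTTTT"), 400: Counter("AAAA")}
    print("low: ", call_snps(parent_a, parent_b, LOW))
    print("high:", call_snps(parent_a, parent_b, HIGH))
```

    On this toy input the low-stringency settings report three sites, while the high-stringency settings keep only position 100, the one site with deep, unambiguous coverage in both parents. Tightening thresholds trades the number of calls for reliability, the same kind of trade-off reflected in the abstract's different validation rates at low and high stringency.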

    Soybean transcription factor ORFeome associated with drought resistance: a valuable resource to accelerate research on abiotic stress resistance

    Tissue/organ expression pattern of TF genes. Expression of the soybean TF-ORFeome candidates in seven soybean organs (root, root tip, leaf, shoot apical meristem (SAM), nodule, flower and green pod) was based on published RNA-Seq data [26]. The color scale indicates gene expression level (yellow, low; red, high).

    Generation of Phaseolus vulgaris ESTs and investigation of their regulation upon Uromyces appendiculatus infection

    Background: Phaseolus vulgaris (common bean) is the second most important legume crop in the world after soybean. Consequently, yield losses due to fungal infections such as Uromyces appendiculatus (bean rust) have serious consequences. Several resistance genes that confer resistance to bean rust infection have been identified. However, the downstream genes and mechanisms involved in bean resistance to infection are poorly characterized.
    Results: A subtractive bean cDNA library of 10,581 unisequences was constructed, enriched in sequences regulated by either bean rust race 41, a virulent strain, or race 49, an avirulent strain, on cultivar Early Gallatin carrying the resistance gene Ur-4. Constructing this library identified 6,202 new bean ESTs, significantly adding to the sequences available for this plant. Regulation of selected bean genes in response to bean rust infection was confirmed by qRT-PCR. Plant gene expression was similar for races 41 and 49 during the first 48 hours of infection but differed significantly at later time points (72–96 hours after inoculation), mainly because the presence of the Avr4 gene in race 49 leads to a hypersensitive response in the bean plants. Several genes regulated in response to fungal infection showed a biphasic expression pattern.
    Conclusion: Adding over 6,000 bean ESTs to the public database significantly expands the genomic resources available for this important crop plant. The analysis of these genes in response to bean rust infection provides a foundation for further studies of the mechanism of fungal disease resistance. The expression pattern of 90 bean genes upon rust infection shares several features with other legumes infected by biotrophic fungi, suggesting that the P. vulgaris-U. appendiculatus pathosystem could serve as a model for exploring legume-rust interactions.