Outer membrane proteins can be simply identified using secondary structure element alignment
<p>Abstract</p> <p>Background</p> <p>Outer membrane proteins (OMPs) are frequently found in the outer membranes of Gram-negative bacteria, mitochondria and chloroplasts, and have been found to play diverse functional roles. Computational discrimination of OMPs from globular proteins and other types of membrane proteins can help accelerate new genome annotation and drug discovery.</p> <p>Results</p> <p>Based on the observation that almost all OMPs consist of antiparallel β-strands in a barrel shape, and that their secondary structure arrangements differ from those of other types of proteins, we propose a simple method called SSEA-OMP that identifies OMPs using secondary structure element alignment. In extensive benchmark experiments, the proposed SSEA-OMP method performed better than some well-established OMP detection methods.</p> <p>Conclusions</p> <p>The major advantage of SSEA-OMP is its good prediction performance given its simplicity. The web server that implements the method is freely accessible at <url>http://protein.cau.edu.cn/SSEA-OMP/index.html</url>.</p>
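As a rough illustration of secondary structure element alignment, the sketch below collapses a three-state secondary structure string (H = helix, E = strand, C = coil; an assumed alphabet) into runs and aligns the runs by dynamic programming. The scoring scheme (shared length for same-type elements, no gap penalty, normalization by mean string length) and all function names are hypothetical choices for illustration, not the published SSEA-OMP scoring.

```python
from itertools import groupby


def to_elements(ss: str) -> list[tuple[str, int]]:
    """Collapse a secondary structure string such as 'CCEEEECC' into runs."""
    return [(state, len(list(run))) for state, run in groupby(ss)]


def element_score(a: tuple[str, int], b: tuple[str, int]) -> int:
    # Assumed scoring: same-type elements earn their shared length,
    # elements of different types earn nothing.
    return min(a[1], b[1]) if a[0] == b[0] else 0


def ssea_score(ss1: str, ss2: str) -> float:
    """Align two SSE strings element by element; returns a score in [0, 1]."""
    e1, e2 = to_elements(ss1), to_elements(ss2)
    dp = [[0] * (len(e2) + 1) for _ in range(len(e1) + 1)]
    for i in range(1, len(e1) + 1):
        for j in range(1, len(e2) + 1):
            dp[i][j] = max(
                dp[i - 1][j],  # skip an element of ss1 (no gap penalty, assumed)
                dp[i][j - 1],  # skip an element of ss2
                dp[i - 1][j - 1] + element_score(e1[i - 1], e2[j - 1]),
            )
    # Normalize by the mean string length so identical inputs score 1.0.
    return dp[-1][-1] / ((len(ss1) + len(ss2)) / 2)


def best_template_score(query: str, templates: list[str]) -> float:
    """Best SSEA score of a query against a library of template SSE strings."""
    return max(ssea_score(query, t) for t in templates)
```

A query could then be scored against a library of known OMP templates, keeping its best score; any cut-off separating OMPs from non-OMPs would have to be calibrated on benchmark data.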
The prediction of protein-protein interaction networks in rice blast fungus
<p>Abstract</p> <p>Background</p> <p>Protein-protein interaction (PPI) maps are useful tools for investigating the cellular functions of genes. Thus far, no large-scale PPI mapping project has been carried out for the rice blast fungus <it>Magnaporthe grisea</it>, which causes the most severe disease of rice. Inspired by recent advances in PPI prediction, we constructed a PPI map of this important fungus.</p> <p>Results</p> <p>Using a well-recognized interolog approach, we predicted 11,674 interactions among 3,017 <it>M. grisea</it> proteins. Although the constructed map covers only about one-fourth of the <it>M. grisea</it> proteome, it is the first PPI map for this crucial organism and will therefore provide new insights into the functional genomics of the rice blast fungus. Focusing on the network topology of proteins encoded by known pathogenicity genes, we found that pathogenicity proteins tend to interact with more proteins. The pathogenicity proteins and their interacting partners in the entire network were then used to construct a subnet called the pathogenicity network. These data may provide further clues for the study of these pathogenicity proteins. Finally, secreted proteins in <it>M. grisea</it> were found to interact with fewer proteins. These secreted proteins and their interacting partners were also compiled into a network of secreted proteins, which may be helpful in constructing an interactome between the rice blast fungus and rice.</p> <p>Conclusion</p> <p>We predicted the PPIs of <it>M. grisea</it> and compiled them into a database server called MPID. We hope that MPID will provide new insights into the functional genomics of this fungus. MPID is available at <url>http://bioinformatics.cau.edu.cn/zzd_lab/MPID.html</url>.</p>
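The interolog idea is that if two proteins interact in one species, their orthologs in another species are predicted to interact as well. A minimal sketch, assuming the known interactions of a source organism and an ortholog mapping (source protein to target-species proteins) are already available, transfers the interactions, counts each protein's partners (its degree) and extracts the edges touching a seed set such as pathogenicity proteins. The data structures, function names and the choice to drop self-interactions are illustrative assumptions, not details of how MPID was built.

```python
from collections import defaultdict
from itertools import product


def predict_interologs(source_ppis, orthologs):
    """Transfer interactions from a source species to a target species.

    source_ppis: iterable of (protein_a, protein_b) pairs known in the source.
    orthologs: dict mapping a source protein to its target-species orthologs.
    """
    predicted = set()
    for a, b in source_ppis:
        for x, y in product(orthologs.get(a, ()), orthologs.get(b, ())):
            if x != y:  # self-interactions dropped here for simplicity
                predicted.add(tuple(sorted((x, y))))
    return predicted


def degree(edges):
    """Number of predicted interaction partners for each protein."""
    counts = defaultdict(int)
    for x, y in edges:
        counts[x] += 1
        counts[y] += 1
    return dict(counts)


def subnetwork(edges, seeds):
    """Edges touching at least one seed protein (e.g. pathogenicity proteins)."""
    seeds = set(seeds)
    return {e for e in edges if e[0] in seeds or e[1] in seeds}
```

Comparing the degree distribution of the seed proteins with that of all other proteins is one way to check the kind of "more partners / fewer partners" trend the abstract reports.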
DescFold: A web server for protein fold recognition
<p>Abstract</p> <p>Background</p> <p>Machine learning-based methods have proven powerful for developing new fold recognition tools. In our previous work [Zhang, Kochhar and Grigorov (2005) <it>Protein Science</it>, <b>14</b>: 431-444], a machine learning-based method called DescFold was established by using Support Vector Machines (SVMs) to combine the following four descriptors: a profile-sequence-alignment-based descriptor using Psi-blast <it>e</it>-values and bit scores, a sequence-profile-alignment-based descriptor using Rps-blast <it>e</it>-values and bit scores, a descriptor based on secondary structure element alignment (SSEA), and a descriptor based on the occurrence of PROSITE functional motifs. In this work, we focus on improving DescFold by incorporating more powerful descriptors and setting up a user-friendly web server.</p> <p>Results</p> <p>In seeking more powerful descriptors, we first adopted the profile-profile alignment score generated by the COMPASS algorithm as a new descriptor (i.e., PPA). When a profile-profile alignment between two proteins is considered in the context of fold recognition, one protein is regarded as a template (i.e., its 3D structure is known). Instead of a sequence profile derived from a Psi-blast search, a structure-seeded profile for the template protein was generated by searching its structural neighbors with the assistance of the TM-align structural alignment algorithm. The COMPASS algorithm was then used again to derive a profile-structural-profile-alignment-based descriptor (i.e., PSPA). We trained and tested the new DescFold on a total of 1,835 highly diverse proteins extracted from SCOP version 1.73. Introducing the PPA and PSPA descriptors substantially boosted the fold recognition performance of the new DescFold. Using the SCOP_1.73_40% dataset as the fold library, we then constructed the DescFold web server based on the trained SVM models. To provide a large-scale test of the new DescFold, a stringent test set of 1,866 proteins was selected from SCOP version 1.75. With the false positive rate controlled below 5%, the new DescFold correctly recognizes structural homologs at the fold level for nearly 46% of the test proteins. We also benchmarked DescFold against several well-established fold recognition algorithms on the LiveBench targets and the Lindahl dataset.</p> <p>Conclusions</p> <p>Extensive benchmarking showed that the new DescFold method performs very competitively against some well-established fold recognition methods, suggesting that it can serve as a useful tool to assist in template-based protein structure prediction. The DescFold server is freely accessible at <url>http://202.112.170.199/DescFold/index.html</url>.</p>
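Reporting performance at a fixed false positive rate can be illustrated with a small helper: given classifier scores for query-template pairs labeled as same-fold (positive) or different-fold (negative), it picks the lowest score threshold whose false positive rate stays within the limit and reports the fraction of positives recovered at that threshold. This is a generic sketch; the function name and inputs are hypothetical, and DescFold's actual evaluation protocol (for example, how hits are counted per test protein) may differ.

```python
def recall_at_fpr(scores, labels, max_fpr=0.05):
    """Return (threshold, recall) at the lowest threshold with FPR <= max_fpr.

    scores: classifier outputs (higher means more likely the same fold).
    labels: True for same-fold (positive) pairs, False for negatives.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    best = (float("inf"), 0.0)  # returned if no threshold meets the constraint
    for t in sorted(set(scores), reverse=True):
        fpr = sum(s >= t for s in neg) / len(neg)
        if fpr > max_fpr:
            break  # FPR can only grow as the threshold drops
        best = (t, sum(s >= t for s in pos) / len(pos))
    return best
```

Because the false positive rate is monotonic in the threshold, the scan can stop at the first threshold that exceeds the limit.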
Comprehensive analysis of tandem amino acid repeats from ten angiosperm genomes
<p>Abstract</p> <p>Background</p> <p>The presence of tandem amino acid repeats (AARs) is one of the signatures of eukaryotic proteins. AARs are thought to be frequently involved in biomolecular interactions. Comprehensive studies focused primarily on metazoan AARs have suggested that AARs evolve rapidly and are highly variable among species. However, the causal factors of this inter-species variation remain controversial. In this work, we investigated this topic mainly by comparing AARs in orthologous proteins from ten angiosperm genomes.</p> <p>Results</p> <p>Angiosperm AAR content is positively correlated with the GC content of the protein coding sequence. However, based on observations of fungal and insect AARs, we argue that the applicability of this correlation is limited by AAR residue composition and species' life history traits. Angiosperm AARs also tend to be fast evolving and structurally disordered, consistent with the results of comprehensive analyses of metazoans. The functions of conserved long AARs are summarized. Finally, we propose that rapid mRNA decay, alternative splicing and tissue specificity are regulatory processes associated with angiosperm proteins harboring AARs.</p> <p>Conclusions</p> <p>Our investigation suggests that, under certain conditions, GC content is a predictor of AAR content in the protein coding sequence. Although angiosperm AARs lack conservation and 3D structure, a fraction of the proteins that contain AARs may be functionally important and under extensive regulation in plant cells.</p>
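To make the two quantities being correlated more concrete, the sketch below finds tandem repeats in a protein sequence, computes the fraction of residues inside them (AAR content) and computes the GC content of a coding sequence. It assumes an AAR is a run of at least five identical residues; that threshold and the function names are hypothetical, since the abstract does not state how AARs were defined. Per-gene values of the two measures could then be correlated, for example with `statistics.correlation` (Python 3.10 and later).

```python
import re


def find_aars(protein: str, min_len: int = 5) -> list[tuple[int, str]]:
    """Return (start, run) for each run of >= min_len identical residues.

    Assumption: an AAR is a homopolymeric run; min_len is hypothetical.
    """
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(0)) for m in pattern.finditer(protein)]


def aar_content(protein: str, min_len: int = 5) -> float:
    """Fraction of residues that fall inside a detected repeat."""
    return sum(len(run) for _, run in find_aars(protein, min_len)) / len(protein)


def gc_content(cds: str) -> float:
    """Fraction of G and C nucleotides in a coding sequence."""
    cds = cds.upper()
    return (cds.count("G") + cds.count("C")) / len(cds)
```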
TIM-Finder: A new method for identifying TIM-barrel proteins
<p>Abstract</p> <p>Background</p> <p>The triosephosphate isomerase (TIM)-barrel fold occurs frequently in the proteomes of different organisms, and known TIM-barrel proteins have been found to play diverse functional roles. To accelerate the exploration of the sequence-structure landscape of the TIM-barrel fold, a computational tool that can sensitively detect TIM-barrel proteins is required.</p> <p>Results</p> <p>To develop a new TIM-barrel protein identification method, we consider three descriptors in this work: a sequence-alignment-based descriptor using PSI-BLAST e-values and bit scores, a descriptor based on secondary structure element alignment (SSEA), and a descriptor based on the occurrence of PROSITE functional motifs. With the assistance of a Support Vector Machine (SVM), the three descriptors were combined into a new method with improved performance, which we call TIM-Finder. When tested on the whole proteome of <it>Bacillus subtilis</it>, TIM-Finder detected 194 TIM-barrel proteins at a 99% confidence level, outperforming both a PSI-BLAST search and an existing fold recognition method.</p> <p>Conclusions</p> <p>TIM-Finder can serve as a competitive tool for proteome-wide TIM-barrel protein identification. The TIM-Finder web server is freely accessible at <url>http://202.112.170.199/TIM-Finder/</url>.</p>
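The PROSITE motif descriptor depends on scanning sequences for PROSITE patterns. A minimal sketch of that step, assuming only the basic pattern syntax (x for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) for repeats, < and > for sequence ends), converts a pattern into a Python regular expression and counts possibly overlapping matches. The function names are hypothetical, and how TIM-Finder turns motif hits into a descriptor value is not specified in the abstract.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a basic PROSITE pattern (e.g. 'N-{P}-[ST]-{P}.') to a regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix, suffix = "", ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (4) or (2,3).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        # Single residues and [..] classes are already valid regex syntax.
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def count_motif(sequence: str, pattern: str) -> int:
    """Count (possibly overlapping) occurrences of a PROSITE pattern."""
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return sum(1 for _ in regex.finditer(sequence))
```

Wrapping the converted pattern in a lookahead lets overlapping hits be counted, which a plain `finditer` would skip.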
Predicting Residue-Residue Contacts and Helix-Helix Interactions in Transmembrane Proteins Using an Integrative Feature-Based Random Forest Approach
Integral membrane proteins constitute 25–30% of genomes and play crucial roles in many biological processes. However, membrane proteins account for less than 1% of the structures in the Protein Data Bank. In this context, it is important to develop reliable computational methods for predicting the structures of membrane proteins. Here, we present TMhhcp, the first application of random forest (RF) to residue-residue contact prediction in transmembrane proteins. Rigorous cross-validation tests indicate that the resulting RF models outperform two state-of-the-art methods, TMHcon and MEMPACK. Using a strict leave-one-protein-out jackknife procedure, the models reached top-L/5 prediction accuracies of 49.5% and 48.8% for two different residue contact definitions, respectively. The predicted residue contacts were further used to predict interacting helical pairs, achieving Matthews correlation coefficients of 0.430 and 0.424 under the two contact definitions, respectively. To benefit the academic community, the TMhhcp server has been made freely accessible at http://protein.cau.edu.cn/tmhhcp
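The two evaluation measures quoted above can be sketched as follows. Top-L/5 accuracy is taken here as the fraction of the L/5 highest-scoring predicted residue pairs (L = sequence length) that are true contacts, and the Matthews correlation coefficient is computed from a confusion matrix. The function names are hypothetical, and no minimum sequence-separation filter is applied, although contact benchmarks often use one.

```python
import math


def top_l5_accuracy(predictions, true_contacts, length):
    """Fraction of the top L/5 predicted pairs that are true contacts.

    predictions: iterable of (i, j, score); true_contacts: set of (i, j) pairs.
    Dividing by L/5 (not by the number of predictions) penalizes short lists.
    """
    truth = {tuple(sorted(p)) for p in true_contacts}
    n_top = max(1, length // 5)
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)[:n_top]
    hits = sum(tuple(sorted((i, j))) in truth for i, j, _ in ranked)
    return hits / n_top


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when any margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```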
Coordinated regulation of core and accessory genes in the multipartite genome of Sinorhizobium fredii
Prokaryotes benefit from having accessory genes, but it is unclear how accessory genes become linked with the core regulatory network as bacteria adapt to new niches. Here we determined hierarchical core/accessory subsets in the multipartite pangenome (composed of genes from the chromosome, chromid and plasmids) of the soybean microsymbiont Sinorhizobium fredii by comparing twelve Sinorhizobium genomes. Transcriptomes of two S. fredii strains were obtained at mid-log and stationary growth phases and under symbiotic conditions. The average level of gene expression, the variation of expression between conditions, and gene connectivity within the co-expression network were positively correlated with the gene conservation level, from strain-specific accessory genes to the genus core. Condition-dependent transcriptomes exhibited adaptive transcriptional changes in pangenome subsets shared by the two strains, while strain-dependent transcriptomes were enriched with accessory genes on the chromid. Proportionally more chromid genes than plasmid genes were co-expressed with chromosomal genes, while plasmid genes had higher within-replicon connectivity in expression than chromid genes. However, key nitrogen fixation genes on the symbiosis plasmid were characterized by high connectivity in both within- and between-replicon analyses. Among the genes with host-specific upregulation patterns, the chromosomal znu and mdt operons, encoding a conserved high-affinity zinc transporter and an accessory multi-drug efflux system, respectively, were experimentally shown to be involved in host-specific symbiotic adaptation. These findings highlight the importance of integrative regulation of hierarchical core/accessory components in the multipartite bacterial genome during niche adaptation and in shaping the prokaryotic pangenome in the long run.
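Within- and between-replicon connectivity in a co-expression network can be sketched by linking two genes when the absolute Pearson correlation of their expression profiles across conditions reaches a cut-off, then counting each gene's edges according to whether the partner sits on the same replicon. The correlation measure, the 0.9 cut-off, the replicon labels and the function names are assumptions for illustration; the abstract does not describe how the study built its network.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def replicon_connectivity(profiles, replicon, threshold=0.9):
    """Count within- and between-replicon co-expression edges per gene.

    profiles: gene -> expression values across conditions (same order for all).
    replicon: gene -> replicon label (hypothetical, e.g. "chromosome", "pSym").
    threshold: hypothetical |r| cut-off for drawing an edge.
    """
    within = dict.fromkeys(profiles, 0)
    between = dict.fromkeys(profiles, 0)
    for a, b in combinations(profiles, 2):
        if abs(pearson(profiles[a], profiles[b])) >= threshold:
            bucket = within if replicon[a] == replicon[b] else between
            bucket[a] += 1
            bucket[b] += 1
    return within, between
```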
Evaluation of a novel saliva-based epidermal growth factor receptor mutation detection for lung cancer: A pilot study.
Background: This article describes a pilot study evaluating a novel liquid biopsy system for non-small cell lung cancer (NSCLC) patients. The electric field-induced release and measurement (EFIRM) method uses an electrochemical biosensor to detect oncogenic mutations in biofluids.
Methods: Saliva and plasma samples from 17 patients were collected at three cancer centers before and after surgical resection. The EFIRM method was then applied to the collected samples to assay for exon 19 deletion and p.L858R mutations. EFIRM results were compared with cobas detection of exon 19 deletion and p.L858R mutation in cancer tissues.
Results: The EFIRM method detected exon 19 deletion with an area under the curve (AUC) of 1.0 in both saliva and plasma samples from lung cancer patients. For L858R mutation detection, the AUC was 1.0 in saliva and 0.98 in plasma. Strong correlations were also found between pre-surgery and post-surgery samples for both saliva (0.86 for exon 19 and 0.98 for L858R) and plasma (0.73 for exon 19 and 0.94 for L858R).
Conclusion: Our study demonstrates the feasibility of using EFIRM to detect epidermal growth factor receptor mutations rapidly, non-invasively and conveniently in the saliva of patients with NSCLC, with results that corresponded perfectly to those of cobas tissue genotyping.
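An area under the ROC curve can be computed directly as the probability that a randomly chosen mutation-positive sample produces a higher signal than a randomly chosen mutation-negative one, counting ties as one half (the Mann-Whitney formulation). The sketch assumes that a higher EFIRM signal means the mutation is more likely present and that tissue genotyping supplies the positive/negative labels; the function name and example inputs are hypothetical.

```python
def auc(positive_scores, negative_scores):
    """AUC as P(random positive outscores random negative), ties counting 0.5.

    This equals the area under the empirical ROC curve.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))
```

An AUC of 1.0, as reported for saliva, corresponds to every positive sample scoring above every negative one. The pairwise loop is quadratic, which is harmless at pilot-study sample sizes.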
Comparative Analysis of the Genomes of Two Field Isolates of the Rice Blast Fungus Magnaporthe oryzae.
Rice blast caused by Magnaporthe oryzae is one of the most destructive diseases of rice worldwide. The fungal pathogen is notorious for its ability to overcome host resistance. To better understand its genetic variation in nature, we sequenced the genomes of two field isolates, Y34 and P131. In comparison with the previously sequenced laboratory strain 70-15, both field isolates had a similar genome size but slightly more genes. Sequences from the field isolates were used to improve the genome assembly and gene prediction of 70-15. Although the overall genome structure is similar, a number of gene families that are likely involved in plant-fungal interactions are expanded in the field isolates. Genome-wide analysis of nonsynonymous to synonymous nucleotide substitution rates revealed that many infection-related genes underwent diversifying selection. The field isolates also have hundreds of isolate-specific genes and a number of isolate-specific gene duplication events. Functional characterization of randomly selected isolate-specific genes revealed that they play diverse roles, some of which affect virulence. Furthermore, each genome contains thousands of transposon-like element loci, but fewer than 30% of them are conserved among different isolates, suggesting active transposition events in M. oryzae. In total, approximately 200 genes were disrupted by transposable elements in these three strains. Interestingly, transposon-like elements tend to be associated with isolate-specific or duplicated sequences. Overall, our results indicate that gain or loss of unique genes, DNA duplication, gene family expansion, and frequent translocation of transposon-like elements are important factors in genome variation of the rice blast fungus.
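The nonsynonymous-to-synonymous rate ratio (dN/dS) behind the diversifying-selection result can be illustrated with a deliberately simplified Nei-Gojobori-style calculation for one pair of aligned coding sequences. It counts synonymous and nonsynonymous sites per codon, classifies single-position codon differences, and applies the Jukes-Cantor correction. Codon pairs differing at two or three positions are skipped rather than averaged over mutational pathways, so this is a teaching sketch with hypothetical function names, not the analysis pipeline used in the study; genome-scale work normally relies on dedicated tools such as PAML's codeml.

```python
import math

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_sites(codon):
    """Expected synonymous sites of a codon (changes to stop count as nonsynonymous)."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1 / 3
    return sites


def jukes_cantor(p):
    """Jukes-Cantor correction for multiple substitutions at a site."""
    if p >= 0.75:
        raise ValueError("too many differences for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1, seq2):
    """Simplified dN/dS for two aligned, in-frame coding sequences.

    Simplification: codon pairs that differ at more than one position, or that
    contain a stop or non-ACGT codon, are skipped instead of pathway-averaged.
    """
    s_sites = n_sites = s_diff = n_diff = 0.0
    for k in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[k:k + 3].upper(), seq2[k:k + 3].upper()
        if CODE.get(c1, "*") == "*" or CODE.get(c2, "*") == "*":
            continue
        diffs = sum(x != y for x, y in zip(c1, c2))
        if diffs > 1:
            continue
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        s_sites += s
        n_sites += 3 - s
        if diffs == 1:
            if CODE[c1] == CODE[c2]:
                s_diff += 1
            else:
                n_diff += 1
    if s_sites == 0 or n_sites == 0:
        raise ValueError("no comparable codons")
    dn = jukes_cantor(n_diff / n_sites)
    ds = jukes_cantor(s_diff / s_sites)
    if ds == 0:
        return float("inf") if dn > 0 else float("nan")
    return dn / ds
```

A ratio above 1 is conventionally read as a sign of diversifying (positive) selection, and a ratio below 1 as purifying selection.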
- …