11 research outputs found

    IntPath-an integrated pathway gene relationship database for model organisms and important pathogens

    DOI: 10.1186/1752-0509-6-S2-S2; BMC Systems Biology 6 (Suppl. 2)

    Stringent DDI-based Prediction of H. sapiens-M. tuberculosis H37Rv Protein-Protein Interactions

    Background: H. sapiens-M. tuberculosis H37Rv protein-protein interaction (PPI) data are important for illuminating the infection mechanism of M. tuberculosis H37Rv, but such data are currently very scarce. This seriously limits the study of the interaction between this important pathogen and its host, H. sapiens. Computational prediction of H. sapiens-M. tuberculosis H37Rv PPIs is an important strategy for filling this gap. Domain-domain interaction (DDI) based prediction is one of the most frequently used computational approaches for predicting both intra-species and inter-species PPIs. However, the performance of DDI-based host-pathogen PPI prediction has been rather limited. Results: We develop a stringent DDI-based prediction approach with emphasis on (i) differences between the specific domain sequences on annotated regions of proteins under the same domain ID and (ii) calculation of the interaction strength of predicted PPIs based on the interacting residues in their interaction interfaces. We compare our stringent DDI-based approach with a conventional DDI-based approach for predicting PPIs, using gold-standard intra-species PPIs and an assessment based on coherent informative Gene Ontology terms. The assessment results show that our stringent DDI-based approach performs much better than the conventional approach. Using our stringent approach, we have predicted a small set of reliable H. sapiens-M. tuberculosis H37Rv PPIs which could be very useful for a variety of related studies. We also analyze these predicted PPIs using cellular compartment distribution analysis, functional category enrichment analysis and pathway enrichment analysis. The analyses support the validity of our prediction result. In addition, based on an analysis of the predicted H. sapiens-M. tuberculosis H37Rv PPI network, we have discovered some important properties of domains involved in host-pathogen PPIs. We find that both host and pathogen proteins involved in host-pathogen PPIs tend to have more domains than proteins involved in intra-species PPIs, and that these domains have more interaction partners than domains on proteins involved in intra-species PPIs. Conclusions: The stringent DDI-based prediction approach reported in this work provides a stringent strategy for predicting host-pathogen PPIs. It also performs better than a conventional DDI-based approach in predicting PPIs. We have predicted a small set of accurate H. sapiens-M. tuberculosis H37Rv PPIs which could be very useful for a variety of related studies.

    A ratiometric-based measure of gene co-expression

    Background: Gene co-expression analysis has previously been based on measures that include correlation coefficients and mutual information, as well as newcomers such as MIC. These measures depend primarily on the degree of association between the RNA levels of two genes and, to a lesser extent, on their variability. They focus on the similarity of expression value trajectories that change in like manner across samples. However, there are relationships of biological interest to which these classical measures are expected to be insensitive. These include genes whose expression levels are ratiometrically stable and genes whose variance is tightly constrained. Large-scale studies of relatively homogeneous samples, including single-cell RNA-seq, are experimental settings in which such relationships might be especially pertinent. Results: We develop and implement a ratiometric approach for detecting gene associations (abbreviated RA). It is based on the coefficient of variation of the measured expression ratio of each pair of genes. We apply it to a collection of lymphoblastoid RNA-seq data from the 1000 Genomes Project Consortium, a typical sample set with high overall homogeneity. RA is a selective method, reporting in this case ~1/4 of all possible gene pairs, yet these relationships include a distilled picture of biological relationships previously found by other methods. In addition, RA reveals expression relationships that are not detected by traditional correlation and mutual information methods. We also analyze data from individual lymphoblastoid cells and show that desirable properties of the RA method extend to single-cell RNA-seq. Conclusion: We show that our ratiometric method identifies biologically significant relationships that are often missed or low-ranked by conventional association-based methods when applied to a relatively homogeneous dataset. The results open new questions about the regulatory mechanisms that produce strong RA relationships. RA is scalable and potentially well suited for the analysis of thousands of bulk RNA-seq or single-cell transcriptomes.
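    The abstract above describes RA as based on the coefficient of variation (CV) of the expression ratio of a gene pair. The following is a minimal sketch of that one quantity only, not the authors' implementation: the function name `ratio_cv`, the use of the sample standard deviation, and the requirement of strictly positive expression values are all assumptions made for illustration.

```python
import statistics


def ratio_cv(expr_a, expr_b):
    """Coefficient of variation of the per-sample ratio expr_a / expr_b.

    Hypothetical helper: assumes both inputs are equal-length sequences of
    strictly positive expression values, one value per sample.
    """
    if len(expr_a) != len(expr_b) or len(expr_a) < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    if any(v <= 0 for v in expr_a) or any(v <= 0 for v in expr_b):
        raise ValueError("expression values must be strictly positive")
    ratios = [a / b for a, b in zip(expr_a, expr_b)]
    # CV = sample standard deviation divided by the mean of the ratios.
    return statistics.stdev(ratios) / statistics.mean(ratios)


# Toy data (invented): gene_y is always half of gene_x, so their ratio is
# constant across samples; gene_z varies independently of gene_x.
gene_x = [10.0, 20.0, 40.0, 80.0]
gene_y = [5.0, 10.0, 20.0, 40.0]
gene_z = [8.0, 3.0, 9.0, 2.0]

stable_cv = ratio_cv(gene_x, gene_y)
unstable_cv = ratio_cv(gene_x, gene_z)
```

    A low CV marks a pair whose ratio is stable across samples; how the published method thresholds or ranks pairs is not stated in the abstract and is not modeled here.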

    Computational prediction of host-pathogen protein-protein interactions

    Philosophiae Doctor - PhD
    Supervised machine learning approaches have been applied successfully to the prediction of protein-protein interactions (PPIs) within a single organism, i.e., intra-species predictions. However, because large amounts of experimentally validated PPI data for training and testing are lacking, fewer studies have successfully applied these techniques to host-pathogen PPIs, i.e., inter-species predictions. Most host-pathogen studies have focused on human-virus interactions, and specifically on human-HIV PPI data. Further improvements to machine learning techniques and feature sets are needed to improve classification accuracy for host-pathogen PPI prediction. The primary aim of this bioinformatics thesis was to develop a binary classifier with an appropriate feature set for host-pathogen PPI prediction using published human-Hepatitis C virus PPIs, and to test the model on available host-pathogen data for human-Bacillus anthracis PPIs. Twelve different feature sets were compared to find the optimal set. The feature selection process reveals that our novel quadruple feature (a subsequence of four consecutive amino acids), combined with sequence similarity and human interactome network properties (such as degree, clustering coefficient, and betweenness centrality), was the best set. The optimal feature set outperformed those in the relevant published material, giving 95.9% sensitivity, 91.6% specificity and 89.0% accuracy. Using our optimal feature set, we developed a neural network model to predict PPIs between human and Mycobacterium tuberculosis. The strategy is to develop a model trained with intra-species PPI data and extend it to inter-species prediction. However, the lack of experimentally validated PPI data between human and Mycobacterium tuberculosis (M. tuberculosis) leads us to first assess the feasibility of using validated intra-species PPI data to build a model for inter-species PPIs. In this model we used human intra-species PPIs combined with Bacillus anthracis intra-species data to develop a binary classification model, and extended the model to human-Bacillus anthracis inter-species prediction. We thus test our hypotheses on known human-Bacillus anthracis PPI data, and the result shows good performance, with an average accuracy of 89.0%. The same approach was extended to the prediction of PPIs between human and M. tuberculosis. The predicted human-M. tuberculosis PPI data were further validated using functional enrichment of experimentally verified secretory proteins in M. tuberculosis, cellular compartment analysis and pathway enrichment analysis. Results show that five of the M. tuberculosis secretory proteins within an infected host macrophage that correspond to the mycobacterial virulent strain H37Rv were extracted from the human-M. tuberculosis PPI dataset predicted by our model. Finally, a web server was created to predict PPIs between human and Mycobacterium tuberculosis, which is available online at http://hppredict.sanbi.ac.za. In summary, the concepts, techniques and technologies developed as part of this thesis have the potential to contribute not only to the understanding of PPIs between human and Mycobacterium tuberculosis, but can also be extended to other pathogens. Further materials related to this study are available at ftp://ftp.sanbi.ac.za/machine learning.
    National Research Foundation (NRF) and SANB
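    The thesis abstract above defines its "quadruple feature" as a subsequence of four consecutive amino acids. As a rough illustration of what such a feature could look like, the sketch below counts overlapping four-residue windows in a protein sequence. The abstract does not say how the feature is encoded, so the counting scheme, the function name `quadruple_counts`, and the toy sequence are assumptions rather than the thesis method.

```python
from collections import Counter


def quadruple_counts(sequence):
    """Count every overlapping window of four consecutive residues.

    Hypothetical helper: sequences shorter than four residues yield an
    empty Counter because no full window fits.
    """
    seq = sequence.upper()
    return Counter(seq[i:i + 4] for i in range(len(seq) - 3))


# Toy peptide (invented): "MKVL" occurs twice, at positions 0 and 4.
features = quadruple_counts("MKVLMKVL")
```

    In a real pipeline these counts would still need to be mapped to a fixed-length vector and combined with the other features named in the abstract; that step is left out here.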