35 research outputs found

    Computational Analysis of RNAi Screening Data to Identify Host Factors Involved in Viral Infection and to Characterize Protein-Protein Interactions

    The study of gene functions in a variety of different treatments, cell lines and organisms has been facilitated by RNA interference (RNAi) technology that tracks the phenotype of cells after silencing of particular genes. In this thesis, I describe two computational approaches developed to analyze the image data from two different RNAi screens. Firstly, I developed an alternative approach to detect host factors (human proteins) that support virus growth and replication in cells infected with the Hepatitis C virus (HCV). To identify the human proteins that are crucial for the efficiency of viral infection, several RNAi experiments on virus-infected cells have been conducted. However, the target lists from different laboratories have shown little overlap. This inconsistency might be caused not only by experimental discrepancies, but also by data analysis possibilities that have not been fully explored. Observing only viral intensity readouts from the experiments might be insufficient. In this project, I describe our computational development as a new alternative approach to improve the reliability of host factor identification. Our approach is based on characterizing the clustering of infected cells. The idea is that viral infection is spread by cell-cell contacts, or is at least favored by the proximity of cells. Therefore, clustering of the HCV-infected cells is observed during spreading of the infection. We developed a clustering detection method based on a distance-based point pattern analysis (K-function) to identify knockdown genes for which the clusters of HCV-infected cells were reduced. The approach significantly separated positive from negative controls, and the clustering score correlated well with the intensity readouts from the experimental screens. In comparison with another clustering algorithm, the Quadrat analysis method, the K-function method was superior.
Statistical normalization approaches were exploited to identify protein targets from our clustering-based approach and the experimental screens. Integrating results from our clustering method, intensity readout analysis and secondary screen, we finally identified five promising host factors that are suitable candidate targets for drug therapy. Secondly, a machine learning based approach was developed to characterize protein-protein interactions (PPIs) in a signaling network. The characterization of each PPI is fundamental to our understanding of the complex signaling system of a human cell. Experiments for PPI identification, such as yeast two-hybrid and FRET analysis, are resource-intensive; therefore, computational approaches that analyze large-scale RNAi knockdown screens to infer functional similarities from the phenotypic similarities of the down-regulated proteins have become an important pursuit. However, these methods did not provide a more detailed characterization of the PPIs. In this project, I developed a new computational approach based on a machine learning technique that employs the mitotic phenotypes of an RNAi screen. It enables the identification of the nature of a PPI, i.e., whether it is activating or inhibiting. We established a systematic classification using Support Vector Machines (SVMs) that was based on the phenotypic descriptors and used it to classify the interactions that activate or inhibit signal transduction. The machines yielded promising results with good performance when integrating different sets of published descriptors and our own developed descriptors calculated from fractions of specific phenotypes, linear classification of phenotypes, and phenotypic distance to distinct proteins. A comprehensive model generated from the machines was used for further predictions.
We investigated the nature of pairs of interacting proteins and generated a consistency score that enhanced the precision of the classification results. We predicted the activating/inhibiting nature for 214 PPIs with high confidence in signaling pathways and were able to identify a new subgroup of chemokine receptors. These findings might facilitate an enhanced understanding of the cellular mechanisms during inflammation and immunologic responses. In summary, two computational approaches were developed to analyze the image data of the different RNAi screens: 1) a clustering-based approach was used to identify the host factors that are crucial for HCV infection; and 2) a machine learning-based approach with various descriptors was employed to characterize PPI activities. The results from the host factor analysis revealed novel target proteins that are involved in the spread of HCV. In addition, the results of the characterization of the PPIs lead to a better understanding of the signaling pathways. The two large-scale RNAi data sets were successfully analyzed by our established approaches to obtain new insights into virus biology and cellular signaling.
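
The clustering idea in the first part of this abstract (a distance-based K-function applied to the positions of infected cells) can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes the naive Ripley's K estimator without edge correction, and the function names, the toy coordinates, and the choice of "K(r) minus pi*r^2" as a clustering score are all hypothetical.

```python
import math

def ripley_k(points, radius, area):
    """Naive Ripley's K estimate (no edge correction) for 2-D points.

    K(r) = area / (n * (n - 1)) * number of ordered pairs (i, j), i != j,
    whose distance is at most r.
    """
    n = len(points)
    if n < 2:
        return 0.0
    pairs = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= radius:
                pairs += 1
    return area * pairs / (n * (n - 1))

def clustering_score(points, radius, area):
    """Hypothetical score: K(r) minus pi*r^2, its expectation under
    complete spatial randomness. Positive values suggest clustering."""
    return ripley_k(points, radius, area) - math.pi * radius ** 2

# Toy cell positions in a 10 x 10 field (area = 100), made up for illustration.
clustered = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.3), (1.1, 0.8)]
spread = [(1.0, 1.0), (9.0, 1.0), (1.0, 9.0), (9.0, 9.0)]

clustered_score = clustering_score(clustered, radius=1.0, area=100.0)
spread_score = clustering_score(spread, radius=1.0, area=100.0)
```

In a screen, a score like this would be computed per knockdown well and compared against controls; a real analysis would also need edge correction and a range of radii.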

    Hybrid Deep Learning Based on a Heterogeneous Network Profile for Functional Annotations of Plasmodium falciparum Genes

    Functional annotation of genes of unknown function reveals unidentified functions that can enhance our understanding of complex genome communications. A common approach for inferring gene function involves the ortholog-based method. However, genetic data alone are often not enough to provide information for function annotation. Thus, integrating other sources of data can potentially increase the possibility of retrieving annotations. Network-based methods are efficient techniques for exploring interactions among genes and can be used for functional inference. In this study, we present an analysis framework for inferring the functions of Plasmodium falciparum genes based on connection profiles in a heterogeneous network between human and Plasmodium falciparum proteins. These profiles were fed into a hybrid deep learning algorithm to predict the orthologs of genes of unknown function. The results show high performance of the model’s predictions, with an AUC of 0.89. One hundred and twenty-one predicted pairs with high prediction scores were selected for inferring the functions using statistical enrichment analysis. Using this method, PF3D7_1248700 and PF3D7_0401800 were found to be involved with muscle contraction and striated muscle tissue development, while PF3D7_1303800 and PF3D7_1201000 were found to be related to protein dephosphorylation. In conclusion, combining a heterogeneous network and a hybrid deep learning technique can allow us to identify unknown gene functions of malaria parasites. This approach is general and can be applied to other diseases to advance the field of biomedical science.

    Heterogeneous Network Model to Identify Potential Associations Between Plasmodium vivax and Human Proteins

    Integration of multiple sources and data levels provides great insight into the complex associations between human and malaria systems. In this study, a meta-analysis framework was developed based on a heterogeneous network model for integrating human-malaria protein similarities, a human protein interaction network, and a Plasmodium vivax protein interaction network. An iterative network propagation was performed on the heterogeneous network until we obtained stabilized weights. The association scores were calculated for qualifying a novel potential human-malaria protein association. This method performed better than random experiments. After that, the stabilized network was clustered into association modules. The potential association candidates were then thoroughly analyzed by statistical enrichment analysis with protein complexes and known drug targets. The most promising target proteins were the succinate dehydrogenase protein complex in the human citrate (TCA) cycle pathway and the nicotinic acetylcholine receptor in the human central nervous system. Promising associations and potential drug targets were also provided for further studies and designs in therapeutic approaches for malaria at a systematic level. In conclusion, this method efficiently identifies new human-malaria protein associations and can be generalized to other types of association studies to further advance biomedical science.
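
The phrase "iterative network propagation ... until we obtained stabilized weights" can be made concrete with a small sketch. The abstract does not state the update rule, so this assumes a random walk with restart, one common propagation scheme; the function name, the restart value, and the toy human/parasite network are hypothetical.

```python
def propagate(adjacency, seeds, restart=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart on a weighted undirected graph.

    adjacency: dict node -> dict neighbor -> edge weight (symmetric).
    seeds: dict node -> initial weight (normalized internally).
    Iterates p <- restart * p0 + (1 - restart) * W p until the L1 change
    between iterations drops below tol, i.e. the weights have stabilized.
    """
    nodes = sorted(adjacency)
    degree = {u: sum(adjacency[u].values()) for u in nodes}
    total = sum(seeds.get(u, 0.0) for u in nodes)
    p0 = {u: seeds.get(u, 0.0) / total for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {u: restart * p0[u] for u in nodes}
        for u in nodes:
            if degree[u] == 0:
                continue
            for v, w in adjacency[u].items():
                # Node u passes its score to neighbors in proportion to edge weight.
                nxt[v] += (1 - restart) * p[u] * w / degree[u]
        change = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if change < tol:
            break
    return p

# Toy heterogeneous network: H* are human proteins, P* are parasite proteins.
# H1-P1 stands for a cross-species similarity edge; all weights are made up.
network = {
    "H1": {"H2": 1.0, "P1": 0.8},
    "H2": {"H1": 1.0},
    "P1": {"H1": 0.8, "P2": 1.0},
    "P2": {"P1": 1.0},
}
scores = propagate(network, seeds={"H1": 1.0})
```

Here the parasite protein directly linked to the seed ends up with a higher stabilized score than the one two steps away, which is the kind of ranking an association score could be built on.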

    DDA: A Novel Network-Based Scoring Method to Identify Disease-Disease Associations

    Categorizing human diseases provides higher efficiency and accuracy for disease diagnosis, prognosis, and treatment. Disease-disease associations (DDAs) are valuable information that indicates the large-scale structure of complex relationships among diseases. However, the number of known and reliable associations is very small. Therefore, identification of DDAs is a challenging task in systems biology and medicine. Here, we developed a novel network-based scoring algorithm called DDA to identify the relationships between diseases in a large-scale study. Our method is based on a random walk prioritization in a protein-protein interaction network. This approach considers not only whether two diseases directly share associated genes but also the statistical relationships between two different diseases using known disease-related genes. Predicted associations were validated by known DDAs from a database and by literature support. The method yielded a good performance with an area under the curve of 71% and outperformed other standard association indices. Furthermore, novel DDAs and relationships among diseases from the cluster analysis were reported. This method efficiently identifies disease-disease relationships on an interaction network and can also be generalized to other association studies to further enhance knowledge in medical studies.

    Network-based association analysis to infer new disease-gene relationships using large-scale protein interactions.

    Protein-protein interactions integrated with disease-gene associations represent important information for revealing protein functions under disease conditions to improve the prevention, diagnosis, and treatment of complex diseases. Although several studies have attempted to identify disease-gene associations, the number of possible disease-gene associations is very small. High-throughput technologies have been established experimentally to identify the association between genes and diseases. However, these techniques are still quite expensive, time consuming, and even difficult to perform. Thus, based on currently available data and knowledge, computational methods have served as alternatives to provide more possible associations to increase our understanding of disease mechanisms. Here, a new network-based algorithm, namely, Disease-Gene Association (DGA), was developed to calculate the association score of a query gene to a new possible set of diseases. First, a large-scale protein interaction network was constructed, and the relationship between two interacting proteins was calculated with regard to the disease relationship. Novel plausible disease-gene pairs were identified and statistically scored by our algorithm using neighboring protein information. The results yielded high performance for disease-gene prediction, with an F-measure of 0.78 and an AUC of 0.86. To identify promising candidates of disease-gene associations, the association coverage of genes and diseases was calculated and used with the association score to perform gene and disease selection. Based on gene selection, we identified promising pairs that exhibited evidence related to several important diseases, e.g., inflammation, lipid metabolism, inborn errors, xanthomatosis, cerebellar ataxia, cognitive deterioration, malignant neoplasms of the skin and malignant tumors of the cervix. Focusing on disease selection, we identified target genes that were important to blistering skin diseases and muscular dystrophy. In summary, our developed algorithm is simple, efficiently identifies disease-gene associations in the protein-protein interaction network and provides additional knowledge regarding disease-gene associations. This method can be generalized to other association studies to further advance biomedical science.

    Reverse Nearest Neighbor Search on a Protein-Protein Interaction Network to Infer Protein-Disease Associations

    The associations between proteins and diseases are crucial information for investigating pathological mechanisms. However, the number of known and reliable protein-disease associations is quite small. In this study, an analysis framework to infer associations between proteins and diseases was developed based on a large data set of a human protein-protein interaction network integrating an effective network search, namely, the reverse k-nearest neighbor (RkNN) search. The RkNN search was used to identify the impact of a protein on other proteins. Then, associations between proteins and diseases were inferred statistically. The method using the RkNN search yielded a much higher precision than a random selection, a standard nearest neighbor search, or the same method applied to a random protein-protein interaction network. All protein-disease pair candidates were verified by a literature search. Supporting evidence for 596 pairs was identified. In addition, cluster analysis of these candidates revealed 10 promising groups of diseases to be further investigated experimentally. This method can be used to identify novel associations to better understand complex relationships between proteins and diseases.
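
A reverse k-nearest neighbor query asks which nodes count the query node among their own k nearest neighbors. The sketch below illustrates that definition on a small unweighted graph. It is not the paper's implementation: it assumes hop count (breadth-first search) as the distance, a tie-inclusive membership rule, and a made-up toy network with hypothetical node names.

```python
from collections import deque

def hop_distances(graph, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def reverse_knn(graph, query, k):
    """Nodes p for which `query` is among the k nearest neighbors of p.

    Assumed tie rule: query belongs to kNN(p) when fewer than k other
    nodes are strictly closer to p than query is.
    """
    result = set()
    for p in graph:
        if p == query:
            continue
        dist = hop_distances(graph, p)
        if query not in dist:
            continue  # query unreachable from p
        closer = sum(1 for node, d in dist.items()
                     if node != p and d < dist[query])
        if closer < k:
            result.add(p)
    return result

# Toy interaction network (adjacency lists), made up for illustration.
ppi = {
    "HUB": ["A", "B", "C"],
    "A": ["HUB"],
    "B": ["HUB"],
    "C": ["HUB", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
hub_influence = reverse_knn(ppi, "HUB", k=1)
leaf_influence = reverse_knn(ppi, "E", k=1)
```

The size of the returned set can be read as a rough measure of how many proteins a given protein influences; a large network would need a weighted distance and a faster search than one BFS per node.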

    ICON-GEMs: integration of co-expression network in genome-scale metabolic models, shedding light through systems biology

    Background: Flux Balance Analysis (FBA) is a key metabolic modeling method used to simulate cellular metabolism under steady-state conditions. Its simplicity and versatility have led to various strategies incorporating transcriptomic and proteomic data into FBA, successfully predicting flux distribution and phenotypic results. However, despite these advances, untapped potential remains in leveraging gene-related connections like co-expression patterns for valuable insights. Results: To fill this gap, we introduce ICON-GEMs, an innovative constraint-based model to incorporate a gene co-expression network into the FBA model, facilitating more precise determination of flux distributions and functional pathways. In this study, transcriptomic data from both Escherichia coli and Saccharomyces cerevisiae were integrated into their respective genome-scale metabolic models. A comprehensive gene co-expression network was constructed as a global view of the metabolic mechanism of the cell. By leveraging quadratic programming, we maximized the alignment between pairs of reaction fluxes and the correlation of their corresponding genes in the co-expression network. The outcomes demonstrated that ICON-GEMs outperformed existing methodologies in predictive accuracy. Flux variabilities over subsystems and functional modules also demonstrate promising results. Furthermore, a comparison involving different types of biological networks, including protein–protein interactions and random networks, reveals insights into the utilization of the co-expression network in genome-scale metabolic engineering. Conclusion: ICON-GEMs introduce an innovative constrained model capable of simultaneous integration of gene co-expression networks, ready for broad application across diverse transcriptomic data sets and multiple organisms. It is freely available as open-source at https://github.com/ThummaratPaklao/ICOM-GEMs.git

    DGA algorithm performance.

    The highest F-measure from cross-validation results from the DGA algorithm with different disease relationships calculated using the Jaccard, Simpson, Geometric, and Cosine indices.
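
For readers unfamiliar with the four indices named in this caption, the sketch below computes them for two sets. It uses the common set-based definitions of these association indices; whether the DGA work used exactly these forms, and on which sets, is an assumption, and the function name and gene identifiers are hypothetical.

```python
import math

def association_indices(genes_a, genes_b):
    """Set-overlap indices between two non-empty gene sets.

    jaccard   = |A & B| / |A | B|
    simpson   = |A & B| / min(|A|, |B|)
    geometric = |A & B|^2 / (|A| * |B|)
    cosine    = |A & B| / sqrt(|A| * |B|)
    """
    a, b = set(genes_a), set(genes_b)
    if not a or not b:
        raise ValueError("both gene sets must be non-empty")
    shared = len(a & b)
    return {
        "jaccard": shared / len(a | b),
        "simpson": shared / min(len(a), len(b)),
        "geometric": shared ** 2 / (len(a) * len(b)),
        "cosine": shared / math.sqrt(len(a) * len(b)),
    }

# Made-up gene sets for two diseases sharing two genes.
disease_x = {"G1", "G2", "G3", "G4"}
disease_y = {"G3", "G4", "G5"}
scores = association_indices(disease_x, disease_y)
```

All four indices are 0 for disjoint sets and 1 for identical sets; they differ in how strongly they penalize a size mismatch between the two sets.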

    Immune-Related Protein Interaction Network in Severe COVID-19 Patients toward the Identification of Key Proteins and Drug Repurposing

    Coronavirus disease 2019 (COVID-19) is still an active global public health issue. Although vaccines and therapeutic options are available, some patients experience severe conditions and need critical care support. Hence, identifying key genes or proteins involved in immune-related severe COVID-19 is necessary to find or develop targeted therapies. This study proposed a novel construction of an immune-related protein interaction network (IPIN) in severe cases with the use of a network diffusion technique on a human interactome network and transcriptomic data. Enrichment analysis revealed that the IPIN was mainly associated with antiviral, innate immune, apoptosis, cell division, and cell cycle regulation signaling pathways. Twenty-three proteins were identified as key proteins to find associated drugs. Finally, poly (I:C), mitomycin C, decitabine, gemcitabine, hydroxyurea, tamoxifen, and curcumin were the potential drugs interacting with the key proteins to treat severe COVID-19. In conclusion, the IPIN can be a good representative network for the immune system that integrates the protein interaction network and transcriptomic data. Thus, the key proteins and target drugs in the IPIN help to find new treatments that use existing drugs, in addition to vaccination and conventional antiviral therapy.

    Multi-Data Aspects of Protein Similarity with a Learning Technique to Identify Drug-Disease Associations

    Drug repositioning has been proposed to develop drugs for diseases. However, similarity in a single aspect may not be sufficient to reveal hidden information. Therefore, we established protein–protein similarity vectors (PPSVs) based on potential similarities in various types of biological information associated with proteins, including their network topology, proteomic data, functional analysis, and druggable property. Based on the proposed PPSVs, a separate drug–disease matrix was constructed for each individual disease to prevent characteristics from being obscured between diseases. A classification technique was employed for prediction. The results showed that more than half of the tested disease models exhibited high performance, with overall F1 scores of more than 80%. Furthermore, comparing all diseases using traditional methods in one run, we obtained an area under the curve (AUC) of 98.9%. All candidate drugs were then tested in clinical trials (p-value < 2.2 × 10^−16) and were known drugs based on their functions (p-value < 0.05). An analysis revealed that, in the functional aspect, the confidence value of an interaction in the protein–protein interaction network and the functional pathway score were the best descriptors for prediction. Based on the learning processes of PPSVs with an isolated disease, the classifier exhibited high performance in predicting and identifying new potential drugs for that disease.