A Novel Biclustering Approach to Association Rule Mining for Predicting HIV-1–Human Protein Interactions
Identifying potential viral-host protein interactions is a useful step towards developing new drugs that target those interactions, and computational tools are increasingly used to predict them. A database of experimentally validated interactions between a set of HIV-1 proteins and a set of human proteins has recently been published. Predicting new interactions from this database is usually posed as a classification problem, but that formulation suffers from the lack of biologically validated negative interactions. It is therefore beneficial to use the existing database to predict new viral-host interactions without the need for negative samples. Motivated by this, this article analyzes the HIV-1–human protein interaction database using association rule mining. The main objective is to identify a set of association rules both among the HIV-1 proteins and among the human proteins, and to use these rules to predict new interactions. To this end, a novel association rule mining technique based on biclustering is proposed: it discovers frequent closed itemsets, and then the association rules, from the adjacency matrix of the HIV-1–human interaction network. Novel HIV-1–human interactions are predicted from the discovered association rules and tested for biological significance. To validate the predicted interactions, gene ontology-based and pathway-based studies have been performed. These studies show that the human proteins predicted to interact with a particular viral protein share many common biological activities. In addition, a literature survey identifies some predicted interactions that have already been validated experimentally but are not present in the database. A comparison with other prediction methods is also discussed.
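The idea of mining closed frequent itemsets from an interaction matrix and turning them into predictive rules can be illustrated with a minimal sketch. It is not the biclustering algorithm proposed in the article: it enumerates itemsets by brute force on a toy matrix, relying on the fact that an all-ones bicluster that cannot be extended corresponds to a closed itemset. The protein labels, interaction data, thresholds, and function names are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy interaction data: each human protein (row) maps to the set of
# viral proteins (columns) it is recorded as interacting with. Values are invented.
rows = {
    "H1": {"env", "tat"},
    "H2": {"env", "tat", "nef"},
    "H3": {"env", "tat"},
    "H4": {"nef", "gag"},
    "H5": {"env"},
}

def supporting_rows(rows, itemset):
    """Return the rows whose item sets contain every item in `itemset`."""
    return {name for name, items in rows.items() if itemset <= items}

def closed_frequent_itemsets(rows, min_support):
    """Brute-force search: map each closed frequent itemset to its support count."""
    items = sorted(set().union(*rows.values()))
    closed = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            supp = supporting_rows(rows, set(combo))
            if len(supp) >= min_support:
                # The closure is the set of items shared by all supporting rows.
                closure = frozenset(set.intersection(*(rows[r] for r in supp)))
                closed[closure] = len(supp)
    return closed

def predict(rows, closed, min_conf):
    """Apply rules (C - {c}) -> c from each closed itemset C to rows missing c."""
    predictions = {}
    for itemset, supp in closed.items():
        if len(itemset) < 2:
            continue
        for consequent in itemset:
            antecedent = set(itemset) - {consequent}
            matching = supporting_rows(rows, antecedent)
            conf = supp / len(matching)
            if conf < min_conf:
                continue
            for name in matching:
                if consequent not in rows[name]:
                    predictions[(name, consequent)] = conf
    return predictions

closed = closed_frequent_itemsets(rows, min_support=2)
predictions = predict(rows, closed, min_conf=0.7)
print(closed)
print(predictions)
```

In this toy example the rule "env implies tat" holds for three of the four rows containing env, so the one row that has env but not tat becomes a predicted interaction. No negative examples are needed at any point, which is the property the abstract emphasizes.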
iBBiG: iterative binary bi-clustering of gene sets
Motivation: Meta-analysis of genomics data seeks to identify genes associated with a biological phenotype across multiple datasets; however, merging data from different platforms by their features (genes) is challenging. Meta-analysis using functionally or biologically characterized gene sets simplifies data integration, is biologically intuitive, and is seen as having great potential, but it is an emerging field with few established statistical methods. Results: We transform gene expression profiles into binary gene set profiles by discretizing the results of gene set enrichment analyses and apply a new iterative bi-clustering algorithm (iBBiG) to identify groups of gene sets that are coordinately associated with groups of phenotypes across multiple studies. iBBiG is optimized for meta-analysis of large numbers of diverse genomics data that may have unmatched samples. It does not require prior knowledge of the number or size of clusters. When applied to simulated data, it outperforms commonly used clustering methods, discovers overlapping clusters of diverse sizes, and is robust in the presence of noise. We apply it to a meta-analysis of breast cancer studies, where iBBiG extracted a novel gene set–phenotype association that predicted tumor metastases within tumor subtypes.
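The discretization step mentioned in the abstract can be pictured with a small sketch. The abstract does not say how enrichment results are discretized, so the p-value cutoff used here is an assumption, and the gene set names, values, and the `binarize` helper are hypothetical.

```python
# Hypothetical enrichment p-values: each gene set (row) has one p-value per
# phenotype comparison (column). The numbers are invented for illustration.
p_values = {
    "GS_A": [0.001, 0.20, 0.03],
    "GS_B": [0.40, 0.04, 0.60],
}

def binarize(p_values, alpha=0.05):
    """Mark an entry 1 when its p-value is below the assumed cutoff, else 0."""
    return {
        name: [1 if p < alpha else 0 for p in row]
        for name, row in p_values.items()
    }

binary = binarize(p_values)
print(binary)
```

The resulting 0/1 matrix is the kind of binary gene set profile that a bi-clustering algorithm could then search for blocks of ones.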
Propagation-Based Biclustering Algorithm for Extracting Inclusion-Maximal Motifs
Biclustering, the simultaneous clustering of the rows and columns of a data matrix, emerged when classical clustering algorithms proved inadequate for detecting genes with similar expression under only a subset of conditions. Biclustering algorithms can also be applied to other kinds of data, such as medical, economic, and social network data. In this article we explain the concept behind hybrid biclustering algorithms and present the details of propagation-based biclustering, a novel approach for extracting inclusion-maximal gene expression motifs conserved in gene microarray data. We show that this approach can compete successfully with other well-recognized biclustering algorithms.
Biclustering random matrix partitions with an application to classification of forensic body fluids
Classification of unlabeled data is usually achieved by supervised learning
from labeled samples. Although there exist many sophisticated supervised
machine learning methods that can predict the missing labels with a high level
of accuracy, they often lack the required transparency in situations where it
is important to provide interpretable results and meaningful measures of
confidence. Body fluid classification of forensic casework data is the case in
point. We develop a new Biclustering Dirichlet Process (BDP), with a
three-level hierarchy of clustering, and a model-based approach to
classification which adapts to block structure in the data matrix. As the class
labels of some observations are missing, the number of rows in the data matrix
for each class is unknown. The BDP handles this and extends existing
biclustering methods by simultaneously biclustering multiple matrices each
having a randomly variable number of rows. We demonstrate our method by
applying it to the motivating problem, which is the classification of body
fluids based on mRNA profiles taken from crime scenes. The analyses of
casework-like data show that our method is interpretable and produces
well-calibrated posterior probabilities. Our model can be more generally
applied to other types of data with a similar structure to the forensic data. Comment: 45 pages, 10 figures