A Novel Biclustering Approach to Association Rule Mining for Predicting HIV-1–Human Protein Interactions

Author: A Ben-Hur
A Mukhopadhyay
A Panchenko
A Prelic
AL DeFranco
Anirban Mukhopadhyay
B Goethals
C Zhou
D Gibellini
F Supek
H Vashistha
J Doolittle
J Hipp
J Huang
J Jiang
JI MacPherson
L Zhang
MD Dyer
MJ Zaki
MR Arkin
N Lin
N Pasquier
O Tastan
P Gupta
Peter Csermely
R Agrawal
R Cheung
R Jansen
RG Ptak
RN Saha
S Bandyopadhyay
Sanghamitra Bandyopadhyay
SC Madeira
U Maulik
Ujjwal Maulik
W Fu
X Wang
Y Qi
Y Yamanishi
Publication venue: Public Library of Science
Publication date: 23/04/2012

Identifying potential viral-host protein interactions is a vital step towards developing new drugs that target those interactions. Computational tools are increasingly used to predict viral-host interactions. A database of experimentally validated interactions between a set of HIV-1 proteins and a set of human proteins has recently been published. Predicting new interactions from this database is usually posed as a classification problem. However, the classification formulation suffers from the lack of biologically validated negative interactions. It is therefore beneficial to use the existing database to predict new viral-host interactions without the need for negative samples. Motivated by this, in this article the HIV-1–human protein interaction database has been analyzed using association rule mining. The main objective is to identify a set of association rules both among the HIV-1 proteins and among the human proteins, and to use these rules for predicting new interactions. To this end, a novel association rule mining technique based on biclustering has been proposed for discovering frequent closed itemsets, followed by the association rules, from the adjacency matrix of the HIV-1–human interaction network. Novel HIV-1–human interactions have been predicted from the discovered association rules and tested for biological significance. To validate the predicted interactions, gene ontology-based and pathway-based studies have been performed. These studies show that the human proteins predicted to interact with a particular viral protein share many common biological activities. Moreover, a literature survey has been used to identify some predicted interactions that are already validated experimentally but not present in the database. Comparison with other prediction methods is also discussed.
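The abstract's pipeline (frequent closed itemsets from a binary adjacency matrix, then association rules, then predicted interactions) can be illustrated with a minimal sketch. This is not the authors' biclustering-based algorithm: it enumerates itemsets by brute force, which is only practical for tiny inputs. The function names, the placeholder labels "v1" to "v3", the toy data, and both thresholds are assumptions made for illustration. Each row stands for one human protein and holds the set of viral proteins it interacts with.

```python
from itertools import combinations


def support(itemset, rows):
    """Count the rows (transactions) that contain every item in itemset."""
    return sum(1 for r in rows if itemset <= r)


def closed_frequent_itemsets(rows, min_support):
    """Brute-force enumeration of frequent closed itemsets.

    An itemset is closed when it equals the intersection of all rows
    that contain it, i.e. no item can be added without losing a row.
    """
    items = sorted(set().union(*rows))
    closed = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            covering = [r for r in rows if itemset <= r]
            if len(covering) < min_support:
                continue
            closure = frozenset(set.intersection(*covering))
            if closure == itemset:
                closed[itemset] = len(covering)
    return closed


def association_rules(closed, rows, min_confidence):
    """Derive single-consequent rules (antecedent -> item) from closed itemsets."""
    rules = []
    for itemset, supp in closed.items():
        if len(itemset) < 2:
            continue
        for consequent in sorted(itemset):
            antecedent = itemset - {consequent}
            confidence = supp / support(antecedent, rows)
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, confidence))
    return rules


def predict(rows, rules):
    """Predict (row index, item) pairs where a rule fires but the item is absent."""
    predictions = set()
    for i, row in enumerate(rows):
        for antecedent, consequent, _ in rules:
            if antecedent <= row and consequent not in row:
                predictions.add((i, consequent))
    return predictions


# Hypothetical toy data: five human proteins, three viral proteins.
rows = [{"v1", "v2"}, {"v1", "v2"}, {"v1", "v2", "v3"}, {"v3"}, {"v1"}]
closed = closed_frequent_itemsets(rows, min_support=2)
rules = association_rules(closed, rows, min_confidence=0.7)
predictions = predict(rows, rules)
```

In this toy input, the rule from {"v1"} to "v2" holds in three of the four rows containing "v1", so the remaining row (index 4) receives "v2" as a predicted interaction. No negative examples are needed at any step, which is the motivation the abstract gives for the rule-mining formulation.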

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Recommended from our members

iBBiG: iterative binary bi-clustering of gene sets

Author: Bentink Stefan
Culhane Aedín C.
Gusenleitner Daniel
Howe Eleanor A.
Quackenbush John
Publication venue: Oxford University Press (OUP)
Publication date: 24/04/2013

Motivation: Meta-analysis of genomics data seeks to identify genes associated with a biological phenotype across multiple datasets; however, merging data from different platforms by their features (genes) is challenging. Meta-analysis using functionally or biologically characterized gene sets simplifies data integration, is biologically intuitive, and is seen as having great potential, but it is an emerging field with few established statistical methods. Results: We transform gene expression profiles into binary gene set profiles by discretizing the results of gene set enrichment analyses and apply a new iterative bi-clustering algorithm (iBBiG) to identify groups of gene sets that are coordinately associated with groups of phenotypes across multiple studies. iBBiG is optimized for meta-analysis of large numbers of diverse genomics datasets that may have unmatched samples. It does not require prior knowledge of the number or size of clusters. When applied to simulated data, it outperforms commonly used clustering methods, discovers overlapping clusters of diverse sizes, and is robust in the presence of noise. We apply it to a meta-analysis of breast cancer studies, where iBBiG extracted novel associations between gene sets and phenotypes that predicted tumor metastases within tumor subtypes.
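The discretization step mentioned in the abstract can be sketched as a simple threshold on enrichment p-values. This covers only the binarization of the input, not the iBBiG algorithm itself. The function name, the cut-off of 0.05, and the p-values are assumptions for illustration; the abstract does not state which threshold is used.

```python
def binarize_enrichment(pvalues, alpha=0.05):
    """Turn a gene-set-by-phenotype matrix of p-values into a 0/1 matrix.

    A cell becomes 1 when the gene set is called enriched (p < alpha).
    The alpha value is an assumed cut-off, not one taken from the paper.
    """
    return [[1 if p < alpha else 0 for p in row] for row in pvalues]


# Hypothetical p-values: two gene sets (rows) across three phenotype tests (columns).
pvalues = [
    [0.001, 0.20, 0.04],
    [0.50, 0.03, 0.90],
]
binary_profile = binarize_enrichment(pvalues)
```

The resulting binary matrix is the kind of input a binary biclustering method would then search for blocks of ones.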

Harvard University - DASH

Propagation-Based Biclustering Algorithm for Extracting Inclusion-Maximal Motifs

Author: Boryczko Krzysztof
Orzechowski Patryk
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 11/07/2016

Biclustering, the simultaneous clustering of the rows and columns of a data matrix, emerged when classical clustering algorithms proved inadequate for detecting genes with similar expression under a subset of conditions. Biclustering algorithms may also be applied to other kinds of data, such as medical, economic, or social network data. In this article we explain the concept behind hybrid biclustering algorithms and present the details of propagation-based biclustering, a novel approach for extracting inclusion-maximal gene expression motifs conserved in gene microarray data. We show that this approach can compete successfully with other well-recognized biclustering algorithms.
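To make "inclusion-maximal" concrete, the following minimal sketch checks whether a chosen set of rows and columns forms an all-ones bicluster in a binary matrix and whether it can still be extended. It is a simplification: the paper addresses gene expression motifs, whereas this sketch assumes already binarized data, and it is not the propagation-based algorithm. The function names and the toy matrix are hypothetical.

```python
def is_bicluster(matrix, rows, cols):
    """True when every selected cell equals 1 (binary simplification)."""
    return all(matrix[r][c] == 1 for r in rows for c in cols)


def is_inclusion_maximal(matrix, rows, cols):
    """True when the bicluster is valid and no row or column can be added."""
    if not is_bicluster(matrix, rows, cols):
        return False
    for r in range(len(matrix)):
        if r not in rows and is_bicluster(matrix, rows | {r}, cols):
            return False
    for c in range(len(matrix[0])):
        if c not in cols and is_bicluster(matrix, rows, cols | {c}):
            return False
    return True


# Hypothetical binarized matrix: rows are genes, columns are conditions.
matrix = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
]
maximal = is_inclusion_maximal(matrix, {0, 1}, {0, 1})
extendable = is_inclusion_maximal(matrix, {0}, {0, 1})
```

Here rows {0, 1} with columns {0, 1} cannot be extended, while row {0} alone with the same columns can still absorb row 1, so only the first selection is inclusion-maximal.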

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Biclustering random matrix partitions with an application to classification of forensic body fluids

Author: Nicholls Geoff K.
Roeder Amy D.
Wu Chieh-Hsi
Publication date: 27/06/2023

Classification of unlabeled data is usually achieved by supervised learning from labeled samples. Although many sophisticated supervised machine learning methods can predict the missing labels with a high level of accuracy, they often lack the required transparency in situations where it is important to provide interpretable results and meaningful measures of confidence. Body fluid classification of forensic casework data is a case in point. We develop a new Biclustering Dirichlet Process (BDP), with a three-level hierarchy of clustering, and a model-based approach to classification which adapts to block structure in the data matrix. As the class labels of some observations are missing, the number of rows in the data matrix for each class is unknown. The BDP handles this and extends existing biclustering methods by simultaneously biclustering multiple matrices, each having a random number of rows. We demonstrate our method by applying it to the motivating problem, which is the classification of body fluids based on mRNA profiles taken from crime scenes. The analyses of casework-like data show that our method is interpretable and produces well-calibrated posterior probabilities. Our model can be applied more generally to other types of data with a similar structure to the forensic data. Comment: 45 pages, 10 figures

arXiv.org e-Print Archive