7 research outputs found
Recent advances in clustering methods for protein interaction networks
The increasing availability of large-scale protein-protein interaction data has made it possible to understand the basic components and organization of cell machinery at the network level. The resulting challenge is how to analyze such complex interaction data to reveal the principles of cellular organization, processes and functions. Many studies have shown that clustering protein interaction networks is an effective approach for identifying protein complexes or functional modules, and it has become a major research topic in systems biology. In this review, recent advances in clustering methods for protein interaction networks are presented in detail. The prediction of protein functions and interactions based on modules is also covered. Finally, the performance of different clustering methods is compared and directions for future research are discussed.
FAC-PIN: An efficient and fast agglomerative clustering algorithm for protein interaction networks to predict protein complexes and functional modules
Proteins are known to interact with each other to carry out specific functions in living organisms by forming functional modules or protein complexes. Many community detection methods have been devised for the discovery of functional modules or protein complexes in protein interaction networks (PINs). One common problem in current agglomerative community detection approaches is that vertices with just one neighbor are often classified as separate clusters, which does not make sense for module or complex identification. In this thesis, we propose a new agglomerative algorithm, FAC-PIN, based on a local premetric of relative vertex-to-vertex clustering value. We apply FAC-PIN to PINs from different species and validate the functional modules and protein complexes it generates against experimentally verified functional modules and complexes, respectively. The preliminary computational results show that FAC-PIN can discover functional modules and protein complexes from PINs more accurately. We have also compared its computational times for different species with those of the HC-PIN and CNM algorithms, and FAC-PIN outperforms both. FAC-PIN is a fast and accurate algorithm that represents the current state of the art among agglomerative approaches to complex prediction and functional module identification.
Information Flow in Interaction Networks
Interaction networks, consisting of agents linked by their interactions, are
ubiquitous across many disciplines of modern science. Many methods of analysis
of interaction networks have been proposed, mainly concentrating on node degree
distribution or aiming to discover clusters of agents that are very strongly
connected among themselves. These methods are principally based on
graph theory or machine learning.
We present a mathematically simple formalism for modelling context-specific
information propagation in interaction networks based on random walks. The
context is provided by selection of sources and destinations of information and
by use of potential functions that direct the flow towards the destinations. We
also use the concept of dissipation to model the aging of information as it
diffuses from its source.
Using examples from yeast protein-protein interaction networks and some of
the histone acetyltransferases involved in control of transcription, we
demonstrate the utility of the concepts and the mathematical constructs
introduced in this paper.
Comment: 30 pages, 5 figures. This paper was published in 2007 in Journal of
Computational Biology. The version posted here does not include post
peer-review changes
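The idea of information that spreads by a random walk and weakens as it travels can be illustrated with a small sketch. This is not the formalism of the paper: the function name `expected_visits`, the single uniform `damping` factor standing in for dissipation, and the toy graph are all assumptions made for illustration, and the potential functions mentioned in the abstract are left out.

```python
def expected_visits(adjacency, source, sinks, damping=0.85, steps=200):
    """Damped expected number of visits to each node for a walk from `source`.

    adjacency: dict mapping node -> list of neighbouring nodes.
    sinks:     set of destination nodes; mass that reaches them stops moving.
    damping:   fraction of information kept per step (illustrative stand-in
               for dissipation; 1.0 means no loss).
    """
    mass = {node: 0.0 for node in adjacency}
    mass[source] = 1.0
    visits = {node: 0.0 for node in adjacency}
    for _ in range(steps):
        # Record where the information currently sits.
        for node, m in mass.items():
            visits[node] += m
        nxt = {node: 0.0 for node in adjacency}
        for node, m in mass.items():
            if m == 0.0 or node in sinks or not adjacency[node]:
                continue  # absorbed or stranded mass does not propagate
            share = damping * m / len(adjacency[node])
            for neighbour in adjacency[node]:
                nxt[neighbour] += share
        mass = nxt
    return visits


# Toy path graph a - b - c with information emitted at "a" and absorbed at "c".
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
visits = expected_visits(graph, source="a", sinks={"c"}, damping=0.5)
```

Lowering `damping` reduces the amount of information that reaches distant nodes, which is the qualitative effect the abstract attributes to dissipation.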
New approaches to weighted frequent pattern mining
Researchers have proposed frequent pattern mining algorithms that are more
efficient than previous algorithms and generate fewer but more important patterns. Many
techniques such as depth first/breadth first search, use of tree/other data structures, top
down/bottom up traversal and vertical/horizontal formats for frequent pattern mining
have been developed. Most frequent pattern mining algorithms use a support measure to
prune the combinatorial search space. However, support-based pruning is not enough
when taking into consideration the characteristics of real datasets. Additionally, after
mining datasets to obtain the frequent patterns, there is no way to adjust the number of
frequent patterns through user feedback, except for changing the minimum support.
Alternative measures for mining frequent patterns have been suggested to address these
issues. One of the main limitations of the traditional approach for mining frequent
patterns is that all items are treated uniformly when, in reality, items have different
importance. For this reason, weighted frequent pattern mining algorithms have been
suggested that give different weights to items according to their significance. The main
focus in weighted frequent pattern mining concerns satisfying the downward closure
property. In this research, frequent pattern mining approaches with weight constraints are
suggested. Our main approach is to push weight constraints into the pattern growth
algorithm while maintaining the downward closure property. We develop WFIM
(Weighted Frequent Itemset Mining with a weight range and a minimum weight),
WLPMiner (Weighted frequent Pattern Mining with length decreasing constraints), WIP
(Weighted Interesting Pattern mining with a strong weight and/or support affinity),
WSpan (Weighted Sequential pattern mining with a weight range and a minimum
weight) and WIS (Weighted Interesting Sequential pattern mining with a similar level of
support and/or weight affinity).
The extensive performance analysis shows that the suggested approaches are
efficient and scalable in weighted frequent pattern mining.
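Why the downward closure property needs care once items carry weights can be seen in a small sketch. The definitions below are assumptions chosen for illustration, not necessarily those used in the thesis: weighted support is taken as support times the average item weight, and the pruning test uses the largest weight in the database as an upper bound. The names `weighted_support` and `can_prune` and the toy data are hypothetical.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def weighted_support(itemset, transactions, weights):
    """Support scaled by the average weight of the items (assumed definition)."""
    avg_weight = sum(weights[i] for i in itemset) / len(itemset)
    return support(itemset, transactions) * avg_weight


def can_prune(itemset, transactions, weights, min_wsup):
    """Safe pruning test: bound every superset's weighted support from above.

    A superset never has larger support and never has an average weight above
    the maximum weight, so support * max_weight bounds all supersets.
    """
    max_weight = max(weights.values())
    return support(itemset, transactions) * max_weight < min_wsup


transactions = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
weights = {"a": 0.2, "b": 0.9, "c": 0.5}

# {"a"} scores lower than its superset {"a", "b"}, so pruning directly on
# weighted support would discard {"a"} and miss {"a", "b"}.
low = weighted_support({"a"}, transactions, weights)
high = weighted_support({"a", "b"}, transactions, weights)
keep_a = not can_prune({"a"}, transactions, weights, min_wsup=1.0)
```

With a threshold of 1.0, the bound keeps `{"a"}` alive even though its own weighted support falls below the threshold, so the heavier superset can still be found.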
LIPIcs, Volume 258, SoCG 2023, Complete Volume
Semantics-based language models for information retrieval and text mining
The language modeling approach centers on the issue of estimating an accurate model by choosing appropriate language models as well as smoothing techniques. In the thesis, we propose a novel context-sensitive semantic smoothing method referred to as a topic signature language model. It extracts explicit topic signatures from a document and then statistically maps them into individual words in the vocabulary. In order to support the new language model, we developed two automated algorithms to extract multiword phrases and ontological concepts, respectively, and an EM-based algorithm to learn semantic mapping knowledge from co-occurrence data. The topic signature language model is applied to three applications: information retrieval, text classification, and text clustering. The evaluations on news collections and biomedical literature demonstrate the effectiveness of the topic signature language model.
In the information retrieval experiment, the topic signature language model consistently outperforms the baseline two-stage language model as well as the context-insensitive semantic smoothing method in all configurations. It also beats the state-of-the-art Okapi models in all configurations. In the text classification experiment, when the number of training documents is small, the Bayesian classifier with semantic smoothing not only outperforms the classifiers with background smoothing and Laplace smoothing, but also beats the active learning classifiers and SVM classifiers. On the clustering task, whether or not the dataset to cluster is small, the model-based k-means with semantic smoothing performs significantly better than the model-based k-means with either background smoothing or Laplace smoothing. It is also superior to the spherical k-means in terms of effectiveness.
In addition, we empirically show that, within the framework of topic signature language models, the semantic knowledge learned from one collection can be effectively applied to other collections. In the thesis, we also compare three types of topic signatures (i.e., words, multiword phrases, and ontological concepts) with respect to their effectiveness and efficiency for semantic smoothing. In general, it is more expensive to extract multiword phrases and ontological concepts than individual words, but semantic mapping based on multiword phrases and ontological concepts is more effective in handling data sparsity than mapping based on individual words.
Ph.D., Information Science and Technology -- Drexel University, 200
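The step of mapping topic signatures into individual words can be sketched as a mixture of two document models. This is a minimal illustration under assumptions, not the thesis's exact estimator: the function name `smoothed_prob`, the mixing coefficient `lam`, and the toy probabilities are hypothetical, and the mapping table is given by hand here rather than learned with EM.

```python
def smoothed_prob(word, doc_word_probs, doc_topic_probs, topic_word_probs, lam=0.3):
    """Mix a plain document model with a topic-signature mapping model.

    doc_word_probs:   p(word | doc) from the unsmoothed document model.
    doc_topic_probs:  p(topic signature | doc).
    topic_word_probs: p(word | topic signature), the semantic mapping.
    lam:              weight of the mapping component (assumed parameter).
    """
    base = doc_word_probs.get(word, 0.0)
    mapped = sum(
        p_topic * topic_word_probs.get(topic, {}).get(word, 0.0)
        for topic, p_topic in doc_topic_probs.items()
    )
    return (1.0 - lam) * base + lam * mapped


# A document containing only "star" and "wars", with one phrase signature.
doc_word_probs = {"star": 0.5, "wars": 0.5}
doc_topic_probs = {"star wars": 1.0}
topic_word_probs = {"star wars": {"film": 0.5, "star": 0.25, "wars": 0.25}}

# "film" never occurs in the document but still receives probability mass
# through the "star wars" signature.
p_film = smoothed_prob("film", doc_word_probs, doc_topic_probs, topic_word_probs, lam=0.4)
```

The point of the sketch is that an unseen but related word gets a nonzero probability through the signature, which is how this kind of smoothing addresses data sparsity.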