
    Positive-Unlabeled Learning for inferring drug interactions based on heterogeneous attributes

    BACKGROUND: Investigating and understanding drug-drug interactions (DDIs) is important for improving the effectiveness of clinical care. DDIs can occur when two or more drugs are administered together. Experimental DDI detection methods are costly and time-consuming. Hence, there is great interest in developing efficient and useful computational methods for inferring potential DDIs. Standard binary classifiers require both positives and negatives for training. In a DDI context, drug pairs that are known to interact can serve as positives for predictive methods. However, negatives, that is, drug pairs that have been confirmed to have no interaction, are scarce. To address this lack of negatives, we introduce a Positive-Unlabeled Learning method for inferring potential DDIs. RESULTS: The proposed method consists of three steps: i) applying Growing Self Organizing Maps to infer negatives from the unlabeled dataset; ii) using a pairwise similarity function to quantify the overlap between individual features of drugs; and iii) using a support vector machine classifier to infer DDIs. We obtained 6036 DDIs from the DrugBank database. Using the proposed approach, we inferred 589 drug pairs that are likely not to interact with each other; these drug pairs are used as representative data for the negative class in binary classification for DDI prediction. Moreover, we classify the predicted DDIs as Cytochrome P450 (CYP) enzyme-Dependent and CYP-Independent interactions based on their locations on the Growing Self Organizing Map, because of the particular importance of these enzymes in clinically significant interaction effects. Further, we provide a case study on three predicted CYP-Dependent DDIs to evaluate the clinical relevance of this study. CONCLUSION: Our proposed approach showed an absolute improvement in F1-score of 14% and 38%, depending on the choice of similarity function, in comparison to the method that randomly selects unlabeled data points as likely negatives. We inferred 5300 possible CYP-Dependent DDIs and 592 CYP-Independent DDIs with the highest posterior probabilities. Our discoveries can be used to improve clinical care as well as the research outcomes of drug development.
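
Step ii of the method above, quantifying the overlap between the features of two drugs, can be illustrated with a small sketch. The abstract does not name the similarity function, so the Jaccard index used here is an assumption, and the drug records, attribute names, and values are hypothetical placeholders rather than DrugBank data.

```python
def jaccard(a, b):
    """Jaccard index of two sets; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(drug_a, drug_b, attributes):
    """One similarity value per attribute type for a drug pair."""
    return [jaccard(drug_a[name], drug_b[name]) for name in attributes]


# Hypothetical drug records: each attribute maps to a set of identifiers.
drug_x = {"targets": {"T1", "T2"}, "enzymes": {"CYP3A4"}}
drug_y = {"targets": {"T2", "T3"}, "enzymes": {"CYP3A4", "CYP2D6"}}

features = pair_features(drug_x, drug_y, ["targets", "enzymes"])
print(features)
```

Each drug pair then becomes a fixed-length vector with one entry per attribute type, which is the kind of input a standard binary classifier such as a support vector machine expects.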

    Automated gene function prediction through gene multifunctionality in biological networks

    As the number of sequenced genomes rapidly grows, Automated Prediction of gene Function (AFP) has become a challenging problem. Despite significant progress in the last several years, the accuracy of gene function prediction still needs to improve before it can be used effectively in practice. Two of the main issues in the AFP problem are the imbalance of gene functional annotations and the 'multifunctional properties' of genes. While the former is a well-studied problem in machine learning, the latter has only recently emerged in bioinformatics, and few studies have addressed it. Here we propose a method for AFP that appropriately handles the label imbalance characterizing biological taxonomies and embeds in the model the property of some genes of being 'multifunctional'. We tested the method by predicting the functions of the Gene Ontology functional hierarchy for genes of the yeast and fly model organisms, in a genome-wide approach. The results show that cost-sensitive strategies and 'gene multifunctionality' can be combined to achieve significantly better results than the compared state-of-the-art algorithms for AFP.

    Microbial community pattern detection in human body habitats via ensemble clustering framework

    The human habitat is a host where microbial species evolve, function, and continue to evolve. Elucidating how microbial communities respond to human habitats is a fundamental and critical task, as establishing baselines of the human microbiome is essential to understanding its role in human disease and health. However, current studies usually overlook the complex and interconnected landscape of the human microbiome and are limited to particular body habitats and to learning models with a specific criterion. Therefore, these methods cannot capture the real-world underlying microbial patterns effectively. To obtain a comprehensive view, we propose a novel ensemble clustering framework to mine the structure of microbial community patterns in large-scale metagenomic data. In particular, we first build a microbial similarity network by integrating 1920 metagenomic samples from three body habitats of healthy adults. Then a novel symmetric Nonnegative Matrix Factorization (NMF) based ensemble model is proposed and applied to the network to detect clustering patterns. Extensive experiments are conducted to evaluate the effectiveness of our model in deriving microbial communities with respect to body habitat and host gender. From the clustering results, we observed that body habitat exhibits strongly bound but non-unique microbial structural patterns. Meanwhile, the human microbiome reveals different degrees of structural variation over body habitat and host gender. In summary, our ensemble clustering framework can efficiently explore integrated clustering results to accurately identify microbial communities and provide a comprehensive view of a set of microbial communities. Such trends depict an integrated biography of microbial communities, which offers new insight towards uncovering the pathogenic model of the human microbiome. Comment: BMC Systems Biology 201

    GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function

    Abstract Background: Most successful computational approaches for protein function prediction integrate multiple genomics and proteomics data sources to make inferences about the function of unknown proteins. The most accurate of these algorithms have long running times, making them unsuitable for real-time protein function prediction in large genomes. As a result, the predictions of these algorithms are stored in static databases that can easily become outdated. We propose a new algorithm, GeneMANIA, that is as accurate as the leading methods, while capable of predicting protein function in real-time. Results: We use a fast heuristic algorithm, derived from ridge regression, to integrate multiple functional association networks and predict gene function from a single process-specific network using label propagation. Our algorithm is efficient enough to be deployed on a modern webserver and is as accurate as, or more so than, the leading methods on the MouseFunc I benchmark and a new yeast function prediction benchmark; it is robust to redundant and irrelevant data and requires, on average, less than ten seconds of computation time on tasks from these benchmarks. Conclusion: GeneMANIA is fast enough to predict gene function on-the-fly while achieving state-of-the-art accuracy. A prototype version of a GeneMANIA-based webserver is available at http://morrislab.med.utoronto.ca/prototype
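
The label propagation step mentioned above can be sketched on a toy network. This is a generic Gaussian-field style propagation, not the GeneMANIA implementation: it assumes a single symmetric weight matrix with a zero diagonal, and it solves (I + D - W) f = y by Jacobi iteration, where D is the diagonal degree matrix. The function name, the toy graph, and the label encoding (+1, -1, 0) are illustrative assumptions.

```python
def propagate_labels(weights, labels, iterations=200):
    """Jacobi iteration for (I + D - W) f = y on a small weighted graph.

    weights: symmetric adjacency matrix (list of lists) with zero diagonal.
    labels:  initial label vector y (0 for unlabeled nodes).
    """
    n = len(labels)
    degrees = [sum(row) for row in weights]
    scores = list(labels)
    for _ in range(iterations):
        # Each node moves toward its own label and its neighbours' scores.
        scores = [
            (labels[i] + sum(weights[i][j] * scores[j] for j in range(n)))
            / (1.0 + degrees[i])
            for i in range(n)
        ]
    return scores


# Toy network: genes 0-1-2-3 form a chain.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
y = [1.0, 0.0, 0.0, -1.0]  # +1 known positive, -1 known negative, 0 unlabeled

scores = propagate_labels(W, y)
print(scores)
```

In this toy example the unlabeled gene next to the positive ends up with a positive score and the one next to the negative with a negative score, which is the behaviour label propagation relies on when ranking candidate genes.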

    Novel topological descriptors for analyzing biological networks

    Abstract Background: Topological descriptors, other graph measures, and in a broader sense, graph-theoretical methods, have proven to be powerful tools for biological network analysis. However, the majority of the developed descriptors and graph-theoretical methods cannot take vertex- and edge-labels into account, e.g., atom- and bond-types when considering molecular graphs. Indeed, this feature is important for characterizing biological networks more meaningfully instead of considering only pure topological information. Results: In this paper, we put the emphasis on analyzing a special type of biological network, namely bio-chemical structures. First, we derive entropic measures to calculate the information content of vertex- and edge-labeled graphs and investigate some useful properties thereof. Second, we apply the mentioned measures, combined with other well-known descriptors, to supervised machine learning methods for predicting Ames mutagenicity. Moreover, we investigate the influence of our topological descriptors - measures for only unlabeled vs. measures for labeled graphs - on the prediction performance of the underlying graph classification problem. Conclusions: Our study demonstrates that the application of entropic measures to molecules representing graphs is useful to characterize such structures meaningfully. For instance, we have found that if one extends the measures for determining the structural information content of unlabeled graphs to labeled graphs, the uniqueness of the resulting indices is higher. Because measures to structurally characterize labeled graphs are clearly underrepresented so far, the further development of such methods might be valuable and fruitful for solving problems within biological network analysis.
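
To give a feel for what an entropic measure over a vertex-labeled graph looks like, here is a minimal sketch. It computes only the Shannon entropy of the vertex-label frequencies, which is a deliberately simple stand-in and not one of the measures derived in the paper; the function name and the example molecule encoding are assumptions.

```python
from collections import Counter
from math import log2


def label_entropy(vertex_labels):
    """Shannon entropy (in bits) of the vertex-label distribution."""
    counts = Counter(vertex_labels)
    total = len(vertex_labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Heavy atoms of ethanol as vertex labels: two carbons and one oxygen.
ethanol = ["C", "C", "O"]
# A graph whose vertices all carry the same label has zero label entropy.
propane = ["C", "C", "C"]

print(label_entropy(ethanol))
print(label_entropy(propane))
```

Two graphs with the same topology but different label distributions receive different values, which is the intuition behind extending descriptors from unlabeled to labeled graphs.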

    A Comparative Study of Pairwise Learning Methods based on Kernel Ridge Regression

    Many machine learning problems can be formulated as predicting labels for a pair of objects. Problems of that kind are often referred to as pairwise learning, dyadic prediction or network inference problems. During the last decade kernel methods have played a dominant role in pairwise learning. They still obtain state-of-the-art predictive performance, but a theoretical analysis of their behavior has been underexplored in the machine learning literature. In this work we review and unify existing kernel-based algorithms that are commonly used in different pairwise learning settings, ranging from matrix filtering to zero-shot learning. To this end, we focus on closed-form efficient instantiations of Kronecker kernel ridge regression. We show that independent task kernel ridge regression, two-step kernel ridge regression and a linear matrix filter arise naturally as special cases of Kronecker kernel ridge regression, implying that all these methods implicitly minimize a squared loss. In addition, we analyze universality, consistency and spectral filtering properties. Our theoretical results provide valuable insights into the advantages and limitations of existing pairwise learning methods. Comment: arXiv admin note: text overlap with arXiv:1606.0427
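
For orientation, the standard textbook form of Kronecker kernel ridge regression can be written as follows. The notation is an assumption introduced here, not taken from the abstract: K and G are the kernel matrices of the two object types, Y is the label matrix, A holds the dual parameters, and lambda is the regularization parameter.

```latex
% Squared-loss objective over the dual parameters A (notation assumed)
\min_{A}\;
  \bigl\lVert \operatorname{vec}(Y) - (G \otimes K)\operatorname{vec}(A) \bigr\rVert_2^2
  + \lambda\, \operatorname{vec}(A)^{\top} (G \otimes K)\operatorname{vec}(A)

% Closed-form solution
\operatorname{vec}(A) = (G \otimes K + \lambda I)^{-1} \operatorname{vec}(Y)

% Prediction for a new pair (d, t)
f(d, t) = \sum_{i}\sum_{j} a_{ij}\, k(d, d_i)\, g(t, t_j)
```

The explicit squared-loss objective is what makes the abstract's claim concrete: any method that can be shown to be a special case of this solution implicitly minimizes a squared loss.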

    Feature Extraction from Degree Distribution for Comparison and Analysis of Complex Networks

    The degree distribution is an important characteristic of complex networks. In many data analysis applications, networks must be represented as fixed-length feature vectors, and therefore feature extraction from the degree distribution is a necessary step. Moreover, many applications need a similarity function for comparing complex networks based on their degree distributions. Such a similarity measure has many applications, including classification and clustering of network instances, evaluation of network sampling methods, anomaly detection, and the study of epidemic dynamics. The existing methods are unable to effectively capture the similarity of degree distributions, particularly when the corresponding networks have different sizes. Based on our observations about the structure of the degree distributions in networks over time, we propose a feature extraction method and a similarity function for the degree distributions in complex networks. We propose to calculate the feature values based on the mean and standard deviation of the node degrees in order to decrease the effect of the network size on the extracted features. The proposed method is evaluated using different artificial and real network datasets, and it outperforms the state-of-the-art methods with respect to the accuracy of the distance function and the effectiveness of the extracted features. Comment: arXiv admin note: substantial text overlap with arXiv:1307.362
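
The idea of deriving size-robust features from the mean and standard deviation of node degrees can be sketched as follows. This is not the authors' exact feature definition: the choice of z-score boundaries, the function names, and the Manhattan distance are illustrative assumptions.

```python
from statistics import mean, pstdev


def degree_features(degrees, boundaries=(-1.0, 0.0, 1.0)):
    """Fraction of nodes in each z-score region of the degree distribution.

    Regions are delimited by mean + b * std for each b in `boundaries`,
    so the vector length is fixed regardless of network size.
    """
    mu = mean(degrees)
    sigma = pstdev(degrees)
    if sigma == 0:
        # All nodes share one degree: every z-score is 0.
        z_scores = [0.0] * len(degrees)
    else:
        z_scores = [(d - mu) / sigma for d in degrees]
    counts = [0] * (len(boundaries) + 1)
    for z in z_scores:
        # Region index = number of boundaries at or below this z-score.
        counts[sum(z >= b for b in boundaries)] += 1
    return [c / len(degrees) for c in counts]


def feature_distance(f1, f2):
    """Manhattan distance between two feature vectors (an assumed choice)."""
    return sum(abs(a - b) for a, b in zip(f1, f2))


# Two star graphs of different sizes: one hub plus 4 or 9 leaves.
small_star = [4, 1, 1, 1, 1]
large_star = [9] + [1] * 9

f_small = degree_features(small_star)
f_large = degree_features(large_star)
print(f_small, f_large, feature_distance(f_small, f_large))
```

Because regions are defined relative to the mean and standard deviation instead of absolute degree values, the two stars yield similar feature vectors even though their node counts and hub degrees differ.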

    Quantification and Comparison of Degree Distributions in Complex Networks

    The degree distribution is an important characteristic of complex networks. In many applications, quantification of the degree distribution in the form of a fixed-length feature vector is a necessary step. In addition, we often need to compare the degree distributions of two given networks and measure the similarity between the two distributions. In this paper, we propose a novel method for quantification of the degree distributions in complex networks. Based on this quantification method, a new distance function is also proposed for degree distributions, which captures the differences in the overall structure of the two given distributions. The proposed method is able to effectively compare networks even with different scales, and outperforms the state-of-the-art methods considerably with respect to the accuracy of the distance function.