102 research outputs found

    Integration of molecular network data reconstructs Gene Ontology.

    Get PDF
    Motivation: Recently, a shift was made from using Gene Ontology (GO) to evaluate molecular network data to using these data to construct and evaluate GO. Dutkowski et al. provide the first evidence that a large part of GO can be reconstructed solely from topologies of molecular networks. Motivated by this work, we develop a novel data integration framework that integrates multiple types of molecular network data to reconstruct and update GO. We ask how much of GO can be recovered by integrating various molecular interaction data. Results: We introduce a computational framework for integration of various biological networks using penalized non-negative matrix tri-factorization (PNMTF). It takes all network data in a matrix form and performs simultaneous clustering of genes and GO terms, inducing new relations between genes and GO terms (annotations) and between GO terms themselves. To improve the accuracy of our predicted relations, we extend the integration methodology to include additional topological information represented as the similarity in wiring around non-interacting genes. Surprisingly, by integrating topologies of bakers’ yeasts protein–protein interaction, genetic interaction (GI) and co-expression networks, our method reports as related 96% of GO terms that are directly related in GO. The inclusion of the wiring similarity of non-interacting genes contributes 6% to this large GO term association capture. Furthermore, we use our method to infer new relationships between GO terms solely from the topologies of these networks and validate 44% of our predictions in the literature. In addition, our integration method reproduces 48% of cellular component, 41% of molecular function and 41% of biological process GO terms, outperforming the previous method in the former two domains of GO. Finally, we predict new GO annotations of yeast genes and validate our predictions through GIs profiling. Availability and implementation: Supplementary Tables of new GO term associations and predicted gene annotations are available at http://bio-nets.doc.ic.ac.uk/GO-Reconstruction/. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online

    Fuse: Multiple Network Alignment via Data Fusion

    Get PDF

    Data Fusion by Matrix Factorization

    Get PDF
    For most problems in science and engineering we can obtain data sets that describe the observed system from various perspectives and record the behavior of its individual components. Heterogeneous data sets can be collectively mined by data fusion. Fusion can focus on a specific target relation and exploit directly associated data together with contextual data and data about system's constraints. In the paper we describe a data fusion approach with penalized matrix tri-factorization (DFMF) that simultaneously factorizes data matrices to reveal hidden associations. The approach can directly consider any data that can be expressed in a matrix, including those from feature-based representations, ontologies, associations and networks. We demonstrate the utility of DFMF for gene function prediction task with eleven different data sources and for prediction of pharmacologic actions by fusing six data sources. Our data fusion algorithm compares favorably to alternative data integration approaches and achieves higher accuracy than can be obtained from any single data source alone.Comment: Short preprint, 13 pages, 3 Figures, 3 Tables. Full paper in 10.1109/TPAMI.2014.234397

    Integrative methods for analysing big data in precision medicine

    Get PDF
    We provide an overview of recent developments in big data analyses in the context of precision medicine and health informatics. With the advance in technologies capturing molecular and medical data, we entered the area of “Big Data” in biology and medicine. These data offer many opportunities to advance precision medicine. We outline key challenges in precision medicine and present recent advances in data integration-based methods to uncover personalized information from big data produced by various omics studies. We survey recent integrative methods for disease subtyping, biomarkers discovery, and drug repurposing, and list the tools that are available to domain scientists. Given the ever-growing nature of these big data, we highlight key issues that big data integration methods will face

    Four algorithms to solve symmetric multi-type non-negative matrix tri-factorization problem

    Get PDF
    In this paper, we consider the symmetric multi-type non-negative matrix tri-factorization problem (SNMTF), which attempts to factorize several symmetric non-negative matrices simultaneously. This can be considered as a generalization of the classical non-negative matrix tri-factorization problem and includes a non-convex objective function which is a multivariate sixth degree polynomial and a has convex feasibility set. It has a special importance in data science, since it serves as a mathematical model for the fusion of different data sources in data clustering. We develop four methods to solve the SNMTF. They are based on four theoretical approaches known from the literature: the fixed point method (FPM), the block-coordinate descent with projected gradient (BCD), the gradient method with exact line search (GM-ELS) and the adaptive moment estimation method (ADAM). For each of these methods we offer a software implementation: for the former two methods we use Matlab and for the latter Python with the TensorFlow library. We test these methods on three data-sets: the synthetic data-set we generated, while the others represent real-life similarities between different objects. Extensive numerical results show that with sufficient computing time all four methods perform satisfactorily and ADAM most often yields the best mean square error (MSE\mathrm{MSE}). However, if the computation time is limited, FPM gives the best MSE\mathrm{MSE} because it shows the fastest convergence at the beginning. All data-sets and codes are publicly available on our GitLab profile

    Dual-dropout graph convolutional network for predicting synthetic lethality in human cancers

    Get PDF
    corecore