Integration of molecular network data reconstructs Gene Ontology.
Motivation: Recently, a shift was made from using Gene Ontology (GO) to evaluate molecular network data to using these data to construct and evaluate GO. Dutkowski et al. provide the first evidence that a large part of GO can be reconstructed solely from topologies of molecular networks. Motivated by this work, we develop a novel data integration framework that integrates multiple types of molecular network data to reconstruct and update GO. We ask how much of GO can be recovered by integrating various molecular interaction data.
Results: We introduce a computational framework for integration of various biological networks using penalized non-negative matrix tri-factorization (PNMTF). It takes all network data in a matrix form and performs simultaneous clustering of genes and GO terms, inducing new relations between genes and GO terms (annotations) and between GO terms themselves. To improve the accuracy of our predicted relations, we extend the integration methodology to include additional topological information represented as the similarity in wiring around non-interacting genes. Surprisingly, by integrating topologies of baker’s yeast protein–protein interaction, genetic interaction (GI) and co-expression networks, our method reports as related 96% of GO terms that are directly related in GO. The inclusion of the wiring similarity of non-interacting genes contributes 6% to this large GO term association capture. Furthermore, we use our method to infer new relationships between GO terms solely from the topologies of these networks and validate 44% of our predictions in the literature. In addition, our integration method reproduces 48% of cellular component, 41% of molecular function and 41% of biological process GO terms, outperforming the previous method in the former two domains of GO. Finally, we predict new GO annotations of yeast genes and validate our predictions through GI profiling.
Availability and implementation: Supplementary Tables of new GO term associations and predicted gene annotations are available at http://bio-nets.doc.ic.ac.uk/GO-Reconstruction/.
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
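As a rough illustration of the kind of factorization this abstract describes, the following is a minimal sketch of plain (unpenalized) non-negative matrix tri-factorization, R ≈ F S G^T, using standard multiplicative updates. It is not the authors' PNMTF: the penalty terms and network constraints are omitted, and the function name `nmtf`, the toy matrix `R_demo` and the chosen ranks are assumptions made only for this example.

```python
import numpy as np


def frob_err(R, F, S, G):
    """Squared Frobenius reconstruction error ||R - F S G^T||^2."""
    return float(np.sum((R - F @ S @ G.T) ** 2))


def nmtf(R, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Unpenalized non-negative tri-factorization R ~ F S G^T.

    Hypothetical helper: F (rows x k1) co-clusters the row objects
    (e.g. genes), G (cols x k2) the column objects (e.g. GO terms),
    and S (k1 x k2) links row clusters to column clusters.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    F = rng.random((n, k1))
    S = rng.random((k1, k2))
    G = rng.random((m, k2))
    for _ in range(n_iter):
        # Multiplicative updates keep every factor non-negative.
        F *= (R @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (R.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ R @ G) / (F.T @ F @ S @ G.T @ G + eps)
    return F, S, G


# Toy gene-by-term association matrix with two obvious blocks (made-up data).
R_demo = np.array(
    [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1],
     [0, 0, 1, 1]],
    dtype=float,
)

F0, S0, G0 = nmtf(R_demo, k1=2, k2=2, n_iter=0)  # random initial factors
F, S, G = nmtf(R_demo, k1=2, k2=2)               # factors after the updates
err_start = frob_err(R_demo, F0, S0, G0)
err_end = frob_err(R_demo, F, S, G)

# Reconstructed association scores; large entries absent from R_demo
# would be candidates for new annotations in this toy setting.
R_hat = F @ S @ G.T
```

In this sketch, `F.argmax(axis=1)` and `G.argmax(axis=1)` give hard cluster labels for rows and columns. A penalized variant would add constraint terms to the objective and change the update rules accordingly.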
Data Fusion by Matrix Factorization
For most problems in science and engineering we can obtain data sets that
describe the observed system from various perspectives and record the behavior
of its individual components. Heterogeneous data sets can be collectively mined
by data fusion. Fusion can focus on a specific target relation and exploit
directly associated data together with contextual data and data about the system's
constraints. In the paper we describe a data fusion approach with penalized
matrix tri-factorization (DFMF) that simultaneously factorizes data matrices to
reveal hidden associations. The approach can directly consider any data that
can be expressed in a matrix, including those from feature-based
representations, ontologies, associations and networks. We demonstrate the
utility of DFMF for a gene function prediction task with eleven different data
sources and for prediction of pharmacologic actions by fusing six data sources.
Our data fusion algorithm compares favorably to alternative data integration
approaches and achieves higher accuracy than can be obtained from any single
data source alone.
Comment: Short preprint, 13 pages, 3 Figures, 3 Tables. Full paper in
10.1109/TPAMI.2014.234397
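To make "simultaneously factorizes data matrices" concrete, the general shape of a penalized multi-relation tri-factorization objective can be sketched as follows. The symbols are assumptions introduced for this illustration and are not taken from the abstract: $R_{ij}$ is a relation matrix between object types $i$ and $j$, $G_i$ is a non-negative factor shared by every relation involving type $i$, $S_{ij}$ is a relation-specific middle factor, and $\Theta_i^{(t)}$ is the $t$-th constraint matrix on type $i$.

```latex
\min_{G_i \ge 0,\; S_{ij}} \;
  \sum_{(i,j)} \bigl\lVert R_{ij} - G_i \, S_{ij} \, G_j^{\top} \bigr\rVert_F^{2}
  \;+\; \sum_{i} \sum_{t} \operatorname{tr}\!\bigl( G_i^{\top} \, \Theta_i^{(t)} \, G_i \bigr)
```

Because each $G_i$ appears in every term that involves type $i$, information from one data source can influence the reconstruction of another, which is the sense in which the data are fused.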
Integrative methods for analysing big data in precision medicine
We provide an overview of recent developments in big data analyses in the context of precision medicine and health informatics. With the advance in technologies capturing molecular and medical data, we entered the era of “Big Data” in biology and medicine. These data offer many opportunities to advance precision medicine. We outline key challenges in precision medicine and present recent advances in data integration-based methods to uncover personalized information from big data produced by various omics studies. We survey recent integrative methods for disease subtyping, biomarker discovery, and drug repurposing, and list the tools that are available to domain scientists. Given the ever-growing nature of these big data, we highlight key issues that big data integration methods will face.
Four algorithms to solve symmetric multi-type non-negative matrix tri-factorization problem
In this paper, we consider the symmetric multi-type non-negative matrix
tri-factorization problem (SNMTF), which attempts to factorize several
symmetric non-negative matrices simultaneously. This can be considered as a
generalization of the classical non-negative matrix tri-factorization problem
and includes a non-convex objective function which is a multivariate sixth
degree polynomial and has a convex feasibility set. It has a special importance
in data science, since it serves as a mathematical model for the fusion of
different data sources in data clustering.
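One plausible way to write such a problem, assuming a single shared factor $G$ and one middle factor $S_i$ per input matrix $R_i$ (this notation is an assumption for illustration, not quoted from the paper), is:

```latex
\min_{G \ge 0,\; S_1, \dots, S_N} \;
  f(G, S_1, \dots, S_N)
  \;=\; \sum_{i=1}^{N} \bigl\lVert R_i - G \, S_i \, G^{\top} \bigr\rVert_F^{2},
  \qquad R_i = R_i^{\top} \ge 0 .
```

Under this form, each entry of $G S_i G^{\top}$ is quadratic in $G$ and linear in $S_i$, so it has degree three in the unknowns, and squaring it yields a polynomial of degree six. The non-negativity constraint describes a convex set, while $f$ itself is non-convex.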
We develop four methods to solve the SNMTF. They are based on four
theoretical approaches known from the literature: the fixed point method (FPM),
the block-coordinate descent with projected gradient (BCD), the gradient method
with exact line search (GM-ELS) and the adaptive moment estimation method
(ADAM). For each of these methods we offer a software implementation: for the
former two methods we use Matlab and for the latter two Python with the TensorFlow
library.
We test these methods on three data-sets: one is a synthetic data-set we
generated, while the other two represent real-life similarities between different
objects.
Extensive numerical results show that with sufficient computing time all four
methods perform satisfactorily and ADAM most often yields the best mean square
error (MSE). However, if the computation time is limited, FPM gives
the best MSE because it shows the fastest convergence at the
beginning.
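For readers unfamiliar with ADAM in this setting, the following is a minimal NumPy sketch of Adam applied to a single symmetric tri-factorization R ≈ G S G^T, with the MSE as the objective. It is not the authors' implementation (which the abstract says uses TensorFlow): the function name `adam_sym_nmtf`, the hyperparameters, the toy data, and the choice to enforce non-negativity of both factors by clipping at zero after each step are all assumptions made for this example.

```python
import numpy as np


def mse(R, G, S):
    """Mean square error of the reconstruction G S G^T."""
    return float(np.mean((R - G @ S @ G.T) ** 2))


def adam_sym_nmtf(R, k, n_steps=500, lr=0.01, b1=0.9, b2=0.999,
                  eps=1e-8, seed=0):
    """Adam with projection onto the non-negative orthant (a sketch)."""
    rng = np.random.default_rng(seed)
    n = R.shape[0]
    params = [rng.random((n, k)), rng.random((k, k))]  # G, S
    m = [np.zeros_like(p) for p in params]  # first-moment estimates
    v = [np.zeros_like(p) for p in params]  # second-moment estimates
    history = [mse(R, *params)]
    for t in range(1, n_steps + 1):
        G, S = params
        E = G @ S @ G.T - R  # residual
        # Gradients of the MSE with respect to G and S.
        grads = [2 * (E @ G @ S.T + E.T @ G @ S) / R.size,
                 2 * (G.T @ E @ G) / R.size]
        for i, g in enumerate(grads):
            m[i] = b1 * m[i] + (1 - b1) * g
            v[i] = b2 * v[i] + (1 - b2) * g * g
            m_hat = m[i] / (1 - b1 ** t)  # bias correction
            v_hat = v[i] / (1 - b2 ** t)
            step = lr * m_hat / (np.sqrt(v_hat) + eps)
            # Assumed simplification: clip to keep the factor non-negative.
            params[i] = np.maximum(params[i] - step, 0.0)
        history.append(mse(R, *params))
    return params[0], params[1], history


# Synthetic symmetric non-negative matrix of rank 2 (made-up data).
rng = np.random.default_rng(1)
G_true = rng.random((8, 2))
S_true = np.array([[1.0, 0.2], [0.2, 0.8]])
R_demo = G_true @ S_true @ G_true.T

G_fit, S_fit, history = adam_sym_nmtf(R_demo, k=2)
```

The `history` list records the MSE after every step, so plotting it against the step count (or wall-clock time) is one way to compare early convergence speed across methods in the spirit of the comparison described above.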
All data-sets and codes are publicly available on our GitLab profile
- …