Quantifying randomness in protein-protein interaction networks of different species: A random matrix approach
We analyze protein-protein interaction networks for six different species
under the framework of random matrix theory. The nearest neighbor spacing
distribution of the eigenvalues of adjacency matrices of the largest connected
part of these networks emulates universal Gaussian orthogonal statistics of
random matrix theory. We demonstrate that spectral rigidity, which quantifies
long range correlations in eigenvalues, for all protein-protein interaction
networks follows the random matrix prediction up to a certain range, indicating
randomness in interactions. Beyond this range, deviation from universality
evinces underlying structural features in the network.
Comment: 20 pages, 5 figures
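To make the spacing-distribution comparison in this abstract concrete, here is a minimal sketch, not the authors' code. It computes nearest neighbor spacings from a list of eigenvalues and evaluates the Wigner surmise for the Gaussian orthogonal ensemble, P(s) = (π/2) s exp(-π s²/4). The function names are hypothetical, and rescaling by the global mean spacing is an assumed, crude substitute for proper spectral unfolding.

```python
import math


def normalized_spacings(eigenvalues):
    """Return nearest neighbor spacings rescaled to unit mean.

    Assumption: dividing by the global mean spacing stands in for a
    proper unfolding of the spectrum.
    """
    ev = sorted(eigenvalues)
    gaps = [b - a for a, b in zip(ev, ev[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return [g / mean_gap for g in gaps]


def wigner_surmise_goe(s):
    """GOE Wigner surmise: P(s) = (pi / 2) * s * exp(-pi * s**2 / 4)."""
    return (math.pi / 2.0) * s * math.exp(-math.pi * s * s / 4.0)


def spacing_histogram(spacings, bin_width=0.25, s_max=4.0):
    """Density-normalized histogram of spacings on [0, s_max)."""
    n_bins = int(round(s_max / bin_width))
    counts = [0] * n_bins
    for s in spacings:
        idx = int(s / bin_width)
        if idx < n_bins:
            counts[idx] += 1
    total = len(spacings)
    return [c / (total * bin_width) for c in counts]
```

In use, the histogram values would be compared bin by bin with `wigner_surmise_goe` evaluated at the bin centers; the eigenvalues themselves would come from the adjacency matrix of the largest connected component.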
Classification in biological networks with hypergraphlet kernels
Biological and cellular systems are often modeled as graphs in which vertices
represent objects of interest (genes, proteins, drugs) and edges represent
relational ties among these objects (binds-to, interacts-with, regulates). This
approach has been highly successful owing to the theory, methodology and
software that support analysis and learning on graphs. Graphs, however, often
suffer from information loss when modeling physical systems due to their
inability to accurately represent multiobject relationships. Hypergraphs, a
generalization of graphs, provide a framework to mitigate information loss and
unify disparate graph-based methodologies. In this paper, we present a
hypergraph-based approach for modeling physical systems and formulate vertex
classification, edge classification and link prediction problems on
(hyper)graphs as instances of vertex classification on (extended, dual)
hypergraphs in a semi-supervised setting. We introduce a novel kernel method on
vertex- and edge-labeled (colored) hypergraphs for analysis and learning. The
method is based on exact and inexact (via hypergraph edit distances)
enumeration of small simple hypergraphs, referred to as hypergraphlets, rooted
at a vertex of interest. We extensively evaluate this method and show its
potential use in a positive-unlabeled setting to estimate the number of missing
and false positive links in protein-protein interaction networks.
An application of topological graph clustering to protein function prediction
We use a semisupervised learning algorithm based on a topological data
analysis approach to assign functional categories to yeast proteins using
similarity graphs. This new approach to analyzing biological networks yields
results that are as good as or better than existing state-of-the-art
approaches.
Comment: 10 pages
Towards Gene Expression Convolutions using Gene Interaction Graphs
We study the challenges of applying deep learning to gene expression data. We
find experimentally that there exists non-linear signal in the data; however, it
is not discovered automatically given the noise and low numbers of samples used
in most research. We discuss how gene interaction graphs (same pathway,
protein-protein, co-expression, or research paper text association) can be used
to impose a bias on a deep model similar to the spatial bias imposed by
convolutions on an image. We explore the usage of Graph Convolutional Neural
Networks coupled with dropout and gene embeddings to utilize the graph
information. We find this approach provides an advantage for particular tasks
in a low data regime but is very dependent on the quality of the graph used. We
conclude that more work should be done in this direction. We design experiments
that show why existing methods fail to capture signal that is present in the
data when features are added which clearly isolates the problem that needs to
be addressed.
Comment: 4 pages +1 page references, To appear in the International Conference
on Machine Learning Workshop on Computational Biology, 201
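As an illustration of how a gene interaction graph can bias a model, the sketch below applies one graph-convolution step in which each gene's features are averaged with those of its graph neighbors before a linear map and ReLU. The abstract does not specify the propagation rule; the symmetric normalization D^{-1/2}(A + I)D^{-1/2} used here is an assumption borrowed from common graph convolutional network formulations, and all function names are hypothetical.

```python
import math


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a symmetric adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Plain nested-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def graph_conv(adj, features, weights):
    """One layer: ReLU(A_norm @ X @ W)."""
    propagated = matmul(normalized_adjacency(adj), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weights)]
```

For two connected genes with expression values 1.0 and 3.0 and an identity weight, both outputs become 2.0: the graph smooths each value toward its neighbor. This also shows why a poor-quality graph can hurt, since it would mix unrelated signals.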
Randomness and preserved patterns in cancer network
Breast cancer has been reported to account for the largest number of cases among
all female cancers to date. In order to gain a deeper insight into the
complexities of the disease, we analyze the breast cancer network and its
normal counterpart at the proteomic level. While the short range correlations
in the eigenvalues exhibiting universality provide evidence for the
importance of random connections in the underlying networks, the long range
correlations along with the localization properties reveal insightful
structural patterns involving functionally important proteins. The analysis
provides a benchmark for designing drugs which can target a subgraph instead of
individual proteins.
Comment: 21 pages, 9 figures
node2vec: Scalable Feature Learning for Networks
Prediction tasks over nodes and edges in networks require careful effort in
engineering features used by learning algorithms. Recent research in the
broader field of representation learning has led to significant progress in
automating prediction by learning the features themselves. However, present
feature learning approaches are not expressive enough to capture the diversity
of connectivity patterns observed in networks. Here we propose node2vec, an
algorithmic framework for learning continuous feature representations for nodes
in networks. In node2vec, we learn a mapping of nodes to a low-dimensional
space of features that maximizes the likelihood of preserving network
neighborhoods of nodes. We define a flexible notion of a node's network
neighborhood and design a biased random walk procedure, which efficiently
explores diverse neighborhoods. Our algorithm generalizes prior work which is
based on rigid notions of network neighborhoods, and we argue that the added
flexibility in exploring neighborhoods is the key to learning richer
representations. We demonstrate the efficacy of node2vec over existing
state-of-the-art techniques on multi-label classification and link prediction
in several real-world networks from diverse domains. Taken together, our work
represents a new way for efficiently learning state-of-the-art task-independent
representations in complex networks.
Comment: In Proceedings of the 22nd ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, 201
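The biased random walk mentioned in this abstract can be sketched as a second-order walk whose next step depends on the previous node. This is a simplified illustration, not the reference implementation: it uses the return parameter `p` and in-out parameter `q` from the node2vec paper, assumes an unweighted graph given as a dict of neighbor sets with sortable node labels, and omits the precomputed alias sampling that makes the original efficient.

```python
import random


def biased_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """Generate one second-order biased walk of at most `length` nodes.

    graph: dict mapping each node to a set of neighbor nodes (assumed format).
    p: return parameter; small p makes stepping back to the previous node likely.
    q: in-out parameter; small q favors moving away from the previous node.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])
        if not neighbors:
            break
        if len(walk) == 1:
            # The first step has no previous node, so sample uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)   # return to the previous node
            elif nxt in graph[prev]:
                weights.append(1.0)       # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)   # move to distance 2 from prev
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk
```

Walks generated this way from every node would then be fed to a skip-gram style model to learn the node embeddings; that training step is outside this sketch.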
Network Enhancement: a general method to denoise weighted biological networks
Networks are ubiquitous in biology where they encode connectivity patterns at
all scales of organization, from molecular to the biome. However, biological
networks are noisy due to the limitations of measurement technology and
inherent natural variation, which can hamper discovery of network patterns and
dynamics. We propose Network Enhancement (NE), a method for improving the
signal-to-noise ratio of undirected, weighted networks. NE uses a doubly
stochastic matrix operator that induces sparsity and provides a closed-form
solution that increases spectral eigengap of the input network. As a result, NE
removes weak edges, enhances real connections, and leads to better downstream
performance. Experiments show that NE improves gene function prediction by
denoising tissue-specific interaction networks, alleviates interpretation of
noisy Hi-C contact maps from the human genome, and boosts fine-grained
identification accuracy of species. Our results indicate that NE is widely
applicable for denoising biological networks.
Thresholding of Semantic Similarity Networks using a Spectral Graph Based Technique
Semantic similarity measures (SSMs) refer to a set of algorithms used to
quantify the similarity of two or more terms belonging to the same ontology.
Ontology terms may be associated with concepts; for instance, in computational
biology genes and proteins are associated with terms of biological ontologies.
Thus, SSMs may be used to quantify the similarity of genes and proteins
starting from the comparison of the associated annotations. SSMs have been
recently used to compare genes and proteins even on a system level scale. More
recently some works have focused on the building and analysis of Semantic
Similarity Networks (SSNs), i.e., weighted networks in which nodes represent
genes or proteins while weighted edges represent the semantic similarity score
among them. SSNs are quasi-complete networks, thus their analysis presents
different challenges that should be addressed. For instance, the need for the
introduction of reliable thresholds for the elimination of meaningless edges
arises. Nevertheless, the use of global thresholding methods may produce the
elimination of meaningful nodes, while the use of local thresholds may
introduce biases. To this end, we introduce a novel technique, based on
spectral graph considerations and on a mixed global-local focus. The
effectiveness of our technique is demonstrated by using Markov clustering for
the extraction of biological modules. We applied clustering to simplified
networks, demonstrating a considerable improvement with respect to the original
ones.
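Markov clustering, which the abstract uses to extract modules, alternates two operations on a column-stochastic matrix: expansion (squaring the matrix) and inflation (raising entries to a power and renormalizing columns). The sketch below is a minimal dense-matrix illustration with hypothetical function names. The added self-loops, the inflation value of 2.0 and the fixed iteration count are assumptions, and reading clusters off the converged matrix is left out.

```python
def normalize_columns(m):
    """Rescale each column of a square matrix to sum to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl_step(m, inflation=2.0):
    """One expansion (matrix square) followed by one inflation."""
    n = len(m)
    expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]
    inflated = [[v ** inflation for v in row] for row in expanded]
    return normalize_columns(inflated)


def markov_cluster(weights, inflation=2.0, iterations=50):
    """Iterate MCL steps on a weighted adjacency matrix with self-loops added."""
    n = len(weights)
    m = [[weights[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = mcl_step(m, inflation)
    return m
```

Because flow never crosses between disconnected components, removing meaningless edges before clustering, as the thresholding step aims to do, directly changes which modules this procedure can find.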
Representation Learning on Graphs: Methods and Applications
Machine learning on graphs is an important and ubiquitous task with
applications ranging from drug design to friendship recommendation in social
networks. The primary challenge in this domain is finding a way to represent,
or encode, graph structure so that it can be easily exploited by machine
learning models. Traditionally, machine learning approaches relied on
user-defined heuristics to extract features encoding structural information
about a graph (e.g., degree statistics or kernel functions). However, recent
years have seen a surge in approaches that automatically learn to encode graph
structure into low-dimensional embeddings, using techniques based on deep
learning and nonlinear dimensionality reduction. Here we provide a conceptual
review of key advancements in this area of representation learning on graphs,
including matrix factorization-based methods, random-walk based algorithms, and
graph neural networks. We review methods to embed individual nodes as well as
approaches to embed entire (sub)graphs. In doing so, we develop a unified
framework to describe these recent approaches, and we highlight a number of
important applications and directions for future work.
Comment: Published in the IEEE Data Engineering Bulletin, September 2017;
version with minor corrections
Spectral properties of complex networks
This review presents an account of the major works done on spectra of
adjacency matrices drawn on networks and the basic understanding attained so
far. We have divided the review into three sections: (a) extremal eigenvalues,
(b) bulk part of the spectrum and (c) degenerate eigenvalues, based on the
intrinsic properties of eigenvalues and the phenomena they capture. We have
reviewed the works done for spectra of various popular model networks, such as
the Erdős-Rényi random networks, scale-free networks, 1-d lattice,
small-world networks, and various different real-world networks. Additionally,
potential applications of spectral properties for natural processes have been
reviewed.
Comment: 29 pages, 18 figures