Information flow in interaction networks II: channels, path lengths and potentials
In our previous publication, a framework for information flow in interaction
networks based on random walks with damping was formulated with two fundamental
modes: emitting and absorbing. While many other network analysis methods based
on random walks or equivalent notions have been developed before and after our
earlier work, one can show that they can all be mapped to one of the two modes.
In addition to these two fundamental modes, a major strength of our earlier
formalism was its accommodation of context-specific directed information flow
that yielded plausible and meaningful biological interpretation of protein
functions and pathways. However, the directed flow from origins to destinations
was induced by a heuristic potential function. Here, with a
theoretically sound approach called the channel mode, we extend our earlier
work on directed information flow. This is achieved by constructing a
potential function facilitating a purely probabilistic interpretation of the
channel mode. For each network node, the channel mode combines the solutions of
emitting and absorbing modes in the same context, producing what we call a
channel tensor. The entries of the channel tensor at each node can be
interpreted as the amount of flow passing through that node from an origin to a
destination. Similarly to our earlier model, the channel mode encompasses
damping as a free parameter that controls the locality of information flow.
Through examples involving the yeast pheromone response pathway, we illustrate
the versatility and stability of our new framework. Comment: Minor changes from v3. 30 pages, 7 figures. Plain LaTeX format. This
version contains some additional material compared to the journal submission:
two figures, one appendix and a few paragraphs
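The way the channel mode combines an emitting-type quantity (expected visits from an origin) with an absorbing-type quantity (probability of reaching a destination) can be illustrated with standard absorbing Markov chain theory. The sketch below is a hedged illustration, not the authors' formulation or normalization. It assumes that at each step a walker dissipates with probability `mu` and otherwise moves along an edge chosen in proportion to its weight, that destinations are absorbing, and that every non-destination node has an outgoing edge. The function name `channel_tensor` is hypothetical. Entry `[s, i, d]` is the expected number of visits to node `i` by walkers from origin `s` that are eventually absorbed at destination `d`, which is one probabilistic reading of "flow passing through a node from an origin to a destination".

```python
import numpy as np

def channel_tensor(W, origins, destinations, mu):
    """Hypothetical channel tensor for a damped random walk.

    W[i, j] >= 0 is the weight of edge i -> j; every non-destination node needs
    at least one outgoing edge. Each step the walker dissipates with probability
    mu, otherwise it follows an edge chosen in proportion to its weight.
    Destinations absorb the walker; origins must not be destinations.
    """
    n = W.shape[0]
    P = W / W.sum(axis=1, keepdims=True)             # row-stochastic moves
    dest = list(destinations)
    dest_set = set(dest)
    trans = [i for i in range(n) if i not in dest_set]
    Q = (1.0 - mu) * P[np.ix_(trans, trans)]         # transient -> transient
    R = (1.0 - mu) * P[np.ix_(trans, dest)]          # transient -> destination
    N = np.linalg.inv(np.eye(len(trans)) - Q)        # expected visits (emitting-style)
    B = N @ R                                        # P(absorbed at d | start i)
    pos = {node: k for k, node in enumerate(trans)}
    rows = N[[pos[s] for s in origins]]              # shape (n_origins, n_trans)
    # channel[s, i, d] = N[s, i] * B[i, d]: expected visits to i on walks from s
    # that end at d (by the Markov property, each visit to i ends at d w.p. B[i, d])
    channel = rows[:, :, None] * B[None, :, :]
    return channel, trans

# Three-node path 0-1-2, origin 0, destination 2.
W = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
C, trans = channel_tensor(W, [0], [2], mu=0.1)
print(trans, C[0, :, 0])
```

On this path both transient nodes carry the same channel value, because every walk that reaches node 2 alternates between nodes 0 and 1 before its final step. Increasing `mu` shortens walks and makes the flow more local.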
Information Flow in Interaction Networks
Interaction networks, consisting of agents linked by their interactions, are
ubiquitous across many disciplines of modern science. Many methods of analysis
of interaction networks have been proposed, mainly concentrating on node degree
distribution or aiming to discover clusters of agents that are very strongly
connected to one another. These methods are principally based on
graph theory or machine learning.
We present a mathematically simple formalism for modelling context-specific
information propagation in interaction networks based on random walks. The
context is provided by selection of sources and destinations of information and
by use of potential functions that direct the flow towards the destinations. We
also use the concept of dissipation to model the aging of information as it
diffuses from its source.
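The effect of dissipation can be made concrete with a small sketch. Assuming, hypothetically, that at every step a walker survives with probability `1 - mu` and otherwise moves to a neighbor in proportion to edge weight, the expected number of visits to each node from the sources is the row vector s^T (I - (1 - mu) P)^{-1}. The function name `emitting_visits` is illustrative, and the sketch omits the potential functions that direct flow toward destinations.

```python
import numpy as np

def emitting_visits(W, sources, mu):
    """Expected visits per node for damped walkers started uniformly at `sources`.

    Assumes every node has at least one outgoing edge and 0 < mu <= 1.
    """
    n = W.shape[0]
    P = W / W.sum(axis=1, keepdims=True)        # weight-proportional moves
    s = np.zeros(n)
    s[list(sources)] = 1.0 / len(sources)       # initial distribution over sources
    # v^T = s^T (I - (1 - mu) P)^{-1}  <=>  (I - (1 - mu) P)^T v = s
    return np.linalg.solve((np.eye(n) - (1.0 - mu) * P).T, s)

# Path graph 0-1-2-3-4: compare how far information spreads from node 0.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
for mu in (0.05, 0.5):
    v = emitting_visits(W, [0], mu)
    print(mu, np.round(mu * v, 3))   # mu * v sums to 1: share of visits per node
```

Because each step is kept with probability `1 - mu`, the total number of visits is 1/mu; a larger `mu` concentrates that total near the source, which is the sense in which dissipation models the aging of information.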
Using examples from yeast protein-protein interaction networks and some of
the histone acetyltransferases involved in control of transcription, we
demonstrate the utility of the concepts and the mathematical constructs
introduced in this paper. Comment: 30 pages, 5 figures. This paper was published in 2007 in the Journal of
Computational Biology. The version posted here does not include post
peer-review changes
CytoITMprobe: a network information flow plugin for Cytoscape
To give Cytoscape users the possibility of integrating ITM Probe into
their workflows, we developed CytoITMprobe, a new Cytoscape plugin.
CytoITMprobe retains all the desirable features of ITM Probe and adds
flexibility not achievable through its web service version. It
provides access to ITM Probe either through a web server or locally. The input,
consisting of a Cytoscape network, together with the desired origins and/or
destinations of information and a dissipation coefficient, is specified through
a query form. The results are shown as a subnetwork of significant nodes and
several summary tables. Users can control the composition and appearance of the
subnetwork and interchange their ITM Probe results with other software tools
through tab-delimited files.
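Because results are exchanged as tab-delimited files, downstream filtering needs only a standard CSV reader. The sketch below assumes a hypothetical layout with a header row containing `node` and `score` columns and made-up node names; the column names actually written by CytoITMprobe may differ, so check an exported file before reusing it.

```python
import csv
import io

def nodes_above(tsv_text, threshold, score_column="score", name_column="node"):
    """Names of nodes whose score exceeds `threshold`, highest score first.

    The column names are hypothetical placeholders, not a documented export format.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = [r for r in reader if float(r[score_column]) > threshold]
    rows.sort(key=lambda r: float(r[score_column]), reverse=True)
    return [r[name_column] for r in rows]

example = "node\tscore\nn1\t0.42\nn2\t0.91\nn3\t0.03\n"
print(nodes_above(example, threshold=0.1))   # ['n2', 'n1']
```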
The main strength of CytoITMprobe is its flexibility. It allows the user to
specify as input any Cytoscape network, rather than being restricted to the
pre-compiled protein-protein interaction networks available through the ITM
Probe web service. Users may supply their own edge weights and
directionalities. Consequently, unlike the ITM Probe web service,
CytoITMprobe can be applied to many other domains of network-based research
beyond protein networks. It also enables seamless integration of ITM Probe
results with other Cytoscape plugins having complementary functionality for
data analysis. Comment: 16 pages, 6 figures. Version
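User-supplied edge weights and directionalities ultimately define which moves a random walker can make and how likely each move is. As a hedged sketch, the function below turns a tab-delimited edge list into a weight matrix; the four-column layout (`source`, `target`, `weight`, `directed`) is a hypothetical convention, not the plugin's file format.

```python
import csv
import io

import numpy as np

def edges_to_matrix(tsv_text):
    """Build a weight matrix W[i, j] (edge i -> j) from a hypothetical edge list."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    nodes = sorted({r["source"] for r in rows} | {r["target"] for r in rows})
    index = {name: k for k, name in enumerate(nodes)}
    W = np.zeros((len(nodes), len(nodes)))
    for r in rows:
        i, j, w = index[r["source"]], index[r["target"]], float(r["weight"])
        W[i, j] += w
        if r["directed"].strip().lower() not in ("1", "true", "yes"):
            W[j, i] += w          # undirected edge: walkable in both directions
    return nodes, W

edge_list = "source\ttarget\tweight\tdirected\nA\tB\t2.0\tfalse\nB\tC\t1.0\ttrue\n"
nodes, W = edges_to_matrix(edge_list)
print(nodes)
print(W)
```

Summing duplicate edges with `+=` is a design choice for this sketch; a real import might instead reject or average them.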
Learning from Partially Labeled Data: Unsupervised and Semi-supervised Learning on Graphs and Learning with Distribution Shifting
This thesis focuses on two fundamental machine learning problems: unsupervised learning, where no label information is available, and semi-supervised learning, where a small number of labels are given in
addition to unlabeled data. These problems arise in many real-world applications, such as Web analysis and bioinformatics, where a large amount of data is available but little or no labeled data exists. Obtaining classification labels in these domains is usually quite difficult because it involves either manual labeling or physical experimentation.
This thesis approaches these problems from two perspectives:
graph based and distribution based.
First, I investigate a series of graph based learning algorithms that are able to exploit information embedded in different types of graph structures. These algorithms allow label information to be shared between nodes
in the graph, ultimately communicating information globally to yield effective unsupervised and semi-supervised learning.
In particular, I extend existing graph based learning algorithms, currently based on undirected graphs, to more general graph types, including directed graphs, hypergraphs and complex networks. These richer graph representations allow one to more naturally
capture the intrinsic data relationships that exist, for example, in Web data, relational data, bioinformatics and social networks.
For each of these generalized graph structures I show how information propagation can be characterized by distinct random walk models, and then use this characterization
to develop new unsupervised and semi-supervised learning algorithms.
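One common way to let label information spread through a random walk scores each node by the discounted number of visits its walks make to labeled nodes of each class, F = sum_t (alpha P)^t Y = (I - alpha P)^{-1} Y, where P is the row-normalized transition matrix. The sketch below is a generic random-walk label propagation in the spirit of the algorithms described, not the thesis's specific directed-graph, hypergraph or complex-network models; `alpha`, the function name and the toy graph are illustrative assumptions.

```python
import numpy as np

def propagate_labels(W, labels, alpha=0.9):
    """Random-walk label propagation (illustrative, not the thesis's algorithms).

    W[i, j] >= 0 is the weight of edge i -> j (directed or symmetric); every node
    needs an outgoing edge. labels[i] is a class id, or -1 if unlabeled.
    """
    n = W.shape[0]
    classes = np.unique(labels[labels >= 0])
    Y = np.zeros((n, len(classes)))
    for k, c in enumerate(classes):
        Y[labels == c, k] = 1.0                     # one-hot seeds
    P = W / W.sum(axis=1, keepdims=True)            # random-walk transitions
    # F = sum_t (alpha P)^t Y: discounted expected visits from i to class-k seeds
    F = np.linalg.solve(np.eye(n) - alpha * P, Y)
    return classes[F.argmax(axis=1)]

# Two triangles joined by a weak bridge, one labeled node in each.
W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[a, b] = W[b, a] = 1.0
W[2, 3] = W[3, 2] = 0.1
labels = np.array([0, -1, -1, -1, -1, 1])
print(propagate_labels(W, labels))
```

Row normalization means the same code accepts a directed `W`; handling nodes without outgoing edges (for example by teleportation, as in PageRank) is left out of this sketch.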
Second, I investigate a more statistically oriented approach that explicitly models a learning scenario where the training and test examples come from different distributions.
This is a difficult situation for standard statistical learning approaches, since they typically assume that the training and test distributions are similar, if not identical. To achieve good performance in this scenario, I use unlabeled data to correct the bias between the training and test distributions. A key idea is to produce resampling weights for bias correction by working directly in a feature space, bypassing the problem
of explicit density estimation. The technique can easily be applied to many different supervised learning algorithms, automatically adapting their behavior to cope with distribution shifting between training and test data.
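The idea of computing resampling weights in a feature space, without estimating densities, can be illustrated by matching feature means. The sketch below is a simplified stand-in: it uses an explicit, hypothetical feature map phi(x) = (x, x^2) instead of a kernel and finds the minimum-norm change to uniform weights such that the weighted training mean of phi equals the test mean. Unlike kernel mean matching, it imposes no nonnegativity or upper-bound constraints, so it is not the thesis's method.

```python
import numpy as np

def mean_matching_weights(phi_train, phi_test):
    """Minimum-norm resampling weights whose weighted training feature mean equals
    the test feature mean (a simplified, unconstrained form of mean matching).

    phi_train: (n, d) features of training points; phi_test: (m, d).
    Weights are not constrained to be nonnegative.
    """
    n = phi_train.shape[0]
    # Linear constraints A @ beta = b:
    #   (1/n) sum_i beta_i phi(x_i) = mean_j phi(x'_j)  and  (1/n) sum_i beta_i = 1
    A = np.vstack([phi_train.T, np.ones(n)]) / n
    b = np.append(phi_test.mean(axis=0), 1.0)
    uniform = np.ones(n)
    # Smallest change to the uniform weights that satisfies the constraints
    correction = A.T @ np.linalg.solve(A @ A.T, b - A @ uniform)
    return uniform + correction

def features(x):
    # Hypothetical feature map: first and second moments
    return np.column_stack([x, x ** 2])

rng = np.random.default_rng(0)
x_train = rng.normal(0.0, 1.0, 200)     # training inputs
x_test = rng.normal(0.5, 1.0, 300)      # shifted, unlabeled test inputs
beta = mean_matching_weights(features(x_train), features(x_test))
print(np.mean(beta * x_train), x_test.mean())   # equal up to rounding
```

The weights `beta` would then multiply each training example's loss in a weighted learner; kernel mean matching replaces the explicit `features` with a kernel and solves a constrained quadratic program instead of this closed form.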
How kinesin waits for ATP affects the nucleotide and load dependence of the stepping kinetics
Dimeric molecular motors walk on polar tracks by binding and hydrolyzing one
ATP per step. Despite tremendous progress, the waiting state for ATP binding in
the well-studied kinesin, which walks on microtubules (MTs), remains controversial.
One experiment suggests that in the waiting state both heads are bound to the
MT, while another shows that ATP binds to the leading head after the partner
head detaches. To discriminate between these two scenarios, we developed a
theory to calculate accurately several experimentally measurable quantities as
a function of ATP concentration and resistive force.
In particular, we predict that measurement of the randomness parameter could
discriminate between the two scenarios for the waiting state of kinesin,
thereby resolving this long-standing controversy.
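The randomness parameter is r = lim (<x^2> - <x>^2) / (d <x>), with d the step size. For the simplest case of an irreversible sequential cycle of exponentially distributed steps with rates k_i, r equals the variance of the cycle time divided by its squared mean, r = sum(1/k_i^2) / (sum(1/k_i))^2. The sketch below evaluates this for a hypothetical two-step cycle, ATP binding at rate k_on[ATP] followed by one lumped step k_cat. The rate values are placeholders, and this toy scheme is not the paper's model of either waiting state.

```python
def randomness(rates):
    """Randomness parameter of an irreversible sequential cycle.

    Step i is exponential with rate rates[i] (1/s); the cycle time is their sum,
    so r = Var(T) / <T>^2 = sum(1/k_i^2) / (sum(1/k_i))^2.
    """
    dwell = [1.0 / k for k in rates]
    mean = sum(dwell)
    variance = sum(t * t for t in dwell)
    return variance / mean ** 2

def two_step_randomness(atp_uM, k_on=2.0, k_cat=100.0):
    """Hypothetical cycle: ATP binding (k_on in 1/(uM s)) then one step k_cat (1/s)."""
    return randomness([k_on * atp_uM, k_cat])

for atp in (1, 10, 50, 1000):
    print(atp, round(two_step_randomness(atp), 3))
```

In this toy cycle r approaches 1 at very low and very high ATP, where a single step is rate limiting, and dips to 1/2 where the two rates are equal; competing kinetic schemes for the waiting state would generally trace different r versus [ATP] and load curves, which is what makes the quantity discriminating.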
Retinal Vascular Network Topology Reconstruction and Artery/Vein Classification via Dominant Set Clustering
The estimation of vascular network topology in complex networks is important for understanding the relationship between vascular changes and a wide spectrum of diseases. Automatic classification of retinal vascular trees into arteries and veins directly assists ophthalmologists in the diagnosis and treatment of eye disease. However, the task is challenging because of projective ambiguity and subtle changes in appearance, contrast and geometry during imaging. In this paper, we propose a novel method that distinguishes arteries from veins (A/V) in retinal color fundus images based on vascular network topological properties. To this end, we adapt the concept of dominant set clustering and formalize retinal blood vessel topology estimation and A/V classification as a pairwise clustering problem. The graph is constructed through image segmentation, skeletonization and identification of significant nodes. The weight of each edge is defined as the inverse Euclidean distance between its two end points in the feature space of intensity, orientation, curvature, diameter, and entropy. The reconstructed vascular network is classified into arteries and veins based on intensity and morphology. The proposed approach has been applied to five public databases, INSPIRE, IOSTAR, VICAVR, DRIVE and WIDE, and achieved high accuracies of 95.1%, 94.2%, 93.8%, 91.1%, and 91.0%, respectively. Furthermore, we have manually annotated the blood vessel topologies of the INSPIRE, IOSTAR, VICAVR, and DRIVE datasets, and these annotations are publicly released to support researchers in the community.
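Dominant sets are commonly extracted with discrete replicator dynamics, x_i <- x_i (Ax)_i / (x^T A x), which climbs the quadratic x^T A x over the probability simplex; the support of the converged vector is one cluster, and further clusters can be found by removing it and repeating. The sketch below pairs that iteration with the inverse-Euclidean-distance edge weight described above. The small `eps` that avoids division by zero, the stopping rule, the support threshold and the toy 2-D feature vectors are assumptions; the paper's features (intensity, orientation, curvature, diameter, entropy) and graph construction are not reproduced.

```python
import numpy as np

def inverse_distance_affinity(features, eps=1e-6):
    """Edge weights 1 / ||f_i - f_j|| between feature vectors (rows of `features`)."""
    diff = features[:, None, :] - features[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    A = 1.0 / (dist + eps)          # eps (an assumption) avoids division by zero
    np.fill_diagonal(A, 0.0)        # dominant sets use zero self-similarity
    return A

def dominant_set(A, max_iter=1000, tol=1e-8, support_eps=1e-5):
    """Extract one dominant set with discrete replicator dynamics."""
    n = A.shape[0]
    x = np.full(n, 1.0 / n)         # start from the barycenter of the simplex
    for _ in range(max_iter):
        Ax = A @ x
        x_new = x * Ax / (x @ Ax)   # stays on the simplex; x^T A x never decreases
        converged = np.abs(x_new - x).sum() < tol
        x = x_new
        if converged:
            break
    return np.flatnonzero(x > support_eps), x

# Toy 2-D features (hypothetical): a tight group of four and a looser group of three.
features = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
                     [5.0, 5.0], [5.3, 5.0], [5.0, 5.3]])
members, weights = dominant_set(inverse_distance_affinity(features))
print(members)
```

The tighter group wins because its internal affinities are largest; the looser group would be recovered as the next dominant set after removing the first.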
Fractals in the Nervous System: Conceptual Implications for Theoretical Neuroscience
This essay is presented with two principal objectives in mind: first, to
document the prevalence of fractals at all levels of the nervous system, giving
credence to the notion of their functional relevance; and second, to draw
attention to the still unresolved issues of the detailed relationships
among power law scaling, self-similarity, and self-organized criticality. As
regards criticality, I will document that it has become a pivotal reference
point in Neurodynamics. Furthermore, I will emphasize the not yet fully
appreciated significance of allometric control processes. For dynamic fractals,
I will assemble reasons for attributing to them the capacity to adapt task
execution to contextual changes across a range of scales. The final section
consists of general reflections on the implications of the reviewed data, and
identifies what appear to be issues of fundamental importance for future
research on the rapidly evolving topic of this review.
Methods, tools, and computational environment for network-based analysis of biological data
Cancer currently affects more than 18 million people worldwide annually. It is a leading cause of death, and so far only a 60% cure rate can be achieved, even in the most developed health care systems. The nature of cancer remained a mystery for centuries, until discoveries in recent decades shed light on the underlying molecular events. This depended on progress in understanding cell and tissue biology and on the development of molecular and -omics technologies. Cancer has since emerged as a highly heterogeneous disease, albeit with some very basic mechanistic features common to all cancers. Network analysis methods and tools can help deal with the complexity of the causes and consequences of pathological changes in the molecular machinery. The complexity of this task calls for easy-to-use tools that allow researchers and clinicians with no background in computer science to perform network analysis.
Paper I describes a web-based framework for network enrichment analysis (NEA), built on a previously developed algorithm and code. The platform lets researchers use data pre-downloaded from various popular databases as well as their own data, perform NEA and obtain statistical estimates, and export results in different formats for publication or further use in a research pipeline.
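At its core, network enrichment analysis asks whether a gene set has more network links to a functional gene set than expected by chance. As a rough sketch, assuming a simple configuration-model expectation (nodes i and j are linked k_i k_j / 2m times on average, for degrees k and m undirected edges) rather than the previously developed algorithm the platform relies on, observed and expected link counts between two disjoint gene sets can be computed as follows; the function name and node names are placeholders.

```python
from collections import defaultdict

def nea_link_counts(edges, set_a, set_b):
    """Observed vs. expected links between two disjoint node sets.

    The expectation uses a configuration-model approximation (an assumption):
    nodes i and j are linked k_i * k_j / (2m) times on average, where k is the
    degree and m the number of undirected edges.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    observed = sum(1 for u, v in edges
                   if (u in set_a and v in set_b) or (u in set_b and v in set_a))
    m = len(edges)
    expected = (sum(degree[g] for g in set_a) *
                sum(degree[g] for g in set_b)) / (2 * m)
    return observed, expected

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
obs, expected_links = nea_link_counts(edges, {"a", "b"}, {"c"})
print(obs, expected_links)   # 2 1.2
```

An observed count well above the expectation suggests enrichment; turning that into a p-value or z-score requires a null distribution, for example from degree-preserving network randomizations, which this sketch does not attempt.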
Paper II presents the development of another web server, which provided extensive opportunities for exploration and integrated analysis of multiple public cancer datasets that describe in vitro and in vivo sample collections. The web server linked molecular data at the single-gene level, phenotype and pharmacological response variables, and pathway-level variables calculated with NEA, and it connected to the framework presented in Paper I. Researchers can use the platform to create multivariate models from raw or pre-processed data from various sources, visualize the models, estimate and compare their performance, and export models for further use in their own research environments.
Paper III demonstrates NEAdriver, a practical application of NEA to the probabilistic evaluation of the driver roles of mutations reported in ten cancer cohorts. NEAdriver results are compared with cancer gene sets produced by other methods, both network-based and network-free. The paper demonstrated the ability of NEA to discover novel driver genes directly as well as in combination with other methods. To demonstrate the benefits of NEA, some rare cancer types and types with low mutation burden were used.
Paper IV is a manuscript evaluating the performance of the most representative network analysis methods across method parameters, functional ontologies and network versions. This study emphasizes the discovery of novel functional associations for known genes, as opposed to previous tests dominated by a few well-characterized “gold standard” genes. We performed the analysis in the context of various topological properties of networks, pathways of interest, and genes, using both existing, widely used topological metrics and a number of novel ones developed for this analysis.