Non-parametric resampling of random walks for spectral network clustering
Parametric resampling schemes have recently been introduced in complex
network analysis with the aim of assessing the statistical significance of
graph clustering and the robustness of community partitions. We propose here a
method to replicate structural features of complex networks based on the
non-parametric resampling of the transition matrix associated with an unbiased
random walk on the graph. We test this bootstrapping technique on synthetic and
real-world modular networks and show that the ensemble of replicates
obtained through resampling can be used to improve the performance of standard
spectral algorithms for community detection.
Comment: 5 pages, 2 figures
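A minimal sketch of the resampling step, under an assumption of ours rather than the authors' exact scheme: each bootstrap replicate draws a fixed number of one-step random-walk moves from every node and replaces each row of P = D^-1 A with the empirical transition frequencies. The names transition_matrix, bootstrap_replicates and steps_per_node are hypothetical, and the downstream spectral clustering of each replicate is omitted.

```python
import random


def transition_matrix(adj):
    """Unbiased random-walk matrix P = D^-1 A from an adjacency list.

    Assumes no isolated nodes (every adjacency list is non-empty)."""
    n = len(adj)
    P = [[0.0] * n for _ in range(n)]
    for i, nbrs in enumerate(adj):
        for j in nbrs:
            P[i][j] += 1.0 / len(nbrs)
    return P


def bootstrap_replicates(P, n_replicates, steps_per_node, seed=0):
    """Hypothetical non-parametric resampling of P (an assumption, not the paper's code).

    Each row of a replicate holds the empirical frequencies of `steps_per_node`
    one-step walks drawn from the corresponding row of P."""
    rng = random.Random(seed)
    n = len(P)
    replicates = []
    for _ in range(n_replicates):
        R = []
        for i in range(n):
            # Zero-weight targets (non-neighbours) are never drawn.
            draws = rng.choices(range(n), weights=P[i], k=steps_per_node)
            counts = [0] * n
            for j in draws:
                counts[j] += 1
            R.append([c / steps_per_node for c in counts])
        replicates.append(R)
    return replicates


if __name__ == "__main__":
    # Two triangles joined by the edge 2-3: a toy modular graph.
    adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
    P = transition_matrix(adj)
    reps = bootstrap_replicates(P, n_replicates=3, steps_per_node=20)
    print(reps[0][2])  # resampled transition row of node 2
```

Because every replicate row keeps the support of the original row, no new edges appear; only the transition weights fluctuate from one replicate to the next.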
Discriminating different classes of biological networks by analyzing the graphs spectra distribution
The brain's structural and functional systems, protein-protein interaction,
and gene networks are examples of biological systems that share some features
of complex networks, such as highly connected nodes, modularity, and
small-world topology. Recent studies indicate that some pathologies present
topological network alterations relative to norms seen in the general
population. Therefore, methods to discriminate the processes that generate the
different classes of networks (e.g., normal and disease) might be crucial for
the diagnosis, prognosis, and treatment of the disease. It is known that
several topological properties of a network (graph) can be described by the
distribution of the spectrum of its adjacency matrix. Moreover, large networks
generated by the same random process have the same spectrum distribution,
allowing us to use it as a "fingerprint". Based on this relationship, we
propose the entropy of a graph spectrum to measure the
"uncertainty" of a random graph, and the Kullback-Leibler and Jensen-Shannon
divergences between graph spectra to compare networks. We also introduce
general methods for model selection and network model parameter estimation, as
well as a statistical procedure to test the nullity of divergence between two
classes of complex networks. Finally, we demonstrate the usefulness of the
proposed methods by applying them to (1) protein-protein interaction networks
of different species and (2) networks derived from children diagnosed with
Attention Deficit Hyperactivity Disorder (ADHD) and typically developing
children. We conclude that scale-free networks best describe all the
protein-protein interaction networks. We also show that our proposed measures
succeeded in identifying topological changes in the networks, whereas
other commonly used measures (number of edges, clustering coefficient, average
path length) failed.
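To make the spectrum-based quantities concrete, here is a minimal sketch that, given adjacency-matrix eigenvalues computed elsewhere (for example with numpy.linalg.eigvalsh), estimates a spectral density, its Shannon entropy, and the Kullback-Leibler and Jensen-Shannon divergences between two spectra. As a simplifying assumption it uses a fixed-bin histogram on a shared grid rather than a smoothed density estimate; the function names are hypothetical.

```python
import math


def spectral_density(eigs, lo, hi, bins):
    """Histogram estimate (an assumed simplification) of the spectral density on [lo, hi].

    Eigenvalues outside the range are clamped into the first or last bin."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in eigs:
        k = int((x - lo) / width)
        counts[max(0, min(k, bins - 1))] += 1
    return [c / len(eigs) for c in counts]


def entropy(p):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q); infinite if q lacks mass where p has it."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, always finite, bounded by ln 2."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    # Adjacency spectra of a triangle (2, -1, -1) and a 3-node path (sqrt 2, 0, -sqrt 2).
    triangle = [2.0, -1.0, -1.0]
    path = [math.sqrt(2), 0.0, -math.sqrt(2)]
    p = spectral_density(triangle, -2.5, 2.5, 10)
    q = spectral_density(path, -2.5, 2.5, 10)
    print(entropy(p), entropy(q), js_divergence(p, q))
```

The Jensen-Shannon divergence is used for comparison here because, unlike the Kullback-Leibler divergence, it stays finite when the two histograms have non-overlapping bins.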
Detection of regulator genes and eQTLs in gene networks
Genetic differences between individuals associated with quantitative phenotypic
traits, including disease states, are usually found in non-coding genomic
regions. These genetic variants are often also associated with differences in
expression levels of nearby genes (they are "expression quantitative trait
loci", or eQTLs for short) and presumably play a gene regulatory role, affecting
the status of molecular networks of interacting genes, proteins and
metabolites. Computational systems biology approaches to reconstruct causal
gene networks from large-scale omics data have therefore become essential to
understand the structure of networks controlled by eQTLs together with other
regulatory genes, and to generate detailed hypotheses about the molecular
mechanisms that lead from genotype to phenotype. Here we review the main
analytical methods and software tools to identify eQTLs and their associated genes,
to reconstruct co-expression networks and modules, to reconstruct causal
Bayesian gene and module networks, and to validate predicted networks in
silico.
Comment: minor revision with typos corrected; review article; 24 pages, 2
figures
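As one concrete instance of the single-marker association tests used to identify eQTLs, the sketch below regresses a gene's expression on genotype dosage (0, 1 or 2 copies of the alternative allele) by ordinary least squares and reports the slope and its t-statistic. This is a generic textbook test assumed for illustration, not a specific tool from the review; it ignores covariates and multiple-testing correction, and the function name is hypothetical.

```python
import math


def eqtl_ols(dosage, expression):
    """Hypothetical helper: simple linear regression expression ~ dosage.

    Returns (slope, t_statistic) for the dosage effect."""
    n = len(dosage)
    if n < 3:
        raise ValueError("need at least 3 samples")
    mx = sum(dosage) / n
    my = sum(expression) / n
    sxx = sum((x - mx) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("genotype is monomorphic in this sample")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, expression))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosage, expression))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    t_stat = slope / se if se > 0 else math.copysign(math.inf, slope)
    return slope, t_stat


if __name__ == "__main__":
    genotypes = [0, 0, 1, 1, 2, 2]           # allele dosage per individual (toy data)
    expr = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]    # expression of a nearby gene (toy data)
    print(eqtl_ols(genotypes, expr))         # slope close to 1.0, large positive t
```

A positive slope means each extra copy of the alternative allele goes with higher expression in this toy data; a real analysis would convert t to a p-value with n - 2 degrees of freedom, add covariates, and control the false discovery rate across the many SNP-gene pairs tested.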
Fiedler Random Fields: A Large-Scale Spectral Approach to Statistical Network Modeling
International audience
Statistical models for networks have typically been committed to strong prior assumptions concerning the form of the modeled distributions. Moreover, the vast majority of currently available models are explicitly designed for capturing some specific graph properties (such as power-law degree distributions), which makes them unsuitable for application to domains where the behavior of the target quantities is not known a priori. The key contribution of this paper is twofold. First, we introduce the Fiedler delta statistic, based on the Laplacian spectrum of graphs, which makes it possible to dispense with any parametric assumption concerning the modeled network properties. Second, we use the defined statistic to develop the Fiedler random field model, which allows for efficient estimation of edge distributions over large-scale random networks. After analyzing the dependence structure involved in Fiedler random fields, we estimate them over several real-world networks, showing that they achieve a much higher modeling accuracy than other well-known statistical approaches.
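To illustrate the spectral quantity behind the Fiedler delta statistic, the sketch below computes the algebraic connectivity (the Fiedler value, i.e. the second-smallest eigenvalue of the graph Laplacian L = D - A) and how it changes when a single edge is added. We assume, as a simplification, that the statistic is this per-edge change; the paper's exact definition and normalization may differ, and all helper names are hypothetical. A dense Jacobi eigenvalue solver keeps the example dependency-free but only suits small graphs, unlike the large-scale setting the paper targets.

```python
import math


def laplacian(n, edges):
    """Dense graph Laplacian L = D - A for an undirected edge list on nodes 0..n-1."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][v] -= 1.0
        L[v][u] -= 1.0
        L[u][u] += 1.0
        L[v][v] += 1.0
    return L


def sym_eigenvalues(A, max_sweeps=100, tol=1e-18):
    """Eigenvalues (ascending) of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in A]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def fiedler_value(n, edges):
    """Algebraic connectivity: second-smallest Laplacian eigenvalue."""
    return sym_eigenvalues(laplacian(n, edges))[1]


def fiedler_delta(n, edges, new_edge):
    """Assumed form of the statistic: change in the Fiedler value when new_edge is added."""
    return fiedler_value(n, list(edges) + [new_edge]) - fiedler_value(n, edges)


if __name__ == "__main__":
    path4 = [(0, 1), (1, 2), (2, 3)]
    print(fiedler_value(4, path4))          # 2 - sqrt(2) for a 4-node path
    print(fiedler_delta(4, path4, (0, 3)))  # closing the cycle raises connectivity
```

Adding an edge can never decrease the Fiedler value, so under this assumed definition the delta is non-negative and is largest for edges that bridge weakly connected parts of the graph.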
Diffusion-Jump GNNs: Homophiliation via Learnable Metric Filters
High-order Graph Neural Networks (HO-GNNs) have been developed to infer
consistent latent spaces in the heterophilic regime, where the label
distribution is not correlated with the graph structure. However, most of the
existing HO-GNNs are hop-based, i.e., they rely on the powers of the transition
matrix. As a result, these architectures are not fully reactive to the
classification loss and the achieved structural filters have static supports.
In other words, neither the filters' supports nor their coefficients can be
learned with these networks. They are confined, instead, to learn combinations
of filters. To address the above concerns, we propose Diffusion-Jump GNNs, a
method relying on asymptotic diffusion distances that operates on jumps. A
diffusion-pump generates pairwise distances whose projections determine both
the support and coefficients of each structural filter. These filters are
called jumps because they explore a wide range of scales in order to find bonds
between scattered nodes with the same label. In fact, the entire process is
controlled by the classification loss. Both the jumps and the diffusion
distances react to classification errors (i.e., they are learnable).
Homophiliation, i.e., the process of learning piecewise smooth latent spaces in
the heterophilic regime, is formulated as a Dirichlet problem: the known labels
determine the border nodes and the diffusion-pump ensures a minimal deviation
of the semi-supervised grouping from a canonical unsupervised grouping. This
triggers the update of both the diffusion distances and, consequently, the
jumps in order to minimize the classification error. The Dirichlet formulation
has several advantages. It leads to the definition of structural heterophily, a
novel measure beyond edge heterophily. It also allows us to investigate links
with (learnable) diffusion distances, absorbing random walks and stochastic
diffusion.
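For readers unfamiliar with diffusion distances, here is a minimal sketch of the classical, non-learnable diffusion distance at time t, D_t(i, j)^2 = sum_k (P^t[i][k] - P^t[j][k])^2 / pi_k, where P = D^-1 A is the random-walk matrix and pi_k = d_k / sum(d) its stationary distribution. It shows the hop-based ingredient (powers of the transition matrix) that the abstract contrasts with; the asymptotic, learnable distances of Diffusion-Jump GNNs are not reproduced here, and the function names are hypothetical.

```python
import math


def random_walk_matrix(adj):
    """P = D^-1 A for an adjacency list without isolated nodes."""
    n = len(adj)
    return [[adj[i].count(j) / len(adj[i]) for j in range(n)] for i in range(n)]


def mat_mul(A, B):
    """Dense matrix product for lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def diffusion_distances(adj, t):
    """Classical diffusion distance matrix at integer time t (not the learnable variant)."""
    n = len(adj)
    P = random_walk_matrix(adj)
    Pt = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        Pt = mat_mul(Pt, P)  # t-step transition probabilities
    deg = [len(nbrs) for nbrs in adj]
    pi = [d / sum(deg) for d in deg]  # stationary distribution of the walk
    return [[math.sqrt(sum((Pt[i][k] - Pt[j][k]) ** 2 / pi[k] for k in range(n)))
             for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    path = [[1], [0, 2], [1]]  # 3-node path 0 - 1 - 2
    for row in diffusion_distances(path, 2):
        print(row)
```

On the 3-node path at t = 2, the two end nodes are at diffusion distance 0 even though they are not adjacent: the distance compares how mass spreads from each node rather than direct links, which hints at why diffusion-based supports can connect scattered nodes.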
Collaborative Artificial Intelligence Algorithms for Medical Imaging Applications
In this dissertation, we propose novel machine learning algorithms for high-risk medical imaging applications. Specifically, we tackle current challenges in the radiology screening process and introduce cutting-edge methods for image-based diagnosis, detection and segmentation. We incorporate expert knowledge through eye-tracking, making the whole process human-centered. This dissertation contributes to machine learning, computer vision, and medical imaging research by: 1) introducing a mathematical formulation of radiologists' level of attention, and sparsifying their gaze data for better extraction and comparison of search patterns; 2) proposing novel local and global image analysis algorithms. Imaging-based diagnosis and pattern analysis are high-risk Artificial Intelligence applications. A standard radiology screening procedure includes detection, diagnosis and measurement (often done with segmentation) of abnormalities. We hypothesize that true collaboration is essential for a better control mechanism in such applications. To this end, we propose to form a collaboration medium between radiologists and machine learning algorithms through eye-tracking. Further, we build a generic platform consisting of novel machine learning algorithms for each of these tasks. Our collaborative algorithm utilizes eye-tracking and includes an attention model and gaze-pattern analysis, based on data clustering and graph sparsification. Then, we present a semi-supervised multi-task network for local analysis of images in radiologists' ROIs, extracted in the previous step. To address missed tumors and analyze regions that are completely missed by radiologists during screening, we introduce a detection framework, S4ND: Single Shot Single Scale Lung Nodule Detection. Our proposed detection algorithm is specifically designed to handle tiny abnormalities in the lungs, which radiologists can easily miss.
Finally, we introduce a novel projective adversarial framework, PAN: Projective Adversarial Network for Medical Image Segmentation, for segmenting complex 3D structures/organs, which can benefit the screening process by guiding radiologists' search areas through segmentation of the desired structure/organ.
Semi-supervised Eigenvectors for Large-scale Locally-biased Learning
In many applications, one has side information, e.g., labels that are
provided in a semi-supervised manner, about a specific target region of a large
data set, and one wants to perform machine learning and data analysis tasks
"nearby" that prespecified target region. For example, one might be interested
in the clustering structure of a data graph near a prespecified "seed set" of
nodes, or one might be interested in finding partitions in an image that are
near a prespecified "ground truth" set of pixels. Locally-biased problems of
this sort are particularly challenging for popular eigenvector-based machine
learning and data analysis tools. At root, the reason is that eigenvectors are
inherently global quantities, thus limiting the applicability of
eigenvector-based methods in situations where one is interested in very local
properties of the data.
In this paper, we address this issue by providing a methodology to construct
semi-supervised eigenvectors of a graph Laplacian, and we illustrate how these
locally-biased eigenvectors can be used to perform locally-biased machine
learning. These semi-supervised eigenvectors capture
successively-orthogonalized directions of maximum variance, conditioned on
being well-correlated with an input seed set of nodes that is assumed to be
provided in a semi-supervised manner. We show that these semi-supervised
eigenvectors can be computed quickly as the solution to a system of linear
equations; and we also describe several variants of our basic method that have
improved scaling properties. We provide several empirical examples
demonstrating how these semi-supervised eigenvectors can be used to perform
locally-biased learning; and we discuss the relationship between our results
and recent machine learning algorithms that use global eigenvectors of the
graph Laplacian.
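The statement that semi-supervised eigenvectors can be computed as the solution to a system of linear equations can be sketched for the first such vector. The sketch assumes a locally-biased formulation of the form (L - gamma D) x = D s with gamma < 0, where L = D - A is the graph Laplacian, D the degree matrix, and s a seed indicator vector shifted to be D-orthogonal to the all-ones vector; the result is rescaled so that x^T D x = 1. This is our assumption about the general shape of the method, not necessarily the paper's exact algorithm; it omits the successive orthogonalization used for further vectors, uses dense Gaussian elimination instead of the faster solvers the paper mentions, and all names are hypothetical.

```python
import math


def solve_dense(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A x = b."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def semi_supervised_vector(adj, seeds, gamma=-0.1):
    """Hypothetical sketch: solve (L - gamma D) x = D s and D-normalise x."""
    if gamma >= 0:
        raise ValueError("this sketch assumes gamma < 0 so the system is positive definite")
    n = len(adj)
    deg = [float(len(nbrs)) for nbrs in adj]
    vol = sum(deg)
    # Seed indicator, shifted so that sum_i d_i s_i = 0 (D-orthogonal to all-ones).
    s = [1.0 if i in seeds else 0.0 for i in range(n)]
    shift = sum(deg[i] * s[i] for i in range(n)) / vol
    s = [si - shift for si in s]
    # L - gamma D = (1 - gamma) D - A
    M = [[0.0] * n for _ in range(n)]
    for i, nbrs in enumerate(adj):
        M[i][i] += (1.0 - gamma) * deg[i]
        for j in nbrs:
            M[i][j] -= 1.0
    x = solve_dense(M, [deg[i] * s[i] for i in range(n)])
    norm = math.sqrt(sum(deg[i] * x[i] * x[i] for i in range(n)))
    if norm == 0.0:
        raise ValueError("seed set must not contain every node")
    return [xi / norm for xi in x]


if __name__ == "__main__":
    path = [[1], [0, 2], [1, 3], [2]]  # 4-node path, seed at node 0
    print(semi_supervised_vector(path, {0}, gamma=-0.5))
```

Making gamma more negative keeps x closer to the seed vector, while gamma closer to 0 lets the graph structure spread it further; this sketch does not explore positive values of gamma.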