23 research outputs found
Identifying networks with common organizational principles
Many complex systems can be represented as networks, and the problem of
network comparison is becoming increasingly relevant. There are many techniques
for network comparison, from simply comparing network summary statistics to
sophisticated but computationally costly alignment-based approaches. Yet it
remains challenging to accurately cluster networks that are of a different size
and density, but hypothesized to be structurally similar. In this paper, we
address this problem by introducing a new network comparison methodology that
is aimed at identifying common organizational principles in networks. The
methodology is simple, intuitive and applicable in a wide variety of settings
ranging from the functional classification of proteins to tracking the
evolution of a world trade network.Comment: 26 pages, 7 figure
Supervised Learning with Similarity Functions
We address the problem of general supervised learning when data can only be
accessed through an (indefinite) similarity function between data points.
Existing work on learning with indefinite kernels has concentrated solely on
binary/multi-class classification problems. We propose a model that is generic
enough to handle any supervised learning task and also subsumes the model
previously proposed for classification. We give a "goodness" criterion for
similarity functions w.r.t. a given supervised learning task and then adapt a
well-known landmarking technique to provide efficient algorithms for supervised
learning using "good" similarity functions. We demonstrate the effectiveness of
our model on three important super-vised learning problems: a) real-valued
regression, b) ordinal regression and c) ranking where we show that our method
guarantees bounded generalization error. Furthermore, for the case of
real-valued regression, we give a natural goodness definition that, when used
in conjunction with a recent result in sparse vector recovery, guarantees a
sparse predictor with bounded generalization error. Finally, we report results
of our learning algorithms on regression and ordinal regression tasks using
non-PSD similarity functions and demonstrate the effectiveness of our
algorithms, especially that of the sparse landmark selection algorithm that
achieves significantly higher accuracies than the baseline methods while
offering reduced computational costs.Comment: To appear in the proceedings of NIPS 2012, 30 page
Positive Definite Kernels in Machine Learning
This survey is an introduction to positive definite kernels and the set of
methods they have inspired in the machine learning literature, namely kernel
methods. We first discuss some properties of positive definite kernels as well
as reproducing kernel Hibert spaces, the natural extension of the set of
functions associated with a kernel defined
on a space . We discuss at length the construction of kernel
functions that take advantage of well-known statistical models. We provide an
overview of numerous data-analysis methods which take advantage of reproducing
kernel Hilbert spaces and discuss the idea of combining several kernels to
improve the performance on certain tasks. We also provide a short cookbook of
different kernels which are particularly useful for certain data-types such as
images, graphs or speech segments.Comment: draft. corrected a typo in figure
Local kernel canonical correlation analysis with application to virtual drug screening
Drug discovery is the process of identifying compounds which have potentially meaningful biological activity. A major challenge that arises is that the number of compounds to search over can be quite large, sometimes numbering in the millions, making experimental testing intractable. For this reason computational methods are employed to filter out those compounds which do not exhibit strong biological activity. This filtering step, also called virtual screening reduces the search space, allowing for the remaining compounds to be experimentally tested
Interpretable statistics for complex modelling: quantile and topological learning
As the complexity of our data increased exponentially in the last decades, so has our
need for interpretable features. This thesis revolves around two paradigms to approach
this quest for insights.
In the first part we focus on parametric models, where the problem of interpretability
can be seen as a âparametrization selectionâ. We introduce a quantile-centric
parametrization and we show the advantages of our proposal in the context of regression,
where it allows to bridge the gap between classical generalized linear (mixed)
models and increasingly popular quantile methods.
The second part of the thesis, concerned with topological learning, tackles the
problem from a non-parametric perspective. As topology can be thought of as a way
of characterizing data in terms of their connectivity structure, it allows to represent
complex and possibly high dimensional through few features, such as the number of
connected components, loops and voids. We illustrate how the emerging branch of
statistics devoted to recovering topological structures in the data, Topological Data
Analysis, can be exploited both for exploratory and inferential purposes with a special
emphasis on kernels that preserve the topological information in the data.
Finally, we show with an application how these two approaches can borrow strength
from one another in the identification and description of brain activity through fMRI
data from the ABIDE project