
23 research outputs found

Identifying networks with common organizational principles

Authors: Deane, Charlotte M.; Gaunt, Robert E.; Ospina-Forero, Luis; Reinert, Gesine; Wegner, Anatol E.
Publication date: 02/04/2017

Many complex systems can be represented as networks, and the problem of network comparison is becoming increasingly relevant. There are many techniques for network comparison, ranging from simple comparisons of network summary statistics to sophisticated but computationally costly alignment-based approaches. Yet it remains challenging to accurately cluster networks that differ in size and density but are hypothesized to be structurally similar. In this paper, we address this problem by introducing a new network comparison methodology aimed at identifying common organizational principles in networks. The methodology is simple, intuitive and applicable in a wide variety of settings, ranging from the functional classification of proteins to tracking the evolution of a world trade network.
Comment: 26 pages, 7 figures
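To make the summary-statistic end of that spectrum concrete, here is a minimal sketch that compares two networks through their degree distributions. It is not the methodology introduced in the paper. The choice of statistic (node degree), the standardization to mean 0 and unit variance (an assumption used here to factor out size and density), and all function names (`degree_sequence`, `standardize`, `wasserstein_1d`, `degree_distance`) are hypothetical and chosen for illustration only.

```python
import numpy as np
from collections import Counter


def degree_sequence(edges):
    """Degrees of all nodes appearing in an undirected edge list (isolated nodes are not seen)."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return np.array(list(deg.values()), dtype=float)


def standardize(values):
    """Shift to mean 0 and, when the spread is non-zero, scale to unit variance (assumed normalization)."""
    centred = values - values.mean()
    sd = values.std()
    return centred / sd if sd > 0 else centred


def wasserstein_1d(u, v):
    """Earth mover's (1-Wasserstein) distance between two 1-D empirical distributions."""
    u = np.sort(u)
    v = np.sort(v)
    grid = np.sort(np.concatenate([u, v]))
    widths = np.diff(grid)
    # Empirical CDFs on each interval [grid[i], grid[i+1]); integrate |F_u - F_v|.
    u_cdf = np.searchsorted(u, grid[:-1], side="right") / u.size
    v_cdf = np.searchsorted(v, grid[:-1], side="right") / v.size
    return float(np.sum(np.abs(u_cdf - v_cdf) * widths))


def degree_distance(edges_a, edges_b):
    """Hypothetical size-insensitive comparison of two networks via standardized degrees."""
    return wasserstein_1d(standardize(degree_sequence(edges_a)),
                          standardize(degree_sequence(edges_b)))


path = [(0, 1), (1, 2), (2, 3)]
two_paths = path + [(4, 5), (5, 6), (6, 7)]  # twice as many nodes, same structure
star = [(0, i) for i in range(1, 6)]

print(degree_distance(path, two_paths))  # 0.0
print(degree_distance(path, star) > 0)   # True
```

Two disjoint copies of a path have the same standardized degree distribution as one path, so the distance is zero despite the size difference, while a star is separated from both. A single statistic like this is cheap but coarse, which is the trade-off the abstract describes.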

arXiv.org e-Print Archive

Crossref

Oxford University Research Archive

The University of Manchester - Institutional Repository

Supervised Learning with Similarity Functions

Authors: Jain, Prateek; Kar, Purushottam
Publication date: 22/10/2012

We address the problem of general supervised learning when data can only be accessed through an (indefinite) similarity function between data points. Existing work on learning with indefinite kernels has concentrated solely on binary/multi-class classification problems. We propose a model that is generic enough to handle any supervised learning task and also subsumes the model previously proposed for classification. We give a "goodness" criterion for similarity functions with respect to a given supervised learning task and then adapt a well-known landmarking technique to provide efficient algorithms for supervised learning using "good" similarity functions. We demonstrate the effectiveness of our model on three important supervised learning problems: a) real-valued regression, b) ordinal regression and c) ranking, where we show that our method guarantees bounded generalization error. Furthermore, for the case of real-valued regression, we give a natural goodness definition that, when used in conjunction with a recent result in sparse vector recovery, guarantees a sparse predictor with bounded generalization error. Finally, we report results of our learning algorithms on regression and ordinal regression tasks using non-PSD similarity functions and demonstrate the effectiveness of our algorithms, especially the sparse landmark selection algorithm, which achieves significantly higher accuracies than the baseline methods while offering reduced computational costs.
Comment: To appear in the proceedings of NIPS 2012, 30 pages
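The general landmarking idea can be sketched as follows: each point is mapped to the vector of its similarities to a small set of landmark points, and an ordinary linear model is fit in that space. This is a sketch under stated assumptions, not the paper's algorithm or its goodness criterion. The similarity `neg_sq_dist` (negative squared distance, whose Gram matrices are not PSD), the random choice of landmarks, plain ridge regression instead of the paper's sparse landmark selection, and the synthetic data are all hypothetical.

```python
import numpy as np


def neg_sq_dist(x, y):
    """Hypothetical indefinite similarity: -||x - y||^2 (its Gram matrices are not PSD)."""
    d = x - y
    return -float(d @ d)


def landmark_features(X, landmarks, sim):
    """Map each point x to (sim(x, l_1), ..., sim(x, l_m)) for the chosen landmarks."""
    return np.array([[sim(x, l) for l in landmarks] for x in X])


def fit_ridge(Phi, y, lam=1e-3):
    """Ridge regression on landmark features: w = (Phi^T Phi + lam I)^{-1} Phi^T y."""
    m = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(m), Phi.T @ y)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 1 + 0.1 * rng.normal(size=200)  # synthetic regression target

# Landmarks: a random subset of training points (an assumption of this sketch).
landmarks = X[rng.choice(len(X), size=10, replace=False)]
Phi = landmark_features(X, landmarks, neg_sq_dist)
w = fit_ridge(Phi, y)

train_mse = float(np.mean((Phi @ w - y) ** 2))
# The similarity matrix among landmarks has a negative eigenvalue, i.e. it is not PSD.
S = landmark_features(landmarks, landmarks, neg_sq_dist)
print(train_mse, np.linalg.eigvalsh(S).min())
```

The point of the construction is that the learner never needs a PSD kernel: it only evaluates the similarity against the landmarks, so an indefinite function such as `neg_sq_dist` can still drive an ordinary linear regressor.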

arXiv.org e-Print Archive

CiteSeerX

Positive Definite Kernels in Machine Learning

Author: Cuturi, Marco
Publication date: 01/01/2009

This survey is an introduction to positive definite kernels and the set of methods they have inspired in the machine learning literature, namely kernel methods. We first discuss some properties of positive definite kernels as well as reproducing kernel Hilbert spaces, the natural extension of the set of functions $\{k(x,\cdot), x\in\mathcal{X}\}$ associated with a kernel $k$ defined on a space $\mathcal{X}$. We discuss at length the construction of kernel functions that take advantage of well-known statistical models. We provide an overview of numerous data-analysis methods that take advantage of reproducing kernel Hilbert spaces and discuss the idea of combining several kernels to improve performance on certain tasks. We also provide a short cookbook of kernels that are particularly useful for certain data types such as images, graphs or speech segments.
Comment: draft. corrected a typo in figure
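Positive definiteness of a kernel $k$ means that every Gram matrix $K_{ij} = k(x_i, x_j)$ is symmetric positive semi-definite, and non-negative combinations of such kernels keep this property, which is what makes combining kernels safe. The sketch below checks this numerically for two standard kernels (Gaussian and polynomial) and a fixed mixture of them. The kernel parameters, the mixing weights 0.7/0.3, the tolerance in `is_psd`, and the random data are assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(x, y, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), positive definite for sigma > 0."""
    d = x - y
    return float(np.exp(-(d @ d) / (2.0 * sigma**2)))


def poly_kernel(x, y, c=1.0, degree=2):
    """k(x, y) = (<x, y> + c)^degree, positive definite for c >= 0 and integer degree >= 1."""
    return float((x @ y + c) ** degree)


def gram(X, k):
    """Gram matrix K[i, j] = k(x_i, x_j) over the rows of X."""
    return np.array([[k(a, b) for b in X] for a in X])


def is_psd(K, rtol=1e-10):
    """True if all eigenvalues of symmetric K are >= 0 up to a round-off tolerance."""
    eig = np.linalg.eigvalsh(K)
    return bool(eig.min() >= -rtol * max(1.0, np.abs(eig).max()))


rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
K_rbf = gram(X, gaussian_kernel)
K_poly = gram(X, poly_kernel)
# A non-negative combination of PD kernels is again a PD kernel.
K_mix = 0.7 * K_rbf + 0.3 * K_poly

# f = sum_i alpha_i k_mix(x_i, .) lies in the RKHS of k_mix; its squared norm is alpha^T K alpha.
alpha = rng.normal(size=30)
rkhs_norm_sq = float(alpha @ K_mix @ alpha)

print(is_psd(K_rbf), is_psd(K_poly), is_psd(K_mix))  # True True True
```

The last two lines connect the Gram-matrix view to the function space $\{k(x,\cdot), x\in\mathcal{X}\}$: finite combinations of kernel sections have squared RKHS norm $\alpha^\top K \alpha$, which is non-negative exactly because $K$ is positive semi-definite.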

arXiv.org e-Print Archive

CiteSeerX

Local kernel canonical correlation analysis with application to virtual drug screening

Authors: Grulke, Christopher; Liu, Yufeng; Marron, J. S.; Samarov, Daniel; Tropsha, Alexander
Publication date: 01/01/2011

Drug discovery is the process of identifying compounds that have potentially meaningful biological activity. A major challenge is that the number of compounds to search over can be very large, sometimes numbering in the millions, making experimental testing of every compound intractable. For this reason, computational methods are employed to filter out compounds that do not exhibit strong biological activity. This filtering step, also called virtual screening, reduces the search space, allowing the remaining compounds to be tested experimentally.

arXiv.org e-Print Archive

PubMed Central

Carolina Digital Repository

Indefinite Core Vector Machine

Authors: Schleif, Frank-Michael; Tino, Peter
Publication venue: Elsevier BV
Publication date: 01/11/2017

Crossref

University of Birmingham Research Portal

Interpretable statistics for complex modelling: quantile and topological learning

Author: Padellini, Tullia
Publication date: 22/02/2019

As the complexity of our data has increased exponentially over the last decades, so has our need for interpretable features. This thesis revolves around two paradigms for approaching this quest for insight. In the first part we focus on parametric models, where the problem of interpretability can be seen as one of “parametrization selection”. We introduce a quantile-centric parametrization and show the advantages of our proposal in the context of regression, where it bridges the gap between classical generalized linear (mixed) models and increasingly popular quantile methods. The second part of the thesis, concerned with topological learning, tackles the problem from a non-parametric perspective. As topology can be thought of as a way of characterizing data in terms of their connectivity structure, it allows complex and possibly high-dimensional data to be represented through a few features, such as the number of connected components, loops and voids. We illustrate how the emerging branch of statistics devoted to recovering topological structures in data, Topological Data Analysis, can be exploited for both exploratory and inferential purposes, with a special emphasis on kernels that preserve the topological information in the data. Finally, we show with an application how these two approaches can borrow strength from one another in the identification and description of brain activity through fMRI data from the ABIDE project.
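The simplest of the topological features mentioned above, the number of connected components, can be computed at a chosen scale by linking every pair of points closer than a threshold `eps` and counting the resulting clusters with union-find. This gives the 0-dimensional count (Betti-0) of a Vietoris-Rips-style construction at a single scale. It is a minimal sketch, not code from the thesis: the function name `count_components`, the Euclidean metric, and the toy point cloud are assumptions, and loops and voids would need a full persistent-homology library.

```python
import numpy as np


def count_components(points, eps):
    """Number of connected components of the graph linking points within distance eps."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = n
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(points[i] - points[j]) <= eps:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj  # merge two clusters
                    components -= 1
    return components


# Two tight pairs of points, far apart from each other.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
for eps in (0.05, 0.5, 10.0):
    print(eps, count_components(pts, eps))  # 4, then 2, then 1 components
```

Sweeping `eps` and recording when components merge is the intuition behind persistence: the two-cluster structure survives over a wide range of scales, which is what makes it a robust, interpretable summary of the point cloud.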

Archivio della ricerca - Università di Roma La Sapienza