Search CORE

2,641 research outputs found

Neural Networks for Complex Data

Author: Cottrell Marie
Olteanu Madalina
Rossi Fabrice
Rynkiewicz Joseph
Villa-Vialaneix Nathalie
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 24/10/2012
Field of study

Artificial neural networks are simple and efficient machine learning tools. Defined originally in the traditional setting of simple vector data, neural network models have evolved to address more and more difficulties of complex real world problems, ranging from time evolving data to sophisticated data structures such as graphs and functions. This paper summarizes advances on those themes from the last decade, with a focus on results obtained by members of the SAMM team of Universit\'e Paris

arXiv.org e-Print Archive

HAL-Paris1

Classifying sequences by the optimized dissimilarity space embedding approach: a case study on the solubility analysis of the E. coli proteome

Author: Livi Lorenzo
Rizzi Antonello
Sadeghian Alireza
Publication venue: 'IOS Press'
Publication date: 01/01/2015
Field of study

We evaluate a version of the recently-proposed classification system named Optimized Dissimilarity Space Embedding (ODSE) that operates in the input space of sequences of generic objects. The ODSE system has been originally presented as a classification system for patterns represented as labeled graphs. However, since ODSE is founded on the dissimilarity space representation of the input data, the classifier can be easily adapted to any input domain where it is possible to define a meaningful dissimilarity measure. Here we demonstrate the effectiveness of the ODSE classifier for sequences by considering an application dealing with the recognition of the solubility degree of the Escherichia coli proteome. Solubility, or analogously aggregation propensity, is an important property of protein molecules, which is intimately related to the mechanisms underlying the chemico-physical process of folding. Each protein of our dataset is initially associated with a solubility degree and it is represented as a sequence of symbols, denoting the 20 amino acid residues. The herein obtained computational results, which we stress that have been achieved with no context-dependent tuning of the ODSE system, confirm the validity and generality of the ODSE-based approach for structured data classification.Comment: 10 pages, 49 reference

arXiv.org e-Print Archive

Archivio della ricerca- Università di Roma La Sapienza

Developments in the theory of randomized shortest paths with a comparison of graph node distances

Author: Kivimäki Ilkka
Saerens Marco
Shimbo Masashi
Publication venue: 'Elsevier BV'
Publication date: 03/10/2013
Field of study

There have lately been several suggestions for parametrized distances on a graph that generalize the shortest path distance and the commute time or resistance distance. The need for developing such distances has risen from the observation that the above-mentioned common distances in many situations fail to take into account the global structure of the graph. In this article, we develop the theory of one family of graph node distances, known as the randomized shortest path dissimilarity, which has its foundation in statistical physics. We show that the randomized shortest path dissimilarity can be easily computed in closed form for all pairs of nodes of a graph. Moreover, we come up with a new definition of a distance measure that we call the free energy distance. The free energy distance can be seen as an upgrade of the randomized shortest path dissimilarity as it defines a metric, in addition to which it satisfies the graph-geodetic property. The derivation and computation of the free energy distance are also straightforward. We then make a comparison between a set of generalized distances that interpolate between the shortest path distance and the commute time, or resistance distance. This comparison focuses on the applicability of the distances in graph node clustering and classification. The comparison, in general, shows that the parametrized distances perform well in the tasks. In particular, we see that the results obtained with the free energy distance are among the best in all the experiments.Comment: 30 pages, 4 figures, 3 table

arXiv.org e-Print Archive

DIAL UCLouvain

Similarities on Graphs: Kernels versus Proximity Measures

Author: Avrachenkov Konstantin
Chebotarev Pavel
Rubanov Dmytro
Publication venue: 'Elsevier BV'
Publication date: 17/02/2018
Field of study

We analytically study proximity and distance properties of various kernels and similarity measures on graphs. This helps to understand the mathematical nature of such measures and can potentially be useful for recommending the adoption of specific similarity measures in data analysis.Comment: 16 page

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

Analysis of a data matrix and a graph: Metagenomic data and the phylogenetic tree

Author: Purdom Elizabeth
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2011
Field of study

In biological experiments researchers often have information in the form of a graph that supplements observed numerical data. Incorporating the knowledge contained in these graphs into an analysis of the numerical data is an important and nontrivial task. We look at the example of metagenomic data---data from a genomic survey of the abundance of different species of bacteria in a sample. Here, the graph of interest is a phylogenetic tree depicting the interspecies relationships among the bacteria species. We illustrate that analysis of the data in a nonstandard inner-product space effectively uses this additional graphical information and produces more meaningful results.Comment: Published in at http://dx.doi.org/10.1214/10-AOAS402 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

CiteSeerX

Modelling and recognition of protein contact networks by multiple kernel learning and dissimilarity representations

Author: De Santis Enrico
Giuliani Alessandro
Martino Alessio
Rizzi Antonello
Publication venue: 'MDPI AG'
Publication date: 01/01/2020
Field of study

Multiple kernel learning is a paradigm which employs a properly constructed chain of kernel functions able to simultaneously analyse different data or different representations of the same data. In this paper, we propose an hybrid classification system based on a linear combination of multiple kernels defined over multiple dissimilarity spaces. The core of the training procedure is the joint optimisation of kernel weights and representatives selection in the dissimilarity spaces. This equips the system with a two-fold knowledge discovery phase: by analysing the weights, it is possible to check which representations are more suitable for solving the classification problem, whereas the pivotal patterns selected as representatives can give further insights on the modelled system, possibly with the help of field-experts. The proposed classification system is tested on real proteomic data in order to predict proteins' functional role starting from their folded structure: specifically, a set of eight representations are drawn from the graph-based protein folded description. The proposed multiple kernel-based system has also been benchmarked against a clustering-based classification system also able to exploit multiple dissimilarities simultaneously. Computational results show remarkable classification capabilities and the knowledge discovery analysis is in line with current biological knowledge, suggesting the reliability of the proposed system

Multidisciplinary Digital Publishing Institute

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

Archivio della ricerca- Università di Roma La Sapienza