Search CORE

299 research outputs found

Modelling and recognition of protein contact networks by multiple kernel learning and dissimilarity representations

Author: De Santis Enrico
Giuliani Alessandro
Martino Alessio
Rizzi Antonello
Publication venue: 'MDPI AG'
Publication date: 01/01/2020
Field of study

Multiple kernel learning is a paradigm which employs a properly constructed chain of kernel functions able to simultaneously analyse different data or different representations of the same data. In this paper, we propose an hybrid classification system based on a linear combination of multiple kernels defined over multiple dissimilarity spaces. The core of the training procedure is the joint optimisation of kernel weights and representatives selection in the dissimilarity spaces. This equips the system with a two-fold knowledge discovery phase: by analysing the weights, it is possible to check which representations are more suitable for solving the classification problem, whereas the pivotal patterns selected as representatives can give further insights on the modelled system, possibly with the help of field-experts. The proposed classification system is tested on real proteomic data in order to predict proteins' functional role starting from their folded structure: specifically, a set of eight representations are drawn from the graph-based protein folded description. The proposed multiple kernel-based system has also been benchmarked against a clustering-based classification system also able to exploit multiple dissimilarities simultaneously. Computational results show remarkable classification capabilities and the knowledge discovery analysis is in line with current biological knowledge, suggesting the reliability of the proposed system

Archivio della ricerca- Università di Roma La Sapienza

Gromov-Wasserstein Averaging of Kernel and Distance Matrices

Author: Cuturi Marco
Peyré Gabriel
Solomon Justin
Publication venue: HAL CCSD
Publication date: 01/06/2016
Field of study

International audienceThis paper presents a new technique for computing the barycenter of a set of distance or kernel matrices. These matrices, which define the interrelationships between points sampled from individual domains, are not required to have the same size or to be in row-by-row correspondence. We compare these matrices using the softassign criterion , which measures the minimum distortion induced by a probabilistic map from the rows of one similarity matrix to the rows of another; this criterion amounts to a regularized version of the Gromov-Wasserstein (GW) distance between metric-measure spaces. The barycenter is then defined as a Fréchet mean of the input matrices with respect to this criterion, minimizing a weighted sum of softassign values. We provide a fast iterative algorithm for the resulting noncon-vex optimization problem, built upon state-of-the-art tools for regularized optimal transportation. We demonstrate its application to the computation of shape barycenters and to the prediction of energy levels from molecular configurations in quantum chemistry

INRIA a CCSD electronic archive server

Doctor of Philosophy

Author: Raman Parasaran
Publication venue: University of Utah
Publication date: 01/12/2013
Field of study

dissertationWith the tremendous growth of data produced in the recent years, it is impossible to identify patterns or test hypotheses without reducing data size. Data mining is an area of science that extracts useful information from the data by discovering patterns and structures present in the data. In this dissertation, we will largely focus on clustering which is often the first step in any exploratory data mining task, where items that are similar to each other are grouped together, making downstream data analysis robust. Different clustering techniques have different strengths, and the resulting groupings provide different perspectives on the data. Due to the unsupervised nature i.e., the lack of domain experts who can label the data, validation of results is very difficult. While there are measures that compute "goodness" scores for clustering solutions as a whole, there are few methods that validate the assignment of individual data items to their clusters. To address these challenges we focus on developing a framework that can generate, compare, combine, and evaluate different solutions to make more robust and significant statements about the data. In the first part of this dissertation, we present fast and efficient techniques to generate and combine different clustering solutions. We build on some recent ideas on efficient representations of clusters of partitions to develop a well founded metric that is spatially aware to compare clusterings. With the ability to compare clusterings, we describe a heuristic to combine different solutions to produce a single high quality clustering. We also introduce a Markov chain Monte Carlo approach to sample different clusterings from the entire landscape to provide the users with a variety of choices. In the second part of this dissertation, we build certificates for individual data items and study their influence on effective data reduction. We present a geometric approach by defining regions of influence for data items and clusters and use this to develop adaptive sampling techniques to speedup machine learning algorithms. This dissertation is therefore a systematic approach to study the landscape of clusterings in an attempt to provide a better understanding of the data

Closed Likelihood Ratio Testing Procedures to Assess Similarity of Covariance Matrices

Author: Akaike H.
Anderson T. W.
Antonio Punzo
Bagnato L.
Bensmail H.
Biernacki C.
Boente G.
Bozdogan H.
Bozdogan H.
Bretz F.
Campbell N. A.
Cavanaugh J. E.
Celeux G.
Christensen R.
Emerson S.
Fisher R. A.
Flury B. N.
Flury B. N.
Flury B. N.
Flury B. N.
Francesca Greselin
Giancristofaro Arboretti R.
Greselin F.
Hallin M.
Hochberg Y.
Holm S.
Jolicoeur P.
Manly B. F. J.
Marcus R.
R Development Core Team
Rencher A. C.
Schmidt-Nielsen K.
Schwarz G.
Westfall P.
Westfall P. H.
Publication venue: 'Informa UK Limited'
Publication date
Field of study

Intelligent video surveillance

Author: Kangin Dmitry
Publication venue: Lancaster University
Publication date: 01/01/2016
Field of study

In the focus of this thesis are the new and modified algorithms for object detection, recognition and tracking within the context of video analytics. The manual video surveillance has been proven to have low effectiveness and, at the same time, high expense because of the need in manual labour of operators, which are additionally prone to erroneous decisions. Along with increase of the number of surveillance cameras, there is a strong need to push for automatisation of the video analytics. The benefits of this approach can be found both in military and civilian applications. For military applications, it can help in localisation and tracking of objects of interest. For civilian applications, the similar object localisation procedures can make the criminal investigations more effective, extracting the meaningful data from the massive video footage. Recently, the wide accessibility of consumer unmanned aerial vehicles has become a new threat as even the simplest and cheapest airborne vessels can carry some cargo that means they can be upgraded to a serious weapon. Additionally they can be used for spying that imposes a threat to a private life. The autonomous car driving systems are now impossible without applying machine vision methods. The industrial applications require automatic quality control, including non-destructive methods and particularly methods based on the video analysis. All these applications give a strong evidence in a practical need in machine vision algorithms for object detection, tracking and classification and gave a reason for writing this thesis. The contributions to knowledge of the thesis consist of two main parts: video tracking and object detection and recognition, unified by the common idea of its applicability to video analytics problems. The novel algorithms for object detection and tracking, described in this thesis, are unsupervised and have only a small number of parameters. The approach is based on rigid motion segmentation by Bayesian filtering. The Bayesian filter, which was proposed specially for this method and contributes to its novelty, is formulated as a generic approach, and then applied to the video analytics problems. The method is augmented with optional object coordinate estimation using plain two-dimensional terrain assumption which gives a basis for the algorithm usage inside larger sensor data fusion models. The proposed approach for object detection and classification is based on the evolving systems concept and the new Typicality-Eccentricity Data Analytics (TEDA) framework. The methods are capable of solving classical problems of data mining: clustering, classification, and regression. The methods are proposed in a domain-independent way and are capable of addressing shift and drift of the data streams. Examples are given for the clustering and classification of the imagery data. For all the developed algorithms, the experiments have shown sustainable results on the testing data. The practical applications of the proposed algorithms are carefully examined and tested

Lancaster E-Prints