
1,289 research outputs found

Generalised median of a set of correspondences based on the hamming distance.

Author: A Solé
B Zitová
CF Moreno-García
D Conte
F Serratosa
G Navarro
H Bunke
HW Kuhn
J Munkres
L Franek
M Ferrer
P Bille
P Foggia
R Jonker
S Saha
S Vega-Pons
X Jiang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 05/11/2016

A correspondence is a set of mappings that establishes a relation between the elements of two data structures (e.g. sets of points, strings, trees or graphs). If we consider several correspondences between the same two structures, one option to define a representative of them is through the generalised median correspondence. In general, the computation of the generalised median is an NP-complete task. In this paper, we present two methods to calculate the generalised median correspondence of multiple correspondences. The first one obtains the optimal solution in cubic time, but it is restricted to the Hamming distance. The second one obtains a sub-optimal solution through an iterative approach, but places no restrictions on the distance used. We compare both proposals in terms of the distance to the true generalised median and runtime.

Open Access Institutional Repository at Robert Gordon University
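
The Hamming-distance case described in the abstract above can be illustrated with a small sketch. It is not the authors' implementation: it assumes each correspondence is a bijection between two sets of equal size, stored as a list in which position i holds the element that i is mapped to, and the names hamming and generalised_median are hypothetical. Minimising the summed Hamming distance is equivalent to maximising the number of agreeing mappings, which is a linear assignment problem; the sketch solves it by exhaustive search for clarity, whereas a Hungarian-style solver would presumably be what gives the cubic running time mentioned in the abstract.

```python
from itertools import permutations


def hamming(f, g):
    """Count the elements that two correspondences map differently."""
    return sum(1 for a, b in zip(f, g) if a != b)


def generalised_median(correspondences):
    """Return the bijection minimising the summed Hamming distance.

    Assumption: every correspondence is a permutation of range(n).
    """
    n = len(correspondences[0])
    # votes[i][j] = number of input correspondences mapping i to j
    votes = [[0] * n for _ in range(n)]
    for f in correspondences:
        for i, j in enumerate(f):
            votes[i][j] += 1
    # Summed Hamming distance = (inputs * n) - total votes collected,
    # so the median is the assignment with the most votes.
    # Exhaustive search is only feasible for small n.
    best = max(
        permutations(range(n)),
        key=lambda p: sum(votes[i][p[i]] for i in range(n)),
    )
    return list(best)


fs = [[0, 1, 2, 3], [0, 1, 3, 2], [0, 2, 1, 3]]
median = generalised_median(fs)
total = sum(hamming(median, f) for f in fs)
print(median, total)  # [0, 1, 2, 3] 4
```

Here the identity correspondence wins because each of the other two inputs disagrees with it on only two elements, while they disagree with each other on three.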

Modelling the generalised median correspondence through an edit distance.

Author: A Sanfeliu
A Solé-Ribalta
B Zitová
CF Moreno-García
F Serratosa
F Zhou
G Navarro
H Bunke
HW Kuhn
J Munkres
L Franek
M Ferrer
M Vento
P Bille
R Jonker
RA Wagner
TS Caetano
X Cortés
X Gao
X Jiang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 02/08/2018

On the one hand, classification applications modelled by structural pattern recognition, in which elements are represented as strings, trees or graphs, have been used for the last thirty years. In these models, structural distances are modelled as the correspondence (also called matching or labelling) between all the local elements (for instance nodes or edges) that generates the minimum sum of local distances. On the other hand, the generalised median is a well-known concept used to obtain a reliable prototype of data such as strings, graphs and data clusters. Recently, the structural distance and the generalised median have been combined to define a generalised median of matchings to solve some classification and learning applications. In this paper, we present an improvement in which the correspondence edit distance is used instead of the classical Hamming distance. Experimental validation shows that the new approach obtains better results in reasonable runtime compared to other median calculation strategies.

Open Access Institutional Repository at Robert Gordon University

On the role of metaheuristic optimization in bioinformatics

Author: Benito Sergio
Calvet Laura
Juan Angel A
Prados Ferran
Publication venue: 'Royal College of Obstetricians & Gynaecologists (RCOG)'
Publication date: 01/01/2022

Metaheuristic algorithms are employed to solve complex and large-scale optimization problems in many different fields, from transportation and smart cities to finance. This paper discusses how metaheuristic algorithms are being applied to solve different optimization problems in the area of bioinformatics. While the text provides references to many optimization problems in the area, it focuses on those that have attracted more interest from the optimization community. Among the problems analyzed, the paper discusses in more detail molecular docking, protein structure prediction, phylogenetic inference, and different string problems. In addition, references to other relevant optimization problems are also given, including those related to medical imaging or gene selection for classification. From this analysis, the paper generates insights on research opportunities for the Operations Research and Computer Science communities in the field of bioinformatics.

Modelling phylogeny in 16S rRNA gene sequencing datasets using string kernels

Author: Filippi Sarah
Ish-Horowicz Jonathan
Publication date: 16/02/2023

Bacterial community composition is measured using 16S rRNA (ribosomal ribonucleic acid) gene sequencing, for which one of the defining characteristics is the phylogenetic relationships that exist between variables. Here, we demonstrate the utility of modelling these relationships in two statistical tasks (the two-sample test and host trait prediction) by employing string kernels originally proposed in natural language processing. We show via simulation studies that a kernel two-sample test using the proposed kernels, which explicitly model phylogenetic relationships, is powerful while also being sensitive to the phylogenetic scale of the difference between the two populations. We also demonstrate how the proposed kernels can be used with Gaussian processes to improve predictive performance in host trait prediction. Our method is implemented in the Python package StringPhylo (available at github.com/jonathanishhorowicz/stringphylo).

arXiv.org e-Print Archive
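
As a rough illustration of what a string kernel on sequences looks like, the sketch below computes the k-spectrum kernel, one of the simplest members of that family: two sequences are compared through the inner product of their k-mer count vectors. This is an assumption for illustration only. The abstract does not say which string kernels the paper uses, the code does not use the StringPhylo API, and the function names kmer_counts and spectrum_kernel are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every length-k substring (k-mer) of the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s, t, k=3):
    """Inner product of the k-mer count vectors of s and t."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    # A Counter returns 0 for k-mers it has not seen, so only
    # k-mers shared by both sequences contribute to the sum.
    return sum(count * ct[kmer] for kmer, count in cs.items())


s = "ACGTACGT"
t = "ACGTTTGT"
print(spectrum_kernel(s, t))  # 4
print(spectrum_kernel(s, s))  # 10
```

Sequences that share many short substrings receive a high kernel value, which is how a kernel of this kind can reflect sequence similarity without an explicit alignment.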