1,289 research outputs found
Generalised median of a set of correspondences based on the Hamming distance
A correspondence is a set of mappings that establishes a relation between the elements of two data structures (e.g. sets of points, strings, trees or graphs). Given several correspondences between the same two structures, one way to define a representative of them is the generalised median correspondence. In general, the computation of the generalised median is an NP-complete task. In this paper, we present two methods to calculate the generalised median correspondence of multiple correspondences. The first one obtains the optimal solution in cubic time, but it is restricted to the Hamming distance. The second one obtains a sub-optimal solution through an iterative approach, but places no restriction on the distance used. We compare both proposals in terms of the distance to the true generalised median and runtime.
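As a rough illustration of why the Hamming case is tractable, the sketch below assumes each correspondence is a bijection stored as a tuple `f`, where `f[i]` is the element matched to element `i`. Under that assumption, the sum of Hamming distances from a candidate to the m input correspondences equals n·m minus the number of "votes" the candidate collects, so minimising the sum is a linear assignment problem over a vote-count matrix. The function names are hypothetical and the exhaustive search is only for tiny inputs; it is not the authors' implementation. A Hungarian-type assignment solver on the same matrix would give the cubic running time mentioned in the abstract.

```python
from itertools import permutations


def hamming(f, g):
    """Number of elements that correspondences f and g map differently."""
    return sum(a != b for a, b in zip(f, g))


def vote_matrix(correspondences):
    """votes[i][j] = how many input correspondences map element i to j."""
    n = len(correspondences[0])
    votes = [[0] * n for _ in range(n)]
    for f in correspondences:
        for i, j in enumerate(f):
            votes[i][j] += 1
    return votes


def median_correspondence(correspondences):
    """Bijection that collects the most votes (illustrative brute force).

    Assumption: inputs are bijections on range(n). The search below is
    exponential in n; an assignment solver would replace it in practice.
    """
    votes = vote_matrix(correspondences)
    n = len(votes)
    return max(
        permutations(range(n)),
        key=lambda p: sum(votes[i][p[i]] for i in range(n)),
    )


def sum_of_distances(candidate, correspondences):
    """Objective minimised by the generalised median."""
    return sum(hamming(candidate, f) for f in correspondences)


if __name__ == "__main__":
    inputs = [(0, 1, 2, 3), (0, 1, 3, 2), (0, 2, 1, 3)]
    median = median_correspondence(inputs)
    print(median, sum_of_distances(median, inputs))
```

The brute-force step is kept only so the example depends on nothing beyond the standard library.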
Modelling the generalised median correspondence through an edit distance
On the one hand, classification applications modelled by structural pattern recognition, in which elements are represented as strings, trees or graphs, have been used for the last thirty years. In these models, structural distances are modelled as the correspondence (also called matching or labelling) between all the local elements (for instance nodes or edges) that generates the minimum sum of local distances. On the other hand, the generalised median is a well-known concept used to obtain a reliable prototype of data such as strings, graphs and data clusters. Recently, the structural distance and the generalised median have been combined to define a generalised median of matchings to solve some classification and learning applications. In this paper, we present an improvement in which the Correspondence edit distance is used instead of the classical Hamming distance. Experimental validation shows that the new approach obtains better results in reasonable runtime compared to other median calculation strategies.
On the role of metaheuristic optimization in bioinformatics
Metaheuristic algorithms are employed to solve complex and large-scale optimization problems in many different fields, from transportation and smart cities to finance. This paper discusses how metaheuristic algorithms are being applied to solve different optimization problems in the area of bioinformatics. While the text provides references to many optimization problems in the area, it focuses on those that have attracted more interest from the optimization community. Among the problems analyzed, the paper discusses in more detail molecular docking, protein structure prediction, phylogenetic inference, and different string problems. In addition, references to other relevant optimization problems are also given, including those related to medical imaging or gene selection for classification. From this analysis, the paper generates insights on research opportunities for the Operations Research and Computer Science communities in the field of bioinformatics.
Modelling phylogeny in 16S rRNA gene sequencing datasets using string kernels
Bacterial community composition is measured using 16S rRNA (ribosomal ribonucleic acid) gene sequencing, for which one of the defining characteristics is the phylogenetic relationships that exist between variables. Here, we demonstrate the utility of modelling these relationships in two statistical tasks (the two-sample test and host trait prediction) by employing string kernels originally proposed in natural language processing. We show via simulation studies that a kernel two-sample test using the proposed kernels, which explicitly model phylogenetic relationships, is powerful while also being sensitive to the phylogenetic scale of the difference between the two populations. We also demonstrate how the proposed kernels can be used with Gaussian processes to improve predictive performance in host trait prediction. Our method is implemented in the Python package StringPhylo (available at github.com/jonathanishhorowicz/stringphylo).
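To make the idea of a kernel two-sample test with a string kernel concrete, here is a minimal sketch under simplifying assumptions. It uses a k-mer spectrum kernel (shared substring counts) and the biased estimate of the squared maximum mean discrepancy (MMD) as the test statistic. It treats each observation as a single sequence, which is a simplification of the 16S setting, and none of the function names come from StringPhylo; they are hypothetical. In practice the statistic would be compared against a null distribution, for example one obtained by permuting group labels.

```python
from collections import Counter


def kmer_counts(seq, k=3):
    """Count every length-k substring of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s, t, k=3):
    """Inner product of the k-mer count vectors of s and t."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    return sum(count * ct[kmer] for kmer, count in cs.items() if kmer in ct)


def mean_kernel(xs, ys, kernel):
    """Average kernel value over all pairs (x, y)."""
    return sum(kernel(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def mmd2_biased(xs, ys, kernel=spectrum_kernel):
    """Biased estimate of squared MMD between samples xs and ys."""
    return (
        mean_kernel(xs, xs, kernel)
        + mean_kernel(ys, ys, kernel)
        - 2 * mean_kernel(xs, ys, kernel)
    )


if __name__ == "__main__":
    # Made-up toy sequences, for illustration only.
    group_a = ["ACGTACGT", "ACGTTCGT"]
    group_b = ["GGGGCCCC", "GGGCCCCC"]
    print(mmd2_biased(group_a, group_a))  # identical samples
    print(mmd2_biased(group_a, group_b))  # samples with no shared 3-mers
```

The statistic is zero when both samples are identical and grows as the two groups share fewer k-mers; the choice of k controls how fine-grained the sequence comparison is.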