    Large neighborhood search for the most strings with few bad columns problem

    In this work, we consider the following NP-hard combinatorial optimization problem from computational biology. Given a set of input strings of equal length, the goal is to identify a maximum-cardinality subset of strings that disagree in at most a pre-defined number of positions (columns). First, we introduce an integer linear programming model for this problem. Second, two variants of a rather simple greedy strategy are proposed. Finally, a large neighborhood search algorithm is presented. A comprehensive experimental comparison among the proposed techniques shows, first, that large neighborhood search generally outperforms both greedy strategies. Second, while large neighborhood search proves competitive with the stand-alone application of CPLEX on small- and medium-sized problem instances, it outperforms CPLEX on larger instances. Peer reviewed. Postprint (author's final draft).
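    The feasibility notion behind this problem can be made concrete in a few lines. A minimal sketch, assuming the usual reading that a column is "bad" for a subset when the chosen strings do not all carry the same character there, and that a subset is feasible when it has at most k bad columns; the function names are hypothetical and not taken from the paper.

```python
from typing import Sequence


def bad_columns(strings: Sequence[str]) -> int:
    """Count columns in which the given equal-length strings do not all agree."""
    if not strings:
        return 0
    length = len(strings[0])
    if any(len(s) != length for s in strings):
        raise ValueError("all strings must have the same length")
    # A column is bad if it contains more than one distinct character.
    return sum(1 for j in range(length) if len({s[j] for s in strings}) > 1)


def is_feasible(strings: Sequence[str], chosen: Sequence[int], k: int) -> bool:
    """Hypothetical check: the chosen subset induces at most k bad columns."""
    return bad_columns([strings[i] for i in chosen]) <= k


if __name__ == "__main__":
    data = ["ACGT", "ACGA", "TCGA"]
    print(bad_columns(data))             # columns 0 and 3 disagree
    print(is_feasible(data, [0, 1], 1))  # only column 3 disagrees
```

    An ILP model, a greedy heuristic, or a large neighborhood search would all maximize the number of chosen strings subject to this same constraint.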

    Design and analysis of algorithms for similarity search based on intrinsic dimension

    One of the most fundamental operations employed in data mining tasks such as classification, cluster analysis, and anomaly detection is similarity search. It has been used in numerous application fields, such as multimedia, information retrieval, recommender systems, and pattern recognition. Specifically, a similarity query aims to retrieve from the database the objects most similar to a query object, where the underlying similarity measure is usually expressed as a distance function. The cost of processing similarity queries has typically been assessed in terms of the representational dimension of the data involved, that is, the number of features used to represent individual data objects. In general, a high representational dimension results in a significant increase in the processing cost of similarity queries. This relation is often attributed to an amalgamation of phenomena collectively referred to as the curse of dimensionality. However, the observed effects of dimensionality in practice may not be as severe as expected. This has led to the development of models that quantify the complexity of data in terms of some measure of intrinsic dimensionality. The generalized expansion dimension (GED) is one such model: it estimates the intrinsic dimension in the vicinity of a query point q by observing the ranks and distances of pairs of neighbors with respect to q. This dissertation is mainly concerned with the design and analysis of search algorithms based on the GED model. In particular, three variants of the similarity search problem are considered: adaptive similarity search, flexible aggregate similarity search, and subspace similarity search. The good practical performance of the proposed algorithms demonstrates the effectiveness of a dimensionality-driven design of search algorithms.
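    The idea of estimating dimension from the ranks and distances of two neighbors can be sketched directly. A minimal sketch, assuming the expansion-dimension form: if the number of points within distance r of q grows like r^d, then two neighbors at ranks k1 < k2 and distances r1 < r2 give d = ln(k2/k1) / ln(r2/r1). The function names are hypothetical, and the dissertation's algorithms use such estimates in more elaborate ways.

```python
import math
from typing import Sequence


def ged(k1: int, r1: float, k2: int, r2: float) -> float:
    """Expansion-dimension estimate from two (rank, distance) pairs around q.

    Assumes counts grow like r**d, so k2 / k1 = (r2 / r1) ** d and
    therefore d = ln(k2 / k1) / ln(r2 / r1).
    """
    if not (0 < k1 < k2 and 0 < r1 < r2):
        raise ValueError("require 0 < k1 < k2 and 0 < r1 < r2")
    return math.log(k2 / k1) / math.log(r2 / r1)


def ged_from_distances(distances: Sequence[float], k1: int, k2: int) -> float:
    """Estimate using the k1-th and k2-th nearest neighbours of q (1-based ranks)."""
    d = sorted(distances)
    return ged(k1, d[k1 - 1], k2, d[k2 - 1])


if __name__ == "__main__":
    # Points evenly spaced on a line: the estimate should be close to 1.
    line = [float(i) for i in range(1, 101)]
    print(ged_from_distances(line, 10, 40))
    # Doubling the radius quadruples the count, as in a 2-D region.
    print(ged(10, 1.0, 40, 2.0))
```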

    Similarity of business process models: metrics and evaluation

    It is common for large and complex organizations to maintain repositories of business process models in order to document and continuously improve their operations. Given such a repository, this paper deals with the problem of retrieving the process models in the repository that most closely resemble a given process model or fragment thereof. The paper presents three similarity metrics that can be used to answer such queries: (i) label matching similarity, which compares the labels attached to process model elements; (ii) structural similarity, which compares element labels as well as the topology of process models; and (iii) behavioral similarity, which compares element labels as well as causal relations captured in the process model. These similarity metrics are experimentally evaluated in terms of precision and recall, and in terms of the correlation of the metrics with human judgement. The experimental results show that all three metrics yield comparable results, with structural similarity slightly outperforming the other two. Also, all three metrics outperform traditional search engines when it comes to searching a repository for similar business process models.
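    The first metric, label matching similarity, can be illustrated with a small sketch. It assumes that label similarity is one minus the normalized Levenshtein edit distance, that elements are paired one-to-one greedily above a hypothetical threshold of 0.5, and that the score is normalized as 2 * (sum of matched similarities) / (|N1| + |N2|). These choices are illustrative assumptions; the paper's exact definitions and matching procedure may differ.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def label_similarity(a: str, b: str) -> float:
    """1 - normalized edit distance, case-insensitive (assumed definition)."""
    a, b = a.lower(), b.lower()
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def label_matching_similarity(labels1: List[str], labels2: List[str],
                              threshold: float = 0.5) -> float:
    """Greedy one-to-one matching of element labels, best pairs first."""
    pairs = sorted(((label_similarity(x, y), i, j)
                    for i, x in enumerate(labels1)
                    for j, y in enumerate(labels2)), reverse=True)
    used1, used2, total = set(), set(), 0.0
    for sim, i, j in pairs:
        if sim < threshold:
            break  # remaining pairs are even less similar
        if i in used1 or j in used2:
            continue
        used1.add(i)
        used2.add(j)
        total += sim
    n = len(labels1) + len(labels2)
    return 2 * total / n if n else 1.0
```

    Greedy matching keeps the sketch short; an optimal assignment would never score lower, at a higher computational cost.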

    On solving the most strings with few bad columns problem: An ILP model and heuristics

    The most strings with few bad columns problem is an NP-hard combinatorial optimization problem from the field of bioinformatics. This paper presents the first integer linear programming model for this problem. Moreover, a simple greedy heuristic and a more sophisticated extension of it, namely a greedy-based pilot method, are proposed. Experiments show that, as expected, the greedy-based pilot method improves on the greedy strategy. For small and medium-sized problem instances, the best results were obtained by solving the integer linear programming model with CPLEX, while the greedy-based pilot method scales much better to large problem instances. Peer reviewed. Postprint (author's final draft).
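    A greedy construction for this problem might look as follows. This is a hypothetical sketch, not the paper's heuristic: it assumes the greedy criterion is "add the string that keeps the number of bad columns lowest, breaking ties by smallest index, while staying within k". A pilot method would evaluate each candidate by running such a greedy to completion from it before committing.

```python
from typing import List, Sequence


def bad_columns(strings: Sequence[str]) -> int:
    """Count columns in which the given equal-length strings do not all agree."""
    if not strings:
        return 0
    return sum(1 for j in range(len(strings[0]))
               if len({s[j] for s in strings}) > 1)


def greedy_msfbc(strings: Sequence[str], k: int) -> List[int]:
    """Hypothetical greedy: repeatedly add the string causing the fewest bad columns."""
    chosen: List[int] = []
    remaining = set(range(len(strings)))
    while remaining:
        best, best_cost = None, None
        for i in sorted(remaining):  # sorted for deterministic tie-breaking
            cost = bad_columns([strings[j] for j in chosen] + [strings[i]])
            if cost <= k and (best_cost is None or cost < best_cost):
                best, best_cost = i, cost
        if best is None:
            break  # no string can be added without exceeding k bad columns
        chosen.append(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    print(greedy_msfbc(["ACGT", "ACGA", "TCGA", "ACGT"], k=1))
```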

    Outsourced Similarity Search on Metric Data Assets

    Development of hybrid metaheuristics based on instance reduction for combinatorial optimization problems

    113 pp. This thesis describes the development of hybrid metaheuristic algorithms based on the reduction of problem instances, aimed at solving combinatorial optimization problems. The original motivation of the research was to make effective use, through the reduction of problem instances, of integer linear programming (ILP) models on problems whose size does not allow this exact technique to be applied directly. In this context, among other developments, the thesis presents the Construct, Merge, Solve & Adapt (CMSA) framework for solving combinatorial optimization problems in general, which was later adapted to improve the performance of other metaheuristics without the use of ILP models. The presented algorithms achieved results that match or surpass the state of the art on the Minimum Common String Partition (MCSP), Minimum Covering Arborescence (MCA), and Weighted Independent Domination (WID) problems.
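    The four CMSA phases can be sketched on a toy problem. The sketch assumes minimum vertex cover as the example problem (not one of the problems studied in the thesis), a randomized greedy as the construction step, and brute-force enumeration in place of an ILP solver for the reduced sub-instance, which is only viable for tiny graphs. Parameter names such as n_a and age_max are illustrative.

```python
import itertools
import random
from typing import Dict, Iterable, List, Optional, Set, Tuple

Edge = Tuple[int, int]


def is_cover(edges: List[Edge], cover: Set[int]) -> bool:
    """True if every edge has at least one endpoint in `cover`."""
    return all(u in cover or v in cover for u, v in edges)


def construct(edges: List[Edge], rng: random.Random) -> Set[int]:
    """Construct: randomized greedy cover that prefers high-degree endpoints."""
    uncovered = list(edges)
    cover: Set[int] = set()
    while uncovered:
        u, v = rng.choice(uncovered)
        degree = {x: sum(x in e for e in uncovered) for x in (u, v)}
        pick = max((u, v), key=lambda x: (degree[x], rng.random()))
        cover.add(pick)
        uncovered = [e for e in uncovered if pick not in e]
    return cover


def solve_subinstance(edges: List[Edge], allowed: Iterable[int]) -> Set[int]:
    """Solve: smallest cover using only `allowed` vertices (brute force stands in for ILP)."""
    candidates = sorted(allowed)
    for size in range(len(candidates) + 1):
        for subset in itertools.combinations(candidates, size):
            if is_cover(edges, set(subset)):
                return set(subset)
    raise ValueError("sub-instance has no feasible cover")


def cmsa(edges: List[Edge], n_a: int = 5, age_max: int = 3,
         iterations: int = 20, seed: int = 0) -> Set[int]:
    rng = random.Random(seed)
    age: Dict[int, int] = {}  # vertices currently in the reduced sub-instance
    best: Optional[Set[int]] = None
    for _ in range(iterations):
        # Construct + Merge: new components enter the sub-instance with age 0.
        for _ in range(n_a):
            for v in construct(edges, rng):
                age.setdefault(v, 0)
        # Solve: exact solution restricted to the merged components.
        sol = solve_subinstance(edges, age.keys())
        if best is None or len(sol) < len(best):
            best = sol
        # Adapt: reset ages of used components, age the rest, drop old ones.
        for v in list(age):
            if v in sol:
                age[v] = 0
            else:
                age[v] += 1
                if age[v] > age_max:
                    del age[v]
    return best if best is not None else set()
```

    Because every iteration merges at least one complete constructed cover, the restricted sub-instance always has a feasible solution, so the solve step cannot fail on a valid input graph.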

    Contributions to solving NP-hard combinatorial optimization problems

    This thesis deals with efficient algorithms for solving NP-hard combinatorial optimization problems and makes two contributions. The first contribution is a new hybrid multi-objective algorithm that combines a genetic algorithm with a search operator based on particle swarm optimization. The aim of this hybridization is to overcome the slow convergence of multi-objective genetic algorithms when solving difficult problems with more than two objectives. In the proposed hybrid scheme, a Pareto-based multi-objective genetic algorithm periodically applies a particle swarm optimization algorithm to optimize a scalar fitness function over an archive population. Two variants of this hybrid algorithm are proposed and adapted to the multi-objective knapsack problem. The experimental results show that the hybrid algorithms outperform the standard algorithms. The second contribution concerns the improvement of a local search heuristic called PALS (Problem Aware Local Search), which is specific to the DNA fragment assembly problem, an NP-hard combinatorial optimization problem in sequence bioinformatics. Two modifications to PALS are proposed. The first modification avoids premature convergence toward local optima. The second modification significantly reduces computation time while preserving the accuracy of the results. In experiments on the datasets available in the literature, the new PALS variants prove highly competitive with existing variants and with other assembly algorithms.
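    The Pareto archive and the scalar fitness mentioned in the first contribution can be illustrated on a small knapsack instance. A minimal sketch, assuming a single weight vector and capacity shared by all objectives (a simplification of the multi-knapsack formulations common in the literature) and a weighted sum as the scalar fitness. The thesis does not specify these choices, and the GA and PSO operators themselves are omitted.

```python
import itertools
from typing import List, Optional, Sequence, Tuple

Objectives = Tuple[int, ...]


def evaluate(x: Sequence[int], profits: Sequence[Sequence[int]],
             weights: Sequence[int], capacity: int) -> Optional[Objectives]:
    """Objective vector of 0/1 selection x, or None if it exceeds the capacity."""
    if sum(w for w, xi in zip(weights, x) if xi) > capacity:
        return None
    return tuple(sum(p for p, xi in zip(obj, x) if xi) for obj in profits)


def dominates(a: Objectives, b: Objectives) -> bool:
    """Pareto dominance for maximization: no worse everywhere, better somewhere."""
    return (all(ai >= bi for ai, bi in zip(a, b))
            and any(ai > bi for ai, bi in zip(a, b)))


def update_archive(archive: List[Objectives], objs: Objectives) -> List[Objectives]:
    """Insert objs if it is non-dominated and new; drop members it dominates."""
    if any(a == objs or dominates(a, objs) for a in archive):
        return archive
    return [a for a in archive if not dominates(objs, a)] + [objs]


def weighted_sum(objs: Objectives, lam: Sequence[float]) -> float:
    """Scalar fitness (assumed form) that a PSO step could maximize."""
    return sum(l * o for l, o in zip(lam, objs))


def pareto_front_bruteforce(profits, weights, capacity) -> List[Objectives]:
    """Exhaustive front for tiny instances, useful to check the archive logic."""
    archive: List[Objectives] = []
    for x in itertools.product((0, 1), repeat=len(weights)):
        objs = evaluate(x, profits, weights, capacity)
        if objs is not None:
            archive = update_archive(archive, objs)
    return archive


if __name__ == "__main__":
    front = pareto_front_bruteforce([[3, 1, 2], [1, 3, 2]], [2, 2, 2], 4)
    print(sorted(front))
    print(max(front, key=lambda o: weighted_sum(o, (0.7, 0.3))))
```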

    Efficient Algorithms for Similarity Search in Axis-Aligned Subspaces
