Search CORE

138 research outputs found

A hybrid evolutionary algorithm based on solution merging for the longest arc-preserving common subsequence problem

Author: Blesa Aguilera Maria Josep
Blum Christian
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017
Field of study

The longest arc-preserving common subsequence problem is an NP-hard combinatorial optimization problem from the field of computational biology. This problem finds applications, in particular, in the comparison of art-annotated ribonucleic acid (RNA) sequences. In this work we propose a simple, hybrid evolutionary algorithm to tackle this problem. The most important feature of this algorithm concerns a crossover operator based on solution merging. In solution merging, two or more solutions to the problem are merged, and an exact technique is used to find the best solution within this union. It is experimentally shown that the proposed algorithm outperforms a heuristic from the literature.Peer ReviewedPostprint (author's final draft

arXiv.org e-Print Archive

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Digital.CSIC

Hybrid techniques based on solving reduced problem instances for a longest common subsequence problem

Author: Blesa Aguilera Maria Josep
Blum Christian
Publication venue: 'Elsevier BV'
Publication date: 01/01/2018
Field of study

Finding the longest common subsequence of a given set of input strings is a relevant problem arising in various practical settings. One of these problems is the so-called longest arc-preserving common subsequence problem. This NP-hard combinatorial optimization problem was introduced for the comparison of arc-annotated ribonucleic acid (RNA) sequences. In this work we present an integer linear programming (ILP) formulation of the problem. As even in the context of rather small problem instances the application of a general purpose ILP solver is not viable due to the size of the model, we study alternative ways based on model reduction in order to take profit from this ILP model. First, we present a heuristic way for reducing the model, with the subsequent application of an ILP solver. Second, we propose the application of an iterative hybrid algorithm that makes use of an ILP solver for generating high quality solutions at each iteration. Experimental results concerning artificial and real problem instances show that the proposed techniques outperform an available technique from the literature.Peer ReviewedPostprint (author's final draft

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Digital.CSIC

Density-based algorithms for active and anytime clustering

Author: Mai Son
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 26/09/2014
Field of study

Data intensive applications like biology, medicine, and neuroscience require effective and efficient data mining technologies. Advanced data acquisition methods produce a constantly increasing volume and complexity. As a consequence, the need of new data mining technologies to deal with complex data has emerged during the last decades. In this thesis, we focus on the data mining task of clustering in which objects are separated in different groups (clusters) such that objects inside a cluster are more similar than objects in different clusters. Particularly, we consider density-based clustering algorithms and their applications in biomedicine. The core idea of the density-based clustering algorithm DBSCAN is that each object within a cluster must have a certain number of other objects inside its neighborhood. Compared with other clustering algorithms, DBSCAN has many attractive benefits, e.g., it can detect clusters with arbitrary shape and is robust to outliers, etc. Thus, DBSCAN has attracted a lot of research interest during the last decades with many extensions and applications. In the first part of this thesis, we aim at developing new algorithms based on the DBSCAN paradigm to deal with the new challenges of complex data, particularly expensive distance measures and incomplete availability of the distance matrix. Like many other clustering algorithms, DBSCAN suffers from poor performance when facing expensive distance measures for complex data. To tackle this problem, we propose a new algorithm based on the DBSCAN paradigm, called Anytime Density-based Clustering (A-DBSCAN), that works in an anytime scheme: in contrast to the original batch scheme of DBSCAN, the algorithm A-DBSCAN first produces a quick approximation of the clustering result and then continuously refines the result during the further run. Experts can interrupt the algorithm, examine the results, and choose between (1) stopping the algorithm at any time whenever they are satisfied with the result to save runtime and (2) continuing the algorithm to achieve better results. Such kind of anytime scheme has been proven in the literature as a very useful technique when dealing with time consuming problems. We also introduced an extended version of A-DBSCAN called A-DBSCAN-XS which is more efficient and effective than A-DBSCAN when dealing with expensive distance measures. Since DBSCAN relies on the cardinality of the neighborhood of objects, it requires the full distance matrix to perform. For complex data, these distances are usually expensive, time consuming or even impossible to acquire due to high cost, high time complexity, noisy and missing data, etc. Motivated by these potential difficulties of acquiring the distances among objects, we propose another approach for DBSCAN, called Active Density-based Clustering (Act-DBSCAN). Given a budget limitation B, Act-DBSCAN is only allowed to use up to B pairwise distances ideally to produce the same result as if it has the entire distance matrix at hand. The general idea of Act-DBSCAN is that it actively selects the most promising pairs of objects to calculate the distances between them and tries to approximate as much as possible the desired clustering result with each distance calculation. This scheme provides an efficient way to reduce the total cost needed to perform the clustering. Thus it limits the potential weakness of DBSCAN when dealing with the distance sparseness problem of complex data. As a fundamental data clustering algorithm, density-based clustering has many applications in diverse fields. In the second part of this thesis, we focus on an application of density-based clustering in neuroscience: the segmentation of the white matter fiber tracts in human brain acquired from Diffusion Tensor Imaging (DTI). We propose a model to evaluate the similarity between two fibers as a combination of structural similarity and connectivity-related similarity of fiber tracts. Various distance measure techniques from fields like time-sequence mining are adapted to calculate the structural similarity of fibers. Density-based clustering is used as the segmentation algorithm. We show how A-DBSCAN and A-DBSCAN-XS are used as novel solutions for the segmentation of massive fiber datasets and provide unique features to assist experts during the fiber segmentation process.Datenintensive Anwendungen wie Biologie, Medizin und Neurowissenschaften erfordern effektive und effiziente Data-Mining-Technologien. Erweiterte Methoden der Datenerfassung erzeugen stetig wachsende Datenmengen und Komplexit\"at. In den letzten Jahrzehnten hat sich daher ein Bedarf an neuen Data-Mining-Technologien f\"ur komplexe Daten ergeben. In dieser Arbeit konzentrieren wir uns auf die Data-Mining-Aufgabe des Clusterings, in der Objekte in verschiedenen Gruppen (Cluster) getrennt werden, so dass Objekte in einem Cluster untereinander viel \"ahnlicher sind als Objekte in verschiedenen Clustern. Insbesondere betrachten wir dichtebasierte Clustering-Algorithmen und ihre Anwendungen in der Biomedizin. Der Kerngedanke des dichtebasierten Clustering-Algorithmus DBSCAN ist, dass jedes Objekt in einem Cluster eine bestimmte Anzahl von anderen Objekten in seiner Nachbarschaft haben muss. Im Vergleich mit anderen Clustering-Algorithmen hat DBSCAN viele attraktive Vorteile, zum Beispiel kann es Cluster mit beliebiger Form erkennen und ist robust gegen\"uber Ausrei{\ss}ern. So hat DBSCAN in den letzten Jahrzehnten gro{\ss}es Forschungsinteresse mit vielen Erweiterungen und Anwendungen auf sich gezogen. Im ersten Teil dieser Arbeit wollen wir auf die Entwicklung neuer Algorithmen eingehen, die auf dem DBSCAN Paradigma basieren, um mit den neuen Herausforderungen der komplexen Daten, insbesondere teurer Abstandsma{\ss}e und unvollst\"andiger Verf\"ugbarkeit der Distanzmatrix umzugehen. Wie viele andere Clustering-Algorithmen leidet DBSCAN an schlechter Per- formanz, wenn es teuren Abstandsma{\ss}en f\"ur komplexe Daten gegen\"uber steht. Um dieses Problem zu l\"osen, schlagen wir einen neuen Algorithmus vor, der auf dem DBSCAN Paradigma basiert, genannt Anytime Density-based Clustering (A-DBSCAN), der mit einem Anytime Schema funktioniert. Im Gegensatz zu dem urspr\"unglichen Schema DBSCAN, erzeugt der Algorithmus A-DBSCAN zuerst eine schnelle Ann\"aherung des Clusterings-Ergebnisses und verfeinert dann kontinuierlich das Ergebnis im weiteren Verlauf. Experten k\"onnen den Algorithmus unterbrechen, die Ergebnisse pr\"ufen und w\"ahlen zwischen (1) Anhalten des Algorithmus zu jeder Zeit, wann immer sie mit dem Ergebnis zufrieden sind, um Laufzeit sparen und (2) Fortsetzen des Algorithmus, um bessere Ergebnisse zu erzielen. Eine solche Art eines "Anytime Schemas" ist in der Literatur als eine sehr n\"utzliche Technik erprobt, wenn zeitaufwendige Problemen anfallen. Wir stellen auch eine erweiterte Version von A-DBSCAN als A-DBSCAN-XS vor, die effizienter und effektiver als A-DBSCAN beim Umgang mit teuren Abstandsma{\ss}en ist. Da DBSCAN auf der Kardinalit\"at der Nachbarschaftsobjekte beruht, ist es notwendig, die volle Distanzmatrix auszurechen. F\"ur komplexe Daten sind diese Distanzen in der Regel teuer, zeitaufwendig oder sogar unm\"oglich zu errechnen, aufgrund der hohen Kosten, einer hohen Zeitkomplexit\"at oder verrauschten und fehlende Daten. Motiviert durch diese m\"oglichen Schwierigkeiten der Berechnung von Entfernungen zwischen Objekten, schlagen wir einen anderen Ansatz f\"ur DBSCAN vor, namentlich Active Density-based Clustering (Act-DBSCAN). Bei einer Budgetbegrenzung B, darf Act-DBSCAN nur bis zu B ideale paarweise Distanzen verwenden, um das gleiche Ergebnis zu produzieren, wie wenn es die gesamte Distanzmatrix zur Hand h\"atte. Die allgemeine Idee von Act-DBSCAN ist, dass es aktiv die erfolgversprechendsten Paare von Objekten w\"ahlt, um die Abst\"ande zwischen ihnen zu berechnen, und versucht, sich so viel wie m\"oglich dem gew\"unschten Clustering mit jeder Abstandsberechnung zu n\"ahern. Dieses Schema bietet eine effiziente M\"oglichkeit, die Gesamtkosten der Durchf\"uhrung des Clusterings zu reduzieren. So schr\"ankt sie die potenzielle Schw\"ache des DBSCAN beim Umgang mit dem Distance Sparseness Problem von komplexen Daten ein. Als fundamentaler Clustering-Algorithmus, hat dichte-basiertes Clustering viele Anwendungen in den unterschiedlichen Bereichen. Im zweiten Teil dieser Arbeit konzentrieren wir uns auf eine Anwendung des dichte-basierten Clusterings in den Neurowissenschaften: Die Segmentierung der wei{\ss}en Substanz bei Faserbahnen im menschlichen Gehirn, die vom Diffusion Tensor Imaging (DTI) erfasst werden. Wir schlagen ein Modell vor, um die \"Ahnlichkeit zwischen zwei Fasern als einer Kombination von struktureller und konnektivit\"atsbezogener \"Ahnlichkeit von Faserbahnen zu beurteilen. Verschiedene Abstandsma{\ss}e aus Bereichen wie dem Time-Sequence Mining werden angepasst, um die strukturelle \"Ahnlichkeit von Fasern zu berechnen. Dichte-basiertes Clustering wird als Segmentierungsalgorithmus verwendet. Wir zeigen, wie A-DBSCAN und A-DBSCAN-XS als neuartige L\"osungen f\"ur die Segmentierung von sehr gro{\ss}en Faserdatens\"atzen verwendet werden, und bieten innovative Funktionen, um Experten w\"ahrend des Fasersegmentierungsprozesses zu unterst\"utzen

A flexible metaheuristic framework for solving rich vehicle routing problems

Author: Vogel Ulrich
Publication venue: Shaker Verlag
Publication date: 01/01/2012
Field of study

Route planning is one of the most studied research topics in the operations research area. While the standard vehicle routing problem (VRP) is the classical problem formulation, additional requirements arising from practical scenarios such as time windows or vehicle compartments are covered in a wide range of so-called rich VRPs. Many solution algorithms for various VRP variants have been developed over time as well, especially within the class of so-called metaheuristics. In practice, routing software must be tailored to the business rules and planning problems of a specific company to provide valuable decision support. This also concerns the embedded solution methods of such decision support systems. Yet, publications dealing with flexibility and customization of VRP heuristics are rare. To fill this gap this thesis describes the design of a flexible framework to facilitate and accelerate the development of custom metaheuristics for the solution of a broad range of rich VRPs. The first part of the thesis provides background information to the reader on the field of vehicle routing problems and on metaheuristic solution methods - the most common and widely-used solution methods to solve VRPs. Specifically, emphasis is put on methods based on local search (for intensification of the search) and large neighborhood search (for diversification of the search), which are combined to hybrid methods and which are the foundation of the proposed framework. Then, the main part elaborates on the concepts and the design of the metaheuristic VRP framework. The framework fulfills requirements of flexibility, simplicity, accuracy, and speed, enforcing the structuring and standardization of the development process and enabling the reuse of code. Essentially, it provides a library of well-known and accepted heuristics for the standard VRP together with a set of mechanisms to adapt these heuristics to specific VRPs. Heuristics and adaptation mechanisms such as templates for user-definable checking functions are explained on a pseudocode level first, and the most relevant classes of a reference implementation using the Microsoft .NET framework are presented afterwards. Finally, the third part of the thesis demonstrates the use of the framework for developing problem-specific solution methods by exemplifying specific customizations for five rich VRPs with diverse characteristics, namely the VRP with time windows, the VRP with compartments, the split delivery VRP, the periodic VRP, and the truck and trailer routing problem. These adaptations refer to data structures and neighborhood search methods and can serve as a source of inspiration to the reader when designing algorithms for new, so far unstudied VRPs. Computational results are presented to show the effectiveness and efficiency of the proposed framework and methods, which are competitive with current state-of-the-art solvers of the literature. Special attention is given to the overall robustness of heuristics, which is an important aspect for practical application

Kölner UniversitätsPublikationsServer

35th Symposium on Theoretical Aspects of Computer Science: STACS 2018, February 28-March 3, 2018, Caen, France

Author: STACS
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
Publication date: 01/02/2018
Field of study

Digitale Bibliothek Thüringen

Density-based algorithms for active and anytime clustering

Author: Mai Son
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 26/09/2014
Field of study

Digitale Hochschulschriften der LMU

Approches générales de résolution pour les problèmes multi-attributs de tournées de véhicules et confection d'horaires

Author: Vidal Thibaut
Publication venue
Publication date: 01/03/2013
Field of study

Thèse réalisée en cotutelle entre l'Université de Montréal et l'Université de Technologie de TroyesLe problème de tournées de véhicules (VRP) implique de planifier les itinéraires d'une flotte de véhicules afin de desservir un ensemble de clients à moindre coût. Ce problème d'optimisation combinatoire NP-difficile apparait dans de nombreux domaines d'application, notamment en logistique, télécommunications, robotique ou gestion de crise dans des contextes militaires et humanitaires. Ces applications amènent différents contraintes, objectifs et décisions supplémentaires ; des "attributs" qui viennent compléter les formulations classiques du problème. Les nombreux VRP Multi-Attributs (MAVRP) qui s'ensuivent sont le support d'une littérature considérable, mais qui manque de méthodes généralistes capables de traiter efficacement un éventail significatif de variantes. Par ailleurs, la résolution de problèmes "riches", combinant de nombreux attributs, pose d'importantes difficultés méthodologiques. Cette thèse contribue à relever ces défis par le biais d'analyses structurelles des problèmes, de développements de stratégies métaheuristiques, et de méthodes unifiées. Nous présentons tout d'abord une étude transversale des concepts à succès de 64 méta-heuristiques pour 15 MAVRP afin d'en cerner les "stratégies gagnantes". Puis, nous analysons les problèmes et algorithmes d'ajustement d'horaires en présence d'une séquence de tâches fixée, appelés problèmes de "timing". Ces méthodes, développées indépendamment dans différents domaines de recherche liés au transport, ordonnancement, allocation de ressource et même régression isotonique, sont unifiés dans une revue multidisciplinaire. Un algorithme génétique hybride efficace est ensuite proposé, combinant l'exploration large des méthodes évolutionnaires, les capacités d'amélioration agressive des métaheuristiques à voisinage, et une évaluation bi-critère des solutions considérant coût et contribution à la diversité de la population. Les meilleures solutions connues de la littérature sont retrouvées ou améliorées pour le VRP classique ainsi que des variantes avec multiples dépôts et périodes. La méthode est étendue aux VRP avec contraintes de fenêtres de temps, durée de route, et horaires de conducteurs. Ces applications mettent en jeu de nouvelles méthodes d'évaluation efficaces de contraintes temporelles relaxées, des phases de décomposition, et des recherches arborescentes pour l'insertion des pauses des conducteurs. Un algorithme de gestion implicite du placement des dépôts au cours de recherches locales, par programmation dynamique, est aussi proposé. Des études expérimentales approfondies démontrent la contribution notable des nouvelles stratégies au sein de plusieurs cadres méta-heuristiques. Afin de traiter la variété des attributs, un cadre de résolution heuristique modulaire est présenté ainsi qu'un algorithme génétique hybride unifié (UHGS). Les attributs sont gérés par des composants élémentaires adaptatifs. Des expérimentations sur 26 variantes du VRP et 39 groupes d'instances démontrent la performance remarquable de UHGS qui, avec une unique implémentation et paramétrage, égalise ou surpasse les nombreux algorithmes dédiés, issus de plus de 180 articles, révélant ainsi que la généralité ne s'obtient pas forcément aux dépends de l'efficacité pour cette classe de problèmes. Enfin, pour traiter les problèmes riches, UHGS est étendu au sein d'un cadre de résolution parallèle coopératif à base de décomposition, d'intégration de solutions partielles, et de recherche guidée. L'ensemble de ces travaux permet de jeter un nouveau regard sur les MAVRP et les problèmes de timing, leur résolution par des méthodes méta-heuristiques, ainsi que les méthodes généralistes pour l'optimisation combinatoire.The Vehicle Routing Problem (VRP) involves designing least cost delivery routes to service a geographically-dispersed set of customers while taking into account vehicle-capacity constraints. This NP-hard combinatorial optimization problem is linked with multiple applications in logistics, telecommunications, robotics, crisis management in military and humanitarian frameworks, among others. Practical routing applications are usually quite distinct from the academic cases, encompassing additional sets of specific constraints, objectives and decisions which breed further new problem variants. The resulting "Multi-Attribute" Vehicle Routing Problems (MAVRP) are the support of a vast literature which, however, lacks unified methods capable of addressing multiple MAVRP. In addition, some "rich" VRPs, i.e. those that involve several attributes, may be difficult to address because of the wide array of combined and possibly antagonistic decisions they require. This thesis contributes to address these challenges by means of problem structure analysis, new metaheuristics and unified method developments. The "winning strategies" of 64 state-of-the-art algorithms for 15 different MAVRP are scrutinized in a unifying review. Another analysis is targeted on "timing" problems and algorithms for adjusting the execution dates of a given sequence of tasks. Such methods, independently studied in different research domains related to routing, scheduling, resource allocation, and even isotonic regression are here surveyed in a multidisciplinary review. A Hybrid Genetic Search with Advanced Diversity Control (HGSADC) is then introduced, which combines the exploration breadth of population-based evolutionary search, the aggressive-improvement capabilities of neighborhood-based metaheuristics, and a bi-criteria evaluation of solutions based on cost and diversity measures. Results of remarkable quality are achieved on classic benchmark instances of the capacitated VRP, the multi-depot VRP, and the periodic VRP. Further extensions of the method to VRP variants with constraints on time windows, limited route duration, and truck drivers' statutory pauses are also proposed. New route and neighborhood evaluation procedures are introduced to manage penalized infeasible solutions w.r.t. to time-window and duration constraints. Tree-search procedures are used for drivers' rest scheduling, as well as advanced search limitation strategies, memories and decomposition phases. A dynamic programming-based neighborhood search is introduced to optimally select the depot, vehicle type, and first customer visited in the route during local searches. The notable contribution of these new methodological elements is assessed within two different metaheuristic frameworks. To further advance general-purpose MAVRP methods, we introduce a new component-based heuristic resolution framework and a Unified Hybrid Genetic Search (UHGS), which relies on modular self-adaptive components for addressing problem specifics. Computational experiments demonstrate the groundbreaking performance of UHGS. With a single implementation, unique parameter setting and termination criterion, this algorithm matches or outperforms all current problem-tailored methods from more than 180 articles, on 26 vehicle routing variants and 39 benchmark sets. To address rich problems, UHGS was included in a new parallel cooperative solution framework called "Integrative Cooperative Search (ICS)", based on problem decompositions, partial solutions integration, and global search guidance. This compendium of results provides a novel view on a wide range of MAVRP and timing problems, on efficient heuristic searches, and on general-purpose solution methods for combinatorial optimization problems

Dépôt Institutionnel Numérique

28th Annual Symposium on Combinatorial Pattern Matching : CPM 2017, July 4-6, 2017, Warsaw, Poland

Author
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 01/07/2017
Field of study

Peer reviewe

Helsingin yliopiston digitaalinen arkisto

Subject index volumes 1–92

Author
Publication venue: Published by Elsevier B.V.
Publication date
Field of study

Elsevier - Publisher Connector

Subseries Join and Compression of Time Series Data Based on Non-uniform Segmentation

Author: Lin Yi
Publication venue: 'University of Waterloo'
Publication date: 01/01/2008
Field of study

A time series is composed of a sequence of data items that are measured at uniform intervals. Many application areas generate or manipulate time series, including finance, medicine, digital audio, and motion capture. Efficiently searching a large time series database is still a challenging problem, especially when partial or subseries matches are needed. This thesis proposes a new denition of subseries join, a symmetric generalization of subseries matching, which finds similar subseries in two or more time series datasets. A solution is proposed to compute the subseries join based on a hierarchical feature representation. This hierarchical feature representation is generated by an anisotropic diffusion scale-space analysis and a non-uniform segmentation method. Each segment is represented by a minimal polynomial envelope in a reduced-dimensionality space. Based on the hierarchical feature representation, all features in a dataset are indexed in an R-tree, and candidate matching features of two datasets are found by an R-tree join operation. Given candidate matching features, a dynamic programming algorithm is developed to compute the final subseries join. To improve storage efficiency, a hierarchical compression scheme is proposed to compress features. The minimal polynomial envelope representation is transformed to a Bezier spline envelope representation. The control points of each Bezier spline are then hierarchically differenced and an arithmetic coding is used to compress these differences. To empirically evaluate their effectiveness, the proposed subseries join and compression techniques are tested on various publicly available datasets. A large motion capture database is also used to verify the techniques in a real-world application. The experiments show that the proposed subseries join technique can better tolerate noise and local scaling than previous work, and the proposed compression technique can also achieve about 85% higher compression rates than previous work with the same distortion error

University of Waterloo's Institutional Repository