8 research outputs found

    Bidirectional string anchors: A new string sampling mechanism

    The minimizers sampling mechanism is a popular mechanism for string sampling introduced independently by Schleimer et al. [SIGMOD 2003] and by Roberts et al. [Bioinf. 2004]. Given two positive integers w and k, it selects the lexicographically smallest length-k substring in every fragment of w consecutive length-k substrings (in every sliding window of length w+k-1). Minimizers samples are approximately uniform, locally consistent, and computable in linear time. Although they do not have good worst-case guarantees on their size, they are often small in practice. They have thus been successfully employed in several string processing applications. Two main disadvantages of minimizers sampling mechanisms are: first, they also do not have good guarantees on the expected size of their samples for every combination of w and k; and, second, indexes that are constructed over their samples do not have good worst-case guarantees for on-line pattern searches. To alleviate these disadvantages, we introduce bidirectional string anchors (bd-anchors), a new string sampling mechanism. Given a positive integer ℓ, our mechanism selects the lexicographically smallest rotation in every length-ℓ fragment (in every sliding window of length ℓ). We show that bd-anchors samples are also approximately uniform, locally consistent, and computable in linear time. In addition, our experimen
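
    The two sampling rules described in this abstract can be illustrated with a naive sketch. It is not the authors' linear-time algorithm: it recomputes every window from scratch, and the function names, the parameter name `ell` for the bd-anchor window length, and the leftmost tie-breaking rule are all assumptions made for illustration.

```python
def minimizers(s, w, k):
    """Naive minimizers: for every run of w consecutive length-k substrings,
    pick the start position of the lexicographically smallest one.
    Ties are broken by the leftmost position (an assumption)."""
    kmers = [s[i:i + k] for i in range(len(s) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        picked.add(min(window, key=lambda i: (kmers[i], i)))
    return sorted(picked)


def bd_anchors(s, ell):
    """Naive bd-anchors: for every length-ell window, pick the position at
    which the lexicographically smallest rotation of the window starts.
    Ties are broken by the leftmost position (an assumption)."""
    picked = set()
    for start in range(len(s) - ell + 1):
        frag = s[start:start + ell]
        # Rotation j of frag is frag[j:] + frag[:j].
        best = min(range(ell), key=lambda j: (frag[j:] + frag[:j], j))
        picked.add(start + best)
    return sorted(picked)
```

    Both functions return 0-based positions in `s`. Each runs in time quadratic or worse in the window size, which is acceptable only for small illustrative inputs.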

    Boosting Local Search for the Maximum Independent Set Problem

    An independent set of a graph G = (V, E) with vertices V and edges E is a subset S ⊆ V, such that the subgraph induced by S does not contain any edges. The goal of the maximum independent set problem (MIS problem) is to find an independent set of maximum size. It is equivalent to the well-known vertex cover problem (VC problem) and maximum clique problem. This thesis consists of two main parts. In the first one we compare the currently best algorithms for finding near-optimal independent sets and vertex covers in large, sparse graphs. They are Iterated Local Search (ILS) by Andrade et al. [2], a heuristic that uses local search for the MIS problem and NuMVC by Cai et al. [6], a local search algorithm for the VC problem. As of now, there are no methods to solve these large instances exactly in any reasonable time. Therefore these heuristic algorithms are the best option. In the second part we analyze a series of techniques, some of which lead to a significant speed up of the ILS algorithm. This is done by removing specific ver
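
    The definition of an independent set, and its relationship to vertex cover, can be checked directly on a small graph. The sketch below is only an illustration of the definitions, not of ILS or NuMVC; the function names and the edge-list representation are assumptions.

```python
def is_independent_set(edges, s):
    """True if no edge has both endpoints inside s."""
    s = set(s)
    return not any(u in s and v in s for u, v in edges)


def is_vertex_cover(edges, c):
    """True if every edge has at least one endpoint inside c."""
    c = set(c)
    return all(u in c or v in c for u, v in edges)


# Path graph 0 - 1 - 2 - 3 (hypothetical example input).
vertices = {0, 1, 2, 3}
edges = [(0, 1), (1, 2), (2, 3)]
independent = {0, 2}
cover = vertices - independent  # complement of the independent set
```

    On this example `{0, 2}` is independent and its complement `{1, 3}` is a vertex cover, which is the sense in which the two problems are equivalent: S is independent exactly when V \ S covers every edge.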

    Succinct Data Structures for Parameterized Pattern Matching and Related Problems

    Let T be a fixed text-string of length n and P be a varying pattern-string of length |P| <= n. Both T and P contain characters from a totally ordered alphabet Sigma of size sigma <= n. The suffix tree is the ubiquitous data structure for answering a pattern matching query: report all the positions i in T such that T[i + k - 1] = P[k], 1 <= k <= |P|. Compressed data structures support pattern matching queries using much less space than the suffix tree, mainly by relying on a crucial property of the leaves in the tree. Unfortunately, in many suffix tree variants (such as parameterized suffix tree, order-preserving suffix tree, and 2-dimensional suffix tree), this property does not hold. Consequently, compressed representations of these suffix tree variants have been elusive. We present the first compressed data structures for two important variants of the pattern matching problem: (1) Parameterized Matching -- report a position i in T if T[i + k - 1] = f(P[k]), 1 <= k <= |P|, for a one-to-one function f that renames the characters in P to the characters in T[i,i+|P|-1], and (2) Order-preserving Matching -- report a position i in T if T[i + j - 1] and T[i + k - 1] have the same relative order as that of P[j] and P[k], 1 <= j < k <= |P|. For each of these two problems, the existing suffix tree variant requires O(n*log n) bits of space and answers a query in O(|P|*log sigma + occ) time, where occ is the number of starting positions where a match exists. We present data structures that require O(n*log sigma) bits of space and answer a query in O((|P|+occ) poly(log n)) time. As a byproduct, we obtain compressed data structures for a few other variants, as well as introduce two new techniques (of independent interest) for designing compressed data structures for pattern matching.
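
    The two matching definitions in this abstract can be made concrete with brute-force checkers. These are plain window-by-window scans, not the compressed data structures the work presents; the function names are assumptions, and positions are reported 0-based rather than 1-based as in the abstract.

```python
def parameterized_match(text, pattern):
    """Report every 0-based i such that some one-to-one renaming f gives
    text[i + k] == f(pattern[k]) for all k."""
    m = len(pattern)
    hits = []
    for i in range(len(text) - m + 1):
        fwd, back = {}, {}  # pattern char -> text char, and its inverse
        ok = True
        for p, t in zip(pattern, text[i:i + m]):
            # setdefault records the first pairing; later pairings must agree
            # in both directions for f to be a one-to-one function.
            if fwd.setdefault(p, t) != t or back.setdefault(t, p) != p:
                ok = False
                break
        if ok:
            hits.append(i)
    return hits


def _cmp(a, b):
    """Return -1, 0 or 1 according to the relative order of a and b."""
    return (a > b) - (a < b)


def order_preserving_match(text, pattern):
    """Report every 0-based i such that each pair of window positions has
    the same relative order as the corresponding pair in the pattern."""
    m = len(pattern)
    hits = []
    for i in range(len(text) - m + 1):
        if all(_cmp(text[i + j], text[i + k]) == _cmp(pattern[j], pattern[k])
               for j in range(m) for k in range(j + 1, m)):
            hits.append(i)
    return hits
```

    For example, the pattern `[1, 3, 2]` order-preservingly matches the text `[10, 20, 15, 30, 25, 40]` at positions 0 and 2, because both windows have the shape "low, high, middle".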

    Dynamic capacities and priorities in stable matching

    This research addresses the dynamic facets in the fundamentals of the many-to-one stable matching problem. We conduct our study in the context of school choice and hospital-resident matching. In the first study, using the resident-hospital model, we investigate the computational complexity of optimizing hospital capacity variations to maximize resident outcomes, while respecting stability and budget constraints. Our findings reveal the NP-completeness of the decision problem and the inapproximability of the optimization problem, even under strict preferences and disjoint capacity allocations. These results pose significant challenges for policymakers seeking efficient solutions to pressing real-world issues. In the second study, using the school choice model, we explore the joint optimization of increasing school capacities and achieving student-optimal stable matchings within an expanded market. We devise an innovative mathematical programming formulation that models stability and capacity expansion, and we develop an effective cutting-plane method to solve it. Real-world data from the Chilean school choice system validate the potential impact of capacity planning under stability conditions. In the third study, we delve into stable matching under dynamic priorities, primarily focusing on school choice. We introduce a model that accounts for sibling priorities, necessitating novel stability concepts. Our research identifies scenarios where stable matchings exist, accompanied by polynomial-time mechanisms for their discovery. However, in some cases, we also prove the NP-hardness of finding a maximum cardinality stable matching under dynamic priorities, shedding light on challenges related to these matching problems. Collectively, this research contributes to a deeper understanding of dynamic capacities and priorities within stable matching scenarios and opens new questions and new avenues for tackling complex allocation challenges in real-world settings.