
    Pattern Matching and Consensus Problems on Weighted Sequences and Profiles

    We study pattern matching problems on two major representations of uncertain sequences used in molecular biology: weighted sequences (also known as position weight matrices, PWM) and profiles (i.e., scoring matrices). In the simple version, in which only the pattern or only the text is uncertain, we obtain efficient algorithms with theoretically provable running times using a variation of the lookahead scoring technique. We also consider a general variant of the pattern matching problems in which both the pattern and the text are uncertain. Central to our solution is a special case where the sequences have equal length, called the consensus problem. We propose algorithms for the consensus problem parameterized by the number of strings that match one of the sequences. Our basic approach is a careful adaptation of the classic meet-in-the-middle algorithm for the knapsack problem. On the lower bound side, we prove that our dependence on the parameter is optimal up to lower-order terms, conditioned on the optimality of the original algorithm for the knapsack problem. Comment: 22 pages
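
The abstract above builds on the classic meet-in-the-middle algorithm for knapsack. As a rough illustration of that classic technique only (not the authors' adaptation to weighted sequences), here is a minimal sketch for 0/1 knapsack. The function names and the `(weight, value)` item format are assumptions made for this example, and weights and values are assumed to be non-negative.

```python
from bisect import bisect_right


def half_subsets(items):
    """Return the (total_weight, total_value) pair of every subset of items."""
    sums = [(0, 0)]
    for w, v in items:
        # Each existing subset either skips the item or takes it.
        sums += [(sw + w, sv + v) for sw, sv in sums]
    return sums


def knapsack_mitm(items, capacity):
    """Best total value with total weight <= capacity (meet in the middle)."""
    mid = len(items) // 2
    left = half_subsets(items[:mid])
    right = sorted(half_subsets(items[mid:]))

    # For the right half, record the best value among subsets up to each weight.
    weights = [w for w, _ in right]
    best_vals = []
    best = 0
    for _, v in right:
        best = max(best, v)
        best_vals.append(best)

    answer = 0
    for w, v in left:
        if w > capacity:
            continue
        # Heaviest right-half subset that still fits next to this left subset.
        i = bisect_right(weights, capacity - w) - 1
        if i >= 0:
            answer = max(answer, v + best_vals[i])
    return answer
```

Splitting the items into two halves means only about 2^(n/2) subsets are enumerated per half, instead of 2^n overall; the sorted right half is then searched by binary search.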

    Computational Performance Evaluation of Two Integer Linear Programming Models for the Minimum Common String Partition Problem

    In the minimum common string partition (MCSP) problem, two related input strings are given. "Related" refers to the property that both strings consist of the same set of letters appearing the same number of times in each of the two strings. The MCSP seeks a minimum cardinality partitioning of one string into non-overlapping substrings that is also a valid partitioning for the second string. This problem has applications in bioinformatics, e.g., in analyzing related DNA or protein sequences. For strings with lengths less than about 1000 letters, a previously published integer linear programming (ILP) formulation yields satisfactory results when solved with a state-of-the-art solver such as CPLEX. In this work, we propose a new, alternative ILP model that is compared to the former one. While a polyhedral study shows the linear programming relaxations of the two models to be equally strong, a comprehensive experimental comparison using real-world as well as artificially created benchmark instances indicates substantial computational advantages of the new formulation. Comment: arXiv admin note: text overlap with arXiv:1405.5646. This paper version replaces the one submitted on January 10, 2015, due to a detected error in the calculation of the variables involved in the ILP model.

    On the singular spectrum of the Almost Mathieu operator. Arithmetics and Cantor spectra of integrable models

    I review recent progress towards a solution of the Almost Mathieu equation (A.G. Abanov, J.C. Talstra, P.B. Wiegmann, Nucl. Phys. B 525, 571, 1998), also known as Harper's equation or the Azbel-Hofstadter problem. The spectrum of this equation is known to be a pure singular continuum with a rich hierarchical structure. A few years ago it was found that the almost Mathieu operator is integrable. An asymptotic solution of this operator became possible due to analysis of the Bethe Ansatz equations. Comment: Based on the lecture given at the 13th Nishinomiya-Yukawa Memorial Symposium on Dynamics of Fields and Strings, Nishinomiya, Japan, 12-13 Nov 1998, and a talk given at the YITP Workshop on New Aspects of Strings and Fields, Kyoto, Japan, 16-18 Nov 199

    Consensus Strings with Small Maximum Distance and Small Distance Sum

    The parameterised complexity of consensus string problems (Closest String, Closest Substring, Closest String with Outliers) is investigated in a more general setting, i.e., with a bound on the maximum Hamming distance and a bound on the sum of Hamming distances between solution and input strings. We completely settle the parameterised complexity of these generalised variants of Closest String and Closest Substring, and partly settle it for Closest String with Outliers; in addition, we answer some open questions from the literature regarding the classical problem variants with only one distance bound. Finally, we investigate the question of polynomial kernels and respective lower bounds.
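
To make the two distance bounds in the abstract above concrete, the following minimal sketch checks whether a candidate consensus string respects both a maximum Hamming distance and a bound on the sum of Hamming distances. It only verifies a given candidate and does not search for one; the function names and parameter names are assumptions for illustration, and the input list is assumed to be non-empty.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


def satisfies_bounds(candidate, strings, max_radius, max_sum):
    """True if candidate meets both the radius bound and the sum bound."""
    dists = [hamming(candidate, s) for s in strings]
    return max(dists) <= max_radius and sum(dists) <= max_sum
```

For the inputs `["aab", "abb", "bbb"]`, the candidate `"abb"` is at distances 1, 0 and 1, so it has maximum distance 1 and distance sum 2.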

    Average-Case Optimal Approximate Circular String Matching

    Approximate string matching is the problem of finding all factors of a text t of length n that are at a distance at most k from a pattern x of length m. Approximate circular string matching is the problem of finding all factors of t that are at a distance at most k from x or from any of its rotations. In this article, we present a new algorithm for approximate circular string matching under the edit distance model with optimal average-case search time O(n(k + log m)/m). Optimal average-case search time can also be achieved by the algorithms for multiple approximate string matching (Fredriksson and Navarro, 2004) using x and its rotations as the set of multiple patterns. Here we reduce the preprocessing time and space requirements compared to that approach.

    Dividing population genetic distance data with the software Partitioning Optimization with Restricted Growth Strings (PORGS): an application for Chinook salmon (Oncorhynchus tshawytscha), Vancouver Island, British Columbia

    A new method of finding the optimal group membership and number of groupings to partition population genetic distance data is presented. The software program Partitioning Optimization with Restricted Growth Strings (PORGS) visits all possible set partitions and deems acceptable partitions to be those that reduce mean intracluster distance. The optimal number of groups is determined with the gap statistic, which compares PORGS results with a reference distribution. The PORGS method was validated by a simulated data set with a known distribution. For efficiency, where values of n were larger, restricted growth strings (RGS) were used to bipartition populations during a nested search (bi-PORGS). Bi-PORGS was applied to a set of genetic data from 18 Chinook salmon (Oncorhynchus tshawytscha) populations from the west coast of Vancouver Island. The optimal grouping of these populations corresponded to four geographic locations: 1) Quatsino Sound, 2) Nootka Sound, 3) Clayoquot + Barkley sounds, and 4) southwest Vancouver Island. However, assignment of populations to groups did not strictly reflect the geographical divisions; fish of Barkley Sound origin that had strayed into the Gold River and close genetic similarity between transferred and donor populations meant groupings crossed geographic boundaries. Overall, stock structure determined by this partitioning method was similar to that determined by the unweighted pair-group method with arithmetic averages (UPGMA), an agglomerative clustering algorithm.
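
Restricted growth strings, mentioned in the abstract above, are a standard way to enumerate every set partition exactly once: element i gets a block label, the first label is 0, and each later label may exceed the largest label used so far by at most one. The sketch below is a generic enumeration under that definition, not the PORGS implementation; the function names are assumptions for illustration.

```python
def restricted_growth_strings(n):
    """Yield every restricted growth string of length n as a list of labels."""
    if n == 0:
        yield []
        return

    def extend(prefix, max_label):
        if len(prefix) == n:
            yield list(prefix)
            return
        # A new element may join any existing block or open exactly one new block.
        for label in range(max_label + 2):
            prefix.append(label)
            yield from extend(prefix, max(max_label, label))
            prefix.pop()

    yield from extend([0], 0)


def to_blocks(rgs):
    """Convert a restricted growth string into a list of blocks of indices."""
    blocks = {}
    for index, label in enumerate(rgs):
        blocks.setdefault(label, []).append(index)
    return [blocks[label] for label in sorted(blocks)]
```

The number of strings produced for n elements is the Bell number B(n), which grows quickly; this is consistent with the abstract's remark that a restricted, nested search was used for larger n.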

    Faster Binary Mean Computation Under Dynamic Time Warping

    Many consensus string problems are based on Hamming distance. We replace Hamming distance by the more flexible (e.g., easily coping with different input string lengths) dynamic time warping distance, best known from applications in time series mining. Doing so, we study the problem of finding a mean string that minimizes the sum of (squared) dynamic time warping distances to a given set of input strings. While this problem is known to be NP-hard (even for strings over a three-element alphabet), we address the binary alphabet case, which is known to be polynomial-time solvable. We significantly improve on a previously known algorithm in terms of worst-case running time. Moreover, we also show the practical usefulness of one of our algorithms in experiments with real-world and synthetic data. Finally, we identify special cases solvable in linear time (e.g., finding a mean of only two binary input strings) and report some empirical findings concerning combinatorial properties of optimal means.
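
For readers unfamiliar with dynamic time warping, the sketch below shows the textbook quadratic-time dynamic program for the warping cost between two numeric sequences, which may have different lengths. It is not the paper's mean-computation algorithm. It assumes non-empty sequences and uses the squared difference as the local cost, so the returned value is the minimum total squared cost over all warping paths; the name `dtw_cost` is chosen for this example.

```python
def dtw_cost(x, y):
    """Minimum total squared difference over all warping paths of x and y."""
    inf = float("inf")
    n, m = len(x), len(y)
    # d[i][j] is the best cost of warping x[:i] against y[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            # Advance in x only, in y only, or in both sequences at once.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

For binary strings such as `[0, 0, 1]` and `[0, 1]` the cost is 0, because repeated symbols can be warped onto a single symbol, which is what lets this distance cope with inputs of different lengths.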

    New perspectives on realism, tractability, and complexity in economics

    Fuzzy logic and genetic algorithms are used to rework more realistic (and more complex) models of competitive markets. The resulting equilibria are significantly different from the ones predicted from the usual static analysis; the methodology solves the Walrasian problem of how markets can reach equilibrium, starting with firms trading at disparate prices. The modified equilibria found in these complex market models involve some mutual self-restraint on the part of the agents involved, relative to economically rational behaviour. Research (using similar techniques) into the evolution of collaborative behaviours in economics, and of altruism generally, is summarized; and the joint significance of these two bodies of work for public policy is reviewed. The possible extension of the fuzzy/genetic methodology to other technical aspects of economics (including international trade theory, and development) is also discussed, as are the limitations to the usefulness of any type of theory in political domains. For the latter purpose, a more differentiated concept of rationality, appropriate to ill-structured choices, is developed. The philosophical case for laissez-faire policies is considered briefly; and the prospects for change in the way we ‘do economics’ are analysed.