Pattern Matching and Consensus Problems on Weighted Sequences and Profiles
We study pattern matching problems on two major representations of uncertain
sequences used in molecular biology: weighted sequences (also known as position
weight matrices, PWM) and profiles (i.e., scoring matrices). In the simple
version, in which only the pattern or only the text is uncertain, we obtain
efficient algorithms with theoretically-provable running times using a
variation of the lookahead scoring technique. We also consider a general
variant of the pattern matching problems in which both the pattern and the text
are uncertain. Central to our solution is a special case where the sequences
have equal length, called the consensus problem. We propose algorithms for the
consensus problem parameterized by the number of strings that match one of the
sequences. As our basic approach, a careful adaptation of the classic
meet-in-the-middle algorithm for the knapsack problem is used. On the lower
bound side, we prove that our dependence on the parameter is optimal up to
lower-order terms conditioned on the optimality of the original algorithm for
the knapsack problem. Comment: 22 pages
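The abstract above builds on the classic meet-in-the-middle algorithm for knapsack. As a rough illustration of that classic technique only (not the paper's adaptation to weighted sequences), here is a minimal sketch for 0/1 knapsack. The function names and the `(weight, value)` item format are assumptions made for this example, and values are assumed non-negative.

```python
from bisect import bisect_right


def subset_pairs(items):
    """Return (total_weight, total_value) for every subset of items."""
    pairs = [(0, 0)]
    for weight, value in items:
        pairs += [(pw + weight, pv + value) for pw, pv in pairs]
    return pairs


def knapsack_mitm(items, capacity):
    """Best total value with total weight <= capacity (hypothetical helper)."""
    half = len(items) // 2
    left = subset_pairs(items[:half])
    right = sorted(subset_pairs(items[half:]))

    # For the weight-sorted right half, record the best value among all
    # subsets that are no heavier than the one at each index.
    right_weights = [w for w, _ in right]
    best_prefix = []
    best = 0
    for _, value in right:
        best = max(best, value)
        best_prefix.append(best)

    answer = 0
    for weight, value in left:
        if weight > capacity:
            continue
        # Heaviest right-half subset that still fits next to this left subset.
        idx = bisect_right(right_weights, capacity - weight) - 1
        if idx >= 0:
            answer = max(answer, value + best_prefix[idx])
    return answer
```

Each half contributes about 2^(n/2) subsets, so the work is roughly the square root of full enumeration over all 2^n subsets.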
Computational Performance Evaluation of Two Integer Linear Programming Models for the Minimum Common String Partition Problem
In the minimum common string partition (MCSP) problem two related input
strings are given. "Related" refers to the property that both strings consist
of the same set of letters appearing the same number of times in each of the
two strings. The MCSP seeks a minimum cardinality partitioning of one string
into non-overlapping substrings that is also a valid partitioning for the
second string. This problem has applications in bioinformatics, e.g., in
analyzing related DNA or protein sequences. For strings with lengths less than
about 1000 letters, a previously published integer linear programming (ILP)
formulation yields, when solved with a state-of-the-art solver such as CPLEX,
satisfactory results. In this work, we propose a new, alternative ILP model
that is compared to the former one. While a polyhedral study shows the linear
programming relaxations of the two models to be equally strong, a comprehensive
experimental comparison using real-world as well as artificially created
benchmark instances indicates substantial computational advantages of the new
formulation. Comment: arXiv admin note: text overlap with arXiv:1405.5646. This paper
version replaces the one submitted on January 10, 2015, due to a detected error
in the calculation of the variables involved in the ILP model
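To make the MCSP definitions above concrete, the following sketch checks whether two strings are "related" and whether a given list of blocks is a valid common partition. It is only a checker under assumed names (`are_related`, `is_common_partition`); it is not either of the ILP models compared in the paper, and it does not search for a minimum partition.

```python
from collections import Counter


def are_related(s1, s2):
    """True if both strings contain each letter the same number of times."""
    return Counter(s1) == Counter(s2)


def is_common_partition(blocks, s1, s2):
    """True if blocks concatenate to s1 and can be reordered to spell s2."""
    if "".join(blocks) != s1:
        return False
    remaining = Counter(blocks)

    def place(pos):
        # Try to cover s2 from position pos using the unused blocks.
        if pos == len(s2):
            return sum(remaining.values()) == 0
        for block in list(remaining):
            if remaining[block] > 0 and s2.startswith(block, pos):
                remaining[block] -= 1
                if place(pos + len(block)):
                    return True
                remaining[block] += 1  # backtrack
        return False

    return place(0)
```

For example, with the strings `AGACTG` and `ACTAGG`, the blocks `AG`, `ACT`, `G` form a common partition of size three, whereas `AGA`, `CTG` do not.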
On the singular spectrum of the Almost Mathieu operator. Arithmetics and Cantor spectra of integrable models
I review recent progress towards a solution of the Almost Mathieu equation
(A.G. Abanov, J.C. Talstra, P.B. Wiegmann, Nucl. Phys. B 525, 571, 1998), also
known as Harper's equation or the Azbel-Hofstadter problem. The spectrum of this
equation is known to be a pure singular continuum with a rich hierarchical
structure. A few years ago it was found that the almost Mathieu operator is
integrable. An asymptotic solution of this operator became possible due to
analysis of the Bethe Ansatz equations. Comment: Based on the lecture given at 13th Nishinomiya-Yukawa Memorial
Symposium on Dynamics of Fields and Strings, Nishinomiya, Japan, 12-13 Nov
1998, and talk given at YITP Workshop on New Aspects of Strings and Fields,
Kyoto, Japan, 16-18 Nov 199
Consensus Strings with Small Maximum Distance and Small Distance Sum
The parameterised complexity of consensus string problems (Closest String, Closest Substring, Closest String with Outliers) is investigated in a more general setting, i.e., with a bound on the maximum Hamming distance and a bound on the sum of Hamming distances between solution and input strings. We completely settle the parameterised complexity of these generalised variants of Closest String and Closest Substring, and partly settle it for Closest String with Outliers; in addition, we answer some open questions from the literature regarding the classical problem variants with only one distance bound. Finally, we investigate the question of polynomial kernels and respective lower bounds
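The generalised Closest String variant described above asks for a string that respects both a maximum-distance bound and a distance-sum bound. The brute-force sketch below only illustrates the problem statement; it is exponential in the string length and is not one of the paper's algorithms. The function names and the restriction to letters occurring in the input are assumptions of this example.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def consensus_string(strings, max_radius, max_sum):
    """Return a string within both distance bounds, or None if none exists.

    Assumes all input strings have the same length. Only letters that occur
    in the input are tried, since other letters cannot reduce any distance.
    """
    length = len(strings[0])
    alphabet = sorted(set("".join(strings)))
    for candidate in product(alphabet, repeat=length):
        distances = [hamming(candidate, s) for s in strings]
        if max(distances) <= max_radius and sum(distances) <= max_sum:
            return "".join(candidate)
    return None
```

Setting `max_sum` to a very large value recovers the classical Closest String problem with only the radius bound, and setting `max_radius` to the string length recovers the variant with only the sum bound.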
Average-Case Optimal Approximate Circular String Matching
Approximate string matching is the problem of finding all factors of a text t
of length n that are at a distance at most k from a pattern x of length m.
Approximate circular string matching is the problem of finding all factors of t
that are at a distance at most k from x or from any of its rotations. In this
article, we present a new algorithm for approximate circular string matching
under the edit distance model with optimal average-case search time O(n(k + log
m)/m). Optimal average-case search time can also be achieved by the algorithms
for multiple approximate string matching (Fredriksson and Navarro, 2004) using
x and its rotations as the set of multiple patterns. Here we reduce the
preprocessing time and space requirements compared to that approach
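As a baseline for the problem defined above, approximate circular matching under edit distance can be solved naively by running the standard dynamic program for approximate matching once per rotation of x. The sketch below does exactly that; it is not the average-case optimal algorithm from the article, and the function names are assumptions of this example. It reports the 0-based end positions in t of factors within distance k.

```python
def approx_end_positions(pattern, text, k):
    """End positions in text of factors within edit distance k of pattern."""
    m = len(pattern)
    prev = list(range(m + 1))  # distances against the empty text prefix
    ends = set()
    for j, char in enumerate(text):
        cur = [0]  # an occurrence may start at any text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == char else 1
            cur.append(min(prev[i - 1] + cost,  # match or substitution
                           prev[i] + 1,         # extra text character
                           cur[i - 1] + 1))     # missing pattern character
        if cur[m] <= k:
            ends.add(j)
        prev = cur
    return ends


def circular_approx_end_positions(pattern, text, k):
    """Same as above, but also accepts every rotation of pattern."""
    ends = set()
    for r in range(len(pattern)):
        rotation = pattern[r:] + pattern[:r]
        ends |= approx_end_positions(rotation, text, k)
    return ends
```

This baseline takes time proportional to n times m squared, which is the kind of cost the article's algorithm is designed to avoid on average.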
Dividing population genetic distance data with the software Partitioning Optimization with Restricted Growth Strings (PORGS): an application for Chinook salmon (Oncorhynchus tshawytscha), Vancouver Island, British Columbia
A new method of finding the optimal group membership and number of groupings to partition population genetic distance data is presented. The software program Partitioning Optimization with Restricted Growth Strings (PORGS) visits all possible set partitions and deems
acceptable partitions to be those that reduce mean intracluster distance. The optimal number of groups is determined with the gap statistic which compares PORGS results with a reference distribution. The PORGS method was validated by a simulated data set with a known distribution.
For efficiency, where values of n were larger, restricted growth strings (RGS) were used to bipartition populations during a nested search (bi-PORGS). Bi-PORGS was applied to a set of genetic data from 18 Chinook salmon (Oncorhynchus
tshawytscha) populations from the west coast of Vancouver Island. The optimal grouping of these populations
corresponded to four geographic locations: 1) Quatsino Sound, 2) Nootka Sound, 3) Clayoquot + Barkley sounds,
and 4) southwest Vancouver Island. However, assignment of populations to groups did not strictly reflect the geographical divisions; fish of Barkley Sound origin that had strayed into the Gold River and close genetic similarity
between transferred and donor populations meant groupings crossed geographic boundaries. Overall, stock structure determined by this partitioning method was similar to that
determined by the unweighted pair-group method with arithmetic averages (UPGMA), an agglomerative clustering algorithm
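Restricted growth strings, as used above to visit all set partitions, can be enumerated with a short recursion. In the sketch below, position i of a string holds the group label of item i, the first label is 0, and each later label is at most one more than the largest label used so far. This is a generic illustration with assumed function names, not the PORGS software itself.

```python
def restricted_growth_strings(n):
    """Yield every restricted growth string of length n as a tuple."""
    def extend(prefix, current_max):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        # The next label may reuse an existing group or open one new group.
        for label in range(current_max + 2):
            prefix.append(label)
            yield from extend(prefix, max(current_max, label))
            prefix.pop()

    if n == 0:
        yield ()
        return
    yield from extend([0], 0)


def groups_from_rgs(rgs):
    """Convert a restricted growth string into a list of groups of item indices."""
    groups = {}
    for item, label in enumerate(rgs):
        groups.setdefault(label, []).append(item)
    return [groups[label] for label in sorted(groups)]
```

The number of strings produced for n items is the Bell number of n, which grows quickly; this is consistent with the abstract's use of a nested bipartitioning search for larger n.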
Faster Binary Mean Computation Under Dynamic Time Warping
Many consensus string problems are based on Hamming distance. We replace Hamming distance by the more flexible (e.g., easily coping with different input string lengths) dynamic time warping distance, best known from applications in time series mining. Doing so, we study the problem of finding a mean string that minimizes the sum of (squared) dynamic time warping distances to a given set of input strings. While this problem is known to be NP-hard (even for strings over a three-element alphabet), we address the binary alphabet case which is known to be polynomial-time solvable. We significantly improve on a previously known algorithm in terms of worst-case running time. Moreover, we also show the practical usefulness of one of our algorithms in experiments with real-world and synthetic data. Finally, we identify special cases solvable in linear time (e.g., finding a mean of only two binary input strings) and report some empirical findings concerning combinatorial properties of optimal means
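For reference, the squared dynamic time warping cost that the mean string above is meant to minimise can be computed with the textbook quadratic dynamic program. The sketch below assumes non-empty sequences of numbers (for example lists of 0s and 1s) and uses assumed function names; it evaluates the objective for a given candidate mean and is not one of the paper's faster algorithms.

```python
def dtw_squared(x, y):
    """Minimum sum of squared differences over all warping paths of x and y."""
    inf = float("inf")
    n, m = len(x), len(y)
    # d[i][j] is the best cost of warping x[:i] onto y[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            d[i][j] = cost + min(d[i - 1][j],      # repeat y[j-1]
                                 d[i][j - 1],      # repeat x[i-1]
                                 d[i - 1][j - 1])  # advance both
    return d[n][m]


def mean_objective(candidate, strings):
    """Sum of squared DTW costs from a candidate mean to all input strings."""
    return sum(dtw_squared(candidate, s) for s in strings)
```

Because warping may repeat symbols, `[0, 1, 1, 0]` and `[0, 1, 0]` have cost zero even though their lengths differ, which is the flexibility the abstract refers to.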
New perspectives on realism, tractability, and complexity in economics
Fuzzy logic and genetic algorithms are used to rework more realistic (and more complex) models of competitive markets. The resulting equilibria are significantly different from the ones predicted from the usual static analysis; the methodology solves the Walrasian problem of how markets can reach equilibrium, starting with firms trading at disparate prices.
The modified equilibria found in these complex market models involve some mutual self-restraint on the part of the agents involved, relative to economically rational behaviour. Research (using similar techniques) into the evolution of collaborative behaviours in economics, and of altruism generally, is summarized; and the joint significance of these two bodies of work for public policy is reviewed.
The possible extension of the fuzzy/genetic methodology to other technical aspects of economics (including international trade theory, and development) is also discussed, as are the limitations to the usefulness of any type of theory in political domains. For the latter purpose, a more differentiated concept of rationality, appropriate to ill-structured choices, is developed. The philosophical case for laissez-faire policies is considered briefly; and the prospects for change in the way we "do economics" are analysed