Closest string with outliers
Background: Given n strings s1, …, sn each of length ℓ and a nonnegative integer d, the CLOSEST STRING problem asks to find a center string s such that none of the input strings has Hamming distance greater than d from s. Finding a common pattern in many – but not necessarily all – input strings is an important task that plays a role in many applications in bioinformatics. Results: Although the closest string model is robust to the oversampling of strings in the input, it is severely affected by the existence of outliers. We propose a refined model, the CLOSEST STRING WITH OUTLIERS (CSWO) problem, to overcome this limitation. This new model asks for a center string s that is within Hamming distance d to at least n – k of the n input strings, where k is a parameter describing the maximum number of outliers. A CSWO solution not only provides the center string as a representative for the set of strings but also reveals the outliers of the set. We provide fixed-parameter algorithms for CSWO when d and k are parameters, for both bounded and unbounded alphabets. We also show that when the alphabet is unbounded the problem is W[1]-hard with respect to n – k, ℓ, and d. Conclusions: Our refined model abstractly models finding common patterns in several but not all input strings.
Satisfiability, sequence niches, and molecular codes in cellular signaling
Biological information processing as implemented by regulatory and signaling
networks in living cells requires sufficient specificity of molecular
interaction to distinguish signals from one another, but much of regulation and
signaling involves somewhat fuzzy and promiscuous recognition of molecular
sequences and structures, which can leave systems vulnerable to crosstalk. This
paper examines a simple computational model of protein-protein interactions
which reveals both a sharp onset of crosstalk and a fragmentation of the
neutral network of viable solutions as more proteins compete for regions of
sequence space, revealing intrinsic limits to reliable signaling in the face of
promiscuity. These results suggest connections to both phase transitions in
constraint satisfaction problems and coding theory bounds on the size of
communication codes.
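As a rough illustration of the coding-theory side of this connection (not the paper's protein-interaction model), the classic sphere-packing (Hamming) bound caps how many length-n sequences over a q-letter alphabet can each own a private Hamming ball of radius r, i.e. be pairwise at distance at least 2r + 1. The function names below are hypothetical, and treating "recognition" as a Hamming ball is an assumption made only for this sketch.

```python
from math import comb


def hamming_ball_volume(n: int, q: int, r: int) -> int:
    """Number of q-ary words of length n within Hamming distance r of a fixed word."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))


def sphere_packing_bound(n: int, q: int, r: int) -> int:
    """Upper bound on the number of length-n q-ary words whose radius-r
    Hamming balls are pairwise disjoint (minimum distance >= 2r + 1)."""
    return q ** n // hamming_ball_volume(n, q, r)
```

For binary words of length 7 with radius 1 the bound is 16, which the Hamming code attains; for length-5 sequences over 20 amino acids with radius 1 it is 33,333. Widening the tolerated mismatch radius shrinks the bound quickly, which is the intuition behind limits on reliable signaling under promiscuous recognition.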
On the String Consensus Problem and the Manhattan Sequence Consensus Problem
In the Manhattan Sequence Consensus problem (MSC problem) we are given k
integer sequences, each of length ℓ, and we are to find an integer sequence x
of length ℓ (called a consensus sequence) such that the maximum
Manhattan distance of x from each of the input sequences is minimized. For
binary sequences, Manhattan distance coincides with Hamming distance; hence in
this case the string consensus problem (also called the string center problem or
closest string problem) is a special case of MSC. Our main result is a
practically efficient O(ℓ)-time algorithm solving MSC for k ≤ 5 sequences.
The practicality of our algorithms has been verified experimentally. It improves
upon the quadratic algorithm by Amir et al. (SPIRE 2012) for the string consensus
problem for k = 5 binary strings. As in Amir's algorithm, we use a
column-based framework. We replace the implied general integer linear
programming by its easy special cases, due to combinatorial properties of the
MSC for k ≤ 5. We also show that for a general parameter k any instance
can be reduced in linear time to a kernel of size k!, so the problem is
fixed-parameter tractable. Nevertheless, for k ≥ 4 this is still too large
for any naive solution to be feasible in practice.
Comment: accepted to SPIRE 201
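The MSC objective can be stated in a few lines. The sketch below is a naive exact solver for tiny instances only, not the paper's column-based O(ℓ) algorithm. It relies on one elementary observation: clamping each consensus coordinate to the range of its column never increases any |x_j − a_ij|, so only values between the column minimum and maximum need to be tried. Function names are hypothetical.

```python
from itertools import product


def manhattan(x, y):
    """L1 distance between two equal-length integer sequences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def msc_brute_force(seqs):
    """Exact MSC for tiny inputs: minimize the maximum Manhattan distance.

    Each coordinate is restricted to [column min, column max], which is
    safe because clamping never increases any per-column distance.
    Returns (consensus, cost).
    """
    length = len(seqs[0])
    ranges = [
        range(min(s[j] for s in seqs), max(s[j] for s in seqs) + 1)
        for j in range(length)
    ]
    best, best_cost = None, None
    for x in product(*ranges):
        cost = max(manhattan(x, s) for s in seqs)
        if best_cost is None or cost < best_cost:
            best, best_cost = list(x), cost
    return best, best_cost
```

On 0/1 sequences `manhattan` equals Hamming distance, so `msc_brute_force([[0, 0, 0], [1, 1, 1]])` solves a tiny closest string instance and reports cost 2. The search space is the product of the column ranges, which is exactly why practical algorithms such as the one described above are needed.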
An Efficient Rank Based Approach for Closest String and Closest Substring
This paper presents a new genetic approach that uses rank distance to solve two known NP-hard problems, and compares rank distance with other distance measures for strings. The two NP-hard problems we address are closest string and closest substring. For each problem we build a genetic algorithm and describe the genetic operations involved. Both genetic algorithms use a fitness function based on rank distance. We compare our algorithms with other genetic algorithms that use different distance measures, such as Hamming distance or Levenshtein distance, on real DNA sequences. Our experiments show that the genetic algorithms based on rank distance achieve the best results.
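A minimal sketch of rank distance, assuming the usual definition: each symbol is tagged with its occurrence number (the second `a` becomes `(a, 2)`), matched tags contribute the absolute difference of their 1-based positions, and tags present in only one string contribute their position. The max-over-inputs fitness for closest string is an assumption for illustration, not necessarily the paper's exact fitness function, and the genetic operators are not shown; all names are hypothetical.

```python
from collections import Counter


def _ranks(s: str) -> dict[tuple[str, int], int]:
    """Map each (symbol, occurrence number) to its 1-based position in s."""
    seen = Counter()
    ranks = {}
    for pos, ch in enumerate(s, start=1):
        seen[ch] += 1
        ranks[(ch, seen[ch])] = pos
    return ranks


def rank_distance(u: str, v: str) -> int:
    """Rank distance between two strings (lengths may differ)."""
    ru, rv = _ranks(u), _ranks(v)
    total = 0
    for key in ru.keys() | rv.keys():
        if key in ru and key in rv:
            total += abs(ru[key] - rv[key])  # same tagged symbol, shifted
        else:
            total += ru.get(key, 0) + rv.get(key, 0)  # unmatched symbol
    return total


def closest_string_fitness(candidate: str, strings: list[str]) -> int:
    """Hypothetical GA fitness (lower is better): worst-case rank distance."""
    return max(rank_distance(candidate, s) for s in strings)
```

For instance, `rank_distance("abc", "bca")` is 2 + 1 + 1 = 4, while Hamming distance would report 3; rank distance rewards symbols that are present but merely displaced.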
Encoding Hard String Problems with Answer Set Programming
Despite the simple, one-dimensional nature of strings, several computationally hard problems on strings are known. Tackling such problems beyond toy-sized instances with straightforward solutions is infeasible. To solve them on even small datasets, effort has to be put into designing algorithms that exploit deep characteristics of the input. Finding these characteristics can be eased by rapidly creating and evaluating prototypes of new approaches to hard problems. One such rapid-prototyping method for hard problems is answer set programming (ASP). In this light, we study the application of ASP to five NP-hard optimization problems on strings. We provide MAX-SAT and ASP encodings and empirically assess the merits and flaws of working with ASP solvers.
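To see why straightforward solutions stop at toy sizes, consider an exhaustive solver for closest string, sketched below under the assumption that the alphabet is the set of symbols occurring in the input. It tries all |Σ|^ℓ candidate centers; for DNA strings of length 20 that is already 4^20 ≈ 1.1 × 10^12 candidates. This is a hypothetical baseline for comparison, not one of the paper's MAX-SAT or ASP encodings.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_string_exhaustive(strings, alphabet=None):
    """Return (center, radius) minimizing the maximum Hamming distance
    by enumerating every string of length ℓ over the alphabet."""
    if alphabet is None:
        alphabet = sorted(set("".join(strings)))
    length = len(strings[0])
    best, best_radius = None, None
    for chars in product(alphabet, repeat=length):
        cand = "".join(chars)
        radius = max(hamming(cand, s) for s in strings)
        if best_radius is None or radius < best_radius:
            best, best_radius = cand, radius
    return best, best_radius
```

`closest_string_exhaustive(["ACGT", "ACGA", "TCGT"])` returns `("ACGT", 1)` after scanning 4^4 = 256 candidates; each extra position multiplies the work by |Σ|, which is the gap that solver-based encodings aim to close.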
Swiftly Computing Center Strings
Hufsky F, Kuchenbecker L, Jahn K, Stoye J, Böcker S. Swiftly Computing Center Strings. BMC Bioinformatics. 2011;12(1): 106