Closest string with outliers

Author: A Ben-Dor
B Ma
Bin Ma
Christina Boucher
D Marx
G Pavesi
J Dopazo
J Gramm
J Lanctot
K Lucas
L Wang
M Fellows
M Frances
M Garey
M Li
M Tompa
P Pevzner
R Downey
R Zhao
V Proutski
W Lenstra
X Deng
ZZ Chen
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Given n strings s1, …, sn each of length ℓ and a nonnegative integer d, the CLOSEST STRING problem asks to find a center string s such that none of the input strings has Hamming distance greater than d from s. Finding a common pattern in many – but not necessarily all – input strings is an important task that plays a role in many applications in bioinformatics. Results: Although the closest string model is robust to the oversampling of strings in the input, it is severely affected by the existence of outliers. We propose a refined model, the CLOSEST STRING WITH OUTLIERS (CSWO) problem, to overcome this limitation. This new model asks for a center string s that is within Hamming distance d of at least n – k of the n input strings, where k is a parameter describing the maximum number of outliers. A CSWO solution not only provides the center string as a representative for the set of strings but also reveals the outliers of the set. We provide fixed-parameter algorithms for CSWO when d and k are parameters, for both bounded and unbounded alphabets. We also show that when the alphabet is unbounded the problem is W[1]-hard with respect to n – k, ℓ, and d. Conclusions: Our refined model abstractly models finding common patterns in several but not all input strings.
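
To make the CSWO definition above concrete, the following is a minimal sketch of an exhaustive search, not the fixed-parameter algorithms the abstract refers to. The function names `hamming` and `cswo_brute_force` are hypothetical, and the search assumes the alphabet can be restricted to the characters that occur in the input (replacing any other character in a center string never increases a Hamming distance). It takes exponential time in ℓ and is only meant for toy instances.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def cswo_brute_force(strings, d, k):
    """Exhaustive CLOSEST STRING WITH OUTLIERS check (illustrative only).

    Returns (center, outliers) where at most k input strings have
    Hamming distance greater than d from center, or None if no such
    center exists. Assumes all strings have the same length.
    """
    length = len(strings[0])
    # Assumption: characters absent from the input never help, so the
    # candidate alphabet is limited to characters seen in the input.
    alphabet = sorted(set("".join(strings)))
    for candidate in product(alphabet, repeat=length):
        center = "".join(candidate)
        outliers = [s for s in strings if hamming(center, s) > d]
        if len(outliers) <= k:
            return center, outliers
    return None


if __name__ == "__main__":
    data = ["ACGT", "ACGA", "ACCT", "TTTT"]
    print(cswo_brute_force(data, d=1, k=1))
    print(cswo_brute_force(data, d=1, k=0))
```

With k = 0 the sketch reduces to the plain CLOSEST STRING decision problem, which shows how a single distant string such as "TTTT" can make an otherwise easy instance infeasible.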

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Satisfiability, sequence niches, and molecular codes in cellular signaling

Author: Bijlsma
Brannetti
Burack
Burger
C.R. Myers
Castagnoli
Cesareni
Correale
Djordjevic
Edelman
Figge
Friedgut
Gerland
Gomes
Gravner
Hellingwerf
Itzkovitz
Kirkpatrick
Landgraf
Lau
Marathe
Mayer
McClean
Mezard
Mitchell
Monasson
Morrison
Noirel
Pawson
Percus
Poelwijk
Ramani
Sear
Sengupta
Shannon
Tlusty
van Nimwegen
Zarrinpar
Publication venue: Institution of Engineering and Technology (IET)
Publication date: 18/12/2007

Biological information processing, as implemented by regulatory and signaling networks in living cells, requires sufficient specificity of molecular interaction to distinguish signals from one another. However, much of regulation and signaling involves somewhat fuzzy and promiscuous recognition of molecular sequences and structures, which can leave systems vulnerable to crosstalk. This paper examines a simple computational model of protein-protein interactions that shows both a sharp onset of crosstalk and a fragmentation of the neutral network of viable solutions as more proteins compete for regions of sequence space, revealing intrinsic limits to reliable signaling in the face of promiscuity. These results suggest connections to both phase transitions in constraint satisfaction problems and coding theory bounds on the size of communication codes.

arXiv.org e-Print Archive

Crossref

On the String Consensus Problem and the Manhattan Sequence Consensus Problem

Author: A. Amir
A. Frank
B. Ma
C. Boucher
G.D. Cohen
H.W. Lenstra Jr.
J. Gramm
J.J. Sylvester
K. Fischer
M. Frances
R. Kannan
R.L. Graham
Publication venue
Publication date: 01/01/2014

In the Manhattan Sequence Consensus problem (MSC problem) we are given k integer sequences, each of length l, and we are to find an integer sequence x of length l (called a consensus sequence), such that the maximum Manhattan distance of x from each of the input sequences is minimized. For binary sequences Manhattan distance coincides with Hamming distance, hence in this case the string consensus problem (also called string center problem or closest string problem) is a special case of MSC. Our main result is a practically efficient O(l)-time algorithm solving MSC for k ≤ 5 sequences. Practicality of our algorithms has been verified experimentally. It improves upon the quadratic algorithm by Amir et al. (SPIRE 2012) for string consensus problem for k = 5 binary strings. Similarly as in Amir's algorithm we use a column-based framework. We replace the implied general integer linear programming by its easy special cases, due to combinatorial properties of the MSC for k ≤ 5. We also show that for a general parameter k any instance can be reduced in linear time to a kernel of size k!, so the problem is fixed-parameter tractable. Nevertheless, for k ≥ 4 this is still too large for any naive solution to be feasible in practice. Comment: accepted to SPIRE 201
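
The MSC objective described above can be illustrated with a small exhaustive search. This is only a sketch of the problem definition, not the O(l)-time algorithm or the kernelization from the abstract. The names `manhattan` and `msc_brute_force` are hypothetical, and the search assumes each consensus coordinate can be restricted to the range between the smallest and largest input value in that column (clamping a coordinate into that range never increases any distance).

```python
from itertools import product


def manhattan(x, y):
    """Manhattan (L1) distance between two equal-length integer sequences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def msc_brute_force(seqs):
    """Exhaustive Manhattan Sequence Consensus (illustrative only).

    Returns (consensus, radius), where radius is the minimized maximum
    Manhattan distance from consensus to any input sequence.
    Exponential in the sequence length; intended for tiny inputs.
    """
    # Assumption: an optimal coordinate lies between the column minimum
    # and the column maximum, so only those values are enumerated.
    ranges = [range(min(col), max(col) + 1) for col in zip(*seqs)]

    def radius(x):
        return max(manhattan(x, s) for s in seqs)

    best = min(product(*ranges), key=radius)
    return list(best), radius(best)


if __name__ == "__main__":
    print(msc_brute_force([[0, 0], [2, 2], [0, 2]]))
    # On binary sequences the Manhattan distance equals the Hamming distance.
    print(manhattan([0, 1, 1], [1, 1, 0]))
```

The search space grows as the product of the column ranges, which is why the abstract's column-based framework and kernel bound matter for anything beyond toy inputs.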

arXiv.org e-Print Archive

Crossref

An Efficient Rank Based Approach for Closest String and Closest Substring

Author: A Ben-Dor
A Dinu
AS Fraser
AWC Liew
C de la Higuera
Chuhsing Kate Hsiao
DJ States
EV Koonin
F Nicolas
J Gramm
J Palmer
JC Wooley
K Lanctot
L Schmitt
L Wang
Liviu P. Dinu
LP Dinu
M Chimani
M Frances
M Karpovsky
M Li
P Diaconis
R Holmquist
Radu Ionescu
S Roman
VI Levenshtein
VY Popov
W Banzhaf
X Deng
X Liu
Publication venue: Public Library of Science
Publication date: 01/01/2011

This paper presents a new genetic approach that uses rank distance for solving two known NP-hard problems, and compares rank distance with other distance measures for strings. The two NP-hard problems we address are closest string and closest substring. For each problem we build a genetic algorithm and describe the genetic operations involved. Both genetic algorithms use a fitness function based on rank distance. We compare our algorithms with other genetic algorithms that use different distance measures, such as Hamming distance or Levenshtein distance, on real DNA sequences. Our experiments show that the genetic algorithms based on rank distance achieve the best results.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Encoding Hard String Problems with Answer Set Programming

Author
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)
Publication date: 01/01/2023

Despite the simple, one-dimensional nature of strings, several computationally hard problems on strings are known. Tackling hard problems beyond the size of toy instances with straightforward solutions is infeasible. To solve these problems on datasets of even small sizes, effort has to be put into the conception of algorithms leveraging profound characteristics of the input. Finding these characteristics can be eased by rapidly creating and evaluating prototypes of new concepts for tackling hard problems. One such rapid-prototyping method for hard problems is answer set programming (ASP). In this light, we study the application of ASP to five NP-hard optimization problems in the field of strings. We provide MAX-SAT and ASP encodings, and empirically reason about the merits and flaws of working with ASP solvers.

Dagstuhl Research Online Publication Server

Swiftly Computing Center Strings

Author: B Ma
C Meneses
F Nicolas
Franziska Hufsky
I Yanai
J Davila
J Gramm
Jens Stoye
JK Lanctot
Katharina Jahn
L Wang
Léon Kuchenbecker
M Frances
S Böcker
S Faro
S Rahmann
Sebastian Böcker
T Kelsey
X Liu
Y Wang
ZZ Chen
Publication venue: BioMed Central
Publication date: 01/01/2010

Hufsky F, Kuchenbecker L, Jahn K, Stoye J, Böcker S. Swiftly Computing Center Strings. BMC Bioinformatics. 2011;12(1): 106

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Publications at Bielefeld University

MPG.PuRe