Distributed PCP Theorems for Hardness of Approximation in P

Author: Abboud Amir
Rubinstein Aviad
Williams Ryan
Publication date: 01/01/1952

We present a new distributed model of probabilistically checkable proofs (PCP). A satisfying assignment $x \in \{0,1\}^n$ to a CNF formula $\varphi$ is shared between two parties, where Alice knows $x_1, \dots, x_{n/2}$, Bob knows $x_{n/2+1},\dots,x_n$, and both parties know $\varphi$. The goal is to have Alice and Bob jointly write a PCP that $x$ satisfies $\varphi$, while exchanging little or no information. Unfortunately, this model as-is does not allow for nontrivial query complexity. Instead, we focus on a non-deterministic variant, where the players are helped by Merlin, a third party who knows all of $x$. Using our framework, we obtain, for the first time, PCP-like reductions from the Strong Exponential Time Hypothesis (SETH) to approximation problems in P. In particular, under SETH we show that there are no truly-subquadratic approximation algorithms for Bichromatic Maximum Inner Product over {0,1}-vectors, Bichromatic LCS Closest Pair over permutations, Approximate Regular Expression Matching, and Diameter in Product Metric. All our inapproximability factors are nearly-tight. In particular, for the first two problems we obtain nearly-polynomial factors of $2^{(\log n)^{1-o(1)}}$; only $(1+o(1))$-factor lower bounds (under SETH) were known before.
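For orientation, the exact version of Bichromatic Maximum Inner Product can be solved by comparing every vector in one set with every vector in the other, which takes quadratic time in the number of vectors. The abstract's result says that, under SETH, even approximating the answer cannot be done in truly subquadratic time. The sketch below shows only this quadratic baseline; the function name and the tuple-of-bits input encoding are assumptions made for illustration and are not taken from the paper.

```python
from typing import Sequence


def bichromatic_max_inner_product(
    A: Sequence[Sequence[int]], B: Sequence[Sequence[int]]
) -> int:
    """Exact baseline: maximum <a, b> over all pairs a in A, b in B.

    Runs in O(|A| * |B| * d) time for d-dimensional {0,1}-vectors.
    """
    return max(sum(x * y for x, y in zip(a, b)) for a in A for b in B)


if __name__ == "__main__":
    A = [(1, 0, 1), (0, 1, 1)]
    B = [(1, 1, 0), (1, 0, 1)]
    print(bichromatic_max_inner_product(A, B))
```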

arXiv.org e-Print Archive

Biblioteca Virtual del Patrimonio Bibliográfico (Virtual Library of Bibliographical Heritage)

Crossref

String Matching with Variable Length Gaps

Author: Aho
Crochemore
David Kofoed Wind
Fredriksson
Hjalte Wedel Vildhøj
Hofmann
Inge Li Gørtz
Knuth
Morgante
Myers
Navarro
Philip Bille
Thompson
Publication date: 01/01/2010

We consider string matching with variable length gaps. Given a string $T$ and a pattern $P$ consisting of strings separated by variable length gaps (arbitrary strings of length in a specified range), the problem is to find all ending positions of substrings in $T$ that match $P$. This problem is a basic primitive in computational biology applications. Let $m$ and $n$ be the lengths of $P$ and $T$, respectively, and let $k$ be the number of strings in $P$. We present a new algorithm achieving time $O(n\log k + m + \alpha)$ and space $O(m + A)$, where $A$ is the sum of the lower bounds of the lengths of the gaps in $P$ and $\alpha$ is the total number of occurrences of the strings in $P$ within $T$. Compared to the previous results this bound essentially achieves the best known time and space complexities simultaneously. Consequently, our algorithm obtains the best known bounds for almost all combinations of $m$, $n$, $k$, $A$, and $\alpha$. Our algorithm is surprisingly simple and straightforward to implement. We also present algorithms for finding and encoding the positions of all strings in $P$ for every match of the pattern.

Comment: draft of full version, extended abstract at SPIRE 201
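To make the problem statement concrete, the following is a naive sketch of matching with variable length gaps. It is not the algorithm from the abstract and does not achieve its bounds. The function name, the representation of the pattern as a list of strings plus a list of `(lo, hi)` gap ranges, and the convention that ending positions are exclusive indices into the text are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def vlg_match_ends(
    text: str, strings: Sequence[str], gaps: Sequence[Tuple[int, int]]
) -> List[int]:
    """Return exclusive end indices in `text` where the gapped pattern can end.

    `strings` holds the k pattern strings; `gaps[i]` is the inclusive
    (lo, hi) range for the gap between strings[i] and strings[i + 1].
    """
    first = strings[0]
    # End positions of matches of the first string alone.
    ends = {
        i + len(first)
        for i in range(len(text) - len(first) + 1)
        if text.startswith(first, i)
    }
    for s, (lo, hi) in zip(strings[1:], gaps):
        starts = [
            i for i in range(len(text) - len(s) + 1) if text.startswith(s, i)
        ]
        # Keep an occurrence of s only if an allowed gap separates it
        # from some end position of the pattern prefix matched so far.
        ends = {
            st + len(s)
            for st in starts
            if any(st - g in ends for g in range(lo, hi + 1))
        }
    return sorted(ends)


if __name__ == "__main__":
    print(vlg_match_ends("abxxcdabcd", ["ab", "cd"], [(1, 2)]))
```

This version scans the text once per pattern string and checks every gap length, so its running time grows with the gap ranges; it is only meant to pin down what counts as a match.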

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

Crossref

Online Research Database In Technology

A cascaded approach to normalising gene mentions in biomedical literature

Author: Keane John A.
Nenadic Goran
Yang Hui
Publication venue: 'Biomedical Informatics'
Publication date: 01/01/2007

Linking gene and protein names mentioned in the literature to unique identifiers in referent genomic databases is an essential step in accessing and integrating knowledge in the biomedical domain. However, it remains a challenging task due to lexical and terminological variation, and ambiguity of gene name mentions in documents. We present a generic and effective rule-based approach to link gene mentions in the literature to referent genomic databases, where pre-processing of both gene synonyms in the databases and gene mentions in text are first applied. The mapping method employs a cascaded approach, which combines exact, exact-like and token-based approximate matching by using flexible representations of a gene synonym dictionary and gene mentions generated during the pre-processing phase. We also consider multi-gene name mentions and permutation of components in gene names. A systematic evaluation of the suggested methods has identified steps that are beneficial for improving either precision or recall in gene name identification. The results of the experiments on the BioCreAtIvE2 data sets (identification of human gene names) demonstrated that our methods achieved highly encouraging results with F-measure of up to 81.20%
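The cascade described above (exact, then exact-like, then token-based matching) can be illustrated with a small sketch. This is not the authors' system: the function names, the tokenisation rule, the order-insensitive token comparison in the last stage, and the toy synonym dictionary with its placeholder identifier are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Optional, Tuple


def tokens(name: str) -> List[str]:
    """Lowercase and split into alphabetic and numeric tokens."""
    return re.findall(r"[a-z]+|\d+", name.lower())


def normalise(mention: str, synonyms: Dict[str, str]) -> Tuple[Optional[str], str]:
    """Map a mention to an identifier, trying stricter stages first."""
    # Stage 1: exact string match against the synonym dictionary.
    if mention in synonyms:
        return synonyms[mention], "exact"
    # Stage 2: exact-like match, ignoring case, spacing and punctuation.
    squashed = {"".join(tokens(s)): gid for s, gid in synonyms.items()}
    key = "".join(tokens(mention))
    if key in squashed:
        return squashed[key], "exact-like"
    # Stage 3: token-based match that ignores the order of components.
    bags = {frozenset(tokens(s)): gid for s, gid in synonyms.items()}
    bag = frozenset(tokens(mention))
    if bag in bags:
        return bags[bag], "token"
    return None, "none"


if __name__ == "__main__":
    toy = {"IL-2 receptor": "GENE_A"}  # hypothetical dictionary entry
    for m in ["IL-2 receptor", "il2 Receptor", "receptor IL2", "BRCA1"]:
        print(m, normalise(m, toy))
```

Ordering the stages from strict to loose means a looser stage is only consulted when every stricter one fails, which is one way to trade recall against precision.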

Crossref

University of Birmingham Research Portal

Open Research Online (The Open University)

PubMed Central

The University of Manchester - Institutional Repository


A constraint based structure description language for Biosequences

Author: Eidhammer I
Gilbert D
Grindhaug SH
Jonassen J
Ratnayake R
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2001

Brunel University Research Archive

Fast Searching in Packed Strings

Author: A. Amir
D.E. Knuth
E.W. Myers
G. Navarro
J. Tarhio
K. Fredriksson
R. Baeza-Yates
R.A. Baeza-Yates
R.M. Karp
R.S. Boyer
S. Wu
S.T. Klein
T.A. Welch
V.L. Arlazarov
W. Masek
W. Rytter
Publication date: 01/01/2009

Given strings $P$ and $Q$, the (exact) string matching problem is to find all positions of substrings in $Q$ matching $P$. The classical Knuth-Morris-Pratt algorithm [SIAM J. Comput., 1977] solves the string matching problem in linear time, which is optimal if we can only read one character at a time. However, most strings are stored in a computer in a packed representation with several characters in a single word, giving us the opportunity to read multiple characters simultaneously. In this paper we study the worst-case complexity of string matching on strings given in packed representation. Let $m \leq n$ be the lengths of $P$ and $Q$, respectively, and let $\sigma$ denote the size of the alphabet. On a standard unit-cost word-RAM with logarithmic word size we present an algorithm using time $O\left(\frac{n}{\log_\sigma n} + m + \mathrm{occ}\right)$. Here $\mathrm{occ}$ is the number of occurrences of $P$ in $Q$. For $m = o(n)$ this improves the $O(n)$ bound of the Knuth-Morris-Pratt algorithm. Furthermore, if $m = O(n/\log_\sigma n)$ our algorithm is optimal since any algorithm must spend at least $\Omega\left(\frac{(n+m)\log \sigma}{\log n} + \mathrm{occ}\right) = \Omega\left(\frac{n}{\log_\sigma n} + \mathrm{occ}\right)$ time to read the input and report all occurrences. The result is obtained by a novel automaton construction based on the Knuth-Morris-Pratt algorithm combined with a new compact representation of subautomata allowing an optimal tabulation-based simulation.

Comment: To appear in Journal of Discrete Algorithms. Special Issue on CPM 200
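As a reference point for the $O(n)$ bound mentioned above, here is a sketch of the classical Knuth-Morris-Pratt algorithm, which reads one character at a time. It does not implement the packed-string algorithm of the abstract. The function name and the choice to report 0-based start positions are assumptions made for illustration.

```python
from typing import List


def kmp_search(pattern: str, text: str) -> List[int]:
    """Return the 0-based start positions of all occurrences of pattern in text."""
    if not pattern:
        return []
    # fail[i] = length of the longest proper border of pattern[: i + 1].
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    occurrences = []
    k = 0  # number of pattern characters currently matched
    for j, c in enumerate(text):
        while k and c != pattern[k]:
            k = fail[k - 1]
        if c == pattern[k]:
            k += 1
        if k == len(pattern):
            occurrences.append(j - k + 1)
            k = fail[k - 1]  # continue, allowing overlapping matches
    return occurrences


if __name__ == "__main__":
    print(kmp_search("aba", "ababa"))
```

Each text character is inspected a constant amortised number of times, which gives the linear bound that the packed representation is meant to beat.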

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

Crossref

Online Research Database In Technology

A Dialogue of Multipoles: Matched Asymptotic Expansion for Caged Black Holes

Author: A. Erdélyi
A. Ishibashi
A. Ronveaux ed.
B. Kol
Barak Kol
C. Müller
D. Gorbonos
D. Karasik
Dan Gorbonos
E. Poisson
E. Sorkin
F.R. Tangherlini
H. Kodama
H. Kudoh
K. Alvi
M.W. Choptuik
P.-J. De Smet
R.C. Myers
R.M. Wald
S.S. Gubser
T. Damour
T. de Donder
T. Harmark
T. Wiseman
U.H. Gerlach
V. Fock
Z.X. Wang
Publication venue: 'IOP Publishing'
Publication date: 12/07/2004

No analytic solution is known to date for a black hole in a compact dimension. We develop an analytic perturbation theory where the small parameter is the size of the black hole relative to the size of the compact dimension. We set up a general procedure for an arbitrary order in the perturbation series based on an asymptotic matched expansion between two coordinate patches: the near horizon zone and the asymptotic zone. The procedure is ordinary perturbation expansion in each zone, where additionally some boundary data comes from the other zone, and so the procedure alternates between the zones. It can be viewed as a dialogue of multipoles where the black hole changes its shape (mass multipoles) in response to the field (multipoles) created by its periodic "mirrors", and that in turn changes its field and so on. We present the leading correction to the full metric including the first correction to the area-temperature relation, the leading term for black hole eccentricity and the "Archimedes effect". The next order corrections will appear in a sequel. On the way we determine independently the static perturbations of the Schwarzschild black hole in dimension $d \geq 5$, where the system of equations can be reduced to "a master equation", a single ordinary differential equation. The solutions are hypergeometric functions which in some cases reduce to polynomials.

Comment: 47 pages, 12 figures, minor corrections described at the end of the introductio

arXiv.org e-Print Archive

Crossref

MPG.PuRe

Error-tolerant Finite State Recognition with Applications to Morphological Analysis and Spelling Correction

Author: Oflazer Kemal
Publication date: 21/07/1995

Error-tolerant recognition enables the recognition of strings that deviate mildly from any string in the regular set recognized by the underlying finite state recognizer. Such recognition has applications in error-tolerant morphological processing, spelling correction, and approximate string matching in information retrieval. After a description of the concepts and algorithms involved, we give examples from two applications: In the context of morphological analysis, error-tolerant recognition allows misspelled input word forms to be corrected, and morphologically analyzed concurrently. We present an application of this to error-tolerant analysis of agglutinative morphology of Turkish words. The algorithm can be applied to morphological analysis of any language whose morphology is fully captured by a single (and possibly very large) finite state transducer, regardless of the word formation processes and morphographemic phenomena involved. In the context of spelling correction, error-tolerant recognition can be used to enumerate correct candidate forms from a given misspelled string within a certain edit distance. Again, it can be applied to any language with a word list comprising all inflected forms, or whose morphology is fully described by a finite state transducer. We present experimental results for spelling correction for a number of languages. These results indicate that such recognition works very efficiently for candidate generation in spelling correction for many European languages such as English, Dutch, French, German, Italian (and others) with very large word lists of root and inflected forms (some containing well over 200,000 forms), generating all candidate solutions within 10 to 45 milliseconds (with edit distance 1) on a SparcStation 10/41. For spelling correction in Turkish, error-tolerant

Comment: Replaces 9504031. gzipped, uuencoded postscript file. To appear in Computational Linguistics Volume 22 No:1, 1996. Also available as ftp://ftp.cs.bilkent.edu.tr/pub/ko/clpaper9512.ps.
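The idea of enumerating candidate forms within a given edit distance can be sketched with a word list stored as a trie, where a search branch is abandoned once no extension can stay within the threshold. This is a stand-in illustration using plain Levenshtein distance, not the algorithm or the distance measure of the paper; the function names, the empty-string end-of-word marker, and the toy word list are assumptions.

```python
from typing import Dict, Iterable, List, Tuple

END = ""  # end-of-word marker; never collides with a one-character key


def build_trie(words: Iterable[str]) -> Dict:
    root: Dict = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def candidates(trie: Dict, word: str, max_dist: int) -> List[Tuple[str, int]]:
    """Return (form, distance) pairs for trie words within max_dist of word."""
    results: List[Tuple[str, int]] = []

    def walk(node: Dict, prefix: str, prev_row: List[int]) -> None:
        if END in node and prev_row[-1] <= max_dist:
            results.append((prefix, prev_row[-1]))
        for ch, child in node.items():
            if ch == END:
                continue
            # Next row of the edit-distance table for prefix + ch.
            row = [prev_row[0] + 1]
            for i in range(1, len(word) + 1):
                cost = 0 if word[i - 1] == ch else 1
                row.append(
                    min(row[i - 1] + 1, prev_row[i] + 1, prev_row[i - 1] + cost)
                )
            # Prune: row minima never decrease along a branch.
            if min(row) <= max_dist:
                walk(child, prefix + ch, row)

    walk(trie, "", list(range(len(word) + 1)))
    return sorted(results)


if __name__ == "__main__":
    trie = build_trie(["cat", "cart", "dog", "cot"])
    print(candidates(trie, "cat", 1))
```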

arXiv.org e-Print Archive

CiteSeerX

Bilkent University Institutional Repository