BOSS: Bayesian Optimization over String Spaces
This article develops a Bayesian optimization (BO) method which acts directly
over raw strings, proposing the first uses of string kernels and genetic
algorithms within BO loops. Recent applications of BO over strings have been
hindered by the need to map inputs into a smooth and unconstrained latent
space. Learning this projection is computationally and data-intensive. Our
approach instead builds a powerful Gaussian process surrogate model based on
string kernels, naturally supporting variable length inputs, and performs
efficient acquisition function maximization for spaces with syntactical
constraints. Experiments demonstrate considerably improved optimization over
existing approaches across a broad range of constraints, including the popular
setting where syntax is governed by a context-free grammar.
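To make the idea of a kernel over raw, variable-length strings concrete, here is a minimal sketch. The abstract does not say which string kernel BOSS uses, so this example swaps in a simple k-spectrum kernel (shared k-mer counts) purely for illustration; the function names and the default `k=2` are assumptions, not part of the paper.

```python
from collections import Counter
import math


def spectrum_features(s, k=2):
    """Count every contiguous substring of length k in s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(a, b, k=2):
    """Inner product of the k-mer count vectors of a and b."""
    fa, fb = spectrum_features(a, k), spectrum_features(b, k)
    return sum(fa[g] * fb[g] for g in fa if g in fb)


def normalized_kernel(a, b, k=2):
    """Cosine-normalized kernel, so strings of different lengths are comparable."""
    kaa = spectrum_kernel(a, a, k)
    kbb = spectrum_kernel(b, b, k)
    if kaa == 0 or kbb == 0:
        return 0.0  # a string shorter than k has no features
    return spectrum_kernel(a, b, k) / math.sqrt(kaa * kbb)
```

A Gram matrix built from `normalized_kernel` over a set of evaluated strings could then serve as the covariance of a Gaussian process surrogate, with no latent-space embedding required.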
Decompositions of Grammar Constraints
A wide range of constraints can be compactly specified using automata or
formal languages. In a sequence of recent papers, we have shown that an
effective means to reason with such specifications is to decompose them into
primitive constraints. We can then, for instance, use state of the art SAT
solvers and profit from their advanced features like fast unit propagation,
clause learning, and conflict-based search heuristics. This approach holds
promise for solving combinatorial problems in scheduling, rostering, and
configuration, as well as problems in more diverse areas like bioinformatics,
software testing and natural language processing. In addition, decomposition
may be an effective method to propagate other global constraints.
Comment: Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligenc
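As an illustration of what decomposing an automaton-based specification into primitive constraints can look like, the sketch below unrolls a deterministic finite automaton over a fixed-length sequence into one ternary transition constraint per position, plus unary constraints on the first and last state. The abstract mentions SAT solvers; this example instead uses plain backtracking to keep it self-contained. The names `decompose_regular`, `solve`, and `EVEN_B` are hypothetical.

```python
def decompose_regular(n, delta, start, accepting):
    """Unroll a DFA over n positions into primitive constraints.

    State variables q[0..n] and symbol variables x[1..n] are linked by
    one transition constraint (q[i-1], x[i], q[i]) per position.
    """
    triples = {(q, a, q2) for (q, a), q2 in delta.items()}
    constraints = [("unary", 0, {start})]
    for i in range(1, n + 1):
        constraints.append(("transition", i, triples))
    constraints.append(("unary", n, set(accepting)))
    return constraints


def solve(n, constraints):
    """Enumerate all length-n strings satisfying the decomposed constraints."""
    start_states = constraints[0][2]
    final_states = constraints[-1][2]
    transitions = [c[2] for c in constraints[1:-1]]
    results = []

    def extend(i, state, prefix):
        if i == n:
            if state in final_states:
                results.append("".join(prefix))
            return
        for q, a, q2 in sorted(transitions[i]):
            if q == state:
                extend(i + 1, q2, prefix + [a])

    for s in sorted(start_states):
        extend(0, s, [])
    return results


# Example DFA (an assumption for illustration): strings over {a, b}
# containing an even number of 'b' symbols.
EVEN_B = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 1, (1, "b"): 0}
```

Calling `solve(2, decompose_regular(2, EVEN_B, 0, {0}))` enumerates the length-2 strings accepted by the example automaton. In a real solver each transition constraint would be posted as a table constraint or encoded as clauses, which lets propagation on the small constraints stand in for a dedicated global propagator.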
Flexible RNA design under structure and sequence constraints using formal languages
The problem of RNA secondary structure design (also called inverse folding)
is the following: given a target secondary structure, one aims to create a
sequence that folds into, or is compatible with, a given structure. In several
practical applications in biology, additional constraints must be taken into
account, such as the presence/absence of regulatory motifs, either at a
specific location or anywhere in the sequence. In this study, we investigate
the design of RNA sequences from their targeted secondary structure, given
these additional sequence constraints. To this purpose, we develop a general
framework based on concepts of language theory, namely context-free grammars
and finite automata. We efficiently combine a comprehensive set of constraints
into a unifying context-free grammar of moderate size. From there, we use
generic algorithms to perform a (weighted) random generation, or an
exhaustive enumeration, of candidate sequences. The resulting method, whose
complexity scales linearly with the length of the RNA, was implemented as a
standalone program. The resulting software was embedded into a publicly
available dedicated web server. The applicability of the method is demonstrated on
a concrete case study dedicated to Exon Splicing Enhancers, in which our
approach was successfully used in the design of in vitro experiments.
Comment: ACM BCB 2013 - ACM Conference on Bioinformatics, Computational Biology and Biomedical Informatics (2013
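The simplest case of the design problem, with a target structure and no sequence constraints, can be sketched in a few lines. The example below is not the authors' program: it assumes dot-bracket notation for the structure, allows the six canonical and wobble base pairs, and samples uniformly rather than with weights. All names are hypothetical. Both functions run in time linear in the structure length.

```python
import random

BASES = "ACGU"
PAIRS = ["AU", "UA", "GC", "CG", "GU", "UG"]  # assumed set of allowed pairs


def pair_table(structure):
    """Map each paired position to its partner in a dot-bracket string."""
    stack, partner = [], {}
    for i, c in enumerate(structure):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif c != ".":
            raise ValueError("unexpected character %r" % c)
    if stack:
        raise ValueError("unbalanced '('")
    return partner


def count_compatible(structure):
    """Number of sequences compatible with the structure."""
    partner = pair_table(structure)
    n_pairs = len(partner) // 2
    n_unpaired = len(structure) - len(partner)
    return len(PAIRS) ** n_pairs * len(BASES) ** n_unpaired


def sample_compatible(structure, rng=random):
    """Draw one compatible sequence uniformly at random."""
    partner = pair_table(structure)
    seq = [None] * len(structure)
    for i, c in enumerate(structure):
        if c == ".":
            seq[i] = rng.choice(BASES)
        elif c == "(":
            left, right = rng.choice(PAIRS)
            seq[i], seq[partner[i]] = left, right
    return "".join(seq)
```

Requiring or forbidding motifs, as the abstract describes, makes positions interdependent, which is where combining the structure grammar with a finite automaton for the motifs becomes necessary.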
Taking Primitive Optimality Theory Beyond the Finite State
Primitive Optimality Theory (OTP) (Eisner, 1997a; Albro, 1998), a
computational model of Optimality Theory (Prince and Smolensky, 1993), employs
a finite state machine to represent the set of active candidates at each stage
of an Optimality Theoretic derivation, as well as weighted finite state
machines to represent the constraints themselves. For some purposes, however,
it would be convenient if the set of candidates were limited by some set of
criteria capable of being described only in a higher-level grammar formalism,
such as a Context Free Grammar, a Context Sensitive Grammar, or a Multiple
Context Free Grammar (Seki et al., 1991). Examples include reduplication and
phrasal stress models. Here we introduce a mechanism for OTP-like Optimality
Theory in which the constraints remain weighted finite state machines, but sets
of candidates are represented by higher-level grammars. In particular, we use
multiple context-free grammars to model reduplication in the manner of
Correspondence Theory (McCarthy and Prince, 1995), and develop an extended
version of the Earley Algorithm (Earley, 1970) to apply the constraints to a
reduplicating candidate set.
Comment: 11 pages, 5 figures, worksho
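The role of weighted finite state machines as constraints can be illustrated with a toy evaluator. This sketch assumes a small explicit list of candidates rather than a candidate set represented by a grammar, so it does not reproduce the extended Earley algorithm described above. The two example constraints and all names are hypothetical.

```python
def run_wfsm(machine, start, candidate):
    """Sum the transition weights (violations) incurred by a candidate."""
    state, total = start, 0
    for symbol in candidate:
        state, weight = machine[(state, symbol)]
        total += weight
    return total


def optimal(candidates, ranked_constraints):
    """Pick the candidate whose violation profile is lexicographically least.

    ranked_constraints is a list of (machine, start_state) pairs,
    highest-ranked constraint first.
    """
    def profile(candidate):
        return tuple(run_wfsm(m, s, candidate) for m, s in ranked_constraints)

    return min(candidates, key=profile)


# Hypothetical constraint 1: one violation for each 'b' directly after a 'b'.
NO_BB = {(0, "a"): (0, 0), (0, "b"): (1, 0),
         (1, "a"): (0, 0), (1, "b"): (1, 1)}

# Hypothetical constraint 2: one violation for each 'a'.
NO_A = {(0, "a"): (0, 1), (0, "b"): (0, 0)}
```

With candidates `["abba", "abab", "bbbb"]`, ranking `NO_BB` above `NO_A` selects `"abab"`, while the reverse ranking selects `"bbbb"`, showing how the constraint ranking alone changes the winner.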