A Coverage Criterion for Spaced Seeds and its Applications to Support Vector Machine String Kernels and k-Mer Distances
Spaced seeds have been recently shown to not only detect more alignments, but
also to give a more accurate measure of phylogenetic distances (Boden et al.,
2013, Horwege et al., 2014, Leimeister et al., 2014), and to provide a lower
misclassification rate when used with Support Vector Machines (SVMs) (Onodera
and Shibuya, 2013). We confirm these two results by independent experiments,
and propose in this article to use a coverage criterion (Benson and Mak, 2008,
Martin, 2013, Martin and Noé, 2014) to measure the seed efficiency in both
cases in order to design better seed patterns. We show first how this coverage
criterion can be directly measured by a full automaton-based approach. We then
illustrate how this criterion performs when compared with two other criteria
frequently used, namely the single-hit and multiple-hit criteria, through
correlation coefficients with the correct classification/the true distance. At
the end, for alignment-free distances, we propose an extension by adopting the
coverage criterion, show how it performs, and indicate how it can be
efficiently computed.
Comment: http://online.liebertpub.com/doi/abs/10.1089/cmb.2014.017
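The three criteria compared in the abstract above can be illustrated on a toy example. The sketch below is not the authors' automaton-based method. It is a brute-force illustration under stated assumptions: an alignment is encoded as a string of `1` (match) and `0` (mismatch), a seed as a string of `1` (must-match) and `0` (don't-care), and coverage is taken to be the number of alignment positions covered by a must-match position of at least one hit. The function names `seed_hits` and `coverage` are hypothetical.

```python
def seed_hits(seed, alignment):
    """Start positions where every '1' of the seed lies on a match ('1')."""
    span = len(seed)
    return [
        i
        for i in range(len(alignment) - span + 1)
        if all(alignment[i + j] == "1" for j, c in enumerate(seed) if c == "1")
    ]


def coverage(seed, alignment):
    """Alignment positions covered by a must-match position of some hit."""
    covered = set()
    for i in seed_hits(seed, alignment):
        covered.update(i + j for j, c in enumerate(seed) if c == "1")
    return len(covered)


alignment = "1101101"  # assumed toy alignment: 1 = match, 0 = mismatch
for seed in ("1101", "111"):  # a spaced seed and a contiguous seed of weight 3
    hits = seed_hits(seed, alignment)
    print(seed, "single-hit:", bool(hits), "hits:", len(hits),
          "coverage:", coverage(seed, alignment))
```

On this toy alignment the spaced seed `1101` hits twice and covers five positions, while the contiguous seed `111` of the same weight never hits. The single-hit criterion only records whether any hit exists, the multiple-hit criterion counts hits, and coverage counts the distinct matched positions those hits explain.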
Deep Learning as a Parton Shower
We make the connection between certain deep learning architectures and the
renormalisation group explicit in the context of QCD by using a deep learning
network to construct a toy parton shower model. The model aims to describe
proton-proton collisions at the Large Hadron Collider. A convolutional
autoencoder learns a set of kernels that efficiently encode the behaviour of
fully showered QCD collision events. The network is structured recursively so
as to ensure self-similarity, and the number of trained network parameters is
low. Randomness is introduced via a novel custom masking layer, which also
preserves existing parton splittings by using layer-skipping connections. By
applying a shower merging procedure, the network can be evaluated on unshowered
events produced by a matrix element calculation. The trained network behaves as
a parton shower that qualitatively reproduces jet-based observables.
Comment: 26 pages, 13 figure
Evolutionary distances in the twilight zone -- a rational kernel approach
Phylogenetic tree reconstruction is traditionally based on multiple sequence
alignments (MSAs) and heavily depends on the validity of this information
bottleneck. With increasing sequence divergence, the quality of MSAs decays
quickly. Alignment-free methods, on the other hand, are based on abstract
string comparisons and avoid potential alignment problems. However, in general
they are not biologically motivated and ignore our knowledge about the
evolution of sequences. Thus, it is still a major open question how to define
an evolutionary distance metric between divergent sequences that makes use of
indel information and known substitution models without the need for a multiple
alignment. Here we propose a new evolutionary distance metric to close this
gap. It uses finite-state transducers to create a biologically motivated
similarity score which models substitutions and indels, and does not depend on
a multiple sequence alignment. The sequence similarity score is defined in
analogy to pairwise alignments and additionally has the positive semi-definite
property. We describe its derivation and show in simulation studies and
real-world examples that it is more accurate in reconstructing phylogenies than
competing methods. The result is a new and accurate way of determining
evolutionary distances in and beyond the twilight zone of sequence alignments
that is suitable for large datasets.
Comment: to appear in PLoS ON
The Integration of Connectionism and First-Order Knowledge Representation and Reasoning as a Challenge for Artificial Intelligence
Intelligent systems based on first-order logic on the one hand, and on
artificial neural networks (also called connectionist systems) on the other,
differ substantially. It would be very desirable to combine the robust neural
networking machinery with symbolic knowledge representation and reasoning
paradigms like logic programming in such a way that the strengths of either
paradigm will be retained. Current state-of-the-art research, however, falls
far short of this ultimate goal. As one of the main obstacles to be overcome
we perceive the question of how symbolic knowledge can be encoded by means of
connectionist systems: satisfactory answers to this will naturally lead the way
to knowledge extraction algorithms and to integrated neural-symbolic systems.
Comment: In Proceedings of INFORMATION'2004, Tokyo, Japan, to appear. 12 page
Coalgebras for Bisimulation of Weighted Automata over Semirings
Weighted automata are a generalization of nondeterministic automata that
associate a weight drawn from a semiring with every transition and every
state. Their behaviours can be formalized either as weighted language
equivalence or weighted bisimulation. In this paper we explore the properties
of weighted automata in the framework of coalgebras over (i) the category
of semimodules over a semiring S and S-linear maps, and
(ii) the category of sets and maps. We show that the behavioural
equivalences defined by the corresponding final coalgebras in these two cases
characterize weighted language equivalence and weighted bisimulation,
respectively. These results extend earlier work by Bonchi et al. using the
category of vector spaces and linear maps as the underlying
model for weighted automata with weights drawn from a field. The key step
in our work is generalizing the notions of linear relation and linear
bisimulation of Boreale from vector spaces to semimodules using the concept of
the kernel of an S-linear map in the sense of universal algebra. We also
provide an abstract procedure for forward partition refinement for computing
weighted language equivalence. Since for weighted automata defined over
semirings the problem is undecidable in general, it is guaranteed to halt only
in special cases. We provide sufficient conditions for the termination of our
procedure. Although the results are similar to those of Bonchi et al., many of
our proofs are new, especially those about the coalgebra in the category of
semimodules characterizing weighted language equivalence.
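As a concrete reading of "a weight drawn from a semiring with every transition and every state", the following minimal sketch computes the weight a weighted automaton assigns to a word. This is the quantity compared by weighted language equivalence. It does not implement the coalgebraic constructions or the partition refinement procedure from the abstract. The `Semiring` class, the `word_weight` function, and the two-state example automaton are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]
    zero: Any


def word_weight(sr, initial, transitions, final, word):
    """Sum over all paths of (initial weight * transition weights * final weight)."""
    vec = list(initial)
    n = len(vec)
    for symbol in word:
        matrix = transitions[symbol]
        nxt = [sr.zero] * n
        for i in range(n):
            for j in range(n):
                # Extend every path ending in state i by the transition i -> j.
                nxt[j] = sr.add(nxt[j], sr.mul(vec[i], matrix[i][j]))
        vec = nxt
    total = sr.zero
    for i in range(n):
        total = sr.add(total, sr.mul(vec[i], final[i]))
    return total


INF = float("inf")
counting = Semiring(add=lambda a, b: a + b, mul=lambda a, b: a * b, zero=0)
tropical = Semiring(add=min, mul=lambda a, b: a + b, zero=INF)

# Hypothetical two-state automaton over the one-letter alphabet {a}.
paths = word_weight(counting, [1, 0], {"a": [[1, 1], [0, 1]]}, [0, 1], "aa")
cost = word_weight(tropical, [0, INF], {"a": [[1, 3], [INF, 1]]}, [INF, 0], "aa")
print(paths, cost)
```

The same automaton shape yields different behaviours depending on the semiring: over the natural numbers the weight counts accepting paths, while over the tropical (min, +) semiring it gives the cheapest accepting path.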