Graph Kernels
We present a unified framework to study graph kernels, special cases of which include the random
walk (Gärtner et al., 2003; Borgwardt et al., 2005) and marginalized (Kashima et al., 2003, 2004;
Mahé et al., 2004) graph kernels. Through reduction to a Sylvester equation we improve the time
complexity of kernel computation between unlabeled graphs with n vertices from O(n^6) to O(n^3).
We find a spectral decomposition approach even more efficient when computing entire kernel matrices.
For labeled graphs we develop conjugate gradient and fixed-point methods that take O(dn^3)
time per iteration, where d is the size of the label set. By extending the necessary linear algebra to
Reproducing Kernel Hilbert Spaces (RKHS) we obtain the same result for d-dimensional edge kernels,
and O(n^4) in the infinite-dimensional case; on sparse graphs these algorithms only take O(n^2)
time per iteration in all cases. Experiments on graphs from bioinformatics and other application
domains show that these techniques can speed up computation of the kernel by an order of magnitude
or more. We also show that certain rational kernels (Cortes et al., 2002, 2003, 2004) when
specialized to graphs reduce to our random walk graph kernel. Finally, we relate our framework to
R-convolution kernels (Haussler, 1999) and provide a kernel that is close to the optimal assignment
kernel of Fröhlich et al. (2006) yet provably positive semi-definite.
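
To illustrate why a spectral decomposition helps, the following is a minimal sketch for unlabeled, undirected graphs, not the authors' implementation. It assumes a geometric random walk kernel of the form q^T (I - lam * (A1 ⊗ A2))^{-1} p with uniform start and stop distributions; the function names, the decay parameter `lam`, and the two toy graphs are illustrative choices. The direct version solves a linear system on the n^2-vertex product graph, while the spectral version only needs the eigendecomposition of each adjacency matrix.

```python
import numpy as np

def rw_kernel_direct(A1, A2, lam):
    """Geometric random walk kernel via the explicit product graph (assumed form)."""
    n1, n2 = A1.shape[0], A2.shape[0]
    W = np.kron(A1, A2)                          # adjacency of the product graph
    p = np.full(n1 * n2, 1.0 / (n1 * n2))        # uniform start distribution (assumption)
    q = p.copy()                                 # uniform stop distribution (assumption)
    x = np.linalg.solve(np.eye(n1 * n2) - lam * W, p)
    return float(q @ x)

def rw_kernel_spectral(A1, A2, lam):
    """Same kernel using eigendecompositions of the two symmetric adjacency matrices."""
    n1, n2 = A1.shape[0], A2.shape[0]
    d1, P1 = np.linalg.eigh(A1)                  # A1 = P1 diag(d1) P1^T
    d2, P2 = np.linalg.eigh(A2)                  # A2 = P2 diag(d2) P2^T
    a1 = P1.T @ np.full(n1, 1.0 / n1)            # start/stop vector in the eigenbasis of A1
    a2 = P2.T @ np.full(n2, 1.0 / n2)            # start/stop vector in the eigenbasis of A2
    weights = np.outer(a1, a2) ** 2              # (a1_i * a2_j)^2, since q equals p here
    return float(np.sum(weights / (1.0 - lam * np.outer(d1, d2))))

# Toy inputs: a path on three vertices and a triangle.
A_path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A_tri = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
k_direct = rw_kernel_direct(A_path, A_tri, lam=0.1)
k_spectral = rw_kernel_spectral(A_path, A_tri, lam=0.1)
```

Both functions should agree up to floating-point error as long as `lam` is smaller than the reciprocal of the product of the largest absolute eigenvalues, which keeps the underlying geometric series convergent. When an entire kernel matrix is needed, each graph's eigendecomposition can be computed once and reused across all pairs.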
Evolutionary distances in the twilight zone -- a rational kernel approach
Phylogenetic tree reconstruction is traditionally based on multiple sequence
alignments (MSAs) and heavily depends on the validity of this information
bottleneck. With increasing sequence divergence, the quality of MSAs decays
quickly. Alignment-free methods, on the other hand, are based on abstract
string comparisons and avoid potential alignment problems. However, in general
they are not biologically motivated and ignore our knowledge about the
evolution of sequences. Thus, it is still a major open question how to define
an evolutionary distance metric between divergent sequences that makes use of
indel information and known substitution models without the need for a multiple
alignment. Here we propose a new evolutionary distance metric to close this
gap. It uses finite-state transducers to create a biologically motivated
similarity score which models substitutions and indels, and does not depend on
a multiple sequence alignment. The sequence similarity score is defined in
analogy to pairwise alignments and additionally has the positive semi-definite
property. We describe its derivation and show in simulation studies and
real-world examples that it is more accurate in reconstructing phylogenies than
competing methods. The result is a new and accurate way of determining
evolutionary distances in and beyond the twilight zone of sequence alignments
that is suitable for large datasets. Comment: to appear in PLoS ON
The Boundedness of Cauchy Integral Operator on a Domain Having Closed Analytic Boundary
In this paper, we prove that the Cauchy integral operators (or Cauchy
transforms) define continuous linear operators on the Smirnov classes for
certain domains with closed analytic boundary.
Effect of the Tunneling Conductance on the Coulomb Staircase
Quantum fluctuations of the charge in the single electron box are
investigated. The rounding of the Coulomb staircase caused by virtual electron
tunneling is determined by perturbation theory up to third order in the
tunneling conductance and compared with precise Monte Carlo data computed with
a new algorithm. The remarkable agreement for large conductance indicates that
presently available experimental data on Coulomb charging effects in metallic
nanostructures can be well explained by finite order perturbative results. Comment: 4 pages, 5 figures
A kernel for time series based on global alignments
We propose in this paper a new family of kernels to handle time series,
notably speech data, within the framework of kernel methods which includes
popular algorithms such as the Support Vector Machine. These kernels elaborate
on the well known Dynamic Time Warping (DTW) family of distances by considering
the same set of elementary operations, namely substitutions and repetitions of
tokens, to map a sequence onto another. Associating to each of these operations
a given score, DTW algorithms use dynamic programming techniques to compute an
optimal sequence of operations with high overall score. In this paper we
consider instead the score spanned by all possible alignments, take a smoothed
version of their maximum and derive a kernel out of this formulation. We prove
that this kernel is positive definite under favorable conditions and show how
it can be tuned effectively for practical applications as we report encouraging
results on a speech recognition task.
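
The idea of summing over all alignments instead of keeping only the best one can be sketched with a short dynamic program. This is a hedged illustration rather than the authors' code: it assumes scalar-valued sequences, a Gaussian local similarity with a hypothetical bandwidth `sigma`, and the same three moves DTW uses (diagonal, down, right). The function name is also an assumption.

```python
import numpy as np

def global_alignment_kernel(x, y, sigma=1.0):
    """Sum, over all monotone alignments of x and y, of the product of local similarities."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    M = np.zeros((n + 1, m + 1))
    M[0, 0] = 1.0                                 # the empty alignment has weight 1
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local similarity between tokens (Gaussian form is an assumption).
            local = np.exp(-((x[i - 1] - y[j - 1]) ** 2) / (2.0 * sigma ** 2))
            # Where DTW would take a max over predecessors, sum their weights instead.
            M[i, j] = local * (M[i - 1, j - 1] + M[i - 1, j] + M[i, j - 1])
    return float(M[n, m])
```

With identical tokens every local similarity equals 1, so the value reduces to the number of admissible alignments, which makes small cases easy to check by hand. For long sequences the products can underflow, so a log-domain version of the same recursion would be the safer choice in practice.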