Compact Random Feature Maps
Kernel approximation using randomized feature maps has recently gained a lot
of interest. In this work, we identify that previous approaches for polynomial
kernel approximation create maps that are rank deficient, and therefore do not
utilize the capacity of the projected feature space effectively. To address
this challenge, we propose compact random feature maps (CRAFTMaps) to
approximate polynomial kernels more concisely and accurately. We prove the
error bounds of CRAFTMaps demonstrating their superior kernel reconstruction
performance compared to the previous approximation schemes. We show how
structured random matrices can be used to efficiently generate CRAFTMaps, and
present a single-pass algorithm using CRAFTMaps to learn non-linear multi-class
classifiers. We present experiments on multiple standard data-sets with
performance competitive with state-of-the-art results.
Comment: 9 pages
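The up-project-then-compress idea described in this abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' algorithm: it assumes a homogeneous polynomial kernel (x·y)^p, uses products of Rademacher projections as the random features, and uses a dense Gaussian matrix for the down-projection instead of the structured random matrices the abstract mentions. The function names and the parameters `n_up` and `n_down` are hypothetical.

```python
import numpy as np

def random_poly_features(X, degree, n_features, rng):
    """Random features whose inner products estimate (x . y) ** degree.

    Each feature is a product of `degree` independent Rademacher
    projections of x, so E[z(x) * z(y)] = (x . y) ** degree.
    """
    n_samples, dim = X.shape
    Z = np.ones((n_samples, n_features))
    for _ in range(degree):
        W = rng.choice([-1.0, 1.0], size=(dim, n_features))
        Z *= X @ W
    return Z / np.sqrt(n_features)

def compact_features(X, degree, n_up, n_down, rng):
    """Up-project to n_up random features, then compress to n_down.

    Assumption: a dense Gaussian (Johnson-Lindenstrauss style) matrix
    stands in for whatever down-projection the paper actually uses.
    """
    Z = random_poly_features(X, degree, n_up, rng)
    G = rng.standard_normal((n_up, n_down)) / np.sqrt(n_down)
    return Z @ G

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((5, 8))
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm inputs
    F = compact_features(X, degree=2, n_up=8192, n_down=1024, rng=rng)
    exact = (X @ X.T) ** 2
    print("max abs kernel error:", np.max(np.abs(F @ F.T - exact)))
```

The compressed features can then be passed to any linear classifier; the inner product of two rows of `F` is a randomized estimate of the polynomial kernel value.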
On Recursive Edit Distance Kernels with Application to Time Series Classification
This paper proposes some extensions to the work on kernels dedicated to
string or time series global alignment based on the aggregation of scores
obtained by local alignments. The extensions we propose make it possible to construct,
from classical recursive definition of elastic distances, recursive edit
distance (or time-warp) kernels that are positive definite if some sufficient
conditions are satisfied. The sufficient conditions we end up with are original
and weaker than those proposed in earlier works, although a recursive
regularizing term is required to get the proof of the positive definiteness as
a direct consequence of Haussler's convolution theorem. The classification
experiment we conducted on three classical time warp distances (two of which
are metrics), using a Support Vector Machine classifier, leads us to conclude
that, when the pairwise distance matrix obtained from the training data is
far from definiteness, the positive definite recursive elastic kernels
generally outperform the distance substituting kernels for the classical
elastic distances we have tested.
Comment: 14 pages
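For orientation, the baseline that this abstract compares against can be sketched as follows: a classical recursive elastic distance (dynamic time warping is used here as one example) plugged into a distance substituting kernel. This is a minimal sketch under stated assumptions, not the paper's recursive kernel construction: the absolute-difference local cost, the form exp(-d / sigma), and all function names are choices made for illustration. The smallest eigenvalue of the Gram matrix gives a rough indication of how far it is from positive semi-definiteness.

```python
import numpy as np

def dtw_distance(a, b):
    """Classical recursive DTW with absolute-difference local cost."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Recursion over the three admissible alignment moves.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def distance_substitution_gram(series, sigma=1.0):
    """Gram matrix K[i, j] = exp(-dtw(s_i, s_j) / sigma) (assumed form)."""
    n = len(series)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            k = np.exp(-dtw_distance(series[i], series[j]) / sigma)
            K[i, j] = K[j, i] = k
    return K

def min_eigenvalue(K):
    """Smallest eigenvalue; a negative value means K is not PSD."""
    return float(np.linalg.eigvalsh(K).min())

if __name__ == "__main__":
    series = [[0, 1, 2], [0, 0, 1, 2], [2, 1, 0], [1, 1, 1]]
    K = distance_substitution_gram(series)
    print("min eigenvalue:", min_eigenvalue(K))
```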
A tree-based kernel for graphs with continuous attributes
The availability of graph data with node attributes that can be either
discrete or real-valued is constantly increasing. While existing kernel methods
are effective techniques for dealing with graphs having discrete node labels,
their adaptation to non-discrete or continuous node attributes has been
limited, mainly because of computational issues. Recently, a few kernels especially
tailored for this domain, and that trade predictive performance for
computational efficiency, have been proposed. In this paper, we propose a graph
kernel for complex and continuous node attributes, whose features are tree
structures extracted from specific graph visits. The kernel manages to keep the
same complexity as state-of-the-art kernels while implicitly using a larger
feature space. We further present an approximated variant of the kernel which
reduces its complexity significantly. Experimental results obtained on six
real-world datasets show that the kernel is the best performing one on most of
them. Moreover, in most cases the approximated version reaches performance
comparable to that of current state-of-the-art kernels in terms of classification
accuracy while greatly shortening the running times.
Comment: This work has been submitted to the IEEE Transactions on Neural
Networks and Learning Systems for possible publication. Copyright may be
transferred without notice, after which this version may no longer be
accessible
Toward a multilevel representation of protein molecules: comparative approaches to the aggregation/folding propensity problem
This paper builds upon the fundamental work of Niwa et al. [34], which
provides the unique possibility to analyze the relative aggregation/folding
propensity of the elements of the entire Escherichia coli (E. coli) proteome in
a cell-free standardized microenvironment. The hardness of the problem comes
from the superposition between the driving forces of intra- and inter-molecule
interactions and it is mirrored by the evidence of a shift from folding to
aggregation phenotypes by single-point mutations [10]. Here we apply several
state-of-the-art classification methods coming from the field of structural
pattern recognition, with the aim to compare different representations of the
same proteins gathered from the Niwa et al. data base; such representations
include sequences and labeled (contact) graphs enriched with chemico-physical
attributes. By this comparison, we are also able to identify some interesting
general properties of proteins. Notably, (i) we suggest a threshold around 250
residues discriminating "easily foldable" from "hardly foldable" molecules
consistent with other independent experiments, and (ii) we highlight the
relevance of contact graph spectra for folding behavior discrimination and
characterization of the E. coli solubility data. The soundness of the
experimental results presented in this paper is proved by the statistically
relevant relationships discovered between the chemico-physical description of
proteins and the developed cost matrix of substitution used in the various
discrimination systems.
Comment: 17 pages, 3 figures, 46 references
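As an illustration of what a contact graph spectrum is, the sketch below builds an unlabeled contact graph from residue coordinates and returns the eigenvalues of its graph Laplacian. It is a sketch under assumptions, not the pipeline used in the paper: the input is taken to be one 3D coordinate per residue, the 8 Å distance cutoff is an assumed, commonly used value, the chemico-physical node attributes mentioned in the abstract are ignored, and the function name is hypothetical.

```python
import numpy as np

def contact_graph_spectrum(coords, cutoff=8.0):
    """Laplacian eigenvalues of a distance-thresholded contact graph.

    coords: (n_residues, 3) array, one point per residue (assumption).
    cutoff: contact distance threshold in angstroms (assumed value).
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    A = (dist < cutoff).astype(float)  # adjacency: residues in contact
    np.fill_diagonal(A, 0.0)           # no self-contacts
    L = np.diag(A.sum(axis=1)) - A     # combinatorial graph Laplacian
    return np.linalg.eigvalsh(L)       # eigenvalues in ascending order

if __name__ == "__main__":
    # Three residues on a line, 5 angstroms apart: a path graph.
    toy = [[0, 0, 0], [5, 0, 0], [10, 0, 0]]
    print(contact_graph_spectrum(toy))
```

The resulting eigenvalue vector can be used as a fixed-order numerical descriptor of the graph, although proteins of different lengths yield spectra of different sizes and would need padding or binning before being compared.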