Complex-valued embeddings of generic proximity data
Proximities are at the heart of almost all machine learning methods. If the input data are given as numerical vectors of equal length, the Euclidean distance or a Hilbertian inner product is frequently used in modeling algorithms. In a more generic view, objects are compared by a (symmetric) similarity or dissimilarity measure, which may not obey particular mathematical properties. This renders many machine learning methods invalid, leading to convergence problems and the loss of guarantees such as generalization bounds. In many cases, the preferred dissimilarity measure is not metric, like the earth mover's distance, or the similarity measure may not be a simple inner product in a Hilbert space but instead lives in its generalization, a Krein space. If the input data are non-vectorial, like text sequences, proximity-based learning is used or n-gram embedding techniques can be applied. Standard embeddings lead to the desired fixed-length vector encoding, but are costly and have substantial limitations in preserving the original data's full information. As an information-preserving alternative, we propose a complex-valued vector embedding of proximity data. This allows suitable machine learning algorithms to use these fixed-length, complex-valued vectors for further processing. The complex-valued data can serve as input to complex-valued machine learning algorithms; in particular, we address supervised learning and use extensions of prototype-based learning. The proposed approach is evaluated on a variety of standard benchmarks and shows strong performance compared to traditional techniques in processing non-metric or non-positive-semidefinite (non-psd) proximity data.
Comment: Proximity learning, embedding, complex values, complex-valued embedding, learning vector quantization
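The abstract leaves the construction implicit, but a minimal sketch of one way to realize such an embedding is the eigendecomposition route below: decompose the symmetric similarity matrix and take complex square roots of the eigenvalues, so that negative eigenvalues yield imaginary coordinates rather than being discarded. The function name complex_embedding and the toy matrix are illustrative, not taken from the paper.

```python
import numpy as np

def complex_embedding(S):
    """Embed a symmetric (possibly indefinite) similarity matrix S into
    complex-valued vectors. With S = U diag(lam) U^T, the rows of
    X = U diag(sqrt(lam)) reproduce S under the bilinear (non-conjugating)
    product X @ X.T; negative eigenvalues become imaginary coordinates
    instead of being clipped or flipped, so no spectral information is lost.
    """
    lam, U = np.linalg.eigh(S)               # symmetric S -> real spectrum
    return U * np.sqrt(lam.astype(complex))  # scale columns by sqrt(lam)

# Toy indefinite similarity matrix (one negative eigenvalue).
S = np.array([[1.0, 0.9, 0.2],
              [0.9, 1.0, 0.9],
              [0.2, 0.9, 1.0]])
X = complex_embedding(S)
assert np.allclose(X @ X.T, S)               # bilinear form recovers S
```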
Positive Definite Kernels in Machine Learning
This survey is an introduction to positive definite kernels and the set of methods they have inspired in the machine learning literature, namely kernel methods. We first discuss some properties of positive definite kernels as well as reproducing kernel Hilbert spaces, the natural extension of the set of functions {k(x, ·), x ∈ X} associated with a kernel k defined on a space X. We discuss at length the construction of kernel functions that take advantage of well-known statistical models. We provide an overview of numerous data-analysis methods which take advantage of reproducing kernel Hilbert spaces, and discuss the idea of combining several kernels to improve performance on certain tasks. We also provide a short cookbook of different kernels which are particularly useful for certain data types such as images, graphs, or speech segments.
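As a concrete companion to the survey's topic, here is a minimal, self-contained sketch (not code from the survey) of the canonical positive definite kernel, the Gaussian RBF, together with an empirical check that its Gram matrix is positive semidefinite:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian RBF kernel k(x, y) = exp(-gamma * ||x - y||^2),
    positive definite for any gamma > 0."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
K = rbf_kernel(X, X)
# Positive definiteness implies a psd Gram matrix: no eigenvalue is
# (meaningfully) below zero.
assert np.linalg.eigvalsh(K).min() > -1e-10
```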
Adaptive spectrum transformation by topology preserving on indefinite proximity data
Similarity-based representation generates indefinite matrices, which are inconsistent with classical kernel-based learning frameworks. In this paper, we present an adaptive spectrum transformation method that provides a positive semidefinite (psd) kernel consistent with the intrinsic geometry of proximity data. In the proposed method, an indefinite similarity matrix is rectified by maximizing the Euclidean factor (EF) criterion, which measures how close the resulting feature space is to a Euclidean space. This maximization is achieved by modifying volume elements through a conformal transform applied to the similarity matrix. We performed several experiments to evaluate the performance of the proposed method in comparison with the flip, clip, shift, and square spectrum transformation techniques on similarity matrices. Applying the resulting psd matrices as kernels in dimensionality reduction and clustering problems confirms the success of the proposed approach in adapting to the data and preserving its topological information. Our experiments show that in classification applications, the superiority of the proposed method is considerable when the negative eigenfraction of the similarity matrix is significant.
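The paper's adaptive EF-based method is not reproduced here, but the flip, clip, shift, and square baselines it compares against, and the negative eigenfraction it uses to characterize indefiniteness, are standard and can be sketched as follows (assuming numpy; names are illustrative):

```python
import numpy as np

def transform_spectrum(S, mode="clip"):
    """Rectify a symmetric indefinite similarity matrix by transforming
    its eigenvalues, then reassembling the matrix."""
    lam, U = np.linalg.eigh(S)
    if mode == "flip":        # take absolute values of the eigenvalues
        lam = np.abs(lam)
    elif mode == "clip":      # set negative eigenvalues to zero
        lam = np.maximum(lam, 0.0)
    elif mode == "shift":     # shift the spectrum up by |min eigenvalue|
        lam = lam - min(lam.min(), 0.0)
    elif mode == "square":    # square the eigenvalues (equivalent to S @ S)
        lam = lam ** 2
    else:
        raise ValueError(f"unknown mode: {mode}")
    return (U * lam) @ U.T

def negative_eigenfraction(S):
    """Fraction of total spectral mass carried by negative eigenvalues."""
    lam = np.linalg.eigvalsh(S)
    return np.abs(lam[lam < 0]).sum() / np.abs(lam).sum()
```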