Search CORE

ARTS repository - University of Groningen

arXiv.org e-Print Archive

Complex-valued embeddings of generic proximity data

Author: Biehl Michael
Münch Maximilian
Schleif Frank-Michael
Straat Michiel
Publication venue
Publication date: 31/08/2020
Field of study

Proximities are at the heart of almost all machine learning methods. If the input data are given as numerical vectors of equal lengths, euclidean distance, or a Hilbertian inner product is frequently used in modeling algorithms. In a more generic view, objects are compared by a (symmetric) similarity or dissimilarity measure, which may not obey particular mathematical properties. This renders many machine learning methods invalid, leading to convergence problems and the loss of guarantees, like generalization bounds. In many cases, the preferred dissimilarity measure is not metric, like the earth mover distance, or the similarity measure may not be a simple inner product in a Hilbert space but in its generalization a Krein space. If the input data are non-vectorial, like text sequences, proximity-based learning is used or ngram embedding techniques can be applied. Standard embeddings lead to the desired fixed-length vector encoding, but are costly and have substantial limitations in preserving the original data's full information. As an information preserving alternative, we propose a complex-valued vector embedding of proximity data. This allows suitable machine learning algorithms to use these fixed-length, complex-valued vectors for further processing. The complex-valued data can serve as an input to complex-valued machine learning algorithms. In particular, we address supervised learning and use extensions of prototype-based learning. The proposed approach is evaluated on a variety of standard benchmarks and shows strong performance compared to traditional techniques in processing non-metric or non-psd proximity data.Comment: Proximity learning, embedding, complex values, complex-valued embedding, learning vector quantizatio

arXiv.org e-Print Archive

Complex-valued embeddings of generic proximity data

Author: Biehl Michael
Münch Maximilian
Schleif Frank-Michael
Straat Michiel
Publication venue
Publication date: 31/08/2020
Field of study

ARTS repository - University of Groningen

Complex-valued embeddings of generic proximity data

Author: Biehl Michael
Münch Maximilian
Schleif Frank-Michael
Straat Michiel
Publication venue
Publication date: 31/08/2020
Field of study

Data-Driven Supervised Learning for Life Science Data

Author: Biehl Michael
Münch Maximilian
Raab Christoph
Schleif Frank-Michael
Publication venue: 'Frontiers Media SA'
Publication date: 06/11/2020
Field of study

Life science data are often encoded in a non-standard way by means of alpha-numeric sequences, graph representations, numerical vectors of variable length, or other formats. Domain-specific or data-driven similarity measures like alignment functions have been employed with great success. The vast majority of more complex data analysis algorithms require fixed-length vectorial input data, asking for substantial preprocessing of life science data. Data-driven measures are widely ignored in favor of simple encodings. These preprocessing steps are not always easy to perform nor particularly effective, with a potential loss of information and interpretability. We present some strategies and concepts of how to employ data-driven similarity measures in the life science context and other complex biological systems. In particular, we show how to use data-driven similarity measures effectively in standard learning algorithms

ARTS repository - University of Groningen