26,968 research outputs found
Tree Edit Distance Learning via Adaptive Symbol Embeddings
Metric learning has the aim to improve classification accuracy by learning a
distance measure which brings data points from the same class closer together
and pushes data points from different classes further apart. Recent research
has demonstrated that metric learning approaches can also be applied to trees,
such as molecular structures, abstract syntax trees of computer programs, or
syntax trees of natural language, by learning the cost function of an edit
distance, i.e. the costs of replacing, deleting, or inserting nodes in a tree.
However, learning such costs directly may yield an edit distance which violates
metric axioms, is challenging to interpret, and may not generalize well. In
this contribution, we propose a novel metric learning approach for trees which
we call embedding edit distance learning (BEDL) and which learns an edit
distance indirectly by embedding the tree nodes as vectors, such that the
Euclidean distance between those vectors supports class discrimination. We
learn such embeddings by reducing the distance to prototypical trees from the
same class and increasing the distance to prototypical trees from different
classes. In our experiments, we show that BEDL improves upon the
state-of-the-art in metric learning for trees on six benchmark data sets,
ranging from computer science over biomedical data to a natural-language
processing data set containing over 300,000 nodes.Comment: Paper at the International Conference of Machine Learning (2018),
2018-07-10 to 2018-07-15 in Stockholm, Swede
Combining Static and Dynamic Features for Multivariate Sequence Classification
Model precision in a classification task is highly dependent on the feature
space that is used to train the model. Moreover, whether the features are
sequential or static will dictate which classification method can be applied as
most of the machine learning algorithms are designed to deal with either one or
another type of data. In real-life scenarios, however, it is often the case
that both static and dynamic features are present, or can be extracted from the
data. In this work, we demonstrate how generative models such as Hidden Markov
Models (HMM) and Long Short-Term Memory (LSTM) artificial neural networks can
be used to extract temporal information from the dynamic data. We explore how
the extracted information can be combined with the static features in order to
improve the classification performance. We evaluate the existing techniques and
suggest a hybrid approach, which outperforms other methods on several public
datasets.Comment: Presented at IEEE DSAA 201
- …