83 research outputs found
Tree Edit Distance Learning via Adaptive Symbol Embeddings
Metric learning aims to improve classification accuracy by learning a
distance measure which brings data points from the same class closer together
and pushes data points from different classes further apart. Recent research
has demonstrated that metric learning approaches can also be applied to trees,
such as molecular structures, abstract syntax trees of computer programs, or
syntax trees of natural language, by learning the cost function of an edit
distance, i.e. the costs of replacing, deleting, or inserting nodes in a tree.
However, learning such costs directly may yield an edit distance which violates
metric axioms, is challenging to interpret, and may not generalize well. In
this contribution, we propose a novel metric learning approach for trees which
we call embedding edit distance learning (BEDL) and which learns an edit
distance indirectly by embedding the tree nodes as vectors, such that the
Euclidean distance between those vectors supports class discrimination. We
learn such embeddings by reducing the distance to prototypical trees from the
same class and increasing the distance to prototypical trees from different
classes. In our experiments, we show that BEDL improves upon the
state-of-the-art in metric learning for trees on six benchmark data sets,
ranging from computer science through biomedical data to a natural-language
processing data set containing over 300,000 nodes.
Comment: Paper at the International Conference on Machine Learning (2018),
2018-07-10 to 2018-07-15 in Stockholm, Sweden
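A minimal sketch of the idea summarised in this abstract, not the authors' implementation: each node symbol is mapped to a vector, and edit costs are read off as Euclidean distances between those vectors. The symbol names, the embedding values, and the choice to place the "no node" gap symbol at the origin are all assumptions made for illustration. A full tree edit distance algorithm that would consume these costs is omitted.

```python
import math

# Hypothetical 2-D embeddings for three node symbols (illustrative values,
# not learned and not taken from the paper).
EMBEDDING = {
    "a": (1.0, 0.0),
    "b": (0.0, 1.0),
    "c": (1.0, 1.0),
}
GAP = (0.0, 0.0)  # assumption: the "no node" symbol is placed at the origin


def edit_cost(x=None, y=None):
    """Euclidean cost of turning symbol x into symbol y.

    Passing None for y models a deletion; None for x models an insertion.
    """
    vx = EMBEDDING[x] if x is not None else GAP
    vy = EMBEDDING[y] if y is not None else GAP
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(vx, vy)))


replace_ab = edit_cost("a", "b")   # replace node "a" by node "b"
delete_a = edit_cost("a", None)    # delete node "a"
insert_b = edit_cost(None, "b")    # insert node "b"

# Symmetry and the triangle inequality follow from the Euclidean norm.
assert math.isclose(replace_ab, edit_cost("b", "a"))
assert replace_ab <= delete_a + insert_b
```

Because every cost is a distance between vectors, the costs are symmetric and obey the triangle inequality by construction, which is the property the abstract contrasts with learning the cost values directly.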
The Shallow and the Deep: A biased introduction to neural networks and old school machine learning
The Shallow and the Deep is a collection of lecture notes that offers an accessible introduction to neural networks and machine learning in general. However, it was clear from the beginning that these notes would not be able to cover this rapidly changing and growing field in its entirety. The focus lies on classical machine learning techniques, with a bias towards classification and regression. Other learning paradigms and many recent developments in, for instance, Deep Learning are not addressed or only briefly touched upon. Biehl argues that having a solid knowledge of the foundations of the field is essential, especially for anyone who wants to explore the world of machine learning with an ambition that goes beyond the application of some software package to some data set. Therefore, The Shallow and the Deep places emphasis on fundamental concepts and theoretical background. This also involves delving into the history and pre-history of neural networks, where the foundations for most of the recent developments were laid. These notes aim to demystify machine learning and neural networks without losing the appreciation for their impressive power and versatility.
Galaxy classification: A machine learning analysis of GAMA catalogue data
We present a machine learning analysis of five labelled galaxy catalogues
from the Galaxy And Mass Assembly (GAMA): The SersicCatVIKING and
SersicCatUKIDSS catalogues containing morphological features, the
GaussFitSimple catalogue containing spectroscopic features, the MagPhys
catalogue including physical parameters for galaxies, and the Lambdar
catalogue, which contains photometric measurements. Extending work previously
presented at the ESANN 2018 conference - in an analysis based on Generalized
Relevance Matrix Learning Vector Quantization and Random Forests - we find that
neither the data from the individual catalogues nor a combined dataset based on
all 5 catalogues fully supports the visual-inspection-based galaxy
classification scheme employed to categorise the galaxies. In particular, only
one class, the Little Blue Spheroids, is consistently separable from the other
classes. To aid further insight into the nature of the employed visual-based
classification scheme with respect to physical and morphological features, we
present the galaxy parameters that are discriminative for the achieved class
distinctions.
Comment: Accepted for the ESANN 2018 Special Issue of Neurocomputing