Tree Edit Distance Learning via Adaptive Symbol Embeddings
Metric learning aims to improve classification accuracy by learning a
distance measure which brings data points from the same class closer together
and pushes data points from different classes further apart. Recent research
has demonstrated that metric learning approaches can also be applied to trees,
such as molecular structures, abstract syntax trees of computer programs, or
syntax trees of natural language, by learning the cost function of an edit
distance, i.e. the costs of replacing, deleting, or inserting nodes in a tree.
However, learning such costs directly may yield an edit distance which violates
metric axioms, is challenging to interpret, and may not generalize well. In
this contribution, we propose a novel metric learning approach for trees which
we call embedding edit distance learning (BEDL) and which learns an edit
distance indirectly by embedding the tree nodes as vectors, such that the
Euclidean distance between those vectors supports class discrimination. We
learn such embeddings by reducing the distance to prototypical trees from the
same class and increasing the distance to prototypical trees from different
classes. In our experiments, we show that BEDL improves upon the
state of the art in metric learning for trees on six benchmark data sets,
ranging from computer science and biomedical data to a natural-language
processing data set containing over 300,000 nodes.
Comment: Paper at the International Conference on Machine Learning (2018),
2018-07-10 to 2018-07-15 in Stockholm, Sweden
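The core idea of BEDL, deriving every edit cost from vector embeddings of node labels, can be illustrated with a short sketch. This is not the authors' implementation: the function name `embedding_edit_distance` and the tree encoding are hypothetical, the embeddings are fixed by hand rather than learned from prototypes, and the sketch assumes that deleting or inserting a node costs the distance of its embedding to the origin. The edit distance itself is the classic memoized recursion over ordered forests, which is adequate for small trees; the algorithm of Zhang and Shasha is the usual efficient choice for larger ones.

```python
import math
from functools import lru_cache

# A tree is a (label, children) pair, where children is a tuple of trees.
# A forest is a tuple of trees; tuples keep everything hashable for caching.


def embedding_edit_distance(t1, t2, emb):
    """Ordered tree edit distance with costs derived from label embeddings.

    emb maps each node label to a vector (a tuple of floats).
    """

    def rep(a, b):
        # Replacement cost: Euclidean distance between the two label vectors.
        return math.dist(emb[a], emb[b])

    def gap(a):
        # Assumption: deletion/insertion cost = distance of the embedding to the origin.
        return math.hypot(*emb[a])

    @lru_cache(maxsize=None)
    def d(f, g):
        if not f and not g:
            return 0.0  # both forests empty: nothing left to edit
        if not f:
            # Only g remains: insert its rightmost root; its children take its place.
            w, w_kids = g[-1]
            return d(f, g[:-1] + w_kids) + gap(w)
        if not g:
            # Only f remains: delete its rightmost root.
            v, v_kids = f[-1]
            return d(f[:-1] + v_kids, g) + gap(v)
        v, v_kids = f[-1]
        w, w_kids = g[-1]
        return min(
            d(f[:-1] + v_kids, g) + gap(v),  # delete v
            d(f, g[:-1] + w_kids) + gap(w),  # insert w
            # Replace v by w: match the remaining forests and the two subtrees separately.
            d(f[:-1], g[:-1]) + d(v_kids, w_kids) + rep(v, w),
        )

    return d((t1,), (t2,))


emb = {"a": (1.0, 0.0), "b": (0.0, 1.0), "c": (3.0, 4.0)}  # hand-picked, not learned
print(embedding_edit_distance(("a", ()), ("b", ()), emb))  # sqrt(2): cheaper than delete + insert (2)
print(embedding_edit_distance(("a", (("c", ()),)), ("a", ()), emb))  # 5.0: delete the leaf c
```

Because every cost in this sketch is a Euclidean distance between vectors (with the origin standing in for a missing node), the costs are symmetric and obey the triangle inequality, which is one way to see how embedding-based costs can keep the resulting edit distance consistent with the metric axioms.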
Dimensionality Reduction Mappings
A wealth of powerful dimensionality reduction methods has been established for data visualization and preprocessing. These are accompanied by formal evaluation schemes, which allow a quantitative evaluation along general principles and which even lead to further visualization schemes based on these objectives. Most methods, however, only map a finite set of points given in advance, so additional steps are required for out-of-sample extension. We propose a general view on dimensionality reduction based on the concept of cost functions and, based on this general principle, extend dimensionality reduction to explicit mappings of the data manifold. This offers simple out-of-sample extensions. Further, it opens the way towards a theory of data visualization from the perspective of generalization to new data points. We demonstrate the approach with a simple global linear mapping as well as with prototype-based local linear mappings.
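The central move of this approach, treating the mapping itself as the object that optimizes a dimensionality reduction cost, can be sketched for the global linear case. This is an illustration under assumptions: the cost is metric MDS stress (the mean squared mismatch between pairwise distances before and after mapping), chosen only because its gradient is short to write down, and the function names `stress` and `fit_linear_map` are hypothetical. Once the matrix `W` is fitted, any new point `x` is embedded as `W @ x` without re-running the optimization, which is the out-of-sample property the abstract describes.

```python
import numpy as np


def stress(W, X):
    """Mean squared mismatch between pairwise distances before and after y = W x."""
    iu = np.triu_indices(len(X), k=1)
    dx = (X[:, None, :] - X[None, :, :])[iu]   # (m, d) pairwise differences
    target = np.linalg.norm(dx, axis=1)         # distances in input space
    mapped = np.linalg.norm(dx @ W.T, axis=1)   # distances after the mapping
    return float(np.mean((mapped - target) ** 2))


def fit_linear_map(X, dim=2, lr=0.05, steps=500, seed=0):
    """Plain gradient descent on the stress of an explicit linear map y = W x."""
    rng = np.random.default_rng(seed)
    iu = np.triu_indices(len(X), k=1)
    dx = (X[:, None, :] - X[None, :, :])[iu]
    target = np.linalg.norm(dx, axis=1)
    W = rng.normal(scale=0.1, size=(dim, X.shape[1]))  # small random start
    for _ in range(steps):
        y = dx @ W.T                                   # mapped pairwise differences
        r = np.linalg.norm(y, axis=1) + 1e-12          # avoid division by zero
        coef = 2.0 * (r - target) / r
        grad = (coef[:, None] * y).T @ dx / len(dx)    # gradient of the mean stress w.r.t. W
        W -= lr * grad
    return W


rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(3, 2)))  # orthonormal basis of a plane in R^3
X_train = rng.normal(size=(30, 2)) @ Q.T      # training points on that plane
W = fit_linear_map(X_train)
X_new = rng.normal(size=(5, 2)) @ Q.T         # unseen points
Y_new = X_new @ W.T                           # out-of-sample mapping, no refitting
print(stress(W, X_train))
```

The prototype-based local linear variant would replace the single `W` with several local matrices attached to prototypes; that extension is not shown here.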
Materials property prediction using symmetry-labeled graphs as atomic-position independent descriptors
Computational materials screening studies require fast calculation of the
properties of thousands of materials. The calculations are often performed with
Density Functional Theory (DFT), but the required computing time limits the
material space that can be investigated. Therefore, the development of machine
learning models for the prediction of DFT-calculated properties is currently of
interest. A particular challenge for new materials is that
the atomic positions are generally not known. We present a machine learning
model for the prediction of DFT-calculated formation energies based on Voronoi
quotient graphs and local symmetry classification without the need for detailed
information about atomic positions. The model is implemented as a message
passing neural network and tested on the Open Quantum Materials Database (OQMD)
and the Materials Project database. The test mean absolute error is 20 meV on
the OQMD database and 40 meV on the Materials Project database. The possibilities
for prediction in a realistic computational screening setting are investigated
on a dataset of 5976 ABSe selenides with very limited overlap with the OQMD
training set. Pretraining on OQMD and subsequent training on 100 selenides
results in a mean absolute error below 0.1 eV for the formation energy of the
selenides.
Comment: 14 pages including references and 13 figures
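The abstract specifies only that the model is a message passing neural network over Voronoi quotient graphs with local symmetry labels. As a rough illustration of what such a model computes, the NumPy sketch below runs a few rounds of neighbour message passing and averages the node states into one scalar prediction. Every design detail is an assumption rather than the paper's architecture: the one-hot label encoding, sum aggregation, tanh updates, mean readout, and the hypothetical names `mpnn_energy` and `init_params`. The weights are random and untrained. The inputs are only connectivity and site labels, with no atomic coordinates.

```python
import numpy as np


def mpnn_energy(node_feats, edges, params, n_steps=3):
    """Predict one scalar (e.g. a formation energy) for a single graph.

    node_feats: (n, f) array, e.g. one-hot symmetry labels per site (assumed encoding).
    edges: list of (i, j) pairs; each adds one message in each direction, so
           repeated pairs (multi-edges of a quotient graph) accumulate.
    params: (W_in, W_self, W_msg, w_out) weight arrays.
    """
    W_in, W_self, W_msg, w_out = params
    n = node_feats.shape[0]
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] += 1.0
        A[j, i] += 1.0
    h = np.tanh(node_feats @ W_in)              # initial hidden state per site
    for _ in range(n_steps):
        messages = A @ (h @ W_msg)              # sum of transformed neighbour states
        h = np.tanh(h @ W_self + messages)      # update every site
    return float(h.mean(axis=0) @ w_out)        # mean readout: independent of site order


def init_params(f, k=8, seed=0):
    """Random, untrained weights for f input features and k hidden units."""
    rng = np.random.default_rng(seed)
    return (rng.normal(scale=0.5, size=(f, k)),
            rng.normal(scale=0.5, size=(k, k)),
            rng.normal(scale=0.5, size=(k, k)),
            rng.normal(scale=0.5, size=k))


feats = np.eye(3)[[0, 1, 1, 2]]                   # 4 sites, 3 hypothetical symmetry labels
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 1)]  # (0, 1) repeated: a multi-edge
params = init_params(f=3)
print(mpnn_energy(feats, edges, params))
```

Fitting `params` to DFT formation energies (for instance pretraining on OQMD and then continuing on a small set of selenides, as the abstract describes) would in practice be done with an automatic-differentiation framework; the mean readout keeps the prediction independent of how the sites are numbered.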