Tree Edit Distance Learning via Adaptive Symbol Embeddings
Metric learning aims to improve classification accuracy by learning a
distance measure which brings data points from the same class closer together
and pushes data points from different classes further apart. Recent research
has demonstrated that metric learning approaches can also be applied to trees,
such as molecular structures, abstract syntax trees of computer programs, or
syntax trees of natural language, by learning the cost function of an edit
distance, i.e. the costs of replacing, deleting, or inserting nodes in a tree.
However, learning such costs directly may yield an edit distance which violates
metric axioms, is challenging to interpret, and may not generalize well. In
this contribution, we propose a novel metric learning approach for trees which
we call embedding edit distance learning (BEDL) and which learns an edit
distance indirectly by embedding the node labels (symbols) as vectors, such that the
Euclidean distance between those vectors supports class discrimination. We
learn such embeddings by reducing the distance to prototypical trees from the
same class and increasing the distance to prototypical trees from different
classes. In our experiments, we show that BEDL improves upon the
state of the art in metric learning for trees on six benchmark data sets,
ranging from computer science and biomedical data to a natural-language
processing data set containing over 300,000 nodes.
Comment: Paper at the International Conference on Machine Learning (2018),
2018-07-10 to 2018-07-15 in Stockholm, Sweden
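To make the idea of an edit distance induced by symbol embeddings concrete, here is a minimal Python sketch. It computes an ordered tree edit distance with the classic memoized forest recursion, taking the replacement cost of two labels to be the Euclidean distance between their embeddings. Two choices are assumptions, not details given in the abstract: deletion and insertion are charged as the distance of the embedding to the origin (as if the gap symbol sat at zero), and `relative_distance` uses a GLVQ-style ratio (d_plus - d_minus) / (d_plus + d_minus) as one way to express "closer to same-class prototypes, farther from other-class prototypes". The table `EMBEDDING` and all function names are hypothetical, and the step that actually learns the embeddings is not shown.

```python
import math
from functools import lru_cache

# Trees are (label, children) pairs; children is a tuple of trees.
# A forest is a tuple of trees. Tuples are hashable, so results can be memoized.

# Hypothetical 2-D embeddings; BEDL would learn these, here they are fixed.
EMBEDDING = {
    "a": (1.0, 0.0),
    "b": (0.0, 1.0),
    "c": (1.0, 1.0),
}


def rep_cost(x, y):
    """Replacement cost: Euclidean distance between the two symbol embeddings."""
    return math.dist(EMBEDDING[x], EMBEDDING[y])


def gap_cost(x):
    """Deletion/insertion cost (assumption): distance of the embedding to the origin."""
    return math.hypot(*EMBEDDING[x])


@lru_cache(maxsize=None)
def forest_dist(f, g):
    """Ordered forest edit distance via rightmost-root decomposition."""
    if not f and not g:
        return 0.0
    if not g:  # only deletions remain
        label, kids = f[-1]
        return forest_dist(f[:-1] + kids, g) + gap_cost(label)
    if not f:  # only insertions remain
        label, kids = g[-1]
        return forest_dist(f, g[:-1] + kids) + gap_cost(label)
    (fl, fk), (gl, gk) = f[-1], g[-1]
    return min(
        forest_dist(f[:-1] + fk, g) + gap_cost(fl),  # delete rightmost root of f
        forest_dist(f, g[:-1] + gk) + gap_cost(gl),  # insert rightmost root of g
        # replace fl by gl, then match their subtrees and the remaining forests
        forest_dist(fk, gk) + forest_dist(f[:-1], g[:-1]) + rep_cost(fl, gl),
    )


def tree_edit_distance(t1, t2):
    return forest_dist((t1,), (t2,))


def relative_distance(tree, label, prototypes):
    """GLVQ-style score (assumption): negative when the closest same-class
    prototype is nearer than the closest other-class prototype."""
    d_plus = min(tree_edit_distance(tree, p) for p, y in prototypes if y == label)
    d_minus = min(tree_edit_distance(tree, p) for p, y in prototypes if y != label)
    return (d_plus - d_minus) / (d_plus + d_minus)
```

Because every cost is a Euclidean distance between points (with the gap at the origin), the per-symbol costs are symmetric and obey the triangle inequality by construction. Since `forest_dist` is cached, the cache would have to be cleared with `forest_dist.cache_clear()` whenever the embeddings change, for example during training.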
Latent Tree Learning with Differentiable Parsers: Shift-Reduce Parsing and Chart Parsing
Latent tree learning models represent sentences by composing their words
according to a parse tree induced solely from a downstream task. These
models often outperform baselines which use (externally provided) syntax trees
to drive the composition order. This work contributes (a) a new latent tree
learning model based on shift-reduce parsing, with competitive downstream
performance and non-trivial induced trees, and (b) an analysis of the trees
learned by our shift-reduce model and by a chart-based model.
Comment: ACL 2018 workshop on Relevance of Linguistic Structure in Neural
Architectures for NLP
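The composition step of a shift-reduce model can be illustrated with a hard, non-differentiable pass in Python: SHIFT moves the next word from the buffer onto the stack, and REDUCE pops two items and pushes their composition. The composition function `compose` (an elementwise tanh of the sum) and the explicit action list are hypothetical. A latent tree learning model would instead predict the actions and relax them so that gradients from the downstream task can flow; this sketch does not attempt that.

```python
import math


def compose(left, right):
    """Hypothetical composition function: elementwise tanh of the sum."""
    return tuple(math.tanh(a + b) for a, b in zip(left, right))


def shift_reduce(words, vectors, actions):
    """Build a binary tree and its vector by following SHIFT/REDUCE actions.

    Returns (tree, vector) for the single item left on the stack.
    """
    buffer = list(zip(words, vectors))
    stack = []
    for action in actions:
        if action == "SHIFT":
            if not buffer:
                raise ValueError("SHIFT with an empty buffer")
            stack.append(buffer.pop(0))
        elif action == "REDUCE":
            if len(stack) < 2:
                raise ValueError("REDUCE needs two items on the stack")
            right_tree, right_vec = stack.pop()
            left_tree, left_vec = stack.pop()
            stack.append(((left_tree, right_tree), compose(left_vec, right_vec)))
        else:
            raise ValueError(f"unknown action: {action}")
    if buffer or len(stack) != 1:
        raise ValueError("actions do not produce a single tree")
    return stack[0]
```

A valid sequence over n words has n SHIFTs and n - 1 REDUCEs and never reduces with fewer than two items on the stack; different valid sequences yield different binary trees, and hence different composition orders, over the same words.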
On Probability Distributions for Trees: Representations, Inference and Learning
We study probability distributions over free algebras of trees. Probability
distributions can be seen as particular (formal power) tree series [Berstel et
al 82, Esik et al 03], i.e. mappings from trees to a semiring K. A widely
studied class of tree series is the class of rational (or recognizable) tree
series which can be defined either in an algebraic way or by means of
multiplicity tree automata. We argue that the algebraic representation is very
convenient to model probability distributions over a free algebra of trees.
First, as in the string case, the algebraic representation makes it possible to design
learning algorithms for the whole class of probability distributions defined by
rational tree series. Note that learning algorithms for rational tree series
correspond to learning algorithms for weighted tree automata where both the
structure and the weights are learned. Second, the algebraic representation can
be easily extended to deal with unranked trees (like XML trees where a symbol
may have an unbounded number of children). Both properties are particularly
relevant for applications: nondeterministic automata are required for the
inference problem to be relevant (recall that Hidden Markov Models are
equivalent to nondeterministic string automata); modern applications in Web
Information Extraction, Web Services, and document processing operate on unranked
trees.
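For binary trees, a linear (algebraic) representation of a tree series can be sketched as follows: each leaf symbol maps to a vector in a d-dimensional space, each binary symbol maps to a d-by-d-by-d tensor that combines the vectors of its two children, and the value of a tree is the dot product of a final vector with the vector computed at the root. The symbols `a` and `g`, the weights, and the function names are hypothetical. With d = 1 and the weights below, the series is the probabilistic grammar "leaf `a` with probability 0.6, node `g(., .)` with probability 0.4"; a larger d plays the role of the multiple states of a nondeterministic weighted tree automaton. The unranked extension discussed above is not covered.

```python
def evaluate(tree, leaf_vec, binary_tensor):
    """Return mu(tree): the vector the linear representation assigns to tree."""
    symbol, children = tree
    if not children:
        return leaf_vec[symbol]
    if len(children) != 2:
        raise ValueError("this sketch handles only leaves and binary symbols")
    left = evaluate(children[0], leaf_vec, binary_tensor)
    right = evaluate(children[1], leaf_vec, binary_tensor)
    tensor = binary_tensor[symbol]  # tensor[i][j][k]: output i from children j, k
    return [
        sum(tensor[i][j][k] * left[j] * right[k]
            for j in range(len(left)) for k in range(len(right)))
        for i in range(len(tensor))
    ]


def series_value(tree, leaf_vec, binary_tensor, final):
    """r(tree) = final . mu(tree)."""
    return sum(w * x for w, x in zip(final, evaluate(tree, leaf_vec, binary_tensor)))


# d = 1 example: leaf "a" with weight 0.6, binary node "g" with weight 0.4
LEAF = {"a": [0.6]}
BINARY = {"g": [[[0.4]]]}
FINAL = [1.0]
```

Here `g(a, a)` receives 0.4 * 0.6 * 0.6 = 0.144. The values over all finite trees sum to 1, because the total mass p satisfies p = 0.6 + 0.4 p^2, whose smallest nonnegative root is p = 1; this is what makes the series a probability distribution rather than an arbitrary weighting.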
Automatically combining static malware detection techniques
Malware detection techniques come in many flavors and make different trade-offs between effectiveness and efficiency. This paper evaluates a number of machine learning techniques for combining multiple static Android malware detection techniques using automatically constructed decision trees. We identify the best methods to construct the trees. We demonstrate that those trees classify sample apps more accurately and faster than the individual techniques alone.
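The abstract does not say which learner builds the trees, so the following is only a toy sketch: a small CART-style learner (Gini impurity, binary splits) that treats each static detector's verdict as a boolean feature and learns how to combine them. The detector names, the labeling rule, and the data are hypothetical and synthetic; they are not results from the paper.

```python
from collections import Counter
from itertools import product


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, features, depth=0, max_depth=3):
    """Greedy CART-style tree over boolean features.

    A node is either a class label (leaf) or (feature, subtree_if_false, subtree_if_true).
    """
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1 or not features:
        return majority
    best = None
    for f in features:
        neg = [y for x, y in zip(rows, labels) if not x[f]]
        pos = [y for x, y in zip(rows, labels) if x[f]]
        if not neg or not pos:
            continue  # this split would not separate anything
        score = (len(neg) * gini(neg) + len(pos) * gini(pos)) / len(labels)
        if best is None or score < best[0]:
            best = (score, f)
    if best is None:
        return majority
    f = best[1]
    rest = [g for g in features if g != f]
    branches = []
    for value in (False, True):
        idx = [i for i, x in enumerate(rows) if bool(x[f]) == value]
        branches.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                   rest, depth + 1, max_depth))
    return (f, branches[0], branches[1])


def classify(node, row):
    """Follow the tree from the root down to a leaf label."""
    while isinstance(node, tuple):
        f, if_false, if_true = node
        node = if_true if row[f] else if_false
    return node


# Synthetic example: verdicts of three hypothetical static detectors per app.
DETECTORS = ["permissions", "api_calls", "signature"]
rows = [dict(zip(DETECTORS, bits)) for bits in product([False, True], repeat=3)]
labels = ["malware" if r["signature"] or (r["permissions"] and r["api_calls"]) else "benign"
          for r in rows]
tree = build_tree(rows, labels, DETECTORS)
```

On this synthetic data the learned tree splits on `signature` first and reproduces the labeling rule on all eight verdict combinations. Classifying an app only requires running the detectors on one root-to-leaf path, which is one plausible way a combined tree can also be faster than running every technique.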
- …