Structured Prediction of Sequences and Trees using Infinite Contexts
Linguistic structures exhibit a rich array of global phenomena; however,
commonly used Markov models are unable to describe these phenomena adequately
because of their strong locality assumptions. We propose a novel hierarchical model
for structured prediction over sequences and trees which exploits global
context by conditioning each generation decision on an unbounded context of
prior decisions. This builds on the success of Markov models but without
imposing a fixed bound in order to better represent global phenomena. To
facilitate learning of this large and unbounded model, we use a hierarchical
Pitman-Yor process prior which provides a recursive form of smoothing. We
propose prediction algorithms based on A* and Markov Chain Monte Carlo
sampling. Empirical results demonstrate the potential of our model compared to
baseline finite-context Markov models on part-of-speech tagging and syntactic
parsing.
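
To make the recursive smoothing concrete, here is a minimal sketch of an unbounded-context model over a suffix trie, smoothed with the standard hierarchical Pitman-Yor predictive rule (Teh, 2006). It is an illustration, not the paper's inference procedure: the discount and concentration values are arbitrary, the table counts are simplified to one table per symbol type, and the A*/MCMC prediction algorithms are not shown.

from collections import defaultdict

D, THETA = 0.8, 1.0   # per-level discount and concentration (illustrative values)

class ContextNode:
    def __init__(self):
        self.counts = defaultdict(int)  # c_uw: observations of each symbol w
        self.children = {}              # one-symbol-longer contexts

root = ContextNode()  # the empty context

def observe(context, symbol):
    # Record the observation under the empty context and under every suffix
    # of `context` (most recent symbol last), growing the trie as needed.
    # (A full HPY sampler would only propagate a count to the shorter
    # context when a new table opens; counting at every level is a
    # Kneser-Ney-style simplification.)
    node = root
    node.counts[symbol] += 1
    for s in reversed(context):
        node = node.children.setdefault(s, ContextNode())
        node.counts[symbol] += 1

def prob(context, symbol, vocab_size):
    # Recursive Pitman-Yor smoothing: each context interpolates its own
    # discounted counts with the prediction of the next-shorter context,
    # bottoming out in a uniform distribution over the vocabulary.
    p = 1.0 / vocab_size
    node = root
    for depth in range(len(context) + 1):
        c_u = sum(node.counts.values())
        if c_u > 0:
            c_uw = node.counts[symbol]
            t_uw = min(1, c_uw)         # simplified: one table per type
            t_u = sum(min(1, c) for c in node.counts.values())
            p = (max(c_uw - D * t_uw, 0.0) + (THETA + D * t_u) * p) / (THETA + c_u)
        if depth == len(context):
            break
        s = context[-1 - depth]
        if s not in node.children:
            break                       # no statistics for any longer context
        node = node.children[s]
    return p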
Spectral Analysis of Kernel and Neural Embeddings: Optimization and Generalization
We extend the recent results of (Arora et al., 2019) by a spectral analysis of
the representations corresponding to the kernel and neural embeddings. They
showed that in a simple single-layer network, the alignment of the labels to
the eigenvectors of the corresponding Gram matrix determines both the
convergence of the optimization during training as well as the generalization
properties. We generalize their result to kernel and neural representations and
show that these extensions improve both the optimization and the generalization
of the basic setup studied in (Arora et al., 2019). In particular, we first extend the
setup with the Gaussian kernel and the approximations by random Fourier
features as well as with the embeddings produced by two-layer networks trained
on different tasks. We then study the use of more sophisticated kernels and
embeddings, those designed optimally for deep neural networks and those
developed for the classification task of interest given the data and the
training labels, independent of any specific classification model.
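
The alignment quantity at the heart of this analysis is easy to compute. The sketch below (the Gaussian kernel, function names, and toy data are illustrative assumptions) builds the Gram matrix, eigendecomposes it, and measures the squared projections $(v_i^\top y)^2$ of the labels onto its eigenvectors; in the two-layer setup of Arora et al. (2019), training error decays roughly as $\sum_i (1-\eta\lambda_i)^{2t}(v_i^\top y)^2$, so label energy concentrated on the top eigendirections means faster optimization and better generalization.

import numpy as np

def gram_gaussian(X, bandwidth=1.0):
    # Gaussian-kernel Gram matrix K_ij = exp(-||x_i - x_j||^2 / (2 s^2)).
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-d2 / (2 * bandwidth**2))

def label_alignment(K, y):
    # Return the spectrum of K (descending) and the squared projections
    # (v_i^T y)^2 of the label vector onto each eigenvector.
    lam, V = np.linalg.eigh(K)          # eigh returns ascending eigenvalues
    lam, V = lam[::-1], V[:, ::-1]      # sort descending
    proj = (V.T @ y)**2
    return lam, proj

# Toy usage: two Gaussian blobs with +/-1 labels.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.5, (50, 2)), rng.normal(+1, 0.5, (50, 2))])
y = np.concatenate([-np.ones(50), np.ones(50)])
lam, proj = label_alignment(gram_gaussian(X), y)
print("fraction of label energy in top 5 eigendirections:",
      proj[:5].sum() / proj.sum())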
Active classification with comparison queries
We study an extension of active learning in which the learning algorithm may
ask the annotator to compare the distances of two examples from the boundary of
their label-class. For example, in a recommendation system application (say for
restaurants), the annotator may be asked whether she liked or disliked a
specific restaurant (a label query); or which of two restaurants she liked more
(a comparison query).
We focus on the class of half spaces, and show that under natural
assumptions, such as large margin or bounded bit-description of the input
examples, it is possible to reveal all the labels of a sample of size $n$ using
approximately $O(\log n)$ queries. This implies an exponential improvement over
classical active learning, where only label queries are allowed. We complement
these results by showing that if any of these assumptions is removed then, in
the worst case, $\Omega(n)$ queries are required.
Our results follow from a new general framework of active learning with
additional queries. We identify a combinatorial dimension, called the
\emph{inference dimension}, that captures the query complexity when each
additional query is determined by $O(1)$ examples (such as comparison queries,
each of which is determined by the two compared examples). Our results for half
spaces follow by bounding the inference dimension in the cases discussed above.
Comment: 23 pages (not including references), 1 figure. The new version
contains a minor fix in the proof of Lemma 4.
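
A toy sketch can show the inference pattern behind these bounds, under strong simplifying assumptions: the comparison oracle is taken to return sign(<w,x_i> - <w,x_j>) for a hidden halfspace with no points on the boundary, and the sorting-based procedure below is illustrative rather than the paper's algorithm (it spends $O(n \log n)$ comparison queries, whereas the inference-dimension argument achieves $O(\log n)$ queries in total under the stated assumptions). The point is only that comparisons order the sample so that $O(\log n)$ label queries determine all $n$ labels.

import functools
import numpy as np

def reveal_labels(n, label_query, comparison_query):
    # Sort the sample by its hidden projection w.x using comparison queries,
    # then binary-search the single sign flip with O(log n) label queries;
    # every remaining label is inferred from the sorted order.
    order = sorted(range(n), key=functools.cmp_to_key(comparison_query))
    lo, hi = 0, n                       # first +1 label lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if label_query(order[mid]) > 0:
            hi = mid
        else:
            lo = mid + 1
    labels = np.empty(n, dtype=int)
    for rank, idx in enumerate(order):
        labels[idx] = -1 if rank < lo else +1
    return labels

# Toy oracles for a hidden halfspace h(x) = sign(w.x).
rng = np.random.default_rng(1)
w = rng.normal(size=5)
X = rng.normal(size=(100, 5))
labels = reveal_labels(
    len(X),
    label_query=lambda i: np.sign(X[i] @ w),
    comparison_query=lambda i, j: int(np.sign(X[i] @ w - X[j] @ w)))
assert (labels == np.sign(X @ w)).all()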