Hyperspherical Prototype Networks
This paper introduces hyperspherical prototype networks, which unify
classification and regression with prototypes on hyperspherical output spaces.
For classification, a common approach is to define prototypes as the mean
output vector over training examples per class. Here, we propose to use
hyperspheres as output spaces, with class prototypes defined a priori with
large margin separation. We position prototypes through data-independent
optimization, with an extension to incorporate priors from class semantics. By
doing so, we do not require any prototype updating, we can handle any training
size, and the output dimensionality is no longer constrained to the number of
classes. Furthermore, we generalize to regression, by optimizing outputs as an
interpolation between two prototypes on the hypersphere. Since both tasks are
now defined by the same loss function, they can be jointly trained for
multi-task problems. Experimentally, we show the benefit of hyperspherical
prototype networks for classification, regression, and their combination over
other prototype methods, softmax cross-entropy, and mean squared error
approaches.
Comment: NeurIPS 201
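The data-independent placement step described above can be sketched roughly as follows. This is an illustrative toy, not the authors' code: function and parameter names are invented, and the repulsion loss is just one plausible way to push prototypes toward large-margin separation on the unit hypersphere.

```python
import numpy as np

def place_prototypes(num_classes, dim, steps=2000, lr=0.05, seed=0):
    """Illustrative sketch: spread class prototypes on the unit
    hypersphere by gradient descent on a pairwise repulsion loss
    sum_{i != j} exp(p_i . p_j), then renormalize each step.
    Data-independent: only num_classes and dim are needed."""
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((num_classes, dim))
    P /= np.linalg.norm(P, axis=1, keepdims=True)
    for _ in range(steps):
        S = P @ P.T                     # pairwise cosine similarities
        np.fill_diagonal(S, -np.inf)    # ignore self-similarity
        W = np.exp(S)                   # emphasize the closest pairs
        P -= lr * (W @ P)               # repulsion gradient step
        P /= np.linalg.norm(P, axis=1, keepdims=True)
    return P
```

For the regression extension, an output would then be compared against a normalized interpolation between two such prototypes; only the placement step is sketched here.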
Neural Metric Learning for Fast End-to-End Relation Extraction
Relation extraction (RE) is an indispensable information extraction task in
several disciplines. RE models typically assume that named entity recognition
(NER) is already performed in a previous step by another independent model.
Several recent efforts, under the theme of end-to-end RE, seek to exploit
inter-task correlations by modeling both NER and RE tasks jointly. Earlier work
in this area commonly reduces the task to a table-filling problem wherein an
additional expensive decoding step involving beam search is applied to obtain
globally consistent cell labels. In efforts that do not employ table-filling,
global optimization in the form of CRFs with Viterbi decoding for the NER
component is still necessary for competitive performance. We introduce a novel
neural architecture utilizing the table structure, based on repeated
applications of 2D convolutions for pooling local dependency and metric-based
features, that improves on the state-of-the-art without the need for global
optimization. We validate our model on the ADE and CoNLL04 datasets for
end-to-end RE and demonstrate a gain (in F-score) over prior best
results, with training and testing times that are seven to ten times faster;
the latter is highly advantageous for time-sensitive end-user applications.
Word, graph and manifold embedding from Markov processes
Continuous vector representations of words and objects appear to carry
surprisingly rich semantic content. In this paper, we advance both the
conceptual and theoretical understanding of word embeddings in three ways.
First, we ground embeddings in semantic spaces studied in
cognitive-psychometric literature and introduce new evaluation tasks. Second,
in contrast to prior work, we take metric recovery as the key object of study,
unify existing algorithms as consistent metric recovery methods based on
co-occurrence counts from simple Markov random walks, and propose a new
recovery algorithm. Third, we generalize metric recovery to graphs and
manifolds, relating co-occurrence counts on random walks in graphs and random
processes on manifolds to the underlying metric to be recovered, thereby
reconciling manifold estimation and embedding algorithms. We compare embedding
algorithms across a range of tasks, from nonlinear dimensionality reduction to
three semantic language tasks, including analogies, sequence completion, and
classification.
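The connection between random-walk co-occurrence counts and the underlying metric can be illustrated on a toy graph. This is a hypothetical sketch, not the paper's recovery algorithm: nodes that are close in the graph co-occur more often within a short window of a simple random walk, so co-occurrence counts carry metric information.

```python
import numpy as np

def cooccurrence_from_walk(adj, walk_len=50000, window=3, seed=0):
    """Count co-occurrences of node pairs appearing within `window`
    steps of each other along one simple random walk on the graph
    given by 0/1 adjacency matrix `adj` (with +1 smoothing)."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    neighbors = [np.flatnonzero(adj[i]) for i in range(n)]
    walk = np.empty(walk_len, dtype=int)
    walk[0] = 0
    for t in range(1, walk_len):
        walk[t] = rng.choice(neighbors[walk[t - 1]])
    counts = np.ones((n, n))            # +1 smoothing
    for t in range(walk_len - window):
        for d in range(1, window + 1):
            a, b = walk[t], walk[t + d]
            counts[a, b] += 1
            counts[b, a] += 1
    return counts
```

On an 8-node cycle, adjacent nodes co-occur far more often than nodes four steps apart, which a metric recovery method can exploit.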
On Approximation Guarantees for Greedy Low Rank Optimization
We provide new approximation guarantees for greedy low rank matrix estimation
under standard assumptions of restricted strong convexity and smoothness. Our
novel analysis also uncovers previously unknown connections between the low
rank estimation and combinatorial optimization, so much so that our bounds are
reminiscent of corresponding approximation bounds in submodular maximization.
Additionally, we also provide statistical recovery guarantees. Finally, we
present empirical comparison of greedy estimation with established baselines on
two important real-world problems.
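A minimal sketch of greedy low rank estimation in the spirit described above (names and setup are illustrative; the paper's guarantees concern restricted strongly convex and smooth losses): each greedy step adds the top singular pair of the negative gradient, with an exact line search for the step size.

```python
import numpy as np

def greedy_low_rank(Y, mask, rank):
    """Greedy rank-one pursuit for masked matrix estimation:
    approximately minimize 0.5 * || mask * (X - Y) ||_F^2 by adding
    one rank-one term per iteration."""
    X = np.zeros_like(Y)
    for _ in range(rank):
        G = mask * (X - Y)               # gradient of the loss
        U, s, Vt = np.linalg.svd(-G)
        D = np.outer(U[:, 0], Vt[0])     # top rank-one direction
        num = -np.sum(G * D)
        den = np.sum(mask * D * D) + 1e-12
        X = X + (num / den) * D          # exact line search
    return X
```

With a fully observed mask this reduces to truncated SVD, so a rank-2 matrix is recovered exactly in two greedy steps.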
Learning Cross-lingual Embeddings from Twitter via Distant Supervision
Cross-lingual embeddings represent the meaning of words from different
languages in the same vector space. Recent work has shown that it is possible
to construct such representations by aligning independently learned monolingual
embedding spaces, and that accurate alignments can be obtained even without
external bilingual data. In this paper we explore a research direction that has
been surprisingly neglected in the literature: leveraging noisy user-generated
text to learn cross-lingual embeddings particularly tailored towards social
media applications. While the noisiness and informal nature of the social media
genre poses additional challenges to cross-lingual embedding methods, we find
that it also provides key opportunities due to the abundance of code-switching
and the existence of a shared vocabulary of emoji and named entities. Our
contribution consists of a very simple post-processing step that exploits these
phenomena to significantly improve the performance of state-of-the-art
alignment methods.
Comment: Accepted to ICWSM 2020. 11 pages, 1 appendix. Pre-trained embeddings
available at https://github.com/pedrada88/crossembeddings-twitte
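A standard way to align independently learned monolingual spaces, in the family of alignment methods this paper post-processes, is orthogonal Procrustes over anchor pairs (e.g. shared emoji or named entities). A minimal sketch, assuming the anchor embeddings are already row-aligned across the two spaces:

```python
import numpy as np

def procrustes_align(X_src, X_tgt):
    """Learn an orthogonal map W minimizing ||X_src @ W - X_tgt||_F
    over row-aligned anchor embeddings, via SVD (orthogonal
    Procrustes). Rows of X_src and X_tgt are paired anchors."""
    U, _, Vt = np.linalg.svd(X_src.T @ X_tgt)
    return U @ Vt
```

Because W is constrained to be orthogonal, the map preserves distances within the source space; when the target space is an exact rotation of the source, it is recovered perfectly.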
Adversarial Gain
Adversarial examples can be defined as inputs to a model which induce a
mistake, where the model output differs from that of an oracle, perhaps
in surprising or malicious ways. Original models of adversarial attacks are
primarily studied in the context of classification and computer vision tasks.
While several attacks have been proposed in natural language processing (NLP)
settings, they often vary in defining the parameters of an attack and what a
successful attack would look like. The goal of this work is to propose a
unifying model of adversarial examples suitable for NLP tasks in both
generative and classification settings. We define the notion of adversarial
gain: based in control theory, it is a measure of the change in the output of a
system relative to the perturbation of the input (caused by the so-called
adversary) presented to the learner. This definition, as we show, can be used
under different feature spaces and distance conditions to determine attack or
defense effectiveness across different intuitive manifolds. This notion of
adversarial gain not only provides a useful way for evaluating adversaries and
defenses, but can act as a building block for future work in robustness under
adversaries due to its rooted nature in stability and manifold theory.
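The definition above suggests a direct computation. The following is a hedged sketch, not the paper's implementation: the model and the input/output distance functions are hypothetical placeholders supplied by the evaluator.

```python
import numpy as np

def adversarial_gain(model, x, x_adv, d_in, d_out):
    """Gain of a perturbation: change in the model's output relative
    to the size of the input perturbation, under evaluator-chosen
    input and output distances d_in and d_out."""
    num = d_out(model(x), model(x_adv))
    den = d_in(x, x_adv) + 1e-12        # guard against zero perturbation
    return num / den
```

For a linear model the gain under Euclidean distances is bounded by the model's operator norm, which matches the control-theoretic intuition of gain as a stability measure.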
Unsupervised Inductive Graph-Level Representation Learning via Graph-Graph Proximity
We introduce a novel approach to graph-level representation learning, which
is to embed an entire graph into a vector space where the embeddings of two
graphs preserve their graph-graph proximity. Our approach, UGRAPHEMB, is a
general framework that provides a novel means of performing graph-level
embedding in a completely unsupervised and inductive manner. The learned neural
network can be considered as a function that receives any graph as input,
either seen or unseen in the training set, and transforms it into an embedding.
A novel graph-level embedding generation mechanism called Multi-Scale Node
Attention (MSNA) is proposed. Experiments on five real graph datasets show
that UGRAPHEMB achieves competitive accuracy in the tasks of graph
classification, similarity ranking, and graph visualization.
Comment: IJCAI 2019 camera-ready version with supplementary material
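A hypothetical sketch of multi-scale attentive pooling in the spirit of MSNA (this is not the paper's exact mechanism; the function name, the scale set, and the random context vector are placeholders for learned components): node features are smoothed to several neighborhood scales, attention-pooled at each scale, and concatenated into one graph-level embedding.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def multiscale_attention_pool(H, A, scales=(0, 1, 2), seed=0):
    """Pool node features H (n x d) of a graph with adjacency A into
    a single vector: at each scale, attention over nodes against a
    context vector q produces a weighted sum; summaries are
    concatenated. q is random here; it would be learned in practice."""
    rng = np.random.default_rng(seed)
    n, d = H.shape
    P = A / np.maximum(A.sum(axis=1, keepdims=True), 1)  # random-walk matrix
    q = rng.standard_normal(d)          # placeholder context vector
    parts = []
    Hs = H.copy()
    for k in range(max(scales) + 1):
        if k in scales:
            w = softmax(Hs @ q)         # attention weights over nodes
            parts.append(w @ Hs)        # scale-k graph summary
        Hs = P @ Hs                     # smooth features to next scale
    return np.concatenate(parts)
```

Because the pooling depends only on the graph passed in, the same function maps any graph, seen or unseen, to a fixed-length embedding, matching the inductive setting described above.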
Representing Sets as Summed Semantic Vectors
Representing meaning in the form of high dimensional vectors is a common and
powerful tool in biologically inspired architectures. While the meaning of a
set of concepts can be summarized by taking a (possibly weighted) sum of their
associated vectors, this has generally been treated as a one-way operation. In
this paper we show how a technique built to aid sparse vector decomposition
allows in many cases the exact recovery of the inputs and weights to such a
sum, allowing a single vector to represent an entire set of vectors from a
dictionary. We characterize the number of vectors that can be recovered under
various conditions, and explore several ways such a tool can be used for
vector-based reasoning.
Comment: In Biologically Inspired Cognitive Architectures 201
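When the dictionary vectors are linearly independent, recovering the inputs and weights of such a sum reduces to least squares. A minimal illustration of the one-way-operation point above (the paper's sparse-decomposition technique addresses harder regimes; names here are invented):

```python
import numpy as np

def recover_weights(dictionary, summed, tol=1e-8):
    """Recover the weights of a weighted sum of dictionary vectors
    from the single summed vector via least squares; exact when the
    dictionary rows are linearly independent. Weights below `tol`
    are zeroed to expose the recovered support."""
    w, *_ = np.linalg.lstsq(dictionary.T, summed, rcond=None)
    w[np.abs(w) < tol] = 0.0
    return w
```

With high-dimensional random vectors, a single summed vector thus determines both which dictionary items were added and with what weights.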
Interactions of Computational Complexity Theory and Mathematics
[This paper is a (self contained) chapter in a new book, Mathematics and
Computation, whose draft is available on my homepage at
https://www.math.ias.edu/avi/book ].
We survey some concrete interaction areas between computational complexity
theory and different fields of mathematics. We hope to demonstrate here that
hardly any area of modern mathematics is untouched by the computational
connection (which in some cases is completely natural and in others may seem
quite surprising). In my view, the breadth, depth, beauty and novelty of these
connections are inspiring, and speak to a great potential of future
interactions (which indeed, are quickly expanding). We aim for variety. We give
short, simple descriptions (without proofs or much technical detail) of ideas,
motivations, results and connections; this will hopefully entice the reader to
dig deeper. Each vignette focuses only on a single topic within a large
mathematical field. We cover the following:
Number Theory: Primality testing
Combinatorial Geometry: Point-line incidences
Operator Theory: The Kadison-Singer problem
Metric Geometry: Distortion of embeddings
Group Theory: Generation and random generation
Statistical Physics: Monte-Carlo Markov chains
Analysis and Probability: Noise stability
Lattice Theory: Short vectors
Invariant Theory: Actions on matrix tuples
Comment: 27 pages
The Order Dimension of the Poset of Regions in a Hyperplane Arrangement
We show that the order dimension of the weak order on a Coxeter group of type
A, B or D is equal to the rank of the Coxeter group, and give bounds on the
order dimensions for the other finite types. This result arises from a unified
approach which, in particular, leads to a simpler treatment of the previously
known cases, types A and B. The result for weak orders follows from an upper
bound on the dimension of the poset of regions of an arbitrary hyperplane
arrangement. In some cases, including the weak orders, the upper bound is the
chromatic number of a certain graph. For the weak orders, this graph has the
positive roots as its vertex set, and the edges are related to the pairwise
inner products of the roots.
Comment: Minor changes, including a correction and an added figure in the
proof of Proposition 2.2. 19 pages, 6 figures