
    Hyperspherical Prototype Networks

    This paper introduces hyperspherical prototype networks, which unify classification and regression with prototypes on hyperspherical output spaces. For classification, a common approach is to define prototypes as the mean output vector over training examples per class. Here, we propose to use hyperspheres as output spaces, with class prototypes defined a priori with large margin separation. We position prototypes through data-independent optimization, with an extension to incorporate priors from class semantics. By doing so, we do not require any prototype updating, we can handle any training size, and the output dimensionality is no longer constrained to the number of classes. Furthermore, we generalize to regression by optimizing outputs as an interpolation between two prototypes on the hypersphere. Since both tasks are now defined by the same loss function, they can be jointly trained for multi-task problems. Experimentally, we show the benefit of hyperspherical prototype networks for classification, regression, and their combination over other prototype methods, softmax cross-entropy, and mean squared error approaches. Comment: NeurIPS 2019.
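    To make the data-independent prototype placement concrete, here is a minimal sketch assuming a max-cosine-separation objective; the function name and hyperparameters are illustrative assumptions, not the authors' released code:
```python
import torch
import torch.nn.functional as F

def separated_prototypes(num_classes: int, dim: int, steps: int = 1000, lr: float = 0.1):
    """Place num_classes prototypes on the unit sphere in R^dim with large margins."""
    protos = torch.randn(num_classes, dim, requires_grad=True)
    opt = torch.optim.SGD([protos], lr=lr, momentum=0.9)
    eye = torch.eye(num_classes)
    for _ in range(steps):
        p = F.normalize(protos, dim=1)        # project onto the hypersphere
        cos = p @ p.t() - 2.0 * eye           # pairwise cosines; mask self-pairs
        loss = cos.max(dim=1).values.mean()   # push each prototype's nearest neighbor away
        opt.zero_grad()
        loss.backward()
        opt.step()
    return F.normalize(protos.detach(), dim=1)

# Note: dim need not equal the number of classes.
protos = separated_prototypes(num_classes=10, dim=5)
```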

    Neural Metric Learning for Fast End-to-End Relation Extraction

    Relation extraction (RE) is an indispensable information extraction task in several disciplines. RE models typically assume that named entity recognition (NER) has already been performed in a previous step by another independent model. Several recent efforts, under the theme of end-to-end RE, seek to exploit inter-task correlations by modeling both NER and RE tasks jointly. Earlier work in this area commonly reduces the task to a table-filling problem, wherein an additional expensive decoding step involving beam search is applied to obtain globally consistent cell labels. In efforts that do not employ table filling, global optimization in the form of CRFs with Viterbi decoding for the NER component is still necessary for competitive performance. We introduce a novel neural architecture utilizing the table structure, based on repeated applications of 2D convolutions for pooling local dependency and metric-based features, that improves on the state of the art without the need for global optimization. We validate our model on the ADE and CoNLL04 datasets for end-to-end RE and demonstrate an ≈1% gain (in F-score) over prior best results, with training and testing times that are seven to ten times faster; the latter is highly advantageous for time-sensitive end-user applications.
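    A hedged sketch of the table idea; the module name, feature sizes, and layer count below are illustrative assumptions, not the paper's exact architecture:
```python
import torch
import torch.nn as nn

class PairTableEncoder(nn.Module):
    """Pair token embeddings into an n x n table, then pool local
    dependencies with stacked 2D convolutions (illustrative sketch)."""
    def __init__(self, d_tok: int, d_hid: int, n_layers: int = 3):
        super().__init__()
        layers, d_in = [], 2 * d_tok
        for _ in range(n_layers):
            layers += [nn.Conv2d(d_in, d_hid, kernel_size=3, padding=1), nn.ReLU()]
            d_in = d_hid
        self.convs = nn.Sequential(*layers)

    def forward(self, tok: torch.Tensor) -> torch.Tensor:
        # tok: (batch, n, d_tok) -> table: (batch, 2*d_tok, n, n)
        n = tok.size(1)
        rows = tok.unsqueeze(2).expand(-1, -1, n, -1)
        cols = tok.unsqueeze(1).expand(-1, n, -1, -1)
        table = torch.cat([rows, cols], dim=-1).permute(0, 3, 1, 2)
        return self.convs(table)  # per-cell features for NER/RE labeling

enc = PairTableEncoder(d_tok=64, d_hid=128)
feats = enc(torch.randn(2, 12, 64))  # shape: (2, 128, 12, 12)
```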

    Word, graph and manifold embedding from Markov processes

    Continuous vector representations of words and objects appear to carry surprisingly rich semantic content. In this paper, we advance both the conceptual and theoretical understanding of word embeddings in three ways. First, we ground embeddings in semantic spaces studied in the cognitive-psychometric literature and introduce new evaluation tasks. Second, in contrast to prior work, we take metric recovery as the key object of study, unify existing algorithms as consistent metric recovery methods based on co-occurrence counts from simple Markov random walks, and propose a new recovery algorithm. Third, we generalize metric recovery to graphs and manifolds, relating co-occurrence counts on random walks in graphs and random processes on manifolds to the underlying metric to be recovered, thereby reconciling manifold estimation and embedding algorithms. We compare embedding algorithms across a range of tasks, from nonlinear dimensionality reduction to three semantic language tasks, including analogies, sequence completion, and classification.
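    As a toy illustration of the co-occurrence counts underlying this view (the function and window scheme below are expository assumptions, not the paper's algorithm), nearby nodes on a graph co-occur more often on simple random walks:
```python
import random
from collections import defaultdict

def cooccurrence_counts(adj, num_walks=200, walk_len=40, window=5, seed=0):
    """Count window co-occurrences on simple random walks over a graph."""
    rng = random.Random(seed)
    counts = defaultdict(int)
    for start in adj:
        for _ in range(num_walks):
            walk, node = [start], start
            for _ in range(walk_len - 1):
                node = rng.choice(adj[node])
                walk.append(node)
            for i, u in enumerate(walk):
                for v in walk[i + 1 : i + 1 + window]:
                    counts[(u, v)] += 1
                    counts[(v, u)] += 1
    return counts

# Toy graph: a 4-cycle; the adjacent pair (0, 1) co-occurs more than (0, 2).
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
counts = cooccurrence_counts(adj)
print(counts[(0, 1)], counts[(0, 2)])
```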

    On Approximation Guarantees for Greedy Low Rank Optimization

    We provide new approximation guarantees for greedy low rank matrix estimation under standard assumptions of restricted strong convexity and smoothness. Our novel analysis also uncovers previously unknown connections between low rank estimation and combinatorial optimization, so much so that our bounds are reminiscent of corresponding approximation bounds in submodular maximization. Additionally, we provide statistical recovery guarantees. Finally, we present an empirical comparison of greedy estimation with established baselines on two important real-world problems.
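    A minimal sketch of greedy rank-1 estimation in the masked least-squares special case; the paper's setting (restricted strong convexity and smoothness) is more general, and this function is illustrative only:
```python
import numpy as np

def greedy_low_rank(M, mask, rank):
    """Greedily fit observed entries of M by adding one rank-1 term at a time."""
    X = np.zeros_like(M)
    for _ in range(rank):
        R = (M - X) * mask                  # residual on observed entries
        U, _, Vt = np.linalg.svd(R)
        uv = np.outer(U[:, 0], Vt[0])       # greedy rank-1 direction
        # exact step size minimizing the masked least-squares loss
        alpha = (R * uv).sum() / ((uv * mask) ** 2).sum()
        X = X + alpha * uv
    return X

rng = np.random.default_rng(0)
M = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 20))  # true rank 3
mask = rng.random(M.shape) < 0.6                         # observed entries
X = greedy_low_rank(M, mask, rank=3)
print(np.linalg.norm((M - X) * mask) / np.linalg.norm(M * mask))
```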

    Learning Cross-lingual Embeddings from Twitter via Distant Supervision

    Cross-lingual embeddings represent the meaning of words from different languages in the same vector space. Recent work has shown that it is possible to construct such representations by aligning independently learned monolingual embedding spaces, and that accurate alignments can be obtained even without external bilingual data. In this paper we explore a research direction that has been surprisingly neglected in the literature: leveraging noisy user-generated text to learn cross-lingual embeddings particularly tailored towards social media applications. While the noisiness and informal nature of the social media genre pose additional challenges to cross-lingual embedding methods, we find that they also provide key opportunities due to the abundance of code-switching and the existence of a shared vocabulary of emoji and named entities. Our contribution consists of a very simple post-processing step that exploits these phenomena to significantly improve the performance of state-of-the-art alignment methods. Comment: Accepted to ICWSM 2020. 11 pages, 1 appendix. Pre-trained embeddings available at https://github.com/pedrada88/crossembeddings-twitte
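    One plausible reading of the shared-vocabulary opportunity, sketched below with orthogonal Procrustes; the anchor selection and function names are assumptions, not the paper's exact post-processing step:
```python
import numpy as np

def procrustes_align(src_emb: dict, tgt_emb: dict) -> np.ndarray:
    """Learn an orthogonal map from the source to the target embedding space,
    using tokens shared by both vocabularies (emoji, named entities,
    code-switched words) as a free bilingual dictionary."""
    anchors = sorted(set(src_emb) & set(tgt_emb))  # shared surface forms
    X = np.stack([src_emb[w] for w in anchors])
    Y = np.stack([tgt_emb[w] for w in anchors])
    U, _, Vt = np.linalg.svd(X.T @ Y)              # orthogonal Procrustes solution
    return U @ Vt                                  # W: row vector x maps as x @ W

# Usage: map a source-language vector into the target space.
# aligned = src_emb["😂"] @ procrustes_align(src_emb, tgt_emb)
```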

    Adversarial Gain

    Adversarial examples can be defined as inputs to a model that induce a mistake, i.e., a model output that differs from that of an oracle, perhaps in surprising or malicious ways. Adversarial attacks were originally studied primarily in the context of classification and computer vision tasks. While several attacks have been proposed in natural language processing (NLP) settings, they often vary in defining the parameters of an attack and what a successful attack would look like. The goal of this work is to propose a unifying model of adversarial examples suitable for NLP tasks in both generative and classification settings. We define the notion of adversarial gain: based in control theory, it is a measure of the change in the output of a system relative to the perturbation of the input (caused by the so-called adversary) presented to the learner. This definition, as we show, can be used under different feature spaces and distance conditions to determine attack or defense effectiveness across different intuitive manifolds. This notion of adversarial gain not only provides a useful way of evaluating adversaries and defenses, but can also act as a building block for future work on robustness under adversaries, due to its roots in stability and manifold theory.
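    The gain definition above translates directly into a small sketch; the feature maps and distance defaults below are illustrative placeholders:
```python
import numpy as np

def adversarial_gain(model, x, x_adv,
                     feat_in=lambda z: z, feat_out=lambda z: z,
                     dist=lambda a, b: np.linalg.norm(a - b)):
    """Change in the model's output relative to the input perturbation,
    under pluggable feature maps and distances."""
    num = dist(feat_out(model(x_adv)), feat_out(model(x)))
    den = dist(feat_in(x_adv), feat_in(x))
    return num / max(den, 1e-12)

# Toy linear "model": a perturbation along the weight direction has gain ~2.
w = np.array([2.0, 0.0])
model = lambda z: np.array([z @ w])
x = np.array([1.0, 1.0])
print(adversarial_gain(model, x, x + np.array([0.1, 0.0])))
```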

    Unsupervised Inductive Graph-Level Representation Learning via Graph-Graph Proximity

    We introduce a novel approach to graph-level representation learning, which is to embed an entire graph into a vector space where the embeddings of two graphs preserve their graph-graph proximity. Our approach, UGRAPHEMB, is a general framework that provides a novel means of performing graph-level embedding in a completely unsupervised and inductive manner. The learned neural network can be considered a function that receives any graph as input, either seen or unseen in the training set, and transforms it into an embedding. A novel graph-level embedding generation mechanism, called Multi-Scale Node Attention (MSNA), is proposed. Experiments on five real graph datasets show that UGRAPHEMB achieves competitive accuracy in the tasks of graph classification, similarity ranking, and graph visualization. Comment: IJCAI 2019 camera-ready version with supplementary material.
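    A hedged sketch of an attention-based readout in the spirit of MSNA; the module below is an expository assumption, not the paper's exact formulation:
```python
import torch
import torch.nn as nn

class AttentionReadout(nn.Module):
    """Pool node embeddings into one vector with learned attention weights."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, nodes: torch.Tensor) -> torch.Tensor:
        # nodes: (num_nodes, dim) -> (dim,) attention-weighted sum
        a = torch.softmax(self.score(nodes), dim=0)
        return (a * nodes).sum(dim=0)

# One readout per scale (e.g., successive message-passing layers),
# concatenated into the final graph-level embedding.
scales = [torch.randn(7, 32), torch.randn(7, 32)]
readouts = nn.ModuleList(AttentionReadout(32) for _ in scales)
g = torch.cat([r(h) for r, h in zip(readouts, scales)])  # shape: (64,)
```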

    Representing Sets as Summed Semantic Vectors

    Representing meaning in the form of high-dimensional vectors is a common and powerful tool in biologically inspired architectures. While the meaning of a set of concepts can be summarized by taking a (possibly weighted) sum of their associated vectors, this has generally been treated as a one-way operation. In this paper we show how a technique built to aid sparse vector decomposition allows, in many cases, the exact recovery of the inputs and weights to such a sum, allowing a single vector to represent an entire set of vectors from a dictionary. We characterize the number of vectors that can be recovered under various conditions, and explore several ways such a tool can be used for vector-based reasoning. Comment: In Biologically Inspired Cognitive Architectures 2018.
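    One standard sparse-decomposition tool that performs this kind of recovery is orthogonal matching pursuit, sketched below; the paper's exact technique may differ:
```python
import numpy as np

def omp(D, s, k):
    """Recover which k dictionary columns (and weights) were summed into s."""
    support, residual = [], s.copy()
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ residual))))  # best-matching atom
        w, *_ = np.linalg.lstsq(D[:, support], s, rcond=None)   # refit weights
        residual = s - D[:, support] @ w
    return support, w

rng = np.random.default_rng(0)
D = rng.normal(size=(256, 1000))        # high-dimensional random dictionary
D /= np.linalg.norm(D, axis=0)
true_idx, true_w = [3, 42, 77], np.array([1.0, 0.5, 2.0])
s = D[:, true_idx] @ true_w             # a single summed vector
print(omp(D, s, k=3))                   # recovers the indices and weights
```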

    Interactions of Computational Complexity Theory and Mathematics

    [This paper is a (self-contained) chapter in a new book, Mathematics and Computation, whose draft is available on my homepage at https://www.math.ias.edu/avi/book ]. We survey some concrete interaction areas between computational complexity theory and different fields of mathematics. We hope to demonstrate here that hardly any area of modern mathematics is untouched by the computational connection (which in some cases is completely natural and in others may seem quite surprising). In my view, the breadth, depth, beauty and novelty of these connections are inspiring, and speak to a great potential of future interactions (which indeed are quickly expanding). We aim for variety. We give short, simple descriptions (without proofs or much technical detail) of ideas, motivations, results and connections; this will hopefully entice the reader to dig deeper. Each vignette focuses only on a single topic within a large mathematical field. We cover the following:
    ∙ Number Theory: Primality testing
    ∙ Combinatorial Geometry: Point-line incidences
    ∙ Operator Theory: The Kadison-Singer problem
    ∙ Metric Geometry: Distortion of embeddings
    ∙ Group Theory: Generation and random generation
    ∙ Statistical Physics: Monte-Carlo Markov chains
    ∙ Analysis and Probability: Noise stability
    ∙ Lattice Theory: Short vectors
    ∙ Invariant Theory: Actions on matrix tuples
    Comment: 27 pages.

    The Order Dimension of the Poset of Regions in a Hyperplane Arrangement

    We show that the order dimension of the weak order on a Coxeter group of type A, B or D is equal to the rank of the Coxeter group, and give bounds on the order dimensions for the other finite types. This result arises from a unified approach which, in particular, leads to a simpler treatment of the previously known cases, types A and B. The result for weak orders follows from an upper bound on the dimension of the poset of regions of an arbitrary hyperplane arrangement. In some cases, including the weak orders, the upper bound is the chromatic number of a certain graph. For the weak orders, this graph has the positive roots as its vertex set, and the edges are related to the pairwise inner products of the roots. Comment: Minor changes, including a correction and an added figure in the proof of Proposition 2.2. 19 pages, 6 figures.