Characterizing the impact of geometric properties of word embeddings on task performance

Author: Ferhatosmanoglu Hakan
Fosler-Lussier Eric
Haldar Aparajita
Newman-Griffis Denis
Whitaker Brendan
Publication date: 01/01/2019

Analysis of word embedding properties to inform their use in downstream NLP tasks has largely been studied by assessing nearest neighbors. However, geometric properties of the continuous feature space contribute directly to the use of embedding features in downstream models, and are largely unexplored. We consider four properties of word embedding geometry, namely: position relative to the origin, distribution of features in the vector space, global pairwise distances, and local pairwise distances. We define a sequence of transformations to generate new embeddings that expose subsets of these properties to downstream models and evaluate change in task performance to understand the contribution of each property to NLP models. We transform publicly available pretrained embeddings from three popular toolkits (word2vec, GloVe, and FastText) and evaluate on a variety of intrinsic tasks, which model linguistic information in the vector space, and extrinsic tasks, which use vectors as input to machine learning models. We find that intrinsic evaluations are highly sensitive to absolute position, while extrinsic tasks rely primarily on local similarity. Our findings suggest that future embedding models and post-processing techniques should focus primarily on similarity to nearby points in vector space.

Comment: Appearing in the Third Workshop on Evaluating Vector Space Representations for NLP (RepEval 2019). 7 pages + references
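To illustrate why position relative to the origin and pairwise distances are separate properties, here is a minimal sketch. It is not the authors' code; the toy vectors and helper names are assumptions made for illustration. It translates a few vectors and compares Euclidean distance, which ignores the shift, with cosine similarity, which depends on the origin.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine(u, v):
    """Cosine similarity, which is measured relative to the origin."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def translate(vectors, offset):
    """Shift every vector by the same offset (changes absolute position only)."""
    return [[a + o for a, o in zip(vec, offset)] for vec in vectors]

# Hypothetical 2-D "embeddings"; real embeddings have hundreds of dimensions.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
shifted = translate(embeddings, [5.0, 5.0])

dist_before = euclidean(embeddings[0], embeddings[1])
dist_after = euclidean(shifted[0], shifted[1])
cos_before = cosine(embeddings[0], embeddings[1])
cos_after = cosine(shifted[0], shifted[1])

print(dist_before, dist_after)  # pairwise distance is preserved
print(cos_before, cos_after)    # cosine similarity changes with the shift
```

In this toy case the two vectors are orthogonal before the shift and nearly parallel after it, while their distance is unchanged.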

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

Warwick Research Archives Portal Repository

White Rose Research Online

Defining Equitable Geographic Districts in Road Networks via Stable Matching

Author: Eppstein David
Goodrich Michael
Korkmaz Doruk
Mamano Nil
Publication date: 20/09/2017

We introduce a novel method for defining geographic districts in road networks using stable matching. In this approach, each geographic district is defined in terms of a center, which identifies a location of interest, such as a post office or polling place, and all other network vertices must be labeled with the center to which they are associated. We focus on defining geographic districts that are equitable, in that every district has the same number of vertices and the assignment is stable in terms of geographic distance. That is, there is no unassigned vertex-center pair such that both would prefer each other over their current assignments. We solve this problem using a version of the classic stable matching problem, called symmetric stable matching, in which the preferences of the elements in both sets obey a certain symmetry. In our case, we study a graph-based version of stable matching in which nodes are stably matched to a subset of nodes denoted as centers, prioritized by their shortest-path distances, so that each center is apportioned a certain number of nodes. We show that, for a planar graph or road network with $n$ nodes and $k$ centers, the problem can be solved in $O(n\sqrt{n}\log n)$ time, which improves upon the $O(nk)$ runtime of using the classic Gale-Shapley stable matching algorithm when $k$ is large. Finally, we provide experimental results on road networks for these algorithms and a heuristic algorithm that performs better than the Gale-Shapley algorithm for any range of values of $k$.

Comment: 9 pages, 4 figures, to appear in 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL 2017) November 7-10, 2017, Redondo Beach, California, US
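The stability condition described above can be sketched with a simple greedy procedure. This is an illustration only, not the paper's $O(n\sqrt{n}\log n)$ algorithm: it assumes a precomputed distance matrix (here built from hypothetical 1-D positions instead of road-network shortest paths) and sorts all node-center pairs, so it takes $O(nk \log(nk))$ time. The function names and the example data are assumptions.

```python
def greedy_symmetric_matching(dist, quota):
    """Assign each node to a center, closest pairs first.

    dist[v][c] is the distance from node v to center c; every center
    accepts exactly `quota` nodes. Assumes len(dist) == k * quota.
    """
    n, k = len(dist), len(dist[0])
    pairs = sorted((dist[v][c], v, c) for v in range(n) for c in range(k))
    assigned = [None] * n
    remaining = [quota] * k
    for _, v, c in pairs:
        if assigned[v] is None and remaining[c] > 0:
            assigned[v] = c
            remaining[c] -= 1
    return assigned

def is_stable(dist, assigned):
    """Return True if no node-center pair strictly prefers each other."""
    k = len(dist[0])
    # Distance of the farthest node currently held by each center.
    worst = [
        max((dist[v][c] for v, a in enumerate(assigned) if a == c), default=0.0)
        for c in range(k)
    ]
    for v, a in enumerate(assigned):
        for c in range(k):
            if c != a and dist[v][c] < dist[v][a] and dist[v][c] < worst[c]:
                return False
    return True

# Hypothetical nodes and centers on a line; distances stand in for shortest paths.
node_positions = [0.0, 1.5, 2.5, 3.0, 6.5, 8.0]
center_positions = [1.0, 6.0]
dist = [[abs(x - c) for c in center_positions] for x in node_positions]

assignment = greedy_symmetric_matching(dist, quota=3)
print(assignment)
print(is_stable(dist, assignment))
```

In this example the node at position 3.0 is closer to the first center, but that center's quota is already filled by three closer nodes, so the node is assigned to the second center and the result is still stable.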

arXiv.org e-Print Archive

Crossref

Low-shot learning with large-scale diffusion

Author: Douze Matthijs
Hariharan Bharath
Jégou Hervé
Szlam Arthur
Publication date: 15/06/2018

This paper considers the problem of inferring image labels from images when only a few annotated examples are available at training time. This setup is often referred to as low-shot learning, where a standard approach is to re-train the last few layers of a convolutional neural network learned on separate classes for which training examples are abundant. We consider a semi-supervised setting based on a large collection of images to support label propagation. This is possible by leveraging the recent advances on large-scale similarity graph construction. We show that despite its conceptual simplicity, scaling label propagation up to hundreds of millions of images leads to state-of-the-art accuracy in the low-shot learning regime.
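A minimal sketch of label propagation on a similarity graph is given below. It is a simplified, clamped averaging scheme for illustration, not the large-scale diffusion used in the paper; the adjacency-list input, the function name, and the tiny path graph are all assumptions.

```python
def propagate_labels(neighbors, seeds, num_classes, iterations=50):
    """Spread labels from seed nodes to unlabeled nodes.

    neighbors[i] lists the nodes adjacent to node i (e.g. from a k-NN graph).
    seeds maps a node index to its known class; seed scores stay clamped.
    """
    n = len(neighbors)
    scores = [[0.0] * num_classes for _ in range(n)]
    for node, label in seeds.items():
        scores[node][label] = 1.0
    for _ in range(iterations):
        updated = []
        for node in range(n):
            if node in seeds:
                updated.append(scores[node][:])  # keep annotated nodes fixed
                continue
            nbrs = neighbors[node]
            # Each unlabeled node takes the average score of its neighbors.
            updated.append([
                sum(scores[m][c] for m in nbrs) / len(nbrs)
                for c in range(num_classes)
            ])
        scores = updated
    return [max(range(num_classes), key=lambda c: row[c]) for row in scores]

# Hypothetical path graph 0-1-2-3-4-5 with one labeled example per class.
graph = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
predicted = propagate_labels(graph, seeds={0: 0, 5: 1}, num_classes=2)
print(predicted)
```

Each unlabeled node ends up with the class of the seed it is closer to along the graph, which is the behavior label propagation relies on when only a few annotated examples exist.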

arXiv.org e-Print Archive

Crossref

The Branched Polymer Growth Model Revisited

Author: André L. Botelho
Roberto N. Onody
Ubiraci P.C. Neves
Publication venue: 'Elsevier BV'
Publication date: 21/02/2003

The Branched Polymer Growth Model (BPGM) has been employed to study the kinetic growth of ramified polymers in the presence of impurities. In this article, the BPGM is revisited on the square lattice and a subtle modification in its dynamics is proposed in order to adapt it to a scenario closer to reality and experimentation. This new version of the model is denominated the Adapted Branched Polymer Growth Model (ABPGM). It is shown that the ABPGM preserves the functionalities of the monomers and so recovers the branching probability b as an input parameter which effectively controls the relative incidence of bifurcations. The critical locus separating infinite from finite growth regimes of the ABPGM is obtained in the (b,c) space (where c is the impurity concentration). Unlike the original model, the phase diagram of the ABPGM exhibits a peculiar reentrance.

Comment: 8 pages, 10 figures. To be published in PHYSICA

arXiv.org e-Print Archive

Crossref

Net and Prune: A Linear Time Algorithm for Euclidean Distance Problems

Author: Har-Peled Sariel
Raichel Benjamin
Publication date: 25/09/2014

We provide a general framework for getting expected linear time constant factor approximations (and in many cases FPTAS's) to several well known problems in Computational Geometry, such as $k$-center clustering and farthest nearest neighbor. The new approach is robust to variations in the input problem, and yet it is simple, elegant and practical. In particular, many of these well studied problems which fit easily into our framework, either previously had no linear time approximation algorithm, or required rather involved algorithms and analysis. A short list of the problems we consider include farthest nearest neighbor, $k$-center clustering, smallest disk enclosing $k$ points, $k$th largest distance, $k$th smallest $m$-nearest neighbor distance, $k$th heaviest edge in the MST and other spanning forest type problems, problems involving upward closed set systems, and more. Finally, we show how to extend our framework such that the linear running time bound holds with high probability.
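For readers unfamiliar with $k$-center clustering, the sketch below shows the classic greedy farthest-first heuristic, a well-known 2-approximation that runs in $O(nk)$ time. It is a swapped-in baseline for illustration and is not the net-and-prune framework of the paper; the function name and the sample points are assumptions.

```python
import math

def farthest_first_centers(points, k):
    """Pick k centers greedily: each new center is the point farthest
    from the centers chosen so far. Returns (centers, covering radius)."""
    centers = [points[0]]  # arbitrary first center
    dist_to_center = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=lambda i: dist_to_center[i])
        centers.append(points[idx])
        # Update each point's distance to its nearest chosen center.
        dist_to_center = [
            min(d, math.dist(p, points[idx]))
            for d, p in zip(dist_to_center, points)
        ]
    return centers, max(dist_to_center)

# Hypothetical planar points forming two well-separated groups.
sample_points = [(0, 0), (1, 0), (0, 1), (10, 10), (12, 10), (10, 11)]
chosen_centers, radius = farthest_first_centers(sample_points, k=2)
print(chosen_centers, radius)
```

The returned radius is the largest distance from any point to its nearest chosen center, which is the quantity $k$-center clustering tries to minimize.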

arXiv.org e-Print Archive

CiteSeerX

Fast Construction of Nets in Low Dimensional Metrics, and Their Applications

Author: Har-Peled Sariel
Mendel Manor
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2005

We present a near linear time algorithm for constructing hierarchical nets in finite metric spaces with constant doubling dimension. This data-structure is then applied to obtain improved algorithms for the following problems: Approximate nearest neighbor search, well-separated pair decomposition, compact representation scheme, doubling measure, and computation of the (approximate) Lipschitz constant of a function. In all cases, the running (preprocessing) time is near-linear and the space being used is linear.

Comment: 41 pages. Extensive clean-up of minor English errors
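As background on what a net is, the following sketch builds a single-scale $r$-net with a naive greedy scan: the selected points are pairwise more than $r$ apart, and every input point lies within $r$ of a selected point. It runs in quadratic time and is not the near-linear hierarchical construction of the paper; the function name and sample points are assumptions.

```python
import math

def greedy_net(points, r):
    """Return a subset of points that is r-separated and r-covering."""
    net = []
    for p in points:
        # Keep p only if it is farther than r from every net point so far.
        if all(math.dist(p, q) > r for q in net):
            net.append(p)
    return net

# Hypothetical planar points.
sample = [(0, 0), (0.5, 0), (2, 0), (2.2, 0.1), (5, 5)]
net_points = greedy_net(sample, r=1.0)
print(net_points)
```

A hierarchy of nets repeats this idea at geometrically shrinking values of $r$; the naive scan here only demonstrates the separation and covering properties at one scale.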

arXiv.org e-Print Archive

CiteSeerX

Caltech Authors