10,486 research outputs found
Characterizing the impact of geometric properties of word embeddings on task performance
Analysis of word embedding properties to inform their use in downstream NLP
tasks has largely been carried out by assessing nearest neighbors. However,
geometric properties of the continuous feature space contribute directly to the
use of embedding features in downstream models, and are largely unexplored. We
consider four properties of word embedding geometry, namely: position relative
to the origin, distribution of features in the vector space, global pairwise
distances, and local pairwise distances. We define a sequence of
transformations to generate new embeddings that expose subsets of these
properties to downstream models and evaluate change in task performance to
understand the contribution of each property to NLP models. We transform
publicly available pretrained embeddings from three popular toolkits (word2vec,
GloVe, and FastText) and evaluate on a variety of intrinsic tasks, which model
linguistic information in the vector space, and extrinsic tasks, which use
vectors as input to machine learning models. We find that intrinsic evaluations
are highly sensitive to absolute position, while extrinsic tasks rely primarily
on local similarity. Our findings suggest that future embedding models and
post-processing techniques should focus primarily on similarity to nearby
points in vector space.

Comment: Appearing in the Third Workshop on Evaluating Vector Space
Representations for NLP (RepEval 2019). 7 pages + references
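The abstract's central contrast (intrinsic evaluations depend on absolute position, extrinsic ones on local similarity) can be seen in a minimal numpy sketch. This is an illustration of the geometric point, not the paper's transformation suite: translating an embedding matrix leaves Euclidean pairwise distances untouched but can reshuffle cosine nearest neighbors, because cosine similarity is measured relative to the origin. The toy random matrix and shift value are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 8))          # toy "embedding matrix": 100 words, 8 dims

def cosine_nn(E, i):
    """Index of the nearest neighbor of row i under cosine similarity."""
    En = E / np.linalg.norm(E, axis=1, keepdims=True)
    sims = En @ En[i]
    sims[i] = -np.inf                    # exclude the word itself
    return int(np.argmax(sims))

shift = emb + 10.0                       # translate every vector away from the origin

# Euclidean pairwise distances are invariant under translation...
assert np.isclose(np.linalg.norm(emb[0] - emb[1]),
                  np.linalg.norm(shift[0] - shift[1]))

# ...but cosine neighbors, which depend on position relative to the origin, change.
changed = sum(cosine_nn(emb, i) != cosine_nn(shift, i) for i in range(100))
print(f"{changed} of 100 words changed cosine nearest neighbor after translation")
```

A model that consumes raw coordinates (an extrinsic task) is unaffected by such a shift up to a bias term, while a similarity-based probe (an intrinsic task) is not.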
Defining Equitable Geographic Districts in Road Networks via Stable Matching
We introduce a novel method for defining geographic districts in road
networks using stable matching. In this approach, each geographic district is
defined in terms of a center, which identifies a location of interest, such as
a post office or polling place, and all other network vertices must be labeled
with the center to which they are associated. We focus on defining geographic
districts that are equitable, in that every district has the same number of
vertices and the assignment is stable in terms of geographic distance. That is,
there is no unassigned vertex-center pair such that both would prefer each
other over their current assignments. We solve this problem using a version of
the classic stable matching problem, called symmetric stable matching, in which
the preferences of the elements in both sets obey a certain symmetry. In our
case, we study a graph-based version of stable matching in which nodes are
stably matched to a subset of nodes denoted as centers, prioritized by their
shortest-path distances, so that each center is apportioned a certain number of
nodes. We show that, for a planar graph or road network with n nodes and k
centers, the problem can be solved in O(n^{3/2} log n) time, which improves
upon the O(nk) runtime of using the classic Gale-Shapley stable matching
algorithm when k is large. Finally, we provide experimental results on road
networks for these algorithms and a heuristic algorithm that performs better
than the Gale-Shapley algorithm for any range of values of k.

Comment: 9 pages, 4 figures, to appear in 25th ACM SIGSPATIAL International
Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL
2017) November 7-10, 2017, Redondo Beach, California, US
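The key property of symmetric stable matching is that both sides rank a pair by the same quantity (here, shortest-path distance), so committing pairs greedily in increasing distance order yields a stable assignment. The sketch below illustrates this on a toy grid "road network" with BFS distances and equal center capacities; it is a quadratic-time illustration of the matching criterion, not the paper's faster algorithm, and the grid, centers, and capacity are assumed for the example.

```python
from collections import Counter, deque

def bfs_dist(adj, src):
    """Unweighted shortest-path distances from src."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def symmetric_stable_districts(adj, centers, capacity):
    """Because vertex and center preferences are both given by the same
    distance, greedily committing (vertex, center) pairs in increasing
    distance order produces a stable, capacity-respecting assignment."""
    pairs = []
    for c in centers:
        for v, dv in bfs_dist(adj, c).items():
            pairs.append((dv, v, c))
    pairs.sort()
    assigned, load = {}, {c: 0 for c in centers}
    for dv, v, c in pairs:
        if v not in assigned and load[c] < capacity:
            assigned[v] = c
            load[c] += 1
    return assigned

# Toy example: a 4x4 grid road network with two centers, 8 vertices each.
nodes = [(i, j) for i in range(4) for j in range(4)]
adj = {v: [] for v in nodes}
for (i, j) in nodes:
    for (a, b) in ((i + 1, j), (i, j + 1)):
        if (a, b) in adj:
            adj[(i, j)].append((a, b))
            adj[(a, b)].append((i, j))
districts = symmetric_stable_districts(adj, centers=[(0, 0), (3, 3)], capacity=8)
```

Equity here is enforced by the capacity: each center is apportioned exactly n/k vertices.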
Low-shot learning with large-scale diffusion
This paper considers the problem of inferring image labels from images when
only a few annotated examples are available at training time. This setup is
often referred to as low-shot learning, where a standard approach is to
re-train the last few layers of a convolutional neural network learned on
separate classes for which training examples are abundant. We consider a
semi-supervised setting based on a large collection of images to support label
propagation. This is possible by leveraging recent advances in large-scale
similarity graph construction.
We show that despite its conceptual simplicity, scaling label propagation up
to hundreds of millions of images leads to state-of-the-art accuracy in the
low-shot learning regime.
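The mechanism the abstract relies on, label propagation over a k-NN similarity graph, can be sketched at toy scale. The synthetic two-class data, graph parameters, and iteration count below are assumptions for illustration; the paper's contribution is making the same diffusion work on graphs with hundreds of millions of images, which this sketch does not attempt.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two synthetic "classes" of descriptors; only one labeled example per class.
X = np.vstack([rng.normal(-2.0, 1.0, (100, 5)), rng.normal(2.0, 1.0, (100, 5))])
labels = np.array([0] * 100 + [1] * 100)
seeds = [0, 100]                               # indices of the labeled points

# Build a k-NN similarity graph (the paper does this at a vastly larger scale).
k = 10
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
W = np.zeros((200, 200))
for i in range(200):
    for j in np.argsort(D[i])[1:k + 1]:
        W[i, j] = np.exp(-D[i, j] ** 2)
W = np.maximum(W, W.T)                         # symmetrize the graph
P = W / W.sum(axis=1, keepdims=True)           # row-stochastic diffusion matrix

# Propagate: diffuse label mass along edges, then re-clamp the labeled seeds.
F = np.zeros((200, 2))
for s in seeds:
    F[s, labels[s]] = 1.0
for _ in range(50):
    F = P @ F
    for s in seeds:
        F[s] = 0.0
        F[s, labels[s]] = 1.0

pred = F.argmax(axis=1)
accuracy = (pred == labels).mean()
```

With only two labeled points, the diffusion recovers the class structure because unlabeled neighbors carry the label mass outward through the graph.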
The Branched Polymer Growth Model Revisited
The Branched Polymer Growth Model (BPGM) has been employed to study the
kinetic growth of ramified polymers in the presence of impurities. In this
article, the BPGM is revisited on the square lattice and a subtle modification
in its dynamics is proposed in order to adapt it to a scenario closer to
reality and experimentation. This new version of the model is denominated the
Adapted Branched Polymer Growth Model (ABPGM). It is shown that the ABPGM
preserves the functionalities of the monomers and so recovers the branching
probability b as an input parameter which effectively controls the relative
incidence of bifurcations. The critical locus separating infinite from finite
growth regimes of the ABPGM is obtained in the (b,c) space (where c is the
impurity concentration). Unlike the original model, the phase diagram of the
ABPGM exhibits a peculiar reentrance.

Comment: 8 pages, 10 figures. To be published in PHYSICA
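The ingredients of this model family (growth from active tips on a square lattice, a branching probability b controlling bifurcations, and blocked impurity sites at concentration c) can be sketched schematically. The rules below are a simplified illustration of kinetic branched growth, not the exact BPGM or ABPGM dynamics, and the lattice size, parameters, and seed are assumptions.

```python
import random

def grow_polymer(L=60, b=0.3, c=0.1, steps=2000, seed=42):
    """Schematic kinetic growth of a branched polymer on an L x L square
    lattice: impurity sites (concentration c) are blocked, and each active
    tip extends to one random free neighbor, or to two neighbors (a
    bifurcation) with probability b. Growth halts when no tips remain."""
    rng = random.Random(seed)
    impurity = {(x, y) for x in range(L) for y in range(L) if rng.random() < c}
    start = (L // 2, L // 2)
    impurity.discard(start)                    # seed monomer is never blocked
    occupied, tips = {start}, [start]
    for _ in range(steps):
        if not tips:
            break                              # finite-growth regime
        x, y = tips.pop(rng.randrange(len(tips)))
        free = [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= x + dx < L and 0 <= y + dy < L
                and (x + dx, y + dy) not in occupied
                and (x + dx, y + dy) not in impurity]
        rng.shuffle(free)
        n_new = 2 if (rng.random() < b and len(free) >= 2) else min(1, len(free))
        for site in free[:n_new]:
            occupied.add(site)
            tips.append(site)
    return occupied

cluster = grow_polymer()
```

Sweeping b and c in such a simulation and recording whether growth survives is the kind of experiment behind a (b, c) phase diagram like the one the abstract describes.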
Net and Prune: A Linear Time Algorithm for Euclidean Distance Problems
We provide a general framework for getting expected linear time constant
factor approximations (and in many cases FPTAS's) to several well known
problems in Computational Geometry, such as k-center clustering and farthest
nearest neighbor. The new approach is robust to variations in the input
problem, and yet it is simple, elegant and practical. In particular, many of
these well studied problems which fit easily into our framework, either
previously had no linear time approximation algorithm, or required rather
involved algorithms and analysis. A short list of the problems we consider
include farthest nearest neighbor, k-center clustering, smallest disk
enclosing k points, kth largest distance, kth smallest m-nearest
neighbor distance, kth heaviest edge in the MST and other spanning forest
type problems, problems involving upward closed set systems, and more. Finally,
we show how to extend our framework such that the linear running time bound
holds with high probability.
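For concreteness, here is one of the listed problems solved by its classic baseline: the greedy 2-approximation for k-center (Gonzalez's algorithm), which repeatedly picks the point farthest from the centers chosen so far. This is only a baseline for one problem the framework covers, not the paper's net-and-prune technique; the random point set and k are assumptions for the example.

```python
import numpy as np

def gonzalez_k_center(points, k, first=0):
    """Greedy 2-approximation for k-center: start from an arbitrary point,
    then repeatedly add the point farthest from all chosen centers.
    Returns the centers and the resulting clustering radius."""
    centers = [first]
    dists = np.linalg.norm(points - points[first], axis=1)
    for _ in range(k - 1):
        far = int(np.argmax(dists))            # farthest point becomes a center
        centers.append(far)
        dists = np.minimum(dists, np.linalg.norm(points - points[far], axis=1))
    return points[centers], float(dists.max())

pts = np.random.default_rng(0).random((500, 2))   # 500 points in the unit square
centers, radius = gonzalez_k_center(pts, k=5)
```

The greedy radius is at most twice the optimal k-center radius; the framework's point is that such problems admit expected linear-time constant-factor approximations as well.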
Fast Construction of Nets in Low Dimensional Metrics, and Their Applications
We present a near linear time algorithm for constructing hierarchical nets in
finite metric spaces with constant doubling dimension. This data-structure is
then applied to obtain improved algorithms for the following problems:
Approximate nearest neighbor search, well-separated pair decomposition, compact
representation scheme, doubling measure, and computation of the (approximate)
Lipschitz constant of a function. In all cases, the running (preprocessing)
time is near-linear and the space used is linear.

Comment: 41 pages. Extensive clean-up of minor English errors
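The object at the heart of this paper, an r-net, is easy to state: net points are pairwise more than r apart (packing), and every input point lies within r of some net point (covering). The sketch below is the naive quadratic-time greedy construction in a Euclidean metric, shown only to make the definition concrete; it is not the paper's near-linear hierarchical construction, and the point set and radius are assumptions.

```python
import numpy as np

def greedy_r_net(points, r):
    """Naive greedy r-net: scan the points, keeping a point only if it is
    farther than r from every net point kept so far. The result satisfies
    both the packing (pairwise separation > r) and covering (every point
    within r of the net) properties by construction."""
    net = []
    for p in points:
        if all(np.linalg.norm(p - q) > r for q in net):
            net.append(p)
    return np.array(net)

pts = np.random.default_rng(2).random((300, 2))   # 300 points in the unit square
net = greedy_r_net(pts, r=0.2)
```

A hierarchy of such nets at geometrically decreasing radii is what powers the applications listed above, e.g. approximate nearest-neighbor search and well-separated pair decomposition.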