112,362 research outputs found
Asynchronous Training of Word Embeddings for Large Text Corpora
Word embeddings are a powerful approach for analyzing language and have been
widely popular in numerous tasks in information retrieval and text mining.
Training embeddings over huge corpora is computationally expensive because the
input is typically sequentially processed and parameters are synchronously
updated. Distributed architectures for asynchronous training that have been
proposed either focus on scaling vocabulary sizes and dimensionality or suffer
from expensive synchronization latencies.
In this paper, we propose a scalable approach to train word embeddings by
partitioning the input space instead in order to scale to massive text corpora
while not sacrificing the performance of the embeddings. Our training procedure
does not involve any parameter synchronization except a final sub-model merge
phase that typically executes in a few minutes. Our distributed training scales
seamlessly to large corpus sizes and we get comparable and sometimes even up to
45% performance improvement in a variety of NLP benchmarks using models trained
by our distributed procedure which requires of the time taken by the
baseline approach. Finally we also show that we are robust to missing words in
sub-models and are able to effectively reconstruct word representations.Comment: This paper contains 9 pages and has been accepted in the WSDM201
Load-Balancing for Parallel Delaunay Triangulations
Computing the Delaunay triangulation (DT) of a given point set in
is one of the fundamental operations in computational geometry.
Recently, Funke and Sanders (2017) presented a divide-and-conquer DT algorithm
that merges two partial triangulations by re-triangulating a small subset of
their vertices - the border vertices - and combining the three triangulations
efficiently via parallel hash table lookups. The input point division should
therefore yield roughly equal-sized partitions for good load-balancing and also
result in a small number of border vertices for fast merging. In this paper, we
present a novel divide-step based on partitioning the triangulation of a small
sample of the input points. In experiments on synthetic and real-world data
sets, we achieve nearly perfectly balanced partitions and small border
triangulations. This almost cuts running time in half compared to
non-data-sensitive division schemes on inputs exhibiting an exploitable
underlying structure.Comment: Short version submitted to EuroPar 201
- …