Cross-lingual Entity Alignment via Joint Attribute-Preserving Embedding
Entity alignment is the task of finding entities in two knowledge bases (KBs)
that represent the same real-world object. When facing KBs in different natural
languages, conventional cross-lingual entity alignment methods rely on machine
translation to eliminate the language barriers. These approaches often suffer
from the uneven quality of translations between languages. While recent
embedding-based techniques encode entities and relationships in KBs and do not
need machine translation for cross-lingual entity alignment, a significant
number of attributes remain largely unexplored. In this paper, we propose a
joint attribute-preserving embedding model for cross-lingual entity alignment.
It jointly embeds the structures of two KBs into a unified vector space and
further refines it by leveraging attribute correlations in the KBs. Our
experimental results on real-world datasets show that this approach
significantly outperforms the state-of-the-art embedding approaches for
cross-lingual entity alignment and could be complemented with methods based on
machine translation.
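The abstract leaves the exact form of the joint objective implicit. The toy sketch below is not the authors' code; it illustrates one plausible reading, assuming a TransE-style structure loss over triples from both KBs plus a term that pulls entities with correlated attributes together. All names, dimensions, and the weight ALPHA are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of a joint embedding objective:
# a TransE-style structure loss over both KBs plus an attribute-similarity
# term that pulls cross-lingual entities with correlated attributes closer.
import numpy as np

rng = np.random.default_rng(0)
DIM, N_ENTITIES, N_RELATIONS = 50, 1000, 20
ALPHA = 0.1  # assumed weight balancing the two terms

E = rng.normal(scale=0.1, size=(N_ENTITIES, DIM))   # entity embeddings (both KBs)
R = rng.normal(scale=0.1, size=(N_RELATIONS, DIM))  # relation embeddings

def structure_loss(triples):
    """TransE-style score ||h + r - t|| summed over relation triples."""
    h, r, t = triples[:, 0], triples[:, 1], triples[:, 2]
    return np.linalg.norm(E[h] + R[r] - E[t], axis=1).sum()

def attribute_loss(pairs, sim):
    """Pull entity pairs with highly correlated attributes closer together."""
    i, j = pairs[:, 0], pairs[:, 1]
    return (sim * np.linalg.norm(E[i] - E[j], axis=1)).sum()

triples = rng.integers(0, [N_ENTITIES, N_RELATIONS, N_ENTITIES], size=(500, 3))
pairs = rng.integers(0, N_ENTITIES, size=(200, 2))
sim = rng.random(200)  # toy attribute-correlation scores in [0, 1]

joint = structure_loss(triples) + ALPHA * attribute_loss(pairs, sim)
print(joint)
```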
Divide and Conquer Kernel Ridge Regression: A Distributed Algorithm with Minimax Optimal Rates
We establish optimal convergence rates for a decomposition-based scalable
approach to kernel ridge regression. The method is simple to describe: it
randomly partitions a dataset of size N into m subsets of equal size, computes
an independent kernel ridge regression estimator for each subset, then averages
the local solutions into a global predictor. This partitioning leads to a
substantial reduction in computation time versus the standard approach of
performing kernel ridge regression on all N samples. Our two main theorems
establish that despite the computational speed-up, statistical optimality is
retained: as long as m is not too large, the partition-based estimator achieves
the statistical minimax rate over all estimators using the set of N samples. As
concrete examples, our theory guarantees that the number of processors m may
grow nearly linearly for finite-rank kernels and Gaussian kernels and
polynomially in N for Sobolev spaces, which in turn allows for substantial
reductions in computational cost. We conclude with experiments on both
simulated data and a music-prediction task that complement our theoretical
results, exhibiting the computational and statistical benefits of our approach.
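The estimator itself is simple enough to sketch. The toy Python example below, assuming an RBF kernel and illustrative hyperparameters not taken from the paper, partitions the N samples into m equal subsets, fits kernel ridge regression on each subset independently, and averages the local predictions.

```python
# Sketch of the divide-and-conquer kernel ridge regression estimator:
# random equal-size partition, one local KRR fit per subset, averaged predictions.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def fit_local_krr(X, y, lam=1e-2):
    # Solve (K + lam * n * I) alpha = y on one subset.
    K = rbf_kernel(X, X)
    alpha = np.linalg.solve(K + lam * len(X) * np.eye(len(X)), y)
    return X, alpha

def dc_krr_predict(subsets, X_test):
    # Average the predictions of the m local estimators.
    preds = [rbf_kernel(X_test, X_s) @ alpha for X_s, alpha in subsets]
    return np.mean(preds, axis=0)

rng = np.random.default_rng(0)
N, m = 600, 4
X = rng.uniform(-3, 3, size=(N, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=N)

idx = rng.permutation(N).reshape(m, -1)           # random equal-size partition
local = [fit_local_krr(X[i], y[i]) for i in idx]  # m independent KRR fits
X_test = np.linspace(-3, 3, 50)[:, None]
print(dc_krr_predict(local, X_test)[:5])
```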
Randomized Smoothing for Stochastic Optimization
We analyze convergence rates of stochastic optimization procedures for
non-smooth convex optimization problems. By combining randomized smoothing
techniques with accelerated gradient methods, we obtain convergence rates of
stochastic optimization procedures, both in expectation and with high
probability, that have optimal dependence on the variance of the gradient
estimates. To the best of our knowledge, these are the first variance-based
rates for non-smooth optimization. We give several applications of our results
to statistical estimation problems, and provide experimental results that
demonstrate the effectiveness of the proposed algorithms. We also describe how
a combination of our algorithm with recent work on decentralized optimization
yields a distributed stochastic optimization algorithm that is order-optimal.
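A minimal sketch of the smoothing idea, under stated assumptions: the non-smooth objective is replaced by its Gaussian smoothing, whose gradient is estimated by averaging stochastic subgradients at randomly perturbed points. A plain gradient step stands in here for the accelerated method analyzed in the paper; the smoothing radius, step size, and example problem are illustrative.

```python
# Randomized-smoothing sketch for a non-smooth stochastic problem:
# minimize mean_i |a_i . x - b_i| using subgradients averaged over
# Gaussian perturbations of the current iterate.
import numpy as np

rng = np.random.default_rng(0)

def subgrad(x, a, b):
    """Stochastic subgradient of the non-smooth loss |a.x - b| at x."""
    return np.sign(a @ x - b) * a

d, n = 10, 200
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true

x = np.zeros(d)
u, step, K = 0.1, 0.05, 5   # smoothing radius, step size, perturbations per step
for t in range(500):
    i = rng.integers(n)     # sample one data point (stochastic oracle)
    g = np.mean([subgrad(x + u * rng.normal(size=d), A[i], b[i])
                 for _ in range(K)], axis=0)
    x -= step * g

print(np.linalg.norm(x - x_true))  # distance to the true solution
```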
Distributed representation of multi-sense words: A loss-driven approach
Word2Vec's Skip Gram model is the current state-of-the-art approach for
estimating the distributed representation of words. However, it assumes a
single vector per word, which is not well-suited for representing words that
have multiple senses. This work presents LDMI, a new model for estimating
distributional representations of words. LDMI relies on the idea that, if a
word carries multiple senses, then having a different representation for each
of its senses should lead to a lower loss associated with predicting its
co-occurring words, as opposed to the case when a single vector representation
is used for all the senses. After identifying the multi-sense words, LDMI
clusters the occurrences of these words to assign a sense to each occurrence.
Experiments on the contextual word similarity task show that LDMI leads to
better performance than competing approaches.
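The occurrence-clustering step can be illustrated with a toy sketch (not the authors' implementation): each occurrence of a candidate multi-sense word is represented by the mean embedding of its context words, and these occurrence vectors are clustered so each occurrence receives a sense label. The embeddings, the choice of k-means with two clusters, and the omission of the loss-driven candidate test are assumptions made for brevity.

```python
# Toy sketch of assigning senses to occurrences of an ambiguous word
# by clustering the average context-word embeddings of its occurrences.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
vocab_vecs = {w: rng.normal(size=20) for w in
              ["river", "water", "money", "loan", "tree", "account"]}

# Toy contexts for occurrences of the ambiguous word "bank".
contexts = [["river", "water", "tree"], ["money", "loan", "account"],
            ["water", "river", "tree"], ["loan", "money", "account"]]

occ_vecs = np.array([np.mean([vocab_vecs[w] for w in ctx], axis=0)
                     for ctx in contexts])

senses = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(occ_vecs)
print(senses)  # one sense label per occurrence of "bank"
```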
S-OHEM: Stratified Online Hard Example Mining for Object Detection
One of the major challenges in object detection is to propose detectors with
highly accurate localization of objects. The online sampling of high-loss
region proposals (hard examples) uses the multitask loss with equal weight
settings across all loss types (e.g., classification and localization, rigid and
non-rigid categories) and ignores the influence of different loss distributions
throughout the training process, which we find essential to the training
efficacy. In this paper, we present the Stratified Online Hard Example Mining
(S-OHEM) algorithm for training higher efficiency and accuracy detectors.
S-OHEM exploits OHEM with stratified sampling, a widely-adopted sampling
technique, to choose the training examples according to this influence during
hard example mining, and thus enhance the performance of object detectors. We
show through systematic experiments that S-OHEM yields an average precision
(AP) improvement of 0.5% on rigid categories of PASCAL VOC 2007 at IoU
thresholds of both 0.6 and 0.7, and an improvement of 1.6% on KITTI 2012 under
the same metric. Regarding the mean average precision (mAP), relative increases
of 0.3% and 0.5% are observed for VOC07 (1% and 0.5% for KITTI12) at the same
IoU thresholds. Also, S-OHEM is easy to integrate with existing region-based
detectors and is capable of acting with post-recognition level regressors.
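A rough sketch of stratified hard-example selection under assumed settings: region proposals are grouped into strata by which loss component dominates, and the hardest examples are drawn from each stratum according to a fixed ratio. The two-stratum split and the sampling ratios below are illustrative, not the paper's configuration.

```python
# Stratified hard-example mining sketch: group proposals by dominant loss
# component, then take the highest-loss proposals from each stratum.
import numpy as np

rng = np.random.default_rng(0)
n_props, batch = 256, 64
cls_loss = rng.exponential(1.0, n_props)   # toy classification losses
loc_loss = rng.exponential(0.5, n_props)   # toy localization losses

strata = {
    "cls_dominant": np.where(cls_loss >= loc_loss)[0],
    "loc_dominant": np.where(cls_loss < loc_loss)[0],
}
ratios = {"cls_dominant": 0.5, "loc_dominant": 0.5}  # assumed sampling ratios

selected = []
total = cls_loss + loc_loss
for name, idx in strata.items():
    k = int(batch * ratios[name])
    hardest = idx[np.argsort(total[idx])[::-1][:k]]  # highest total loss first
    selected.extend(hardest.tolist())

print(len(selected), "hard examples selected for the training batch")
```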