Regression analysis with compositional data containing zero values
Comment: The paper has been accepted for publication in the Chilean Journal of
Statistics. It consists of 12 pages with 4 figures.
Stochastic Attraction-Repulsion Embedding for Large Scale Image Localization
This paper tackles the problem of large-scale image-based localization (IBL)
where the spatial location of a query image is determined by finding out the
most similar reference images in a large database. A critical task in solving
this problem is to learn a discriminative image representation that captures
information relevant for localization. We propose a novel representation
learning method with higher location-discriminating power. It makes the
following contributions: 1) we represent a place (location) as a
set of exemplar images depicting the same landmarks and aim to maximize
similarities among intra-place images while minimizing similarities among
inter-place images; 2) we model a similarity measure as a probability
distribution on L_2-metric distances between intra-place and inter-place image
representations; 3) we propose a new Stochastic Attraction and Repulsion
Embedding (SARE) loss function minimizing the KL divergence between the learned
and the actual probability distributions; 4) we give theoretical comparisons
between SARE, triplet ranking and contrastive losses; analyzing their gradients
gives insight into why SARE is better. Our SARE loss is easy to implement
and pluggable to any CNN. Experiments show that our proposed method improves
the localization performance on standard benchmarks by a large margin.
Demonstrating the broad applicability of our method, we obtained the third
place out of 209 teams in the 2018 Google Landmark Retrieval Challenge. Our
code and model are available at https://github.com/Liumouliu/deepIBL.
Comment: ICC
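The attraction-repulsion idea in contributions 2) and 3) can be sketched for a single query as follows. This is a minimal illustration, not the authors' implementation: it assumes the matching probability is a softmax over negative squared L_2 distances and that the "actual" distribution puts all mass on the intra-place (positive) image, in which case the KL divergence reduces to the negative log-probability of the positive. The function names `sq_l2` and `sare_like_loss` are hypothetical.

```python
import math


def sq_l2(a, b):
    """Squared L2 distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sare_like_loss(query, positive, negatives):
    """KL divergence between a one-hot target and a softmax over -distance.

    Assumption: the learned distribution is softmax(-squared distance) over
    the positive and all negatives; with a one-hot target the KL divergence
    equals -log p(positive).
    """
    logits = [-sq_l2(query, positive)] + [-sq_l2(query, n) for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_z)
```

Minimizing this quantity pulls the positive toward the query (attraction) and pushes every negative away (repulsion), since both changes raise the probability assigned to the positive.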
Extracting Biomolecular Interactions Using Semantic Parsing of Biomedical Text
We advance the state of the art in biomolecular interaction extraction with
three contributions: (i) We show that deep, Abstract Meaning Representations
(AMR) significantly improve the accuracy of a biomolecular interaction
extraction system when compared to a baseline that relies solely on surface-
and syntax-based features; (ii) In contrast with previous approaches that infer
relations on a sentence-by-sentence basis, we expand our framework to enable
consistent predictions over sets of sentences (documents); (iii) We further
modify and expand a graph kernel learning framework to enable concurrent
exploitation of automatically induced AMR (semantic) and dependency structure
(syntactic) representations. Our experiments show that our approach yields
interaction extraction systems that are more robust in environments where there
is a significant mismatch between training and test conditions.
Comment: Appearing in Proceedings of the Thirtieth AAAI Conference on
Artificial Intelligence (AAAI-16)
Approximate Bayesian computation via the energy statistic
Approximate Bayesian computation (ABC) has become an essential part of the
Bayesian toolbox for addressing problems in which the likelihood is
prohibitively expensive or entirely unknown, making it intractable. ABC defines
a pseudo-posterior by comparing observed data with simulated data,
traditionally based on some summary statistics, the elicitation of which is
regarded as a key difficulty. Recently, using data discrepancy measures has
been proposed in order to bypass the construction of summary statistics. Here
we propose to use the importance-sampling ABC (IS-ABC) algorithm relying on the
so-called two-sample energy statistic. We establish a new asymptotic result for
the case where both the observed sample size and the simulated data sample size
increase to infinity, which highlights to what extent the data discrepancy
measure impacts the asymptotic pseudo-posterior. The result holds in the broad
setting of IS-ABC methodologies, thus generalizing previous results that have
been established only for rejection ABC algorithms. Furthermore, we propose a
consistent V-statistic estimator of the energy statistic, under which we show
that the large sample result holds, and prove that the rejection ABC algorithm,
based on the energy statistic, generates pseudo-posterior distributions that
converge to the correct limits, when implemented with rejection
thresholds that converge to zero, in the finite sample setting. Our proposed
energy statistic based ABC algorithm is demonstrated on a variety of models,
including a Gaussian mixture, a moving-average model of order two, a bivariate
beta and a multivariate g-and-k distribution. We find that our proposed
method compares well with alternative discrepancy measures.
Comment: 25 pages, 6 figures, 5 tables
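A rough sketch of rejection ABC with an energy-statistic discrepancy is given below. It is an illustration under stated assumptions rather than the paper's algorithm: the data are univariate, the V-statistic form is the standard two-sample energy distance with diagonal terms included, and the names `mean_abs_diff`, `energy_v_statistic` and `rejection_abc`, as well as the caller-supplied `simulate` and `sample_prior` callbacks, are hypothetical.

```python
import random


def mean_abs_diff(xs, ys):
    """Average of |x - y| over all pairs (x in xs, y in ys)."""
    return sum(abs(x - y) for x in xs for y in ys) / (len(xs) * len(ys))


def energy_v_statistic(xs, ys):
    """V-statistic estimate of the two-sample energy distance (univariate).

    Diagonal pairs are included in the within-sample averages, so the value
    is zero when the two samples coincide.
    """
    return (2.0 * mean_abs_diff(xs, ys)
            - mean_abs_diff(xs, xs)
            - mean_abs_diff(ys, ys))


def rejection_abc(observed, simulate, sample_prior, n_draws, epsilon, rng):
    """Keep prior draws whose simulated data lie within epsilon of the data."""
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)                      # draw a parameter
        simulated = simulate(theta, len(observed), rng)  # simulate a data set
        if energy_v_statistic(observed, simulated) <= epsilon:
            accepted.append(theta)
    return accepted
```

For example, with a Gaussian location model one could pass `simulate=lambda th, n, r: [r.gauss(th, 1.0) for _ in range(n)]` and `sample_prior=lambda r: r.uniform(-5, 5)`; the accepted values then approximate the pseudo-posterior for the chosen threshold.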
Information theoretic novelty detection
We present a novel approach to online change detection problems when the training sample size is small. The proposed approach is based on estimating the expected information content of a new data point and allows an accurate control of the false positive rate even for small data sets. In the case of the Gaussian distribution, our approach is analytically tractable and closely related
to classical statistical tests. We then propose an approximation scheme to extend our approach to the case of the mixture of Gaussians. We extensively evaluate our approach on synthetic data and on three real benchmark data
sets. The experimental validation shows that our method maintains a good overall accuracy, but significantly improves the control over the false positive rate
- …
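The link between information content and classical tests in the Gaussian case can be illustrated with a small sketch. This is not the paper's method. It assumes a plug-in Gaussian fit, for which the information content of a new point grows with the squared standardized distance z^2, and it calibrates the z^2 threshold for a small training size by Monte Carlo simulation so that the false positive rate is approximately alpha. All function names are hypothetical.

```python
import math
import random


def fit_gaussian(sample):
    """Sample mean and unbiased sample variance."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return mean, var


def information_content(x, mean, var):
    """Negative log-density of x under N(mean, var), in nats."""
    return 0.5 * math.log(2 * math.pi * var) + (x - mean) ** 2 / (2 * var)


def calibrate_threshold(n, alpha, n_sim=20000, seed=0):
    """Monte Carlo (1 - alpha) quantile of z^2 for training size n.

    z^2 = (x - mean)^2 / var does not depend on the true mean or variance,
    so simulating from a standard Gaussian is sufficient.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_sim):
        train = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean, var = fit_gaussian(train)
        x = rng.gauss(0.0, 1.0)
        stats.append((x - mean) ** 2 / var)
    stats.sort()
    return stats[int((1 - alpha) * n_sim)]


def is_novel(x, train, threshold):
    """Flag x as novel when its z^2 under the fitted Gaussian is too large."""
    mean, var = fit_gaussian(train)
    return (x - mean) ** 2 / var > threshold
```

For small training sets the calibrated threshold is noticeably larger than the large-sample chi-square value, which is why a naive fixed cutoff tends to produce too many false positives.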