Sparse Overcomplete Word Vector Representations
Current distributed representations of words show little resemblance to
theories of lexical semantics. The former are dense and uninterpretable, the
latter largely based on familiar, discrete classes (e.g., supersenses) and
relations (e.g., synonymy and hypernymy). We propose methods that transform
word vectors into sparse (and optionally binary) vectors. The resulting
representations are more similar to the interpretable features typically used
in NLP, though they are discovered automatically from raw corpora. Because the
vectors are highly sparse, they are computationally easy to work with. Most
importantly, we find that they outperform the original vectors on benchmark
tasks. Comment: Proceedings of ACL 201
Geodesics on the manifold of multivariate generalized Gaussian distributions with an application to multicomponent texture discrimination
We consider the Rao geodesic distance (GD) based on the Fisher information as a similarity measure on the manifold of zero-mean multivariate generalized Gaussian distributions (MGGD). The MGGD is shown to be an adequate model for the heavy-tailed wavelet statistics in multicomponent images, such as color or multispectral images. We discuss the estimation of MGGD parameters using various methods. We apply the GD between MGGDs to color texture discrimination in several classification experiments, taking into account the correlation structure between the spectral bands in the wavelet domain. We compare the performance, both in terms of texture discrimination capability and computational load, of the GD and the Kullback-Leibler divergence (KLD). Likewise, both uni- and multivariate generalized Gaussian models are evaluated, characterized by a fixed or a variable shape parameter. The modeling of the interband correlation significantly improves classification efficiency, while the GD is shown to consistently outperform the KLD as a similarity measure.
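To make the two similarity measures concrete, here is a minimal sketch for the ordinary zero-mean multivariate Gaussian only, which is one particular fixed-shape member of the MGGD family. It is not the paper's general MGGD computation. The function names `gaussian_kld` and `gaussian_rao_distance` are hypothetical, and the covariance matrices are assumed to be symmetric positive definite.

```python
import numpy as np

def gaussian_kld(sigma1, sigma2):
    """KL divergence KL(N(0, sigma1) || N(0, sigma2)) for zero-mean Gaussians."""
    m = sigma1.shape[0]
    # slogdet returns (sign, log|det|); the sign is +1 for SPD matrices.
    _, logdet1 = np.linalg.slogdet(sigma1)
    _, logdet2 = np.linalg.slogdet(sigma2)
    trace_term = np.trace(np.linalg.solve(sigma2, sigma1))
    return 0.5 * (trace_term - m + logdet2 - logdet1)

def gaussian_rao_distance(sigma1, sigma2):
    """Rao geodesic distance between N(0, sigma1) and N(0, sigma2).

    Uses the eigenvalues of sigma1^{-1} sigma2, which are real and positive
    when both matrices are symmetric positive definite.
    """
    eigvals = np.linalg.eigvals(np.linalg.solve(sigma1, sigma2)).real
    return np.sqrt(0.5 * np.sum(np.log(eigvals) ** 2))
```

Unlike the KLD, the geodesic distance is symmetric in its two arguments, so no symmetrization step is needed before using it as a similarity measure.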
Sloshing in the LNG shipping industry: risk modelling through multivariate heavy-tail analysis
In the liquefied natural gas (LNG) shipping industry, the phenomenon of
sloshing can lead to the occurrence of very high pressures in the tanks of the
vessel. The issue of modelling or estimating the probability of the
simultaneous occurrence of such extremal pressures is now crucial from the risk
assessment point of view. In this paper, heavy-tail modelling, widely used as a
conservative approach to risk assessment and corresponding to a worst-case risk
analysis, is applied to the study of sloshing. Multivariate heavy-tailed
distributions are considered, with sloshing pressures investigated by means of
small-scale replica tanks instrumented with d > 1 sensors. When attempting to
fit such nonparametric statistical models, one naturally faces computational
issues inherent in the phenomenon of dimensionality. The primary purpose of
this article is to overcome this barrier by introducing a novel methodology.
For d-dimensional heavy-tailed distributions, the structure of extremal
dependence is entirely characterised by the angular measure, a positive measure
on the intersection of a sphere with the positive orthant in R^d. As d
increases, the mutual extremal dependence between variables becomes difficult
to assess. Based on a spectral clustering approach, we show here how a low
dimensional approximation to the angular measure may be found. The
nonparametric method proposed for modelling sloshing has been successfully applied
to pressure data. The parsimonious representation thus obtained proves to be
very convenient for the simulation of multivariate heavy-tailed distributions,
allowing for the implementation of Monte-Carlo simulation schemes in estimating
the probability of failure. Besides confirming its performance on artificial
data, the methodology has been implemented on a real data set specifically
collected for risk assessment of sloshing in the LNG shipping industry.
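As a rough illustration of what the angular measure is estimated from, the sketch below keeps the k observations with the largest norm and projects them onto the unit sphere. It covers only this standard empirical step, not the spectral clustering approximation described in the abstract. The function name `empirical_angles`, the choice of the L1 norm (whose unit sphere in the positive orthant is the simplex), the parameter `k`, and the assumption that the data are nonnegative and already on a common heavy-tailed scale are all assumptions made here, not details from the paper.

```python
import numpy as np

def empirical_angles(x, k):
    """Return the angular parts of the k observations with the largest L1 norm.

    x : array of shape (n, d) with nonnegative entries (assumed).
    k : number of extreme observations to keep (hypothetical tuning parameter).
    """
    x = np.asarray(x, dtype=float)
    radii = x.sum(axis=1)             # L1 norm for nonnegative data
    top = np.argsort(radii)[-k:]      # indices of the k largest radii
    return x[top] / radii[top, None]  # points on the unit simplex

# Usage on synthetic heavy-tailed data (illustrative only, not sloshing data).
rng = np.random.default_rng(0)
sample = rng.pareto(2.0, size=(1000, 3)) + 1.0
angles = empirical_angles(sample, k=50)
```

The empirical distribution of the rows of `angles` is a simple nonparametric estimate of the angular measure; how those points concentrate on the simplex indicates which groups of sensors tend to be extreme together.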