Response Spectrum Estimation Using Support Vector Machines
This study investigates the applicability and efficiency of support vector machines for the problem of estimating earthquake response spectra from the Fourier amplitude spectra of the ground motion acceleration. Two methods are commonly used for this purpose: time domain simulations and random vibration theory. Time domain simulations offer high accuracy at high computational cost, while random vibration theory, although not computationally intensive, requires knowledge of the statistical distribution of the response amplitudes. This study treats the task of estimating response spectra from the Fourier spectra as a nonlinear regression problem and constructs a supervised machine learning algorithm with minimal sensitivity to noise and outliers. In this method, pairs of vectors consisting of Fourier amplitude spectra and pseudo-velocity response spectra are transformed into a high-dimensional feature space where the nonlinear relationship between them can be represented linearly. No assumptions regarding the probability density function of the response amplitudes are required. A practical application is presented using artificially generated accelerograms, and it is shown that support vector machines can predict the response spectra over a wide range of vibration periods.
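The abstract gives no implementation, so the following is a minimal sketch of the regression setup it describes, using scikit-learn's SVR with an RBF kernel as the implicit high-dimensional feature map. The array shapes, hyperparameters, and synthetic data below are illustrative assumptions, not the authors' configuration.

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Illustrative stand-ins (shapes and values are assumptions, not the paper's data):
# each row of X is a Fourier amplitude spectrum sampled at 128 frequencies;
# each row of Y is a pseudo-velocity response spectrum at 64 vibration periods.
rng = np.random.default_rng(0)
X = rng.lognormal(size=(500, 128))
Y = rng.lognormal(size=(500, 64))

# The RBF kernel plays the role of the nonlinear map into a high-dimensional
# feature space where the input-output relation is modeled linearly; the
# epsilon-insensitive loss gives the low sensitivity to noise and outliers.
model = MultiOutputRegressor(
    make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.01))
)
model.fit(np.log(X), np.log(Y))
psv_pred = np.exp(model.predict(np.log(X[:5])))  # predicted response spectra
```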
From robust tests to Bayes-like posterior distributions
In the Bayes paradigm and for a given loss function, we propose the
construction of a new type of posterior distribution, which extends the
classical Bayes one, for estimating the law of an $n$-sample. The loss
functions we have in mind are based on the total variation and Hellinger
distances as well as some $\mathbb{L}_j$-ones. We prove that, with a
probability close to one, this new posterior distribution concentrates its mass
in a neighbourhood of the law of the data, for the chosen loss function,
provided that this law belongs to the support of the prior or, at least, lies
close enough to it. We therefore establish that the new posterior distribution
enjoys some robustness properties with respect to a possible misspecification
of the prior, or more precisely, its support. For the total variation and
squared Hellinger losses, we also show that the posterior distribution keeps
its concentration properties when the data are only independent, hence not
necessarily i.i.d., provided that most of their marginals or the average of
these are close enough to some probability distribution around which the prior
puts enough mass. The posterior distribution is therefore also stable with
respect to the equidistribution assumption. We illustrate these results by
several applications. We consider the problems of estimating a location
parameter or both the location and the scale of a density in a nonparametric
framework. Finally, we also tackle the problem of estimating a density, with
the squared Hellinger loss, in a high-dimensional parametric model under some
sparsity conditions. The results established in this paper are non-asymptotic
and provide, as much as possible, explicit constants.
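The abstract states the construction only informally. As a point of reference, the LaTeX sketch below gives the generic loss-based ("Gibbs-type") posterior that this family of methods generalizes; it is a sketch of the general idea only, since the paper's actual posterior is built from robust tests and differs in its details.

```latex
% Generic loss-based posterior: prior \pi, empirical loss \ell_n,
% inverse temperature \beta > 0.
\[
  \widehat{\pi}\bigl(d\theta \,\big|\, X_1,\dots,X_n\bigr)
    \;\propto\; \exp\!\bigl[-\beta\,\ell_n(\theta;X_1,\dots,X_n)\bigr]\,\pi(d\theta),
\]
% which reduces to the classical Bayes posterior when \ell_n is the negative
% log-likelihood and \beta = 1. Concentration results of the kind stated in
% the abstract assert that, with probability close to one, \widehat{\pi} puts
% most of its mass on \{\theta : d(P_\theta, P^\star) \le \varepsilon_n\},
% where d is the chosen loss (e.g. total variation or Hellinger) and
% P^\star is the law of the data.
```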
Estimating differential entropy using recursive copula splitting
A method for estimating the Shannon differential entropy of multidimensional
random variables using independent samples is described. The method is based on
decomposing the distribution into a product of the marginal distributions and
the joint dependency, also known as the copula. The entropy of marginals is
estimated using one-dimensional methods. The entropy of the copula, which
always has a compact support, is estimated recursively by splitting the data
along statistically dependent dimensions. Numerical examples demonstrate that
the method is accurate for distributions with compact and non-compact supports,
which is imperative when the support is not known or of mixed type (in
different dimensions). At high dimensions (larger than 20), our method is not
only more accurate, but also significantly more efficient than existing
approaches.
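To make the decomposition concrete, here is a minimal Python sketch of the entropy-via-copula idea: marginal entropies estimated one-dimensionally, plus the entropy of the rank-transformed (copula) sample. A fixed histogram plug-in stands in for the paper's recursive splitting along dependent dimensions; function names and bin counts are illustrative assumptions.

```python
import numpy as np
from scipy.stats import differential_entropy, rankdata

def copula_transform(x):
    """Map each marginal to (0, 1) via normalized ranks (empirical copula)."""
    n, d = x.shape
    return np.column_stack([rankdata(x[:, j]) / (n + 1) for j in range(d)])

def entropy_via_copula(x, bins=16):
    """H(X) = sum_j H(X_j) + H(C): marginal entropies plus copula entropy."""
    n, d = x.shape
    # One-dimensional entropy estimates for each marginal.
    h_marginals = sum(differential_entropy(x[:, j]) for j in range(d))
    # Copula entropy on the compact support [0, 1]^d; a fixed histogram is
    # used here where the paper splits recursively along dependent dimensions.
    u = copula_transform(x)
    counts, _ = np.histogramdd(u, bins=bins, range=[(0.0, 1.0)] * d)
    p = counts[counts > 0] / n
    cell_volume = bins ** (-d)
    h_copula = -np.sum(p * np.log(p / cell_volume))
    return h_marginals + h_copula

# Example: correlated bivariate Gaussian sample (true entropy ~ 2.33 nats).
rng = np.random.default_rng(0)
sample = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=5000)
print(entropy_via_copula(sample, bins=32))
```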
On Convergence of Epanechnikov Mean Shift
Epanechnikov Mean Shift is a simple yet empirically very effective algorithm
for clustering. It localizes the centroids of data clusters via estimating
modes of the probability distribution that generates the data points, using the
`optimal' Epanechnikov kernel density estimator. However, since the procedure
involves non-smooth kernel density functions, the convergence behavior of
Epanechnikov mean shift lacks theoretical support as of this writing---most of
the existing analyses are based on smooth functions and thus cannot be applied
to Epanechnikov Mean Shift. In this work, we first show that the original
Epanechnikov Mean Shift may indeed terminate at a non-critical point because
of this non-smoothness. Based on our analysis, we propose a simple remedy to
fix it. The modified Epanechnikov Mean Shift is guaranteed to terminate at a
local maximum of the estimated density, which corresponds to a cluster
centroid, within a finite number of iterations. We also propose a way to avoid
running the Mean Shift iterates from every data point, while maintaining good
clustering accuracies under non-overlapping spherical Gaussian mixture models.
This further pushes Epanechnikov Mean Shift to handle very large and
high-dimensional data sets. Experiments show surprisingly good performance
compared to Lloyd's K-means algorithm and the EM algorithm.
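The paper's modified procedure is not reproduced here; the following is a minimal sketch of the standard Epanechnikov mean-shift update, the scheme the abstract says can stop at a non-critical point. Because the Epanechnikov profile has a constant derivative on its support, each update reduces to averaging the points inside the bandwidth window. The function name, bandwidth, and tolerance are illustrative assumptions.

```python
import numpy as np

def epanechnikov_mean_shift(X, start, bandwidth, max_iter=100, tol=1e-8):
    """Standard mean-shift iterations under the Epanechnikov kernel.

    Each update is the plain mean of the points within `bandwidth` of the
    current iterate. This is the original scheme; the paper's fix for
    termination at non-critical points is not reproduced here.
    """
    x = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        in_window = np.sum((X - x) ** 2, axis=1) <= bandwidth ** 2
        if not in_window.any():
            break  # empty window: the estimated density vanishes here
        x_new = X[in_window].mean(axis=0)
        if np.linalg.norm(x_new - x) < tol:
            break  # converged to a window whose mean is (numerically) fixed
        x = x_new
    return x

# Example: seed the iterate at one data point of a two-cluster sample.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (200, 2)), rng.normal(3, 0.3, (200, 2))])
mode = epanechnikov_mean_shift(X, start=X[0], bandwidth=1.0)
```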