8,441 research outputs found
Sequential Monte Carlo EM for multivariate probit models
Multivariate probit models (MPM) have the appealing feature of capturing some
of the dependence structure between the components of multidimensional binary
responses. The key for the dependence modelling is the covariance matrix of an
underlying latent multivariate Gaussian. Most approaches to MLE in multivariate
probit regression rely on MCEM algorithms to avoid computationally intensive
evaluations of multivariate normal orthant probabilities. As an alternative to
the much-used Gibbs sampler, a new SMC sampler for truncated multivariate
normals is proposed. The algorithm proceeds in two stages where samples are
first drawn from truncated multivariate Student distributions and then
further evolved towards a Gaussian. The sampler is then embedded in an MCEM
algorithm. The sequential nature of SMC methods can be exploited to design a
fully sequential version of the EM, where the samples are simply updated from
one iteration to the next rather than resampled from scratch. Recycling the
samples in this manner significantly reduces the computational cost. An
alternative view of the standard conditional maximisation step provides the
basis for an iterative procedure to fully perform the maximisation needed in
the EM algorithm. The identifiability of MPM is also thoroughly discussed. In
particular, the likelihood invariance can be embedded in the EM algorithm to
ensure that constrained and unconstrained maximisation are equivalent. A simple
iterative procedure is then derived for either maximisation which takes
effectively no computational time. The method is validated by applying it to
the widely analysed Six Cities dataset and on a higher dimensional simulated
example. Previous approaches to the Six Cities dataset overly restrict the parameter
space but, by considering the correct invariance, the maximum likelihood is
quite naturally improved when treating the full unrestricted model.
Comment: 26 pages, 2 figures. In press, Computational Statistics & Data
Analysi
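The abstract above names the Gibbs sampler as the usual way to draw the latent Gaussian variables of a probit model, restricted to the orthant implied by the observed binary responses. The following is a minimal sketch of that Gibbs baseline only, not of the SMC sampler the paper proposes. It assumes a bivariate latent Gaussian with unit variances and correlation rho; the function names, the starting point and the clamping constant are illustrative assumptions.

```python
import random
from statistics import NormalDist


def sample_truncated_normal(mean, sd, positive, rng):
    """Draw from N(mean, sd^2) restricted to (0, inf) or (-inf, 0] by CDF inversion."""
    dist = NormalDist(mean, sd)
    p0 = dist.cdf(0.0)  # probability mass below zero
    lo, hi = (p0, 1.0) if positive else (0.0, p0)
    u = rng.uniform(lo, hi)
    # Clamp away from 0 and 1 so inv_cdf stays defined (assumption: the
    # truncation is moderate, so the clamp never leaves the interval [lo, hi]).
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return dist.inv_cdf(u)


def gibbs_truncated_bivariate(mu, rho, y, n_samples, burn_in=100, seed=0):
    """Gibbs sampler for z ~ N(mu, [[1, rho], [rho, 1]]) with z_j > 0 iff y_j == 1."""
    rng = random.Random(seed)
    cond_sd = (1.0 - rho ** 2) ** 0.5
    # Start from a point that already satisfies the sign constraints.
    z = [0.5 if y[0] else -0.5, 0.5 if y[1] else -0.5]
    samples = []
    for it in range(burn_in + n_samples):
        for j in (0, 1):
            k = 1 - j
            # Conditional mean of z_j given z_k for a unit-variance bivariate normal.
            cond_mean = mu[j] + rho * (z[k] - mu[k])
            z[j] = sample_truncated_normal(cond_mean, cond_sd, bool(y[j]), rng)
        if it >= burn_in:
            samples.append(tuple(z))
    return samples


samples = gibbs_truncated_bivariate(mu=(0.3, -0.2), rho=0.5, y=(1, 0), n_samples=500)
# Monte Carlo estimate of E[z | y], the kind of quantity an MCEM E-step averages over.
e_step_mean = tuple(sum(s[j] for s in samples) / len(samples) for j in (0, 1))
```

In an MCEM loop, averages such as e_step_mean would replace the intractable conditional expectations in the E-step; the paper's contribution is to obtain these draws with an SMC sampler instead.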
Open TURNS: An industrial software for uncertainty quantification in simulation
The need to assess the robust performance of complex systems and to respond to
tighter regulatory processes (security, safety, environmental control,
health impacts, etc.) has led to the emergence of a new industrial simulation
challenge: to take uncertainties into account when dealing with complex
numerical simulation frameworks. Therefore, a generic methodology has emerged
from the joint effort of several industrial companies and academic
institutions. EDF R&D, Airbus Group and Phimeca Engineering started a
collaboration at the beginning of 2005, joined by IMACS in 2014, for the
development of an Open Source software platform dedicated to uncertainty
propagation by probabilistic methods, named OpenTURNS for Open source Treatment
of Uncertainty, Risk 'N Statistics. OpenTURNS addresses the specific industrial
challenges attached to uncertainties, which are transparency, genericity,
modularity and multi-accessibility. This paper focuses on OpenTURNS and
presents its main features: OpenTURNS is open source software under the LGPL
license, available as a C++ library and a Python TUI, which
works under Linux and Windows environments. All the methodological tools are
described in the different sections of this paper: uncertainty quantification,
uncertainty propagation, sensitivity analysis and metamodeling. A section also
explains the generic wrapper mechanism for linking OpenTURNS to any external code. The
paper illustrates as much as possible the methodological tools on an
educational example that simulates the height of a river and compares it to the
height of a dyke that protects industrial facilities. Finally, it gives an
overview of the main developments planned for the next few years.
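The river-and-dyke example mentioned in the abstract above is a typical uncertainty propagation task: random inputs are pushed through a numerical model and the probability of exceeding a threshold is estimated. The sketch below illustrates the idea with plain Monte Carlo in standard-library Python and does not use the OpenTURNS API. The water-height formula (a simplified Manning-Strickler relation for a wide rectangular channel), the input ranges, the channel geometry and all names are assumptions chosen for illustration, not values taken from the paper.

```python
import math
import random


def river_height(q, ks, zv, zm, length=5000.0, width=300.0):
    """Water height (m) from flow q (m^3/s) and Strickler coefficient ks.

    Assumed simplified Manning-Strickler relation; zm and zv are the upstream
    and downstream riverbed levels, so (zm - zv) / length is the slope.
    """
    slope = (zm - zv) / length
    return (q / (ks * width * math.sqrt(slope))) ** 0.6


def overflow_probability(dyke_crest, n=10000, seed=0):
    """Monte Carlo estimate of P(water level > dyke crest) under hypothetical inputs."""
    rng = random.Random(seed)
    zv, zm = 50.0, 55.0  # hypothetical riverbed levels (m)
    failures = 0
    for _ in range(n):
        q = rng.uniform(500.0, 3000.0)  # hypothetical flow range
        ks = rng.uniform(20.0, 40.0)    # hypothetical roughness range
        if zv + river_height(q, ks, zv, zm) > dyke_crest:
            failures += 1
    return failures / n


p_low_dyke = overflow_probability(dyke_crest=52.0)
p_high_dyke = overflow_probability(dyke_crest=54.0)
```

Because both calls reuse the same seed, they see the same random inputs, so raising the dyke crest can only lower the estimated overflow probability.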
Recent advances in directional statistics
Mainstream statistical methodology is generally applicable to data observed
in Euclidean space. There are, however, numerous contexts of considerable
scientific interest in which the natural supports for the data under
consideration are Riemannian manifolds like the unit circle, torus, sphere and
their extensions. Typically, such data can be represented using one or more
directions, and directional statistics is the branch of statistics that deals
with their analysis. In this paper we provide a review of the many recent
developments in the field since the publication of Mardia and Jupp (1999),
still the most comprehensive text on directional statistics. Many of those
developments have been stimulated by interesting applications in fields as
diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics,
image analysis, text mining, environmetrics, and machine learning. We begin by
considering developments for the exploratory analysis of directional data
before progressing to distributional models, general approaches to inference,
hypothesis testing, regression, nonparametric curve estimation, methods for
dimension reduction, classification and clustering, and the modelling of time
series, spatial and spatio-temporal data. An overview of currently available
software for analysing directional data is also provided, and potential future
developments discussed.
Comment: 61 page
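A short example shows why data on the circle need their own methods: the arithmetic mean of the angles 350 and 10 degrees is 180 degrees, which points in the opposite direction from both observations. The sketch below computes the standard circular mean and the mean resultant length by averaging unit vectors; the function names are illustrative and not tied to any package reviewed in the paper.

```python
import math


def circular_mean(angles_rad):
    """Mean direction in (-pi, pi], obtained from the summed unit vectors."""
    s = sum(math.sin(a) for a in angles_rad)
    c = sum(math.cos(a) for a in angles_rad)
    return math.atan2(s, c)


def mean_resultant_length(angles_rad):
    """Length of the averaged unit vector; values near 1 mean tight concentration."""
    n = len(angles_rad)
    s = sum(math.sin(a) for a in angles_rad) / n
    c = sum(math.cos(a) for a in angles_rad) / n
    return math.hypot(s, c)


angles = [math.radians(350.0), math.radians(10.0)]
naive_mean_deg = sum(math.degrees(a) for a in angles) / len(angles)  # 180 degrees
circ_mean = circular_mean(angles)          # approximately 0 radians
concentration = mean_resultant_length(angles)
```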
Multigraded Hilbert Series of noncommutative modules
In this paper, we propose methods for computing the Hilbert series of
multigraded right modules over the free associative algebra. In particular, we
compute such series for noncommutative multigraded algebras. Using results from
the theory of regular languages, we provide conditions when the methods are
effective and hence the sum of the Hilbert series is a rational function.
Moreover, a characterization of finite-dimensional algebras is obtained in
terms of the nilpotency of a key matrix involved in the computations. Using
this result, efficient variants of the methods are also developed for the
computation of Hilbert series of truncated infinite-dimensional algebras whose
(non-truncated) Hilbert series may not be rational functions. We consider some
applications of the computation of multigraded Hilbert series to algebras that
are invariant under the action of the general linear group. In fact, in this
case such series are symmetric functions which can be decomposed in terms of
Schur functions. Finally, we present an efficient and complete implementation
of (standard) graded and multigraded Hilbert series that has been developed in
the kernel of the computer algebra system Singular. A large set of tests
provides a comprehensive experimentation for the proposed algorithms and their
implementations.
Comment: 28 pages, to appear in Journal of Algebr
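For a monomial quotient of the free associative algebra, the coefficient of degree d in the (singly graded) Hilbert series is the number of words of length d that avoid every generator of the ideal as a factor. The brute-force sketch below counts such words directly. It is an illustration of the definition only, not the automaton-based method of the paper nor its Singular implementation, and the function name is an assumption.

```python
from itertools import product


def hilbert_coefficients(letters, forbidden, max_degree):
    """Count words of each length up to max_degree that avoid all forbidden factors."""
    coeffs = []
    for d in range(max_degree + 1):
        count = 0
        for word in product(letters, repeat=d):
            w = "".join(word)
            if not any(f in w for f in forbidden):
                count += 1
        coeffs.append(count)
    return coeffs


# Free algebra on x, y: 2^d words in degree d, i.e. the series 1 / (1 - 2t).
free_algebra = hilbert_coefficients("xy", [], 3)

# Quotient by the ideal generated by x^2: Fibonacci-type growth,
# matching the rational series (1 + t) / (1 - t - t^2).
mod_x_squared = hilbert_coefficients("xy", ["xx"], 5)

# All degree-2 words forbidden: the algebra is finite-dimensional.
finite_dim = hilbert_coefficients("xy", ["xx", "xy", "yx", "yy"], 3)
```

Enumeration is exponential in the degree, which is why methods based on regular languages and transfer matrices, as in the abstract above, matter in practice.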
Generalized Linear Models for Geometrical Current predictors. An application to predict garment fit
The aim of this paper is to model an ordinal response variable in terms
of vector-valued functional data belonging to a vector-valued RKHS. In particular,
we focus on the vector-valued RKHS obtained when a geometrical object (body) is
characterized by a current and on the ordinal regression model. A common way to
solve this problem in functional data analysis is to express the data in the orthonormal
basis given by decomposition of the covariance operator. But our data present very important differences with respect to the usual functional data setting. On the one
hand, they are vector-valued functions, and on the other, they are functions in an
RKHS with a previously defined norm. We propose to use three different bases: the
orthonormal basis given by the kernel that defines the RKHS, a basis obtained from
decomposition of the integral operator defined using the covariance function, and a
third basis that combines the previous two. The three approaches are compared and
applied to an interesting problem: building a model to predict the fit of children’s
garment sizes, based on a 3D database of the Spanish child population. Our proposal
has been compared with alternative methods that explore the performance of other
classifiers (Support Vector Machine and k-NN), and with the result of applying
the classification method proposed in this work to different characterizations of
the objects (landmarks and multivariate anthropometric measurements instead of
currents), obtaining worse results in all these cases.