Maximum likelihood estimation of a finite mixture of logistic regression models in a continuous data stream
In marketing we are often confronted with a continuous stream of responses to
marketing messages. Such streaming data provide invaluable information
regarding message effectiveness and segmentation. However, streaming data are
hard to analyze using conventional methods: their high volume and the fact that
they are continuously augmented mean that it takes considerable time to
analyze them. We propose a method for estimating a finite mixture of logistic
regression models which can be used to cluster customers based on a continuous
stream of responses. This method, which we call oFMLR, allows segments to be
identified in data streams or extremely large static datasets. Unlike
black-box algorithms, oFMLR provides model estimates that are directly
interpretable. We first introduce oFMLR, explaining in passing general topics
such as online estimation and the EM algorithm, so that this paper also serves as
a high-level overview of possible methods for dealing with large data streams in marketing
practice. Next, we discuss model convergence, identifiability, and relations to
alternative Bayesian methods; we also identify more general issues that arise
from dealing with continuously augmented data sets. Finally, we introduce the
oFMLR [R] package and evaluate the method by numerical simulation and by
analyzing a large customer clickstream dataset.
Comment: 1 figure. Working paper including [R] package
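A minimal sketch of the kind of online EM estimation the abstract describes, assuming a stochastic-approximation scheme: each incoming response updates the mixing weights and the per-segment logistic coefficients once and is then discarded, so memory does not grow with the stream. The class `OnlineLogisticMixture`, its fixed step size, and the simulated two-segment stream are hypothetical illustrations, not the oFMLR package's API or its exact update rule; for simplicity each observation is treated independently rather than grouped by customer.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogisticMixture:
    """Hypothetical online estimator for a K-component mixture of logistic
    regressions (stochastic-approximation EM sketch, not the oFMLR update)."""

    def __init__(self, n_components, n_features, step=0.05, seed=0):
        rng = random.Random(seed)
        self.step = step
        self.weights = [1.0 / n_components] * n_components  # mixing proportions
        # Small random starting coefficients break the symmetry between components.
        self.coefs = [[rng.gauss(0.0, 0.5) for _ in range(n_features)]
                      for _ in range(n_components)]

    def _component_probs(self, x):
        # P(y = 1 | x, component k), clipped away from 0 and 1 for stability.
        probs = []
        for beta in self.coefs:
            p = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
            probs.append(min(max(p, 1e-12), 1.0 - 1e-12))
        return probs

    def update(self, x, y):
        """Process one observation (x, y), y in {0, 1}; it is not stored."""
        probs = self._component_probs(x)
        # E-step: responsibility of each component for this single observation.
        joint = [w * (p if y == 1 else 1.0 - p) for w, p in zip(self.weights, probs)]
        total = sum(joint)
        resp = [j / total for j in joint]
        # Approximate M-step: small steps toward the responsibility-weighted targets.
        self.weights = [w + self.step * (r - w) for w, r in zip(self.weights, resp)]
        for k, (r, p) in enumerate(zip(resp, probs)):
            # Responsibility-weighted gradient of the logistic log-likelihood.
            self.coefs[k] = [b + self.step * r * (y - p) * xi
                             for b, xi in zip(self.coefs[k], x)]
        return resp

    def predict_proba(self, x):
        """Marginal P(y = 1 | x) under the current mixture."""
        return sum(w * p for w, p in zip(self.weights, self._component_probs(x)))


# Usage on a simulated stream (hypothetical data): two segments that respond in
# opposite directions to a single covariate.
rng = random.Random(1)
true_coefs = [[0.0, 3.0], [0.0, -3.0]]
model = OnlineLogisticMixture(n_components=2, n_features=2)
for _ in range(5000):
    segment = 0 if rng.random() < 0.6 else 1
    x = [1.0, rng.uniform(-2.0, 2.0)]  # intercept plus covariate
    p_true = sigmoid(sum(b * xi for b, xi in zip(true_coefs[segment], x)))
    y = 1 if rng.random() < p_true else 0
    last_resp = model.update(x, y)
print(model.weights, model.coefs)
```

A fixed step size keeps the sketch short; when convergence of a stochastic-approximation scheme matters, a decreasing step-size schedule is the usual choice.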
Flexible modelling in statistics: past, present and future
At a time when more and more data are becoming available and these data exhibit
rather complex structures (significant departures from symmetry, heavy or light
tails), flexible modelling has become an essential task for statisticians as
well as researchers and practitioners from domains such as economics, finance
or environmental sciences. This is reflected by the wealth of existing
proposals for flexible distributions; well-known examples are Azzalini's
skew-normal, Tukey's g-and-h, mixture and two-piece distributions, to cite
but a few. My aim in the present paper is to provide an introduction to this
research field, intended to be useful to both novices and professionals in the
domain. After a description of the research stream itself, I will narrate the
gripping history of flexible modelling, starring emblematic heroes from the
past such as Edgeworth and Pearson, then depict three of the most used flexible
families of distributions, and finally provide an outlook on future flexible
modelling research by posing challenging open questions.
Comment: 27 pages, 4 figures
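To make the flexible families named in the abstract concrete, here is a sketch of Azzalini's skew-normal density, Tukey's g-and-h transformation of a standard normal variable, and a two-piece normal density, written from their standard definitions. The function names and parameterizations (for example `alpha`, `g`, `h`, `scale_left`, `scale_right`) are illustrative choices, not taken from the paper.

```python
import math


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def skew_normal_pdf(x, loc=0.0, scale=1.0, alpha=0.0):
    """Azzalini skew-normal: (2 / scale) * phi(z) * Phi(alpha * z), z = (x - loc) / scale."""
    z = (x - loc) / scale
    return 2.0 / scale * std_normal_pdf(z) * std_normal_cdf(alpha * z)


def tukey_g_and_h(z, g=0.0, h=0.0, a=0.0, b=1.0):
    """Tukey g-and-h transform of a standard normal draw z.
    g controls skewness, h >= 0 controls tail heaviness."""
    core = z if g == 0.0 else (math.exp(g * z) - 1.0) / g
    return a + b * core * math.exp(0.5 * h * z * z)


def two_piece_normal_pdf(x, mode=0.0, scale_left=1.0, scale_right=1.0):
    """Two-piece normal: half-normals with different scales joined at the mode."""
    s = scale_left if x < mode else scale_right
    return 2.0 / (scale_left + scale_right) * std_normal_pdf((x - mode) / s)


def integrate(f, lo=-20.0, hi=20.0, n=40000):
    """Midpoint-rule integral, used here only as a sanity check."""
    dx = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * dx) for i in range(n)) * dx


# Each density should integrate to (approximately) 1.
print(integrate(lambda x: skew_normal_pdf(x, alpha=4.0)))
print(integrate(lambda x: two_piece_normal_pdf(x, scale_left=1.0, scale_right=3.0)))
# Transforming standard normal quantiles shows the effect of g and h.
print([round(tukey_g_and_h(z, g=0.5, h=0.1), 3) for z in (-2.0, -1.0, 0.0, 1.0, 2.0)])
```

Setting `alpha = 0`, `g = h = 0`, or equal left and right scales recovers the normal distribution in each case, which makes a quick sanity check.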
Bayesian Analysis of ODE's: solver optimal accuracy and Bayes factors
In most relevant cases in the Bayesian analysis of ODE inverse problems, a
numerical solver needs to be used. Therefore, we cannot work with the exact
theoretical posterior distribution but only with an approximate posterior that
inherits the error of the numerical solver. To compare a numerical posterior with
the theoretical posterior, we propose using Bayes factors (BF), treating both of
them as models for the data at hand. We prove that the BF of the theoretical
versus a numerical posterior tends to 1 with the same order in the step size as
the error of the numerical forward-map solver. For higher-order solvers (e.g.,
Runge-Kutta), the Bayes factor is already nearly 1 at step sizes that require
far less computational effort. Considerable CPU time may be saved by using
coarser solvers that nevertheless produce practically error-free posteriors. Two
examples are presented in which nearly 90% of CPU time is saved while all
inference results are identical to those obtained with a solver using a much
finer time step.
Comment: 28 pages, 6 figures
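The convergence claim can be illustrated on a toy problem. The sketch below assumes a logistic-growth ODE with a single rate parameter, noise-free synthetic observations, a Gaussian likelihood with known `sigma`, and a uniform prior approximated on a grid; none of these choices come from the paper. It computes the log Bayes factor of each numerical posterior against the exact one for Euler (first order) and classical Runge-Kutta (fourth order) at several step sizes.

```python
import math


def logistic_rhs(y, theta):
    # Hypothetical test ODE: logistic growth y' = theta * y * (1 - y).
    return theta * y * (1.0 - y)


def euler_step(y, theta, h):
    return y + h * logistic_rhs(y, theta)


def rk4_step(y, theta, h):
    k1 = logistic_rhs(y, theta)
    k2 = logistic_rhs(y + 0.5 * h * k1, theta)
    k3 = logistic_rhs(y + 0.5 * h * k2, theta)
    k4 = logistic_rhs(y + h * k3, theta)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0


def solve(step, theta, h, times, y0=0.1):
    """Numerical forward map at integer observation times (assumes 1/h is an integer)."""
    out, y, n_done = [], y0, 0
    per_unit = round(1.0 / h)
    for t in times:
        while n_done < t * per_unit:
            y = step(y, theta, h)
            n_done += 1
        out.append(y)
    return out


def exact(theta, times, y0=0.1):
    """Closed-form logistic solution: the 'theoretical' forward map."""
    return [1.0 / (1.0 + (1.0 / y0 - 1.0) * math.exp(-theta * t)) for t in times]


def log_evidence(forward, data, sigma, grid):
    """Log marginal likelihood under a uniform prior on the grid (log-sum-exp)."""
    logliks = []
    for theta in grid:
        pred = forward(theta)
        # Gaussian log-likelihood; constant terms cancel in the Bayes factor.
        logliks.append(-sum((p - d) ** 2 for p, d in zip(pred, data)) / (2 * sigma ** 2))
    m = max(logliks)
    return m + math.log(sum(math.exp(l - m) for l in logliks) / len(logliks))


times = [1, 2, 3, 4, 5]
sigma = 0.005
grid = [0.5 + i * 0.0005 for i in range(2001)]  # theta in [0.5, 1.5]
data = exact(1.0, times)  # noise-free synthetic data, true theta = 1
log_z_exact = log_evidence(lambda th: exact(th, times), data, sigma, grid)

log_bf = {}
for name, step in [("euler", euler_step), ("rk4", rk4_step)]:
    for h in (0.5, 0.25, 0.125):
        log_z_num = log_evidence(lambda th: solve(step, th, h, times), data, sigma, grid)
        log_bf[(name, h)] = log_z_num - log_z_exact  # 0 means BF = 1
        print(name, h, log_bf[(name, h)])
```

Under these assumptions the Euler log Bayes factors should move toward 0 as the step size shrinks, while the Runge-Kutta values should be much closer to 0 at every step size shown, which is the qualitative pattern the abstract describes.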
Recent advances in directional statistics
Mainstream statistical methodology is generally applicable to data observed
in Euclidean space. There are, however, numerous contexts of considerable
scientific interest in which the natural supports for the data under
consideration are Riemannian manifolds like the unit circle, torus, sphere and
their extensions. Typically, such data can be represented using one or more
directions, and directional statistics is the branch of statistics that deals
with their analysis. In this paper we provide a review of the many recent
developments in the field since the publication of Mardia and Jupp (1999),
still the most comprehensive text on directional statistics. Many of those
developments have been stimulated by interesting applications in fields as
diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics,
image analysis, text mining, environmetrics, and machine learning. We begin by
considering developments for the exploratory analysis of directional data
before progressing to distributional models, general approaches to inference,
hypothesis testing, regression, nonparametric curve estimation, methods for
dimension reduction, classification and clustering, and the modelling of time
series, spatial and spatio-temporal data. An overview of currently available
software for analysing directional data is also provided, and potential future
developments are discussed.
Comment: 61 pages
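Why Euclidean methods break down on the circle can be shown with two standard summary statistics of directional data: the mean direction and the mean resultant length, both computed from the sum of unit vectors. The sketch also includes the von Mises density, commonly described as the circular analogue of the normal distribution. The function names and the power-series evaluation of the modified Bessel function I0 are illustrative choices.

```python
import math


def circular_summary(angles_deg):
    """Mean direction (degrees, in [-180, 180]) and mean resultant length R in [0, 1]."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    mean_dir = math.degrees(math.atan2(s, c))
    r_bar = math.hypot(s, c) / len(angles_deg)
    return mean_dir, r_bar


def bessel_i0(x, terms=60):
    """Modified Bessel function I0 via its power series sum_k (x/2)^(2k) / (k!)^2."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (0.5 * x) ** 2 / (k * k)
        total += term
    return total


def von_mises_pdf(theta, mu=0.0, kappa=1.0):
    """von Mises density on the circle; theta and mu in radians, kappa >= 0."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))


angles = [350.0, 10.0]
print(sum(angles) / len(angles))  # arithmetic mean: 180.0, the opposite direction
print(circular_summary(angles))   # mean direction near 0, R equal to cos(10 degrees)
```

For angles of 350 and 10 degrees the arithmetic mean points the opposite way, whereas the circular mean is 0 degrees up to floating-point rounding, with a mean resultant length of cos(10°), about 0.985.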