
    Clustering and cluster inference of complex data structures

    Finite mixtures provide a flexible and powerful tool for fitting univariate and multivariate distributions that cannot be captured by standard statistical distributions. In particular, multivariate mixtures have been widely used for modelling and cluster analysis of high-dimensional data in a wide range of applications. Modes of mixture densities have been used with great success for organizing mixture components into homogeneous groups, but the results are limited to normal mixtures. Beyond the clustering application, existing research in this area has provided fundamental results regarding the upper bound on the number of modes, but they too are limited to normal mixtures. This thesis provides new modality theorems and important analytical results on the upper bound on the number of modes for multivariate t-mixtures and compares them with existing results on normal mixtures. Graphical tools for merging t-mixtures and the effect of the degrees of freedom are also thoroughly examined. The most important contribution of this thesis is a set of fundamental results on the modality of skew normal distributions and skew normal mixtures. First, we show that the topography of high-dimensional skew normal mixtures can be analyzed rigorously in lower dimensions by defining the corresponding ridgeline manifold, which contains all critical points as well as the ridges of the density. Unlike for normal or t-mixtures, however, an implicit equation must be solved to obtain this manifold. The plot of the elevations on the ridgeline can still be used to develop tools for exploring the number of modes and for merging mixture components. Although analytical results on the number of modes can no longer be derived, the elevation plots lead to a new conjecture on the upper bound on the number of modes of skew normal mixtures. Unlike the normal and t-distributions, even a single-component skew normal distribution has very interesting modal features.
Firstly, as the modes cannot be written in closed form, we design and provide software tools to calculate the modes in any dimension. We also provide a thorough study of the relationship between the means and modes of skew normals, along with fundamental results on the limiting behaviour of the mean and mode as the skewness parameter increases. A further new result shows that, although the mean can vary widely as the skewness parameter varies, the mode is a much more robust measure of central tendency, as the mode of a skew normal distribution varies only within a smaller range. Two R packages, available on GitHub, containing the numerical tools for calculating the modes of skew normals and functions specific to merging skew normal components are provided as part of this thesis. Additionally, the merging tool developed for skew normal mixtures is demonstrated on flow cytometry data.
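The abstract's point that the skew normal mode has no closed form, and that it moves less than the mean as skewness grows, can be illustrated in the simplest setting. The following is a minimal Python sketch, not the thesis's R packages: it assumes the standard univariate skew normal density 2φ(x)Φ(αx) with location 0 and scale 1, and the function names `skew_normal_mean` and `skew_normal_mode` are hypothetical. The mean uses the known closed form δ√(2/π) with δ = α/√(1+α²); the mode is found by numerically maximizing the log-density.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm


def skew_normal_logpdf(x, alpha):
    """Log-density of the standard skew normal: log(2) + log phi(x) + log Phi(alpha * x)."""
    return np.log(2.0) + norm.logpdf(x) + norm.logcdf(alpha * x)


def skew_normal_mean(alpha):
    """Closed-form mean of the standard skew normal (location 0, scale 1)."""
    delta = alpha / np.sqrt(1.0 + alpha**2)
    return delta * np.sqrt(2.0 / np.pi)


def skew_normal_mode(alpha):
    """Numerical mode: maximize the log-density over a bounded interval.

    The interval (-3, 3) is an assumption that is wide enough for the
    standardized density; the univariate skew normal is unimodal, so a
    bounded scalar search is sufficient here.
    """
    result = minimize_scalar(
        lambda x: -skew_normal_logpdf(x, alpha),
        bounds=(-3.0, 3.0),
        method="bounded",
    )
    return result.x


if __name__ == "__main__":
    # Compare how mean and mode respond to an increasing skewness parameter.
    for alpha in (0.0, 1.0, 5.0, 20.0):
        print(f"alpha={alpha:5.1f}  mean={skew_normal_mean(alpha):.4f}  "
              f"mode={skew_normal_mode(alpha):.4f}")
```

Running the loop for a range of α values gives a quick feel for the mean-versus-mode comparison in one dimension; the multivariate case treated in the thesis needs a multivariate optimizer and is not covered by this sketch.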

    Flexible modelling in statistics: past, present and future

    In times where more and more data become available and where the data exhibit rather complex structures (significant departure from symmetry, heavy or light tails), flexible modelling has become an essential task for statisticians as well as researchers and practitioners from domains such as economics, finance or environmental sciences. This is reflected by the wealth of existing proposals for flexible distributions; well-known examples are Azzalini's skew-normal, Tukey's g-and-h, mixture and two-piece distributions, to name but a few. My aim in the present paper is to provide an introduction to this research field, intended to be useful both for novices and professionals of the domain. After a description of the research stream itself, I will narrate the gripping history of flexible modelling, starring emblematic heroes from the past such as Edgeworth and Pearson, then depict three of the most used flexible families of distributions, and finally provide an outlook on future flexible modelling research by posing challenging open questions. Comment: 27 pages, 4 figures

    EMMIXcskew: an R Package for the Fitting of a Mixture of Canonical Fundamental Skew t-Distributions

    This paper presents an R package EMMIXcskew for the fitting of the canonical fundamental skew t-distribution (CFUST) and finite mixtures of this distribution (FM-CFUST) via maximum likelihood (ML). The CFUST distribution provides a flexible family of models to handle non-normal data, with parameters for capturing skewness and heavy tails in the data. It formally encompasses the normal, t, and skew-normal distributions as special and/or limiting cases. A few other versions of the skew t-distribution are also nested within the CFUST distribution. In this paper, an Expectation-Maximization (EM) algorithm is described for computing the ML estimates of the parameters of the FM-CFUST model, and different strategies for initializing the algorithm are discussed and illustrated. The methodology is implemented in the EMMIXcskew package, and examples are presented using two real datasets. The EMMIXcskew package contains functions to fit the FM-CFUST model, including procedures for generating different initial values. Additional features include random sample generation and contour visualization in 2D and 3D.

    Bayesian modelling of skewness and kurtosis with two-piece scale and shape distributions

    We formalise and generalise the definition of the family of univariate double two-piece distributions, obtained by using a density-based transformation of unimodal symmetric continuous distributions with a shape parameter. The resulting distributions contain five interpretable parameters that control the mode, as well as the scale and shape in each direction. Four-parameter subfamilies of this class of distributions that capture different types of asymmetry are discussed. We propose interpretable scale and location-invariant benchmark priors and derive conditions for the propriety of the corresponding posterior distribution. The prior structures used allow for meaningful comparisons through Bayes factors within flexible families of distributions. These distributions are applied to data from finance, internet traffic and medicine, comparing them with appropriate competitors.
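To make the two-piece idea concrete, here is a minimal Python sketch of the simpler two-piece scale construction with a normal base density; it is an illustration under stated assumptions, not the double two-piece family of the abstract, which additionally lets a shape parameter differ on each side of the mode. The density used is s(x) = 2/(σ1+σ2) · f((x−μ)/σ1) for x < μ and 2/(σ1+σ2) · f((x−μ)/σ2) for x ≥ μ, with f the standard normal density. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def two_piece_normal_pdf(x, mu, sigma1, sigma2):
    """Two-piece normal density with mode mu.

    sigma1 is the scale to the left of the mode and sigma2 the scale to
    the right; the factor 2 / (sigma1 + sigma2) makes the two halves join
    continuously at mu and integrate to one.
    """
    x = np.asarray(x, dtype=float)
    scale = np.where(x < mu, sigma1, sigma2)
    return 2.0 / (sigma1 + sigma2) * norm.pdf((x - mu) / scale)


def mass_left_of_mode(sigma1, sigma2):
    """Probability mass below the mode, which equals sigma1 / (sigma1 + sigma2)."""
    return sigma1 / (sigma1 + sigma2)


if __name__ == "__main__":
    # Example: a right-skewed density with mode 0.5 (illustrative values only).
    grid = np.linspace(-5.0, 10.0, 7)
    print(two_piece_normal_pdf(grid, mu=0.5, sigma1=1.0, sigma2=3.0))
    print(mass_left_of_mode(1.0, 3.0))
```

In this parameterization the mode stays at μ regardless of the two scales, and asymmetry is read directly from the ratio of σ1 to σ2, which is what makes the parameters interpretable.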