
    Self-organising maps : statistical analysis, treatment and applications.

    This thesis presents some substantial theoretical analyses and optimal treatments of Kohonen's self-organising map (SOM) algorithm, and explores the practical application potential of the algorithm for vector quantisation, pattern classification, and image processing. It consists of two major parts. In the first part, the SOM algorithm is investigated and analysed from a statistical viewpoint. The proof of its universal convergence for any dimensionality is obtained using a novel and extended form of the Central Limit Theorem. Its feature space is shown to be an approximate multivariate Gaussian process, which will eventually converge and form a mapping that minimises the mean-square distortion between the feature and input spaces. The diminishing effect of the initial states and the implicit effects of the learning rate and neighbourhood function on its convergence and ordering are analysed and discussed. Distinct and meaningful definitions, and associated measures, of its ordering are presented in relation to the map's fault tolerance. The SOM algorithm is further enhanced by incorporating a proposed constraint, or Bayesian modification, in order to achieve optimal vector quantisation or pattern classification. The second part of this thesis addresses the task of unsupervised texture-image segmentation by means of SOM networks and model-based descriptions. A brief review of texture analysis in terms of definitions, perceptions, and approaches is given. Markov random field (MRF) model-based approaches are discussed in detail. Arising from this, a hierarchical self-organised segmentation structure, which consists of a local MRF parameter estimator, a SOM network, and a simple voting layer, is proposed and is shown, by theoretical analysis and practical experiment, to achieve a maximum likelihood or maximum a posteriori segmentation. A fast, simple, but efficient boundary relaxation algorithm is proposed as a post-processor to further refine the resulting segmentation. The class number validation problem in a fully unsupervised segmentation is approached by a classical, simple, and on-line minimum mean-square-error method. Experimental results indicate that this method is very efficient for texture segmentation problems. The thesis concludes with some suggestions for further work on SOM neural networks.
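For readers unfamiliar with the algorithm the abstract analyses, the following is a minimal sketch of a standard one-dimensional Kohonen map, not the thesis's own formulation. The function names `train_som` and `mean_distortion`, the linear decay schedules, and the Gaussian neighbourhood function are all assumptions made for illustration; the constraint or Bayesian modification mentioned in the abstract is not included.

```python
import math
import random


def train_som(data, n_units=10, n_steps=2000, seed=0):
    """Train a 1-D Kohonen map on a list of equal-length float vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Random initial weights in [0, 1); their influence fades as training proceeds.
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for t in range(n_steps):
        frac = t / n_steps
        lr = 0.5 * (1.0 - frac) + 0.01                 # assumed decaying learning rate
        sigma = max(n_units / 2 * (1.0 - frac), 0.5)   # assumed shrinking neighbourhood width
        x = rng.choice(data)
        # Best-matching unit: the unit whose weight is nearest to the input.
        bmu = min(
            range(n_units),
            key=lambda i: sum((w - xi) ** 2 for w, xi in zip(weights[i], x)),
        )
        for i in range(n_units):
            # Gaussian neighbourhood on the map index (an assumed choice).
            h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
            weights[i] = [w + lr * h * (xi - w) for w, xi in zip(weights[i], x)]
    return weights


def mean_distortion(data, weights):
    """Mean-square distortion between inputs and their nearest map units."""
    total = 0.0
    for x in data:
        total += min(sum((w - xi) ** 2 for w, xi in zip(u, x)) for u in weights)
    return total / len(data)
```

Calling `train_som(data, n_steps=0)` returns the untrained initial weights, so `mean_distortion` can be compared before and after training on the same data.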

    A Statistical Approach to Topological Data Analysis

    Until very recently, topological data analysis and topological inference methods mostly relied on deterministic approaches. The major part of this habilitation thesis presents a statistical approach to such topological methods. We first develop model selection tools for selecting simplicial complexes in a given filtration. Next, we study the estimation of persistent homology on metric spaces. We also study a robust version of topological data analysis. Related to this last topic, we also investigate the problem of Wasserstein deconvolution. The second part of the habilitation thesis gathers our contributions in other fields of statistics, including a model selection method for Gaussian mixtures, an implementation of the slope heuristic for calibrating penalties, and a study of Breiman’s permutation importance measure in the context of random forests.

    Incremental Refinements and Multiple Descriptions with Feedback

    It is well known that independent (separate) encoding of K correlated sources may incur some rate loss compared to joint encoding, even if the decoding is done jointly. This loss is particularly evident in the multiple descriptions problem, where the sources are repetitions of the same source, but each description must be individually good. We observe that under mild conditions on the source and distortion measure, the rate ratio R_independent(K)/R_joint goes to one in the limit of small rate/high distortion. Moreover, we consider the excess rate with respect to the rate-distortion function, R_independent(K, M) - R(D), in M rounds of K independent encodings with a final distortion level D. We provide two examples - a Gaussian source with mean-squared error and an exponential source with one-sided error - for which the excess rate vanishes in the limit as the number of rounds M goes to infinity, for any fixed D and K. This result has an interesting interpretation for a multi-round variant of the multiple descriptions problem, where after each round the encoder gets a (block) feedback regarding which of the descriptions arrived: in the limit as the number of rounds M goes to infinity (i.e., many incremental rounds), the total rate of received descriptions approaches the rate-distortion function. We provide theoretical and experimental evidence showing that this phenomenon is in fact more general than in the two examples above. Comment: 62 pages. Accepted in the IEEE Transactions on Information Theory.
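As background for the Gaussian example in the abstract, the sketch below evaluates the standard rate-distortion function of a Gaussian source under mean-squared error, R(D) = (1/2) log2(variance/D), and shows that the rates of an idealised chain of incremental refinements add up to R(D). It does not implement the paper's scheme of K independent encodings per round; the function names and the geometric distortion schedule are assumptions made for illustration.

```python
import math


def gaussian_rate(variance, distortion):
    """R(D) in bits per sample for a Gaussian source under mean-squared error."""
    if distortion >= variance:
        return 0.0
    return 0.5 * math.log2(variance / distortion)


def incremental_rates(variance, final_distortion, rounds):
    """Per-round rates along an assumed geometric distortion schedule.

    Assumes 0 < final_distortion < variance. Round m refines a residual of
    variance levels[m] down to distortion levels[m + 1].
    """
    ratio = (final_distortion / variance) ** (1.0 / rounds)
    levels = [variance * ratio ** m for m in range(rounds + 1)]
    return [gaussian_rate(levels[m], levels[m + 1]) for m in range(rounds)]
```

Summing `incremental_rates(variance, D, M)` gives `gaussian_rate(variance, D)` up to floating-point error for any number of rounds M, which is the zero-excess-rate baseline against which the abstract's R_independent(K, M) - R(D) is measured.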

    Reassessing the Paradigms of Statistical Model-Building

    Statistical model-building is the science of constructing models from data and from information about the data-generation process, with the aim of analysing those data and drawing inference from that analysis. Many statistical tasks are undertaken during this analysis; they include classification, forecasting, prediction and testing. Model-building has assumed substantial importance, as new technologies enable data on highly complex phenomena to be gathered in very large quantities. This creates a demand for more complex models, and requires the model-building process itself to be adaptive. The word “paradigm” refers to philosophies, frameworks and methodologies for developing and interpreting statistical models, in the context of data, and applying them for inference. In order to solve contemporary statistical problems it is often necessary to combine techniques from previously separate paradigms. The workshop addressed model-building paradigms that are at the frontiers of modern statistical research. It tried to create synergies, by delineating the connections and collisions among different paradigms. It also endeavoured to shape the future evolution of paradigms.