
    The EM Algorithm

    The Expectation-Maximization (EM) algorithm is a broadly applicable approach to the iterative computation of maximum likelihood (ML) estimates, useful in a variety of incomplete-data problems. Maximum likelihood estimation and likelihood-based inference are of central importance in statistical theory and data analysis. Maximum likelihood estimation is a general-purpose method with attractive properties. It is the most often used estimation technique in the frequentist framework; it is also relevant in the Bayesian framework (Chapter III.11). Bayesian solutions are often justified with the help of likelihoods and maximum likelihood estimates (MLEs), and Bayesian solutions are similar to penalized likelihood estimates. Maximum likelihood estimation is a ubiquitous technique, used extensively in every area where statistical methods are applied.
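As a concrete illustration of the iterative E- and M-steps described above, here is a minimal sketch of EM for a two-component one-dimensional Gaussian mixture; the function name `em_gmm_1d` and the initialization strategy are illustrative choices of ours, not part of the cited text.

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture.

    E-step: responsibilities r[i, k] = P(component k | x[i]).
    M-step: closed-form updates of weights, means, and variances
    from the responsibility-weighted data.
    """
    w = np.array([0.5, 0.5])            # mixing weights
    mu = np.array([x.min(), x.max()])   # spread-out initial means
    var = np.array([x.var(), x.var()])  # initial variances
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point
        dens = (w / np.sqrt(2 * np.pi * var)
                * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: maximizers of the expected complete-data log-likelihood
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

# Toy incomplete-data problem: which component generated each point is hidden.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
w, mu, var = em_gmm_1d(x)
```

Each iteration provably does not decrease the observed-data likelihood, which is the key property that makes EM attractive for incomplete-data problems like this one.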

    Cluster validity in clustering methods


    Shape and Topology Constrained Image Segmentation with Stochastic Models

    The central theme of this thesis has been the development of robust algorithms for image segmentation. All segmentation techniques proposed in this thesis are based on sound modeling of the image formation process. This approach to image partitioning enables the derivation of objective functions that make all modeling assumptions explicit. Based on the Parametric Distributional Clustering (PDC) technique, improved variants have been derived that explicitly incorporate topological assumptions in the corresponding cost functions. The questions of robustness and generalizability of segmentation solutions have been addressed empirically, with comprehensive example sets for both problems. It has been shown that the PDC framework is indeed capable of producing highly robust image partitions. In the context of PDC-based segmentation, a probabilistic representation of shape has been constructed. Furthermore, likelihood maps for given objects of interest were derived from the PDC cost function. Interpreting the shape information as a prior for the segmentation task, it has been combined with the likelihoods in a Bayesian setting. The resulting posterior probability for the occurrence of an object of a specified semantic category has been demonstrated to achieve excellent segmentation quality on very hard test beds of images from the Corel gallery.
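The Bayesian combination of a shape prior with object likelihoods described above can be sketched schematically; the toy per-pixel application of Bayes' rule below is not the thesis's actual PDC model, and all names and numbers are hypothetical.

```python
import numpy as np

def posterior_map(likelihood_obj, likelihood_bg, shape_prior):
    """P(object | pixel) from class likelihoods and a shape prior.

    likelihood_obj / likelihood_bg : per-pixel P(data | object / background)
    shape_prior                    : per-pixel P(object), encoding shape knowledge
    """
    num = likelihood_obj * shape_prior
    den = num + likelihood_bg * (1.0 - shape_prior)
    return num / den

# Toy 1-D "image": object likelihood is high in the middle,
# and the shape prior is also concentrated in the middle.
lik_obj = np.array([0.1, 0.8, 0.9, 0.8, 0.1])
lik_bg  = np.array([0.9, 0.2, 0.1, 0.2, 0.9])
prior   = np.array([0.2, 0.5, 0.8, 0.5, 0.2])
post = posterior_map(lik_obj, lik_bg, prior)
seg = post > 0.5   # MAP-style thresholding of the posterior
```

The design point mirrors the abstract: where likelihood and prior agree, the posterior is sharpened well beyond either input alone.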

    A Nonparametric Approach to Segmentation of Ladar Images

    The advent of advanced laser radar (ladar) systems that record full-waveform signal data has inspired numerous investigations that aspire to extract additional, previously unavailable information about the illuminated scene from the collected data. The quality of that information, however, is often limited by the ladar camera used to collect the data. This research project uses full-waveform analysis of ladar signals and basic principles of optics to propose a new formulation of an accepted signal model. A new waveform model that accounts for backscatter reflectance is the key to overcoming a specific deficiency of the ladar camera at hand, namely its inability to discern the pulse-spreading effects of elongated targets. Non-parametric statistics and familiar image-processing methods are combined to calculate the orientation angle of the illuminated objects, circumventing the hardware deficiency. Segmentation of the various ladar images was performed as part of the angle estimation, and this is shown to be a new and effective strategy for analyzing the output of the AFIT ladar camera.

    On clustering stability

    JEL Classification: C100; C150; C380. This work is dedicated to the evaluation of the stability of clustering solutions, namely the stability of crisp clusterings or partitions. We specifically refer to stability as the concordance of clusterings across several samples. To evaluate stability, we use a weighted cross-validation procedure, the results of which are summarized by simple and paired agreement index values. To exclude the amount of agreement by chance from these values, we propose a new method, IADJUST, that resorts to simulated cross-classification tables. This contribution makes the chance correction of any agreement index viable. Experiments on stability rely on 540 simulated data sets, the design factors being the number of clusters, their balance, and their overlap. Six real data sets with a priori known clusters are also considered. The experiments conducted illustrate the precision and pertinence of the IADJUST procedure and characterize the distribution of the indices under the hypothesis of agreement by chance. We therefore recommend that the use of adjusted indices become common practice when addressing stability. We then compare the stability of two clustering algorithms and conclude that Expectation-Maximization (EM) results are more stable than K-means results on unbalanced data sets. Finally, we explore the relationship between the stability and external validity of a clustering solution. When the results of all experimental scenarios are considered, there is a strong correlation between stability and external validity. However, within a specific experimental scenario (when a practical clustering task is considered), we find no relationship between stability and agreement with the ground truth.
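The general idea of a simulation-based chance correction can be sketched as follows. This toy version uses an unadjusted Rand index and estimates its expected value under chance by permuting one labeling; the actual IADJUST procedure works with simulated cross-classification tables and applies to any agreement index.

```python
import numpy as np
from itertools import combinations

def rand_index(a, b):
    """Simple (unadjusted) Rand index between two labelings."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

def adjust_by_simulation(a, b, n_sim=200, seed=0):
    """Correct an agreement index for chance by simulation.

    The expected index under the hypothesis of chance agreement is
    estimated by repeatedly permuting one labeling (preserving cluster
    sizes, as in a simulated cross-classification table), and the usual
    (observed - expected) / (1 - expected) correction is applied.
    """
    rng = np.random.default_rng(seed)
    observed = rand_index(a, b)
    expected = np.mean([rand_index(a, rng.permutation(b))
                        for _ in range(n_sim)])
    return (observed - expected) / (1.0 - expected)

a = [0, 0, 0, 1, 1, 1, 2, 2, 2]
b = [0, 0, 0, 1, 1, 1, 2, 2, 2]   # identical partition of 9 items
adj = adjust_by_simulation(a, b)
```

An adjusted index of this form is 1 for perfect agreement and scatters around 0 for chance-level agreement, which is what makes adjusted values comparable across experimental scenarios.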

    Doctor of Philosophy

    Functional magnetic resonance imaging (fMRI) measures the change in oxygen consumption in the blood vessels of the human brain, hence indirectly detecting neuronal activity. Resting-state fMRI (rs-fMRI) is used to identify the intrinsic functional patterns of the brain in the absence of an external stimulus. Accurate estimation of intrinsic activity is important for understanding the functional organization and dynamics of the brain, as well as differences in the functional networks of patients with mental disorders. This dissertation aims to robustly estimate the functional connectivities and networks of the human brain using rs-fMRI data from multiple subjects. We use a Markov random field (MRF), an undirected graphical model, to represent the statistical dependency among the functional network variables. Graphical models describe multivariate probability distributions that can be factorized and represented by a graph. By defining the nodes and the edges along with their weights according to our assumptions, we build soft constraints into the graph structure as prior information. We explore various approximate optimization methods, including variational Bayesian inference, graph cuts, and Markov chain Monte Carlo (MCMC) sampling. We develop random field models to solve three related problems. In the first problem, the goal is to detect the pairwise connectivity between gray matter voxels in a single-subject rs-fMRI dataset. We define a six-dimensional graph to represent our prior information that two voxels are more likely to be connected if their spatial neighbors are connected. The posterior mean of the connectivity variables is estimated by variational inference, also known as mean field theory in statistical physics. The proposed method outperforms standard spatial smoothing and is able to detect finer patterns of brain activity. Our second work aims to identify multiple functional systems. We define a Potts model, a special case of the MRF, on the network label variables, and a von Mises-Fisher distribution on the normalized fMRI signal. The inference is significantly more difficult than the binary classification in the previous problem. We use MCMC to draw samples from the posterior distribution of network labels. In the third application, we extend the graphical model to the multiple-subject scenario. By building a graph that includes the network labels of both a group map and the subject label maps, we define a hierarchical model that has richer structure than the flat single-subject model and captures the shared patterns as well as the variation among subjects. All three solutions are data-driven Bayesian methods, which estimate model parameters from the data. The experiments show that, through MRF regularization, the functional network maps we estimate are more accurate and more consistent across multiple sessions.
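The Potts-plus-MCMC idea can be sketched on a toy problem. The code below is not the dissertation's model (which uses von Mises-Fisher data terms on fMRI signals); it is a minimal Gibbs sampler for Potts labels on a small grid, with hypothetical unary log-likelihoods standing in for the data term.

```python
import numpy as np

def gibbs_potts(unary, beta=1.5, n_sweeps=20, seed=0):
    """Gibbs sampling of Potts-model labels on a 2-D grid.

    unary : (H, W, K) array of per-site log-likelihoods for K labels.
    beta  : Potts smoothness weight (reward for agreeing neighbours).
    """
    rng = np.random.default_rng(seed)
    H, W, K = unary.shape
    labels = rng.integers(0, K, size=(H, W))
    for _ in range(n_sweeps):
        for i in range(H):
            for j in range(W):
                # Full conditional: data term plus Potts prior over
                # the 4-neighbourhood of site (i, j).
                logp = unary[i, j].copy()
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < H and 0 <= nj < W:
                        logp[labels[ni, nj]] += beta
                p = np.exp(logp - logp.max())   # stabilised softmax
                labels[i, j] = rng.choice(K, p=p / p.sum())
    return labels

# Toy data: the left half favours label 0, the right half label 1.
H, W, K = 6, 6, 2
unary = np.zeros((H, W, K))
unary[:, :W // 2, 0] = 2.0
unary[:, W // 2:, 1] = 2.0
labels = gibbs_potts(unary)
```

The Potts term is what regularizes the label map: isolated disagreements with the neighbourhood are sampled away, which is the same smoothing role the MRF plays in the dissertation's network estimation.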

    Multivariate Poisson hidden Markov models for analysis of spatial counts

    Multivariate count data are found in a variety of fields. For modeling such data, one may consider the multivariate Poisson distribution; however, overdispersion is a problem when modeling data with the multivariate Poisson distribution. In this thesis we therefore propose a new multivariate Poisson hidden Markov model, based on an extension of independent multivariate Poisson finite mixture models, as a solution to this problem. This model, which can take into account the spatial nature of weed counts, is applied to weed species counts in an agricultural field. The distribution of counts depends on the underlying sequence of states, which are unobserved or hidden. These hidden states represent the regions where weed counts are relatively homogeneous. Analysis of these data involves the estimation of the number of hidden states, the Poisson means, and the covariances. Parameter estimation is done using a modified EM algorithm for maximum likelihood estimation. We extend the univariate Markov-dependent Poisson finite mixture model to the multivariate Poisson case (bivariate and trivariate) to model counts of two or three species. We also contribute to the hidden Markov model research area by developing Splus/R code for the analysis of the multivariate Poisson hidden Markov model: code for parameter estimation using the EM algorithm and the forward-backward procedure, and for bootstrap estimation of standard errors. The estimated parameters are used to calculate goodness-of-fit measures for the models. Results suggest that the multivariate Poisson hidden Markov model, with five states and an independent covariance structure, gives a reasonable fit to this dataset. Since this model deals with overdispersion and spatial information, it helps provide insight into weed distribution for herbicide applications. This model may lead researchers to find other factors, such as soil moisture and fertilizer level, that determine the states governing the distribution of the weed counts.
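As a sketch of the likelihood computation underlying such a model, here is the scaled forward recursion for a univariate Poisson hidden Markov model; the thesis's multivariate version would replace the univariate Poisson pmf with a multivariate Poisson density, and all parameter values below are hypothetical.

```python
import math
import numpy as np

def poisson_pmf(k, lam):
    """Poisson probability mass, computed in log space for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def forward_loglik(counts, pi, A, lams):
    """Log-likelihood of a count sequence under a Poisson HMM.

    pi   : initial state distribution, shape (K,)
    A    : state transition matrix, shape (K, K)
    lams : Poisson mean for each hidden state, shape (K,)
    Uses the scaled forward recursion to avoid numerical underflow.
    """
    emit = lambda c: np.array([poisson_pmf(c, l) for l in lams])
    alpha = pi * emit(counts[0])
    loglik = math.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for c in counts[1:]:
        alpha = (alpha @ A) * emit(c)   # propagate, then weight by emission
        s = alpha.sum()                 # per-step scaling factor
        loglik += math.log(s)
        alpha = alpha / s
    return loglik

pi = np.array([0.5, 0.5])               # initial state distribution
A = np.array([[0.9, 0.1], [0.1, 0.9]])  # "sticky" transitions: homogeneous regions
lams = np.array([2.0, 10.0])            # state-specific Poisson means
counts = [1, 2, 1, 9, 11, 10, 2, 1]     # toy spatial count sequence
ll = forward_loglik(counts, pi, A, lams)
```

This forward pass is the E-step workhorse: combined with the backward pass it yields the state posteriors that the modified EM algorithm uses to update the Poisson means and transition probabilities.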

    ADVANCED STATISTICAL LEARNING METHODS FOR HETEROGENEOUS MEDICAL IMAGING DATA

    Most neuro-related and disabling diseases display significant heterogeneity at the imaging and clinical scales. Characterizing such heterogeneity could transform our understanding of the etiology of these conditions and inspire new approaches to urgently needed preventions, diagnoses, and treatments. However, existing statistical methods face major challenges in delineating such heterogeneity at the subject, group, and study levels. To address these challenges, this work proposes several statistical learning methods for heterogeneous imaging data with different structures. First, we propose a dynamic spatial random effects model for longitudinal imaging datasets, which aims to characterize both the imaging intensity progression and the temporal-spatial heterogeneity of diseased regions across subjects and time. The key components of the proposed model are a spatial random effects model and a dynamic conditional random field model. The proposed model can effectively detect the dynamic diseased regions in each patient and produce a dynamic statistical disease mapping within each subpopulation of interest. Second, to address group-level heterogeneity in non-Euclidean data, we develop a penalized model-based clustering framework to cluster high-dimensional manifold data in symmetric spaces. Specifically, a mixture of geodesic factor analyzers is proposed, with mixing proportions determined through a logistic model and a Riemannian normal distribution in each component for data in symmetric spaces. Penalized likelihood approaches are used to realize variable selection procedures. We apply the proposed model to the ADNI hippocampal surface data, where it shows excellent clustering performance and reveals meaningful clusters in the mixed population of controls and subjects with AD. Finally, to account for potential heterogeneity caused by unobserved environmental, demographic, and technical factors, we treat the imaging data as functional responses and set up a surrogate variable analysis framework in functional linear models. A functional latent factor regression model is proposed. The confounding factors, and the bias they induce in local linear estimators, can be estimated and removed using a singular value decomposition of the residuals. We further develop a test for linear hypotheses about the primary coefficient functions. Both simulation studies and an analysis of the ADNI hippocampal surface data are conducted to show the performance of the proposed method. Doctor of Philosophy

    Human treelike tubular structure segmentation: A comprehensive review and future perspectives

    Various structures in human physiology follow a treelike morphology, which often expresses complexity at very fine scales. Examples of such structures are intrathoracic airways, retinal blood vessels, and hepatic blood vessels. Large collections of 2D and 3D images, in which the spatial arrangement of these structures can be observed, have been made available by medical imaging modalities such as magnetic resonance imaging (MRI), computed tomography (CT), optical coherence tomography (OCT), and ultrasound. Segmentation of these structures in medical images is of great importance, since analysis of the structure provides insights into disease diagnosis, treatment planning, and prognosis. Manual labelling of extensive data by radiologists is often time-consuming and error-prone. As a result, automated and semi-automated computational models have become a popular research field in medical imaging over the past two decades, and many have been developed to date. In this survey, we aim to provide a comprehensive review of the currently publicly available datasets, segmentation algorithms, and evaluation metrics. In addition, current challenges and future research directions are discussed.