
    Statistical inference and distribution selection for SAR image analysis: a mixture-based approach

    In this dissertation, we propose three statistical approaches to the analysis of SAR images. A SAR image is composed of several classes of pixels. In the first part of this thesis, we assume that each of these classes can be modeled by a Gamma distribution; the multimodal SAR image histogram is thus a mixture of Gamma distributions. The maximum likelihood (ML) technique is used to estimate the parameters of each mode in the multimodal SAR image histogram, and the number of looks of the SAR image is estimated using either the Gamma maximum likelihood or the maximum of the Gamma function. Second, we use a method from statistical inference theory, the minimum message length (MML) approach, to model SAR images. MML minimizes the length of a message transmitted from a sender to a receiver, where the parameters of the message are random. The message length is the negative logarithm of the posterior probability of the model, so the MML approach can also be regarded as finding the model with the highest posterior probability. The multimodal SAR image histogram is assumed to be a mixture of Gamma distributions; the MML algorithm finds the best model and estimates the number of modes and the statistics of the multimodal histogram. Third, the distribution of a given class in the SAR image depends on the form of the scene surfaces and on the radar parameters. Due to SAR image preprocessing and other factors, a single distribution is insufficient, and we need a method for modeling each class (mode) of the SAR image histogram with an appropriate distribution. Using a set of distributions with flexible shapes that are likely to fit the SAR image histogram, we form a system called GGBL, which includes four parametric distributions: Gaussian, Gamma, Beta, and Log-Normal.
The selection of a parametric distribution from the GGBL system for each mode of the heterogeneous multimodal SAR histogram is performed according to the location of that mode's skewness and flatness (kurtosis) coefficients in the skewness-flatness plane. We propose a distribution-stability method for distribution selection based on these two coefficients. The statistics of the heterogeneous multimodal SAR histogram are estimated using the characteristic points of the histogram. The algorithms are validated in the context of SAR image segmentation using threshold information; thresholds are computed by minimizing the discrimination error between classes of pixels in the SAR image. Finally, the major results and key features of the ML, MML, and GGBL proposals are analyzed, and extensions for future research are discussed.
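The Gamma-mixture modeling described in the abstract can be sketched with a toy EM loop. This is a minimal illustration, not the dissertation's algorithm: the synthetic data, initial values, and the weighted moment-matching M-step (a simple stand-in for the full ML update, which requires a digamma root-finder) are all assumptions.

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(0)
# Synthetic bimodal "histogram": two Gamma-distributed pixel classes.
x = np.concatenate([rng.gamma(shape=2.0, scale=1.0, size=500),
                    rng.gamma(shape=8.0, scale=2.0, size=500)])

K = 2
w = np.full(K, 1.0 / K)          # mixing weights
shape = np.array([1.0, 5.0])     # initial shape parameters (illustrative)
scale = np.array([1.0, 3.0])     # initial scale parameters (illustrative)

for _ in range(200):
    # E-step: posterior responsibility of each component for each pixel.
    dens = np.stack([w[k] * gamma.pdf(x, a=shape[k], scale=scale[k])
                     for k in range(K)])
    resp = dens / dens.sum(axis=0)
    # M-step: weighted method-of-moments update of the Gamma parameters.
    nk = resp.sum(axis=1)
    w = nk / len(x)
    mean = (resp * x).sum(axis=1) / nk
    var = (resp * (x - mean[:, None]) ** 2).sum(axis=1) / nk
    shape, scale = mean ** 2 / var, var / mean

print(np.sort(shape * scale))    # estimated component means
```

With well-separated modes, the recovered component means land near the true values (2 and 16 for this toy data).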

    Simultaneous feature selection and clustering using mixture models


    Massive Unsourced Random Access: Exploiting Angular Domain Sparsity

    This paper investigates the unsourced random access (URA) scheme to accommodate numerous machine-type users communicating with a base station equipped with multiple antennas. Existing works adopt a slotted transmission strategy to reduce system complexity; they operate under the framework of coupled compressed sensing (CCS), which concatenates an outer tree code with an inner compressed sensing code for slot-wise message stitching. We show that by exploiting the MIMO channel information in the angular domain, the redundancies required by the tree encoder/decoder in CCS can be removed to improve spectral efficiency, and we thereby devise an uncoupled transmission protocol. To perform activity detection and channel estimation, we propose an expectation-maximization-aided generalized approximate message passing algorithm with a Markov random field support structure, which captures the inherent clustered sparsity of the angular-domain channel. Message reconstruction is then performed by a clustering decoder that recognizes the slot-distributed channels of each active user based on their similarity. We put forward the slot-balanced K-means algorithm as the kernel of the clustering decoder, resolving the constraints and collisions specific to this application scenario. Extensive simulations reveal that the proposed scheme achieves better error performance at high spectral efficiency than the CCS-based URA schemes.
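The slot-balanced clustering idea can be sketched as K-means whose assignment step is a one-to-one matching within each slot, so every user claims exactly one channel estimate per slot. Everything here is a toy assumption (sizes, noise level, cosine-similarity cost, Hungarian assignment as the balancing mechanism), not the paper's exact decoder.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(1)
M, S, U = 16, 6, 4          # antennas, slots, active users (toy sizes)

# Ground-truth angular-domain channels: one per user, reused across slots.
true_h = rng.normal(size=(U, M))
# Per-slot channel estimates: shuffled within each slot, plus noise.
obs = np.empty((S, U, M))
for s in range(S):
    obs[s] = true_h[rng.permutation(U)] + 0.05 * rng.normal(size=(U, M))

# Slot-balanced K-means: in each slot every centroid must claim exactly
# one channel, enforced by an optimal one-to-one assignment step.
centroids = obs[0].copy()               # initialize from the first slot
for _ in range(10):
    assign = np.empty((S, U), dtype=int)
    for s in range(S):
        # Cost = negative cosine similarity between centroids and channels.
        a = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
        b = obs[s] / np.linalg.norm(obs[s], axis=1, keepdims=True)
        row, col = linear_sum_assignment(-(a @ b.T))   # balanced matching
        assign[s, row] = col
    # Update each centroid as the mean of its claimed channels.
    centroids = np.stack([obs[np.arange(S), assign[:, k]].mean(axis=0)
                          for k in range(U)])

# Each recovered centroid should align with a distinct true channel.
corr = np.abs(centroids @ true_h.T)
print(sorted(corr.argmax(axis=1).tolist()))
```

The balanced assignment is what distinguishes this from plain K-means: it rules out two users claiming the same slot-channel, which models the collision constraint mentioned in the abstract.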

    High-dimensional Sparse Count Data Clustering Using Finite Mixture Models

    Due to the massive amount of available digital data, automating its analysis and modeling for different purposes and applications has become an urgent need. One of the most challenging tasks in machine learning is clustering, defined as the process of assigning observations that share similar characteristics to subgroups. This task is significant, especially when implementing complex algorithms to deal with high-dimensional data. Thus, the advancement of computational power in statistics-based approaches has become an interesting and attractive research domain. Among the successful methods, mixture models have been widely acknowledged and successfully applied in numerous fields, as they provide a convenient yet flexible formal setting for unsupervised and semi-supervised learning. An essential problem with these approaches is to develop a probabilistic model that represents the data well by taking its nature into account. Count data are widely used in machine learning and computer vision applications, where an object, e.g., a text document or an image, can be represented by a vector of the appearance frequencies of words or visual words, respectively. Such data usually suffer from the well-known curse of dimensionality, as objects are represented by high-dimensional and sparse vectors, i.e., a few thousand dimensions with a sparsity of 95 to 99%, which dramatically degrades the performance of clustering algorithms. Moreover, count data systematically exhibit the burstiness and overdispersion phenomena, neither of which can be handled by the generic multinomial distribution typically used to model count data, because of its independence assumption.
This thesis is constructed around six related manuscripts, in which we propose several approaches for high-dimensional sparse count data clustering via various mixture models based on hierarchical Bayesian modeling frameworks that can model the dependency of repetitive word occurrences. In such frameworks, a suitable distribution is used to introduce prior information into the construction of the statistical model, based on a distribution conjugate to the multinomial, e.g., the Dirichlet, the generalized Dirichlet, and the Beta-Liouville, which offers numerous computational advantages. We thus proposed a novel model that we call the Multinomial Scaled Dirichlet (MSD), which uses the scaled Dirichlet as a prior to the multinomial to allow more modeling flexibility. Although these frameworks can model burstiness and overdispersion well, they share similar disadvantages that make their estimation procedures very inefficient when the collection size is large. To handle high dimensionality, we considered two approaches. First, we derived close approximations to the distributions in a hierarchical structure to bring them into exponential-family form, aiming to combine the flexibility and efficiency of these models with the desirable statistical and computational properties of the exponential family of distributions, including sufficiency, which reduces the complexity and computational effort, especially for sparse and high-dimensional data. Second, we proposed a model-based unsupervised feature selection approach for count data to overcome several issues that may be caused by the high dimensionality of the feature space, such as over-fitting, low efficiency, and poor performance. Furthermore, we handled two significant aspects of mixture-based clustering methods, namely parameter estimation and model selection.
We considered the Expectation-Maximization (EM) algorithm, a broadly applicable iterative algorithm for estimating mixture model parameters, incorporating several techniques to avoid its initialization dependency and poor local maxima. For model selection, we investigated different approaches to finding the optimal number of components based on the Minimum Message Length (MML) philosophy. The effectiveness of our approaches is evaluated on challenging real-life applications, such as sentiment analysis, hate speech detection on Twitter, topic novelty detection, human interaction recognition in films and TV shows, facial expression recognition, face identification, and age estimation.
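The EM-plus-model-selection pipeline described above can be sketched for a plain multinomial mixture on toy count data. The penalized criterion below is a BIC-style stand-in for the full MML expression, and the data, vocabulary size, and parameter counts are illustrative assumptions, not the thesis's models.

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(2)
# Toy sparse count data: documents drawn from two multinomial "topics".
V, N = 20, 400
topics = rng.dirichlet(np.full(V, 0.2), size=2)   # sparse word profiles
labels = rng.integers(0, 2, size=N)
X = np.stack([rng.multinomial(50, topics[k]) for k in labels])

def fit(X, K, iters=100, eps=1e-10):
    """EM for a K-component multinomial mixture; returns log-likelihood."""
    N, V = X.shape
    w = np.full(K, 1.0 / K)
    theta = rng.dirichlet(np.ones(V), size=K)
    for _ in range(iters):
        logp = np.log(w + eps) + X @ np.log(theta + eps).T          # E-step
        resp = np.exp(logp - logsumexp(logp, axis=1, keepdims=True))
        w = resp.mean(axis=0)                                        # M-step
        theta = resp.T @ X + eps
        theta /= theta.sum(axis=1, keepdims=True)
    return logsumexp(np.log(w + eps) + X @ np.log(theta + eps).T, axis=1).sum()

def criterion(K):
    # Message-length-style score: negative fit plus a parameter-cost penalty
    # (a BIC-like approximation to MML, used here only for illustration).
    n_params = (K - 1) + K * (V - 1)
    return -fit(X, K) + 0.5 * n_params * np.log(N)

best_K = min(range(1, 5), key=criterion)
print(best_K)
```

The selected number of components is the one whose extra likelihood no longer pays for its extra parameter cost, which is the intuition behind the MML-based selection the abstract refers to.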