16,023 research outputs found

    EM Algorithms for Weighted-Data Clustering with Application to Audio-Visual Scene Analysis

    Get PDF
    Data clustering has received a lot of attention and numerous methods, algorithms and software packages are available. Among these techniques, parametric finite-mixture models play a central role due to their interesting mathematical properties and to the existence of maximum-likelihood estimators based on expectation-maximization (EM). In this paper we propose a new mixture model that associates a weight with each observed point. We introduce the weighted-data Gaussian mixture and we derive two EM algorithms. The first one considers a fixed weight for each observation. The second one treats each weight as a random variable following a gamma distribution. We propose a model selection method based on a minimum message length criterion, provide a weight initialization strategy, and validate the proposed algorithms by comparing them with several state of the art parametric and non-parametric clustering techniques. We also demonstrate the effectiveness and robustness of the proposed clustering technique in the presence of heterogeneous data, namely audio-visual scene analysis.Comment: 14 pages, 4 figures, 4 table

    A New Generation of Mixture-Model Cluster Analysis with Information Complexity and the Genetic EM Algorithm

    Get PDF
    In this dissertation, we extend several relatively new developments in statistical model selection and data mining in order to improve one of the workhorse statistical tools - mixture modeling (Pearson, 1894). The traditional mixture model assumes data comes from several populations of Gaussian distributions. Thus, what remains is to determine how many distributions, their population parameters, and the mixing proportions. However, real data often do not fit the restrictions of normality very well. It is likely that data from a single population exhibiting either asymmetrical or nonnormal tail behavior could be erroneously modeled as two populations, resulting in suboptimal decisions. To avoid these pitfalls, we develop the mixture model under a broader distributional assumption by fitting a group of multivariate elliptically-contoured distributions (Anderson and Fang, 1990; Fang et al., 1990). Special cases include the multivariate Gaussian and power exponential distributions, as well as the multivariate generalization of the Student’s T. This gives us the flexibility to model nonnormal tail and peak behavior, though the symmetry restriction still exists. The literature has many examples of research generalizing the Gaussian mixture model to other distributions (Farrell and Mersereau, 2004; Hasselblad, 1966; John, 1970a), but our effort is more general. Further, we generalize the mixture model to be non-parametric, by developing two types of kernel mixture model. First, we generalize the mixture model to use the truly multivariate kernel density estimators (Wand and Jones, 1995). Additionally, we develop the power exponential product kernel mixture model, which allows the density to adjust to the shape of each dimension independently. Because kernel density estimators enforce no functional form, both of these methods can adapt to nonnormal asymmetric, kurtotic, and tail characteristics. Over the past two decades or so, evolutionary algorithms have grown in popularity, as they have provided encouraging results in a variety of optimization problems. Several authors have applied the genetic algorithm - a subset of evolutionary algorithms - to mixture modeling, including Bhuyan et al. (1991), Krishna and Murty (1999), and Wicker (2006). These procedures have the benefit that they bypass computational issues that plague the traditional methods. We extend these initialization and optimization methods by combining them with our updated mixture models. Additionally, we “borrow” results from robust estimation theory (Ledoit and Wolf, 2003; Shurygin, 1983; Thomaz, 2004) in order to data-adaptively regularize population covariance matrices. Numerical instability of the covariance matrix can be a significant problem for mixture modeling, since estimation is typically done on a relatively small subset of the observations. We likewise extend various information criteria (Akaike, 1973; Bozdogan, 1994b; Schwarz, 1978) to the elliptically-contoured and kernel mixture models. Information criteria guide model selection and estimation based on various approximations to the Kullback-Liebler divergence. Following Bozdogan (1994a), we use these tools to sequentially select the best mixture model, select the best subset of variables, and detect influential observations - all without making any subjective decisions. Over the course of this research, we developed a full-featured Matlab toolbox (M3) which implements all the new developments in mixture modeling presented in this dissertation. We show results on both simulated and real world datasets. Keywords: mixture modeling, nonparametric estimation, subset selection, influence detection, evidence-based medical diagnostics, unsupervised classification, robust estimation

    Software reuse in a paralysis dataset based on categorical clustering and the Pearson distribution

    Get PDF
    AbstractSoftware reuse is the process of building software applications that make use of formerly developed software components. In this paper, we explain the benefits that can be obtained from using statistical procedures for prescribing medicines, especially in rural areas, which have limited resources available on hand. It should be noted that although the expert systems were successful in research, they never dominated the market when actual patient treatment was considered. The proposed methodology is compared with the categorical clustering technique. The Fenton and Melton Coupling Metric is considered for the evaluation of the statistic model. The reliability of this methodology is also considered

    Evaluating the Differences of Gridding Techniques for Digital Elevation Models Generation and Their Influence on the Modeling of Stony Debris Flows Routing: A Case Study From Rovina di Cancia Basin (North-Eastern Italian Alps)

    Get PDF
    Debris \ufb02ows are among the most hazardous phenomena in mountain areas. To cope with debris \ufb02ow hazard, it is common to delineate the risk-prone areas through routing models. The most important input to debris \ufb02ow routing models are the topographic data, usually in the form of Digital Elevation Models (DEMs). The quality of DEMs depends on the accuracy, density, and spatial distribution of the sampled points; on the characteristics of the surface; and on the applied gridding methodology. Therefore, the choice of the interpolation method affects the realistic representation of the channel and fan morphology, and thus potentially the debris \ufb02ow routing modeling outcomes. In this paper, we initially investigate the performance of common interpolation methods (i.e., linear triangulation, natural neighbor, nearest neighbor, Inverse Distance to a Power, ANUDEM, Radial Basis Functions, and ordinary kriging) in building DEMs with the complex topography of a debris \ufb02ow channel located in the Venetian Dolomites (North-eastern Italian Alps), by using small footprint full- waveform Light Detection And Ranging (LiDAR) data. The investigation is carried out through a combination of statistical analysis of vertical accuracy, algorithm robustness, and spatial clustering of vertical errors, and multi-criteria shape reliability assessment. After that, we examine the in\ufb02uence of the tested interpolation algorithms on the performance of a Geographic Information System (GIS)-based cell model for simulating stony debris \ufb02ows routing. In detail, we investigate both the correlation between the DEMs heights uncertainty resulting from the gridding procedure and that on the corresponding simulated erosion/deposition depths, both the effect of interpolation algorithms on simulated areas, erosion and deposition volumes, solid-liquid discharges, and channel morphology after the event. The comparison among the tested interpolation methods highlights that the ANUDEM and ordinary kriging algorithms are not suitable for building DEMs with complex topography. Conversely, the linear triangulation, the natural neighbor algorithm, and the thin-plate spline plus tension and completely regularized spline functions ensure the best trade-off among accuracy and shape reliability. Anyway, the evaluation of the effects of gridding techniques on debris \ufb02ow routing modeling reveals that the choice of the interpolation algorithm does not signi\ufb01cantly affect the model outcomes

    Bitter taste stimuli induce differential neural codes in mouse brain.

    Get PDF
    A growing literature suggests taste stimuli commonly classified as "bitter" induce heterogeneous neural and perceptual responses. Here, the central processing of bitter stimuli was studied in mice with genetically controlled bitter taste profiles. Using these mice removed genetic heterogeneity as a factor influencing gustatory neural codes for bitter stimuli. Electrophysiological activity (spikes) was recorded from single neurons in the nucleus tractus solitarius during oral delivery of taste solutions (26 total), including concentration series of the bitter tastants quinine, denatonium benzoate, cycloheximide, and sucrose octaacetate (SOA), presented to the whole mouth for 5 s. Seventy-nine neurons were sampled; in many cases multiple cells (2 to 5) were recorded from a mouse. Results showed bitter stimuli induced variable gustatory activity. For example, although some neurons responded robustly to quinine and cycloheximide, others displayed concentration-dependent activity (p<0.05) to quinine but not cycloheximide. Differential activity to bitter stimuli was observed across multiple neurons recorded from one animal in several mice. Across all cells, quinine and denatonium induced correlated spatial responses that differed (p<0.05) from those to cycloheximide and SOA. Modeling spatiotemporal neural ensemble activity revealed responses to quinine/denatonium and cycloheximide/SOA diverged during only an early, at least 1 s wide period of the taste response. Our findings highlight how temporal features of sensory processing contribute differences among bitter taste codes and build on data suggesting heterogeneity among "bitter" stimuli, data that challenge a strict monoguesia model for the bitter quality
    • 

    corecore