565 research outputs found

    Optimal Clustering under Uncertainty

    Classical clustering algorithms typically either lack an underlying probability framework to make them predictive or focus on parameter estimation rather than defining and minimizing a notion of error. Recent work addresses these issues by developing a probabilistic framework based on the theory of random labeled point processes and characterizing a Bayes clusterer that minimizes the number of misclustered points. The Bayes clusterer is analogous to the Bayes classifier. Whereas determining a Bayes classifier requires full knowledge of the feature-label distribution, deriving a Bayes clusterer requires full knowledge of the point process. When uncertain of the point process, one would like to find a robust clusterer that is optimal over the uncertainty, just as one may find optimal robust classifiers with uncertain feature-label distributions. Herein, we derive an optimal robust clusterer by first finding an effective random point process that incorporates all randomness within its own probabilistic structure and from which a Bayes clusterer can be derived that provides an optimal robust clusterer relative to the uncertainty. This is analogous to the use of effective class-conditional distributions in robust classification. After evaluating the performance of robust clusterers in synthetic mixture-of-Gaussians models, we apply the framework to granular imaging, where we make use of the asymptotic granulometric moment theory for granular images to relate robust clustering theory to the application.
    Comment: 19 pages, 5 eps figures, 1 table
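    The error notion mentioned in this abstract, the number of misclustered points, can be illustrated with a minimal sketch. It assumes that cluster labels are interchangeable, so the count is taken over the best relabeling of the predicted clusters. The function name and the toy labels are hypothetical and do not come from the paper.

```python
from itertools import permutations


def misclustered_count(true_labels, predicted_labels, num_clusters):
    """Smallest number of label disagreements over all relabelings
    of the predicted clusters (cluster names carry no meaning)."""
    best = len(true_labels)
    for perm in permutations(range(num_clusters)):
        # perm[p] is the true-label name assigned to predicted cluster p.
        mismatches = sum(
            1 for t, p in zip(true_labels, predicted_labels) if t != perm[p]
        )
        best = min(best, mismatches)
    return best


if __name__ == "__main__":
    true_labels = [0, 0, 0, 1, 1, 1]
    predicted_labels = [1, 1, 0, 0, 0, 0]
    print(misclustered_count(true_labels, predicted_labels, 2))
```

    The brute-force search over permutations is only practical for a handful of clusters; it is used here because it makes the definition explicit.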

    Validation of Inference Procedures for Gene Regulatory Networks

    The availability of high-throughput genomic data has motivated the development of numerous algorithms to infer gene regulatory networks. The validity of an inference procedure must be evaluated relative to its ability to infer a model network close to the ground-truth network from which the data have been generated. The input to an inference algorithm is a sample set of data and its output is a network. Since input, output, and algorithm are mathematical structures, the validity of an inference algorithm is a mathematical issue. This paper formulates validation in terms of a semi-metric distance between two networks, or the distance between two structures of the same kind deduced from the networks, such as their steady-state distributions or regulatory graphs. The paper sets up the validation framework, provides examples of distance functions, and applies them to some discrete Markov network models. It also considers approximate validation methods based on data for which the generating network is not known, the kind of situation one faces when using real data.
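    One way to picture a distance between structures deduced from two networks is to compare steady-state distributions. The sketch below is an illustration under stated assumptions, not the paper's procedure: each network is a small row-stochastic transition matrix, the steady state is approximated by power iteration, and the distance is the L1 norm of the difference. All names and matrices are hypothetical.

```python
def steady_state(transition, iterations=1000):
    """Approximate the stationary distribution of a row-stochastic
    matrix by repeatedly applying it to a uniform start vector."""
    n = len(transition)
    dist = [1.0 / n] * n
    for _ in range(iterations):
        dist = [
            sum(dist[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
    return dist


def steady_state_distance(p, q):
    """L1 distance between the steady-state distributions of p and q."""
    return sum(abs(a - b) for a, b in zip(steady_state(p), steady_state(q)))


if __name__ == "__main__":
    inferred = [[0.9, 0.1], [0.5, 0.5]]
    ground_truth = [[0.5, 0.5], [0.5, 0.5]]
    print(steady_state_distance(inferred, ground_truth))
```

    Two different transition matrices can share a steady-state distribution, in which case this distance is zero even though the networks differ.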

    On the Number of Close-to-Optimal Feature Sets

    The issue of wide feature-set variability has recently been raised in the context of expression-based classification using microarray data. This paper addresses this concern by demonstrating the natural manner in which many feature sets of a certain size chosen from a large collection of potential features can be so close to being optimal that they are statistically indistinguishable. Feature-set optimality is inherently related to sample size because it only arises on account of the tendency for diminished classifier accuracy as the number of features grows too large for satisfactory design from the sample data. The paper considers optimal feature sets in the framework of a model in which the features are grouped in such a way that intra-group correlation is substantial whereas inter-group correlation is minimal, the intent being to model the situation in which there are groups of highly correlated co-regulated genes and there is little correlation between the co-regulated groups. This is accomplished by using a block model for the covariance matrix that reflects these conditions. Focusing on linear discriminant analysis, we demonstrate how these assumptions can lead to very large numbers of close-to-optimal feature sets.
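    A block covariance model of the kind described here can be sketched as follows. The specific choices are assumptions made for illustration: equal-sized groups, a single shared intra-group correlation `rho`, and inter-group correlation set to exactly zero (the abstract only says it is minimal). The function name and parameters are hypothetical.

```python
def block_covariance(num_groups, group_size, rho, variance=1.0):
    """Build a block-diagonal covariance matrix: features in the same
    group have correlation rho, features in different groups have 0."""
    n = num_groups * group_size
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                cov[i][j] = variance
            elif i // group_size == j // group_size:
                # Same group: shared intra-group correlation.
                cov[i][j] = rho * variance
    return cov


if __name__ == "__main__":
    for row in block_covariance(num_groups=2, group_size=2, rho=0.8):
        print(row)
```

    With strong intra-group correlation, swapping one feature for another from the same group changes little, which gives some intuition for why many feature sets can perform almost equally well.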

    Incorporating prior knowledge induced from stochastic differential equations in the classification of stochastic observations

    In classification, prior knowledge is incorporated in a Bayesian framework by assuming that the feature-label distribution belongs to an uncertainty class of feature-label distributions governed by a prior distribution. A posterior distribution is then derived from the prior and the sample data. An optimal Bayesian classifier (OBC) minimizes the expected misclassification error relative to the posterior distribution. From an application perspective, prior construction is critical.
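    The OBC definition in this abstract can be written compactly. The notation below is assumed for illustration and is not taken from the listing: $\theta$ indexes the uncertainty class, $\pi^{*}$ is the posterior over $\theta$, $\varepsilon_{\theta}(\psi)$ is the misclassification error of classifier $\psi$ under the $\theta$-th feature-label distribution, and $\mathcal{C}$ is the family of classifiers considered.

```latex
\psi_{\mathrm{OBC}}
  = \arg\min_{\psi \in \mathcal{C}}
    \mathbb{E}_{\pi^{*}}\!\left[ \varepsilon_{\theta}(\psi) \right]
```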

    Efficient Implicit Runge-Kutta Methods for Fast-Responding Ligand-Gated Neuroreceptor Kinetic Models

    Neurophysiological models of the brain typically utilize systems of ordinary differential equations to simulate single-cell electrodynamics. To accurately emulate neurological treatments and their physiological effects on neurodegenerative disease, models that incorporate biologically inspired mechanisms, such as neurotransmitter signalling, are necessary. Additionally, applications that examine populations of neurons, such as multiscale models, can demand solving hundreds of millions of these systems at each simulation time step. Therefore, robust numerical solvers for biologically inspired neuron models are vital. To address this requirement, we evaluate the numerical accuracy and computational efficiency of three L-stable implicit Runge-Kutta methods when solving kinetic models of the ligand-gated glutamate and gamma-aminobutyric acid (GABA) neurotransmitter receptors. Efficient implementations of each numerical method are discussed, and numerous performance metrics including accuracy, simulation time steps, execution speeds, Jacobian calculations, and LU factorizations are evaluated to identify appropriate strategies for solving these models. Comparisons to popular explicit methods are presented and highlight the advantages of the implicit methods. In addition, we show a machine-code compiled implicit Runge-Kutta method implementation that possesses exceptional accuracy and superior computational efficiency.
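    The advantage of implicit over explicit methods on fast-responding (stiff) kinetics can be seen on a toy problem. The sketch below does not reproduce the three methods evaluated in the paper; it substitutes backward Euler, the simplest L-stable implicit Runge-Kutta method, and compares it with forward Euler on the scalar test equation y' = -lam * y. The rate constant, step size, and function names are hypothetical.

```python
def forward_euler(lam, y0, h, steps):
    """Explicit Euler for y' = -lam * y."""
    y = y0
    for _ in range(steps):
        y = y + h * (-lam * y)
    return y


def backward_euler(lam, y0, h, steps):
    """Implicit (backward) Euler for y' = -lam * y.
    The implicit equation y_next = y + h * (-lam * y_next) is linear,
    so it is solved here in closed form instead of with Newton's method."""
    y = y0
    for _ in range(steps):
        y = y / (1.0 + h * lam)
    return y


if __name__ == "__main__":
    lam, y0, h, steps = 1000.0, 1.0, 0.01, 10
    # The exact solution decays toward zero; h * lam = 10 is far outside
    # the explicit method's stability region.
    print("forward Euler: ", forward_euler(lam, y0, h, steps))
    print("backward Euler:", backward_euler(lam, y0, h, steps))
```

    For nonlinear receptor models the implicit step generally requires a Newton iteration, which is where the Jacobian calculations and LU factorizations mentioned in the abstract enter.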