Optimal Clustering under Uncertainty
Classical clustering algorithms typically either lack an underlying
probability framework to make them predictive or focus on parameter estimation
rather than defining and minimizing a notion of error. Recent work addresses
these issues by developing a probabilistic framework based on the theory of
random labeled point processes and characterizing a Bayes clusterer that
minimizes the number of misclustered points. The Bayes clusterer is analogous
to the Bayes classifier. Whereas determining a Bayes classifier requires full
knowledge of the feature-label distribution, deriving a Bayes clusterer
requires full knowledge of the point process. When uncertain of the point
process, one would like to find a robust clusterer that is optimal over the
uncertainty, just as one may find optimal robust classifiers with uncertain
feature-label distributions. Herein, we derive an optimal robust clusterer by
first finding an effective random point process that incorporates all
randomness within its own probabilistic structure and from which a Bayes
clusterer can be derived that provides an optimal robust clusterer relative to
the uncertainty. This is analogous to the use of effective class-conditional
distributions in robust classification. After evaluating the performance of
robust clusterers in synthetic mixtures of Gaussians models, we apply the
framework to granular imaging, where we make use of the asymptotic
granulometric moment theory for granular images to relate robust clustering
theory to the application.
Comment: 19 pages, 5 eps figures, 1 table
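The error notion in this abstract, the number of misclustered points, can be illustrated for a single realization. The sketch below is not the paper's Bayes clusterer, which minimizes an expected error under a random labeled point process. It only counts disagreements between a candidate partition and a reference partition, taking the minimum over relabelings because cluster labels are arbitrary. The function name `misclustered` and the toy labels are assumptions made for illustration.

```python
from itertools import permutations


def misclustered(true_labels, pred_labels, k):
    """Minimum number of label disagreements over all relabelings of pred_labels.

    Cluster labels carry no meaning of their own, so every permutation of the
    k predicted labels is tried and the best match is kept. Brute force is
    only practical for small k.
    """
    best = len(true_labels)
    for perm in permutations(range(k)):
        errors = sum(1 for t, p in zip(true_labels, pred_labels) if t != perm[p])
        best = min(best, errors)
    return best


# Hypothetical reference partition and candidate partition of six points.
truth = [0, 0, 0, 1, 1, 1]
candidate = [1, 1, 0, 0, 0, 0]

print(misclustered(truth, candidate, 2))  # prints 1
```

Swapping the two labels of the candidate leaves only the third point on the wrong side, which is why the count is 1 rather than 5.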
Validation of Inference Procedures for Gene Regulatory Networks
The availability of high-throughput genomic data has motivated the development of numerous algorithms to infer gene regulatory networks. The validity of an inference procedure must be evaluated relative to its ability to infer a model network close to the ground-truth network from which the data have been generated. The input to an inference algorithm is a sample set of data and its output is a network. Since input, output, and algorithm are mathematical structures, the validity of an inference algorithm is a mathematical issue. This paper formulates validation in terms of a semi-metric distance between two networks, or the distance between two structures of the same kind deduced from the networks, such as their steady-state distributions or regulatory graphs. The paper sets up the validation framework, provides examples of distance functions, and applies them to some discrete Markov network models. It also considers approximate validation methods based on data for which the generating network is not known, the kind of situation one faces when using real data.
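One of the distances mentioned here, the distance between steady-state distributions, can be sketched for small Markov chains. The code below is an illustration under stated assumptions, not the paper's procedure: the chains are assumed ergodic, the steady state is approximated by power iteration, and the L1 norm is one possible choice of distance. The transition matrices are made up for the example.

```python
def steady_state(P, iters=10_000, tol=1e-12):
    """Approximate the stationary distribution of a row-stochastic matrix P
    by power iteration. Assumes the chain is ergodic."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def steady_state_distance(P, Q):
    """L1 distance between the stationary distributions of two chains."""
    p, q = steady_state(P), steady_state(Q)
    return sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical ground-truth network and two hypothetical inferred networks.
truth = [[0.9, 0.1], [0.5, 0.5]]
inferred_a = [[0.5, 0.5], [0.5, 0.5]]
inferred_b = [[0.8, 0.2], [1.0, 0.0]]

print(steady_state_distance(truth, inferred_a))
print(steady_state_distance(truth, inferred_b))
```

The second comparison returns a value that is numerically zero even though the two transition matrices differ, because both chains share the stationary distribution (5/6, 1/6). That is the reason such a distance is a semi-metric rather than a metric on networks.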
On the Number of Close-to-Optimal Feature Sets
The issue of wide feature-set variability has recently been raised in the context of expression-based classification using microarray data. This paper addresses this concern by demonstrating the natural manner in which many feature sets of a certain size chosen from a large collection of potential features can be so close to being optimal that they are statistically indistinguishable. Feature-set optimality is inherently related to sample size because it only arises on account of the tendency for diminished classifier accuracy as the number of features grows too large for satisfactory design from the sample data. The paper considers optimal feature sets in the framework of a model in which the features are grouped in such a way that intra-group correlation is substantial whereas inter-group correlation is minimal, the intent being to model the situation in which there are groups of highly correlated co-regulated genes and there is little correlation between the co-regulated groups. This is accomplished by using a block model for the covariance matrix that reflects these conditions. Focusing on linear discriminant analysis, we demonstrate how these assumptions can lead to very large numbers of close-to-optimal feature sets.
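The effect of a block covariance model can be seen in a toy calculation. The sketch below is a simplification and not the paper's analysis: it ignores sample size and uses the known-distribution error of a linear discriminant for two equally likely Gaussian classes with common covariance, Φ(-Δ/2), where Δ is the Mahalanobis distance between the class means. Unit variances, a common within-block correlation `rho`, zero correlation between blocks, and an equal mean shift `delta` on every feature are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import erf, sqrt


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def lda_error(subset, group_size, rho, delta):
    """Error Phi(-Delta/2) of the optimal linear discriminant on a feature subset."""
    # Block covariance restricted to the subset: 1 on the diagonal,
    # rho within a block, 0 between blocks.
    cov = [[1.0 if a == b else (rho if a // group_size == b // group_size else 0.0)
            for b in subset] for a in subset]
    d = [delta] * len(subset)
    maha_sq = sum(di * xi for di, xi in zip(d, solve(cov, d)))
    return 0.5 * (1.0 + erf(-sqrt(maha_sq) / (2.0 * sqrt(2.0))))


def count_near_optimal(groups, group_size, k, rho, delta, tol):
    """Count size-k feature sets whose error is within tol of the best one."""
    errors = [lda_error(s, group_size, rho, delta)
              for s in combinations(range(groups * group_size), k)]
    best = min(errors)
    return sum(1 for e in errors if e - best <= tol), len(errors)


near, total = count_near_optimal(groups=3, group_size=3, k=3,
                                 rho=0.8, delta=1.0, tol=1e-9)
print(near, total)  # prints 27 84
```

In this toy setting every set that takes one feature from each block is exactly optimal, so 27 of the 84 three-feature sets tie for best. With more blocks and larger blocks the number of tied or nearly tied sets grows combinatorially, which is the qualitative point of the abstract.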
Incorporating prior knowledge induced from stochastic differential equations in the classification of stochastic observations
In classification, prior knowledge is incorporated in a Bayesian framework by assuming that the feature-label distribution belongs to an uncertainty class of feature-label distributions governed by a prior distribution. A posterior distribution is then derived from the prior and the sample data. An optimal Bayesian classifier (OBC) minimizes the expected misclassification error relative to the posterior distribution. From an application perspective, prior construction is critical.
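The prior-to-posterior-to-classifier chain described here can be sketched for a very small case. The code below assumes a finite uncertainty class of two hypothetical discrete models, a known class prior of 0.5 for each class, and a uniform prior over the models. None of these numbers come from the paper, and the stochastic differential equation aspect of the title is not modeled. Under these assumptions the classifier that minimizes expected error labels a point by comparing the posterior-weighted averages of the class-conditional probabilities.

```python
# Two hypothetical candidate models. Each maps class label -> P(x | y)
# for a discrete feature x in {0, 1, 2}.
MODELS = {
    "A": {0: [0.7, 0.2, 0.1], 1: [0.1, 0.2, 0.7]},
    "B": {0: [0.2, 0.7, 0.1], 1: [0.3, 0.1, 0.6]},
}
CLASS_PRIOR = {0: 0.5, 1: 0.5}  # assumed known


def posterior(prior, data):
    """Bayes update of the model weights given labeled points (x, y)."""
    weights = {}
    for name, model in MODELS.items():
        like = 1.0
        for x, y in data:
            like *= model[y][x]
        weights[name] = prior[name] * like
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def effective_density(weights, x, y):
    """Average of P(x | y) over the models, weighted by the given distribution."""
    return sum(weights[name] * MODELS[name][y][x] for name in MODELS)


def obc(weights, x):
    """Label that maximizes class prior times effective class-conditional density."""
    return max(CLASS_PRIOR, key=lambda y: CLASS_PRIOR[y] * effective_density(weights, x, y))


prior = {"A": 0.5, "B": 0.5}
data = [(1, 0)] * 5  # five hypothetical class-0 points observed at x = 1
post = posterior(prior, data)

print(obc(prior, 0), obc(post, 0))  # prints 0 1
```

Before any data, the averaged densities favor class 0 at x = 0. The five observations make model B far more plausible than model A, and under model B the point x = 0 is more likely to come from class 1, so the decision flips.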
Efficient Implicit Runge-Kutta Methods for Fast-Responding Ligand-Gated Neuroreceptor Kinetic Models
Neurophysiological models of the brain typically utilize systems of ordinary differential equations to simulate single-cell electrodynamics. To accurately emulate neurological treatments and their physiological effects on neurodegenerative disease, models that incorporate biologically-inspired mechanisms, such as neurotransmitter signalling, are necessary. Additionally, applications that examine populations of neurons, such as multiscale models, can demand solving hundreds of millions of these systems at each simulation time step. Therefore, robust numerical solvers for biologically-inspired neuron models are vital. To address this requirement, we evaluate the numerical accuracy and computational efficiency of three L-stable implicit Runge-Kutta methods when solving kinetic models of the ligand-gated glutamate and gamma-aminobutyric acid (GABA) neurotransmitter receptors. Efficient implementations of each numerical method are discussed, and numerous performance metrics including accuracy, simulation time steps, execution speeds, Jacobian calculations, and LU factorizations are evaluated to identify appropriate strategies for solving these models. Comparisons to popular explicit methods are presented and highlight the advantages of the implicit methods. In addition, we show a machine-code compiled implicit Runge-Kutta method implementation that possesses exceptional accuracy and superior computational efficiency.
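The advantage of an L-stable implicit method on a fast-responding kinetic model can be seen in a minimal example. The abstract does not name its three methods, so the sketch below uses backward Euler, the simplest L-stable implicit Runge-Kutta method, against forward Euler. The two-state closed/open receptor scheme and the rate constants are hypothetical and chosen only to make the problem stiff relative to the step size.

```python
def forward_euler(o, h, alpha, beta):
    """One explicit Euler step for dO/dt = alpha * (1 - O) - beta * O."""
    return o + h * (alpha * (1.0 - o) - beta * o)


def backward_euler(o, h, alpha, beta):
    """One implicit Euler step. The equation is linear in O, so the implicit
    update can be solved in closed form without a Newton iteration."""
    return (o + h * alpha) / (1.0 + h * (alpha + beta))


def integrate(step, steps, h, alpha, beta, o0=0.0):
    """Apply a one-step method repeatedly, starting from open fraction o0."""
    o = o0
    for _ in range(steps):
        o = step(o, h, alpha, beta)
    return o


# Hypothetical opening and closing rates. With h * (alpha + beta) = 30 the
# explicit method is far outside its stability region.
ALPHA, BETA, H, STEPS = 2000.0, 1000.0, 0.01, 20

explicit_result = integrate(forward_euler, STEPS, H, ALPHA, BETA)
implicit_result = integrate(backward_euler, STEPS, H, ALPHA, BETA)
steady = ALPHA / (ALPHA + BETA)

print(explicit_result, implicit_result, steady)
```

At this step size the forward Euler error is multiplied by -29 on every step and the result diverges, while the backward Euler error is multiplied by 1/31 and the open fraction settles at the steady state alpha / (alpha + beta). Nonlinear receptor schemes would need a Newton solve per step, which is where the Jacobian calculations and LU factorizations mentioned in the abstract come in.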