Distribution of Mutual Information
The mutual information of two random variables i and j with joint
probabilities t_ij is commonly used in learning Bayesian nets as well as in
many other fields. The probabilities t_ij are usually estimated by the empirical
sampling frequency n_ij/n leading to a point estimate I(n_ij/n) for the mutual
information. To answer questions like "is I(n_ij/n) consistent with zero?" or
"what is the probability that the true mutual information is much larger than
the point estimate?" one has to go beyond the point estimate. In the Bayesian
framework one can answer these questions by utilizing a (second order) prior
distribution p(t) comprising prior information about t. From the prior p(t) one
can compute the posterior p(t|n), from which the distribution p(I|n) of the
mutual information can be calculated. We derive reliable and quickly computable
approximations for p(I|n). We concentrate on the mean, variance, skewness, and
kurtosis, and non-informative priors. For the mean we also give an exact
expression. Numerical issues and the range of validity are discussed.
Comment: 8 pages
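The point estimate I(n_ij/n) discussed above is the plug-in mutual information computed from empirical frequencies. A minimal sketch (the contingency table is an illustrative assumption, not data from the paper):

```python
import numpy as np

def mutual_information(counts):
    """Plug-in (point) estimate I(n_ij / n) from a contingency table n_ij."""
    n = counts.sum()
    p = counts / n                      # joint frequencies t_ij ~ n_ij / n
    pi = p.sum(axis=1, keepdims=True)   # marginal over j
    pj = p.sum(axis=0, keepdims=True)   # marginal over i
    nz = p > 0                          # 0 log 0 = 0 by convention
    return float((p[nz] * np.log(p[nz] / (pi * pj)[nz])).sum())

# Hypothetical 2x2 table of co-occurrence counts
counts = np.array([[10, 2], [3, 15]])
print(mutual_information(counts))
```

The questions posed in the abstract concern the sampling distribution of this number, which a single plug-in evaluation cannot answer.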
Model Selection for Gaussian Mixture Models
This paper is concerned with an important issue in finite mixture modelling,
the selection of the number of mixing components. We propose a new penalized
likelihood method for model selection of finite multivariate Gaussian mixture
models. The proposed method is shown to be statistically consistent in
determining the number of components. A modified EM algorithm is developed
to simultaneously select the number of components and to estimate the mixing
weights, i.e. the mixing probabilities, and unknown parameters of Gaussian
distributions. Simulations and a real data analysis are presented to illustrate
the performance of the proposed method.
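As a point of reference for penalized-likelihood component selection, a minimal sketch using the standard BIC penalty from scikit-learn (the paper derives its own penalty and modified EM; BIC merely stands in here as an illustration, and the two-cluster data are made up):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic data: two well-separated Gaussian components in 2D
X = np.vstack([rng.normal(-3, 1, (200, 2)),
               rng.normal(3, 1, (200, 2))])

# Fit mixtures with k = 1..5 components and score each by BIC
bic = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
       for k in range(1, 6)}
best_k = min(bic, key=bic.get)   # penalized likelihood picks k = 2 here
print(best_k)
```

A consistent penalty, like the one the paper proposes, guarantees that this selected k converges to the true number of components as the sample grows.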
The Value of Information for Populations in Varying Environments
The notion of information pervades informal descriptions of biological
systems, but formal treatments face the problem of defining a quantitative
measure of information rooted in a concept of fitness, which is itself an
elusive notion. Here, we present a model of population dynamics where this
problem is amenable to a mathematical analysis. In the limit where any
information about future environmental variations is common to the members of
the population, our model is equivalent to known models of financial
investment. In this case, the population can be interpreted as a portfolio of
financial assets and previous analyses have shown that a key quantity of
Shannon's communication theory, the mutual information, sets a fundamental
limit on the value of information. We show that this bound can be violated when
accounting for features that are irrelevant in finance but inherent to
biological systems, such as the stochasticity present at the individual level.
This leads us to generalize the measures of uncertainty and information usually
encountered in information theory.
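In the common-information (portfolio) limit the abstract refers to, the classical result is that proportional (Kelly) betting with side information raises the optimal growth rate by exactly the mutual information I(X;Y). A minimal numerical sketch under fair odds (the joint distribution below is an illustrative assumption):

```python
import numpy as np

def growth_gain(p_xy):
    """Kelly growth-rate gain from side information Y in a fair-odds race.
    With proportional betting b(x|y) = p(x|y) and odds o(x) = 1/p(x),
    the gain equals the mutual information I(X;Y)."""
    px = p_xy.sum(axis=1)   # marginal of the outcome X
    py = p_xy.sum(axis=0)   # marginal of the signal Y
    gain = 0.0
    for y in range(p_xy.shape[1]):
        p_x_given_y = p_xy[:, y] / py[y]
        nz = p_x_given_y > 0
        gain += py[y] * np.sum(p_x_given_y[nz] *
                               np.log(p_x_given_y[nz] / px[nz]))
    return gain

# Binary environment observed through a noisy signal (10% error rate)
p_xy = np.array([[0.45, 0.05],
                 [0.05, 0.45]])
print(growth_gain(p_xy))
```

The paper's point is that individual-level stochasticity, absent from this portfolio picture, allows the population growth gain to exceed this bound.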
Learning Model Structure from Data : an Application to On-Line Handwriting
We present a learning strategy for Hidden Markov Models that may be used to cluster handwriting sequences or to learn a character model by identifying its main writing styles. Our approach aims at learning both the structure and the parameters of a Hidden Markov Model (HMM) from the data. A byproduct of this learning strategy is the ability to cluster signals and identify allographs. We provide experimental results on artificial data that demonstrate the possibility of learning HMM parameters and topology from data. For a given topology, our approach outperforms the standard Maximum Likelihood learning scheme in some cases that we identify. We also apply our unsupervised learning scheme to on-line handwritten signals, both for allograph clustering and for learning HMM models for handwritten digit recognition.
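The Maximum Likelihood baseline mentioned above maximizes the sequence likelihood computed by the HMM forward algorithm. A minimal numpy sketch of that likelihood (the 2-state parameters are made up for illustration, not learned models from the paper):

```python
import numpy as np

pi = np.array([0.6, 0.4])    # initial state distribution
A = np.array([[0.7, 0.3],    # state transition matrix
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],    # emission probabilities for symbols {0, 1}
              [0.3, 0.7]])

def likelihood(obs):
    """Forward algorithm: p(obs | pi, A, B) for a discrete-output HMM."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return float(alpha.sum())

print(likelihood([0, 1, 1, 0]))
```

Structure learning goes beyond this: it also searches over the topology (which entries of A are nonzero and how many states exist), rather than only re-estimating the parameter values.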
Learning from Partial Labels with Minimum Entropy
This paper introduces the minimum entropy regularizer for learning from partial labels. This learning problem encompasses the semi-supervised setting, where a decision rule is to be learned from labeled and unlabeled examples. The minimum entropy regularizer applies to diagnosis models, i.e. models of the posterior probabilities of classes. It is shown to include other approaches to the semi-supervised problem as particular or limiting cases. A series of experiments illustrates that the proposed criterion provides solutions taking advantage of unlabeled examples when the latter convey information. Even when the data are sampled from the distribution class spanned by a generative model, the proposed approach improves over the estimated generative model when the number of features is of the order of the sample size. The performance comparison is clearly in favor of minimum entropy when the generative model is slightly misspecified. Finally, the robustness of the learning scheme is demonstrated: in situations where unlabeled examples do not convey information, minimum entropy returns a solution discarding unlabeled examples and performs as well as supervised learning.
Keywords: discriminant learning, semi-supervised learning, minimum entropy
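As a sketch of the criterion, the regularized objective adds the average entropy of the model's predictions on unlabeled examples to the labeled cross-entropy (the function name, λ value, and prediction arrays below are illustrative assumptions):

```python
import numpy as np

def min_entropy_objective(p_labeled, y, p_unlabeled, lam=0.5):
    """Minimum-entropy criterion (sketch): labeled cross-entropy plus
    lam times the average prediction entropy on unlabeled examples."""
    eps = 1e-12
    # Cross-entropy of the true labels under the predicted posteriors
    ce = -np.mean(np.log(p_labeled[np.arange(len(y)), y] + eps))
    # Shannon entropy of the posteriors on unlabeled examples
    ent = -np.mean(np.sum(p_unlabeled * np.log(p_unlabeled + eps), axis=1))
    return ce + lam * ent

p_lab = np.array([[0.8, 0.2], [0.3, 0.7]])   # posteriors on labeled points
y = np.array([0, 1])                         # their true labels
p_unl = np.array([[0.5, 0.5], [0.9, 0.1]])   # posteriors on unlabeled points
print(min_entropy_objective(p_lab, y, p_unl))
```

Minimizing the entropy term pushes the decision rule toward confident predictions on unlabeled data, i.e. toward placing the boundary in low-density regions.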
Intentional Motion On-line Learning and Prediction
Predicting the motion of humans, animals and other objects which move according to internal plans is a challenging problem. Most existing approaches operate in two stages: a) learning typical motion patterns by observing an environment and b) predicting future motion on the basis of the learned patterns. In existing techniques, learning is performed off-line, hence it is impossible to refine the existing knowledge on the basis of new observations obtained during the prediction phase. We propose an approach which uses Hidden Markov Models to represent motion patterns. It differs from similar approaches in that it is able to learn and predict concurrently, thanks to a novel approximate learning approach, based on the Growing Neural Gas algorithm, which estimates both HMM parameters and structure. The learned structure has the property of being a planar graph, thus enabling exact inference in linear time with respect to the number of states in the model. Our experiments demonstrate that the technique works in real time and is able to produce sound long-term predictions of people's motion.
Model Selection in Summary Evaluation
A difficulty in the design of automated text summarization algorithms is objective evaluation. Viewing summarization as a tradeoff between length and information content, we introduce a technique based on a hierarchy of classifiers to rank, through model selection, different summarization methods. This summary evaluation technique allows for broader comparison of summarization methods than the traditional techniques of summary evaluation. We present an empirical study of two simple, albeit widely used, summarization methods that shows the different usages of this automated task-based evaluation system and confirms the results obtained with human-based evaluation methods over smaller corpora.
Statistical modeling of RNA structure profiling experiments enables parsimonious reconstruction of structure landscapes.
RNA plays key regulatory roles in diverse cellular processes, where its functionality often derives from folding into and converting between structures. Many RNAs further rely on the co-existence of alternative structures, which govern their response to cellular signals. However, characterizing heterogeneous landscapes is difficult, both experimentally and computationally. Recently, structure profiling experiments have emerged as powerful and affordable structure characterization methods, which improve computational structure prediction. To date, efforts have centered on predicting one optimal structure, with much less progress made on multiple-structure prediction. Here, we report a probabilistic modeling approach that predicts a parsimonious set of co-existing structures and estimates their abundances from structure profiling data. We demonstrate robust landscape reconstruction and quantitative insights into structural dynamics by analyzing numerous data sets. This work establishes a framework for data-directed characterization of structure landscapes to aid experimentalists in performing structure-function studies.
On Separation Between Learning and Control in Partially Observed Markov Decision Processes
Cyber-physical systems (CPS) encounter a large volume of data which is added
to the system gradually in real time and not altogether in advance. As the
volume of data increases, the domain of the control strategies also increases,
and thus it becomes challenging to search for an optimal strategy. Even if an
optimal control strategy is found, implementing such strategies with increasing
domains is burdensome. To derive an optimal control strategy in CPS, we
typically assume an ideal model of the system. Such model-based control
approaches cannot effectively facilitate optimal solutions with performance
guarantees due to the discrepancy between the model and the actual CPS.
Alternatively, traditional supervised learning approaches cannot always
facilitate robust solutions using data derived offline. Similarly, applying
reinforcement learning approaches directly to the actual CPS might impose
significant implications on safety and robust operation of the system. The goal
of this chapter is to provide a theoretical framework that aims at separating
the control and learning tasks which allows us to combine offline model-based
control with online learning approaches, and thus circumvent the challenges in
deriving optimal control strategies for CPS.
Comment: 18 pages, 5 figures. arXiv admin note: text overlap with arXiv:2101.1099