Inducing Features of Random Fields
We present a technique for constructing random fields from a set of training
samples. The learning paradigm builds increasingly complex fields by allowing
potential functions, or features, that are supported by increasingly large
subgraphs. Each feature has a weight that is trained by minimizing the
Kullback-Leibler divergence between the model and the empirical distribution of
the training data. A greedy algorithm determines how features are incrementally
added to the field and an iterative scaling algorithm is used to estimate the
optimal values of the weights.
The statistical modeling techniques introduced in this paper differ from
those common to much of the natural language processing literature since there
is no probabilistic finite state or push-down automaton on which the model is
built. Our approach also differs from the techniques common to the computer
vision literature in that the underlying random fields are non-Markovian and
have a large number of parameters that must be estimated. Relations to other
learning approaches including decision trees and Boltzmann machines are given.
As a demonstration of the method, we describe its application to the problem of
automatic word classification in natural language processing.
Key words: random field, Kullback-Leibler divergence, iterative scaling,
divergence geometry, maximum entropy, EM algorithm, statistical learning,
clustering, word morphology, natural language processing
Comment: 34 pages, compressed postscript
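As a hedged illustration of the training objective described in this abstract, the weights of an exponential-form random field can be written as the minimizer of a Kullback-Leibler divergence. The symbols below (features f_i, weights \lambda_i, reference distribution q_0, empirical distribution \tilde{p}) are assumed notation and do not appear in the abstract itself.

```latex
\[
p_\lambda(\omega) = \frac{1}{Z_\lambda}\, q_0(\omega)\,
  \exp\Big(\sum_i \lambda_i f_i(\omega)\Big),
\qquad
\lambda^\star = \arg\min_\lambda D(\tilde{p} \,\|\, p_\lambda),
\qquad
D(\tilde{p} \,\|\, p_\lambda) = \sum_\omega \tilde{p}(\omega)
  \log \frac{\tilde{p}(\omega)}{p_\lambda(\omega)} .
\]
```

Under this notation, minimizing the divergence over the weights is equivalent to maximizing the likelihood of the training samples.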
Shannon entropy of brain functional complex networks under the influence of the psychedelic Ayahuasca
The entropic brain hypothesis holds that the key facts concerning
psychedelics are partially explained in terms of increased entropy of the
brain's functional connectivity. Ayahuasca is a psychedelic beverage of
Amazonian indigenous origin with legal status in Brazil in religious and
scientific settings. In this context, we use tools and concepts from the theory
of complex networks to analyze resting state fMRI data of the brains of human
subjects under two distinct conditions: (i) under ordinary waking state and
(ii) in an altered state of consciousness induced by ingestion of Ayahuasca. We
report an increase in the Shannon entropy of the degree distribution of the
networks subsequent to Ayahuasca ingestion. We also find increased local and
decreased global network integration. Our results are broadly consistent with
the entropic brain hypothesis. Finally, we discuss our findings in the context
of descriptions of "mind-expansion" frequently seen in self-reports of users of
psychedelic drugs.
Comment: 27 pages, 6 figures
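The central quantity in this abstract, the Shannon entropy of a network's degree distribution, can be sketched in a few lines. This is a minimal illustration only: the function name `degree_entropy`, the adjacency-matrix input, and the choice of base-2 logarithms are assumptions, not details taken from the paper.

```python
import math
from collections import Counter

def degree_entropy(adjacency):
    """Shannon entropy (in bits, an assumed unit) of the degree distribution
    of an undirected graph given as a 0/1 adjacency matrix (list of lists)."""
    # Degree of each node = number of non-zero entries in its row.
    degrees = [sum(1 for x in row if x) for row in adjacency]
    n = len(degrees)
    # Empirical probability of each degree value.
    counts = Counter(degrees)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical toy graphs: a 4-node ring (every node has degree 2)
# and a 4-node star (one hub of degree 3, three leaves of degree 1).
ring = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
print(degree_entropy(ring), degree_entropy(star))
```

A network whose nodes all share one degree has zero entropy, while a broader spread of degrees raises it.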
Searching for collective behavior in a network of real neurons
Maximum entropy models are the least structured probability distributions
that exactly reproduce a chosen set of statistics measured in an interacting
network. Here we use this principle to construct probabilistic models which
describe the correlated spiking activity of populations of up to 120 neurons in
the salamander retina as it responds to natural movies. Already in groups as
small as 10 neurons, interactions between spikes can no longer be regarded as
small perturbations in an otherwise independent system; for 40 or more neurons
pairwise interactions need to be supplemented by a global interaction that
controls the distribution of synchrony in the population. Here we show that
such "K-pairwise" models--being systematic extensions of the previously used
pairwise Ising models--provide an excellent account of the data. We explore the
properties of the neural vocabulary by: 1) estimating its entropy, which
constrains the population's capacity to represent visual information; 2)
classifying activity patterns into a small set of metastable collective modes;
3) showing that the neural codeword ensembles are extremely inhomogeneous; 4)
demonstrating that the state of individual neurons is highly predictable from
the rest of the population, allowing the capacity for error correction.
Comment: 24 pages, 19 figures
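One way to write the kind of model this abstract describes, a pairwise Ising model supplemented by a term that depends only on the number of simultaneously active neurons, is sketched below. The notation (binary variables \sigma_i, fields h_i, couplings J_{ij}, global potential V, and the sign convention) is assumed for illustration and is not taken from the abstract.

```latex
\[
P(\sigma_1,\dots,\sigma_N) = \frac{1}{Z}
  \exp\Big( \sum_i h_i \sigma_i
  + \frac{1}{2}\sum_{i \neq j} J_{ij}\, \sigma_i \sigma_j
  + V(K) \Big),
\qquad K = \sum_i \sigma_i .
\]
```

In this form, setting V(K) = 0 recovers the pairwise model, and V(K) is the extra freedom used to match the distribution of synchrony in the population.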
Deconvolving mutational patterns of poliovirus outbreaks reveals its intrinsic fitness landscape.
Vaccination has essentially eradicated poliovirus. Yet, its mutation rate is higher than that of viruses like HIV, for which no effective vaccine exists. To investigate this, we infer a fitness model for the poliovirus viral protein 1 (vp1), which successfully predicts in vitro fitness measurements. This is achieved by first developing a probabilistic model for the prevalence of vp1 sequences that enables us to isolate and remove data that are subject to strong vaccine-derived biases. The intrinsic fitness constraints derived for vp1, a capsid protein subject to antibody responses, are compared with those of analogous HIV proteins. We find that vp1 evolution is subject to tighter constraints, limiting its ability to evade vaccine-induced immune responses. Our analysis also indicates that circulating poliovirus strains in unimmunized populations serve as a reservoir that can seed outbreaks in spatio-temporally localized sub-optimally immunized populations.
Statistical modeling of RNA structure profiling experiments enables parsimonious reconstruction of structure landscapes.
RNA plays key regulatory roles in diverse cellular processes, where its functionality often derives from folding into and converting between structures. Many RNAs further rely on co-existence of alternative structures, which govern their response to cellular signals. However, characterizing heterogeneous landscapes is difficult, both experimentally and computationally. Recently, structure profiling experiments have emerged as powerful and affordable structure characterization methods, which improve computational structure prediction. To date, efforts have centered on predicting one optimal structure, with much less progress made on multiple-structure prediction. Here, we report a probabilistic modeling approach that predicts a parsimonious set of co-existing structures and estimates their abundances from structure profiling data. We demonstrate robust landscape reconstruction and quantitative insights into structural dynamics by analyzing numerous data sets. This work establishes a framework for data-directed characterization of structure landscapes to aid experimentalists in performing structure-function studies.
Developments in the theory of randomized shortest paths with a comparison of graph node distances
There have lately been several suggestions for parametrized distances on a
graph that generalize the shortest path distance and the commute time or
resistance distance. The need for developing such distances has arisen from the
observation that the above-mentioned common distances in many situations fail
to take into account the global structure of the graph. In this article, we
develop the theory of one family of graph node distances, known as the
randomized shortest path dissimilarity, which has its foundation in statistical
physics. We show that the randomized shortest path dissimilarity can be easily
computed in closed form for all pairs of nodes of a graph. Moreover, we come up
with a new definition of a distance measure that we call the free energy
distance. The free energy distance can be seen as an upgrade of the randomized
shortest path dissimilarity as it defines a metric, in addition to which it
satisfies the graph-geodetic property. The derivation and computation of the
free energy distance are also straightforward. We then make a comparison
between a set of generalized distances that interpolate between the shortest
path distance and the commute time, or resistance distance. This comparison
focuses on the applicability of the distances in graph node clustering and
classification. The comparison, in general, shows that the parametrized
distances perform well in the tasks. In particular, we see that the results
obtained with the free energy distance are among the best in all the
experiments.
Comment: 30 pages, 4 figures, 3 tables
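The closed-form, all-pairs computation mentioned in this abstract can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the construction used here (matrix W from a uniform reference random walk damped by exp(-beta * cost), fundamental matrix Z = (I - W)^{-1}, directed free energy -log(z_st / z_tt) / beta, then symmetrization) is how the free energy distance is commonly presented, and the helper names `invert` and `free_energy_distance` are hypothetical. The graph is assumed connected with strictly positive edge costs.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination
    with partial pivoting (pure Python, for illustration only)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def free_energy_distance(cost, beta):
    """All-pairs distance matrix. cost[i][j] is a positive edge cost,
    or None where nodes i and j are not connected (assumed encoding)."""
    n = len(cost)
    # W = reference transition probabilities (uniform over neighbours,
    # an assumption) multiplied elementwise by exp(-beta * cost).
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbours = [j for j in range(n) if cost[i][j] is not None]
        for j in neighbours:
            w[i][j] = math.exp(-beta * cost[i][j]) / len(neighbours)
    # Fundamental matrix Z = (I - W)^{-1}.
    z = invert([[(1.0 if i == j else 0.0) - w[i][j] for j in range(n)]
                for i in range(n)])
    # Directed free energies, then symmetrize to obtain the distance.
    phi = [[-math.log(z[s][t] / z[t][t]) / beta for t in range(n)]
           for s in range(n)]
    return [[0.5 * (phi[s][t] + phi[t][s]) for t in range(n)]
            for s in range(n)]

# Hypothetical example: a path graph 0 - 1 - 2 with unit edge costs.
path = [[None, 1.0, None], [1.0, None, 1.0], [None, 1.0, None]]
d = free_energy_distance(path, beta=20.0)
```

In this sketch a large beta pushes the values toward shortest-path lengths, while a small beta lets longer walks contribute more.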