Bayesian stochastic blockmodeling
This chapter provides a self-contained introduction to the use of Bayesian
inference to extract large-scale modular structures from network data, based on
the stochastic blockmodel (SBM), as well as its degree-corrected and
overlapping generalizations. We focus on nonparametric formulations that allow
their inference in a manner that prevents overfitting, and enables model
selection. We discuss aspects of the choice of priors, in particular how to
avoid underfitting via increased Bayesian hierarchies, and we contrast the task
of sampling network partitions from the posterior distribution with finding the
single point estimate that maximizes it, while describing efficient algorithms
to perform either one. We also show how inferring the SBM can be used to
predict missing and spurious links, and shed light on the fundamental
limitations of the detectability of modular structures in networks.
Comment: 44 pages, 16 figures. Code is freely available as part of graph-tool
at https://graph-tool.skewed.de . See also the HOWTO at
https://graph-tool.skewed.de/static/doc/demos/inference/inference.htm
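As a toy illustration of the likelihood at the heart of the SBM (a minimal Bernoulli SBM with maximum-likelihood block-pair probabilities, not the nonparametric graph-tool machinery the chapter describes; graph, partitions, and function names are illustrative), one can score competing partitions of a small graph:

```python
import math
from collections import Counter

def sbm_log_likelihood(edges, partition):
    """Log-likelihood of an undirected simple graph under a plain Bernoulli
    SBM, plugging in the maximum-likelihood estimates p_rs = e_rs / n_rs."""
    sizes = Counter(partition.values())             # block sizes n_r
    e = Counter()                                   # edge counts e_rs
    for u, v in edges:
        r, s = sorted((partition[u], partition[v]))
        e[(r, s)] += 1
    ll = 0.0
    blocks = sorted(sizes)
    for i, r in enumerate(blocks):
        for s in blocks[i:]:
            pairs = sizes[r] * (sizes[r] - 1) // 2 if r == s else sizes[r] * sizes[s]
            if pairs == 0:
                continue
            ers = e[(r, s)]
            p = ers / pairs
            if 0 < p < 1:                           # p in {0,1} contributes 0
                ll += ers * math.log(p) + (pairs - ers) * math.log(1 - p)
    return ll

# Two triangles joined by one edge: the planted partition scores higher
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
good = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
bad = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}
print(sbm_log_likelihood(edges, good) > sbm_log_likelihood(edges, bad))  # True
```

A Bayesian treatment as in the chapter would instead integrate the p_rs out under a prior, which is what prevents this maximum-likelihood score from always preferring finer partitions.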
Transferable coarse-grained potential for protein folding and design
Protein folding and design are major biophysical problems whose solution would
lead to important applications, especially in medicine. Here a novel protein
model capable of simultaneously providing quantitative protein design and
folding is introduced. With computer simulations it is shown that, for a large
set of real protein structures, the model produces designed sequences with
physical properties similar to those of the corresponding naturally occurring
sequences. The designed sequences are not yet fully realistic and require
further experimental testing. For an independent set of proteins, notoriously
difficult to fold, the correct folding of both the designed and the natural
sequences is also demonstrated. The folding properties are characterized by
free energy calculations, which not only are consistent between natural and
designed proteins, but also show remarkable precision when the folded
structures are compared to the experimentally determined ones. Ultimately,
this novel coarse-grained protein model is unique in the combination of three
fundamental features: its simplicity, its ability to produce natural, foldable
designed sequences, and its structure-prediction precision, the latter
demonstrated by free energy calculations. It is also remarkable that
low-frustration sequences can be obtained with such a simple and universal
design procedure, and that the folding of natural proteins shows funnelled
free energy landscapes without the need for any potentials based on the native
structure.
Change, time and information geometry
Dynamics, the study of change, is normally the subject of mechanics. Whether
the chosen mechanics is ``fundamental'' and deterministic or
``phenomenological'' and stochastic, all changes are described relative to an
external time. Here we show that once we define what we are talking about,
namely, the system, its states and a criterion to distinguish among them, there
is a single, unique, and natural dynamical law for irreversible processes that
is compatible with the principle of maximum entropy. In this alternative
dynamics changes are described relative to an internal, ``intrinsic'' time
which is a derived, statistical concept defined and measured by change itself.
Time is quantified change.
Comment: Presented at MaxEnt 2000, the 20th International Workshop on Bayesian
Inference and Maximum Entropy Methods (July 8-13, 2000, Gif-sur-Yvette,
France)
Saccadic Predictive Vision Model with a Fovea
We propose a model that emulates saccades, the rapid movements of the eye,
called the Error Saccade Model, based on the prediction error of the Predictive
Vision Model (PVM). The Error Saccade Model carries out movements of the
model's field of view to regions with the highest prediction error. Comparisons
of the Error Saccade Model on Predictive Vision Models with and without a fovea
show that a fovea-like structure in the input level of the PVM improves the
Error Saccade Model's ability to pursue detailed objects in its view. We
hypothesize that the improvement is due to poorer resolution in the periphery
causing higher prediction error when an object passes, triggering a saccade to
the next location.
Comment: 10 pages, 6 figures. Accepted in the International Conference of
Neuromorphic Computing (2018)
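The selection rule of the Error Saccade Model, as summarized above, moves the field of view to the region of highest prediction error. A minimal sketch of that rule (hypothetical; the actual PVM produces learned prediction-error maps rather than a hand-written grid):

```python
def next_saccade(error_map):
    """Return the (row, col) of the highest prediction error; the model's
    field of view jumps to this location. Ties are broken by scan order."""
    best, best_pos = float("-inf"), None
    for i, row in enumerate(error_map):
        for j, err in enumerate(row):
            if err > best:
                best, best_pos = err, (i, j)
    return best_pos

errors = [
    [0.1, 0.2, 0.1],
    [0.3, 0.9, 0.2],   # an object in the periphery spikes the error here
    [0.1, 0.2, 0.1],
]
print(next_saccade(errors))  # (1, 1)
```

The paper's hypothesis is visible in this sketch: coarser peripheral resolution inflates the error at a passing object's location, so the argmax lands there and triggers the saccade.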
Statistical Mechanics of maximal independent sets
The graph theoretic concept of maximal independent set arises in several
practical problems in computer science as well as in game theory. A maximal
independent set is defined by the set of occupied nodes that satisfy some
packing and covering constraints. It is known that finding minimum and
maximum-density maximal independent sets are hard optimization problems. In
this paper, we use the cavity method of statistical physics and Monte Carlo
simulations to study the corresponding constraint satisfaction problem on
random graphs. We obtain the entropy of maximal independent sets within the
replica symmetric and one-step replica symmetry breaking frameworks, shedding
light on the metric structure of the landscape of solutions and suggesting a
class of possible algorithms. This is of particular relevance for the
application to the study of strategic interactions in social and economic
networks, where maximal independent sets correspond to pure Nash equilibria of
a graphical game of public goods allocation.
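The packing and covering constraints that define a maximal independent set can be made concrete with a short greedy sketch (the graph and visiting order are illustrative, not from the paper, which studies random graphs with the cavity method):

```python
def greedy_mis(adj, order):
    """Greedy maximal independent set: occupy a node iff none of its
    neighbours is already occupied. The result satisfies the packing
    constraint (no two occupied nodes are adjacent) and the covering
    constraint (every empty node has an occupied neighbour)."""
    occupied = set()
    for v in order:
        if not any(u in occupied for u in adj[v]):
            occupied.add(v)
    return occupied

# 5-cycle: every maximal independent set has exactly 2 occupied nodes
adj = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
mis = greedy_mis(adj, order=[0, 1, 2, 3, 4])
print(mis)  # {0, 2}
```

On larger random graphs the visiting order determines the density of the resulting set, which is exactly the minimum/maximum-density question the entropy calculation addresses.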
Semantic Information G Theory and Logical Bayesian Inference for Machine Learning
An important problem with machine learning is that when the label number n > 2, it is very difficult to construct and optimize a group of learning functions, and we wish that optimized learning functions remain useful when the prior distribution P(x) (where x is an instance) is changed. To resolve this problem, the semantic information G theory, Logical Bayesian Inference (LBI), and a group of Channel Matching (CM) algorithms together form a systematic solution. A semantic channel in the G theory consists of a group of truth functions or membership functions. In comparison with likelihood functions, Bayesian posteriors, and logistic functions used by popular methods, membership functions can be more conveniently used as learning functions without the above problem. In LBI, every label's learning is independent. For multilabel learning, we can directly obtain a group of optimized membership functions from a big enough sample with labels, without preparing different samples for different labels. A group of Channel Matching (CM) algorithms are developed for machine learning. For the Maximum Mutual Information (MMI) classification of three classes with Gaussian distributions on a two-dimensional feature space, 2-3 iterations can make the mutual information between the three classes and three labels surpass 99% of the MMI for most initial partitions. For mixture models, the Expectation-Maximization (EM) algorithm is improved into the CM-EM algorithm, which can outperform the EM algorithm when mixture ratios are imbalanced or local convergence occurs. The CM iteration algorithm needs to be combined with neural networks for MMI classifications on high-dimensional feature spaces. LBI needs further studies for the unification of statistics and logic.
Stability of Terrestrial Planets in the Habitable Zone of Gl 777 A, HD 72659, Gl 614, 47 Uma and HD 4208
We have undertaken a thorough dynamical investigation of five extrasolar
planetary systems using extensive numerical experiments. The systems Gl 777 A,
HD 72659, Gl 614, 47 Uma and HD 4208 were examined concerning the question of
whether they could host terrestrial-like planets in their habitable zones
(=HZ). First we investigated the mean motion resonances between fictitious
terrestrial planets and the existing gas giants in these five extrasolar
systems. Then a fine grid of initial conditions for a potential terrestrial
planet within the HZ was chosen for each system, from which the stability of
orbits was then assessed by direct integrations over a time interval of 1
million years. The computations were carried out using a Lie-series integration
method with an adaptive step size control. This integration method achieves
machine precision accuracy in a highly efficient and robust way, requiring no
special adjustments when the orbits have large eccentricities. The stability of
orbits was examined with a determination of the Renyi entropy, estimated from
recurrence plots, and with a more straightforward method based on the maximum
eccentricity achieved by the planet over the 1 million year integration.
Additionally, the eccentricity is an indication of the habitability of a
terrestrial planet in the HZ; any value of e>0.2 produces a significant
temperature difference on a planet's surface between apoapse and periapse. The
results for possible stable orbits for terrestrial planets in habitable zones
for the five systems are summarized as follows: for Gl 777 A nearly the entire
HZ is stable, for 47 Uma, HD 72659 and HD 4208 terrestrial planets can survive
for a sufficiently long time, while for Gl 614 our results exclude terrestrial
planets moving in stable orbits within the HZ.
Comment: 14 pages, 18 figures, submitted to A&
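The maximum-eccentricity criterion described above reduces to a simple post-processing check on the integrated orbit (the eccentricity series below is illustrative; the paper obtains e(t) from 1 Myr Lie-series integrations):

```python
def max_eccentricity_stable(ecc_series, threshold=0.2):
    """Maximum-eccentricity criterion: flag an HZ orbit as habitable-stable
    only if e(t) never exceeds the threshold, since e > 0.2 produces a
    significant surface temperature difference between apoapse and periapse."""
    return max(ecc_series) <= threshold

print(max_eccentricity_stable([0.02, 0.05, 0.11]))  # True
print(max_eccentricity_stable([0.02, 0.25, 0.11]))  # False
```

Compared with entropy-based chaos indicators, this check is cheap and maps directly onto the habitability argument rather than onto dynamical chaos per se.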
Entropy-scaling search of massive biological data
Many datasets exhibit a well-defined structure that can be exploited to
design faster search tools, but it is not always clear when such acceleration
is possible. Here, we introduce a framework for similarity search based on
characterizing a dataset's entropy and fractal dimension. We prove that
searching scales in time with metric entropy (number of covering hyperspheres),
if the fractal dimension of the dataset is low, and scales in space with the
sum of metric entropy and information-theoretic entropy (randomness of the
data). Using these ideas, we present accelerated versions of standard tools,
with no loss in specificity and little loss in sensitivity, for use in three
domains---high-throughput drug screening (Ammolite, 150x speedup), metagenomics
(MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search
(esFragBag, 10x speedup of FragBag). Our framework can be used to achieve
"compressive omics," and the general theory can be readily applied to data
science problems outside of biology.Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo