    Bayesian stochastic blockmodeling

    This chapter provides a self-contained introduction to the use of Bayesian inference to extract large-scale modular structures from network data, based on the stochastic blockmodel (SBM), as well as its degree-corrected and overlapping generalizations. We focus on nonparametric formulations that allow their inference in a manner that prevents overfitting and enables model selection. We discuss aspects of the choice of priors, in particular how to avoid underfitting via increased Bayesian hierarchies, and we contrast the task of sampling network partitions from the posterior distribution with finding the single point estimate that maximizes it, while describing efficient algorithms to perform either one. We also show how inferring the SBM can be used to predict missing and spurious links, and shed light on the fundamental limitations of the detectability of modular structures in networks. (Comment: 44 pages, 16 figures. Code is freely available as part of graph-tool at https://graph-tool.skewed.de . See also the HOWTO at https://graph-tool.skewed.de/static/doc/demos/inference/inference.htm)
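
    Since the abstract points to graph-tool, here is a minimal usage sketch (assuming graph-tool is installed; the "football" dataset is just an example from its built-in collection, and API details may vary across versions):

        import graph_tool.all as gt

        # Load an example network from graph-tool's built-in collection.
        g = gt.collection.data["football"]

        # Fit the SBM by minimizing the description length, i.e. finding the
        # partition that maximizes the nonparametric Bayesian posterior.
        state = gt.minimize_blockmodel_dl(g)

        print("groups found:", state.get_nonempty_B())
        print("description length:", state.entropy())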

    Transferable coarse-grained potential for de novo protein folding and design

    Protein folding and design are major biophysical problems whose solution would lead to important applications, especially in medicine. Here we introduce a novel protein model capable of simultaneously providing quantitative protein design and folding. With computer simulations we show that, for a large set of real protein structures, the model produces designed sequences with physical properties similar to those of the corresponding naturally occurring sequences. The designed sequences are not yet fully realistic and require further experimental testing. For an independent set of proteins, notoriously difficult to fold, we also demonstrate the correct folding of both the designed and the natural sequences. The folding properties are characterized by free energy calculations, which not only are consistent between natural and designed proteins, but also show remarkable precision when the folded structures are compared to the experimentally determined ones. Ultimately, this coarse-grained protein model is unique in the combination of three fundamental features: its simplicity, its ability to produce natural foldable designed sequences, and its structure prediction precision, the latter demonstrated by free energy calculations. It is also remarkable that low-frustration sequences can be obtained with such a simple and universal design procedure, and that the folding of natural proteins shows funnelled free energy landscapes without the need for any potentials based on the native structure.

    Change, time and information geometry

    Dynamics, the study of change, is normally the subject of mechanics. Whether the chosen mechanics is "fundamental" and deterministic or "phenomenological" and stochastic, all changes are described relative to an external time. Here we show that once we define what we are talking about, namely the system, its states, and a criterion to distinguish among them, there is a single, unique, and natural dynamical law for irreversible processes that is compatible with the principle of maximum entropy. In this alternative dynamics, changes are described relative to an internal, "intrinsic" time, which is a derived, statistical concept defined and measured by change itself. Time is quantified change. (Comment: Presented at MaxEnt 2000, the 20th International Workshop on Bayesian Inference and Maximum Entropy Methods, July 8-13, 2000, Gif-sur-Yvette, France.)
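
    As background on the construction (standard information geometry, not spelled out in the abstract itself): distinguishability among neighboring states is commonly measured by the Fisher information metric, and the "intrinsic" time along a trajectory can then be read as accumulated change, i.e. arc length in the space of states:

        % Fisher information metric on the statistical manifold p(x|\theta):
        % distinguishability of neighboring states \theta and \theta + d\theta.
        d\ell^2 = g_{ij}\, d\theta^i d\theta^j,
        \qquad
        g_{ij} = \int dx\; p(x|\theta)\,
                 \frac{\partial \log p(x|\theta)}{\partial \theta^i}\,
                 \frac{\partial \log p(x|\theta)}{\partial \theta^j}

        % "Time is quantified change": the intrinsic time elapsed along a
        % trajectory \theta(\lambda) is identified with the accumulated change,
        \Delta t \propto \int \sqrt{g_{ij}\, \dot\theta^i \dot\theta^j}\; d\lambda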

    Saccadic Predictive Vision Model with a Fovea

    We propose a model that emulates saccades, the rapid movements of the eye, called the Error Saccade Model, based on the prediction error of the Predictive Vision Model (PVM). The Error Saccade Model moves the model's field of view to regions with the highest prediction error. Comparisons of the Error Saccade Model on Predictive Vision Models with and without a fovea show that a fovea-like structure in the input level of the PVM improves the Error Saccade Model's ability to pursue detailed objects in its view. We hypothesize that the improvement is due to poorer resolution in the periphery causing higher prediction error when an object passes, triggering a saccade to the next location. (Comment: 10 pages, 6 figures. Accepted in International Conference of Neuromorphic Computing, 2018.)
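
    As a minimal illustration of the saccade rule described above (an assumed interface, not the authors' released code), the next fixation is simply the peak of a smoothed prediction-error map:

        import numpy as np
        from scipy.ndimage import uniform_filter

        def next_saccade_target(error_map, smooth=5):
            """Return (row, col) of the next fixation as the peak of the error map.

            error_map: 2D array of per-location prediction errors from the
            vision model. Smoothing makes the saccade target a high-error
            *region* rather than an isolated noisy pixel.
            """
            smoothed = uniform_filter(error_map, size=smooth, mode="nearest")
            return np.unravel_index(np.argmax(smoothed), smoothed.shape)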

    Statistical Mechanics of maximal independent sets

    The graph-theoretic concept of a maximal independent set arises in several practical problems in computer science as well as in game theory. A maximal independent set is defined by a set of occupied nodes that satisfies both packing and covering constraints: no two occupied nodes are adjacent, and every unoccupied node has at least one occupied neighbor. Finding minimum- and maximum-density maximal independent sets is known to be a hard optimization problem. In this paper, we use the cavity method of statistical physics and Monte Carlo simulations to study the corresponding constraint satisfaction problem on random graphs. We obtain the entropy of maximal independent sets within the replica-symmetric and one-step replica symmetry breaking frameworks, shedding light on the metric structure of the landscape of solutions and suggesting a class of possible algorithms. This is of particular relevance for the application to the study of strategic interactions in social and economic networks, where maximal independent sets correspond to pure Nash equilibria of a graphical game of public goods allocation.
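
    As a minimal illustration of the two constraints (not the cavity computation itself), the following sketch checks whether a set of occupied nodes is a maximal independent set of a graph given as an adjacency map:

        def is_maximal_independent_set(adj, occupied):
            """adj: dict node -> set of neighbors; occupied: set of nodes.

            Packing: no two occupied nodes are adjacent (independence).
            Covering: every unoccupied node has an occupied neighbor (maximality).
            """
            for v in occupied:                 # packing constraint
                if adj[v] & occupied:
                    return False
            for v in adj:                      # covering constraint
                if v not in occupied and not (adj[v] & occupied):
                    return False
            return True

        # 4-cycle 0-1-2-3: {0, 2} is independent and maximal.
        cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
        assert is_maximal_independent_set(cycle, {0, 2})
        assert not is_maximal_independent_set(cycle, {0})     # node 2 uncovered
        assert not is_maximal_independent_set(cycle, {0, 1})  # adjacent pair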

    Semantic Information G Theory and Logical Bayesian Inference for Machine Learning

    An important problem with machine learning is that when the label number n > 2, it is very difficult to construct and optimize a group of learning functions, and we wish that optimized learning functions remain useful when the prior distribution P(x) (where x is an instance) changes. To resolve this problem, the semantic information G theory, Logical Bayesian Inference (LBI), and a group of Channel Matching (CM) algorithms together form a systematic solution. A semantic channel in the G theory consists of a group of truth functions or membership functions. In comparison with the likelihood functions, Bayesian posteriors, and logistic functions used by popular methods, membership functions can be more conveniently used as learning functions without the above problem. In LBI, every label's learning is independent. For multilabel learning, we can directly obtain a group of optimized membership functions from a big enough sample with labels, without preparing different samples for different labels. A group of Channel Matching (CM) algorithms is developed for machine learning. For the Maximum Mutual Information (MMI) classification of three classes with Gaussian distributions on a two-dimensional feature space, 2-3 iterations can make the mutual information between the three classes and three labels surpass 99% of the MMI for most initial partitions. For mixture models, the Expectation-Maximization (EM) algorithm is improved to become the CM-EM algorithm, which can outperform the EM algorithm when mixture ratios are imbalanced or local convergence occurs. The CM iteration algorithm needs to be combined with neural networks for MMI classification on high-dimensional feature spaces. LBI needs further study toward the unification of statistics and logic.
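
    For the mixture-model part, here is a minimal sketch of the standard EM baseline that the abstract says CM-EM improves on (the CM-EM modification itself is not reproduced here; a 1-D Gaussian mixture is assumed):

        import numpy as np
        from scipy.stats import norm

        def em_gmm_1d(x, mu, sigma, pi, iters=50):
            """Plain EM for a K-component 1-D Gaussian mixture."""
            for _ in range(iters):
                # E-step: responsibilities r[i, k] = P(component k | x_i)
                dens = np.stack([p * norm.pdf(x, m, s)
                                 for m, s, p in zip(mu, sigma, pi)], axis=1)
                r = dens / dens.sum(axis=1, keepdims=True)
                # M-step: re-estimate means, widths, and mixture ratios
                nk = r.sum(axis=0)
                mu = (r * x[:, None]).sum(axis=0) / nk
                sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
                pi = nk / len(x)
            return mu, sigma, pi

        # Imbalanced mixture ratios: the regime where the paper reports
        # plain EM struggling and CM-EM doing better.
        rng = np.random.default_rng(0)
        x = np.concatenate([rng.normal(0, 1, 950), rng.normal(4, 1, 50)])
        print(em_gmm_1d(x, np.array([-1., 1.]), np.array([1., 1.]), np.array([.5, .5])))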

    Stability of Terrestrial Planets in the Habitable Zone of Gl 777 A, HD 72659, Gl 614, 47 Uma and HD 4208

    We have undertaken a thorough dynamical investigation of five extrasolar planetary systems using extensive numerical experiments. The systems Gl 777 A, HD 72659, Gl 614, 47 Uma and HD 4208 were examined concerning the question of whether they could host terrestrial-like planets in their habitable zones (HZ). First we investigated the mean motion resonances between fictitious terrestrial planets and the existing gas giants in these five extrasolar systems. Then a fine grid of initial conditions for a potential terrestrial planet within the HZ was chosen for each system, from which the stability of orbits was assessed by direct integrations over a time interval of 1 million years. The computations were carried out using a Lie-series integration method with adaptive step size control. This integration method achieves machine-precision accuracy in a highly efficient and robust way, requiring no special adjustments when the orbits have large eccentricities. The stability of orbits was examined with a determination of the Rényi entropy, estimated from recurrence plots, and with a more straightforward method based on the maximum eccentricity achieved by the planet over the 1 million year integration. Additionally, the eccentricity is an indication of the habitability of a terrestrial planet in the HZ; any value of e > 0.2 produces a significant temperature difference on a planet's surface between apoapse and periapse. The results for possible stable orbits for terrestrial planets in the habitable zones of the five systems are summarized as follows: for Gl 777 A nearly the entire HZ is stable, for 47 Uma, HD 72659 and HD 4208 terrestrial planets can survive for a sufficiently long time, while for Gl 614 our results exclude terrestrial planets moving in stable orbits within the HZ. (Comment: 14 pages, 18 figures, submitted to A&A.)
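
    A minimal sketch of the maximum-eccentricity criterion just described (the integrator itself is omitted; the interface is illustrative, and the e < 0.2 threshold follows the habitability argument above):

        def max_ecc_classification(ecc_series, e_crit=0.2):
            """Classify an orbit from the eccentricity history of an integration.

            ecc_series: eccentricities sampled along the ~1 Myr integration.
            The orbit is flagged habitable-stable only if the eccentricity
            never exceeds e_crit, above which the apoapse-periapse temperature
            difference becomes significant.
            """
            e_max = max(ecc_series)
            return {"e_max": e_max, "habitable_stable": e_max < e_crit}

        print(max_ecc_classification([0.02, 0.08, 0.15, 0.11]))  # stays below 0.2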

    Entropy-scaling search of massive biological data

    Many datasets exhibit a well-defined structure that can be exploited to design faster search tools, but it is not always clear when such acceleration is possible. Here, we introduce a framework for similarity search based on characterizing a dataset's entropy and fractal dimension. We prove that search scales in time with metric entropy (number of covering hyperspheres) if the fractal dimension of the dataset is low, and scales in space with the sum of metric entropy and information-theoretic entropy (randomness of the data). Using these ideas, we present accelerated versions of standard tools, with no loss in specificity and little loss in sensitivity, for use in three domains: high-throughput drug screening (Ammolite, 150x speedup), metagenomics (MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search (esFragBag, 10x speedup of FragBag). Our framework can be used to achieve "compressive omics," and the general theory can be readily applied to data science problems outside of biology. (Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 box.)
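
    As a generic illustration of the covering-hypersphere idea (a sketch of the principle, not the released Ammolite/MICA/esFragBag code), a coarse pass over cluster centers plus the triangle inequality lets most clusters be skipped, so search time is governed by the number of covering balls, i.e. the metric entropy:

        import numpy as np

        def build_cover(points, radius):
            """Greedy metric cover: every point ends up within `radius` of a center."""
            centers, members = [], []
            for i, p in enumerate(points):
                for c, idx in zip(centers, members):
                    if np.linalg.norm(p - c) <= radius:
                        idx.append(i)
                        break
                else:
                    centers.append(p.copy())
                    members.append([i])
            return centers, members

        def range_search(query, points, centers, members, radius, threshold):
            """All points within `threshold` of query, scanning only nearby clusters.

            Triangle inequality: if d(query, center) > threshold + radius, no
            member of that cluster can be within `threshold`, so it is skipped.
            """
            hits = []
            for c, idx in zip(centers, members):
                if np.linalg.norm(query - c) <= threshold + radius:
                    hits += [i for i in idx
                             if np.linalg.norm(query - points[i]) <= threshold]
            return hits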