70,965 research outputs found

    A Topic Modeling Toolbox Using Belief Propagation

    Latent Dirichlet allocation (LDA) is an important hierarchical Bayesian model for probabilistic topic modeling, which attracts worldwide interest and touches on many important applications in text mining, computer vision and computational biology. This paper introduces a topic modeling toolbox (TMBP) based on belief propagation (BP) algorithms. The TMBP toolbox is implemented in MEX C++/Matlab/Octave for either Windows 7 or Linux. Compared with existing topic modeling packages, the novelty of this toolbox lies in its BP algorithms for learning LDA-based topic models. The current version includes BP algorithms for latent Dirichlet allocation (LDA), author-topic models (ATM), relational topic models (RTM), and labeled LDA (LaLDA). This toolbox is an ongoing project, and more BP-based algorithms for various topic models will be added in the near future. Interested users may also extend the BP algorithms to learn more complicated topic models. The source code is freely available under the GNU General Public Licence, Version 1.0 at https://mloss.org/software/view/399/. Comment: 4 pages

    Computational disease modeling – fact or fiction?

    BACKGROUND: Biomedical research is changing due to the rapid accumulation of experimental data at an unprecedented scale, revealing increasing degrees of complexity of biological processes. The life sciences are facing a transition from a descriptive to a mechanistic approach that reveals principles of cells, cellular networks, organs, and their interactions across several spatial and temporal scales. There are two conceptual traditions in computational modeling in biology. The bottom-up approach emphasizes complex intracellular molecular models and is well represented within the systems biology community. On the other hand, the physics-inspired top-down modeling strategy identifies and selects features of (presumably) essential relevance to the phenomena of interest and combines available data in models of modest complexity. RESULTS: The workshop, "ESF Exploratory Workshop on Computational disease Modeling", examined the challenges that computational modeling faces in contributing to the understanding and treatment of complex multi-factorial diseases. Participants at the meeting agreed on two general conclusions. First, we identified the critical importance of developing analytical tools for dealing with model and parameter uncertainty. Second, the development of predictive hierarchical models spanning several scales beyond intracellular molecular networks was identified as a major objective. This contrasts with the current focus within the systems biology community on complex molecular modeling. CONCLUSION: During the workshop it became obvious that diverse scientific modeling cultures (from computational neuroscience, theory, data-driven machine-learning approaches, agent-based modeling, network modeling and stochastic-molecular simulations) would benefit from intense cross-talk on shared theoretical issues in order to make progress on clinically relevant problems.

    Rate variation in language change: Toward distributional phylogenetic modeling

    Since the advent of phylogenetic linguistics, researchers have used a large number of phylogenetic comparative methods adapted from computational biology to model and analyze the dynamics of change of a wide range of linguistic features. Models of this sort vary in complexity; the simplest models of change assume homogeneity of transition rates within families, while state-of-the-art models of heterotachy allow transition rates to vary across lineages within a family. In this contribution, I review a range of applications of biological models of rate variation to questions in diachronic linguistics and highlight some models from computational biology that have remained largely overlooked by linguists. Building off of these and other biological models, I sketch out a program for what I term DISTRIBUTIONAL PHYLOGENETIC MODELING, inspired by an analogous recently proposed family of hierarchical Bayesian models. I report the results of some work in progress carried out within this framework and present a case study illustrating the flexibility of the approach.

    Modeling and visualizing uncertainty in gene expression clusters using Dirichlet process mixtures

    Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature on microarray gene expression data, little attention has been paid to uncertainty in the results obtained. Dirichlet process mixture (DPM) models provide a nonparametric Bayesian alternative to the bootstrap approach to modeling uncertainty in gene expression clustering. Most previously published applications of Bayesian model-based clustering methods have been to short time series data. In this paper, we present a case study of the application of nonparametric Bayesian clustering methods to the clustering of high-dimensional non-time-series gene expression data using full Gaussian covariances. We use the probability that two genes belong to the same cluster in a DPM model as a measure of the similarity of these gene expression profiles. Conversely, this probability can be used to define a dissimilarity measure, which, for the purposes of visualization, can be input to one of the standard linkage algorithms used for hierarchical clustering. Biologically plausible results are obtained from the Rosetta compendium of expression profiles, which extend previously published cluster analyses of this data.
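    The co-clustering idea in this abstract can be illustrated with a minimal sketch. It assumes that cluster-label vectors sampled from a DPM posterior are already available; the toy `samples` array, the function names, and the choice of `1 - probability` as the dissimilarity are illustrative assumptions, not the authors' implementation. The naive average-linkage routine stands in for "one of the standard linkage algorithms".

```python
from itertools import combinations

def coclustering_probability(samples):
    """Fraction of posterior samples in which each pair of items shares a cluster."""
    n = len(samples[0])
    counts = [[0] * n for _ in range(n)]
    for labels in samples:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(samples) for c in row] for row in counts]

def dissimilarity(prob):
    """Assumed dissimilarity: one minus the co-clustering probability."""
    return [[1.0 - p for p in row] for row in prob]

def average_linkage(dist):
    """Naive average-linkage agglomeration; returns merges as (members_a, members_b, distance)."""
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            d = sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((a, b, d))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical toy data: 4 posterior samples of cluster labels for 4 genes.
samples = [
    [0, 0, 1, 1],
    [0, 0, 1, 2],
    [0, 0, 0, 1],
    [0, 1, 2, 2],
]
prob = coclustering_probability(samples)
merges = average_linkage(dissimilarity(prob))
for a, b, d in merges:
    print(a, b, d)
```

    Because the co-clustering probability depends only on whether two labels are equal, it is unaffected by how cluster labels are permuted from one sample to the next, which makes it a convenient summary of clustering uncertainty.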

    Hierarchical nanomechanics of collagen microfibrils

    Collagen constitutes one third of the human proteome, providing mechanical stability, elasticity and strength to connective tissues. Collagen is also the dominating material in the extracellular matrix (ECM) and is thus crucial for cell differentiation, growth and pathology. However, fundamental questions remain with respect to the origin of the unique mechanical properties of collagenous tissues, and in particular its stiffness, extensibility and nonlinear mechanical response. By using x-ray diffraction data of a collagen fibril reported by Orgel et al. (Proceedings of the National Academy of Sciences USA, 2006) in combination with protein structure identification methods, here we present an experimentally validated model of the nanomechanics of a collagen microfibril that incorporates the full biochemical details of the amino acid sequence of the constituting molecules. We report the analysis of its mechanical properties under different levels of stress and solvent conditions, using a full-atomistic force field including explicit water solvent. Mechanical testing of hydrated collagen microfibrils yields a Young’s modulus of ≈300 MPa at small and ≈1.2 GPa at larger deformation in excess of 10% strain, in excellent agreement with experimental data. Dehydrated, dry collagen microfibrils show a significantly increased Young’s modulus of ≈1.8 to 2.25 GPa (or ≈6.75 times the modulus in the wet state) owing to a much tighter molecular packing, in good agreement with experimental measurements (where an increase of the modulus by ≈9 times was found). Our model demonstrates that the unique mechanical properties of collagen microfibrils can be explained based on their hierarchical structure, where deformation is mediated through mechanisms that operate at different hierarchical levels. 
    Key mechanisms involve straightening of initially disordered and helically twisted molecules at small strains, followed by axial stretching of molecules, and eventual molecular uncoiling at extreme deformation. These mechanisms explain the striking difference of the modulus of collagen fibrils compared with single molecules, which is found in the range of 4.8±2 GPa or ≈10-20 times greater. These findings corroborate the notion that collagen tissue properties are highly scale dependent and nonlinear elastic, an issue that must be considered in the development of models that describe the interaction of cells with collagen in the extracellular matrix. A key impact of the atomistic model of collagen microfibril mechanics reported here is that it enables the bottom-up elucidation of structure-property relationships in the broader class of collagen materials such as tendon or bone, including studies in the context of genetic disease where the incorporation of biochemical, genetic details in material models of connective tissue is essential.

    Hierarchical coexistence of universality and diversity controls robustness and multi-functionality in intermediate filament protein networks

    Proteins constitute the elementary building blocks of a vast variety of biological materials such as cellular protein networks, spider silk or bone, where they create extremely robust, multi-functional materials by self-organization of structures over many length and time scales, from nano to macro. Some of the structural features are commonly found in many different tissues, that is, they are highly conserved. Examples of such universal building blocks include alpha-helices, beta-sheets or tropocollagen molecules. In contrast, other features are highly specific to tissue types, such as particular filament assemblies, beta-sheet nanocrystals in spider silk or tendon fascicles. These examples illustrate that the coexistence of universality and diversity – in the following referred to as the universality-diversity paradigm (UDP) – is an overarching feature in protein materials. This paradigm is a paradox: How can a structure be universal and diverse at the same time? In protein materials, the coexistence of universality and diversity is enabled by utilizing hierarchies, which serve as an additional dimension beyond the 3D or 4D physical space. This may be crucial to understanding how their structure and properties are linked, and how these materials are capable of combining seemingly disparate properties such as strength and robustness. Here we illustrate how the UDP makes it possible to unify universal building blocks and highly diversified patterns through the formation of hierarchical structures that lead to multi-functional, robust yet highly adapted structures. We illustrate these concepts in an analysis of three types of intermediate filament proteins: vimentin, lamin and keratin.