5,884 research outputs found

    Selection of sequence motifs and generative Hopfield-Potts models for protein families

    Statistical models for families of evolutionarily related proteins have recently gained interest: in particular, pairwise Potts models, such as those inferred by Direct-Coupling Analysis, have been able to extract information about the three-dimensional structure of folded proteins and about the effect of amino-acid substitutions in proteins. These models are typically required to reproduce the one- and two-point statistics of amino-acid usage in a protein family, i.e. to capture the so-called residue conservation and covariation statistics of proteins of common evolutionary origin. Pairwise Potts models are the maximum-entropy models achieving this. While successful, these models depend on huge numbers of ad hoc parameters, which have to be estimated from a finite amount of data and whose biophysical interpretation remains unclear. Here we propose an approach to parameter reduction based on selecting collective sequence motifs. It naturally leads to the formulation of statistical sequence models in terms of Hopfield-Potts models. These models can be accurately inferred using a mapping to restricted Boltzmann machines and persistent contrastive divergence. We show that, when applied to protein data, even 20-40 patterns are sufficient to obtain statistically close-to-generative models. The Hopfield patterns form interpretable sequence motifs and may be used to cluster amino-acid sequences into functional sub-families. However, the distributed collective nature of these motifs intrinsically limits the ability of Hopfield-Potts models to predict contact maps, showing the necessity of developing models that go beyond the Hopfield-Potts models discussed here. Comment: 26 pages, 16 figures, to app. in PR

    Hierarchical fractional-step approximations and parallel kinetic Monte Carlo algorithms

    We present a mathematical framework for constructing and analyzing parallel algorithms for lattice Kinetic Monte Carlo (KMC) simulations. The resulting algorithms have the capacity to simulate a wide range of spatio-temporal scales in spatially distributed, non-equilibrium physicochemical processes with complex chemistry and transport micro-mechanisms. The algorithms can be tailored to specific hierarchical parallel architectures such as multi-core processors or clusters of Graphical Processing Units (GPUs). The proposed parallel algorithms are controlled-error approximations of kinetic Monte Carlo algorithms, departing from the predominant paradigm of creating parallel KMC algorithms with exactly the same master equation as the serial one. Our methodology relies on a spatial decomposition of the Markov operator underlying the KMC algorithm into a hierarchy of operators corresponding to the processors' structure in the parallel architecture. Based on this operator decomposition, we formulate Fractional Step Approximation schemes by employing the Trotter Theorem and its random variants; these schemes (a) determine the communication schedule between processors, and (b) are run independently on each processor through a serial KMC simulation, called a kernel, on each fractional-step time window. Furthermore, the proposed mathematical framework allows us to rigorously justify the numerical and statistical consistency of the proposed algorithms, showing the convergence of our approximating schemes to the original serial KMC. The approach also provides a systematic evaluation of different processor communication schedules. Comment: 34 pages, 9 figures

    Molecular phylogenetics of Amblycorypha (Orthoptera: Tettigoniidae): a molecular morphometric and molecular taxonomic approach

    Genus Amblycorypha (Orthoptera: Tettigoniidae) comprises 14 nominal species exhibiting highly similar morphologies. Three major morphologically similar species complexes exist in Amblycorypha: the uhleri, oblongifolia, and rotundifolia complexes. While the species are morphologically similar, the songs that males use to attract mates differ drastically among species. Recently collected male and female mating songs suggest that multiple undescribed species exist within the rotundifolia complex. Using molecular techniques, I aim to delimit species groups within Amblycorypha and attempt to reconstruct their evolutionary histories. The ITS1 (~461 bp), 5.8S (174 bp), and ITS2 (240 bp) nuDNA regions and a partial CO1 (523 bp) mtDNA gene were sequenced using massively parallel sequencing technologies. The CO1 mtDNA region was the most variable (10.1% overall mean distance), followed by ITS2 (1.1% mean distance), ITS1 (0.9% mean distance), and 5.8S (0.02% mean distance). A single nucleotide polymorphism was present in 5.8S uniting the uhleri complex as a clade. K2P interspecific differences had large overlap in both nominal species groups and unknown species groups. ML and MSC phylogenetic analyses recovered the uhleri complex as monophyletic, while the oblongifolia and rotundifolia complexes were polyphyletic. Additionally, 6 distinct clades of ‘unknown specimens’ were recovered in ML and MSC analyses using all gene targets. Finally, A. bartrami may represent a species complex based on the molecular evidence presented here. This study represents the first molecular phylogeny for genus Amblycorypha. While incomplete, this study supports additional cryptic species within the rotundifolia complex that were initially detected based on male songs.