Selection of sequence motifs and generative Hopfield-Potts models for protein families
Statistical models for families of evolutionary related proteins have
recently gained interest: in particular, pairwise Potts models, such as those
inferred by the Direct-Coupling Analysis, have been able to extract information
about the three-dimensional structure of folded proteins, and about the effect
of amino-acid substitutions in proteins. These models are typically required
to reproduce the one- and two-point statistics of the amino-acid usage in a
protein family, i.e., to capture the so-called residue conservation and
covariation statistics of proteins of common evolutionary origin. Pairwise
Potts models are the maximum-entropy models achieving this. While being
successful, these models depend on huge numbers of ad hoc introduced
parameters, which have to be estimated from a finite amount of data and whose
biophysical interpretation remains unclear. Here we propose an approach to
parameter reduction, which is based on selecting collective sequence motifs. It
naturally leads to the formulation of statistical sequence models in terms of
Hopfield-Potts models. These models can be accurately inferred using a mapping
to restricted Boltzmann machines and persistent contrastive divergence. We show
that, when applied to protein data, even 20-40 patterns are sufficient to
obtain statistically close-to-generative models. The Hopfield patterns form
interpretable sequence motifs and may be used to cluster amino-acid
sequences into functional sub-families. However, the distributed collective
nature of these motifs intrinsically limits the ability of Hopfield-Potts
models to predict contact maps, showing the necessity of developing models
going beyond the Hopfield-Potts models discussed here.
Comment: 26 pages, 16 figures, to app. in PR
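The parameter reduction described in this abstract can be illustrated with a small counting sketch. It is not taken from the paper: the function names, the alphabet size q = 21 (20 amino acids plus a gap symbol), the example sizes, and the unnormalized low-rank coupling form J_ij(a, b) = sum over patterns of xi_i(a) * xi_j(b) are all assumptions made for illustration, and local fields are left out of both counts.

```python
def pairwise_potts_couplings(L, q=21):
    """Count coupling parameters J_ij(a, b) of a full pairwise Potts model.

    L is the sequence length, q the alphabet size (q=21 is an assumption).
    """
    return L * (L - 1) // 2 * q * q


def hopfield_potts_pattern_params(L, p, q=21):
    """Count pattern parameters xi^mu_i(a) for p Hopfield patterns."""
    return p * L * q


def hopfield_coupling(patterns, i, j):
    """Build the q x q block J_ij(a, b) = sum_mu xi^mu_i(a) * xi^mu_j(b).

    patterns[mu][i][a] holds xi^mu_i(a). Normalization and sign conventions
    differ between papers; none is applied here (assumption).
    """
    q = len(patterns[0][i])
    return [
        [sum(xi[i][a] * xi[j][b] for xi in patterns) for b in range(q)]
        for a in range(q)
    ]


if __name__ == "__main__":
    L, p = 100, 40  # hypothetical sequence length and number of patterns
    print("pairwise couplings:", pairwise_potts_couplings(L))
    print("pattern parameters:", hopfield_potts_pattern_params(L, p))
```

With these hypothetical sizes the pattern parameters number in the tens of thousands, against millions of pairwise couplings, which is the kind of reduction the abstract refers to.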
Hierarchical fractional-step approximations and parallel kinetic Monte Carlo algorithms
We present a mathematical framework for constructing and analyzing parallel
algorithms for lattice Kinetic Monte Carlo (KMC) simulations. The resulting
algorithms have the capacity to simulate a wide range of spatio-temporal scales
in spatially distributed, non-equilibrium physicochemical processes with complex
chemistry and transport micro-mechanisms. The algorithms can be tailored to
specific hierarchical parallel architectures such as multi-core processors or
clusters of Graphics Processing Units (GPUs). The proposed parallel algorithms
are controlled-error approximations of kinetic Monte Carlo algorithms,
departing from the predominant paradigm of creating parallel KMC algorithms
with exactly the same master equation as the serial one.
Our methodology relies on a spatial decomposition of the Markov operator
underlying the KMC algorithm into a hierarchy of operators corresponding to the
processors' structure in the parallel architecture. Based on this operator
decomposition, we formulate Fractional Step Approximation schemes by employing
the Trotter Theorem and its random variants; these schemes (a) determine the
communication schedule between processors, and (b) are run independently on
each processor through a serial KMC simulation, called a kernel, on each
fractional step time-window.
Furthermore, the proposed mathematical framework allows us to rigorously
justify the numerical and statistical consistency of the proposed algorithms,
showing the convergence of our approximating schemes to the original serial
KMC. The approach also provides a systematic evaluation of different processor
communication schedules.
Comment: 34 pages, 9 figures
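The operator-splitting idea behind the fractional-step schemes in this abstract can be sketched on a toy example. The code below is a deterministic, matrix-level illustration of Lie-Trotter splitting only, not the paper's parallel KMC algorithm: the three-state generators A and B, the helper names, and the Taylor-series matrix exponential are all assumptions chosen to keep the example self-contained.

```python
def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_scale(X, c):
    return [[c * x for x in row] for row in X]


def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mat_exp(M, terms=40):
    """Truncated Taylor series of exp(M); adequate for small, modest-norm M."""
    result = identity(len(M))
    term = identity(len(M))
    for k in range(1, terms):
        term = mat_scale(mat_mul(term, M), 1.0 / k)
        result = mat_add(result, term)
    return result


def lie_trotter(A, B, t, n_steps):
    """Approximate exp(t(A+B)) by (exp(hA) exp(hB))^n_steps with h = t/n_steps."""
    h = t / n_steps
    step = mat_mul(mat_exp(mat_scale(A, h)), mat_exp(mat_scale(B, h)))
    P = identity(len(A))
    for _ in range(n_steps):
        P = mat_mul(P, step)
    return P


# Hypothetical generators (rows sum to zero): A couples states 0 and 1,
# B couples states 1 and 2, so A and B do not commute.
A = [[-1.0, 1.0, 0.0], [2.0, -2.0, 0.0], [0.0, 0.0, 0.0]]
B = [[0.0, 0.0, 0.0], [0.0, -1.0, 1.0], [0.0, 3.0, -3.0]]


def splitting_error(n_steps, t=1.0):
    """Largest entrywise gap between the split and unsplit transition matrices."""
    exact = mat_exp(mat_scale(mat_add(A, B), t))
    approx = lie_trotter(A, B, t, n_steps)
    return max(abs(x - y) for rx, ry in zip(approx, exact)
               for x, y in zip(rx, ry))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, splitting_error(n))
```

Running it shows the splitting error shrinking as the number of fractional steps grows, a small-scale analogue of the controlled-error approximation that the abstract describes.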
Molecular phylogenetics of Amblycorypha (Orthoptera: Tettigoniidae): a molecular morphometric and molecular taxonomic approach
Genus Amblycorypha (Orthoptera: Tettigoniidae) comprises 14 nominal species exhibiting highly similar morphologies. Three major morphologically similar species complexes exist in Amblycorypha: the uhleri, oblongifolia, and rotundifolia complexes. While the species are morphologically similar, the songs that males use to attract mates differ drastically among species. Recently collected male and female mating songs suggest multiple undescribed species exist within the rotundifolia complex. Using molecular techniques, I aim to delimit species groups within Amblycorypha and attempt to reconstruct their evolutionary histories. The ITS1 (~461 bp), 5.8S (174 bp), and ITS2 (240 bp) nuDNA regions and a partial CO1 (523 bp) mtDNA gene were sequenced using massively parallel sequencing technologies. The CO1 mtDNA region was the most variable (10.1% overall mean distance), followed by ITS2 (1.1% mean distance), ITS1 (0.9% mean distance), and 5.8S (0.02% mean distance). A single nucleotide polymorphism was present in 5.8S uniting the uhleri complex as a clade. K2P interspecific differences had large overlap in both nominal species groups and unknown species groups. ML and MSC phylogenetic analyses recovered the uhleri complex as monophyletic, while the oblongifolia and rotundifolia complexes were polyphyletic. Additionally, 6 distinct clades of ‘unknown specimens’ were recovered in ML and MSC analyses using all gene targets. Finally, A. bartrami may represent a species complex based on the molecular evidence presented here. This study represents the first molecular phylogeny for genus Amblycorypha. While incomplete, this study supports additional cryptic species within the rotundifolia complex that were initially detected based on male songs.