Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models
Spatially proximate amino acids in a protein tend to coevolve. A protein's
three-dimensional (3D) structure hence leaves an echo of correlations in the
evolutionary record. Reverse engineering 3D structures from such correlations
is an open problem in structural biology, pursued with increasing vigor as more
and more protein sequences continue to fill the data banks. Within this task
lies a statistical inference problem, rooted in the following: correlation
between two sites in a protein sequence can arise from firsthand interaction
but can also be network-propagated via intermediate sites; observed correlation
is not enough to guarantee proximity. Separating direct from indirect
interactions is an instance of the general problem of inverse statistical
mechanics, where the task is to learn model parameters (fields, couplings) from
observables (magnetizations, correlations, samples) in large systems. In the
context of protein sequences, the approach has been referred to as
direct-coupling analysis. Here we show that the pseudolikelihood method,
applied to 21-state Potts models describing the statistical properties of
families of evolutionarily related proteins, significantly outperforms existing
approaches to the direct-coupling analysis, the latter being based on standard
mean-field techniques. This improved performance also relies on a modified
score for the coupling strength. The results are verified using known crystal
structures of specific sequence instances of various protein families. Code
implementing the new method can be found at http://plmdca.csc.kth.se/.
Comment: 19 pages, 16 figures, published version
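The pseudolikelihood objective and the modified coupling score described in this abstract can be sketched in plain Python. This is a minimal, unregularized sketch under stated assumptions, not the plmDCA code from the URL above: states are encoded as integers 0..20, the helper names (`zero_params`, `neg_log_pseudolikelihood`, `apc_frobenius_scores`) are hypothetical, and the score is assumed to be a Frobenius norm of each coupling block in zero-sum gauge followed by an average-product correction (APC), with gap handling simplified. A practical implementation would also need regularization and a numerical optimizer, neither of which is shown.

```python
import math

Q = 21  # 20 amino acids + 1 gap symbol, i.e. a 21-state Potts model


def zero_params(L):
    """Hypothetical helper: fields h[i][a] and couplings J[i][j][a][b], all zero."""
    h = [[0.0] * Q for _ in range(L)]
    J = [[[[0.0] * Q for _ in range(Q)] for _ in range(L)] for _ in range(L)]
    return h, J


def neg_log_pseudolikelihood(seqs, h, J):
    """Average negative log-pseudolikelihood of aligned sequences.

    Each sequence is a list of ints in [0, Q). J is assumed symmetric,
    J[i][j][a][b] == J[j][i][b][a]. No regularization is included.
    """
    L = len(h)
    total = 0.0
    for s in seqs:
        for r in range(L):
            # Energy of each candidate state a at site r, with all other
            # sites fixed to their observed states.
            energies = [
                h[r][a] + sum(J[r][i][a][s[i]] for i in range(L) if i != r)
                for a in range(Q)
            ]
            m = max(energies)  # log-sum-exp shift for numerical stability
            log_z = m + math.log(sum(math.exp(e - m) for e in energies))
            total -= energies[s[r]] - log_z  # adds -log P(s_r | all other sites)
    return total / len(seqs)


def apc_frobenius_scores(J):
    """Contact score per site pair: Frobenius norm of each coupling block,
    after shifting it to zero-sum gauge, minus the average-product correction."""
    L = len(J)
    fn = [[0.0] * L for _ in range(L)]
    for i in range(L):
        for j in range(L):
            if i == j:
                continue
            B = J[i][j]
            row = [sum(B[a]) / Q for a in range(Q)]
            col = [sum(B[a][b] for a in range(Q)) / Q for b in range(Q)]
            tot = sum(row) / Q
            fn[i][j] = math.sqrt(sum(
                (B[a][b] - row[a] - col[b] + tot) ** 2
                for a in range(Q) for b in range(Q)
            ))
    # APC: subtract a background term built from per-site mean scores.
    mean_i = [sum(fn[i][j] for j in range(L) if j != i) / (L - 1) for i in range(L)]
    mean_all = sum(mean_i) / L
    if mean_all == 0.0:
        return fn
    return [
        [0.0 if i == j else fn[i][j] - mean_i[i] * mean_i[j] / mean_all
         for j in range(L)]
        for i in range(L)
    ]


# Toy usage: hypothetical alignment in which sites 0 and 1 always agree.
h, J = zero_params(3)
seqs = [[1, 1, 5], [4, 4, 2], [7, 7, 7], [2, 2, 9]]
base = neg_log_pseudolikelihood(seqs, h, J)  # 3 * log(21) for zero parameters
for a in range(Q):
    J[0][1][a][a] = J[1][0][a][a] = 2.0  # reward equal states at sites 0 and 1
print(neg_log_pseudolikelihood(seqs, h, J) < base)  # True
print(apc_frobenius_scores(J)[0][1] > 0)  # True
```

In this toy case the coupling between sites 0 and 1 lowers the objective because it makes the observed co-occurring states more probable, and after APC only that pair keeps a positive score.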
Motif Discovery in Protein Sequences
Biology has become a data-intensive research field. Coping with the flood of data from new genome sequencing technologies is a major area of research. The exponential increase in the size of the datasets produced by "next-generation sequencing" (NGS) poses unique computational challenges. In this context, motif discovery tools are widely used to identify important patterns in the sequences produced. Biological sequence motifs are defined as short, usually fixed-length, sequence patterns that may represent important structural or functional features in nucleic acid and protein sequences, such as transcription factor binding sites, splice junctions, active sites, or interaction interfaces. They can occur in an exact or approximate form within a family or subfamily of sequences. Motif discovery is therefore an important field in bioinformatics, and numerous methods have been developed to identify motifs shared by a set of functionally related sequences. This chapter reviews existing motif discovery methods for protein sequences, their ability to discover biologically important features, and their limitations for the discovery of new motifs. Finally, we propose new directions for motif discovery that address the shortcomings of existing methods.
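The notions of exact and approximate fixed-length motifs can be illustrated with a naive enumeration. This is a minimal sketch, not one of the methods the chapter reviews: `shared_kmers` and `count_approx` are hypothetical helper names, a motif's support is taken to be the number of sequences containing it, and "approximate" is assumed to mean at most `d` substitutions (Hamming distance) with no insertions or deletions.

```python
from collections import Counter


def shared_kmers(seqs, k, min_support):
    """Return {k-mer: support} for exact k-mers found in >= min_support sequences."""
    support = Counter()
    for s in seqs:
        # A set, so each k-mer is counted at most once per sequence.
        support.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return {m: c for m, c in support.items() if c >= min_support}


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_approx(seqs, motif, d):
    """Number of sequences with a window within Hamming distance d of motif."""
    k = len(motif)
    return sum(
        any(hamming(s[i:i + k], motif) <= d for i in range(len(s) - k + 1))
        for s in seqs
    )


seqs = ["MKTAYIAK", "GGKTAYLL", "KTAYQQ"]  # toy protein fragments
print(shared_kmers(seqs, 4, 3))  # {'KTAY': 3}
print(count_approx(seqs, "KTAW", 1))  # 3
print(count_approx(seqs, "KTAW", 0))  # 0
```

Allowing one substitution turns the unseen pattern `KTAW` into a motif present in every sequence, which shows why approximate occurrences enlarge the search space that discovery methods must explore.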
New Insights on the Evolutionary History of Aphids and Their Primary Endosymbiont Buchnera aphidicola
Since the establishment of the symbiosis between the ancestor of modern aphids and their primary endosymbiont, Buchnera aphidicola, insects and bacteria have coevolved. Because of this parallel evolution, the analysis of bacterial genomic features is a useful tool for understanding their evolutionary history. Here, using data from B. aphidicola symbionts of four aphid species from three subfamilies, we report a molecular evolutionary analysis, the phylogenetic relationships among lineages, and a comparison of sequence evolutionary rates. Our results support previous hypotheses that B. aphidicola and its host lineages diverged during the early Cretaceous, and indicate that the subfamilies Eriosomatinae and Lachninae are more closely related to each other than to the Aphidinae. They also reveal a general evolutionary pattern among strains at the functional level. Finally, we point out that life cycle and generation time may explain the accelerated evolutionary rate of B. aphidicola from the Lachninae.
SBSM-Pro: Support Bio-sequence Machine for Proteins
Proteins play a pivotal role in biological systems. The use of machine
learning algorithms for protein classification can assist and even guide
biological experiments, offering crucial insights for biotechnological
applications. We propose a support bio-sequence machine for proteins, a model
specifically designed for biological sequence classification. This model starts
with raw sequences and groups amino acids based on their physicochemical
properties. It incorporates sequence alignment to measure the similarities
between proteins and uses a novel multiple kernel learning (MKL) approach to integrate various types of
information, utilizing support vector machines for classification prediction.
The results indicate that our model demonstrates commendable performance across
10 datasets in terms of the identification of protein function and
posttranslational modification. This research not only showcases
state-of-the-art work in protein classification but also paves the way for new
directions in this domain, representing a beneficial endeavour in the
development of platforms tailored for biological sequence classification.
SBSM-Pro is available at http://lab.malab.cn/soft/SBSM-Pro/.
Comment: 38 pages, 9 figures
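The general idea of grouping amino acids by physicochemical properties and then combining several similarity kernels can be sketched as follows. This is an illustration under assumptions, not SBSM-Pro itself: the six-group alphabet in `GROUPS` is hypothetical, a cosine-normalized k-mer spectrum kernel stands in for the alignment-based similarity the abstract mentions, and the kernel weights are fixed by hand, whereas an MKL method would learn them from data.

```python
import math
from collections import Counter

# Hypothetical physicochemical grouping (illustrative only, NOT SBSM-Pro's).
GROUPS = {"AVLIMC": "h", "FWY": "r", "KRH": "+", "DE": "-", "STNQ": "p", "GP": "s"}
REDUCE = {aa: g for members, g in GROUPS.items() for aa in members}


def reduce_alphabet(seq):
    """Map each residue to its group symbol; unknown residues become 'x'."""
    return "".join(REDUCE.get(aa, "x") for aa in seq)


def spectrum_kernel(seqs, k):
    """Cosine-normalized k-mer spectrum kernel matrix (diagonal = 1)."""
    specs = [Counter(s[i:i + k] for i in range(len(s) - k + 1)) for s in seqs]
    raw = [[sum(c * t[m] for m, c in s.items()) for t in specs] for s in specs]
    n = len(seqs)
    return [
        [raw[i][j] / math.sqrt(raw[i][i] * raw[j][j]) if raw[i][i] and raw[j][j] else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices; weights are fixed here, not learned."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for K, w in zip(kernels, weights)) for j in range(n)]
            for i in range(n)]


seqs = ["MKVLAE", "MRILSE", "WWYFDD"]  # toy sequences
K_raw = spectrum_kernel(seqs, 2)
K_red = spectrum_kernel([reduce_alphabet(s) for s in seqs], 2)
K = combine_kernels([K_raw, K_red], [0.5, 0.5])
print(K_raw[0][1], round(K_red[0][1], 3))  # 0.0 0.676
```

The first two toy sequences share no raw 2-mers, but after alphabet reduction they become similar, which is the kind of signal a physicochemical grouping is meant to expose. A non-negative weighted sum of positive semidefinite kernels is itself a valid kernel, so the combined matrix could be passed to an SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.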