Transduplication resulted in the incorporation of two protein-coding sequences into the Turmoil-1 transposable element of C. elegans
Transposable elements may acquire unrelated gene fragments into their
sequences in a process called transduplication. Transduplication of
protein-coding genes is common in plants but has not been reported in animals. Here,
we report that the Turmoil-1 transposable element in C. elegans has
incorporated two protein-coding sequences into its inverted terminal repeat
(ITR) sequences. The ITRs of Turmoil-1 contain a conserved RNA recognition
motif (RRM) that originated from the rsp-2 gene and a fragment from the
protein-coding region of the cpg-3 gene. We further report that an open reading
frame specific to C. elegans may have been created as a result of a Turmoil-1
insertion. Mutations at the 5' splice site of this open reading frame may have
reactivated the transduplicated RRM motif.
Paths of lateral gene transfer of lysyl-aminoacyl-tRNA synthetases with a unique evolutionary transition stage of prokaryotes coding for class I and II varieties by the same organisms
Background
While the premise that lateral gene transfer (LGT) is a dominant evolutionary force is still in considerable dispute, the case for widespread LGT in the family of aminoacyl-tRNA synthetases (aaRS) is no longer contentious. aaRSs are ancient enzymes that guard the fidelity of the genetic code. They fall into two structurally unrelated classes. Only lysine aminoacyl-tRNA synthetase (LysRS) is found as both a class 1 and a class 2 enzyme (LysRS1 and LysRS2). Remarkably, both classes of the enzyme coexist in several extant prokaryotes, a unique phenomenon that has yet to receive its due attention.
Results
We applied a phylogenetic approach to determine the extent and origin of LGT in prokaryotic LysRS. Reconstructing species trees for Archaea and Bacteria, and inferring that their last common ancestors encoded LysRS1 and LysRS2, respectively, we studied the gains and losses of both classes. A complex pattern of LGT events emerged. In specific groups of organisms, LysRS1 was replaced by LysRS2 (and vice versa). On one occasion, within the alpha proteobacteria, a LysRS2-to-LysRS1 LGT was followed by reversal to LysRS2. After establishing the most likely LGT paths, we reconstructed LysRS gene trees to evaluate the likely origins of the laterally transferred genes. While the sources of LysRS1 LGTs were readily identified, those of LysRS2 remain, for now, uncertain. The replacement of one LysRS by another apparently transits through a stage that codes for both synthetases simultaneously, probably conferring a selective advantage on the affected organisms.
Conclusion
The family of LysRSs features complex LGT events. The currently available data were sufficient to identify unambiguously the origins of LysRS1, but not of LysRS2, gene transfers. A selective advantage is suggested for organisms that simultaneously encode LysRS1 and LysRS2.
Epitopia: a web-server for predicting B-cell epitopes
Background
Detecting candidate B-cell epitopes in a protein is a basic and fundamental step in many immunological applications. Due to the impracticality of experimental approaches to systematically scan the entire protein, a computational tool that predicts the most probable epitope regions is desirable.
Results
The Epitopia server is a web-based tool that aims to predict immunogenic regions in either a protein three-dimensional structure or a linear sequence. Epitopia implements a machine-learning algorithm that was trained to discern antigenic features within a given protein. The Epitopia algorithm has been compared to other available epitope prediction tools and was found to have higher predictive power. A special emphasis was put on the development of a user-friendly graphical interface for displaying the results.
Conclusion
Epitopia is a user-friendly web-server that predicts immunogenic regions for both a protein structure and a protein sequence. Its accuracy and functionality make it a highly useful tool. Epitopia is available at http://epitopia.tau.ac.il and includes extensive explanations and example predictions.
Selecton 2007: advanced models for detecting positive and purifying selection using a Bayesian inference approach
Biologically significant sites in a protein may be identified by contrasting the rates of synonymous (Ks) and non-synonymous (Ka) substitutions. This enables the inference of site-specific positive Darwinian selection and purifying selection. We present here Selecton version 2.2 (http://selecton.bioinfo.tau.ac.il), a web server that automatically calculates the ratio between Ka and Ks (ω) at each site of the protein. This ratio is displayed graphically for each site using a color-coding scheme, indicating positive selection, purifying selection or lack of selection. Selecton implements an assembly of different evolutionary models, which allow for statistical testing of the hypothesis that a protein has undergone positive selection. Specifically, the recently developed mechanistic-empirical model is introduced, which takes into account the physicochemical properties of amino acids. Advanced options allow the server to be fine-tuned to the user's specific needs, including calculation of statistical support for the ω values, an advanced graphic display of the protein's three-dimensional structure, use of different genetic codes and input of a pre-built phylogenetic tree. Selecton version 2.2 is an effective, user-friendly and freely available web server that implements up-to-date methods for computing site-specific selection forces and for visualizing these forces on the protein's sequence and structure.
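The interpretation of ω can be illustrated with a small Python sketch. It is not Selecton's method: Selecton infers ω per site with Bayesian inference under codon models, whereas this sketch simply takes per-site Ka and Ks values as given and maps each ratio to the three displayed categories (ω > 1 positive selection, ω < 1 purifying selection, ω ≈ 1 no detectable selection). The helper names `omega_per_site` and `classify`, the `tol` margin and the example rates are hypothetical.

```python
from typing import List, Sequence


def omega_per_site(ka: Sequence[float], ks: Sequence[float]) -> List[float]:
    """Site-wise ratio omega = Ka / Ks (non-synonymous over synonymous rate)."""
    if len(ka) != len(ks):
        raise ValueError("ka and ks must have the same length")
    if any(s <= 0.0 for s in ks):
        raise ValueError("Ks must be positive at every site")
    return [a / s for a, s in zip(ka, ks)]


def classify(omega: float, tol: float = 0.0) -> str:
    """Crude label for one site. `tol` is a hypothetical stand-in for the
    statistical support a real analysis would require before calling a site."""
    if omega > 1.0 + tol:
        return "positive"      # excess of non-synonymous change
    if omega < 1.0 - tol:
        return "purifying"     # non-synonymous change is removed
    return "neutral"           # no detectable selection


if __name__ == "__main__":
    ka = [0.02, 0.30, 0.10]    # made-up per-site rates for illustration
    ks = [0.20, 0.10, 0.10]
    for site, w in enumerate(omega_per_site(ka, ks), start=1):
        print(site, round(w, 2), classify(w))
```

A fixed threshold around 1 is the simplest possible decision rule; the statistical testing and support values described above exist precisely because a point estimate of ω near 1 is rarely conclusive on its own.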
The Alternative Choice of Constitutive Exons throughout Evolution
Alternative cassette exons are known to originate from two processes:
exonization of intronic sequences and exon shuffling. Herein, we suggest an
additional mechanism by which constitutively spliced exons become alternative
cassette exons during evolution. We compiled a dataset of orthologous exons
from human and mouse that are constitutively spliced in one species but
alternatively spliced in the other. Examination of these exons suggests that
the common ancestors were constitutively spliced. We show that relaxation of
the 5' splice site during evolution is one of the molecular mechanisms by which
exons shift from constitutive to alternative splicing. This shift is associated
with the fixation of exonic splicing regulatory sequences (ESRs) that are
essential for exon definition and control the inclusion level only after the
transition to alternative splicing. The effect of each ESR on splicing and the
combinatorial effects between two ESRs are conserved from fish to human. Our
results uncover an evolutionary pathway that increases transcriptome diversity
by shifting exons from constitutive to alternative splicing.
A LASSO-based approach to sample sites for phylogenetic tree search
Motivation
In recent years, full-genome sequences have become increasingly available and as a result many modern phylogenetic analyses are based on very long sequences, often with over 100 000 sites. Phylogenetic reconstructions of large-scale alignments are challenging for likelihood-based phylogenetic inference programs and usually require using a powerful computer cluster. Current tools for alignment trimming prior to phylogenetic analysis do not promise a significant reduction in the alignment size and are claimed to have a negative effect on the accuracy of the obtained tree.
Results
Here, we propose an artificial-intelligence-based approach that provides a means to select the optimal subset of sites, together with a formula by which the log-likelihood of the entire data can be computed from this subset. Our approach is based on training a regularized Lasso-regression model that optimizes the log-likelihood prediction accuracy while constraining the number of sites used for the approximation. We show that computing the likelihood based on 5% of the sites already provides an accurate approximation of the tree likelihood based on the entire data. Furthermore, we show that using this Lasso-based approximation during a tree search substantially decreased running time while retaining the same tree-search performance.
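The core idea, predicting the whole-alignment log-likelihood from a weighted subset of per-site log-likelihoods under an L1 penalty, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the data come from a hypothetical `simulate` helper that produces toy per-site log-likelihoods for a set of candidate trees, the Lasso is fitted by simple cyclic coordinate descent (`lasso_cd`, also hypothetical), and the penalty strength `alpha` stands in for the paper's constraint on the number of sites.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def soft_threshold(x: float, t: float) -> float:
    """Proximal operator of the L1 norm: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def lasso_cd(X: Matrix, y: List[float], alpha: float,
             max_iter: int = 500, tol: float = 1e-6) -> Tuple[float, List[float]]:
    """Minimise (1/2n)||y - b - Xw||^2 + alpha*||w||_1 by cyclic coordinate
    descent. Returns the intercept b and the per-site weights w."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centre columns and target so the intercept drops out of the updates.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    r = [v - y_mean for v in y]                  # residual while w == 0
    z = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    w = [0.0] * p
    for _ in range(max_iter):
        max_step = 0.0
        for j in range(p):
            if z[j] == 0.0:
                continue                         # constant site: no signal
            # Correlation of site j with the residual that excludes site j.
            rho = sum(Xc[i][j] * r[i] for i in range(n)) / n + z[j] * w[j]
            new_wj = soft_threshold(rho, alpha) / z[j]
            step = new_wj - w[j]
            if step != 0.0:
                for i in range(n):
                    r[i] -= Xc[i][j] * step
                w[j] = new_wj
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    b = y_mean - sum(x_mean[j] * w[j] for j in range(p))
    return b, w


def simulate(n_trees: int, n_sites: int, seed: int = 0) -> Tuple[Matrix, List[float]]:
    """Hypothetical toy data: X[i][j] is the log-likelihood of site j under
    candidate tree i. Every site reacts, with its own sensitivity, to a shared
    tree-quality score q, so a few sites carry most of the signal."""
    rng = random.Random(seed)
    base = [rng.uniform(-5.0, -1.0) for _ in range(n_sites)]
    sens = [rng.uniform(0.5, 2.0) for _ in range(n_sites)]
    X = []
    for _ in range(n_trees):
        q = rng.uniform(0.0, 1.0)
        X.append([base[j] - sens[j] * q + rng.gauss(0.0, 0.001)
                  for j in range(n_sites)])
    y = [sum(row) for row in X]                  # whole-alignment log-likelihood
    return X, y


def predict(b: float, w: List[float], row: List[float]) -> float:
    """Approximate the full log-likelihood from the selected sites only."""
    return b + sum(wj * xj for wj, xj in zip(w, row) if wj != 0.0)


if __name__ == "__main__":
    X, y = simulate(n_trees=40, n_sites=50)
    b, w = lasso_cd(X, y, alpha=0.01)
    selected = [j for j, wj in enumerate(w) if wj != 0.0]
    print(len(selected), "of", len(w), "sites used")
```

Larger `alpha` values drive more weights to exactly zero (fewer sites, cheaper likelihood evaluations) at the cost of a less accurate approximation; in a tree search, only the selected sites would need per-site likelihood computations.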
State-of-the-art methodologies dictate new standards for phylogenetic analysis
The intention of this editorial is to steer researchers through methodological choices in molecular evolution, drawing on the combined expertise of the authors. Our aim is not to review the most advanced methods for a specific task. Rather, we define several general guidelines to help with methodology choices at different stages of a typical phylogenetic ‘pipeline’. We cannot provide exhaustive citation of a literature that is vast and plentiful, but we point the reader to a set of classical textbooks that reflect the state of the art. We do not wish to appear overly critical of outdated methodology but rather to provide some practical guidance on the sort of issues that should be considered. We stress that a reported study should be well motivated and should evaluate a specific hypothesis or scientific question. In particular, a publishable study should not be merely a compilation of available sequences for a protein family of interest followed by some standard analyses, unless it specifically addresses a scientific hypothesis or question; the rapid pace at which sequence data accumulate quickly outdates such publications. Clearly, however, discoveries stemming from data mining, reports of new tools and databases, and review papers are also desirable.
FastML: a web server for probabilistic reconstruction of ancestral sequences
Ancestral sequence reconstruction is essential to a variety of evolutionary studies. Here, we present the FastML web server, a user-friendly tool for the reconstruction of ancestral sequences. FastML implements various novel features that differentiate it from existing tools: (i) FastML uses an indel-coding method, in which each gap, possibly spanning multiple sites, is coded as binary data. FastML then reconstructs ancestral indel states assuming a continuous-time Markov process. FastML provides the most likely ancestral sequences, integrating both indels and characters; (ii) FastML accounts for uncertainty in ancestral states: it provides not only the posterior probabilities for each character and indel at each sequence position, but also a sample of ancestral sequences from this posterior distribution, and a list of the k most likely ancestral sequences; (iii) FastML implements a large array of evolutionary models, which makes it generic and applicable to nucleotide, protein and codon sequences; and (iv) a graphical representation of the results is provided, including, for example, a graphical logo of the inferred ancestral sequences. The utility of FastML is demonstrated by reconstructing ancestral sequences of the Env protein from various HIV-1 subtypes. FastML is freely available to all academic users online at http://fastml.tau.ac.il.
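The indel-coding step in (i), turning each gap into a binary character, can be illustrated with a short Python sketch. It assumes a simplified scheme for illustration only: each distinct gap, identified by its start and end alignment columns, becomes one 0/1 character, scored 1 for sequences carrying exactly that gap and 0 otherwise. FastML's actual coding rules may differ, and the subsequent reconstruction of ancestral indel states under a continuous-time Markov model is not shown. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Gap = Tuple[int, int]  # (first column, one past last column) of a run of '-'


def gaps_in(seq: str) -> List[Gap]:
    """Return the maximal runs of '-' in one aligned sequence."""
    gaps: List[Gap] = []
    start: Optional[int] = None
    for i, ch in enumerate(seq):
        if ch == "-" and start is None:
            start = i                      # a gap opens here
        elif ch != "-" and start is not None:
            gaps.append((start, i))        # the gap closed at column i
            start = None
    if start is not None:                  # gap runs to the end of the row
        gaps.append((start, len(seq)))
    return gaps


def indel_characters(alignment: Dict[str, str]) -> Tuple[List[Gap], Dict[str, List[int]]]:
    """Turn every distinct gap (same start and end) into one binary character:
    1 if a sequence carries exactly that gap, 0 otherwise."""
    per_seq = {name: set(gaps_in(seq)) for name, seq in alignment.items()}
    characters = sorted(set().union(*per_seq.values()))
    coded = {name: [1 if g in gs else 0 for g in characters]
             for name, gs in per_seq.items()}
    return characters, coded


if __name__ == "__main__":
    aln = {"A": "ACG--T", "B": "A----T", "C": "ACGTAT"}
    chars, coded = indel_characters(aln)
    print(chars)   # [(1, 5), (3, 5)]
    print(coded)   # {'A': [0, 1], 'B': [1, 0], 'C': [0, 0]}
```

In this toy alignment, sequence B's long gap covers A's shorter one; published schemes such as simple indel coding typically score that case as missing data rather than 0, which this simplified sketch does not attempt.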
- …