    GUIDANCE: a web server for assessing alignment confidence scores

    Evaluating the accuracy of a multiple sequence alignment (MSA) is critical for virtually every comparative sequence analysis that uses an MSA as input. Here we present the GUIDANCE web server, a user-friendly, open-access tool for identifying unreliable alignment regions. The server accepts a set of unaligned sequences as input, aligns them, and provides a simple graphical visualization of the confidence score of each column, residue and sequence of the alignment, using a color-coding scheme. The method is generic: users can choose the alignment algorithm (ClustalW, MAFFT and PRANK are supported) and any type of molecular sequence (nucleotide, protein or codon). The server implements two algorithms for evaluating confidence scores: (i) the heads-or-tails (HoT) method, which measures alignment uncertainty due to co-optimal solutions; and (ii) the GUIDANCE method, which measures the robustness of the alignment to guide-tree uncertainty. The server projects the confidence scores onto the MSA and flags columns and sequences that are unreliably aligned; these can be removed automatically in preparation for downstream analyses. GUIDANCE is freely available at http://guidance.tau.ac.il
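    A GUIDANCE-style column score can be illustrated as the fraction of residue pairs in a column of the base MSA that are reproduced in a set of alternative MSAs (in GUIDANCE, these come from perturbed guide trees). The following is a minimal sketch under stated assumptions: alignments are lists of equal-length strings with '-' for gaps, all alignments list the same sequences in the same order, and the function names are hypothetical. It is not the server's implementation.

```python
from itertools import combinations


def residue_indices(alignment):
    """For each sequence, map every column to a residue index (None for gaps)."""
    indexed = []
    for row in alignment:
        counter = 0
        columns = []
        for char in row:
            if char == "-":
                columns.append(None)
            else:
                columns.append(counter)
                counter += 1
        indexed.append(columns)
    return indexed


def column_pairs(indexed, col):
    """Residue pairs aligned in one column, as (seq1, res1, seq2, res2) tuples."""
    present = [(s, cols[col]) for s, cols in enumerate(indexed) if cols[col] is not None]
    return {(s1, r1, s2, r2) for (s1, r1), (s2, r2) in combinations(present, 2)}


def all_pairs(alignment):
    """Every residue pair aligned anywhere in an alignment."""
    indexed = residue_indices(alignment)
    pairs = set()
    for col in range(len(alignment[0])):
        pairs |= column_pairs(indexed, col)
    return pairs


def column_scores(base, alternatives):
    """Mean fraction of each base column's residue pairs found in the alternatives.

    Columns with fewer than two residues have no pairs and get None.
    Requires at least one alternative alignment.
    """
    indexed = residue_indices(base)
    alt_pairs = [all_pairs(alt) for alt in alternatives]
    scores = []
    for col in range(len(base[0])):
        pairs = column_pairs(indexed, col)
        if not pairs:
            scores.append(None)
            continue
        support = [len(pairs & alt) / len(pairs) for alt in alt_pairs]
        scores.append(sum(support) / len(support))
    return scores


base = ["AC", "AC"]
alternatives = [["AC", "AC"], ["AC-", "A-C"]]
print(column_scores(base, alternatives))  # [1.0, 0.5]
```

    Tracking residues by their index within each ungapped sequence, rather than by column number, is what lets alignments of different lengths be compared.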

    TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations

    We present TranslatorX, a web server designed to align protein-coding nucleotide sequences based on their corresponding amino acid translations. Many comparisons between biological sequences (nucleic acids and proteins) involve the construction of multiple alignments. Alignments represent a statement of homology between individual nucleotides or amino acids within homologous genes. Because protein-coding DNA sequences evolve as triplets of nucleotides (codons), and sequence similarity is known to degrade more rapidly at the DNA level than at the amino acid level, alignments are generally more accurate when based on amino acids than on their corresponding nucleotides. TranslatorX's novel features include: (i) use of all documented genetic codes, with the possibility of assigning a different genetic code to each sequence; (ii) a battery of different multiple alignment programs; (iii) translation of ambiguous codons when possible; (iv) an innovative criterion for cleaning nucleotide alignments with Gblocks based on protein information; and (v) a rich output, including Jalview-powered graphical visualization of the alignments, codon-based alignments coloured according to the corresponding amino acids, measures of compositional bias, and alignments specific to the first, second and third codon positions. The TranslatorX server is freely available at http://translatorx.co.uk
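    The core back-translation step of a protein-guided codon alignment can be sketched as follows, assuming each nucleotide sequence is in frame, starts at its first codon and contains at least three nucleotides per aligned residue. Each amino acid is replaced by the next codon of its own sequence, and each protein gap by '---'. The function name is hypothetical; the sketch does not check translations, handle ambiguous codons or apply Gblocks cleaning as TranslatorX does.

```python
def back_translate(protein_alignment, nucleotide_seqs):
    """Thread unaligned in-frame nucleotide sequences onto an aligned protein MSA."""
    if len(protein_alignment) != len(nucleotide_seqs):
        raise ValueError("need one nucleotide sequence per aligned protein")
    codon_alignment = []
    for aligned_protein, nucleotides in zip(protein_alignment, nucleotide_seqs):
        residues = len(aligned_protein) - aligned_protein.count("-")
        # Trailing nucleotides (e.g. a stop codon) are tolerated; too few are not.
        if len(nucleotides) < 3 * residues:
            raise ValueError("nucleotide sequence is shorter than its protein")
        codons = []
        position = 0
        for amino_acid in aligned_protein:
            if amino_acid == "-":
                codons.append("---")  # one protein gap = one codon-length gap
            else:
                codons.append(nucleotides[position:position + 3])
                position += 3
        codon_alignment.append("".join(codons))
    return codon_alignment


print(back_translate(["M-K", "MEK"], ["ATGAAA", "ATGGAAAAA"]))
# ['ATG---AAA', 'ATGGAAAAA']
```

    Because gaps are inserted only in whole codons, the resulting nucleotide alignment never breaks a reading frame, which is the property that makes amino-acid-guided alignment attractive.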

    The genome of the protozoan parasite Cystoisospora suis and a reverse vaccinology approach to identify vaccine candidates

    Vaccine development targeting protozoan parasites remains challenging, partly because of the complex interactions between these eukaryotes and the host immune system. Reverse vaccinology is a promising approach for directly screening genome sequence assemblies for new vaccine candidate proteins. Here, we applied this paradigm to Cystoisospora suis, an apicomplexan parasite that causes enteritis and diarrhea in suckling piglets and economic losses in pig production worldwide. Using next-generation sequencing, we produced an ∼84 Mb sequence assembly of the C. suis genome, the first available reference for the genus Cystoisospora. We then derived a manually curated annotation of more than 11,000 protein-coding genes and applied the tool Vacceed to identify 1,168 vaccine candidates by screening the predicted C. suis proteome. To refine the set of candidates, we selected proteins that are highly expressed in merozoites and specific to apicomplexans. The resulting stringent set included 220 proteins, among them 152 proteins of unknown function, 17 surface antigens of the SAG and SRS gene families, 12 proteins of the apicomplexan-specific secretory organelles (including AMA1, MIC6, MIC13, ROP6, ROP12, ROP27 and ROP32) and three proteins related to cell adhesion. Finally, we demonstrated in vitro the immunogenic potential of a C. suis-specific 42 kDa transmembrane protein, which might constitute an attractive candidate for further testing.

    Evidence for Centromere Drive in the Holocentric Chromosomes of Caenorhabditis

    In monocentric organisms with asymmetric meiosis, kinetochore proteins such as CENH3 and CENP-C evolve adaptively to counterbalance the deleterious effects of centromere drive, which is caused by the expansion of centromeric satellite repeats. The selection regimes acting on the CENH3 and CENP-C genes have not been analyzed in organisms with holocentric chromosomes, although holocentrism is speculated to have evolved to suppress centromere drive. We tested both CENH3 and CENP-C for positive selection in several species of the holocentric genus Caenorhabditis using a maximum likelihood approach and sliding-window analysis. Although CENP-C showed no signs of positive selection, positive selection was detected in CENH3. These results support the hypothesis that centromere drive occurs in Nematoda, at least in the telokinetic meiosis of Caenorhabditis.

    CO-phylum: An Assembly-Free Phylogenomic Approach for Close Related Organisms

    Phylogenomic approaches developed thus far are either too time-consuming or lack a solid evolutionary basis. Moreover, no existing phylogenomic approach can construct a tree directly from unassembled raw sequencing data. A new phylogenomic method, CO-phylum, was developed to address these shortcomings. CO-phylum can generate a high-resolution and highly accurate tree from complete genomes or unassembled sequencing data of closely related organisms. In addition, the CO-phylum distance is almost linear with the p-distance.
    Comment: 21 pages, 6 figures
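    For reference, the p-distance against which the CO-phylum distance is compared is the proportion of differing sites between two aligned sequences. A minimal sketch, assuming pre-aligned equal-length sequences and pairwise deletion of gapped sites (both assumptions); this is not CO-phylum's own assembly-free distance.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # pairwise deletion: skip sites gapped in either sequence
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no comparable (ungapped) sites")
    return differences / compared


print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))  # 0.2
```

    The p-distance is uncorrected: it ignores multiple substitutions at the same site, so it saturates for divergent sequences but tracks divergence closely for the closely related organisms CO-phylum targets.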

    Population gene introgression and high genome plasticity for the zoonotic pathogen Streptococcus agalactiae

    How bacterial adaptation (or niche partitioning) within a species influences gene spillover and transmission among bacterial populations occupying different niches is not well understood. Streptococcus agalactiae is an important bacterial pathogen with a taxonomically diverse host range, making it an excellent model system for studying these processes. Here we analyze a global set of 901 genome sequences from nine diverse host species to advance our understanding of these processes. Bayesian clustering analysis delineated twelve major populations that closely aligned with niches. Comparative genomics revealed extensive gene gain and loss among populations and a large pan-genome of 9,527 genes, which remained open and was strongly partitioned among niches. As a result, the biochemical characteristics of eleven populations were highly distinctive (significantly enriched). Positive selection was detected, and the biochemical characteristics of the dispensable genes under selection were enriched in ten populations. Despite the strong gene partitioning, phylogenomics detected gene spillover. In particular, tetracycline resistance (which likely evolved in the human-associated population) spilled over from humans to bovines, canines, seals and fish, demonstrating how a gene selected in one host can ultimately be transmitted into another; biased transmission from humans to bovines was confirmed with a Bayesian migration analysis. Our findings show high bacterial genome plasticity acting in balance with selection pressure from the distinct functional requirements of niches, associated with an extensive and highly partitioned dispensable genome that likely facilitates continued and expansive adaptation.
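    The split of a pan-genome into core and dispensable genes, and the identification of genes confined to a single population, can be sketched from gene presence/absence data. This assumes each genome is represented as a non-empty set of gene-family identifiers (hypothetical input; real pipelines first cluster genes into families), and the function names are illustrative only.

```python
def partition_pangenome(genomes):
    """Split a pan-genome into core and dispensable genes.

    genomes: non-empty dict mapping genome name -> set of gene-family IDs.
    Returns (pan, core, dispensable) as sets.
    """
    gene_sets = [set(genes) for genes in genomes.values()]
    pan = set().union(*gene_sets)           # genes present in any genome
    core = set.intersection(*gene_sets)     # genes present in every genome
    return pan, core, pan - core            # dispensable = pan minus core


def population_specific_genes(populations):
    """Genes present in one population and absent from every other population.

    populations: dict mapping population name -> list of per-genome gene sets.
    """
    population_pans = {name: set().union(*sets) for name, sets in populations.items()}
    specific = {}
    for name, genes in population_pans.items():
        others = set().union(*(g for other, g in population_pans.items() if other != name))
        specific[name] = genes - others
    return specific


genomes = {"g1": {"a", "b", "c"}, "g2": {"a", "b"}, "g3": {"a", "d"}}
pan, core, dispensable = partition_pangenome(genomes)
print(sorted(pan), sorted(core), sorted(dispensable))
# ['a', 'b', 'c', 'd'] ['a'] ['b', 'c', 'd']
```

    An "open" pan-genome is one whose size keeps growing as genomes are added; repeating `partition_pangenome` on increasing subsets of genomes gives the accumulation curve used to judge this.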

    Gene Promoter Evolution Targets the Center of the Human Protein Interaction Network

    Assessing the contribution of promoters and coding sequences to gene evolution is an important step toward discovering the major genetic determinants of human evolution. Many specific examples have revealed the evolutionary importance of cis-regulatory regions. However, the relative contribution of regulatory and coding regions to the evolutionary process, and whether systemic factors differentially influence their evolution, remain unclear. To address these questions, we carried out a genome-scale analysis to identify signatures of positive selection in human proximal promoters. Next, we examined whether genes with positively selected promoters (Prom+ genes) show systemic differences with respect to a set of genes with positively selected protein-coding regions (Cod+ genes). The number of genes in each set was not significantly different (8.1% and 8.5%, respectively). Furthermore, a functional analysis showed that, in both cases, positive selection affects almost all biological processes and only a few genes of each group are located in enriched categories, indicating that promoters and coding regions are not evolutionarily specialized with respect to gene function. On the other hand, we show that the topology of the human protein network influences the molecular evolution of proximal promoters and coding regions differently. Notably, Prom+ genes have an unexpectedly high centrality compared with a reference distribution (P = 0.008, for eigenvalue centrality). Moreover, the frequency of Prom+ genes increases from the periphery to the center of the protein network (P = 0.02, for the logistic regression coefficient). This means that, unlike the case of coding regions, gene centrality does not constrain the evolution of proximal promoters, and further indicates that the evolution of proximal promoters is more efficient in the center of the protein network than in the periphery. These results show that proximal promoters have made a systemic contribution to human evolution by increasing the participation of central genes in the evolutionary process.
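    Eigenvector-based centrality scores each node by its entry in the leading eigenvector of the network's adjacency matrix, so nodes linked to other central nodes score highly. The sketch below computes it by power iteration, assuming an undirected, connected network stored as a symmetric, non-empty adjacency dictionary, and assuming the abstract's "eigenvalue centrality" is of this kind; the function name is hypothetical.

```python
import math


def eigenvector_centrality(adjacency, max_iter=1000, tol=1e-10):
    """Power iteration on (A + I); returns an L2-normalized score per node.

    adjacency: dict mapping node -> iterable of neighbours (must be symmetric).
    """
    nodes = list(adjacency)
    x = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        # (A + I) x: the identity shift keeps the same leading eigenvector
        # but avoids sign oscillation on bipartite graphs such as stars.
        new = {n: x[n] + sum(x[m] for m in adjacency[n]) for n in nodes}
        norm = math.sqrt(sum(v * v for v in new.values())) or 1.0
        new = {n: v / norm for n, v in new.items()}
        if sum(abs(new[n] - x[n]) for n in nodes) < tol * len(nodes):
            return new
        x = new
    raise RuntimeError("power iteration did not converge")


# A star: one hub linked to three peripheral nodes.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
scores = eigenvector_centrality(star)
print(round(scores["hub"] / scores["a"], 4))  # 1.7321 (the hub scores sqrt(3) times a leaf)
```

    Adding the identity shifts every eigenvalue by one without changing eigenvectors, so the ranking of nodes is the same as for the plain adjacency matrix.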

    Exploiting CpG Hypermutability to Identify Phenotypically Significant Variation Within Human Protein-Coding Genes

    The CpG dinucleotide is disproportionately represented in human genetic variation owing to the hypermutability of 5-methylcytosine (5mC). We exploit this hypermutability and a novel codon substitution model to identify candidate functionally important exonic nucleotides. Population genetic theory suggests that codon positions with high cross-species CpG frequency reflect stronger purifying selection. Using a phylogeny-based maximum likelihood inference framework, we applied codon substitution models with context-dependent parameters to measure the mutagenic and selective processes affecting CpG dinucleotides within exonic sequence. The suitability of these models was validated on >2,000 protein-coding genes from a naturally occurring biological control: four yeast species that do not methylate their DNA. As expected, our analyses of yeast revealed no evidence of an elevated CpG transition rate or of substitution suppression affecting CpG-containing codons. Our analyses of >12,000 protein-coding genes from four primate lineages confirm the systemic influence of 5mC hypermutability on the divergence of these genes. After adjusting for confounding influences of mutation and the properties of the encoded amino acids, we confirmed that CpG-containing codons are under greater purifying selection in primates. Genes with significant evidence of enhanced suppression of nonsynonymous CpG changes were also significantly enriched in Online Mendelian Inheritance in Man (OMIM). We developed a method for ranking candidate phenotypically influential CpG positions in human genes. Applying this method indicates that, of the ∼1 million exonic CpG dinucleotides in humans, ∼20% are strong candidates for both hypermutability and disease association.
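    Context-dependent codon models need to know where CpG dinucleotides fall relative to codon boundaries, including CpGs that span two adjacent codons. A minimal sketch for locating them in an in-frame coding sequence (frame assumed to start at index 0) follows; the function names are hypothetical, and it only annotates CpG contexts, it does not fit the substitution models described above.

```python
def cpg_sites(cds):
    """List CpG dinucleotides in an in-frame coding sequence.

    Returns (index of C, codon position of C, codon position of G),
    with codon positions numbered 1-3.
    """
    cds = cds.upper()
    sites = []
    for i in range(len(cds) - 1):
        if cds[i] == "C" and cds[i + 1] == "G":
            sites.append((i, i % 3 + 1, (i + 1) % 3 + 1))
    return sites


def cpg_codons(cds):
    """Indices (0-based) of codons containing either nucleotide of a CpG."""
    codons = set()
    for i, _, _ in cpg_sites(cds):
        codons.add(i // 3)        # codon holding the C
        codons.add((i + 1) // 3)  # codon holding the G (differs when CpG spans codons)
    return sorted(codons)


print(cpg_sites("ATGCGACCG"))   # [(3, 1, 2), (7, 2, 3)]
print(cpg_codons("ATGCGACCG"))  # [1, 2]
```

    A site reported as (i, 3, 1) spans a codon boundary, so a C-to-T transition there and a G-to-A transition at i + 1 would affect two different codons.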

    Accounting For Alignment Uncertainty in Phylogenomics

    Uncertainty in multiple sequence alignments has a large impact on phylogenetic analyses. Little has been done to evaluate the quality of individual positions in protein sequence alignments, even though it directly affects the accuracy of phylogenetic trees. Here we describe ZORRO, a probabilistic masking program that accounts for alignment uncertainty by assigning a confidence score to each alignment position. Using the BAliBASE database and simulation studies, we demonstrate that masking with ZORRO significantly reduces alignment uncertainty and improves tree accuracy.
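    Once per-column confidence scores are available (from ZORRO or a similar tool), masking reduces to dropping columns below a cutoff. A minimal sketch, assuming one numeric score per column and a user-chosen threshold; the function name and threshold are hypothetical, and ZORRO's own score scale and recommended cutoffs are not described above.

```python
def mask_columns(alignment, scores, threshold):
    """Keep only alignment columns whose confidence score is >= threshold."""
    if any(len(row) != len(scores) for row in alignment):
        raise ValueError("one score per alignment column is required")
    keep = [i for i, score in enumerate(scores) if score >= threshold]
    return ["".join(row[i] for i in keep) for row in alignment]


print(mask_columns(["ACGT", "A-GT"], [0.9, 0.2, 0.8, 0.5], 0.5))  # ['AGT', 'AGT']
```

    Removing whole columns keeps all sequences the same length, so the masked alignment can be passed directly to tree-building software.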

    Alignment-Free Phylogenetic Reconstruction

    14th Annual International Conference, RECOMB 2010, Lisbon, Portugal, April 25-28, 2010. Proceedings
    We introduce the first polynomial-time phylogenetic reconstruction algorithm under a model of sequence evolution that allows insertions and deletions (indels). Under appropriate assumptions, our algorithm requires sequence lengths that grow polynomially in the number of leaf taxa. Our techniques are distance-based and largely bypass the problem of multiple alignment.