    Identifying and Seeing beyond Multiple Sequence Alignment Errors Using Intra-Molecular Protein Covariation

    BACKGROUND: There is currently no way to verify the quality of a multiple sequence alignment that is independent of the assumptions used to build it. Sequence alignments are typically evaluated by a number of established criteria: sequence conservation, the number of aligned residues, the frequency of gaps, and the likely correctness of gap placement. Covariation analysis is used to find putatively important residue pairs in a sequence alignment. Different alignments of the same protein family give different results, demonstrating that covariation depends on the quality of the sequence alignment. We thus hypothesized that current criteria are insufficient to build alignments for use with covariation analyses. METHODOLOGY/PRINCIPAL FINDINGS: We show that current criteria are insufficient to build alignments for use with covariation analyses, as systematic sequence alignment errors are present even in hand-curated structure-based alignment datasets like those from the Conserved Domain Database. We show that current non-parametric covariation statistics are sensitive to sequence misalignments and that this sensitivity can be used to identify systematic alignment errors. We demonstrate that removing alignment errors due to 1) improper structure alignment, 2) the presence of paralogous sequences, and 3) partial or otherwise erroneous sequences improves contact prediction by covariation analysis. Finally, we describe two non-parametric covariation statistics that are less sensitive to sequence alignment errors than those described previously in the literature. CONCLUSIONS/SIGNIFICANCE: Protein alignments with errors lead to false positive and false negative conclusions (incorrect assignment of covariation and conservation, respectively). Covariation analysis can provide a verification step, independent of traditional criteria, to identify systematic misalignments in protein alignments.
Two non-parametric statistics are shown to be somewhat insensitive to misalignment errors because they emphasize pairwise covariation over group covariation, providing increased confidence in contact prediction when analyzing alignments with erroneous regions.
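To make the idea of a pairwise, non-parametric covariation statistic concrete, here is a minimal Python sketch of an observed-minus-expected-squared score in the spirit of OMES. It is not necessarily either of the two statistics the authors describe; the function names `omes` and `all_pair_scores`, and the choice to skip sequences with a gap in either column, are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

GAP = "-"


def omes(col_i, col_j, gap=GAP):
    """Observed-minus-expected-squared covariation between two alignment columns.

    Assumption: rows with a gap in either column are ignored.
    """
    # Keep only sequences that have a residue (not a gap) in both columns.
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != gap and b != gap]
    n = len(pairs)
    if n == 0:
        return 0.0
    count_i = Counter(a for a, _ in pairs)
    count_j = Counter(b for _, b in pairs)
    observed = Counter(pairs)  # missing residue pairs count as 0
    score = 0.0
    # Sum squared deviations of observed pair counts from the counts
    # expected if the two columns varied independently.
    for a in count_i:
        for b in count_j:
            expected = count_i[a] * count_j[b] / n
            score += (observed[(a, b)] - expected) ** 2
    return score / n


def all_pair_scores(alignment):
    """Score every pair of columns in an alignment given as equal-length strings."""
    columns = list(zip(*alignment))
    return {
        (i, j): omes(columns[i], columns[j])
        for i, j in combinations(range(len(columns)), 2)
    }


scores = all_pair_scores(["AKL", "AKL", "GEL", "GEL"])
```

Because a misaligned segment shifts residues together across several columns, it can inflate scores like this one between positions that are not in contact, which is the kind of sensitivity the abstract describes.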

    Administration of defined microbiota is protective in a murine Salmonella infection model.

    Salmonella typhimurium is a major cause of diarrhea and causes significant morbidity and mortality worldwide, and perturbations of the gut microbiota are known to increase susceptibility to enteric infections. The purpose of this study was to investigate whether a Microbial Ecosystem Therapeutic (MET-1) consisting of 33 bacterial strains, isolated from human stool and previously used to cure patients with recurrent Clostridium difficile infection, could also protect against S. typhimurium disease. C57BL/6 mice were pretreated with streptomycin before receiving MET-1 or a control, and were then gavaged with S. typhimurium. Weight loss, serum cytokine levels, and S. typhimurium splenic translocation were measured. NF-κB nuclear staining, neutrophil accumulation, and localization of tight junction proteins (claudin-1, ZO-1) were visualized by immunofluorescence. Infected mice receiving MET-1 lost less weight and had lower serum cytokine levels, reduced NF-κB nuclear staining, and decreased neutrophil infiltration in the cecum. MET-1 also preserved cecal tight junction protein expression and reduced S. typhimurium translocation to the spleen. Notably, MET-1 did not decrease Salmonella CFUs in the intestine. MET-1 may attenuate systemic infection by preserving tight junctions, thereby hindering S. typhimurium from gaining access to the systemic circulation. We conclude that MET-1 may protect against enteric infections other than C. difficile infection.

    Unrelated Helpers in a Primitively Eusocial Wasp: Is Helping Tailored Towards Direct Fitness?

    The paper wasp Polistes dominulus is unique among the social insects in that nearly one-third of co-foundresses are completely unrelated to the dominant individual whose offspring they help to rear, and yet reproductive skew is high. These unrelated subordinates stand to gain direct fitness through nest inheritance, raising the question of whether their behaviour is adaptively tailored towards maximizing inheritance prospects. Unusually, in this species, a wealth of theory and empirical data allows us to predict how unrelated subordinates should behave. Based on these predictions, here we compare helping in subordinates that are unrelated or related to the dominant wasp across an extensive range of field-based behavioural contexts. We find no differences in foraging effort, defence behaviour, aggression or inheritance rank between unrelated helpers and their related counterparts. Our study provides no evidence, across a number of behavioural scenarios, that the behaviour of unrelated subordinates is adaptively modified to promote direct fitness interests.

    Protein Sequence Alignment Analysis by Local Covariation: Coevolution Statistics Detect Benchmark Alignment Errors

    The use of sequence alignments to understand protein families is ubiquitous in molecular biology. High-quality alignments are difficult to build, and protein alignment remains one of the largest open problems in computational biology. Misalignments can lead to inferential errors about protein structure, folding, function, phylogeny, and residue importance. Identifying alignment errors is difficult because alignments are built and validated on the same primary criterion: sequence conservation. Local covariation identifies systematic misalignments and is independent of conservation. We demonstrate an alignment curation tool, LoCo, that integrates local covariation scores with the Jalview alignment editor. Using LoCo, we illustrate how local covariation is capable of identifying alignment errors due to the reduction of positional independence in the region of misalignment. We highlight three alignments from the benchmark database BAliBASE 3 that contain regions of high local covariation, and investigate their causes to illustrate these types of scenarios. Two alignments contain sequential and structural shifts that cause elevated local covariation. Realignment of these misaligned segments reduces local covariation; these alternative alignments are supported by structural evidence. We also show that local covariation identifies active site residues in a validated alignment of paralogous structures. LoCo is available at https://sourceforge.net/projects/locoprotein/files
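The abstract does not give LoCo's exact scoring formula. The Python sketch below assumes local covariation can be approximated as the average mutual information between each alignment column and its neighbours within a fixed window; `local_covariation`, `mutual_information`, and the `window` parameter are hypothetical names, not part of LoCo's or Jalview's interface.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j, gap="-"):
    """Mutual information (natural log) between two alignment columns, skipping gapped rows."""
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != gap and b != gap]
    n = len(pairs)
    if n == 0:
        return 0.0
    count_i = Counter(a for a, _ in pairs)
    count_j = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in Counter(pairs).items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
    return mi


def local_covariation(alignment, window=3):
    """Assumed local score: average MI between each column and columns within +/- window."""
    columns = list(zip(*alignment))
    length = len(columns)
    scores = []
    for i in range(length):
        neighbours = [j for j in range(max(0, i - window), min(length, i + window + 1)) if j != i]
        if not neighbours:
            scores.append(0.0)
            continue
        total = sum(mutual_information(columns[i], columns[j]) for j in neighbours)
        scores.append(total / len(neighbours))
    return scores


scores = local_covariation(["AKL", "GEL"], window=1)
```

Under this assumption, a run of consecutive columns scoring well above their surroundings would be a candidate region for manual inspection in an alignment editor.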

    Two-Stage Clustering (TSC): A Pipeline for Selecting Operational Taxonomic Units for the High-Throughput Sequencing of PCR Amplicons

    Clustering 16S/18S rRNA amplicon sequences into operational taxonomic units (OTUs) is a critical step in the bioinformatic analysis of microbial diversity. Here, we report a pipeline for selecting OTUs with a relatively low computational demand and a high degree of accuracy. This pipeline is referred to as two-stage clustering (TSC) because it divides tags into two groups according to their abundance and clusters them sequentially. The more abundant group is clustered using a hierarchical algorithm similar to that in ESPRIT, which has a high degree of accuracy but is computationally costly for large datasets. The rarer group, which includes the majority of tags, is then heuristically clustered to improve efficiency. To further improve computational efficiency and accuracy, two preclustering steps are implemented. To maintain clustering accuracy, all tags are grouped into OTUs based on their pairwise Needleman-Wunsch distances. This method not only improved computational efficiency but also mitigated spurious OTU estimation from ‘noise’ sequences. In addition, OTUs clustered using TSC performed comparably to or better than those from existing OTU selection methods in beta-diversity comparisons. This study suggests that the abundance distribution of sequencing datasets is a useful property for improving the computational efficiency and clustering accuracy of high-throughput sequencing of PCR amplicons. The software and user guide are freely available at http://hwzhoulab.smu.edu.cn/paperdata/
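The two-stage idea can be sketched in Python as follows. Several details are assumptions not stated in the abstract: the distance is 1 minus identity over a Needleman-Wunsch alignment with simple unit scores, stage 1 uses naive complete-linkage agglomeration as a stand-in for the ESPRIT-like algorithm, stage 2 greedily attaches each rare tag to the nearest OTU representative, and the two preclustering steps are omitted. Function names, cutoffs, and the toy tags are hypothetical.

```python
from typing import Dict, List


def nw_distance(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """1 - (identical positions / alignment length) for a Needleman-Wunsch global alignment."""
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic-programming score matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal alignment, counting identities and alignment columns.
    i, j, identities, length = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            identities += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        length += 1
    return 1.0 - identities / length if length else 0.0


def two_stage_cluster(tags: Dict[str, int], abundance_cutoff: int,
                      distance_cutoff: float) -> List[List[str]]:
    """Cluster abundant tags hierarchically, then assign rare tags greedily."""
    by_abundance = sorted(tags, key=lambda t: -tags[t])  # most abundant first
    abundant = [t for t in by_abundance if tags[t] >= abundance_cutoff]
    rare = [t for t in by_abundance if tags[t] < abundance_cutoff]

    # Stage 1: complete-linkage agglomerative clustering of the abundant tags.
    clusters = [[t] for t in abundant]
    while True:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = max(nw_distance(p, q) for p in clusters[x] for q in clusters[y])
                if d <= distance_cutoff and (best is None or d < best[0]):
                    best = (d, x, y)
        if best is None:
            break
        _, x, y = best
        clusters[x].extend(clusters.pop(y))  # y > x, so index x is unaffected

    # Stage 2: attach each rare tag to the closest OTU representative
    # (its first, i.e. most abundant, member) or start a new OTU.
    for tag in rare:
        distances = [nw_distance(tag, otu[0]) for otu in clusters]
        if distances and min(distances) <= distance_cutoff:
            clusters[distances.index(min(distances))].append(tag)
        else:
            clusters.append([tag])
    return clusters


tags = {"ACGTACGT": 50, "ACGTACGA": 40, "TTTTGGGG": 30,
        "ACGTACGG": 1, "TTTTGGGC": 1, "CCCCCCCC": 1}
otus = two_stage_cluster(tags, abundance_cutoff=10, distance_cutoff=0.15)
```

The expensive all-pairs step only touches the abundant tags here, while each rare tag is compared only against current OTU representatives, which mirrors the efficiency argument in the abstract.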

    A Coevolutionary Residue Network at the Site of a Functionally Important Conformational Change in a Phosphohexomutase Enzyme Family

    Coevolution analyses identify residues that co-vary with each other during evolution, revealing sequence relationships unobservable from traditional multiple sequence alignments. Here we describe a coevolutionary analysis of phosphomannomutase/phosphoglucomutase (PMM/PGM), a widespread and diverse enzyme family involved in carbohydrate biosynthesis. Mutual information and graph theory were used to identify a highly significant network of highly connected residues. An examination of the most tightly connected regions of the coevolutionary network reveals that most of the involved residues are localized near an interdomain interface of this enzyme, known to be the site of a functionally important conformational change. The roles of four interface residues found in this network were examined via site-directed mutagenesis and kinetic characterization. For three of these residues, mutation to alanine reduces enzyme specificity to ∼10% or less of wild-type, while the fourth retains ∼45% of wild-type activity. An additional mutant of an interface residue that is not densely connected in the coevolutionary network was also characterized and shows no change in activity relative to wild-type enzyme. The results of these studies are interpreted in the context of structural and functional data on PMM/PGM. Together, they demonstrate that a network of coevolving residues links the highly conserved active site with the interdomain conformational change necessary for the multi-step catalytic reaction. This work adds to our understanding of the functional roles of coevolving residue networks and has implications for the definition of catalytically important residues.
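One way to pick out the most tightly connected region of a coevolutionary network is a k-core decomposition, sketched below in Python. The abstract does not say which graph measure was used, so k-core is an assumption; the pairwise scores stand in for mutual information values that would be computed from a PMM/PGM alignment, and the residue indices, scores, and threshold are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def build_network(scores: Dict[Tuple[int, int], float], threshold: float) -> Dict[int, Set[int]]:
    """Undirected graph with an edge wherever a pairwise coevolution score passes the threshold."""
    edges: Dict[int, Set[int]] = {}
    for (i, j), score in scores.items():
        edges.setdefault(i, set())
        edges.setdefault(j, set())
        if score >= threshold:
            edges[i].add(j)
            edges[j].add(i)
    return edges


def k_core(edges: Dict[int, Set[int]], k: int) -> Set[int]:
    """Nodes of the k-core: repeatedly drop nodes with fewer than k remaining neighbours."""
    adj = {node: set(nbrs) for node, nbrs in edges.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for node in list(adj):
            if len(adj[node]) < k:
                for nbr in adj.pop(node):
                    adj[nbr].discard(node)
                changed = True
    return set(adj)


def degree_ranking(edges: Dict[int, Set[int]]) -> List[int]:
    """Residues ordered from most to least connected (ties broken by index)."""
    return sorted(edges, key=lambda node: (-len(edges[node]), node))


# Hypothetical MI-like scores between alignment positions.
scores = {
    (10, 11): 0.9, (10, 12): 0.9, (10, 13): 0.9,
    (11, 12): 0.9, (11, 13): 0.9, (12, 13): 0.9,
    (10, 14): 0.8, (14, 15): 0.1,
}
network = build_network(scores, threshold=0.5)
core = k_core(network, k=3)
```

In this toy network, position 14 has a strong score but only one partner, so it drops out of the 3-core, loosely analogous to the interface residue that was "not densely connected" in the study.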

    BioPhysConnectoR: Connecting Sequence Information and Biophysical Models

    Background: One of the most challenging aspects of biomolecular systems is understanding coevolution within and among molecules. A complete theoretical picture of the selective advantage, and thus a functional annotation, of (co-)mutations is still lacking. Using sequence-based and information-theory-inspired methods, we can identify coevolving residues in proteins without understanding the underlying biophysical properties giving rise to such coevolutionary dynamics. Detailed (atomistic) simulations are prohibitively expensive. At the same time, reduced molecular models are an efficient way to determine the reduced dynamics around the native state. Combining sequence-based approaches with such reduced models is therefore a promising way to annotate evolutionary sequence changes. Results: With the R package BioPhysConnectoR, we provide a framework to connect the information-theoretical domain of biomolecular sequences to biophysical properties of the encoded molecules, derived from reduced molecular models. To this end, we have integrated several fragmented ideas into a single package ready to be used in connection with additional statistical routines in R. Additionally, the package leverages modern multi-core architectures to reduce turnaround times in evolutionary and biomolecular design studies. Our package is a first step toward the above-mentioned annotation of coevolution by reduced dynamics around the native state of proteins. Conclusions: BioPhysConnectoR is implemented as an R package and distributed under the GPL 2 license. It allows for efficient and perfectly parallelized functional annotation of coevolution found at the sequence level.
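The abstract does not name the reduced model, but elastic network models are a common choice for dynamics around the native state. As an assumption-labeled sketch (Python with NumPy, not BioPhysConnectoR's R interface), a Gaussian network model builds a Kirchhoff (contact Laplacian) matrix from Cα coordinates and takes its pseudo-inverse, whose diagonal is proportional to residue fluctuations. The function name and the 7 Å contact cutoff are hypothetical choices.

```python
import numpy as np


def gnm_fluctuations(coords, cutoff=7.0):
    """Gaussian network model: relative fluctuations and covariance from C-alpha coordinates.

    Values are returned up to the constant factor kT / gamma (spring constant),
    which is omitted here.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between all residues.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Residues within the cutoff (excluding self-pairs) are joined by springs.
    contact = (dist <= cutoff) & ~np.eye(len(coords), dtype=bool)
    # Kirchhoff matrix: -1 for each contact, node degree on the diagonal.
    kirchhoff = -contact.astype(float)
    np.fill_diagonal(kirchhoff, contact.sum(axis=1))
    # The Kirchhoff matrix is singular (rigid-body mode), so use the pseudo-inverse.
    covariance = np.linalg.pinv(kirchhoff)
    return np.diag(covariance), covariance


# Toy three-residue chain with 3.8 A spacing; the chain ends are not in contact.
fluct, cov = gnm_fluctuations([(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0)], cutoff=7.0)
```

The end residues, having fewer contacts, fluctuate more than the middle one. Profiles of this kind are what could, in principle, be set against sequence-level coevolution scores.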

    Systematic Dissection and Trajectory-Scanning Mutagenesis of the Molecular Interface That Ensures Specificity of Two-Component Signaling Pathways

    Two-component signal transduction systems enable bacteria to sense and respond to a wide range of environmental stimuli. Sensor histidine kinases transmit signals to their cognate response regulators via phosphorylation. The faithful transmission of information through two-component pathways and the avoidance of unwanted cross-talk require exquisite specificity of histidine kinase-response regulator interactions to ensure that cells mount the appropriate response to external signals. To identify putative specificity-determining residues, we have analyzed amino acid coevolution in two-component proteins and identified a set of residues that can be used to rationally rewire a model signaling pathway, EnvZ-OmpR. To explore how a relatively small set of residues can dictate partner selectivity, we combined alanine-scanning mutagenesis with an approach we call trajectory-scanning mutagenesis, in which all mutational intermediates between the specificity residues of EnvZ and another kinase, RstB, were systematically examined for phosphotransfer specificity. The same approach was used for the response regulators OmpR and RstA. Collectively, the results begin to reveal the molecular mechanism by which a small set of amino acids enables an individual kinase to discriminate amongst a large set of highly related response regulators and vice versa. Our results also suggest that the mutational trajectories taken by two-component signaling proteins following gene or pathway duplication may be constrained and subject to differential selective pressures. Only some trajectories allow both the maintenance of phosphotransfer and the avoidance of unwanted cross-talk.
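The combinatorics behind trajectory scanning can be sketched in Python: if two proteins differ at k specificity positions, there are 2^k variants that take each position from one protein or the other. The residue strings below are hypothetical placeholders, not the actual EnvZ or RstB specificity residues, and the function names are illustrative.

```python
from itertools import product


def trajectory_intermediates(start, end):
    """All sequences that take each position from either `start` or `end`.

    With k differing positions this returns 2**k variants, including both endpoints.
    """
    if len(start) != len(end):
        raise ValueError("start and end must cover the same positions")
    # Positions that already match offer one choice; differing positions offer two.
    choices = [(a,) if a == b else (a, b) for a, b in zip(start, end)]
    return ["".join(combo) for combo in product(*choices)]


def substitutions_from(start, variant):
    """Number of positions at which `variant` has moved away from `start`."""
    return sum(a != b for a, b in zip(start, variant))


# Hypothetical 4-residue specificity "signatures" (not the real EnvZ/RstB residues).
variants = trajectory_intermediates("TAVE", "RSVE")
for v in sorted(variants, key=lambda v: substitutions_from("TAVE", v)):
    print(substitutions_from("TAVE", v), v)
```

Each route from one endpoint to the other through single substitutions is one trajectory; with k differing positions there are k! such routes, all passing through this set of 2^k variants, so testing every intermediate covers every possible route.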

    Slip-Sliding Away: Serial Changes and Homoplasy in Repeat Number in the Drosophila yakuba Homolog of Human Cancer Susceptibility Gene BRCA2

    Several recent studies have examined the function and evolution of a Drosophila homolog of the human breast cancer susceptibility gene BRCA2, named dmbrca2. We previously identified what appeared to be a recent expansion in the RAD51-binding BRC-repeat array in the ancestor of Drosophila yakuba. In this study, we examine patterns of variation and evolution of the dmbrca2 BRC-repeat array within D. yakuba and its close relatives. We develop a model of how unequal crossing over may have produced the expanded form, but we also observe short repeat forms, typical of other species in the D. melanogaster group, segregating within D. yakuba and D. santomea. These short forms do not appear to be identical-by-descent, suggesting that the history of dmbrca2 in the D. melanogaster subgroup has involved repeat unit contractions resulting in homoplasious forms. We conclude that the evolutionary history of dmbrca2 in D. yakuba, and perhaps in other Drosophila species, may be more complicated than can be inferred from the single published genome sequence for each species.

    P-Element Homing Is Facilitated by engrailed Polycomb-Group Response Elements in Drosophila melanogaster

    P-element vectors are commonly used to make transgenic Drosophila and generally insert into the genome in a nonselective manner. However, when specific fragments of regulatory DNA from a few Drosophila genes are incorporated into P-transposons, they cause the vectors to be inserted near the gene from which the DNA fragment was derived. This is called P-element homing. We mapped the minimal DNA fragment that could mediate homing to the engrailed/invected region of the genome. A 1.6 kb fragment of engrailed regulatory DNA that contains two Polycomb-group response elements (PREs) was sufficient for homing. We made flies that contain a 1.5 kb deletion of engrailed DNA (enΔ1.5) in situ, including the PREs and the majority of the fragment that mediates homing. Remarkably, homing still occurs onto the enΔ1.5 chromosome. In addition to homing to en, P[en] inserts near Polycomb group target genes at an increased frequency compared to P[EPgy2], a vector used to generate 18,214 insertions for the Drosophila gene disruption project. We suggest that homing is mediated by interactions between multiple proteins bound to the homing fragment and proteins bound to multiple areas of the engrailed/invected chromatin domain. Chromatin structure may also play a role in homing.