7,464 research outputs found

    Pairwise alignment incorporating dipeptide covariation

    Full text link
    Motivation: Standard algorithms for pairwise protein sequence alignment make the simplifying assumption that amino acid substitutions at neighboring sites are uncorrelated. This assumption allows implementation of fast algorithms for pairwise sequence alignment, but it ignores information that could conceivably increase the power of remote homolog detection. We examine the validity of this assumption by constructing extended substitution matrixes that encapsulate the observed correlations between neighboring sites, by developing an efficient and rigorous algorithm for pairwise protein sequence alignment that incorporates these local substitution correlations, and by assessing the ability of this algorithm to detect remote homologies. Results: Our analysis indicates that local correlations between substitutions are not strong on the average. Furthermore, incorporating local substitution correlations into pairwise alignment did not lead to a statistically significant improvement in remote homology detection. Therefore, the standard assumption that individual residues within protein sequences evolve independently of neighboring positions appears to be an efficient and appropriate approximation

    CATHEDRAL: A Fast and Effective Algorithm to Predict Folds and Domain Boundaries from Multidomain Protein Structures

    Get PDF
    We present CATHEDRAL, an iterative protocol for determining the location of previously observed protein folds in novel multidomain protein structures. CATHEDRAL builds on the features of a fast secondary-structure–based method (using graph theory) to locate known folds within a multidomain context and a residue-based, double-dynamic programming algorithm, which is used to align members of the target fold groups against the query protein structure to identify the closest relative and assign domain boundaries. To increase the fidelity of the assignments, a support vector machine is used to provide an optimal scoring scheme. Once a domain is verified, it is excised, and the search protocol is repeated in an iterative fashion until all recognisable domains have been identified. We have performed an initial benchmark of CATHEDRAL against other publicly available structure comparison methods using a consensus dataset of domains derived from the CATH and SCOP domain classifications. CATHEDRAL shows superior performance in fold recognition and alignment accuracy when compared with many equivalent methods. If a novel multidomain structure contains a known fold, CATHEDRAL will locate it in 90% of cases, with <1% false positives. For nearly 80% of assigned domains in a manually validated test set, the boundaries were correctly delineated within a tolerance of ten residues. For the remaining cases, previously classified domains were very remotely related to the query chain so that embellishments to the core of the fold caused significant differences in domain sizes and manual refinement of the boundaries was necessary. To put this performance in context, a well-established sequence method based on hidden Markov models was only able to detect 65% of domains, with 33% of the subsequent boundaries assigned within ten residues. Since, on average, 50% of newly determined protein structures contain more than one domain unit, and typically 90% or more of these domains are already classified in CATH, CATHEDRAL will considerably facilitate the automation of protein structure classification

    The aspartic proteinase family of three Phytophthora species

    Get PDF
    Background - Phytophthora species are oomycete plant pathogens with such major social and economic impact that genome sequences have been determined for Phytophthora infestans, P. sojae and P. ramorum. Pepsin-like aspartic proteinases (APs) are produced in a wide variety of species (from bacteria to humans) and contain conserved motifs and landmark residues. APs fulfil critical roles in infectious organisms and their host cells. Annotation of Phytophthora APs would provide invaluable information for studies into their roles in the physiology of Phytophthora species and interactions with their hosts. Results - Genomes of Phytophthora infestans, P. sojae and P. ramorum contain 11-12 genes encoding APs. Nine of the original gene models in the P. infestans database and several in P. sojae and P. ramorum (three and four, respectively) were erroneous. Gene models were corrected on the basis of EST data, consistent positioning of introns between orthologues and conservation of hallmark motifs. Phylogenetic analysis resolved the Phytophthora APs into 5 clades. Of the 12 sub-families, several contained an unconventional architecture, as they either lacked a signal peptide or a propart region. Remarkably, almost all APs are predicted to be membrane-bound. Conclusions - One of the twelve Phytophthora APs is an unprecedented fusion protein with a putative G-protein coupled receptor as the C-terminal partner. The others appear to be related to well-documented enzymes from other species, including a vacuolar enzyme that is encoded in every fungal genome sequenced to date. Unexpectedly, however, the oomycetes were found to have both active and probably-inactive forms of an AP similar to vertebrate BACE, the enzyme responsible for initiating the processing cascade that generates the AĂź peptide central to Alzheimer's Disease. The oomycetes also encode enzymes similar to plasmepsin V, a membrane-bound AP that cleaves effector proteins of the malaria parasite Plasmodium falciparum during their translocation into the host red blood cell. Since the translocation of Phytophthora effector proteins is currently a topic of intense research activity, the identification in Phytophthora of potential functional homologues of plasmepsin V would appear worthy of investigation. Indeed, elucidation of the physiological roles of the APs identified here offers areas for future study. The significant revision of gene models and detailed annotation presented here should significantly facilitate experimental design

    Introduction to Protein Structure Prediction

    Get PDF
    This chapter gives a graceful introduction to problem of protein three- dimensional structure prediction, and focuses on how to make structural sense out of a single input sequence with unknown structure, the 'query' or 'target' sequence. We give an overview of the different classes of modelling techniques, notably template-based and template free. We also discuss the way in which structural predictions are validated within the global com- munity, and elaborate on the extent to which predicted structures may be trusted and used in practice. Finally we discuss whether the concept of a sin- gle fold pertaining to a protein structure is sustainable given recent insights. In short, we conclude that the general protein three-dimensional structure prediction problem remains unsolved, especially if we desire quantitative predictions. However, if a homologous structural template is available in the PDB model or reasonable to high accuracy may be generated

    Recognition of short functional motifs in protein sequences

    Get PDF
    The main goal of this study was to develop a method for computational de novo prediction of short linear motifs (SLiMs) in protein sequences that would provide advantages over existing solutions for the users. The users are typically biological laboratory researchers, who want to elucidate the function of a protein that is possibly mediated by a short motif. Such a process can be subcellular localization, secretion, post-translational modification or degradation of proteins. Conducting such studies only with experimental techniques is often associated with high costs and risks of uncertainty. Preliminary prediction of putative motifs with computational methods, them being fast and much less expensive, provides possibilities for generating hypotheses and therefore, more directed and efficient planning of experiments. To meet this goal, I have developed HH-MOTiF – a web-based tool for de novo discovery of SLiMs in a set of protein sequences. While working on the project, I have also detected patterns in sequence properties of certain SLiMs that make their de novo prediction easier. As some of these patterns are not yet described in the literature, I am sharing them in this thesis. While evaluating and comparing motif prediction results, I have identified conceptual gaps in theoretical studies, as well as existing practical solutions for comparing two sets of positional data annotating the same set of biological sequences. To close this gap and to be able to carry out in-depth performance analyses of HH-MOTiF in comparison to other predictors, I have developed a corresponding statistical method, SLALOM (for StatisticaL Analysis of Locus Overlap Method). It is currently available as a standalone command line tool
    • …
    corecore