    Efficient seeding techniques for protein similarity search

    We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We propose several different design methods and use them to construct several alphabets. We then perform an analysis of seeds built over those alphabets and compare them with the standard Blastp seeding method [2,3], as well as with the family of vector seeds proposed in [4]. While the formalism of subset seeds is less expressive (but less costly to implement) than the accumulative principle used in Blastp and vector seeds, our seeds show a similar or even better performance than Blastp on Bernoulli models of proteins compatible with the common BLOSUM62 matrix.
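
To make the subset-seed idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes each seed letter is a partition of the 20 amino acids, and that a seed hits an ungapped pair of words when, at every position, both residues fall in the same group of that position's letter. The letters `#`, `@` and `_`, the grouping behind `@`, and the function names are all hypothetical and chosen only for illustration; they are not alphabets from the paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def partition_to_map(groups):
    """Map each amino acid to the index of the group that contains it."""
    mapping = {}
    for idx, group in enumerate(groups):
        for aa in group:
            mapping[aa] = idx
    return mapping


# Hypothetical seed alphabet (illustrative only, not taken from the paper).
SEED_LETTERS = {
    # '#': every amino acid is its own group, so only identical residues match.
    "#": partition_to_map(list(AMINO_ACIDS)),
    # '@': an assumed coarse grouping of similar residues.
    "@": partition_to_map(
        ["LVIM", "FYW", "KRH", "DE", "NQ", "ST", "AG", "C", "P"]
    ),
    # '_': a single group, so any pair of residues matches (wildcard).
    "_": partition_to_map([AMINO_ACIDS]),
}


def seed_hits(seed, s, t):
    """True if the seed matches the ungapped alignment of words s and t."""
    if len(s) < len(seed) or len(t) < len(seed):
        return False
    return all(
        SEED_LETTERS[c][a] == SEED_LETTERS[c][b]
        for c, a, b in zip(seed, s, t)
    )


def seed_key(seed, word):
    """Hash key of a word under the seed: two words hit iff their keys agree."""
    return tuple(SEED_LETTERS[c][a] for c, a in zip(seed, word))
```

Because each letter is a partition, two words are matched by a seed exactly when their keys are equal, so hits can be found with a plain hash-table lookup. This is one plausible reading of why the formalism is described as less costly to implement than score-accumulating seeds.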

    Characterization of three 2-hydroxy-acid dehydrogenases in the context of a biotechnological approach to short-circuit photorespiration

    Photorespiration results from the incorporation of oxygen into ribulose-1,5-bisphosphate due to the failure of RuBisCO to properly discriminate between oxygen and carbon dioxide. This process lowers photosynthetic efficiency because CO2 and ammonia must be re-assimilated with the concomitant consumption of both ATP and reducing power. Two recent approaches, aimed at decreasing the detrimental effects of photorespiration by introducing novel metabolic pathways into plant chloroplasts, show great promise. The goal of this work was to identify and biochemically characterize a single-gene glycolate dehydrogenase for use in further improving the synthetic pathways. Forward and reverse genetics were used to identify three candidate genes in Arabidopsis thaliana: At5g06580, At4g36400 and At4g18360. The proteins encoded by these genes were expressed in Escherichia coli, purified and characterized. Moreover, in silico analysis and the analysis of loss-of-function mutants yielded insights into the significance of these novel enzymatic activities in plant metabolism. AtD-LDH, encoded by At5g06580, is a homodimeric FAD-binding flavoprotein that catalyzes the cytochrome c-dependent oxidation of substrates. The enzyme has high activity with D- and L-lactate, D-2-hydroxybutyrate and D-glycerate, but of these only D-lactate and D-2-hydroxybutyrate are bound with high affinity. Knock-out mutants show impaired growth on medium containing methylglyoxal and D-lactate. Together, the data indicate a role for AtD-LDH in the mitochondrial intermembrane space where it oxidizes D-lactate to pyruvate in the final step of methylglyoxal detoxification. AtD-2HGDH, encoded by At4g36400, is a homodimeric FAD-binding flavoprotein. The enzyme only has activity with D-2-hydroxyglutarate and uses a synthetic electron acceptor in vitro. Metabolic analysis of knock-out mutants reveals high accumulation of D-2-hydroxyglutarate in plants exposed to long periods of extended darkness, confirming that this is the in vivo substrate for the enzyme. Co-expression analysis reveals that AtD-2HGDH is co-expressed with enzymes and transporters participating in the breakdown of lipids, branched-chain amino acids and chlorophyll, all pathways that converge in the production of propionyl-CoA. Together, the data suggest a role for AtD-2HGDH in the mitochondrial matrix where it oxidizes D-2-hydroxyglutarate, most probably originating from propionyl-CoA metabolism, to 2-oxoglutarate, using an electron transfer flavoprotein as an electron acceptor. Finally, AtGOX3, encoded by At4g18360, is a peroxisomal (S)-2-hydroxy-acid oxidase with specificity towards glycolate, L-lactate and L-2-hydroxybutyrate. AtGOX3 is almost exclusively expressed in roots where it might participate in either the metabolism of L-lactate produced during hypoxia, or glycolate produced from glycolaldehyde. In this work, the identification and thorough characterization of three novel enzymatic activities in the model plant A. thaliana are described. Moreover, novel plant metabolic pathways in which these enzymes participate were discovered. The biochemical characterization of these enzymes indicated that they are not suited for use in pathways aimed at decreasing photorespiration and thus, the search for a single-gene glycolate dehydrogenase should continue.

    Improvements on Seeding Based Protein Sequence Similarity Search

    The primary goal of bioinformatics is to increase understanding of the biology of organisms. Computational, statistical, and mathematical theories and techniques have been developed for formal and practical problems that help achieve this goal. For the past three decades, the primary application of bioinformatics has been biological data analysis. DNA or protein sequence similarity search is perhaps the most common, yet vitally important, task in analyzing biological data. Sequence similarity search is the process of finding optimal sequence alignments. On the theoretical level, the problem of sequence similarity search is complex. On the application level, sequence similarity search against a biological database is one of the most basic tasks today. Using traditional quadratic time complexity solutions becomes a challenge due to the size of the database. Seeding (or filtration) based approaches, which trade sensitivity for speed, are a popular choice among those available. A seeding based approach usually has two main phases. The first phase is referred to as the hit generation, and the second phase is referred to as the hit extension. In this thesis, two improvements on the seeding based protein sequence similarity search are presented. First, for the hit generation, a new seeding idea, namely spaced k-mer neighbors, is presented. We present our effective algorithms to find a good set of spaced k-mer neighbors. Secondly, for the hit generation, a new method, namely HexFilter, is proposed to reduce the number of hit extensions while achieving better selectivity. We show our HexFilters with optimized configurations
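
The two phases described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis's method. Hit generation uses exact contiguous k-mers rather than spaced k-mer neighbors. Hit extension is a right-only ungapped extension with an X-drop cutoff. Scoring is a hypothetical +1/-1 match/mismatch scheme instead of a substitution matrix. No HexFilter-style filtering is attempted. All function and parameter names are invented for the example.

```python
from collections import defaultdict


def build_index(db, k):
    """Index every k-mer of the database sequence by its start positions."""
    index = defaultdict(list)
    for i in range(len(db) - k + 1):
        index[db[i:i + k]].append(i)
    return index


def generate_hits(query, index, k):
    """Phase 1 (hit generation): pairs (query_pos, db_pos) sharing a k-mer."""
    hits = []
    for q in range(len(query) - k + 1):
        for d in index.get(query[q:q + k], []):
            hits.append((q, d))
    return hits


def extend_hit(query, db, q, d, k, match=1, mismatch=-1, x_drop=2):
    """Phase 2 (hit extension): ungapped extension to the right of the seed.

    Stops once the running score falls more than x_drop below the best
    score seen so far, and returns that best score.
    """
    score = best = k * match
    i, j = q + k, d + k
    while i < len(query) and j < len(db):
        score += match if query[i] == db[j] else mismatch
        if score > best:
            best = score
        elif best - score > x_drop:
            break
        i += 1
        j += 1
    return best
```

For example, with `db = "MKVLAAGIT"`, `query = "KVLAAG"` and `k = 3`, the first phase reports four overlapping hits on the same diagonal, and each one triggers an extension. Reducing that kind of redundant extension work is the cost the abstract's second contribution targets.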

    On subset seeds for protein alignment

    We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We propose several different design methods and use them to construct several alphabets. We then perform a comparative analysis of seeds built over those alphabets and compare them with the standard BLASTP seeding method [2], [3], as well as with the family of vector seeds proposed in [4]. While the formalism of subset seeds is less expressive (but less costly to implement) than the cumulative principle used in BLASTP and vector seeds, our seeds show a similar or even better performance than BLASTP on Bernoulli models of proteins compatible with the common BLOSUM62 matrix. Finally, we perform a large-scale benchmarking of our seeds against several main databases of protein alignments. Here again, the results show a comparable or better performance of our seeds vs. BLASTP. Comment: IEEE/ACM Transactions on Computational Biology and Bioinformatics (2009)

    Two highly divergent alcohol dehydrogenases of melon exhibit fruit ripening-specific expression and distinct biochemical characteristics

    Alcohol dehydrogenases (ADH) participate in the biosynthetic pathway of aroma volatiles in fruit by interconverting aldehydes and alcohols and providing substrates for the formation of esters. Two highly divergent ADH genes (15% identity at the amino acid level) of Cantaloupe Charentais melon (Cucumis melo var. Cantalupensis) have been isolated. Cm-ADH1 belongs to the medium-chain zinc-binding type of ADHs and is highly similar to all ADH genes expressed in fruit isolated so far. Cm-ADH2 belongs to the short-chain type of ADHs. The two encoded proteins are enzymatically active upon expression in yeast. Cm-ADH1 has a strong preference for NADPH as a co-factor, whereas Cm-ADH2 preferentially uses NADH. Both Cm-ADH proteins are much more active as reductases, with Kms 10–20 times lower for the conversion of aldehydes to alcohols than for the dehydrogenation of alcohols to aldehydes. They both show a strong preference for aliphatic aldehydes, but Cm-ADH1 is capable of reducing branched aldehydes such as 3-methylbutyraldehyde, whereas Cm-ADH2 cannot. Both Cm-ADH genes are expressed specifically in fruit and up-regulated during ripening. Gene expression as well as total ADH activity are strongly inhibited in antisense ACC oxidase melons and in melon fruit treated with the ethylene antagonist 1-methylcyclopropene (1-MCP), indicating a positive regulation by ethylene. These data suggest that each of the Cm-ADH proteins plays a specific role in the regulation of aroma biosynthesis in melon fruit.

    Optimal neighborhood indexing for protein similarity search

    Background: Similarity inference, one of the main bioinformatics tasks, has to face an exponential growth of the biological data. A classical approach used to cope with this data flow involves heuristics with large seed indexes. In order to speed up this technique, the index can be enhanced by storing additional information to limit the number of random memory accesses. However, this improvement leads to a larger index that may become a bottleneck. In the case of protein similarity search, we propose to decrease the index size by reducing the amino acid alphabet. Results: The paper presents two main contributions. First, we show that an optimal neighborhood indexing combining an alphabet reduction and a longer neighborhood leads to a 35% reduction in the memory involved in the process, without sacrificing either the quality of results or the computational time. Second, our approach led us to develop a new kind of substitution score matrices and their associated e-value parameters. In contrast to usual matrices, these matrices are rectangular since they compare amino acid groups from different alphabets. We describe the method used for computing those matrices and we provide some typical examples that can be used in such comparisons. Supplementary data can be found on the website http://bioinfo.lifl.fr/reblosum. Conclusions: We propose a practical index size reduction of the neighborhood data that does not negatively affect the performance of large-scale search in protein sequences. Such an index can be used in any study involving large protein data. Moreover, rectangular substitution score matrices and their associated statistical parameters can have applications in any study involving an alphabet reduction.
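
As a rough illustration of why reducing the alphabet shrinks stored neighborhood data, the sketch below packs a neighborhood into an integer using a fixed number of bits per residue. It is not the paper's scheme. The 8-group reduction, the fixed-width encoding, and the function names are assumptions made for the example, and it does not attempt to reproduce the 35% figure reported above.

```python
import math

# Hypothetical 8-group reduction of the 20 amino acids (illustrative only).
GROUPS = ["LVIM", "FYW", "KRH", "DE", "NQ", "ST", "AGP", "C"]
CODE = {aa: g for g, group in enumerate(GROUPS) for aa in group}


def bits_per_residue(alphabet_size):
    """Bits needed for a fixed-width code over an alphabet of this size."""
    return max(1, math.ceil(math.log2(alphabet_size)))


def pack_neighborhood(residues):
    """Pack a neighborhood into one integer using the reduced alphabet."""
    bits = bits_per_residue(len(GROUPS))
    value = 0
    for aa in residues:
        value = (value << bits) | CODE[aa]
    return value
```

Under these assumptions a residue costs 3 bits instead of the 5 bits a fixed-width code over all 20 amino acids would need, so a longer neighborhood fits in the same space. The price is that residues in the same group become indistinguishable in the index.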