656 research outputs found
Distributed BLAST in a grid computing context
The Basic Local Alignment Search Tool (BLAST) is one of the best-known sequence comparison programs available in bioinformatics. It is used to compare query sequences to a set of target sequences, with the intention of finding similar sequences in the target set. Here, we present a distributed BLAST service which operates over a set of heterogeneous Grid resources and is made available through a Globus toolkit v.3 Grid service. This work has been carried out in the context of the BRIDGES project, a UK e-Science project aimed at providing a Grid-based environment for biomedical research. Input consisting of multiple query sequences is partitioned into sub-jobs on the basis of the number of idle compute nodes available, and these sub-jobs are then processed on those nodes in batches. To achieve this, we have implemented our own Java-based scheduler, which distributes sub-jobs across an array of resources utilizing a variety of local job scheduling systems.
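The partitioning step described in this abstract can be illustrated with a minimal sketch. It is not the BRIDGES scheduler (which the abstract describes as Java-based); the function name `partition_queries` and the near-equal batch sizing are assumptions made only for illustration.

```python
from typing import List


def partition_queries(queries: List[str], idle_nodes: int) -> List[List[str]]:
    """Split query sequences into at most `idle_nodes` near-equal batches.

    Hypothetical helper: the abstract only says that sub-jobs are sized
    from the number of idle compute nodes, not how the split is done.
    """
    if idle_nodes < 1:
        raise ValueError("need at least one idle node")
    n_jobs = min(idle_nodes, len(queries))
    if n_jobs == 0:
        return []
    # Spread the remainder over the first `extra` batches.
    base, extra = divmod(len(queries), n_jobs)
    batches = []
    start = 0
    for i in range(n_jobs):
        size = base + (1 if i < extra else 0)
        batches.append(queries[start:start + size])
        start += size
    return batches


if __name__ == "__main__":
    demo = [f"query_{i}" for i in range(10)]
    for job_id, batch in enumerate(partition_queries(demo, idle_nodes=3)):
        print(job_id, batch)
```

Each returned batch would correspond to one sub-job handed to a local job scheduling system; how sub-jobs are submitted and their results merged is outside this sketch.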
Simplified amino acid alphabets based on deviation of conditional probability from random background
The primitive data for deducing the Miyazawa-Jernigan contact energy or
BLOSUM score matrix consists of pair frequency counts. Each amino acid
corresponds to a conditional probability distribution. Based on the deviation
of this conditional probability from the random background, a scheme for reducing
the amino acid alphabet is proposed. An evident discrepancy is observed
between the reduced alphabets obtained from the raw
Miyazawa-Jernigan and BLOSUM residue pair counts. Taking the homologous
sequence database SCOP40 as a test set, we detect homology with the resulting
coarse-grained substitution matrices. The reduced alphabets are verified to
preserve well the information contained in the original 20-letter
alphabet. Comment: 9 pages, 3 figures
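As a rough illustration of the quantities named in this abstract, the sketch below derives a conditional distribution per residue from a pair-count table and measures its deviation from the background. The abstract does not state which deviation measure is used; the Kullback-Leibler divergence here is an assumption, and the three-letter count table is an invented placeholder, not data from the paper.

```python
import math
from typing import Dict

Counts = Dict[str, Dict[str, float]]


def conditional_probs(counts: Counts) -> Counts:
    """P(b | a): each row of the pair-count table normalised to sum to 1."""
    cond = {}
    for a, row in counts.items():
        total = sum(row.values())
        cond[a] = {b: n / total for b, n in row.items()}
    return cond


def background(counts: Counts) -> Dict[str, float]:
    """q(b): marginal frequency of b over the whole table (column sums)."""
    grand = sum(sum(row.values()) for row in counts.values())
    return {b: sum(row[b] for row in counts.values()) / grand for b in counts}


def deviation_from_background(counts: Counts) -> Dict[str, float]:
    """Assumed measure: KL divergence of P(. | a) from q for each letter a."""
    cond = conditional_probs(counts)
    q = background(counts)
    return {
        a: sum(p * math.log(p / q[b]) for b, p in cond[a].items() if p > 0)
        for a in cond
    }


if __name__ == "__main__":
    # Placeholder counts on a toy 3-letter alphabet (illustrative only).
    toy = {
        "A": {"A": 8, "B": 1, "C": 1},
        "B": {"A": 1, "B": 8, "C": 1},
        "C": {"A": 1, "B": 1, "C": 8},
    }
    print(deviation_from_background(toy))
```

Letters whose conditional distributions deviate from the background in similar ways would be candidates for merging into one reduced-alphabet group; the grouping rule itself is not specified in the abstract and is not sketched here.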
A eubacterial origin for the human tRNA nucleotidyltransferase?
tRNA CCA-termini are generated and maintained by tRNA nucleotidyltransferases. Together with poly(A) polymerases and other enzymes they belong to the nucleotidyltransferase superfamily. However, sequence alignments within this family cannot distinguish between CCA-adding enzymes and poly(A) polymerases. Furthermore, due to the lack of sequence information about animal CCA-adding enzymes, identification of the corresponding animal genes has not been possible so far. Therefore, we looked for the human homolog using the baker's yeast tRNA nucleotidyltransferase as a query sequence in a BLAST search. This revealed that the human gene transcript CGI-47 (#AF151805), deposited in GenBank, is likely to encode such an enzyme. To identify the nature of this protein, the cDNA of the transcript was cloned and the recombinant protein biochemically characterized, indicating that CGI-47 encodes a bona fide CCA-adding enzyme and not a poly(A) polymerase. This confirmed animal CCA-adding enzyme allowed us to identify putative homologs from other animals. Calculation of a neighbor-joining tree, using an alignment of several CCA-adding enzymes, revealed that the animal enzymes resemble eubacterial ones more closely than eukaryotic plant and fungal tRNA nucleotidyltransferases, suggesting that the animal nuclear cca genes might have been derived from the endosymbiotic progenitor of mitochondria and are therefore of eubacterial origin.
Efficient chaining of seeds in ordered trees
We consider here the problem of chaining seeds in ordered trees. Seeds are
mappings between two trees Q and T and a chain is a subset of non-overlapping
seeds that is consistent with respect to postfix order and ancestrality. This
problem is a natural extension of a similar problem for sequences, and has
applications in computational biology, such as mining a database of RNA
secondary structures. For the chaining problem with a set of m constant-size
seeds, we describe an algorithm with complexity O(m^2 log(m)) in time and O(m^2)
in space.
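The abstract presents tree chaining as an extension of a similar problem for sequences. The sketch below illustrates only that sequence version, with a plain quadratic dynamic program; it is not the tree algorithm of the paper. The seed representation (half-open, non-empty intervals on both sequences) and the choice of seed length as the score are assumptions.

```python
from typing import List, Tuple

# (q_start, q_end, t_start, t_end): half-open, non-empty intervals on the
# query and target sequences. This representation is an assumption.
Seed = Tuple[int, int, int, int]


def best_chain(seeds: List[Seed]) -> List[Seed]:
    """Return a maximum-weight chain of non-overlapping, co-linear seeds.

    Weight of a seed is its length on the query (qe - qs). Seed j may
    precede seed i only if j ends before i starts on both sequences.
    Runs in O(m^2) time and O(m) extra space for m seeds.
    """
    order = sorted(seeds)  # any valid predecessor sorts before its successor
    m = len(order)
    if m == 0:
        return []
    score = [0] * m
    prev = [-1] * m
    for i, (qs, qe, ts, te) in enumerate(order):
        score[i] = qe - qs
        for j in range(i):
            _, pqe, _, pte = order[j]
            if pqe <= qs and pte <= ts and score[j] + (qe - qs) > score[i]:
                score[i] = score[j] + (qe - qs)
                prev[i] = j
    # Trace back from the best-scoring chain end.
    i = max(range(m), key=score.__getitem__)
    chain = []
    while i != -1:
        chain.append(order[i])
        i = prev[i]
    return chain[::-1]


if __name__ == "__main__":
    demo = [(0, 3, 0, 3), (2, 5, 10, 13), (4, 8, 4, 8), (9, 11, 1, 3)]
    print(best_chain(demo))
```

In the tree setting the ordering constraint on positions is replaced by consistency with postfix order and ancestrality, which is what makes the problem harder than this one-dimensional case.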
SimSearch: A new variant of dynamic programming based on distance series for optimal and near-optimal similarity discovery in biological sequences
http://www.informatik.uni-trier.de/%7Eley/db/conf/iwpacbb/iwpacbb2008.html In this paper, we propose SimSearch, an algorithm implementing a new variant of dynamic programming based on distance series for optimal and near-optimal similarity discovery in biological sequences. The initial phase of SimSearch is devoted to filling in the binary similarity matrices by marking the distances between occurrences of the same symbol. The scoring scheme is then applied when analysing the maximal extension of the pattern. Employing bit parallelism to analyse the upper triangle of the global similarity matrix, the new methodology searches the sequence(s) for all exact and approximate patterns in regular or reverse order. The algorithm can be parameterized to work with larger seeds for near-optimal results. Performance tests show a significant efficiency improvement over traditional optimal methods based on dynamic programming. When the new algorithm's efficiency is compared against heuristic-based methods at equal sensitivity, the proposed algorithm remains acceptable. This work has been partially supported by PRODEP.
Generalized Buneman pruning for inferring the most parsimonious multi-state phylogeny
Accurate reconstruction of phylogenies remains a key challenge in
evolutionary biology. Most biologically plausible formulations of the problem
are formally NP-hard, with no known efficient solution. In practice, the
standard approaches are fast heuristic methods that are empirically known to work very
well in general, but can yield results arbitrarily far from optimal. Practical
exact methods, which yield exponential worst-case running times but generally
much better times in practice, provide an important alternative. We report
progress in this direction by introducing a provably optimal method for the
weighted multi-state maximum parsimony phylogeny problem. The method is based
on generalizing the notion of the Buneman graph, a construction key to
efficient exact methods for binary sequences, so as to apply to sequences with
arbitrary finite numbers of states with arbitrary state transition weights. We
implement an integer linear programming (ILP) method for the multi-state
problem using this generalized Buneman graph and demonstrate that the resulting
method is able to solve data sets that are intractable by prior exact methods
in run times comparable with popular heuristics. Our work provides the first
method for provably optimal maximum parsimony phylogeny inference that is
practical for multi-state data sets of more than a few characters. Comment: 15 pages
Ty1 insertions in intergenic regions of the genome of Saccharomyces cerevisiae transcribed by RNA polymerase III have no detectable selective effect
The retrotransposon Ty1 of Saccharomyces cerevisiae inserts preferentially into intergenic regions in the vicinity of RNA polymerase III-transcribed genes. It has been suggested that this preference has evolved to minimize the deleterious effects of element transposition on the host genome, and thus to favor their evolutionary survival. This presupposes that such insertions have no selective effect. However, there has been no direct test of this hypothesis. Here we construct a series of strains containing single Ty1 insertions in the vicinity of tRNA genes, or in the rDNA cluster on chromosome XII, which are otherwise isogenic to strain 337, containing zero Ty1 elements. Competition experiments between 337 and the strains containing single Ty1 insertions revealed that in all cases, the Ty1 insertions have no selective effect in rich medium. These results are thus consistent with the hypothesis that the insertion site preference of Ty1 elements has evolved to minimize the deleterious effects of transposition on the host genome. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/72266/1/S1567-1356_03_00199-5.pd
Functional characterization of a melon alcohol acyl-transferase gene family involved in the biosynthesis of ester volatiles. Identification of the crucial role of a threonine residue for enzyme activity
Volatile esters, a major class of compounds contributing to the aroma of many fruits, are synthesized by
alcohol acyl-transferases (AAT). We demonstrate here that, in Charentais melon (Cucumis melo var.
cantalupensis), AAT are encoded by a gene family of at least four members with amino acid identity ranging
from 84% (Cm-AAT1/Cm-AAT2) and 58% (Cm-AAT1/Cm-AAT3) to only 22% (Cm-AAT1/Cm-AAT4).
All encoded proteins, except Cm-AAT2, were enzymatically active upon expression in yeast and show
differential substrate preferences. Cm-AAT1 protein produces a wide range of short and long-chain acyl
esters but has strong preference for the formation of E-2-hexenyl acetate and hexyl hexanoate. Cm-AAT3
also accepts a wide range of substrates but with very strong preference for producing benzyl acetate.
Cm-AAT4 is almost exclusively devoted to the formation of acetates, with strong preference for cinnamoyl
acetate. Site directed mutagenesis demonstrated that the failure of Cm-AAT2 to produce volatile esters is
related to the presence of a 268-alanine residue instead of threonine as in all active AAT proteins. Mutating
268-A into 268-T of Cm-AAT2 restored enzyme activity, while mutating 268-T into 268-A abolished
activity of Cm-AAT1. Activities of all three proteins measured with the preferred substrates sharply increase
during fruit ripening. The expression of all Cm-AAT genes is up-regulated during ripening and inhibited in
antisense ACC oxidase melons and in fruit treated with the ethylene antagonist 1-methylcyclopropene
(1-MCP), indicating a positive regulation by ethylene. The data presented in this work suggest that the
multiplicity of AAT genes accounts for the great diversity of esters formed in melon.