
    RegExpBlasting (REB), a Regular Expression Blasting algorithm based on multiply aligned sequences

    Background: One of the most frequent uses of bioinformatics tools concerns functional characterization of a newly produced nucleotide sequence (a query sequence) by applying Blast or FASTA against a set of sequences (the subject sequences). However, in some specific contexts, it is useful to compare the query sequence against a cluster such as a MultiAlignment (MA). We present here the RegExpBlasting (REB) algorithm, which compares an unclassified sequence with a dataset of patterns defined by applying Regular Expression rules to an input dataset of MAs. The REB workflow consists of (i) the definition of a dataset of multialignments; (ii) the association of each MA with a pattern, defined by application of regular expression rules; and (iii) automatic characterization of a submitted biosequence according to the function of the sequences described by the pattern best matching the query sequence. Results: An application of this algorithm is used in the "characterize your sequence" tool available in the PPNEMA resource. PPNEMA is a resource of Ribosomal Cistron sequences from various species, grouped according to nematode genera. It allows the retrieval of plant nematode multialigned sequences or the classification of new nematode rDNA sequences by applying REB. The same algorithm also supports automatic updating of the PPNEMA database. The present paper gives examples of the use of REB within PPNEMA. Conclusion: The use of REB in PPNEMA updating and in the PPNEMA "characterize your sequence" option demonstrates the power of the method. REB can also rapidly solve other bioinformatics problems where the addition of a new sequence to a pre-existing cluster is required. The statistical tests carried out here show the flexibility of the method.
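The three-step workflow in this abstract can be illustrated with a minimal Python sketch. It is not the published REB implementation: the column-to-regex rules (a character class for variable columns, `?` for columns containing a gap), the "longest match wins" scoring, and all function and label names are assumptions made for illustration only.

```python
import re


def column_to_regex(column):
    """Turn one alignment column into a regex fragment (assumed rule set)."""
    residues = sorted(set(column) - {"-"})
    if not residues:
        return ""
    fragment = residues[0] if len(residues) == 1 else "[" + "".join(residues) + "]"
    # A gap in any aligned sequence makes the position optional.
    return fragment + "?" if "-" in column else fragment


def alignment_to_pattern(aligned_seqs):
    """Build one regex from equal-length aligned sequences."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return "".join(column_to_regex(col) for col in zip(*aligned_seqs))


def classify(query, patterns):
    """Return the label whose pattern gives the longest match in query, or None."""
    best_label, best_len = None, 0
    for label, pattern in patterns.items():
        match = re.search(pattern, query)
        if match and len(match.group()) > best_len:
            best_label, best_len = label, len(match.group())
    return best_label


# Hypothetical multialignments, one per cluster label.
alignments = {
    "genus_A": ["ACGTAC", "ACGAAC", "ACG-AC"],
    "genus_B": ["TTGCA", "TTGGA"],
}
patterns = {label: alignment_to_pattern(seqs) for label, seqs in alignments.items()}
assigned = classify("GGACGTACTT", patterns)
```

Here each cluster is reduced to a single pattern, and the query is assigned to the cluster whose pattern matches it best; a real tool would need a more careful scoring scheme than match length.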

    Genetic and biochemical analyses of chromosome and plasmid gene homologues encoding ICL and ArCP domains in Vibrio anguillarum strain 775

    Anguibactin, the siderophore produced by Vibrio anguillarum 775, is synthesized from 2,3-dihydroxybenzoic acid (DHBA), cysteine and hydroxyhistamine via a nonribosomal peptide synthetase (NRPS) mechanism. Most of the genes encoding anguibactin biosynthetic proteins are harbored by the pJM1 plasmid. In this work we report the identification of a homologue of the plasmid-encoded angB on the chromosome of strain 775. The products of both genes harbor an isochorismate lyase (ICL) domain that converts isochorismic acid to 2,3-dihydro-2,3-dihydroxybenzoic acid, one of the steps of DHBA synthesis. We show in this work that both ICL domains are functional in the production of DHBA in V. anguillarum as well as in E. coli. Substitution by alanine of the aspartic acid residue in the active site of both ICL domains completely abolishes their isochorismate lyase activity in vivo. The two proteins also carry an aryl carrier protein (ArCP) domain. In contrast with the ICL domains, only the plasmid-encoded ArCP can participate in anguibactin production, as determined by complementation analyses and site-directed mutagenesis in the active site of the plasmid-encoded protein, S248A. The site-directed mutants, D37A in the ICL domain and S248A in the ArCP domain of the plasmid-encoded AngB, were also tested in vitro and clearly show the importance of each residue for the domain function and that each domain operates independently.

    A novel pathway producing dimethylsulphide in bacteria is widespread in soil environments

    The volatile compound dimethylsulphide (DMS) is important in climate regulation, the sulphur cycle and signalling to higher organisms. Microbial catabolism of the marine osmolyte dimethylsulphoniopropionate (DMSP) is thought to be the major biological process generating DMS. Here we report the discovery and characterisation of the first gene for DMSP-independent DMS production in any bacterium. This gene, mddA, encodes a methyltransferase that methylates methanethiol (MeSH) and generates DMS. MddA functions in many taxonomically diverse bacteria including sediment-dwelling pseudomonads, nitrogen-fixing bradyrhizobia and cyanobacteria, and mycobacteria, including the pathogen Mycobacterium tuberculosis. The mddA gene is present in metagenomes from varied environments, being particularly abundant in soil environments, where it is predicted to occur in up to 76% of bacteria. This novel pathway may significantly contribute to global DMS emissions, especially in terrestrial environments, and could represent a shift from the notion that DMSP is the only significant precursor of DMS.

    Abundant Human DNA Contamination Identified in Non-Primate Genome Databases

    During routine screens of the NCBI databases using human repetitive elements we discovered an unlikely level of nucleotide identity across a broad range of phyla. To ascertain whether databases containing DNA sequences, genome assemblies and trace archive reads were contaminated with human sequences, we performed an in-depth search for sequences of human origin in non-human species. Using a primate-specific SINE, AluY, we screened 2,749 non-primate public databases from NCBI, Ensembl, JGI, and UCSC and found 492 to be contaminated with human sequence. These represent species ranging from bacteria (B. cereus) to plants (Z. mays) to fish (D. rerio), with examples found from most phyla. The identification of such extensive contamination of human sequence across databases and sequence types warrants caution among the sequencing community in future sequencing efforts, such as human re-sequencing. We discuss issues this may raise and present data that give insight into how this may be occurring.

    Expansion of the Protein Repertoire in Newly Explored Environments: Human Gut Microbiome Specific Protein Families

    The microbes that inhabit particular environments must be able to perform molecular functions that provide them with a competitive advantage to thrive in those environments. As most molecular functions are performed by proteins and are conserved between related proteins, we can expect that organisms successful in a given environmental niche would contain protein families that are specific for functions that are important in that environment. For instance, the human gut is rich in polysaccharides from the diet or secreted by the host, and is dominated by Bacteroides, whose genomes contain a highly expanded repertoire of protein families involved in carbohydrate metabolism. To identify other protein families that are specific to this environment, we investigated the distribution of protein families in the currently available human gut genomic and metagenomic data. Using an automated procedure, we identified a group of protein families strongly overrepresented in the human gut. These include not only many families described previously but also, interestingly, a large group of previously unrecognized protein families, which suggests that we still have much to discover about this environment. The identification and analysis of these families could provide us with new information about an environment critical to our health and well-being.

    Molecular cloning and transcriptional activity of a new Petunia calreticulin gene involved in pistil transmitting tract maturation, progamic phase, and double fertilization

    Calreticulin (CRT) is a highly conserved and ubiquitously expressed Ca2+-binding protein in multicellular eukaryotes. As an endoplasmic reticulum-resident protein, CRT plays a key role in many cellular processes including Ca2+ storage and release, protein synthesis, and molecular chaperoning in both animals and plants. CRT has long been suggested to play a role in plant sexual reproduction. To begin to address this possibility, we cloned and characterized the full-length cDNA of a new CRT gene (PhCRT) from Petunia. The deduced amino acid sequence of PhCRT shares homology with other known plant CRTs, and phylogenetic analysis indicates that the PhCRT cDNA clone belongs to the CRT1/CRT2 subclass. Northern blot analysis and fluorescent in situ hybridization were used to assess PhCRT gene expression in different parts of the pistil before pollination, during subsequent stages of the progamic phase, and at fertilization. The highest level of PhCRT mRNA was detected in the stigma–style part of the unpollinated pistil 1 day before anthesis and during the early stage of the progamic phase, when pollen has germinated and tubes grow out on the stigma. In the ovary, PhCRT mRNA was most abundant after pollination and reached a maximum at the late stage of the progamic phase, when pollen tubes grow into the ovules and fertilization occurs. PhCRT mRNA transcripts were seen to accumulate predominantly in transmitting tract cells of maturing and receptive stigma, in germinated pollen/growing tubes, and at the micropylar region of the ovule, where the female gametophyte is located. From these results, we suggest that PhCRT gene expression is up-regulated during secretory activity of the pistil transmitting tract cells, pollen germination and outgrowth of the tubes, and then during gamete fusion and early embryogenesis.

    Comparative genomics of isolates of a Pseudomonas aeruginosa epidemic strain associated with chronic lung infections of cystic fibrosis patients

    Pseudomonas aeruginosa is the main cause of fatal chronic lung infections among individuals suffering from cystic fibrosis (CF). During the past 15 years, particularly aggressive strains transmitted among CF patients have been identified, initially in Europe and more recently in Canada. The aim of this study was to generate high-quality genome sequences for 7 isolates of the Liverpool epidemic strain (LES) from the United Kingdom and Canada representing different virulence characteristics in order to: (1) associate comparative genomics results with virulence factor variability and (2) identify genomic and/or phenotypic divergence between the two geographical locations. We performed phenotypic characterization of pyoverdine, pyocyanin, motility, biofilm formation, and proteolytic activity. We also assessed the degree of virulence using the Dictyostelium discoideum amoeba model. Comparative genomics analysis revealed at least one large deletion (40-50 kb) in 6 out of the 7 isolates compared to the reference genome of LESB58. These deletions correspond to prophages, which are known to increase the competitiveness of LESB58 in chronic lung infection. We also identified 308 non-synonymous polymorphisms, of which 28 were associated with virulence determinants and 52 with regulatory proteins. At the phenotypic level, isolates showed extensive variability in production of pyocyanin, pyoverdine, proteases and biofilm as well as in swimming motility, while being predominantly avirulent in the amoeba model. Isolates from the two continents were phylogenetically and phenotypically indistinguishable. Most regulatory mutations were isolate-specific and 29% of them were predicted to have high functional impact. Therefore, polymorphism in regulatory genes is likely to be an important basis for phenotypic diversity among LES isolates, which in turn might contribute to this strain's adaptability to varying conditions in the CF lung.

    A novel substitution matrix fitted to the compositional bias in Mollicutes improves the prediction of homologous relationships

    Background: Substitution matrices are key parameters for the alignment of two protein sequences, and consequently for most comparative genomics studies. The composition of biological sequences can vary substantially between species and groups of species, and classical matrices such as those in the BLOSUM series fail to accurately estimate alignment scores and statistical significance for sequences sharing marked compositional biases. Results: We present a general and simple methodology to build matrices that are specifically fitted to the compositional bias of proteins. Our approach is inspired by the one used to build the BLOSUM matrices and is based on learning substitution and amino acid frequencies from real sequences with the corresponding compositional bias. We applied it to the large-scale comparison of AT-rich Mollicute genomes. The new matrix, MOLLI60, was used to predict pairwise orthology relationships, as well as homolog families, among 24 Mollicute genomes. We show that this new matrix better discriminates between true and false orthologs and improves the clustering of homologous proteins, compared with the classical matrix BLOSUM62. Conclusions: We show in this paper that well-fitted matrices can improve the predictions of orthologous and homologous relationships among proteins with a similar compositional bias. With the ever-increasing number of sequenced genomes, our approach could prove valuable in numerous comparative studies focusing on atypical genomes.
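As a rough illustration of the BLOSUM-style idea this abstract refers to, the sketch below derives log-odds scores from residue pairs counted in alignment columns, so that the background frequencies reflect the composition of the input sequences. It is not the MOLLI60 procedure: it omits the clustering of sequences by identity that BLOSUM-type matrices use, scores only pairs that were actually observed, and the function names and toy columns are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pair_counts(columns):
    """Count unordered residue pairs within each alignment column."""
    counts = Counter()
    for column in columns:
        residues = [r for r in column if r != "-"]
        for a, b in combinations(residues, 2):
            counts[tuple(sorted((a, b)))] += 1
    return counts


def log_odds_matrix(counts):
    """Half-bit log-odds scores: round(2 * log2(observed / expected))."""
    total = sum(counts.values())
    observed = {pair: n / total for pair, n in counts.items()}

    # Background frequency of each residue, learned from the same pairs.
    background = Counter()
    for (a, b), freq in observed.items():
        if a == b:
            background[a] += freq
        else:
            background[a] += freq / 2
            background[b] += freq / 2

    scores = {}
    for (a, b), freq in observed.items():
        expected = background[a] * background[b]
        if a != b:
            expected *= 2  # unordered pair: (a, b) and (b, a)
        scores[(a, b)] = round(2 * math.log2(freq / expected))
    return scores


# Toy alignment columns (one string per column), purely illustrative.
toy_columns = ["AAAA", "AAAK", "KKKK"]
toy_scores = log_odds_matrix(pair_counts(toy_columns))
```

Because the expected frequencies come from the training sequences themselves, a composition-biased training set shifts the scores accordingly, which is the effect the abstract exploits.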

    Composite structural motifs of binding sites for delineating biological functions of proteins

    Most biological processes are described as a series of interactions between proteins and other molecules, and interactions are in turn described in terms of atomic structures. To annotate protein functions as sets of interaction states at atomic resolution, and thereby to better understand the relation between protein interactions and biological functions, we conducted exhaustive all-against-all atomic structure comparisons of all known binding sites for ligands including small molecules, proteins and nucleic acids, and identified recurring elementary motifs. By integrating the elementary motifs associated with each subunit, we defined composite motifs which represent context-dependent combinations of elementary motifs. It is demonstrated that function similarity can be better inferred from composite motif similarity compared to the similarity of protein sequences or of individual binding sites. By integrating the composite motifs associated with each protein function, we define meta-composite motifs each of which is regarded as a time-independent diagrammatic representation of a biological process. It is shown that meta-composite motifs provide richer annotations of biological processes than sequence clusters. The present results serve as a basis for bridging atomic structures to higher-order biological phenomena by classification and integration of binding site structures.