328 research outputs found

    Bioinformatic Analysis Reveals High Diversity of Bacterial Genes for Laccase-Like Enzymes

    Fungal laccases have been used in various fields ranging from processes in wood and paper industries to environmental applications. Although a few bacterial laccases have been characterized in recent years, prokaryotes have largely been neglected as a source of novel enzymes, in part due to the lack of knowledge about the diversity and distribution of laccases within Bacteria. In this work, genes for laccase-like enzymes were searched for in over 2,200 complete and draft bacterial genomes and four metagenomic datasets, using custom profile Hidden Markov Models for two- and three-domain laccases. More than 1,200 putative genes for laccase-like enzymes were retrieved from chromosomes and plasmids of diverse bacteria. In 76% of the genes, signal peptides were predicted, indicating that these bacterial laccases may be exported from the cytoplasm, which contrasts with the current belief. Moreover, several examples of putatively horizontally transferred bacterial laccase genes were described. Many metagenomic sequences encoding fragments of laccase-like enzymes could not be phylogenetically assigned, indicating considerable novelty. Laccase-like genes were also found in anaerobic bacteria, autotrophs and alkaliphiles, thus opening new hypotheses regarding their ecological functions. Bacteria identified as carrying laccase genes represent potential sources for future biotechnological applications.

    A new data mining approach for the detection of bacterial promoters combining stochastic and combinatorial methods

    We present a new data mining method based on stochastic analysis (Hidden Markov Models, HMMs) and combinatorial methods for discovering new transcriptional factors in bacterial genome sequences. Sigma factor binding sites (SFBSs) were described as patterns of box1 - spacer - box2 corresponding to the -35 and -10 DNA motifs of bacterial promoters. We used a high-order Hidden Markov Model in which the hidden process is a second-order Markov chain. Applied to the genome of the model bacterium Streptomyces coelicolor (2), the a posteriori state probabilities revealed local maxima, or peaks, whose distribution was enriched in the intergenic sequences ("iPeaks" for intergenic peaks). Short DNA sequences underlying the iPeaks were extracted and clustered by a hierarchical classification algorithm based on Smith-Waterman local similarity. Some selected motif consensuses were used as box1 (-35 motif) in the search for a potential neighbouring box2 (-10 motif) using a word enumeration algorithm. Applied to Streptomyces coelicolor, this new SFBS mining methodology successfully retrieved already known SFBSs and suggested new potential transcriptional factor binding sites (TFBSs). The well-defined SigR regulon (oxidative stress response) was also used as a test set to compare first- and second-order HMMs. Our approach also allowed the preliminary detection of known SFBSs in Bacillus subtilis.
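
The box1 - spacer - box2 search described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function name, the exact-match rule for box1, the 6-base box2 length and the 16 to 19 base spacer range are all assumptions chosen for illustration. It simply counts every word found at an allowed distance downstream of a box1 hit, which is one plausible reading of "word enumeration".

```python
from collections import Counter


def enumerate_box2_candidates(sequence, box1, box2_len=6, spacer_range=(16, 19)):
    """Count words of length box2_len found downstream of each exact box1 hit.

    Hypothetical helper: for every occurrence of box1, each spacer length in
    spacer_range (inclusive) is tried and the word after the spacer is counted.
    """
    sequence = sequence.upper()
    box1 = box1.upper()
    counts = Counter()
    start = sequence.find(box1)
    while start != -1:
        box1_end = start + len(box1)
        for spacer in range(spacer_range[0], spacer_range[1] + 1):
            word_start = box1_end + spacer
            word = sequence[word_start:word_start + box2_len]
            # Skip truncated words at the end of the sequence.
            if len(word) == box2_len:
                counts[(spacer, word)] += 1
        # Advance by one base so overlapping box1 hits are also found.
        start = sequence.find(box1, start + 1)
    return counts


# Toy sequence (made up): a box1 word, a 17-base spacer, then a box2 word.
toy = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GG"
for (spacer, word), n in enumerate_box2_candidates(toy, "TTGACA").most_common():
    print(spacer, word, n)
```

On real intergenic sequences, the (spacer, word) pairs that recur far more often than expected by chance would be the candidates worth inspecting; assessing that significance is outside this sketch.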

    MiST: a microbial signal transduction database

    Signal transduction pathways control most cellular activities in living cells, ranging from regulation of gene expression to fine-tuning enzymatic activity and controlling motile behavior in response to extracellular and intracellular signals. Because of their extreme sequence variability and extensive domain shuffling, signal transduction proteins are difficult to identify, and their current annotation in most leading databases is often incomplete or erroneous. To overcome this problem, we have developed the microbial signal transduction (MiST) database, a comprehensive library of the signal transduction proteins from completely sequenced bacterial and archaeal genomes. By searching for domain profiles that implicate a particular protein as participating in signal transduction, we have systematically identified 69 270 two- and one-component proteins in 365 bacterial and archaeal genomes. We have designed a user-friendly website to access and browse the predicted signal transduction proteins within various organisms. Further capabilities include gene/protein sequence retrieval, visualized domain architectures, interactive chromosomal views for exploring gene neighborhood, advanced querying options and cross-species comparison. Newly available, complete genomes are loaded into the database each month. MiST is the only comprehensive and up-to-date electronic catalog of the signaling machinery in microbial genomes.

    Genome-wide survey and phylogeny of S-Ribosylhomocysteinase (LuxS) enzyme in bacterial genomes

    Background: The study of survival and communication of pathogenic bacteria is important to combat diseases caused by such micro-organisms. Bacterial cells communicate with each other using a density-dependent cell-cell communication process called Quorum Sensing (QS). LuxS protein is an important member of the interspecies quorum-sensing system, involved in the biosynthesis of Autoinducer-2 (AI-2), and has been identified as a drug target. Despite this significance, its evolution has not been fully studied, particularly from a structural perspective. Results: A search for LuxS in the non-redundant database of protein sequences yielded 3106 sequences. Phylogenetic analysis of these sequences revealed grouping of sequences into five distinct clusters belonging to different phyla and according to their habitat. A majority of the neighbouring genes of LuxS have been found to encode hypothetical proteins. However, gene synteny analyses in different bacterial genomes reveal the presence of a few interesting gene neighbours. Moreover, the LuxS gene was found to be a component of an operon in only six out of 36 genomes. Analysis of conserved motifs in representative LuxS sequences of different clusters revealed the presence of conserved motifs common to sequences of all the clusters as well as motifs unique to each cluster. Homology modelling of LuxS protein sequences of each cluster revealed a few structural features unique to the proteins of each cluster. Analyses of surface electrostatic potentials of the homology models of each cluster showed the interactions that are common to all the clusters, as well as cluster-specific potentials and therefore interacting partners, which may be unique to each cluster. Conclusions: LuxS protein evolved early during the course of bacterial evolution, but has diverged into five subtypes. Analysis of sequence motifs and homology models of representative members reveals cluster-specific structural properties of LuxS. Further, it is also shown that LuxS protein may be involved in various protein-protein or protein-RNA interactions, which may regulate the activity of LuxS proteins in bacteria.

    Temporal and Spatial Data Mining with Second-Order Hidden Models

    In the frame of designing a knowledge discovery system, we have developed stochastic models based on high-order hidden Markov models. These models are capable of mapping sequences of data into a Markov chain in which the transitions between the states depend on the n previous states, according to the order of the model. We study the process of achieving information extraction from spatial and temporal data by means of an unsupervised classification. To this end, we use a French national database related to the land use of a region, named Teruti, which describes the land use in both the spatial and temporal domains. Land-use categories (wheat, corn, forest, ...) are logged every year on each site regularly spaced in the region. They constitute a temporal sequence of images in which we look for spatial and temporal dependencies. The temporal segmentation of the data is done by means of a second-order Hidden Markov Model (HMM2) that appears to have very good capabilities to locate stationary segments, as shown in our previous work in speech recognition. The spatial classification is performed by defining a fractal scanning of the images with the help of a Hilbert-Peano curve that introduces a total order on the sites, preserving the neighborhood relation between the sites. We show that the HMM2 performs a classification that is meaningful for the agronomists. Spatial and temporal classification may be achieved simultaneously by means of a two-level HMM2 that measures the a posteriori probability of mapping a temporal sequence of images onto a set of hidden classes.
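
The fractal scanning mentioned in this abstract can be sketched in Python under stated assumptions. The sketch uses the classic Hilbert curve on a square grid whose side is a power of two; the abstract does not say which variant of the Hilbert-Peano curve or which grid shape the authors used, and the function names are hypothetical. The curve turns a 2D image of land-use labels into a 1D sequence in which consecutive items are always neighbouring sites.

```python
def hilbert_d2xy(side, d):
    """Map position d along a Hilbert curve to (x, y) on a side x side grid.

    side must be a power of two; d runs from 0 to side * side - 1.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the current quadrant so sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_scan(grid):
    """Flatten a square grid (list of rows) into a list along the Hilbert curve."""
    side = len(grid)
    if side == 0 or side & (side - 1) or any(len(row) != side for row in grid):
        raise ValueError("grid must be square with a power-of-two side")
    path = (hilbert_d2xy(side, d) for d in range(side * side))
    return [grid[y][x] for x, y in path]


# Made-up 4 x 4 image of land-use labels for one year.
land_use = [
    ["wheat", "wheat", "corn", "corn"],
    ["wheat", "forest", "corn", "corn"],
    ["forest", "forest", "forest", "corn"],
    ["forest", "forest", "wheat", "wheat"],
]
print(hilbert_scan(land_use))
```

The resulting sequence could then be fed to any sequence model; the second-order HMM itself is not reproduced here.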

    Using Markov Models to Mine Temporal and Spatial Data

    ANR BIODIVAGRIM project reference: ANR 07 BDIV 02. Markov models represent a powerful way to approach the problem of mining temporal and spatial signals whose variability is not yet fully understood. In this chapter, we present a general methodology to mine different kinds of temporal and spatial signals with contrasting properties: continuous or discrete, with few or many modalities. This methodology is based on high-order Markov modelling, specifically second-order hidden Markov models, as implemented in the free software toolbox CarottAge (GNU GPL).

    A Benchmark of Parametric Methods for Horizontal Transfers Detection

    Horizontal gene transfer (HGT) appears to be important for prokaryotic species evolution. As a consequence, numerous parametric methods, using only the information embedded in the genomes, have been designed to detect HGTs. Numerous reports have been published of incongruencies between the results of different methods applied to the same genomes. The use of artificial genomes in which all HGT parameters are controlled allows different methods to be tested under the same conditions. The results of this benchmark of 16 representative parametric methods showed a great variety of efficiencies. Some methods work very poorly whatever the type of HGT, and some depend on the conditions or on the metrics used. The best methods in terms of total errors were those using tetranucleotides as the criterion for window methods, or those using codon usage for gene-based methods, together with the Kullback-Leibler divergence metric. Window methods are very sensitive but less specific, and they poorly detect lone, isolated genes. On the other hand, gene-based methods are often very specific but lack sensitivity. We propose using two methods in combination to get the best of each category: a gene-based one for specificity and a window-based one for sensitivity.
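
A window method of the kind this abstract describes can be sketched in Python as follows. It is an illustration, not one of the 16 benchmarked methods: scoring each window by the Kullback-Leibler divergence of its tetranucleotide frequencies from the genome-wide frequencies is an assumed pairing, and the function names, window size, step and pseudocount are all hypothetical choices.

```python
import math
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=4, pseudocount=1.0):
    """Return smoothed k-mer frequencies over the ACGT alphabet."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Words containing other symbols (e.g. N) are ignored.
    total = sum(counts[w] for w in kmers) + pseudocount * len(kmers)
    return {w: (counts[w] + pseudocount) / total for w in kmers}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for two dicts over the same keys."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def window_scores(genome, window=5000, step=1000, k=4):
    """Score each sliding window by its divergence from the whole genome."""
    genome = genome.upper()
    background = kmer_frequencies(genome, k)
    scores = []
    for start in range(0, len(genome) - window + 1, step):
        freqs = kmer_frequencies(genome[start:start + window], k)
        scores.append((start, kl_divergence(freqs, background)))
    return scores


# Made-up genome: an AT-rich host with a GC-rich insert at positions 4000-4999.
toy_genome = "ATTAAATTTA" * 400 + "GCCGGGCCCG" * 100 + "ATTAAATTTA" * 500
for start, score in window_scores(toy_genome, window=1000, step=1000):
    print(start, round(score, 3))
```

Windows whose score stands well above the rest are candidate transferred regions. Choosing the threshold, and combining this with a gene-based score as the abstract proposes, is left open here.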

    Quod erat demonstrandum? The mystery of experimental validation of apparently erroneous computational analyses of protein sequences

    BACKGROUND: Computational predictions are critical for directing the experimental study of protein functions. Therefore it is paradoxical when an apparently erroneous computational prediction seems to be supported by experiment. RESULTS: We analyzed six cases where application of novel or conventional computational methods for protein sequence and structure analysis led to non-trivial predictions that were subsequently supported by direct experiments. We show that, on all six occasions, the original prediction was unjustified, and in at least three cases, an alternative, well-supported computational prediction, incompatible with the original one, could be derived. The most unusual cases involved the identification of an archaeal cysteinyl-tRNA synthetase, a dihydropteroate synthase and a thymidylate synthase, for which experimental verifications of apparently erroneous computational predictions were reported. Using sequence-profile analysis, multiple alignment and secondary-structure prediction, we have identified the unique archaeal 'cysteinyl-tRNA synthetase' as a homolog of extracellular polygalactosaminidases, and the 'dihydropteroate synthase' as a member of the beta-lactamase-like superfamily of metal-dependent hydrolases. CONCLUSIONS: In each of the analyzed cases, the original computational predictions could be refuted and, in some instances, alternative strongly supported predictions were obtained. The nature of the experimental evidence that appears to support these predictions remains an open question. Some of these experiments might signify discovery of extremely unusual forms of the respective enzymes, whereas the results of others could be due to artifacts.

    Multicofactor proteins: structure, prediction, function

    EThOS - Electronic Theses Online Service, GB, United Kingdom