
    Comparison study on k-word statistical measures for protein: From sequence to 'sequence space'

    Background: Many proposed statistical measures can efficiently compare protein sequences in order to infer protein structure, function and evolutionary information. They share the same idea of using k-word frequencies of protein sequences. Until now, however, the information carried by the sequences related to a given protein has not been used for protein sequence comparison. This paper proposes a scheme to construct a protein 'sequence space' associated with the protein sequences related to a given protein, and compares the performance of statistical measures with and without the information in this 'sequence space'. It also presents two statistical measures for proteins: gre.k (generalized relative entropy) and gsm.k (gapped similarity measure).
    Results: We tested the statistical measures, with and without protein 'sequence space', on three data sets. This offers a systematic, quantitative experimental assessment of these measures and naturally complements existing comparisons of statistical measures based on protein sequences alone. We also compared our statistical measures with alignment-based measures and with existing statistical measures. The experiments were grouped into two sets. The first, performed via ROC (Receiver Operating Characteristic) analysis, assesses the intrinsic ability of the statistical measures to discriminate and classify protein sequences. The second assesses how well our measure performs in phylogenetic analysis. From these experiments we draw several conclusions and derive new guidelines for the use of protein 'sequence space' and statistical measures.
    Conclusion: Alignment-based measures have a clear advantage when the data are highly redundant. Among the statistical measures, the most efficient is gsm.k, introduced in this article, followed by cos.k. When the data become less redundant, our gre.k achieves better performance, while all the other measures perform poorly on classification tasks. Almost all statistical measures improve by exploring the information in 'sequence space' as the word length increases, especially for less redundant data. The reasonable results of the phylogenetic analysis confirm that Gdis.k based on 'sequence space' is a reliable measure for phylogenetic analysis. In summary, our quantitative analysis verifies that exploring the information in 'sequence space' is a promising way to improve the ability of statistical measures to compare proteins.
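    The abstract does not give the measures' formulas. As a rough illustration of the shared k-word idea, the Python sketch below counts overlapping k-words and takes the cosine of two count vectors, which is the common reading of a cos.k-style measure. The function names are hypothetical, and pooling counts over related sequences in space_counts is only one assumed way to bring in a 'sequence space', not the authors' construction.

```python
from collections import Counter
from math import sqrt


def kword_counts(seq: str, k: int) -> Counter:
    """Count the overlapping k-words (k-mers) of a protein sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def space_counts(seqs: list[str], k: int) -> Counter:
    """Hypothetical 'sequence space' profile: k-word counts pooled over a
    query and its related sequences (an assumption, not the paper's scheme)."""
    total = Counter()
    for s in seqs:
        total.update(kword_counts(s, k))
    return total


def cosine(ca: Counter, cb: Counter) -> float:
    """Cosine of the angle between two k-word count vectors (0.0 if either is empty)."""
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


# Usage: compare two short, made-up peptides with k = 2.
a, b = "MKVLAA", "MKVLGG"
print(cosine(kword_counts(a, 2), kword_counts(b, 2)))  # the pair shares MK, KV, VL
```

    A distance for ranking or ROC analysis can then be taken as 1 minus the cosine; passing space_counts profiles instead of single-sequence counts would be the assumed 'sequence space' variant.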

    Identification of Prophages in Bacterial Genomes by Dinucleotide Relative Abundance Difference

    BACKGROUND: Prophages are integrated viral forms in bacterial genomes that have been found to contribute to interstrain genetic variability. Many virulence-associated genes are reported to be prophage encoded. Current computational methods detect prophages either by identifying possible essential proteins such as integrases or, as an extension of this technique, by identifying a region containing proteins similar to those occurring in prophages. These methods suffer from low sequence similarity at the protein level, which suggests that a nucleotide-based approach could be useful.
    METHODOLOGY: Dinucleotide relative abundance (DRA) has previously been used to identify genomic regions that deviate from their neighboring regions. We used the difference in dinucleotide relative abundance (DRAD) between bacterial and prophage DNA to help locate DNA stretches of possible prophage origin in bacterial genomes. Prophage sequences whose dinucleotide frequencies deviate from those of bacterial regions are detected by scanning bacterial genome sequences. The method was validated using a subset of genomes with prophage data from literature reports. A web interface for prophage scanning based on this method is available at http://bicmku.in:8082/prophagedb/dra.html. Two hundred bacterial genomes without annotated prophages were scanned for prophage regions using this method.
    CONCLUSIONS: The difference in relative dinucleotide distribution helps detect prophage regions in genome sequences. The usefulness of this method is seen in the identification of 461 highly probable prophage loci that had not previously been annotated. This work emphasizes the need to extend efforts to detect and annotate prophage elements in genome sequences.
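    A minimal Python sketch of a dinucleotide relative abundance comparison, assuming Karlin's definitions: rho_XY = f_XY / (f_X f_Y) computed on a sequence plus its reverse complement, and a difference score equal to the mean absolute difference over the 16 dinucleotides. Whether the authors' DRAD uses exactly this form is an assumption, as are the window, step and threshold parameters of the hypothetical scan function, which compares each window against the host genome's own signature rather than against a prophage reference.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]


def relative_abundance(seq: str) -> dict[str, float]:
    """rho_XY = f_XY / (f_X * f_Y), counted on seq plus its reverse complement.

    Non-ACGT characters are skipped; dinucleotides never span the two strands.
    """
    seq = seq.upper()
    strands = (seq, seq.translate(COMPLEMENT)[::-1])
    mono = dict.fromkeys(BASES, 0)
    di = dict.fromkeys(DINUCS, 0)
    for s in strands:
        for base in s:
            if base in mono:
                mono[base] += 1
        for i in range(len(s) - 1):
            if s[i:i + 2] in di:
                di[s[i:i + 2]] += 1
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_mono == 0 or n_di == 0:
        raise ValueError("sequence has no countable A/C/G/T dinucleotides")
    rho = {}
    for d in DINUCS:
        fx, fy = mono[d[0]] / n_mono, mono[d[1]] / n_mono
        rho[d] = (di[d] / n_di) / (fx * fy) if fx and fy else 0.0
    return rho


def signature_difference(rho_a: dict[str, float], rho_b: dict[str, float]) -> float:
    """Mean absolute difference over the 16 dinucleotides (Karlin's delta*)."""
    return sum(abs(rho_a[d] - rho_b[d]) for d in DINUCS) / 16.0


def scan(genome: str, window: int, step: int, threshold: float) -> list[tuple[int, float]]:
    """Hypothetical scan: flag windows whose signature deviates from the whole genome's."""
    ref = relative_abundance(genome)
    hits = []
    for start in range(0, len(genome) - window + 1, step):
        score = signature_difference(relative_abundance(genome[start:start + window]), ref)
        if score > threshold:
            hits.append((start, score))
    return hits
```

    Adjacent flagged windows would then be merged into candidate prophage regions; how the published method sets its thresholds and merges hits is not described here.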

    Extraction of pure components from overlapped signals in gas chromatography-mass spectrometry (GC-MS)

    Gas chromatography-mass spectrometry (GC-MS) is a widely used analytical technique for the identification and quantification of trace chemicals in complex mixtures. When complex samples are analyzed by GC-MS, it is common to observe co-elution of two or more components, resulting in overlapping signal peaks in the total ion chromatogram. In such situations, manual signal analysis is often the most reliable means of extracting pure component signals; however, a systematic manual analysis over a number of samples is both tedious and error-prone. In the past 30 years, a number of computational approaches have been proposed to assist in extracting pure signals from co-eluting GC-MS components. These include empirical methods, comparison with library spectra, eigenvalue analysis, regression and others. However, to date no approach has been recognized as best, nor accepted as standard. This situation hampers general GC-MS capabilities and, in particular, has implications for the development of the robust, high-throughput GC-MS analytical protocols required in metabolic profiling and biomarker discovery. Here we first discuss the nature of GC-MS data, and then review some of the approaches proposed for extracting pure signals from co-eluting components. We summarize and classify different approaches to this problem, and examine why so many approaches proposed in the past have failed to live up to their full promise. Finally, we give some thoughts on future developments in this field, and suggest that the progress in general computing capabilities attained in the past two decades has opened new horizons for tackling this important problem.
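    As an illustration of the regression-type approaches mentioned above, the following Python sketch assumes the pure mass spectra of the co-eluting components are already known (for example, from a library) and estimates each component's abundance in one mixed scan by ordinary least squares. The function names are hypothetical, and this is not offered as the review's recommended method.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("library spectra are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def unmix(mixed: list[float], library: list[list[float]]) -> list[float]:
    """Least-squares abundances c minimizing ||mixed - sum_j c_j * library[j]||^2.

    Every spectrum lists intensities over the same m/z channels; the problem is
    solved via the normal equations (S^T S) c = S^T y, adequate for a few components.
    """
    k = len(library)
    sts = [[sum(p * q for p, q in zip(library[i], library[j])) for j in range(k)] for i in range(k)]
    sty = [sum(p * q for p, q in zip(library[i], mixed)) for i in range(k)]
    return solve(sts, sty)


# Usage: two made-up pure spectra over four m/z channels, mixed 2 : 0.5.
library = [[1.0, 0.0, 2.0, 0.0], [0.0, 3.0, 1.0, 1.0]]
mixed_scan = [2.0, 1.5, 4.5, 0.5]
print(unmix(mixed_scan, library))  # approximately [2.0, 0.5]
```

    Applying unmix to every scan across an overlapped peak yields an elution profile for each component. The approach fails when a component's spectrum is missing from the library or when two library spectra are nearly collinear, which is one reason the review finds no single method accepted as standard.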

    Wolbachia Prophage DNA Adenine Methyltransferase Genes in Different Drosophila-Wolbachia Associations

    Wolbachia is an obligate intracellular bacterium that often manipulates the reproduction of its insect and isopod hosts. In contrast, Wolbachia is an essential symbiont in filarial nematodes. Recently, Wolbachia has been implicated in genomic imprinting of host DNA through cytosine methylation. The importance of DNA methylation in cell fate and biology calls for in-depth study of putative methylation-related genes. We present a molecular and phylogenetic analysis of a putative DNA adenine methyltransferase encoded by a prophage in the Wolbachia genome. Two slightly different copies of the gene, met1 and met2, show different distributions across Wolbachia strains. The met2 gene is present in most strains; in wAu, however, it contains a frameshift caused by a 2 bp deletion. Phylogenetic analysis of the met2 DNA sequences suggests a long association of the gene with its Wolbachia host strains. In addition, our analysis provides evidence for previously unnoticed multiple infections, the detection of which is critical for the molecular elucidation of the modification and/or rescue mechanisms of cytoplasmic incompatibility.
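    A 2 bp deletion causes a frameshift because the deleted length is not a multiple of three, so every codon downstream of the deletion is read from a different offset. The toy Python sketch below shows this on a made-up sequence (not the met2 sequence); the helper names are hypothetical.

```python
def codons(seq: str) -> list[str]:
    """Split a coding sequence into complete codons read from its first base."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def causes_frameshift(indel_length: int) -> bool:
    """An insertion or deletion shifts the reading frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


# Usage: a made-up 18 bp reading frame with a 2 bp deletion after codon 2.
ref = "ATGGCTGAAACCTTTGGA"
mutant = ref[:6] + ref[8:]  # delete the two bases at 0-based positions 6 and 7
print(codons(ref))     # ['ATG', 'GCT', 'GAA', 'ACC', 'TTT', 'GGA']
print(codons(mutant))  # ['ATG', 'GCT', 'AAC', 'CTT', 'TGG'] (one base left over)
print(causes_frameshift(2), causes_frameshift(3))  # True False
```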

    The use of genomic signature distance between bacteriophages and their hosts displays evolutionary relationships and phage growth cycle determination

    Background: Bacteriophage classification is mainly based on morphological traits and genome characteristics, combined with host information and, in some cases, the phage growth lifestyle. A lack of molecular tools can impede more precise studies of phylogenetic relationships, or even taxonomic classification. Methods that analyze genome sequences without requiring homology have allowed advances in classification.
    Results: Here, we propose using the genome sequence signature to characterize bacteriophages and comparing it with the genome signature of their hosts in order to obtain host-phage relationships and information on phage lifestyle. We analyze host-phage relationships in the four most representative groups of Caudoviridae, the dsDNA group of phages. We demonstrate that comparing the phage genomic signature with that of its host allows phages to be grouped and can also predict host-phage relationships (lytic vs. temperate).
    Conclusions: We can thus condense, in relatively simple figures, phage information that is dispersed over many publications.

    Pharmacogenetics: data, concepts and tools to improve drug discovery and drug treatment

    Variation in the human genome is a major cause of variable response to drugs and other xenobiotics. Susceptibility to almost all diseases is determined to some extent by genetic variation. Driven by advances in molecular biology, pharmacogenetics has evolved within the past 40 years from a niche discipline into a major driving force of clinical pharmacology, and it is currently one of the most actively pursued disciplines in applied biomedical research in general. Today we can assess more than 1,000,000 polymorphisms, or the expression of more than 25,000 genes, in each participant of a clinical study at affordable cost. This has not yet significantly changed common therapeutic practice, but a number of physicians are starting to consider polymorphisms, such as those in CYP2C9, CYP2C19, CYP2D6, TPMT and VKORC1, in daily medical practice. More obviously, pharmacogenetics has changed the practices and requirements of preclinical and clinical drug research; large clinical trials without a pharmacogenomic add-on appear to have become the minority. This review describes how pharmacogenetics has evolved from the analysis of single proteins to current approaches involving broad analyses of the entire genome, of all mRNA species or all metabolites, and other approaches aimed at understanding the entire biological system. Pharmacogenetics and genomics are becoming substantially integrated into the profession of clinical pharmacology, and education in the relevant methods, knowledge and concepts forms an indispensable part of the clinical pharmacology curriculum and of the professional life of pharmacologists, from early drug discovery to pharmacovigilance.