
    Sequence Search Algorithms for Single Pass Sequence Identification: Does One Size Fit All?

    Bioinformatic tools have become essential to biologists in their quest to understand the vast quantities of sequence data, and now whole genomes, which are being produced at an ever-increasing rate. Much of this sequence data consists of single-pass sequences, such as sample sequences from organisms closely related to other organisms of interest that have already been sequenced, or cDNAs or expressed sequence tags (ESTs). These single-pass sequences often contain errors, including frameshifts, which complicate the identification of homologues, especially at the protein level. Therefore, sequence searches with this type of data are often performed at the nucleotide level. The most commonly used sequence search algorithms for the identification of homologues are Washington University’s and the National Center for Biotechnology Information’s (NCBI) versions of the BLAST suite of tools, which are found on websites all over the world. The work reported here examines the use of these tools for comparing sample sequence datasets to a known genome. It shows that care must be taken when choosing the parameters to use with the BLAST algorithms. When both are used with their default parameters, NCBI’s version of gapped BLASTn gives much shorter, and sometimes different, top alignments than Washington University’s version of BLASTn (which also allows for gaps). Most of the differences in performance were found to be due to the choice of default parameters rather than to underlying differences between the two algorithms. Washington University’s version, used with defaults, compares very favourably with the results obtained using the accurate but computationally intensive Smith–Waterman algorithm.
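
For reference, the Smith–Waterman algorithm used here as the accuracy benchmark finds the optimal local alignment by dynamic programming, which is why it is slower than the heuristic BLAST tools. The sketch below is a minimal textbook version with a linear gap penalty; the scores (match +2, mismatch -1, gap -2) are illustrative assumptions, not the defaults of either BLAST version, and production tools typically use affine gap penalties.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 term lets an alignment start anywhere (this is what makes it local)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        score = H[i][j]
        if score == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

The full score table is what makes the method computationally intensive: aligning a read against a genome fills one cell for every pair of positions, whereas BLAST only extends around short seed matches.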

    Analysis of gene expression in operons of Streptomyces coelicolor

    BACKGROUND: Recent studies have shown that microarray-derived gene-expression data are useful for operon prediction. However, it is apparent that genes within an operon do not conform to the simple notion that they have equal levels of expression. RESULTS: To investigate the relative transcript levels of intra-operonic genes, we used a Z-score approach to normalize the expression levels of all genes within an operon to the expression of the first gene of that operon. Here we demonstrate that there is a general downward trend in expression from the first to the last gene in Streptomyces coelicolor operons, in contrast to what we observe in Escherichia coli. Combining transcription-factor binding-site prediction with the identification of operonic genes that exhibited higher transcript levels than the first gene of the same operon enabled the discovery of putative internal promoters. The presence of transcription terminators and the abundance of putative transcriptional control sequences in S. coelicolor operons are also described. CONCLUSION: We have demonstrated a polarity of expression in operons of S. coelicolor not seen in E. coli, urging caution on those who apply operon prediction strategies based on the E. coli 'equal-expression' assumption to divergent species. We speculate that this general difference in transcription behavior could reflect the contrasting lifestyles of the two organisms and, in the case of Streptomyces, might also be influenced by its high-G+C genome. The identification of putative internal promoters, previously thought to cause problems for operon prediction strategies, has also been enabled.
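
The exact normalization is not spelled out in the abstract. A minimal sketch, assuming each gene's (log) expression value is first converted to a genome-wide Z-score and each operon gene is then expressed relative to the operon's first gene; genes scoring above the first gene are flagged as candidate internal-promoter sites. The gene identifiers, values and function names are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(expression: dict) -> dict:
    """Genome-wide Z-score of each gene's (log) expression value (assumed normalization)."""
    values = list(expression.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("expression values have zero variance")
    return {gene: (x - mu) / sigma for gene, x in expression.items()}


def relative_to_first(operon: list, z: dict) -> list:
    """Z-score of each operon gene minus that of the operon's first gene."""
    first = z[operon[0]]
    return [z[gene] - first for gene in operon]


def candidate_internal_promoters(operon: list, z: dict) -> list:
    """Downstream genes expressed above the first gene (hypothetical flagging rule)."""
    rel = relative_to_first(operon, z)
    return [gene for gene, r in zip(operon[1:], rel[1:]) if r > 0]
```

With a downward polarity, as reported for S. coelicolor, the relative values would mostly be negative; positive values mark the exceptions worth checking for promoter motifs.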

    PepSeeker: a database of proteome peptide identifications for investigating fragmentation patterns

    Proteome science relies on bioinformatics tools to characterize proteins via their proteolytic peptides, which are identified from the characteristic mass spectra generated when their ions undergo fragmentation in the gas phase within the mass spectrometer. The resulting secondary ion mass spectra are compared with protein sequence databases in order to identify the amino acid sequence. Although these search tools (e.g. SEQUEST, Mascot, X!Tandem, Phenyx) are frequently successful, much is still not understood about the amino acid sequence patterns that promote or protect against particular fragmentation pathways, and hence lead to the presence or absence of particular ions from different ion series. To advance this area, we have developed a database, PepSeeker, which captures this peptide identification and ion information from proteome experiments. The database currently contains more than 185,000 peptides and associated database search information. Users may query this resource to retrieve peptide, protein and spectral information based on protein or peptide information, including the amino acid sequence itself, represented by regular expressions coupled with ion series information. We believe this database will be useful to proteome researchers wishing to understand gas-phase peptide ion chemistry in order to improve peptide identification strategies.
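
To illustrate the kind of query described, here is a sketch that filters peptide records by a regular expression on the amino acid sequence, optionally requiring at least one observed ion from a given series (for example y ions). The `PeptideRecord` layout and the `query` function are hypothetical and do not reflect PepSeeker's actual schema or interface.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PeptideRecord:
    """Hypothetical record: a peptide, its protein, and observed fragment ions."""
    sequence: str
    protein: str
    observed_ions: set = field(default_factory=set)  # e.g. {"y3", "b2"}


def ion_series_of(ion: str) -> str:
    """Strip the trailing position number: 'y3' -> 'y', 'b12' -> 'b'."""
    return ion.rstrip("0123456789")


def query(records, pattern, ion_series=None):
    """Records whose sequence matches `pattern` and, if given, show an ion of `ion_series`."""
    regex = re.compile(pattern)
    hits = []
    for record in records:
        if not regex.search(record.sequence):
            continue
        if ion_series is not None and not any(
            ion_series_of(ion) == ion_series for ion in record.observed_ions
        ):
            continue
        hits.append(record)
    return hits
```

For example, the pattern `[^K]P` selects peptides containing a proline preceded by any residue other than lysine; adding `ion_series="y"` keeps only those with at least one observed y ion.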

    Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger

    BACKGROUND: Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate among a set of candidate gene models. Here we apply this approach to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institute (JGI) and to a predicted protein set from another A. niger sequence. Tandem mass spectra (MS/MS) were acquired from 1D gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). RESULTS: 405 identified peptide sequences were mapped to 214 different A. niger genomic loci, to which 4093 predicted gene models clustered; 2872 of these models contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that locus was not the most parsimonious match to the identified peptides. The identified peptides also boosted confidence in predicted gene structures spanning 54 introns from different gene models. CONCLUSION: This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines, much as expressed sequence tag (EST) data have been. A comparison with the published genome of another A. niger strain, sequenced by DSM, showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.

    Multiple Histogram Method for Quantum Monte Carlo

    An extension to the multiple-histogram method (sometimes referred to as the Ferrenberg–Swendsen method) for use in quantum Monte Carlo simulations is presented. This method is shown to work well for the 2D repulsive Hubbard model, allowing measurements to be taken over a continuous region of parameters. The method also reduces the error bars over the range of parameter values, owing to the overlap of multiple histograms. A continuous sweep of parameters and reduced error bars allow more difficult measurements to be made, such as the Maxwell constructions used to study phase separation. The method may also be applicable to other quantum systems. Comment: 4 pages, 5 figures, RevTeX, submitted to Phys. Rev. B Rapid Communications
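
For orientation, the classical Ferrenberg–Swendsen equations that this work extends combine energy histograms from several runs into a single density-of-states estimate, from which averages at any intermediate parameter can be reweighted. The sketch below is the classical energy-histogram version only; it does not implement the quantum Monte Carlo extension, and the function names are hypothetical.

```python
import math


def multi_histogram(energies, histograms, betas, max_iter=10_000, tol=1e-12):
    """Self-consistent Ferrenberg-Swendsen iteration for classical energy histograms.

    energies:   energy-bin values E_k
    histograms: histograms[i][k] = counts of run i in bin k
    betas:      inverse temperature of each run
    Returns (log_g, f): log density of states (up to a constant) and
    f_j = -ln Z(beta_j), gauge-fixed so that f_0 = 0.
    """
    n = [sum(h) for h in histograms]                      # samples per run
    totals = [sum(h[k] for h in histograms) for k in range(len(energies))]
    f = [0.0] * len(betas)
    for _ in range(max_iter):
        # g(E) = sum_i N_i(E) / sum_j n_j exp(f_j - beta_j E)
        log_g = []
        for k, E in enumerate(energies):
            den = sum(nj * math.exp(fj - bj * E) for nj, fj, bj in zip(n, f, betas))
            log_g.append(math.log(totals[k] / den) if totals[k] > 0 else -math.inf)
        # Z(beta_j) = sum_E g(E) exp(-beta_j E), so f_j = -ln Z(beta_j)
        new_f = [
            -math.log(sum(math.exp(lg - b * E)
                          for lg, E in zip(log_g, energies) if lg > -math.inf))
            for b in betas
        ]
        new_f = [x - new_f[0] for x in new_f]             # remove the arbitrary constant
        done = max(abs(x - y) for x, y in zip(new_f, f)) < tol
        f = new_f
        if done:
            break
    return log_g, f


def reweighted_mean_energy(log_g, energies, beta):
    """<E> at any beta from the combined density of states."""
    pairs = [(lg - beta * E, E) for lg, E in zip(log_g, energies) if lg > -math.inf]
    shift = max(w for w, _ in pairs)                      # log-sum-exp shift for stability
    weights = [(math.exp(w - shift), E) for w, E in pairs]
    z = sum(w for w, _ in weights)
    return sum(w * E for w, E in weights) / z
```

The error-bar reduction mentioned in the abstract comes from the denominator: wherever two runs' histograms overlap, both contribute samples to the same g(E).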

    An informatic pipeline for the data capture and submission of quantitative proteomic data using iTRAQ(TM)

    BACKGROUND: Proteomics continues to play a critical role in post-genomic science as advances in mass spectrometry and analytical chemistry support the separation and identification of increasing numbers of peptides and proteins from their characteristic mass spectra. To facilitate the sharing of these data, various standard formats have been, and continue to be, developed. These formats are not yet fully mature, however, and cannot yet cope with the growing number of quantitative proteomic technologies being developed. RESULTS: We propose an extension to the PRIDE and mzData XML schemas to accommodate the concept of multiple samples per experiment and, in addition, to capture the intensities of the iTRAQ(TM) reporter ions in the entry. A simple Java client has been developed to capture and convert the raw data from common spectral file formats into a valid PRIDE XML entry; it uses a third-party open-source tool to generate iTRAQ(TM) reporter intensities from Mascot output. CONCLUSION: We describe an extension to the PRIDE and mzData schemas that enables the capture of quantitative data. Currently this is limited to iTRAQ(TM) data but is readily extensible to other quantitative proteomic technologies. Furthermore, a software tool has been developed that converts various mass spectrum file formats and the corresponding Mascot peptide identifications to PRIDE-formatted XML. The tool represents a simple approach to preparing quantitative and qualitative data for submission to repositories such as PRIDE, which is necessary to facilitate data deposition and sharing in public-domain databases. The software is freely available from
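
As an illustration of recording several samples per experiment together with per-spectrum reporter-ion intensities, here is a sketch using Python's standard `xml.etree.ElementTree`. The element and attribute names (`Experiment`, `SampleList`, `Sample`, `Spectrum`, `iTRAQReporter`) are hypothetical placeholders, not the actual PRIDE or mzData schema extension, and the m/z values are the nominal 4-plex reporter channels.

```python
import xml.etree.ElementTree as ET

# Nominal m/z of the four iTRAQ 4-plex reporter ions, keyed by channel label.
REPORTER_MZ = {"114": 114.1, "115": 115.1, "116": 116.1, "117": 117.1}


def build_entry(title, samples, spectra):
    """Build a hypothetical XML entry.

    samples: {channel: sample name}, e.g. {"114": "control"}
    spectra: list of (spectrum_id, {channel: reporter intensity})
    """
    root = ET.Element("Experiment")
    ET.SubElement(root, "Title").text = title
    # Several samples per experiment, one per reporter channel
    sample_list = ET.SubElement(root, "SampleList")
    for channel, name in samples.items():
        ET.SubElement(sample_list, "Sample", channel=channel, name=name)
    # Reporter-ion intensities stored alongside each spectrum
    for spectrum_id, intensities in spectra:
        spectrum = ET.SubElement(root, "Spectrum", id=str(spectrum_id))
        for channel, value in intensities.items():
            ET.SubElement(
                spectrum,
                "iTRAQReporter",
                channel=channel,
                mz=f"{REPORTER_MZ[channel]:.1f}",
                intensity=f"{value:g}",
            )
    return ET.tostring(root, encoding="unicode")
```

Keeping the sample-to-channel mapping at experiment level means each spectrum only needs channel labels, which is one way a format could stay extensible to other multiplexed labelling methods.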

    e-Fungi: a data resource for comparative analysis of fungal genomes.

    BACKGROUND: The number of sequenced fungal genomes is ever increasing, with about 200 genomes already fully sequenced or in progress. Only a small percentage of those genomes have been comprehensively studied, for example using techniques from functional genomics. Comparative analysis has proven to be a useful strategy for enhancing our understanding of evolutionary biology and of the less well understood genomes. However, the data required for these analyses tend to be distributed across various heterogeneous data sources, making systematic comparative studies a cumbersome task. Furthermore, comparative analyses benefit from close integration of derived data sets that cluster genes or organisms in ways that make it easier to express requests clarifying points of similarity or difference between species. DESCRIPTION: To support systematic comparative analyses of fungal genomes, we have developed the e-Fungi database, which integrates a variety of data for more than 30 fungal genomes. Publicly available genome data, functional annotations and pathway information have been integrated into a single data repository and complemented with the results of comparative analyses, such as MCL and OrthoMCL cluster analyses, and predictions of signaling proteins and of the sub-cellular localisation of proteins. To access the data, a library of analysis tasks is available through a web interface. The analysis tasks are motivated by recent comparative genomics studies and aim to support the study of evolutionary biology as well as community efforts to improve genome annotation. Web services for each query are also available, enabling the tasks to be incorporated into workflows. CONCLUSION: The e-Fungi database provides fungal biologists with a resource for comparative studies of a large range of fungal genomes. Its analysis library supports the comparative study of genome data, functional annotation and the results of large-scale analyses over all the genomes stored in the database. The database is accessible at http://www.e-fungi.org.uk, as is the WSDL for the web services. RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license, which is similar to the 'Creative Commons Attribution Licence'. In brief, you may copy, distribute and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; and for any reuse or distribution, it must be made clear to others what the license terms of this work are.
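
As an example of the kind of comparative query an integrated resource like this supports, the sketch below finds ortholog clusters (for instance OrthoMCL groups) that contain proteins from every species in one group and from none in another. The dictionary layout, cluster and protein identifiers, and the function are hypothetical and do not reflect e-Fungi's actual data model or web-service interface.

```python
def clusters_specific_to(clusters, present_in, absent_from):
    """IDs of clusters with members from every species in `present_in`
    and from no species in `absent_from`.

    clusters: {cluster_id: {species: [protein ids]}}  (hypothetical layout)
    """
    hits = []
    for cluster_id, members in clusters.items():
        # A species counts as present only if it contributes at least one protein
        species = {sp for sp, proteins in members.items() if proteins}
        if set(present_in) <= species and species.isdisjoint(absent_from):
            hits.append(cluster_id)
    return sorted(hits)
```

Precomputing the clusters once, as the database does with its MCL and OrthoMCL results, is what turns such a question into a simple set comparison rather than a fresh all-against-all search.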

    Superconductivity in the Hubbard model with correlated hopping: Slave-boson study

    Slave-boson mean-field studies of the ground state of the Hubbard model with correlated hopping were performed. The approach qualitatively recovers the exact results for the case where the hopping integral t equals the correlated hopping integral X. The phase diagram for the strongly correlated state with only singly occupied sites, the weakly correlated state, in which single and double occupation are allowed, and the superconducting state was determined for all values of X and all electron concentrations n. For the half-filled band (n = 1), a direct transition from the superconductor to the Mott insulator was found. In the region of strong correlations the superconducting solution is stable for n close to 1, in contrast to the case of weak correlations, in which superconductivity occurs for n close to 0 and n close to 2. We also found that strong correlations change the characteristics of the superconducting phase; for example, the gap in the excitation spectrum has a nonexponential dependence close to the phase transition point. Comment: 13 pages, 24 PostScript figures (in 12 files)

    Charge dynamics in the Mott insulating phase of the ionic Hubbard model

    We extend to charge and bond operators the transformation that maps the ionic Hubbard model at half filling onto an effective spin Hamiltonian. Using these operators, we calculate the amplitude of the charge density wave in different dimensions. In one dimension, the charge-charge correlations at large distance $d$ decay as $1/(d^3 \ln^{3/2} d)$ despite the presence of a charge gap, as a consequence of the remaining charge-spin coupling. Bond-bond correlations decay as $(-1)^d/(d \ln^{3/2} d)$, as in the usual Hubbard model. Comment: 4 pages, no figures, submitted to Phys. Rev. B; printing errors corrected and some clarifications added