12,923 research outputs found

    A parallel algorithm for de novo peptide sequencing

    Protein identification is a main problem in proteomics, the large-scale analysis of proteins. Tandem mass spectrometry (MS/MS) provides an important tool to handle the protein identification problem. Indeed, the spectrometer is capable of ionizing a mixture of peptides, essentially several copies of the same unknown peptide, dissociating every molecule into two fragments called complementary ions, and measuring the mass/charge ratios of the peptides and of their fragments. These measurements are visualized as mass peaks in a mass spectrum. There are two fundamental approaches to interpreting the spectra. The first approach is to search a database to find the peptides that match the MS/MS spectra. This database search approach is effective for known proteins, but does not allow novel proteins to be detected. This second task can be handled by de novo sequencing, which computes the amino acid sequence of the peptides directly from their MS/MS spectra. In the de novo sequencing problem one knows the peptide mas
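As a rough illustration of the complementary-ion idea in the abstract above, the following sketch computes the two singly charged fragment m/z values produced by one cleavage of a peptide. The b/y ion naming, the function names, and the choice of example residues are assumptions made for illustration and are not taken from the paper; the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in daltons for a few amino acids.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "V": 99.06841,
    "L": 113.08406,
}
WATER = 18.01056   # mass of H2O added to the intact peptide
PROTON = 1.00728   # mass of the charge-carrying proton


def peptide_mass(peptide):
    """Neutral monoisotopic mass of the intact peptide."""
    return sum(RESIDUE_MASS[a] for a in peptide) + WATER


def fragment_mz(peptide, cut):
    """Singly charged (prefix, suffix) m/z for a cleavage after `cut` residues.

    The prefix fragment is modelled as a b ion and the suffix as a y ion
    (an assumption of this sketch).
    """
    prefix = sum(RESIDUE_MASS[a] for a in peptide[:cut])
    suffix = sum(RESIDUE_MASS[a] for a in peptide[cut:])
    return prefix + PROTON, suffix + WATER + PROTON


if __name__ == "__main__":
    pep = "GASPVL"
    for cut in range(1, len(pep)):
        b, y = fragment_mz(pep, cut)
        print(cut, round(b, 4), round(y, 4))
```

Under these assumptions, the two m/z values from any single cleavage always sum to the peptide mass plus two proton masses, which is what makes the fragments "complementary".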

    Protein Sequencing with an Adaptive Genetic Algorithm from Tandem Mass Spectrometry

    In proteomics, only the de novo peptide sequencing approach allows a partial amino acid sequence of a peptide to be found from an MS/MS spectrum. This article presents preliminary work on discovering a complete protein sequence from spectral data (MS and MS/MS spectra). For the moment, our approach uses only MS spectra. A Genetic Algorithm (GA) has been designed with a new evaluation function that works directly on a complete MS spectrum as input, rather than on a mass list as other methods using this kind of data do. The monoisotopic peak extraction step, which requires human intervention, is thus eliminated. The goal of this approach is to discover the sequence of unknown proteins and to allow a better understanding of the differences between experimental proteins and proteins from databases

    The impact of sequence database choice on metaproteomic results in gut microbiota studies

    Background: Elucidating the role of gut microbiota in physiological and pathological processes has recently emerged as a key research aim in life sciences. In this respect, metaproteomics, the study of the whole protein complement of a microbial community, can provide a unique contribution by revealing which functions are actually being expressed by specific microbial taxa. However, its wide application to gut microbiota research has been hindered by challenges in data analysis, especially related to the choice of the proper sequence databases for protein identification. Results: Here, we present a systematic investigation of variables concerning database construction and annotation and evaluate their impact on human and mouse gut metaproteomic results. We found that both publicly available and experimental metagenomic databases lead to the identification of unique peptide assortments, suggesting parallel database searches as a means to gain more complete information. In particular, the contribution of experimental metagenomic databases was revealed to be mandatory when dealing with mouse samples. Moreover, the use of a "merged" database, containing all metagenomic sequences from the population under study, was found to be generally preferable to the use of sample-matched databases. We also observed that taxonomic and functional results are strongly database-dependent, in particular when analyzing the mouse gut microbiota. As a striking example, the Firmicutes/Bacteroidetes ratio varied up to tenfold depending on the database used. Finally, assembling reads into longer contigs provided significant advantages in terms of functional annotation yields. Conclusions: This study helps to identify host- and database-specific biases which need to be taken into account in a metaproteomic experiment, providing meaningful insights on how to design gut microbiota studies and to perform metaproteomic data analysis. In particular, the use of multiple databases and annotation tools has to be encouraged, even though this requires appropriate bioinformatic resources

    PARPST: a PARallel algorithm to find peptide sequence tags

    Background: Protein identification is one of the most challenging problems in proteomics. Tandem mass spectrometry provides an important tool to handle the protein identification problem. Results: We developed a work-efficient parallel algorithm for the peptide sequence tag problem. The algorithm runs on the concurrent-read, exclusive-write PRAM in O(n) time using log n processors, where n is the number of mass peaks in the spectrum. The algorithm is able to find all the sequence tags having a score greater than a parameter or all the sequence tags of maximum length. Our tests on 1507 spectra in the Open Proteomics Database showed that our algorithm is efficient and effective, since it achieves results comparable to other methods. Conclusions: The proposed algorithm can be used to speed up database searching or to identify post-translational modifications by comparing the homology of the sequence tags found with the sequences in the biological database
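To make the peptide sequence tag problem more concrete, here is a minimal sequential sketch, not the parallel PRAM algorithm described in the abstract. It assumes that two peaks are linked when their mass difference matches a residue mass within a tolerance, and it returns every tag of maximum length. The function name `longest_tags`, the tolerance value, and the reduced residue table are all hypothetical choices for illustration.

```python
# Monoisotopic residue masses in daltons (a reduced table for illustration).
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "V": 99.06841,
    "L": 113.08406,
}


def longest_tags(peaks, tol=0.02):
    """Return all maximum-length sequence tags readable from `peaks`.

    Two peaks are linked by residue `aa` when their mass difference is
    within `tol` of that residue's mass. Dynamic programming over the
    sorted peaks keeps, for each peak, every longest tag ending there.
    """
    peaks = sorted(peaks)
    tags = [{""} for _ in peaks]  # tags[j]: longest tags ending at peak j
    for j, mass_j in enumerate(peaks):
        for i in range(j):
            diff = mass_j - peaks[i]
            for aa, mass in RESIDUE_MASS.items():
                if abs(diff - mass) > tol:
                    continue
                candidates = {t + aa for t in tags[i]}
                current_len = len(next(iter(tags[j])))
                new_len = len(next(iter(candidates)))
                if new_len > current_len:
                    tags[j] = candidates
                elif new_len == current_len:
                    tags[j] |= candidates
    best = max((len(next(iter(t))) for t in tags), default=0)
    if best == 0:
        return []
    return sorted({t for ts in tags for t in ts if len(t) == best})


if __name__ == "__main__":
    # A synthetic ladder: start peak, then +G, +A, +S, plus one noise peak.
    ladder = [100.0, 157.02146, 228.05857, 315.0906, 500.0]
    print(longest_tags(ladder))
```

This sketch takes quadratic time in the number of peaks and ignores scoring; it only shows how tags arise from mass differences between peaks.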

    Coordinated RNA-Seq and peptidomics identify neuropeptides and G-protein coupled receptors (GPCRs) in the large pine weevil Hylobius abietis, a major forestry pest

    Hylobius abietis (Linnaeus), or large pine weevil (Coleoptera, Curculionidae), is a pest of European coniferous forests. In order to gain understanding of the functional physiology of this species, we have assembled a de novo transcriptome of H. abietis from sequence data obtained by Next Generation Sequencing. In particular, we have identified genes encoding neuropeptides, peptide hormones and their putative G-protein coupled receptors (GPCRs) to gain insights into neuropeptide-modulated processes. The transcriptome was assembled de novo from pooled paired-end sequence reads obtained from RNA from whole adults, gut and central nervous system tissue samples. Data analysis was performed on the transcripts obtained from the assembly, including annotation, gene ontology and functional assignment as well as transcriptome completeness assessment and KEGG pathway analysis. Pipelines were created using bioinformatics tools and techniques for prediction and identification of neuropeptides and neuropeptide receptors. Peptidomic analysis was also carried out using a combination of MALDI-TOF and Q-Exactive Orbitrap mass spectrometry to confirm the identified neuropeptides. 41 putative neuropeptide families were identified in H. abietis, including Adipokinetic hormone (AKH), CAPA and DH31. Neuropeptide F, which has not yet been identified in the model beetle T. castaneum, was identified. Additionally, 24 putative neuropeptide and 9 leucine-rich repeat containing G protein coupled receptor-encoding transcripts were determined using both alignment and non-alignment methods. This information, submitted to the NCBI sequence read archive repository (SRA accession: SRP133355), can now be used to inform understanding of neuropeptide-modulated physiology and behaviour in H. abietis, and to develop specific neuropeptide-based tools for H. abietis control

    Counting approximately-shortest paths in directed acyclic graphs

    Given a directed acyclic graph with positive edge weights, two vertices s and t, and a threshold weight L, we present a fully polynomial-time approximation scheme for the problem of counting the s-t paths of length at most L. We extend the algorithm to the case of two (or more) instances of the same problem. That is, given two graphs that have the same vertices and edges and differ only in edge weights, and given two threshold weights L_1 and L_2, we show how to approximately count the s-t paths that have length at most L_1 in the first graph and length at most L_2 in the second graph. We believe that our algorithms should find application in counting approximate solutions of related optimization problems, where finding an (optimum) solution can be reduced to the computation of a shortest path in a purpose-built auxiliary graph
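For intuition about the counting problem stated above, the sketch below counts the s-t paths of length at most L exactly, using dynamic programming over a topological order. This is a simple exact baseline, not the approximation scheme from the paper: its running time depends on how many distinct path lengths occur, which can be very large. The function name `count_paths_within` and the edge-list input format are assumptions made for illustration.

```python
from collections import defaultdict, deque


def count_paths_within(n, edges, s, t, limit):
    """Count s-t paths of total weight <= limit in a DAG on vertices 0..n-1.

    `edges` is a list of (u, v, weight) triples with positive weights.
    """
    adj = defaultdict(list)
    indeg = [0] * n
    for u, v, w in edges:
        adj[u].append((v, w))
        indeg[v] += 1

    # Topological order via Kahn's algorithm.
    order = []
    queue = deque(v for v in range(n) if indeg[v] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)

    # counts[v][d] = number of s-v paths of total weight exactly d (d <= limit).
    counts = [defaultdict(int) for _ in range(n)]
    counts[s][0] = 1
    for u in order:
        for d, c in counts[u].items():
            for v, w in adj[u]:
                if d + w <= limit:
                    counts[v][d + w] += c
    return sum(counts[t].values())


if __name__ == "__main__":
    example_edges = [(0, 1, 1), (1, 3, 1), (0, 2, 2), (2, 3, 2), (0, 3, 5), (1, 2, 1)]
    print(count_paths_within(4, example_edges, 0, 3, 4))
```

In the example graph there are four s-t paths, with lengths 2, 4, 4 and 5, so a threshold of 4 admits three of them.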
