
    Protein Sequences Identification using NM-tree

    ABSTRACT We have generalized a method for tandem mass spectra interpretation based on the parameterized Hausdorff distance d_HP. In this paper we describe the interpretation of whole protein sequences instead of just peptides (short pieces of proteins). For this purpose, we employ the recently introduced NM-tree to index the database of hypothetical mass spectra for exact or fast approximate search. The NM-tree combines the M-tree with the TriGen algorithm in a way that allows the retrieval precision to be controlled dynamically at query time. We propose a scheme for protein sequence identification using the NM-tree.
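    The abstract does not spell out how d_HP is defined. As a rough illustration only, the sketch below treats a spectrum as a list of peak m/z values and, as an assumption, counts in both directions the peaks that have no counterpart within a tolerance `eps`. This is a tolerance-counting variant of the directed Hausdorff distance. The names `unmatched_peaks` and `tolerant_hausdorff` and the exact formula are hypothetical, not the authors' definition.

```python
from bisect import bisect_left
from typing import Sequence

# Hypothetical stand-in for d_HP: a tolerance-counting, Hausdorff-style distance.
# It is NOT the published definition of the parameterized Hausdorff distance.

def unmatched_peaks(a: Sequence[float], b: Sequence[float], eps: float) -> int:
    """Directed term: number of peaks in `a` with no peak of `b` within `eps`."""
    b_sorted = sorted(b)
    count = 0
    for mz in a:
        i = bisect_left(b_sorted, mz)
        # The nearest peak of b is either just below (i - 1) or at/above (i) mz.
        nearest = min(
            (abs(mz - b_sorted[j]) for j in (i - 1, i) if 0 <= j < len(b_sorted)),
            default=float("inf"),
        )
        if nearest > eps:
            count += 1
    return count

def tolerant_hausdorff(a: Sequence[float], b: Sequence[float], eps: float) -> int:
    """Symmetric version: unmatched peaks counted in both directions."""
    return unmatched_peaks(a, b, eps) + unmatched_peaks(b, a, eps)

# Example: m/z values of two small peak lists, tolerance 0.5.
print(tolerant_hausdorff([114.1, 227.2, 340.3], [114.0, 227.3, 500.0], 0.5))  # 2

# This kind of distance is not a metric: here d(A, C) > d(A, B) + d(B, C).
A, B, C = [0.0], [0.6], [1.2]
print(tolerant_hausdorff(A, B, 1.0), tolerant_hausdorff(B, C, 1.0), tolerant_hausdorff(A, C, 1.0))  # 0 0 2
```

    The second example shows why a tolerance-based spectral distance cannot be fed directly to a metric index such as the M-tree: the triangle inequality can fail. This is the gap that a metricity-controlling step like TriGen is meant to close.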

    Improving the Similarity Search of Tandem Mass Spectra using Metric Access Methods.

    ABSTRACT In biological applications, tandem mass spectrometry is a widely used method for determining protein and peptide sequences from an "in vitro" sample. The sequences are not determined directly; they must be interpreted from the mass spectra produced by the mass spectrometer. This work focuses on a similarity-search approach to mass spectra interpretation, in which the parameterized Hausdorff distance (d_HP) serves as the similarity measure. To make similarity search under d_HP efficient, we employ metric access methods and the TriGen algorithm, which controls the metricity of d_HP. We show that similarity search using d_HP interprets peptide mass spectra more correctly than the cosine similarity commonly cited in the mass spectrometry literature. Moreover, the d_HP search model could be extended to support chemical modifications in the query mass spectra, which is typically a problem for the cosine similarity. Our approach can serve as a coarse filter for any other database approach to mass spectra interpretation.
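    As we understand it, TriGen makes a non-metric (semimetric) distance usable by metric access methods by composing it with a concave modifier. The modifier is chosen so that the triangle inequality holds on all but a tolerated fraction θ of sampled triplets. A tolerance of zero aims at exact search, while a larger tolerance trades precision for speed. The minimal sketch below keeps only one modifier family, the fractional power f_w(x) = x^(1/(1+w)), and binary-searches for the smallest weight w whose triangle error is at most θ. The full algorithm also considers other modifier bases and the intrinsic dimensionality of the result. The helper names `fp_modifier`, `triangle_error`, `find_fp_weight` and the squared-difference test distance are illustrative assumptions, not the authors' code.

```python
import itertools
from typing import Callable, Sequence

# Hypothetical helper names; a simplified sketch of the TriGen idea, not the published implementation.
Dist = Callable[[float, float], float]

def fp_modifier(w: float) -> Callable[[float], float]:
    # Fractional-power modifier f_w(x) = x^(1/(1+w)); w = 0 leaves d unchanged,
    # larger w makes f more concave and the modified distance "more metric".
    return lambda x: x ** (1.0 / (1.0 + w))

def triangle_error(objs: Sequence[float], dist: Dist, f, tol: float = 1e-9) -> float:
    # Fraction of triplets whose modified distances violate the triangle inequality.
    bad = total = 0
    for a, b, c in itertools.combinations(objs, 3):
        x, y, z = sorted((f(dist(a, b)), f(dist(b, c)), f(dist(a, c))))
        total += 1
        if z > x + y + tol:  # the longest side must not exceed the sum of the other two
            bad += 1
    return bad / total if total else 0.0

def find_fp_weight(objs: Sequence[float], dist: Dist, theta: float,
                   w_max: float = 64.0, iters: int = 30) -> float:
    # Binary search for the smallest weight with triangle error <= theta.
    # Relies on the error being non-increasing in w for the FP family.
    if triangle_error(objs, dist, fp_modifier(0.0)) <= theta:
        return 0.0
    lo, hi = 0.0, w_max
    for _ in range(iters):
        mid = (lo + hi) / 2
        if triangle_error(objs, dist, fp_modifier(mid)) <= theta:
            hi = mid
        else:
            lo = mid
    return hi

def sq_dist(a: float, b: float) -> float:
    # Squared difference: a semimetric that breaks the triangle inequality.
    return (a - b) ** 2

points = list(range(0, 40, 3))
print(triangle_error(points, sq_dist, fp_modifier(0.0)))      # 1.0
print(triangle_error(points, sq_dist, fp_modifier(1.0)))      # 0.0
print(round(find_fp_weight(points, sq_dist, theta=0.0), 3))   # 1.0
```

    For these collinear sample points, the squared difference violates the triangle inequality on every triplet. The search recovers w ≈ 1, i.e. f_w(x) = √x, which turns the squared difference back into the ordinary absolute difference. Choosing θ > 0 would stop at a smaller w, which corresponds to approximate rather than exact search.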