6 research outputs found

    Proteomics Reveals Novel Drosophila Seminal Fluid Proteins Transferred at Mating

    Across diverse taxa, seminal fluid proteins (Sfps) transferred at mating affect the reproductive success of both sexes. Such reproductive proteins often evolve under positive selection between species; because of this rapid divergence, Sfps are hypothesized to play a role in speciation by contributing to reproductive isolation between populations. In Drosophila, individual Sfps have been characterized and are known to alter male sperm competitive ability and female post-mating behavior, but a proteomic-scale view of the transferred Sfps has been missing. Here we describe a novel proteomic method that uses whole-organism isotopic labeling to detect transferred Sfps in mated female D. melanogaster. We identified 63 proteins, which were previously unknown to function in reproduction, and confirmed the transfer of dozens of predicted Sfps. Relative quantification of protein abundance revealed that several of these novel Sfps are abundant in seminal fluid. Positive selection and tandem gene duplication are the prevailing forces of Sfp evolution, and comparative proteomics with additional species revealed lineage-specific changes in seminal fluid content. We also report a proteomics-based gene discovery method that uncovered 19 previously unannotated genes in D. melanogaster. Our results demonstrate an experimental method to identify transferred proteins in any system that is amenable to isotopic labeling, and they underscore the power of combining proteomic and evolutionary analyses to shed light on the complex process of Drosophila reproduction.

    Filtering Methods for Mass Spectrometry-based Peptide Identification Processes

    Tandem mass spectrometry (MS/MS) is a powerful tool for identifying peptide sequences. In a typical experiment, incorrect peptide identifications may result from noise in the MS/MS spectra and from the low quality of the spectra. Filtering methods are widely used to remove the noise and improve the quality of the spectra before the subsequent spectrum identification process. However, existing filtering methods often use features with empirically assigned weights. These weights may not reflect the reality that the contribution of each feature may vary from dataset to dataset. Therefore, filtering methods that can adapt to different datasets have the potential to improve peptide identification results. This thesis proposes two adaptive filtering methods, denoising and quality assessment, both of which improve the efficiency and effectiveness of peptide identification. First, the denoising approach employs an adaptive method for picking signal peaks that is more suitable for the datasets of interest. By applying the approach to two tandem mass spectra datasets, about 66% of peaks (likely noise peaks) can be removed. The number of peptides subsequently identified on those datasets increased by 14% and 23%, respectively, compared to previous work (Ding et al., 2009a). Second, the quality assessment method estimates the probabilities of spectra being high quality based on quality assessments of the individual features. The probabilities are estimated by solving a constrained optimization problem. Experimental results on two datasets illustrate that searching only the high-quality tandem spectra determined using this method saves about 56% and 62% of database searching time while losing 9% of high-quality spectra. Finally, the thesis suggests future research directions, including feature selection and clustering of peptides.

    Pre-processing of tandem mass spectra using machine learning methods

    Protein identification has become increasingly helpful in the diagnosis and treatment of many diseases, such as cancer, heart disease and HIV. Tandem mass spectrometry is a powerful tool for protein identification. In a typical experiment, proteins are broken into small amino acid oligomers called peptides. By determining the amino acid sequence of several peptides of a protein, its whole amino acid sequence can be inferred. Therefore, peptide identification is the first step and a central issue for protein identification. Tandem mass spectrometers can produce a large number of tandem mass spectra which are used for peptide identification. Two issues should be addressed to improve the performance of current peptide identification algorithms. Firstly, nearly all spectra are noise-contaminated. As a result, the accuracy of peptide identification algorithms may suffer from the noise in spectra. Secondly, the majority of spectra are not identifiable because they are of too poor quality. Therefore, much time is wasted attempting to identify these unidentifiable spectra. The goal of this research is to design spectrum pre-processing algorithms to both speed up and improve the reliability of peptide identification from tandem mass spectra. Firstly, as a tandem mass spectrum is a one-dimensional signal consisting of dozens to hundreds of peaks, and the majority of peaks are noise peaks, a spectrum denoising algorithm is proposed to remove most noise peaks from spectra. Experimental results show that our denoising algorithm can remove about 69% of peaks, which are likely noise peaks in a spectrum. At the same time, the number of spectra that can be identified by the Mascot algorithm increases by 31% and 14% for two tandem mass spectrum datasets. Next, a two-stage recursive feature elimination based on support vector machines (SVM-RFE) and a sparse logistic regression method are proposed to select the most relevant features to describe the quality of tandem mass spectra. Our methods can effectively select the most relevant features, as measured by the performance of classifiers trained with different numbers of features. Thirdly, both supervised and unsupervised machine learning methods are used for the quality assessment of tandem mass spectra. A supervised classifier (a support vector machine) can be trained to remove more than 90% of poor-quality spectra without removing more than 10% of high-quality spectra. Clustering methods such as model-based clustering are also used for quality assessment to remove the need for a labeled training dataset, and they show promising results.

    De novo peptide sequencing methods for tandem mass spectra

    De novo peptide sequencing from MS/MS spectra has become of primary importance in proteomics. It provides essential information for studies of protein structure and function. With the availability of various MS/MS spectra, many computational methods have been developed to infer peptide sequences from them. However, current de novo peptide sequencing methods still have limitations. Some major ones include a lack of suitable models reflecting MS/MS spectra, limited information extracted from MS/MS spectra, and the inefficient use of multiple spectra. This thesis addresses some of the limitations with a series of novel computational methods designed for various MS/MS spectra and their combinations. The main content of the thesis starts with a comprehensive review of recent developments in de novo peptide sequencing methods, followed by two novel methods for single-spectrum sequencing problems, and then presents two paired-spectra sequencing methods. The first chapter introduces relevant background information, objectives of the study, and the structure of the thesis. After that, a comprehensive review of de novo peptide sequencing methods is given. It summarizes recent developments of computational methods for various experimental spectra, compares and analyzes their advantages and disadvantages, and points out some future research directions. Building on these potential research directions, the thesis next presents two novel methods designed for higher-energy collisional dissociation (HCD) spectra and electron capture dissociation (ECD) (or electron transfer dissociation (ETD)) spectra, respectively. These methods apply new spectrum graph models with multiple types of edges, integrate amino acid combination (AAC) information and peptide tags, and consider spectrum-specific information to suit different spectra. After that, the multiple-spectra sequencing problem is studied. A framework for de novo peptide sequencing of multiple spectra is given with applications to two different spectra pairs. One pair consists of spectra that are complementary to each other, and the other consists of similar spectra with property differences. These methods include effective spectra merging criteria and parent mass correction steps, and modify the previously proposed graph models to fit the merged spectra. Experiments on several experimental MS/MS spectra datasets and dataset pairs show the advantages of the proposed methods in terms of peptide sequencing accuracy. Finally, conclusions and future work directions are given at the end of the thesis. To summarize the work in the thesis, a series of novel computational methods for de novo peptide sequencing are proposed. These methods target different types of MS/MS spectra and their combinations. Experimental results show that the proposed methods are either better than existing competing methods or fill gaps in the suite of currently available methods.
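
    The spectrum graph idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the thesis's model, which uses multiple edge types, AAC information and peptide tags; it is only the textbook single-residue version, in which two peak masses are connected when their difference matches an amino acid residue mass. The function names, the tolerance value and the example masses are assumptions made for illustration, and the input is assumed to be a list of prefix masses that starts at 0.

```python
# Monoisotopic residue masses in daltons (L stands for both L and I).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def build_spectrum_graph(masses, tol=0.01):
    """Return sorted node masses and edges {i: [(j, residue), ...]}.

    An edge i -> j is added when nodes[j] - nodes[i] is within `tol`
    (hypothetical default) of a single residue mass.
    """
    nodes = sorted(masses)
    edges = {i: [] for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            diff = nodes[j] - nodes[i]
            for residue, mass in RESIDUE_MASS.items():
                if abs(diff - mass) <= tol:
                    edges[i].append((j, residue))
    return nodes, edges


def enumerate_sequences(nodes, edges):
    """List every residue string spelled by a path from first to last node."""
    last = len(nodes) - 1
    results = []

    def walk(i, prefix):
        if i == last:
            results.append(prefix)
            return
        for j, residue in edges[i]:
            walk(j, prefix + residue)

    walk(0, "")
    return results


# Hypothetical prefix masses for the peptide S-V-T plus one noise peak at 150.0.
example_masses = [0.0, 87.03203, 150.0, 186.10044, 287.14812]
nodes, edges = build_spectrum_graph(example_masses)
sequences = enumerate_sequences(nodes, edges)
```

    In this toy input the noise peak gains no edges, so the only complete path spells the original peptide. Real spectra contain mixed ion types, missing peaks and mass errors, which is why richer graph models such as those described in the abstract are needed.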