1,526 research outputs found

    An Efficient Algorithm for Clustering of Large-Scale Mass Spectrometry Data

    High-throughput spectrometers are capable of producing data sets containing thousands of spectra for a single biological sample. These data sets contain a substantial amount of redundancy from peptides that may get selected multiple times in an LC-MS/MS experiment. In this paper, we present an efficient algorithm, CAMS (Clustering Algorithm for Mass Spectra), for clustering mass spectrometry data which increases both the sensitivity and confidence of spectral assignment. CAMS utilizes a novel metric, called F-set, that allows accurate identification of similar spectra. A graph theoretic framework is defined that allows the F-set metric to be used efficiently for accurate cluster identification. The accuracy of the algorithm is tested on real HCD and CID data sets with varying amounts of peptides. Our experiments show that the proposed algorithm is able to cluster spectra with very high accuracy in a reasonable amount of time for large spectral data sets. Thus, the algorithm is able to decrease the computational time by compressing the data sets while increasing the throughput of the data by interpreting low S/N spectra. Comment: 4 pages, 4 figures, Bioinformatics and Biomedicine (BIBM), 2012 IEEE International Conference o
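
    The graph-based clustering idea in this abstract can be illustrated with a minimal sketch. It is not the CAMS algorithm: the abstract does not define the F-set metric, so plain cosine similarity stands in for it, and the function names, the dictionary representation of a spectrum, and the 0.8 threshold are all assumptions made for illustration. Spectra are graph nodes, an edge joins any pair whose similarity reaches the threshold, and each connected component is reported as one cluster.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two spectra stored as {binned m/z: intensity}.

    Stand-in metric only; the F-set metric is not specified in the abstract.
    """
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_spectra(spectra, threshold=0.8):
    """Return clusters (lists of spectrum indices) as connected components.

    Hypothetical helper: the threshold value is an assumption.
    """
    parent = list(range(len(spectra)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Add an edge (merge components) for every sufficiently similar pair.
    for i, j in combinations(range(len(spectra)), 2):
        if cosine_similarity(spectra[i], spectra[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(spectra)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


example = [
    {100: 1.0, 200: 0.5},  # two near-identical spectra ...
    {100: 0.9, 200: 0.6},
    {300: 1.0},            # ... and one unrelated spectrum
]
print(cluster_spectra(example))
```

    The all-pairs comparison here is quadratic in the number of spectra; a practical tool for large data sets would restrict comparisons first, for example to spectra with similar precursor mass.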

    Algorithms for peptide and PTM identification using Tandem mass spectrometry

    Ph.D (Doctor of Philosophy)

    Integration of Mass Spectrometry Data for Structural Biology

    Mass spectrometry (MS) is increasingly being used to probe the structure and dynamics of proteins and the complexes they form with other macromolecules. There are now several specialized MS methods, each with unique sample preparation, data acquisition, and data processing protocols. Collectively, these methods are referred to as structural MS and include cross-linking, hydrogen-deuterium exchange, hydroxyl radical footprinting, native, ion mobility, and top-down MS. Each of these provides a unique type of structural information, ranging from composition and stoichiometry through to residue level proximity and solvent accessibility. Structural MS has proved particularly beneficial in studying protein classes for which analysis by classic structural biology techniques proves challenging such as glycosylated or intrinsically disordered proteins. To capture the structural details for a particular system, especially larger multiprotein complexes, more than one structural MS method with other structural and biophysical techniques is often required. Key to integrating these diverse data are computational strategies and software solutions to facilitate this process. We provide a background to the structural MS methods and briefly summarize other structural methods and how these are combined with MS. We then describe current state of the art approaches for the integration of structural MS data for structural biology. We quantify how often these methods are used together and provide examples where such combinations have been fruitful. To illustrate the power of integrative approaches, we discuss progress in solving the structures of the proteasome and the nuclear pore complex. We also discuss how information from structural MS, particularly pertaining to protein dynamics, is not currently utilized in integrative workflows and how such information can provide a more accurate picture of the systems studied. 
    We conclude by discussing new developments in the MS and computational fields that will further enable in-cell structural studies.

    Phosphoproteomic Profiling of In Vivo Signaling in Liver by the Mammalian Target of Rapamycin Complex 1 (mTORC1)

    Our understanding of signal transduction networks in the physiological context of an organism remains limited, partly due to the technical challenge of identifying serine/threonine phosphorylated peptides from complex tissue samples. In the present study, we focused on signaling through the mammalian target of rapamycin (mTOR) complex 1 (mTORC1), which is at the center of a nutrient- and growth factor-responsive cell signaling network. Though studied extensively, the mechanisms involved in many mTORC1 biological functions remain poorly understood. We developed a phosphoproteomic strategy to purify, enrich and identify phosphopeptides from rat liver homogenates. Using the anticancer drug rapamycin, the only known target of which is mTORC1, we characterized signaling in liver from rats in which the complex was maximally activated by refeeding following 48 hr of starvation. Using protein and peptide fractionation methods, TiO2 affinity purification of phosphopeptides and mass spectrometry, we reproducibly identified and quantified over four thousand phosphopeptides. Along with 5 known rapamycin-sensitive phosphorylation events, we identified 62 new rapamycin-responsive candidate phosphorylation sites. Among these were PRAS40, gephyrin, and AMP kinase 2. We observed similar proportions of increased and reduced phosphorylation in response to rapamycin. Gene ontology analysis revealed over-representation of mTOR pathway components among rapamycin-sensitive phosphopeptide candidates. In addition to identifying potential new mTORC1-mediated phosphorylation events and providing information relevant to the biology of this signaling network, our experimental and analytical approaches indicate the feasibility of large-scale phosphoproteomic profiling of tissue samples to study physiological signaling events in vivo.

    AN ACCURATE AND EFFICIENT ALGORITHM FOR PEPTIDE AND PTM IDENTIFICATION BY TANDEM MASS SPECTROMETRY

    Conformational states of macromolecular assemblies explored by integrative structure calculation

    A detailed description of macromolecular assemblies in multiple conformational states can be very valuable for understanding cellular processes. At present, structural determination of most assemblies in different biologically relevant conformations cannot be achieved by a single technique and thus requires an integrative approach that combines information from multiple sources. Different techniques require different computational methods to allow efficient and accurate data processing and analysis. Here, we summarize the latest advances and future challenges in computational methods that help the interpretation of data from two techniques—mass spectrometry and three-dimensional cryo-electron microscopy (with focus on alignment and classification of heterogeneous subtomograms from cryo-electron tomography). We evaluate how new developments in these two broad fields will lead to further integration with atomic structures to broaden our picture of the dynamic behavior of assemblies in their native environment

    Structure of the human signal peptidase complex reveals the determinants for signal peptide cleavage

    The signal peptidase complex (SPC) is an essential membrane complex in the endoplasmic reticulum (ER), where it removes signal peptides (SPs) from a large variety of secretory pre-proteins with exquisite specificity. Although the determinants of this process have been established empirically, the molecular details of SP recognition and removal remain elusive. Here, we show that the human SPC exists in two functional paralogs with distinct proteolytic subunits. We determined the atomic structures of both paralogs using electron cryo-microscopy and structural proteomics. The active site is formed by a catalytic triad and abuts the ER membrane, where a transmembrane window collectively formed by all subunits locally thins the bilayer. Molecular dynamics simulations indicate that this unique architecture generates specificity for SPs based on the length of their hydrophobic segments

    Parallel algorithms for real-time peptide-spectrum matching

    Tandem mass spectrometry is a powerful experimental tool used in molecular biology to determine the composition of protein mixtures. It has become a standard technique for protein identification. Due to the rapid development of mass spectrometry technology, the instrument can now produce a large number of mass spectra which are used for peptide identification. The increasing data size demands efficient software tools to perform peptide identification. In a tandem mass spectrometry experiment, peptide ion selection algorithms generally select only the most abundant peptide ions for further fragmentation. Because of this, the low-abundance proteins in a sample rarely get identified. To address this problem, researchers developed the notion of a 'dynamic exclusion list', which maintains a list of newly selected peptide ions and ensures that these peptide ions do not get selected again for a certain time. In this way, other peptide ions get more opportunity to be selected and identified, allowing for identification of peptides of lower abundance. However, a better method is to also incorporate the identification results into the 'dynamic exclusion list' approach. In order to do this, a real-time peptide identification algorithm is required. In this thesis, we introduce methods to improve the speed of peptide identification so that the 'dynamic exclusion list' approach can use the peptide identification results without affecting the throughput of the instrument. Our work is based on RT-PSM, a real-time program for peptide-spectrum matching with statistical significance. We profile the speed of RT-PSM and find that the peptide-spectrum scoring module is the most time-consuming portion. Guided by the profiling results, we introduce methods to parallelize the peptide-spectrum scoring algorithm. In this thesis, we propose two parallel algorithms using different technologies. We introduce parallel peptide-spectrum matching using SIMD instructions. We implemented and tested the parallel algorithm on Intel SSE architecture. The test results show that an 18-fold speedup on the entire process is obtained. The second parallel algorithm is developed using NVIDIA CUDA technology. We describe two CUDA kernels based on different algorithms and compare the performance of the two kernels. The more efficient algorithm is integrated into RT-PSM. The time measurement results show that a 190-fold speedup on the scoring module is achieved and a 26-fold speedup on the entire process is obtained. We perform profiling on the CUDA version again to show that the scoring module has been optimized sufficiently to the point where it is no longer the most time-consuming module in the CUDA version of RT-PSM. In addition, we evaluate the feasibility of creating a metric index to reduce the number of candidate peptides. We describe evaluation methods, and show that general indexing methods are unlikely to be feasible for RT-PSM.
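
    The 'dynamic exclusion list' mechanism described in this abstract can be sketched as follows. This is a simplified illustration, not code from RT-PSM or from any instrument vendor: the class name, the `select_precursors` helper, the 30-second exclusion window, the 0.01 m/z tolerance, and the top-N selection rule are all assumptions. Each selected precursor m/z is remembered until its exclusion time expires, so later scans are steered toward less abundant ions.

```python
class DynamicExclusionList:
    """Remembers recently selected precursor m/z values for a fixed time.

    Hypothetical sketch; the window and tolerance defaults are assumptions.
    """

    def __init__(self, exclusion_seconds=30.0, mz_tolerance=0.01):
        self.exclusion_seconds = exclusion_seconds
        self.mz_tolerance = mz_tolerance
        self._entries = []  # list of (mz, expiry_time) pairs

    def is_excluded(self, mz, now):
        # Drop expired entries, then look for a match within the tolerance.
        self._entries = [(m, t) for m, t in self._entries if t > now]
        return any(abs(mz - m) <= self.mz_tolerance for m, _ in self._entries)

    def add(self, mz, now):
        self._entries.append((mz, now + self.exclusion_seconds))


def select_precursors(candidates, exclusion, now, top_n=2):
    """Pick up to top_n most intense (mz, intensity) ions that are not excluded."""
    selected = []
    for mz, _intensity in sorted(candidates, key=lambda c: c[1], reverse=True):
        if len(selected) == top_n:
            break
        if not exclusion.is_excluded(mz, now):
            exclusion.add(mz, now)
            selected.append(mz)
    return selected


exclusion = DynamicExclusionList()
scan = [(500.25, 1000.0), (620.40, 800.0), (710.10, 200.0)]
first = select_precursors(scan, exclusion, now=0.0)    # the two most intense ions
second = select_precursors(scan, exclusion, now=10.0)  # those two are now excluded
third = select_precursors(scan, exclusion, now=40.0)   # all exclusions have expired
print(first, second, third)
```

    The extension the abstract argues for would feed real-time identification results back into this list, for instance by excluding only ions that have already been confidently identified; that feedback step is not modelled here.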

    A parent mass filter algorithm for peptide sequencing from tandem mass spectra

    Master's (Master of Science)