49 research outputs found

    Computational analysis of unassigned high-quality MS/MS spectra in proteomic data sets

    In a typical shotgun proteomics experiment, a significant number of high-quality MS/MS spectra remain “unassigned.” The main focus of this work is to improve our understanding of various sources of unassigned high-quality spectra. To achieve this, we designed an iterative computational approach for more efficient interrogation of MS/MS data. The method involves multiple stages of database searching with different search parameters, spectral library searching, blind searching for modified peptides, and genomic database searching. The method is applied to a large publicly available shotgun proteomic data set. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/77526/1/2712_ftp.pd

    Calorie restriction alters mitochondrial protein acetylation

    Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/72130/1/j.1474-9726.2009.00503.x.pdf http://deepblue.lib.umich.edu/bitstream/2027.42/72130/2/ACEL_503_sm_FigS1.pd

    SAW: A Method to Identify Splicing Events from RNA-Seq Data Based on Splicing Fingerprints

    Splicing event identification is one of the most important issues in the comprehensive analysis of transcription profiles. Recent development of next-generation sequencing technology has generated an extensive profile of alternative splicing. However, while many of these splicing events are between exons that are relatively close on genome sequences, reads generated by RNA-Seq are not limited to alternative splicing between close exons but occur in virtually all splicing events. In this work, a novel method, SAW, was proposed for the identification of all splicing events based on short reads from RNA-Seq. It was observed that short reads not in known gene models are actually absent words from known gene sequences. An efficient method was developed to filter and cluster these short reads by fingerprint fragments of splicing events without aligning short reads to genome sequences. Additionally, the possible splicing sites were determined without alignment against genome sequences. A consensus sequence was then generated for each short read cluster and aligned to the genome sequences. Results demonstrated that this method could identify more than 90% of the known splicing events with a very low false discovery rate, as well as accurately identify a number of novel splicing events between distant exons.

    Metabolites of Purine Nucleoside Phosphorylase (NP) in Serum Have the Potential to Delineate Pancreatic Adenocarcinoma

    Pancreatic Adenocarcinoma (PDAC), the fourth highest cause of cancer-related deaths in the United States, has the most aggressive presentation, resulting in a very short median survival time for the affected patients. Early detection of PDAC is confounded by the lack of specific markers, which has motivated the use of high-throughput molecular approaches to delineate potential biomarkers. To pursue identification of a distinct marker, this study profiled the secretory proteome in 16 PDAC, 2 carcinoma in situ (CIS) and 7 benign patients using label-free mass spectrometry coupled to 1D-SDS-PAGE and Strong Cation-Exchange Chromatography (SCX). A total of 431 proteins were detected, of which 56 were found to be significantly elevated in PDAC. Included in this differential set were Parkinson disease autosomal recessive, early onset 7 (PARK 7) and Alpha Synuclein (aSyn), both of which are known to be pathognomonic to Parkinson's disease, as well as metabolic enzymes like Purine Nucleoside Phosphorylase (NP), which has been exploited as a therapeutic target in cancers. Tissue Microarray analysis confirmed higher expression of aSyn and NP in ductal epithelia of pancreatic tumors compared to benign ducts. Furthermore, the extent of both aSyn and NP staining positively correlated with tumor stage and perineural invasion, while their intensity of staining correlated with the existence of metastatic lesions in the PDAC tissues. From the biomarker perspective, NP protein levels were higher in PDAC sera, and serum levels of its downstream metabolites guanosine and adenosine were able to distinguish PDAC from benign in an unsupervised hierarchical classification model. Overall, this study for the first time describes elevated levels of aSyn in PDAC and highlights the potential of evaluating NP protein expression and levels of its downstream metabolites to develop a multiplex panel for non-invasive detection of PDAC.

    Use of proteomics as a method of enhancing genome annotation.

    While the human genome was sequenced in 2001, its full annotation continues to be a challenging area of research. Our inability to clearly define the location of a gene and its protein products is a major obstacle to completing the annotation. In recent years, alternative splicing (AS) has been recognized as a major biological factor contributing to increased complexity in the genome. It is through AS that a single genomic locus can encode multiple protein products. In the case of the human genome, AS enables an estimated 40,000 known proteins to be produced from the approximately 23,000 gene loci currently annotated. The number of known proteins is still considered only an estimate and does not take into account tissue-specific proteins and post-translational modifications. Numerous computational methods have been developed to identify coding regions in genomic sequence. However, even the best software packages available today are only 70-75% accurate. To overcome this, it is important to combine computational methods with experimental data in order to achieve better annotations. With the advent of high-throughput proteomics, large amounts of tandem mass spectral data are now being generated. This data can be interrogated to determine what proteins were present in the original biological sample analyzed. However, searching mass spectral data using databases of known proteins would overlook potentially novel AS isoforms. To address this concern it is necessary to search a database that contains both known and potential protein sequences. In this thesis we describe the use of established technologies to identify novel exons to previously annotated genes. We search a large repository of mass spectral data collected from human blood against an exhaustive six-frame translation of the human genome. In addition, a subset of the mass spectral data was searched using a database of putative alternatively spliced transcripts constructed from expression data to validate poorly characterized transcripts. Through our six-frame translation searches, we identified 275 novel exons to known genes. In searching our putative transcript database we were able to give proteomics-based support for 78 transcripts for which there was previously only weak expression data evidence. Ph.D. Bioinformatics. Biological Sciences. University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/126687/2/3276157.pd
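For readers unfamiliar with the term, a six-frame translation reads a DNA sequence in three offsets on each strand and translates every reading frame into amino acids. The following is only a minimal sketch of that general idea, assuming the standard genetic code and a plain uppercase ACGT input; the function names are hypothetical and this is not the pipeline used in the thesis.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")
print(frames["+1"])  # MA*
```

In a proteogenomic search, each translated frame would typically be split at stop codons and the resulting fragments used as candidate protein sequences; that step is omitted here.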

    Significance Analysis of Spectral Count Data in Label-free Shotgun Proteomics

    Spectral counting has become a commonly used approach for measuring protein abundance in label-free shotgun proteomics. At the same time, the development of data analysis methods has lagged behind. Currently most studies utilizing spectral counts rely on simple data transforms and post hoc corrections of conventional signal-to-noise ratio statistics. However, these adjustments can neither handle the bias toward high abundance proteins nor deal with the drawbacks due to the limited number of replicates. We present a novel statistical framework (QSpec) for the significance analysis of differential expression with extensions to a variety of experimental design factors and adjustments for protein properties. Using synthetic and real experimental data sets, we show that the proposed method outperforms conventional statistical methods that search for differential expression for individual proteins. We illustrate the flexibility of the model by analyzing a data set with a complicated experimental design involving cellular localization and time course.

    The Spatial Form of Houses Built by Italian Migrants in Post WWII Brisbane, Australia

    The literature reveals that although the study of the relationship between human behavior, activities and built form has addressed physical spatial environments at every scale, from the built environment to built form, micro-scale housing has been neglected. In particular, despite interest in this relationship, direct assessment of the extent to which migrants’ behavior and activities influence, and are influenced by, the spatial form of their houses is still rare. This paper explores the relationship between human behavior, activities and the spatial form of houses built by Italian migrants in post WWII Brisbane. It argues that the spatial form of migrants’ houses was influenced by two factors: the need to perform working and social activities dictated by culture as a way of life, and the urbanization patterns present in migrants’ native and host built environments.

    Comparative Analysis of Different Label-Free Mass Spectrometry Based Protein Abundance Estimates and Their Correlation with RNA-Seq Gene Expression Data

    An increasing number of studies involve integrative analysis of gene and protein expression data taking advantage of new technologies such as next-generation transcriptome sequencing (RNA-Seq) and highly sensitive mass spectrometry (MS) instrumentation. Thus, it becomes interesting to revisit the correlative analysis of gene and protein expression data using more recently generated data sets. Furthermore, within the proteomics community there is a substantial interest in comparing the performance of different label-free quantitative proteomic strategies. Gene expression data can be used as an indirect benchmark for such protein-level comparisons. In this work we use publicly available mouse data to perform a joint analysis of genomic and proteomic data obtained on the same organism. First, we perform a comparative analysis of different label-free protein quantification methods (intensity based and spectral count based and using various associated data normalization steps) using several software tools on the proteomic side. Similarly, we perform correlative analysis of gene expression data derived using microarray and RNA-Seq methods on the genomic side. We also investigate the correlation between gene and protein expression data, and various factors affecting the accuracy of quantitation at both levels. It is observed that spectral count based protein abundance metrics, which are easy to extract from any published data, are comparable to intensity based measures with respect to correlation with gene expression data. The results of this work should be useful for designing robust computational pipelines for extraction and joint analysis of gene and protein expression data in the context of integrative studies
