29 research outputs found
Effective Leveraging of Targeted Search Spaces for Improving Peptide Identification in Tandem Mass Spectrometry Based Proteomics
In shotgun proteomics, peptides are typically identified using
database searching, which involves scoring acquired tandem mass spectra
against peptides derived from standard protein sequence databases
such as UniProt, RefSeq, or Ensembl. In this strategy, the sensitivity
of peptide identification is known to be affected by the size of the
search space. Therefore, creating a targeted sequence database containing
only peptides likely to be present in the analyzed sample can be a
useful technique for improving the sensitivity of peptide identification.
In this study, we describe how targeted peptide databases can be created
based on the frequency of identification in the Global Proteome Machine
Database (GPMDB), the largest publicly available repository of peptide
and protein identification data. We demonstrate that targeted peptide
databases can be easily integrated into existing proteome analysis
workflows and describe a computational strategy for minimizing any
loss of peptide identifications arising from potential search space
incompleteness in the targeted search spaces. We demonstrate the performance
of our workflow using several data sets of varying size and sample
complexity.
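The idea of a frequency-based targeted database can be illustrated with a minimal sketch. It assumes that per-peptide observation counts have already been exported from a repository such as GPMDB; the peptide sequences, the counts, the `min_count` threshold, and the function name are all hypothetical and are not taken from the study.

```python
def build_targeted_db(observation_counts, min_count):
    """Return the set of peptides observed at least `min_count` times.

    `observation_counts` maps peptide sequence -> number of prior
    identifications (hypothetical values, not real GPMDB data).
    """
    return {pep for pep, n in observation_counts.items() if n >= min_count}


# Hypothetical observation counts for four peptides.
counts = {"LVNELTEFAK": 120, "AEFVEVTK": 45, "YLYEIAR": 3, "QTALVELLK": 0}

targeted = build_targeted_db(counts, min_count=10)

# Peptides excluded from the targeted search space. Spectra from these
# peptides cannot be matched by a targeted-only search, which is the
# search space incompleteness the abstract refers to.
remainder = set(counts) - targeted

print(sorted(targeted))
print(sorted(remainder))
```

Raising `min_count` shrinks the search space further but enlarges the excluded set, so the threshold trades sensitivity gains against the risk of missing rarely observed peptides.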
The Spatial Form of Houses Built by Italian Migrants in Post WWII Brisbane, Australia
The literature reveals that although the study of the relationship between human behavior, activities and built form has addressed physical spatial environments at every scale, from the built environment to built form, micro-scale housing has been neglected. Specifically, despite interest in this relationship, direct assessment of the extent to which migrants’ behavior and activities influence, and are influenced by, the spatial form of their houses is still rare in the field. This paper explores the relationship between human behavior, activities and the spatial form of houses built by Italian migrants in post WWII Brisbane. The paper argues that the spatial form of migrants’ houses was influenced by two factors: the need to perform working and social activities dictated by culture as a way of life, and the urbanization patterns present in migrants’ native and host built environments.
Comparative Analysis of Different Label-Free Mass Spectrometry Based Protein Abundance Estimates and Their Correlation with RNA-Seq Gene Expression Data
An increasing number of studies involve integrative analysis of
gene and protein expression data, taking advantage of new technologies
such as next-generation transcriptome sequencing (RNA-Seq) and highly
sensitive mass spectrometry (MS) instrumentation. Thus, it becomes
interesting to revisit the correlative analysis of gene and protein
expression data using more recently generated data sets. Furthermore,
within the proteomics community there is a substantial interest in
comparing the performance of different label-free quantitative proteomic
strategies. Gene expression data can be used as an indirect benchmark
for such protein-level comparisons. In this work we use publicly available
mouse data to perform a joint analysis of genomic and proteomic data
obtained on the same organism. First, we perform a comparative analysis
of different label-free protein quantification methods (intensity-based
and spectral count-based, with various associated data normalization
steps) using several software tools on the proteomic side. Similarly,
we perform correlative analysis of gene expression data derived using
microarray and RNA-Seq methods on the genomic side. We also investigate
the correlation between gene and protein expression data, and various
factors affecting the accuracy of quantitation at both levels. It
is observed that spectral count based protein abundance metrics, which
are easy to extract from any published data, are comparable to intensity
based measures with respect to correlation with gene expression data.
The results of this work should be useful for designing robust computational
pipelines for extraction and joint analysis of gene and protein expression
data in the context of integrative studies.
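As a rough illustration of the kind of comparison described, the sketch below computes a Spearman rank correlation between a gene expression vector and two protein abundance estimates. The abstract does not state which correlation measure was used, so the choice of Spearman is an assumption, and all numbers are invented for illustration.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for five genes/proteins (not real data).
rna_seq = [5.0, 20.0, 1.0, 50.0, 10.0]      # transcript abundance
spectral_counts = [3, 12, 1, 40, 2]          # spectral count estimate
intensities = [2e6, 9e6, 5e5, 3e7, 8e6]      # MS1 intensity estimate

print(spearman(rna_seq, spectral_counts))
print(spearman(rna_seq, intensities))
```

A rank-based measure is used here because spectral counts and intensities live on very different scales; only their ordering relative to transcript abundance is compared.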
Utility of RNA-seq and GPMDB Protein Observation Frequency for Improving the Sensitivity of Protein Identification by Tandem MS
Tandem mass spectrometry (MS/MS) followed by database search is
the method of choice for protein identification in proteomic studies.
Database searching methods employ spectral matching algorithms and
statistical models to identify and quantify proteins in a sample.
In general, these methods do not utilize any information other than
spectral data for protein identification. However, considering the
wealth of external data available for many biological systems, analysis
methods can incorporate such information to improve the sensitivity
of protein identification. In this study, we present a method to utilize
Global Proteome Machine Database identification frequencies and RNA-seq
transcript abundances to adjust the confidence scores of protein identifications.
The method described is particularly useful for samples with low-to-moderate
proteome coverage (i.e., <2000–3000 proteins), where we
observe up to an 8% improvement in the number of proteins identified
at a 1% false discovery rate.
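One simple way to picture adjusting a confidence score with external evidence is a Bayesian odds update. The sketch below is an illustrative assumption, not the scoring model used in the study: it treats the external evidence (for example, a high prior observation frequency or transcript abundance) as a hypothetical likelihood ratio applied to the identification probability reported by the search pipeline.

```python
def adjust_probability(p, likelihood_ratio):
    """Update an identification probability `p` with external evidence.

    `likelihood_ratio` > 1 means the external evidence favors the protein
    being present; < 1 means it argues against. Both the function and the
    ratio values below are hypothetical.
    """
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    odds = p / (1.0 - p) * likelihood_ratio
    return odds / (1.0 + odds)


# A borderline identification supported by external evidence.
print(adjust_probability(0.5, 3.0))

# The same identification when external evidence argues against it.
print(adjust_probability(0.5, 1.0 / 3.0))
```

With a likelihood ratio of 1 the probability is unchanged, so proteins without external information keep their original scores under this scheme.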
SAINT-MS1: Protein–Protein Interaction Scoring Using Label-free Intensity Data in Affinity Purification-Mass Spectrometry Experiments
We present a statistical method, SAINT-MS1, for scoring protein–protein
interactions based on the label-free MS1 intensity data from affinity
purification-mass spectrometry (AP-MS) experiments. The method is
an extension of Significance Analysis of INTeractome (SAINT), a model-based
method previously developed for spectral count data. We reformulated
the statistical model for log-transformed intensity data, including
adequate treatment of missing observations, that is, interactions
identified in some but not all replicate purifications. We demonstrate
the performance of SAINT-MS1 using two recently published data sets:
a small LTQ-Orbitrap data set with three replicate purifications of
a single human bait protein and control purifications, and a larger Drosophila
data set targeting the insulin receptor/target of rapamycin signaling
pathway, generated using an LTQ-FT instrument. Using the Drosophila
data set, we also compare and discuss the performance of SAINT analysis
based on spectral count and MS1 intensity data in terms of the recovery
of orthologous and literature-curated interactions. Given rapid advances
in high mass accuracy instrumentation and intensity-based label-free
quantification software, we expect that SAINT-MS1 will become a useful
tool allowing improved detection of protein interactions in label-free
AP-MS data, especially in the low abundance range.
Implementing the MSFragger Search Engine as a Node in Proteome Discoverer
Here, we describe the implementation of the fast proteomics search
engine MSFragger as a processing node in the widely used Proteome
Discoverer (PD) software platform. PeptideProphet (via the Philosopher
toolkit) is also implemented as an additional PD node to allow validation
of MSFragger open (mass-tolerant) search results. These two nodes,
along with the existing Percolator validation module, allow users
to employ different search strategies and conveniently inspect search
results through PD. Our results demonstrate improved numbers
of PSMs, peptides, and proteins identified by MSFragger coupled with
Percolator, and significantly faster search speeds, compared to the conventional
SEQUEST/Percolator PD workflows. The MSFragger-PD node is available
at https://github.com/nesvilab/PD-Nodes/releases/