ProSight PTM 2.0: improved protein identification and characterization for top down mass spectrometry
ProSight PTM 2.0 (http://prosightptm2.scs.uiuc.edu) is the next generation of the ProSight PTM web-based system for the identification and characterization of proteins using top down tandem mass spectrometry. It introduces an entirely new data-driven interface, an integrated Sequence Gazer for protein characterization, support for fixed modifications and terminal modifications, and improved support for multiple precursor ions (multiplexing). Furthermore, it supports data import and export for local analysis and collaboration.
The complexity of bioinformatics: techniques for addressing the combinatorial explosion in proteomics and genomics
The last decade has seen an explosion of data arising from the development and proliferation of high-throughput data gathering and analysis pipelines. In order to transform this data into useful hypotheses and conclusions, it is necessary to determine which of it is pertinent to the problem being studied, and sometimes, conversely, which of many hypotheses being considered is best supported by the data at hand. In particular, the field of proteomics often grapples with this challenge, due to being at the confluence of a large number of high-throughput data pipelines. This work presents a series of computational frameworks that address this challenge in a manner that is both computationally efficient and biologically informative, acting as selective filters for the vast amount of data being processed.
A system is first presented to vastly reduce the potential combinatoric complexity of post-translational modifications (PTMs) and coding single nucleotide polymorphisms (cSNPs) for Top Down proteomics. Top Down proteomics is uniquely susceptible to a combinatorial explosion; as sequence length increases, the number of potential combinations of mass shift-inducing sequence features increases exponentially. This may be addressed to some extent by the process of shotgun annotation, where combinations of known PTMs and cSNPs are considered. This is in contrast to the rule-based variable modification approach prevalent in Bottom Up proteomics, where all residues of a given type are considered to be potentially modified in a specified manner. However, as high-throughput annotation pipelines vastly increase the number of known modifications and polymorphisms, the number of their combinations grows exponentially and eventually becomes unmanageable. It becomes necessary to restrict the potential combination space in a manner that does not unduly impinge on the identification and characterization capabilities of shotgun annotation. Built as part of a general framework for sequence transformation, the system being presented utilizes a genetic algorithm to identify a group of PTMs and cSNPs that is most suitable for inclusion in a shotgun-annotated sequence database. Additionally, a number of other advancements are presented in the bioinformatics of Top Down proteomics, including a cluster implementation of the ProSight search engine, and a design plan for the next generation of ProSight, built using the principles of online sequence transformation and optimization. This addresses the combinatorial explosion by providing means of efficiently restricting the search space, minimizing the amount of duplicated effort, and leveraging modern processor design to maximize throughput.
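The growth described above can be made concrete with a minimal sketch. It assumes that each annotated site varies independently and is either unmodified or carries exactly one of its known alternatives (a PTM or a cSNP). The function name and the example site counts are hypothetical illustrations and are not taken from ProSight or from the thesis.

```python
from math import prod

def count_candidate_forms(alternatives_per_site):
    """Count sequence variants when every site is either unmodified
    or carries exactly one of its known alternatives (PTM or cSNP).

    Assumption for illustration only: sites vary independently.
    """
    return prod(1 + k for k in alternatives_per_site)

# Hypothetical protein in which every annotated site has one known modification.
for n_sites in (5, 10, 20, 30):
    print(n_sites, count_candidate_forms([1] * n_sites))
```

Under these assumptions each additional annotated site at least doubles the number of candidate forms, which is why restricting the set of features admitted to the database matters more than speeding up any single comparison.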
Second, genetic algorithms are applied to the problem of de novo peptide sequencing in Bottom Up proteomics by means of ultra-high-resolution mass spectrometry. Rather than detecting large numbers of less accurate fragment peaks as is presently typical in Bottom Up proteomics, detecting fragment ions at high resolution results in smaller numbers of highly accurate monoisotopic masses after deisotoping. This allows potential de novo sequence solutions to have exceedingly low fragment mass degeneracy. Presently, algorithms for de novo peptide sequencing that fully take advantage of this capability have been lacking. A system is presented for incorporating numerous metrics of solution quality simultaneously to evolve a sequence solution that best fits available data. The nature of proteomic data and its amenability to analysis by means of genetic algorithms is discussed. This system demonstrates highly confident automatic de novo peptide sequencing using a small number of confident fragment masses, potentially measured at the limits of detection.
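One ingredient that a sequence-evolving approach of this kind needs is a score for how well a candidate peptide explains the measured fragment masses. The sketch below is only an illustration of that idea, not the system described in the thesis: it counts observed masses that fall within a ppm tolerance of a theoretical singly charged b ion. The function names, the 5 ppm default, and the restriction to b ions and to a subset of residues are assumptions made for brevity.

```python
# Monoisotopic residue masses in Da (standard values; subset for brevity).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "F": 147.06841,
}
PROTON = 1.007276  # mass of a proton in Da

def b_ion_masses(peptide):
    """Singly charged b-ion m/z values for prefixes b1 .. b(n-1)."""
    masses, total = [], PROTON
    for residue in peptide[:-1]:
        total += RESIDUE_MASS[residue]
        masses.append(total)
    return masses

def matched_fragments(peptide, observed, tol_ppm=5.0):
    """Number of observed masses explained by a theoretical b ion
    of the candidate peptide within tol_ppm (hypothetical scoring term)."""
    theoretical = b_ion_masses(peptide)
    return sum(
        any(abs(obs - theo) / theo * 1e6 <= tol_ppm for theo in theoretical)
        for obs in observed
    )

# Two "observed" fragments simulated from the b2 and b3 ions of GAVLK.
observed = b_ion_masses("GAVLK")[1:3]
for candidate in ("GAVLK", "GVALK"):
    print(candidate, matched_fragments(candidate, observed))
```

With a tight tolerance, even two accurate fragment masses separate the two candidates here, which illustrates why a small number of high-resolution fragments can constrain a de novo solution.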
Third, a system is presented for the efficient discovery of protein-DNA interactions by means of multiple simultaneous gene expression measurements. A major problem in discovering transcription factor binding motifs is that identifying overrepresented sequence motifs is insufficient; most are noise, and some only bind transcription factors under specific biological conditions. It is possible to identify real motifs by the correlation of their presence to differential gene expression under a particular biological condition. By employing multivariate penalized regression, the system described is capable of efficiently identifying, from amongst hundreds of candidates, transcription factor binding motifs whose presence strongly correlates with gene expression in the measured biological condition. A small, highly confident set of motifs is selected, which may be used for further bioinformatic studies, or as targets for in vivo or in vitro experiments.
A matched-pair analysis of stereotactic body radiotherapy (SBRT) for oligometastatic lung tumors from colorectal cancer versus early stage non-small cell lung cancer
Background: Stereotactic body radiotherapy (SBRT) for early-stage primary non-small cell lung cancer (NSCLC) has been reported to achieve excellent local control rates, but the optimal SBRT dose for oligometastatic lung tumors (OLTs) from colorectal cancer (CRC) has not yet been determined. This study aimed to evaluate whether SBRT to a dose of 48–60 Gy in 4–5 fractions could result in similar local outcomes for OLTs from CRC as compared to early-stage NSCLC, and to examine potential dose-response relationships for OLTs from CRC.
Methods: OLTs from CRC and primary NSCLCs treated with SBRT to 48–60 Gy in 4–5 fractions at a single institution were evaluated, and a matched-pair analysis was performed. Local recurrence-free survival (LRFS) was estimated by the Kaplan-Meier method. Univariate Cox regression was performed to identify significant predictors.
Results: A total of 72 lung lesions in 61 patients (24 OLTs from CRC in 15 patients and 48 NSCLCs in 46 patients) were analyzed, with a median follow-up of 30 months. LRFS for OLTs from CRC was significantly worse than that of NSCLC when treated with 48–60 Gy/4–5 fx (p = 0.006). The 1-, 3- and 5-year LRFS of OLTs from CRC vs. NSCLC were 80.6% vs. 100%, 68.6% vs. 97.2%, and 68.6% vs. 81.0%, respectively. On univariate analysis, OLTs from CRC treated with a higher dose (BED10 = 132 Gy) exhibited significantly better LRFS than those treated with lower doses (BED10 ≤ 105.6 Gy) (p = 0.0022). The 1- and 3-year LRFS rates for OLTs treated with the higher dose (BED10 = 132 Gy) were 88.9% and 81.5%, versus 33.3% and not achieved, respectively, for lower doses (BED10 ≤ 105.6 Gy).
Conclusion: The LRFS of OLTs from CRC after SBRT of 48–60 Gy/4–5 fx was significantly worse than that of primary NSCLC. Lower-dose SBRT appeared to have inferior control for OLTs from CRC in this cohort. Further studies with larger sample sizes are needed.
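The BED10 values quoted in this abstract are consistent with the standard linear-quadratic formula, BED = n · d · (1 + d / (α/β)), with α/β = 10 Gy. The sketch below is a worked check rather than the authors' calculation. Pairing 60 Gy with 5 fractions and 48 Gy with 4 fractions is an inference from the reported 48–60 Gy in 4–5 fractions range and is not stated in the abstract, the 50 Gy schedule is a purely hypothetical third example, and the function name is illustrative.

```python
def bed(total_dose_gy, fractions, alpha_beta_gy=10.0):
    """Biologically effective dose under the linear-quadratic model:
    BED = n * d * (1 + d / (alpha/beta)), where d is the dose per fraction."""
    dose_per_fraction = total_dose_gy / fractions
    return total_dose_gy * (1 + dose_per_fraction / alpha_beta_gy)

# Assumed schedules; only the 48-60 Gy / 4-5 fraction range is given in the abstract.
for total, n in ((60, 5), (48, 4), (50, 5)):
    print(f"{total} Gy / {n} fx -> BED10 = {bed(total, n):.1f} Gy")
```

Under these assumptions, 12 Gy per fraction yields 132 Gy over five fractions and 105.6 Gy over four, matching the two thresholds used in the univariate analysis.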
Quantitative Peptidomics for Discovery of Circadian-Related Peptides from the Rat Suprachiasmatic Nucleus
In mammals the suprachiasmatic nucleus (SCN), the master circadian clock, is sensitive to light input via the optic chiasm and synchronizes many daily biological rhythms. Here we explore variations in the expression levels of neuropeptides present in the SCN of rats using a label-free quantification approach that is based on integrating peak intensities between daytime, Zeitgeber time (ZT) 6, and nighttime, ZT 18. From nine analyses comparing the levels between these two time points, 10 endogenous peptides derived from eight prohormones exhibited significant differences in their expression levels (adjusted p-value < 0.05). Of these, seven peptides derived from six prohormones, including GRP, PACAP, and CART, exhibited ≥30% increases at ZT 18, and the VGRPEWWMDYQ peptide derived from proenkephalin A showed a >50% increase at nighttime. Several endogenous peptides showing statistically significant changes in this study have not been previously reported to alter their levels as a function of time of day, nor have they been implicated in prior functional SCN studies. This information on peptide expression changes serves as a resource for discovering unknown peptide regulators that affect circadian rhythms in the SCN.
Comparing Label-Free Quantitative Peptidomics Approaches to Characterize Diurnal Variation of Peptides in the Rat Suprachiasmatic Nucleus
Mammalian circadian rhythm is maintained by the suprachiasmatic nucleus (SCN) via an intricate set of neuropeptides and other signaling molecules. In this work, peptidomic analyses from two times of day were examined to characterize variation in SCN peptides using three different label-free quantitation approaches: spectral count, spectral index and SIEVE. Of the 448 identified peptides, 207 peptides were analyzed by two label-free methods, spectral count and spectral index. There were 24 peptides with significant (adjusted p-value < 0.01) differential peptide abundances between daytime and nighttime, including multiple peptides derived from secretogranin II, cocaine and amphetamine regulated transcript, and proprotein convertase subtilisin/kexin type 1 inhibitor. Interestingly, more peptides were analyzable and had significantly different abundances between the two time points using the spectral count and spectral index methods than with a prior analysis using the SIEVE method with the same data. The results of this study reveal the importance of using the appropriate data analysis approaches for label-free relative quantitation of peptides. The detection of significant changes in so rich a set of neuropeptides reflects the dynamic nature of the SCN and the number of influences such as feeding behavior on circadian rhythm. Using spectral count and spectral index, peptide level changes are correlated to time of day, suggesting their key role in circadian function.
A Proteomic Survey of Nonribosomal Peptide and Polyketide Biosynthesis in Actinobacteria
Actinobacteria such as streptomycetes are renowned for their ability to produce bioactive natural products including nonribosomal peptides (NRPs) and polyketides (PKs). The advent of genome sequencing has revealed an even larger genetic repertoire for secondary metabolism with most of the small molecule products of these gene clusters still unknown. Here, we employed a “protein-first” method called PrISM (Proteomic Investigation of Secondary Metabolism) to screen 26 unsequenced actinomycetes using mass spectrometry-based proteomics for the targeted detection of expressed nonribosomal peptide synthetases or polyketide synthases. Improvements to the original PrISM screening approach (Nat. Biotechnol. 2009, 27, 951−956), for example, improved de novo peptide sequencing, have enabled the discovery of 10 NRPS/PKS gene clusters from 6 strains. Taking advantage of the concurrence of biosynthetic enzymes and the secondary metabolites they generate, two natural products were associated with their previously “orphan” gene clusters. This work has demonstrated the feasibility of a proteomics-based strategy for use in screening for NRP/PK production in actinomycetes (often >8 Mbp, high GC genomes) versus the bacilli (2–4 Mbp genomes) used previously.