16 research outputs found
Lessons learned: Linking patient-reported outcomes data with administrative databases
Introduction
Since 2007, Cancer Care Ontario (CCO) has systematically collected patient-reported outcomes (PROs), in the form of symptom data, for cancer outpatients visiting regional cancer centres or affiliate institutions. Data are used in real time to facilitate conversation between clinicians and patients and have recently been combined with provincial administrative databases.
Objectives and Approach
CCO collects PROs using the Edmonton Symptom Assessment System (ESAS), which scores 9 symptoms on a scale of 0 (no symptoms) to 10 (worst symptom severity). Data were imported from CCO in 2015 and linked to a cancer cohort at ICES. We investigated differences between patients who completed at least 1 ESAS record and patients who did not, as well as the number of records, timing of data collection and missingness. We describe our experience linking the PRO data to administrative data and using the linked data, including presenting trajectories of symptoms over time and combining scores into composite indices.
Results
120,745 cancer patients had 729,861 symptom records between 2007 and 2014. Not all patients with a cancer diagnosis had at least 1 ESAS record, and this varied by patient-, disease- and system-level factors. Because implementation occurred from a clinical perspective, data collection was irregular within and across patients and depended on treatment and other factors; the number of records per patient varied, as did the number of contributing patients in each time period following diagnosis. Attempts were made to create meaningful composite indices by combining all symptom scores as well as combining multiple high scores for each individual symptom. As a result, the best statistical approach for analysing these PRO data as an exposure or outcome remains uncertain.
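The two composite indices mentioned above can be illustrated with a minimal Python sketch. It is only one possible reading of the abstract, not the authors' method: the cut-off `HIGH_THRESHOLD`, the function names, and the toy records are all hypothetical.

```python
from typing import List, Sequence

N_SYMPTOMS = 9       # ESAS scores 9 symptoms (stated in the abstract)
HIGH_THRESHOLD = 7   # hypothetical cut-off for a "high" score; not from the source


def validate(record: Sequence[int]) -> None:
    """Check that a record holds 9 scores, each between 0 and 10."""
    if len(record) != N_SYMPTOMS:
        raise ValueError("an ESAS record must contain 9 symptom scores")
    if any(not 0 <= score <= 10 for score in record):
        raise ValueError("each ESAS score must lie between 0 and 10")


def total_score(record: Sequence[int]) -> int:
    """Composite index 1: sum of all symptom scores in one record (0 to 90)."""
    validate(record)
    return sum(record)


def high_score_counts(records: Sequence[Sequence[int]],
                      threshold: int = HIGH_THRESHOLD) -> List[int]:
    """Composite index 2 (assumed reading): for each symptom, count how many
    of a patient's records score at or above the threshold."""
    for record in records:
        validate(record)
    return [sum(1 for record in records if record[i] >= threshold)
            for i in range(N_SYMPTOMS)]


# Toy records for one hypothetical patient (not study data).
patient_records = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8],
    [10, 0, 0, 0, 0, 0, 0, 7, 9],
]
first_total = total_score(patient_records[0])
counts = high_score_counts(patient_records)
```

Because the number of records differs between patients, a count-based index of this kind would likely need to be normalised by the number of records before patients are compared.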
Conclusion/Implications
PRO data linked to provincial administrative data holdings represent a new frontier for population-based cancer research, both in their challenging structure and in their implications for clinical practice and the health system. These lessons learned will hopefully support other researchers' rigorous use of these data in the future.
Investigating Associations Between Preoperative Patient-Reported Symptom Burden and Postoperative Outcomes Following Major Cancer Surgery: A Retrospective Cohort Study
Patient-reported outcomes (PROs) are prognostic of long-term survival in cancer patients. However, their association with postoperative outcomes following major oncologic surgery is not well characterized. A retrospective population-based cohort study of rectal cancer patients undergoing neoadjuvant radiotherapy and proctectomy was conducted. Receiver operating characteristic analysis was used to select a scoring approach for the Edmonton Symptom Assessment System to define elevated preoperative symptom burden. Multivariable regression analyses were conducted to investigate associations between preoperative symptom scores and postoperative outcomes. High preoperative symptom scores were not associated with postoperative major morbidity (OR 1.28, 95% CI 0.84-1.97). However, high preoperative symptom scores were associated with prolonged postoperative length of stay (IRR 1.23, 95% CI 1.14-1.32), 30-day hospital readmission (OR 1.74, 95% CI 1.30-2.34), and 30-day post-discharge ED visits (OR 1.34, 95% CI 1.05-1.71). PROs can contribute important information for identification of patients at risk for increased healthcare utilization in the postoperative period.
M.Sc.
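As a rough illustration of how receiver operating characteristic analysis could be used to compare candidate scoring approaches, the Python sketch below computes the area under the ROC curve (AUC) for each candidate and picks the largest. Selecting by highest AUC is an assumption; the abstract does not state the selection criterion, and the candidate names and values are toy data.

```python
from typing import Dict, Sequence


def auc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """AUC as the probability that a patient with the outcome (1) has a
    higher score than a patient without it (0); ties count as one half."""
    if len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must have the same length")
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def best_scoring_approach(candidates: Dict[str, Sequence[float]],
                          outcomes: Sequence[int]) -> str:
    """Return the name of the candidate score with the highest AUC."""
    return max(candidates, key=lambda name: auc(candidates[name], outcomes))


# Toy data only: two hypothetical ways of summarising ESAS scores.
outcomes = [0, 0, 1, 1]
candidates = {
    "total_score": [1, 3, 2, 4],
    "max_score": [1, 2, 3, 4],
}
chosen = best_scoring_approach(candidates, outcomes)
```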
A Comprehensive Evaluation of Consensus Spectrum Generation Methods in Proteomics
Spectrum clustering is a powerful strategy to minimize redundant mass spectra by grouping them based on similarity, with the aim of forming groups of mass spectra from the same repeatedly measured analytes. Each such group of near-identical spectra can be represented by its so-called consensus spectrum for downstream processing. Although several algorithms for spectrum clustering have been adequately benchmarked and tested, the influence of the consensus spectrum generation step is rarely evaluated. Here, we present an implementation and benchmark of common consensus spectrum algorithms, including spectrum averaging, spectrum binning, the most similar spectrum, and the best-identified spectrum. We have analyzed diverse public data sets using two different clustering algorithms (spectra-cluster and MaRaCluster) to evaluate how the consensus spectrum generation procedure influences downstream peptide identification. The BEST and BIN methods were found to be the most reliable methods for consensus spectrum generation, including for data sets with post-translational modifications (PTMs) such as phosphorylation. All source code and data of the present study are freely available on GitHub at https://github.com/statisticalbiotechnology/representative-spectra-benchmark
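To make the idea of a binned consensus spectrum concrete, here is a minimal Python sketch. It is not the implementation from the benchmark repository: the function name, the `bin_width` and `min_fraction` parameters, and the intensity-weighted m/z are illustrative assumptions.

```python
from collections import defaultdict
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def bin_consensus(spectra: List[List[Peak]],
                  bin_width: float = 0.02,
                  min_fraction: float = 0.5) -> List[Peak]:
    """Merge the spectra of one cluster into a consensus spectrum by binning.

    Peaks are assigned to fixed-width m/z bins. A bin is kept only if at
    least `min_fraction` of the spectra contribute a peak to it. The
    consensus peak uses the intensity-weighted mean m/z and the intensity
    averaged over all spectra in the cluster.
    """
    if not spectra:
        raise ValueError("at least one spectrum is required")
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")

    intensity_sum = defaultdict(float)   # bin -> summed intensity
    weighted_mz = defaultdict(float)     # bin -> sum of m/z * intensity
    support = defaultdict(set)           # bin -> indices of contributing spectra

    for index, spectrum in enumerate(spectra):
        for mz, intensity in spectrum:
            b = int(mz // bin_width)
            intensity_sum[b] += intensity
            weighted_mz[b] += mz * intensity
            support[b].add(index)

    n = len(spectra)
    consensus = []
    for b in sorted(intensity_sum):
        if len(support[b]) / n < min_fraction or intensity_sum[b] <= 0:
            continue  # peak seen in too few spectra, or no signal
        consensus.append((weighted_mz[b] / intensity_sum[b],
                          intensity_sum[b] / n))
    return consensus


# Toy cluster of three spectra (made-up peaks), using wide 1.0 m/z bins.
cluster = [
    [(100.2, 10.0), (200.5, 4.0)],
    [(100.4, 30.0)],
    [(100.3, 20.0), (300.1, 5.0)],
]
result = bin_consensus(cluster, bin_width=1.0, min_fraction=0.5)
```

In this toy cluster only the peak near m/z 100 appears in at least half of the spectra, so it is the only one retained; the quorum filter is one simple way to suppress noise peaks that occur in a single spectrum.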
MS/MS-Free Protein Identification in Complex Mixtures Using Multiple Enzymes with Complementary Specificity
In this work, we present the results of evaluating a workflow that employs a multienzyme digestion strategy for MS1-based protein identification in “shotgun” proteomic applications. In the proposed strategy, several cleavage reagents of different specificity were used for parallel digestion of the protein sample, followed by an MS1 and retention time (RT) based search. Proof of principle for the proposed strategy was obtained using experimental data for the annotated 48-protein standard. Using the developed approach, up to 90% of proteins from the standard were unambiguously identified. The approach was further applied to HeLa proteome data. For a sample of this complexity, the proposed MS1-only strategy correctly determined up to 34% of all proteins identified using a standard MS/MS-based database search. It was also found that the results of the MS1-only search were independent of the chromatographic gradient time over a wide range of gradients, from 15 to 120 min. Potentially, rapid MS1-only proteome characterization can be an alternative or a complement to MS/MS-based “shotgun” analyses in studies in which experimental time is more important than the depth of proteome coverage.
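The benefit of cleavage reagents with complementary specificity can be seen in a small in-silico digestion sketch in Python. The abstract does not name the reagents used, so the three enzymes and their simplified cleavage rules below are assumptions, as are the function names and the toy sequence; the sketch also omits the mass and retention-time matching that the actual MS1-only search relies on.

```python
import re
from typing import Dict, List

# Simplified cleavage rules written as zero-width regular expressions.
# Assumed for illustration; the study's actual reagents are not stated here.
CLEAVAGE_RULES: Dict[str, str] = {
    "trypsin": r"(?<=[KR])(?!P)",  # after K or R, unless followed by P
    "lys_c": r"(?<=K)",            # after K
    "glu_c": r"(?<=E)",            # after E
}


def digest(sequence: str, rule: str) -> List[str]:
    """Cut a protein sequence at every position matched by the rule
    (no missed cleavages) and return the resulting peptides."""
    cuts = [m.start() for m in re.finditer(rule, sequence)
            if 0 < m.start() < len(sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


def parallel_digest(sequence: str) -> Dict[str, List[str]]:
    """Digest the same sequence separately with each enzyme."""
    return {name: digest(sequence, rule)
            for name, rule in CLEAVAGE_RULES.items()}


# Toy sequence, not a real protein.
peptides = parallel_digest("MKPAERGEK")
```

Each enzyme yields a different peptide set from the same protein, so parallel digests give several independent chances to observe a peptide whose mass and retention time point to that protein.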
IdentiPy: An Extensible Search Engine for Protein Identification in Shotgun Proteomics
We present an open-source, extensible search engine for shotgun proteomics. Implemented in the Python programming language, IdentiPy shows competitive processing speed and sensitivity compared with state-of-the-art search engines. It is equipped with a user-friendly web interface, IdentiPy Server, enabling the use of a single server installation accessed from multiple workstations. Using a simplified version of the X!Tandem scoring algorithm and its novel “autotune” feature, IdentiPy outperforms the popular alternatives on high-resolution data sets. Autotune adjusts the search parameters for the particular data set, resulting in improved search efficiency and simplifying the user experience. IdentiPy with the autotune feature shows higher sensitivity compared with the evaluated search engines. IdentiPy Server has built-in postprocessing and protein inference procedures and provides graphic visualization of the statistical properties of the data set and the search results. It is open-source and can be freely extended to use third-party scoring functions or processing algorithms, and it allows customization of the search workflow for specialized applications.