Search CORE

2 research outputs found

Improving Peptide Identification in Proteomics Data Analysis through Repeat-Preserving Decoy and Decoy-Free Retraining

Author: Johra Muhammad Moosa
Publication venue: 'University of Waterloo'
Publication date: 25/12/2023

Accurately identifying peptides is central to understanding the complexity of biological systems. Despite advances in proteomics data analysis, challenges in False Discovery Rate (FDR) estimation and peptide identification persist. This thesis offers two contributions that address these issues. The first part focuses on a weakness of traditional target-decoy approaches: their inability to preserve repeated peptide structures in decoy databases. To address this, we introduce an algorithm for decoy database generation based on the de Bruijn graph model. The method conserves the structural repeats found in target protein databases and thereby improves the precision of FDR estimates. Comparative evaluations show that the de Bruijn graph-based model gives more accurate FDR estimates and a higher rate of peptide identifications than existing algorithms. The second part introduces a machine learning-based retraining strategy for refining Peptide-Spectrum Matches (PSMs). Traditional methods draw positive and negative training examples from the target and decoy databases, respectively. Our approach instead uses the best and the next-best peptides from the same spectrum as the positive and negative examples. Because training requires enough next-best PSMs to accurately estimate the separation between the true and false score distributions, we introduce a tailored solution based on a split database search. This decoy-free training paradigm improves peptide identification rates while preserving the integrity of FDR estimates.
The effectiveness of this approach has been corroborated through empirical testing, including integration with well-known tools such as Mokapot and the use of various machine learning algorithms such as logistic regression, XGBoost, and neural networks. To complement these core contributions, the thesis also explores the broader implications and possible extensions of the proposed decoy-free retraining method. It considers how the concept of next-best PSMs could be adapted to other proteomics applications, such as FDR estimation in spectral library search, which opens avenues for future research. In summary, this thesis advances bottom-up proteomics by offering solutions for more accurate FDR estimation and improved peptide identification. It provides a framework for future research and has immediate applications in more reliable and robust proteomics data analysis.
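
The labelling step described in the abstract (best peptide per spectrum as the positive example, next-best peptide as the negative example) can be illustrated with a minimal sketch. This is not the thesis implementation: the function name `label_best_and_next_best`, the tuple layout `(spectrum_id, peptide, score)`, and the convention that a higher score is better are all assumptions made for illustration, and the split database search is not modelled.

```python
from collections import defaultdict


def label_best_and_next_best(psms):
    """Build decoy-free training examples from candidate PSMs.

    psms: iterable of (spectrum_id, peptide, score) tuples.
    Assumption: a higher score means a better match.
    Returns a list of (spectrum_id, peptide, score, label) tuples,
    where label 1 marks the best PSM and label 0 the next-best PSM.
    """
    # Group candidate peptides by the spectrum they were matched to.
    by_spectrum = defaultdict(list)
    for spectrum_id, peptide, score in psms:
        by_spectrum[spectrum_id].append((score, peptide))

    examples = []
    for spectrum_id, candidates in by_spectrum.items():
        # Rank candidates for this spectrum from best to worst score.
        ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
        best_score, best_peptide = ranked[0]
        examples.append((spectrum_id, best_peptide, best_score, 1))
        # A spectrum with a single candidate yields no negative example.
        if len(ranked) > 1:
            next_score, next_peptide = ranked[1]
            examples.append((spectrum_id, next_peptide, next_score, 0))
    return examples
```

In this sketch a spectrum with only one candidate contributes no negative example, which hints at why the abstract stresses the need for a sufficient quantity of next-best PSMs.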

University of Waterloo's Institutional Repository

Gene selection for cancer classification with the help of bees

Authors: Johra Muhammad Moosa, Rameen Shakur, Mohammad Kaykobad, Mohammad Sohel Rahman
Publication venue: 'Springer Science and Business Media LLC'

Crossref