Top-down liquid chromatography-mass spectrometry (LC-MS) analyzes
intact proteoforms and generates mass spectra containing peaks of
proteoforms with various isotopic compositions, charge states, and
retention times. An essential step in top-down MS data analysis is
proteoform feature detection, which aims to group these peaks into
peak sets (features), each containing all peaks of a proteoform. Accurate
proteoform feature detection improves the accuracy of MS-based proteoform
identification and quantification. Here, we present TopFD, a software
tool for top-down MS feature detection that integrates algorithms
for proteoform feature detection and feature boundary refinement with
machine learning models for proteoform feature evaluation. We performed
extensive benchmarking of TopFD, ProMex, FlashDeconv, and Xtract using
seven top-down MS data sets and demonstrated that TopFD outperforms
other tools in feature accuracy, reproducibility, and feature abundance
reproducibility.