    Improving mass defect filters for human proteins

    The mass defect of a substance can be used in mass spectral analysis to flag peaks as likely members of a compound class, such as peptides, when the mass defect falls within the known range for that class. For peptides, a range of possible mass defects was previously calculated from a set of theoretical peptides covering all possible amino acid combinations (Mann, M. Abstract from the 43rd Annual Conference on Mass Spectrometry and Allied Topics; 1995, ASMS). We compare that theoretical range to new values obtained from in silico tryptic digests of proteins that are abundant in human serum and human seminal fluid. For the human protein data sets, the range of mass defect values encompassing 95% of peptides was up to 50% narrower than the previously reported range for the theoretical peptides. This narrower range for human tryptic peptides can be used to improve peptide mass defect filters: excluding more species that are unlikely to be peptides improves filter selectivity for peptides during proteomic data analysis.
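    A minimal Python sketch of the calculations the abstract relies on: an in silico tryptic digest, a mass defect calculation, and a helper that finds the central 95% range of a set of mass defects. The digestion rule (cleave after K or R unless followed by P, no missed cleavages), the use of the fractional part of the monoisotopic mass as the mass defect, and all function names are assumptions made for illustration, not the authors' pipeline. Published filters usually express the allowed defect as a function of mass, which this sketch omits.

```python
import math
import re
import statistics

# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # added once per peptide for the free N- and C-termini


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mass_defect(mass):
    """Assumed definition: fractional part of the mass (reasonable below ~2 kDa)."""
    return mass - math.floor(mass)


def central_95_range(defects):
    """(low, high) bounds enclosing the central 95% of the mass defects."""
    cuts = statistics.quantiles(defects, n=40, method="inclusive")
    return cuts[0], cuts[-1]  # 2.5th and 97.5th percentiles


def passes_filter(mass, low, high):
    """True if the peak's mass defect falls inside the allowed peptide range."""
    return low <= mass_defect(mass) <= high
```

    For example, `tryptic_digest("AKPRGK")` returns `["AKPR", "GK"]` (the K before P is not cleaved), and `monoisotopic_mass("GK")` is about 203.127 Da, giving a mass defect of about 0.127.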

    Advances, obstacles, and opportunities for machine learning in proteomics

    The fields of proteomics and machine learning are both large disciplines, each producing well over 5,000 publications per year. However, studies combining the two are still relatively rare: only about 2% of recent proteomics papers include machine learning. This review, which focuses on the intersection of the fields, is intended to inspire proteomics researchers to develop skills and knowledge in applying machine learning. A brief tutorial introduction to machine learning is provided, and research advances that rely on both fields, particularly in proteomics tool development and biomarker discovery, are highlighted. Key knowledge gaps and opportunities for scientific advancement are also enumerated.

    Software for Automated Interpretation of Mass Spectrometry Data from Glycans and Glycopeptides

    The purpose of this review is to provide those interested in glycosylation analysis with the most up-to-date information on the availability of automated tools for MS characterization of N-linked and O-linked glycosylation. Specifically, it describes software tools that help elucidate glycosylation from MS data on the basis of mass alone, as well as software designed to speed the interpretation of glycan and glycopeptide fragmentation from MS/MS data. The review gives equal attention to software for interpreting the composition of released glycans and to tools for characterizing N-linked and O-linked glycopeptides. Several websites that will help readers explore the described tools further have also been compiled and described.
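    As an illustration of what assigning glycans on the basis of mass alone involves, the Python sketch below enumerates candidate compositions of a released, underivatized glycan and keeps those whose neutral monoisotopic mass matches an observed mass within a ppm tolerance. The four monosaccharide classes, the per-class upper limits, and the tolerance are assumptions chosen for the example; the tools covered in the review use their own rules and databases.

```python
from itertools import product

# Monoisotopic residue masses (Da) of common monosaccharide classes.
RESIDUES = {
    "HexNAc": 203.07937,
    "Hex": 162.05282,
    "dHex": 146.05791,   # e.g., fucose
    "NeuAc": 291.09542,  # sialic acid
}
WATER = 18.010565  # free (reducing-end) glycan = sum of residues + water


def glycan_mass(composition):
    """Neutral monoisotopic mass of a free, underivatized glycan."""
    return sum(RESIDUES[name] * count for name, count in composition.items()) + WATER


def match_compositions(observed_mass, tol_ppm=10.0, max_counts=(7, 10, 3, 4)):
    """Return every composition (up to max_counts per class) matching observed_mass."""
    names = list(RESIDUES)
    hits = []
    for counts in product(*(range(limit + 1) for limit in max_counts)):
        if sum(counts) == 0:
            continue  # skip the empty composition
        composition = dict(zip(names, counts))
        mass = glycan_mass(composition)
        if abs(mass - observed_mass) / mass * 1e6 <= tol_ppm:
            hits.append(composition)
    return hits
```

    For instance, an observed neutral mass of 1640.5921 Da returns a list that includes HexNAc4Hex5, a common biantennary N-glycan composition. Mass alone cannot distinguish isomers, which is why the review also covers MS/MS interpretation tools.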

    Rapid LC-MS Based High-Throughput Screening Method, Affording No False Positives or False Negatives, Identifies a New Inhibitor for Carbonic Anhydrase

    Developing effective high-throughput screening (HTS) methods is of paramount importance in the early stages of drug discovery. While rugged and robust assays may be easily developed for certain enzymes, HTS assays designed to identify ligands that block protein binding are much more challenging to develop; reducing the number of false positives and false negatives under high-throughput conditions is particularly difficult. We describe an MS-based HTS workflow that addresses these challenges. The assay mitigates false positives by identifying a hit only when a ligand at the binding site of interest is displaced, and it mitigates false negatives by detecting a reporter compound that ionizes well rather than the ligand binder itself, which may not ionize. The method was validated by detecting, in the presence of hundreds of non-binders, known binders of three proteins: pepsin, maltose binding protein (MBP), and carbonic anhydrase (CA). We also identified a novel CA binder, pifithrin-μ, which could not have been identified by any other MS-based assay because of its poor ionization efficiency. This new method addresses many of the challenges currently encountered in high-throughput screening.
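    The hit-calling logic can be sketched in Python under one stated assumption: that displacement of the reporter compound raises the reporter signal above that of negative-control wells (protein and reporter, no test compound). A well is called a hit when its reporter signal exceeds the control mean by more than `n_sd` standard deviations. The threshold rule, the `n_sd=3.0` default, and the well names are hypothetical; the abstract does not specify how signals were thresholded.

```python
import statistics


def call_hits(sample_signals, control_signals, n_sd=3.0):
    """Flag wells whose reporter signal exceeds mean + n_sd * SD of the controls.

    sample_signals: dict mapping well ID -> reporter ion signal (e.g., peak area)
    control_signals: reporter signals from wells without a test compound
    """
    mean = statistics.mean(control_signals)
    sd = statistics.stdev(control_signals)  # sample SD; needs >= 2 controls
    cutoff = mean + n_sd * sd
    return {well: signal > cutoff for well, signal in sample_signals.items()}


controls = [100.0, 110.0, 90.0, 105.0, 95.0]
samples = {"A1": 300.0, "A2": 105.0}
print(call_hits(samples, controls))  # {'A1': True, 'A2': False}
```

    Because the decision depends only on the reporter's signal, a binder that ionizes poorly can still be flagged, which is the point the abstract makes about pifithrin-μ.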

    Absolute Quantitation of Glycosylation Site Occupancy Using Isotopically Labeled Standards and LC-MS

    This document is the Accepted Manuscript version of a Published Work that appeared in final form in the Journal of the American Chemical Society, copyright © American Chemical Society after peer review and technical editing by the publisher. To access the final edited and published work, see http://doi.org/10.1007/s13361-014-0859-2.

    N-linked glycans are required to maintain appropriate biological functions of proteins. Underglycosylation leads to many diseases in plants and animals; therefore, characterizing the extent of glycosylation on proteins is an important step in understanding, diagnosing, and treating disease. To determine glycosylation site occupancy, peptide-N-glycosidase F (PNGase F) is typically used to detach the glycan from the protein; in the process, the formerly glycosylated asparagine is deamidated to aspartic acid. Comparing the abundance of the resulting aspartic acid-containing peptide with that of the peptide containing non-glycosylated asparagine then gives the glycosylation site occupancy. However, this approach can give inaccurate results when the non-glycosylated asparagine undergoes spontaneous chemical deamidation. To overcome this limitation, we developed a new method to measure glycosylation site occupancy that does not rely on converting glycosylated peptides to their deglycosylated forms. Specifically, the overall protein concentration and the non-glycosylated portion of the protein are quantified simultaneously using heavy isotope-labeled internal standards coupled with LC-MS analysis, from which the extent of site occupancy is accurately determined. The efficacy of the method was demonstrated by quantifying the occupancy of a glycosylation site on bovine fetuin. This is the first method to measure glycosylation site occupancy without using PNGase F, and it can be performed in parallel with glycopeptide analysis because the glycan remains intact throughout the workflow.
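    The arithmetic behind this isotope-dilution approach can be sketched as follows. Each analyte peptide is quantified from the ratio of its light (endogenous) peak area to that of a spiked heavy-labeled standard of known amount, and occupancy is the fraction of the protein that is not in the non-glycosylated form. Which peptides report total protein and which report the non-glycosylated site, as well as all numbers below, are assumptions for illustration.

```python
def isotope_dilution_amount(light_area, heavy_area, heavy_amount):
    """Amount of analyte = (light / heavy peak-area ratio) * spiked heavy amount."""
    if heavy_area <= 0:
        raise ValueError("heavy standard peak area must be positive")
    return light_area / heavy_area * heavy_amount


def site_occupancy(total_amount, nonglycosylated_amount):
    """Fraction of protein molecules that carry a glycan at the site."""
    if total_amount <= 0:
        raise ValueError("total protein amount must be positive")
    return 1.0 - nonglycosylated_amount / total_amount


# Hypothetical peak areas with 50 fmol of each heavy standard spiked in.
total = isotope_dilution_amount(2.0e6, 1.0e6, 50.0)            # 100 fmol protein
nonglycosylated = isotope_dilution_amount(3.0e5, 1.0e6, 50.0)  # 15 fmol
print(round(site_occupancy(total, nonglycosylated), 3))  # 0.85
```

    The sketch assumes both heavy standards are added in known amounts and that the light and heavy forms of each peptide respond identically in the mass spectrometer.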

    So You Discovered a Potential Glycan-Based Biomarker; Now What? We Developed a High-Throughput Method for Quantitative Clinical Glycan Biomarker Validation

    Glycomic approaches to biomarker discovery have shown great promise in distinguishing healthy from diseased individuals; these methods can identify when aberrant glycosylation is significant, but they cannot practically be adapted into widely implemented diagnostic assays because they are too complex, expensive, and low-throughput. We have developed a new strategy that addresses challenges associated with sample preparation, sample throughput, instrumentation needs, and data analysis, so that the valuable knowledge provided by protein glycosylation can be transferred into a clinical environment. Notably, the detection limits of the assay are in the single-digit picomole range. Proof of principle is demonstrated by quantifying changes in the sialic acid content of fetuin. Because the sialic acid content of proteins varies in a number of disease states, this example demonstrates the utility of the method for biomarker analysis. Furthermore, the method can be adapted to other biologically important saccharides, affording a broad array of quantitative glycomic analyses in a high-throughput, plate-reader format. These studies help glycomic biomarker discovery efforts navigate the difficult transition from a potential biomarker to a clinical assay.
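    Plate-reader quantitation typically converts a signal to an amount through a standard curve. A minimal Python sketch, assuming a readout that is linear in the amount of sialic acid and using hypothetical standard values, fits the curve by ordinary least squares and inverts it. The abstract does not state the detection chemistry, so the readout here is generic.

```python
def fit_standard_curve(amounts, signals):
    """Ordinary least-squares fit of signal = slope * amount + intercept."""
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, signals))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amount_from_signal(signal, slope, intercept):
    """Invert the standard curve to estimate the amount in an unknown well."""
    return (signal - intercept) / slope


# Hypothetical sialic acid standards (pmol) and their plate-reader readings.
standards = [0.0, 1.0, 2.0, 4.0]
readings = [0.05, 0.15, 0.25, 0.45]
slope, intercept = fit_standard_curve(standards, readings)
print(round(amount_from_signal(0.35, slope, intercept), 2))  # 3.0
```

    Dividing the estimated sialic acid amount by the amount of protein in the well would express the result per mole of protein, one way to compare sialylation between samples.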

    Analysis of the disulfide bond arrangement of the HIV envelope protein CON-S gp140 ΔCFI shows variability in the V1 and V2 regions

    Disulfide bonding of cysteines is one of the most important protein modifications, and it plays a key role in establishing and maintaining protein structures in their biologically active forms. Determining the disulfide bond arrangement is therefore an important aspect of understanding the chemical structure of a protein and defining its functional domains. Herein, aiming to understand how the structure of the HIV-1 envelope protein influences its immunogenicity, we used an MS-based approach, liquid chromatography electrospray ionization Fourier transform ion cyclotron resonance (LC/ESI-FTICR) mass spectrometry, to determine the disulfide linkages on an oligomeric form of the group M consensus HIV-1 envelope protein (Env), CON-S gp140 ΔCFI. This protein shows markedly improved immunogenicity compared with monomeric gp120 and wild-type forms of gp140 Envs. Our results demonstrate that the disulfide connectivity in the N-terminal region of CON-S gp140 ΔCFI differs from the disulfide bonding previously reported for the monomeric form of gp120 HIV-1 Env. Additionally, heterogeneity of the disulfide bonding was detected in this region. These data suggest that the V1/V2 region does not have a single, conserved disulfide bonding pattern and that this variability could affect the immunogenicity of expressed Envs.
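    In LC-MS disulfide mapping of nonreduced digests, candidate linkages are matched by mass: peptides joined by disulfide bonds have the summed peptide mass minus two hydrogen atoms per bond. A minimal Python sketch of that bookkeeping, with hypothetical peptide masses, is shown below; it is a generic calculation, not the specific search procedure used in this study.

```python
H_ATOM = 1.00782503  # monoisotopic mass of a hydrogen atom (Da)
PROTON = 1.00727646  # mass of a proton (Da)


def linked_mass(peptide_masses, n_disulfides):
    """Neutral mass of peptides joined by n disulfide bonds (2 H lost per bond)."""
    return sum(peptide_masses) - 2 * H_ATOM * n_disulfides


def mz(neutral_mass, charge):
    """m/z of the [M + zH]z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


# Two hypothetical tryptic peptides, each with one cysteine, linked by one bond.
m = linked_mass([1000.0, 500.0], n_disulfides=1)
print(round(m, 4), round(mz(m, 2), 4))  # 1497.9843 749.9995
```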

    GlycoPep MassList: Software to Generate Massive Inclusion Lists for Glycopeptide Analyses

    Protein glycosylation drives many biological processes and can serve as a marker of disease; therefore, developing tools to study glycosylation is an essential and growing area of research. When proteins are proteolytically digested and their glycopeptides are analyzed by a combination of high-resolution mass spectrometry (MS) and tandem mass spectrometry (MS/MS), both the glycans of interest and the glycosylation sites to which they are attached can be identified. One major challenge in these experiments is collecting the requisite MS/MS data. The digested glycopeptides are often present in complex mixtures and in low abundance, and the most commonly used approach to collecting MS/MS data on these species is data-dependent acquisition (DDA), in which only the most intense precursor ions trigger MS/MS. DDA therefore results in limited glycopeptide coverage. Semi-targeted data acquisition is an alternative approach that can alleviate this difficulty. However, because glycopeptides are massively heterogeneous, it is not obvious how to generate inclusion lists for these analyses expediently. To solve this problem, we developed the software tool GlycoPep MassList, which generates inclusion lists for liquid chromatography tandem mass spectrometry (LC-MS/MS) experiments. The utility of the software was tested by comparing semi-targeted and untargeted data-dependent analyses of a variety of proteins, including IgG, a protein whose glycosylation must be characterized during its production as a biotherapeutic. When GlycoPep MassList was used to generate inclusion lists for LC-MS/MS experiments, more unique glycopeptides were selected for fragmentation. In the simplest cases, with low background, ∼30% more unique glycopeptides can generally be analyzed per protein. Where background ions from proteins or other interferents are high, using an inclusion list is even more advantageous. The software is freely and publicly accessible.
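    The core calculation behind a glycopeptide inclusion list can be sketched in Python: attach each candidate glycan composition to a peptide, compute the precursor m/z at several charge states, and keep values inside the scan range. Because the glycan is attached through a condensation reaction, only the monosaccharide residue masses are added to the peptide mass. This is a hypothetical illustration; the function names, charge states, and m/z window are assumptions and do not reflect GlycoPep MassList's actual interface or rules.

```python
# Monoisotopic residue masses (Da) of common monosaccharide classes.
RESIDUES = {"HexNAc": 203.07937, "Hex": 162.05282, "dHex": 146.05791, "NeuAc": 291.09542}
PROTON = 1.00727646


def glycopeptide_mass(peptide_mass, composition):
    """Neutral mass of a peptide carrying one glycan of the given composition."""
    return peptide_mass + sum(RESIDUES[k] * n for k, n in composition.items())


def label(composition):
    """Compact label such as 'HexNAc2Hex3', skipping absent residues."""
    return "".join(f"{k}{n}" for k, n in composition.items() if n)


def inclusion_list(peptide_mass, compositions, charges=(2, 3, 4), mz_window=(400.0, 2000.0)):
    """Return (glycan label, charge, m/z) rows that fall inside mz_window."""
    rows = []
    for comp in compositions:
        mass = glycopeptide_mass(peptide_mass, comp)
        for z in charges:
            mz = (mass + z * PROTON) / z
            if mz_window[0] <= mz <= mz_window[1]:
                rows.append((label(comp), z, round(mz, 4)))
    return rows


# Hypothetical peptide (1000.0 Da) with two candidate N-glycan compositions.
glycans = [{"HexNAc": 2, "Hex": 3}, {"HexNAc": 4, "Hex": 5, "NeuAc": 2}]
for row in inclusion_list(1000.0, glycans):
    print(row)
```

    With many peptides and dozens of glycoforms per site, the number of rows grows quickly, which illustrates why generating such lists by hand is impractical.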

    Distinguishing academic science writing from humans or ChatGPT with over 99% accuracy using off-the-shelf machine learning tools

    Get PDF
    ChatGPT has enabled access to artificial intelligence (AI)-generated writing for the masses, initiating a culture shift in the way people work, learn, and write. The need to discriminate human writing from AI is now both critical and urgent. Addressing this need, we report a method for discriminating text generated by ChatGPT from text written by (human) academic scientists, relying on prevalent and accessible supervised classification methods. The approach uses new features for discriminating (these) humans from AI; for example, scientists write long paragraphs and have a penchant for equivocal language, frequently using words like “but,” “however,” and “although.” With a set of 20 features, we built a model that assigns the author, as human or AI, with over 99% accuracy. This strategy could be further adapted and developed by others with basic skills in supervised classification, enabling access to many highly accurate and targeted models for detecting AI usage in academic writing and beyond.
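    Features of the kind described here (paragraph length and the use of equivocal words such as “but,” “however,” and “although”) are straightforward to extract. The Python sketch below computes five illustrative features from a text; the feature names and definitions are assumptions, not the authors' 20-feature set, and the resulting feature dictionaries would then be passed to an off-the-shelf supervised classifier.

```python
import re

EQUIVOCAL_WORDS = ("but", "however", "although")


def extract_features(text):
    """Return a dict of simple stylometric features for one document."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[a-z']+", text.lower())
    n_par = max(len(paragraphs), 1)  # avoid division by zero on empty input
    features = {
        "sentences_per_paragraph": len(sentences) / n_par,
        "words_per_paragraph": len(words) / n_par,
    }
    for word in EQUIVOCAL_WORDS:
        # Binary feature: does the word appear anywhere in the text?
        features[f"has_{word}"] = int(word in words)
    return features


sample = "We tested it. However, results vary.\n\nBut this is fine."
print(extract_features(sample))
```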