1,458 research outputs found

    Improved quality control processing of peptide-centric LC-MS proteomics data

    Motivation: In the analysis of differential peptide peak intensities (i.e. abundance measures), LC-MS analyses with poor quality peptide abundance data can bias downstream statistical analyses and hence the biological interpretation for an otherwise high-quality dataset. Although considerable effort has been placed on assuring the quality of the peptide identification with respect to spectral processing, to date quality assessment of the subsequent peptide abundance data matrix has been limited to a subjective visual inspection of run-by-run correlation or individual peptide components. Identifying statistical outliers is a critical step in the processing of proteomics data as many of the downstream statistical analyses [e.g. analysis of variance (ANOVA)] rely upon accurate estimates of sample variance, and their results are influenced by extreme values
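
The abstract above contrasts subjective visual inspection of run-by-run correlation with statistical outlier identification. As a rough illustration only, and not the method proposed in the paper, a correlation screen could be made less subjective along the following lines. The function name `flag_low_correlation_runs`, the median/MAD cutoff, and the assumption of a complete (no missing values) log-abundance matrix are all hypothetical choices made for this sketch.

```python
import numpy as np


def flag_low_correlation_runs(log_abundance, n_mads=3.0):
    """Flag runs whose mean correlation with the other runs is unusually low.

    log_abundance: 2-D array, rows = peptides, columns = LC-MS runs.
    Assumes no missing values (an assumption of this sketch).
    Returns (indices of flagged runs, mean correlation per run).
    """
    x = np.asarray(log_abundance, dtype=float)
    # Run-by-run Pearson correlation matrix (columns are the variables).
    corr = np.corrcoef(x, rowvar=False)
    n_runs = corr.shape[0]
    # Mean correlation of each run with every other run (drop the diagonal of 1s).
    mean_corr = (corr.sum(axis=1) - 1.0) / (n_runs - 1)
    # Robust cutoff: median minus n_mads scaled median absolute deviations.
    median = np.median(mean_corr)
    mad = np.median(np.abs(mean_corr - median))
    threshold = median - n_mads * 1.4826 * mad
    return np.flatnonzero(mean_corr < threshold), mean_corr
```

A screen like this only catches runs that disagree with the rest as a whole; it says nothing about individual extreme peptide values, which the abstract also identifies as a concern for variance-based tests such as ANOVA.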

    Current challenges in software solutions for mass spectrometry-based quantitative proteomics

    This work was in part supported by the PRIME-XS project, grant agreement number 262067, funded by the European Union seventh Framework Programme; The Netherlands Proteomics Centre, embedded in The Netherlands Genomics Initiative; The Netherlands Bioinformatics Centre; and the Centre for Biomedical Genetics (to S.C., B.B. and A.J.R.H); by NIH grants NCRR RR001614 and RR019934 (to the UCSF Mass Spectrometry Facility, director: A.L. Burlingame, P.B.); and by grants from the MRC, CR-UK, BBSRC and Barts and the London Charity (to P.C.

    Computational Framework for Data-Independent Acquisition Proteomics.

    Mass spectrometry (MS) is one of the main techniques for high-throughput discovery and targeted proteomics experiments. The most popular method for MS data acquisition has been the data-dependent acquisition (DDA) strategy, which primarily selects high-abundance peptides for MS/MS sequencing. DDA incorporates stochastic data acquisition to avoid repetitive sequencing of the same peptide, resulting in relatively irreproducible results for low-abundance peptides between experiments. Data-independent acquisition (DIA), in which peptide fragment signals are systematically acquired, is emerging as a promising alternative that addresses DDA's stochasticity. DIA results in more complex signals, posing computational challenges for complex samples and high-throughput analysis. As a result, targeted extraction, which requires pre-existing spectral libraries, has been the most commonly used approach for automated DIA data analysis. However, building spectral libraries requires additional analysis time and sample material, which are major barriers for most research groups. In my dissertation, I develop a computational tool called DIA-Umpire, which includes computational and signal processing algorithms to enable untargeted DIA identification and quantification analysis without any prior spectral library. In the first study, a signal feature detection algorithm is developed to extract and assemble peptide precursor and fragment signals into pseudo MS/MS spectra, which can be analyzed by existing DDA untargeted analysis tools. This novel step enables direct and untargeted (spectral library-free) DIA identification analysis, and we show the performance using complex samples including human cell lysate and glycoproteomics datasets. In the second study, a hybrid approach is developed to further improve the DIA quantification sensitivity and reproducibility. The performance of the DIA-Umpire quantification approach is demonstrated using an affinity-purification mass spectrometry experiment for protein-protein interaction analysis. Lastly, in the third study, I improve the DIA-Umpire pipeline for data obtained from the Orbitrap family of mass spectrometers. Using public datasets, I show that the improved version of DIA-Umpire is capable of highly sensitive, untargeted analysis of DIA data generated using the Orbitrap family of mass spectrometers. The dissertation work addresses the barriers of DIA analysis and should facilitate the adoption of the DIA strategy for a broad range of discovery proteomics applications.
    PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/120699/1/tsouc_1.pd

    A peptide-centric quantitative proteomics dataset for the phenotypic assessment of Alzheimer's disease

    Alzheimer's disease (AD) is a looming public health disaster with limited interventions. Alzheimer's is a complex disease that can present with or without causative mutations and can be accompanied by a range of age-related comorbidities. This diverse presentation makes it difficult to study molecular changes specific to AD. To better understand the molecular signatures of disease we constructed a unique human brain sample cohort inclusive of autosomal dominant AD dementia (ADD), sporadic ADD, and those without dementia but with high AD histopathologic burden, and cognitively normal individuals with no/minimal AD histopathologic burden. All samples are clinically well characterized, and brain tissue was preserved postmortem by rapid autopsy. Samples from four brain regions were processed and analyzed by data-independent acquisition LC-MS/MS. Here we present a high-quality quantitative dataset at the peptide and protein level for each brain region. Multiple internal and external control strategies were included in this experiment to ensure data quality. All data are deposited in the ProteomeXchange repositories and available from each step of our processing

    A primary human T-cell spectral library to facilitate large scale quantitative T-cell proteomics.

    Data independent analysis (DIA) exemplified by sequential window acquisition of all theoretical mass spectra (SWATH-MS) provides robust quantitative proteomics data, but the lack of a public primary human T-cell spectral library is a current resource gap. Here, we report the generation of a high-quality spectral library containing data for 4,833 distinct proteins from human T-cells across genetically unrelated donors, covering ~24% of the proteins in the UniProt/SwissProt reviewed human proteome. SWATH-MS analysis of 18 primary T-cell samples using the new human T-cell spectral library reliably identified and quantified 2,850 proteins at a 1% false discovery rate (FDR). In comparison, the larger Pan-human spectral library identified and quantified 2,794 T-cell proteins in the same dataset. As the libraries identified an overlapping set of proteins, combining the two libraries resulted in quantification of 4,078 human T-cell proteins. Collectively, this large data archive will be a useful public resource for human T-cell proteomic studies. The human T-cell library is available at SWATHAtlas and the data are available via ProteomeXchange (PXD019446 and PXD019542) and PeptideAtlas (PASS01587)
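
The three protein counts in this abstract can be related by inclusion-exclusion. The arithmetic below is only a sketch: it assumes the combined figure of 4,078 is the union of the two single-library result sets, which the abstract does not state explicitly. The symbols $T$ (proteins quantified with the T-cell library) and $P$ (proteins quantified with the Pan-human library) are introduced here for illustration.

```latex
\begin{align*}
|T \cap P| &= |T| + |P| - |T \cup P| \\
           &= 2850 + 2794 - 4078 \\
           &= 1566
\end{align*}
```

Under that assumption, roughly 1,566 proteins would be shared between the two result sets, leaving about 1,284 quantified only with the T-cell library and about 1,228 only with the Pan-human library.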

    Proteomic Tools for Food and Feed Authentication

    Due to globally rising demands for food and feed, novel proteinaceous ingredients are introduced into our food systems on an increasing scale. The introduction of novel ingredients and the circularity of the food system give rise to novel challenges concerning the detection of feed and food fraud and the determination of feed and food authenticity, respectively. In this context, developing and increasing the implementation of rapid, sensitive, and robust molecular methods is essential. 
In the past, progress in applying such tools has been hampered by a general lack of well-annotated reference genomes of target species commonly used or newly introduced in feed or food preparations. This PhD focused on developing and implementing mass spectrometry-based approaches to identify, differentiate, and quantify proteinaceous ingredients of animal and plant origin in various food and feed mixes without using any genomic information. The work presented in this PhD implemented bottom-up proteomic workflows using high-performance liquid chromatography (HPLC) tandem mass spectrometry (MS/MS). Data analyses were done using direct spectra comparison (compareMS2), spectra library matching (SLM), Trans-Proteomics Pipeline (TPP), and MaxQuant software. All data generated and published during this PhD have been made available on public repositories for proteomics data, such as the Mass Spectrometry Interactive Virtual Environment (MassIVE), following Findable, Accessible, Interoperable, and Reusable (FAIR) principles. The untargeted proteomics SLM workflow implemented during this PhD successfully differentiated processed animal proteins (PAP) such as bovine milk and bovine blood. The SLM was also used to identify and authenticate food- and feed-grade insect species and to detect whether black soldier fly (BSF) larvae had been fed on the prohibited PAP. Using the SLM workflow, it was also possible to quantify and authenticate the different species in fish mixtures containing muscle tissues from three different fish species. It was also shown that untargeted proteomics could be used to identify common allergens in food-grade insect samples. Also, the proteomic approach was successfully implemented to separate thirty-one ready-to-market soybean samples farmed organically, conventionally, and with genetic modifications (GM). Differential protein expression was detected between GM, conventionally, and organically farmed soybean samples. 
Additional bioinformatics analyses led to the detection of two novel peptide markers for the efficient tracing of GM crops in food and feed. The proteomic tools implemented during this PhD were capable of species- and tissue-specific identification of proteinaceous food and feed ingredients, including processed animal proteins, plant, mammalian, and fish proteins. Future work should focus on the differentiation and detection of fraud in food and feed in the global food market. A web-based interface will be developed for food and feed authentication using spectra libraries created during this PhD. Following proper quality testing, the web-based interface will be released publicly to provide research and regulatory laboratories with an easily accessible platform for authenticating and identifying protein ingredients in feed and food samples.
    Doktorgradsavhandlin