23 research outputs found

    The interactome of KRAB zinc finger proteins reveals the evolutionary history of their functional diversification.

    Krüppel-associated box (KRAB)-containing zinc finger proteins (KZFPs) are encoded in the hundreds by the genomes of higher vertebrates, and many act with the heterochromatin-inducing KAP1 as repressors of transposable elements (TEs) during early embryogenesis. Yet, their widespread expression in adult tissues and enrichment at other genetic loci indicate additional roles. Here, we characterized the protein interactome of 101 of the ~350 human KZFPs. Consistent with their targeting of TEs, most KZFPs conserved up to placental mammals essentially recruit KAP1 and associated effectors. In contrast, a subset of more ancient KZFPs rather interacts with factors related to functions such as genome architecture or RNA processing. Nevertheless, KZFPs from coelacanth, our most distant KZFP-encoding relative, bind the cognate KAP1. These results support a hypothetical model whereby KZFPs first emerged as TE-controlling repressors, were continuously renewed by turnover of their hosts' TE loads, and occasionally produced derivatives that escaped this evolutionary flushing by development and exaptation of novel functions.

    Complex-centric Proteome Profiling by SEC-SWATH Mass Spectrometry

    Living cells are highly ordered structures composed of different classes of molecules that form networks of interacting components to carry out cellular functions. Among the cellular biomolecules, proteins constitute the main effectors. Most functions, however, are not carried out by individual proteins alone, but by proteins arranged into complexes as functional units. A core goal in modern systems biology research is to understand at large how this physical interplay regulates proteome function and, specifically, the generation of distinct phenotypes as a function of genotype and environmental cues. Accordingly, methods to measure the physical organization of the proteome at breadth and along distinct functional states are dearly sought after. This thesis describes the development of a method to measure the arrangement of cellular proteins into complexes on a broad scale and in a systematic and quantitative fashion. We term the method complex-centric proteome profiling by SEC-SWATH-MS. It consists of recording size exclusion chromatography co-fractionation profiles of the protein components of native complexes by SWATH Mass Spectrometry. To infer the underlying complexes, concepts from peptide-centric analysis of SWATH Mass Spectrometry data are extended to the level of complex detection based on fragment subunit co-fractionation along size exclusion chromatography, leveraging prior knowledge on proteome connectivity from public repositories as basis for targeted, complex-centric analysis. In essence, the method takes as input (i), cell systems in one or more conditions and (ii), prior generic information on proteome connectivity and produces as output a quantitative assessment of the chromatographically resolved protein complexes and complex variants as liberated from the cell system. 
The core contents are the development of building blocks essential to the technique (Section II), the development of the computational framework implementing complex-centric analysis (Section III) as well as first applications to study the modular organization of the HEK293 cell line proteome in steady state and to study the re-organization of the modular proteome organization in two distinct states of cell cycle progression in the HeLa CCL2 cell line (Section IV). In summary, this PhD thesis describes the development of the experimental and computational resources to conduct complex-centric profiling of proteomes, generating ‘pictures’ of the cellular protein complex landscape at unprecedented breadth and level of detail. The capacity to depict proteome modularity at such level of detail and throughput outlines a road towards routine measurement of the proteome including its contextual state, rather than as the sum of its parts. Such studies are bound to generate systems-wide, mechanistic and conclusive insight into yet unknown aspects of cellular function - in both health and disease.
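As a rough illustration of the co-fractionation idea in the abstract above, the sketch below scores how similarly the subunits of one candidate complex elute across size exclusion chromatography fractions. It is not the thesis's algorithm: the Pearson-correlation score, the function names, and the intensity values are all assumptions made up for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length elution profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-elution information
    return cov / sqrt(var_x * var_y)


def mean_pairwise_correlation(profiles):
    """Average Pearson correlation over all subunit pairs of a candidate complex."""
    names = sorted(profiles)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    if not pairs:
        raise ValueError("need at least two subunits")
    return sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)


# Hypothetical intensities across 8 SEC fractions (invented numbers).
profiles = {
    "subunit_A": [0, 1, 5, 9, 5, 1, 0, 0],
    "subunit_B": [0, 2, 10, 18, 10, 2, 0, 0],
    "unrelated": [9, 5, 1, 0, 0, 0, 1, 5],
}

co_eluting = {k: profiles[k] for k in ("subunit_A", "subunit_B")}
print(mean_pairwise_correlation(co_eluting))  # high for co-eluting subunits
print(pearson(profiles["subunit_A"], profiles["unrelated"]))  # negative here
```

A score like this only captures profile shape. The targeted, complex-centric analysis described in the abstract additionally relies on prior knowledge of proteome connectivity, which this sketch does not model.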

    SWATH2stats: An R/bioconductor package to process and convert quantitative SWATH-MS proteomics data for downstream analysis tools

    SWATH-MS is an acquisition and analysis technique of targeted proteomics that enables measuring several thousand proteins with high reproducibility and accuracy across many samples. OpenSWATH is popular open-source software for peptide identification and quantification from SWATH-MS data. For downstream statistical and quantitative analysis there exist different tools such as MSstats, mapDIA and aLFQ. However, the transfer of data from OpenSWATH to the downstream statistical tools is currently technically challenging. Here we introduce the R/Bioconductor package SWATH2stats, which allows convenient processing of the data into a format directly readable by the downstream analysis tools. In addition, SWATH2stats allows annotation, analyzing the variation and the reproducibility of the measurements, FDR estimation, and advanced filtering before submitting the processed data to downstream tools. These functionalities are important to quickly analyze the quality of the SWATH-MS data. Hence, SWATH2stats is a new open-source tool that summarizes several practical functionalities for analyzing, processing, and converting SWATH-MS data and thus facilitates the efficient analysis of large-scale SWATH/DIA datasets.

    MSLibrarian: Optimized Predicted Spectral Libraries for Data-Independent Acquisition Proteomics

    Data-independent acquisition-mass spectrometry (DIA-MS) is the method of choice for deep, consistent, and accurate single-shot profiling in bottom-up proteomics. While classic workflows for targeted quantification from DIA-MS data require auxiliary data-dependent acquisition (DDA) MS analysis of subject samples to derive prior-knowledge spectral libraries, library-free approaches based on in silico prediction promise deep DIA-MS profiling with reduced experimental effort and cost. Coverage and sensitivity in such analyses are however limited, in part, by the large library size and persistent deviations from the experimental data. We present MSLibrarian, a new workflow and tool to obtain optimized predicted spectral libraries by the integrated usage of spectrum-centric DIA data interpretation via the DIA-Umpire approach to inform and calibrate the in silico predicted library and analysis approach. Predicted-vs-observed comparisons enabled optimization of intensity prediction parameters, calibration of retention time prediction for deviating chromatographic setups, and optimization of the library scope and sample representativeness. Benchmarking via a dedicated ground-truth-embedded experiment of species-mixed proteins and quantitative ratio-validation confirmed gains of up to 13% on peptide and 8% on protein level at equivalent FDR control and validation criteria. MSLibrarian is made available as an open-source R software package, including step-by-step user instructions, at https://github.com/MarcIsak/MSLibrarian

    Functions included in the SWATH2stats package.

    Functions included in the SWATH2stats package.

    Multienzyme deep learning models improve peptide de novo sequencing by mass spectrometry proteomics

    Generating and analyzing overlapping peptides through multienzymatic digestion is an efficient procedure for de novo protein sequencing from bottom-up mass spectrometry (MS). Despite improved instrumentation and software, de novo MS data analysis remains challenging. In recent years, deep learning models have represented a performance breakthrough. Incorporating that technology into de novo protein sequencing workflows requires machine-learning models capable of handling highly diverse MS data. In this study, we analyzed the requirements for assembling such generalizable deep learning models by systematically varying the composition and size of the training set. We assessed the generated models' performances using two test sets composed of peptides originating from the multienzyme digestion of samples from various species. The peptide recall values on the test sets showed that the deep learning models generated from a collection of peptides with highly diverse N- and C-termini generalized 76% better than the termini-restricted ones. Moreover, expanding the training set's size by adding peptides from the multienzymatic digestion of samples from several species with five proteases led to a 2-3 fold generalizability gain. Furthermore, we tested the applicability of these multienzyme deep learning (MEM) models by fully de novo sequencing the heavy and light monomeric chains of five commercial antibodies (mAbs). MEMs extracted over 10000 matching and overlapping peptides across mAb samples digested with six different proteases, achieving 100% sequence coverage for eight of the ten polypeptide chains. We anticipate that the MEMs' proven improvements to de novo analysis will positively impact several applications, such as the analysis of samples of high complexity or unknown nature, and the peptidomics field.
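The sequence coverage figure quoted in the abstract above can be illustrated with a small sketch: the fraction of residues in a chain that fall inside at least one matching peptide. This is a simplified reading, not the authors' pipeline. The function name, the exact-substring matching rule, and the example sequences are assumptions for illustration. Real workflows would also have to handle ambiguities such as the isobaric residues leucine and isoleucine.

```python
def sequence_coverage(protein, peptides):
    """Fraction of residues in `protein` covered by at least one exact peptide match."""
    if not protein:
        raise ValueError("protein sequence must not be empty")
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue  # skip empty strings, which would match everywhere
        start = protein.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            # look for further (possibly overlapping) occurrences
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein)


# Hypothetical chain fragment and overlapping peptides (invented for this example).
chain = "EVQLVESGGGLVQPGG"
peptides = ["EVQLVESG", "SGGGLVQ", "VQPGG"]

print(sequence_coverage(chain, peptides))      # all residues covered
print(sequence_coverage(chain, peptides[:2]))  # last three residues uncovered
```

Overlapping peptides from different proteases are what close the gaps here: dropping the third peptide leaves the C-terminal residues uncovered.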

    Rapid Profiling of Protein Complex Reorganization in Perturbed Systems

    Protein complexes constitute the primary functional modules of cellular activity. To respond to perturbations, complexes undergo changes in their abundance, subunit composition, or state of modification. Understanding the function of biological systems requires global strategies to capture this contextual state information. Methods based on cofractionation paired with mass spectrometry have demonstrated the capability for deep biological insight, but the scope of studies using this approach has been limited by the large measurement time per biological sample and challenges with data analysis. There has been little uptake of this strategy into the broader life science community despite its rich biological information content. We present a rapid integrated experimental and computational workflow to assess the reorganization of protein complexes across multiple cellular states. The workflow combines short gradient chromatography and DIA/SWATH mass spectrometry with a data analysis toolset to quantify changes in complex organization. We applied the workflow to study the global protein complex rearrangements of THP-1 cells undergoing monocyte to macrophage differentiation and subsequent stimulation of macrophage cells with lipopolysaccharide. We observed substantial proteome reorganization on differentiation and less pronounced changes in macrophage stimulation. We establish our integrated differential pipeline for rapid and state-specific profiling of protein complex organization. ISSN: 1535-3893

    DIP-MS: ultra-deep interaction proteomics for the deconvolution of protein complexes

    Most proteins are organized in macromolecular assemblies, which represent key functional units regulating and catalyzing most cellular processes. Affinity purification of the protein of interest combined with liquid chromatography coupled to tandem mass spectrometry (AP-MS) represents the method of choice to identify interacting proteins. The composition of complex isoforms concurrently present in the AP sample can, however, not be resolved from a single AP-MS experiment but requires computational inference from multiple time- and resource-intensive reciprocal AP-MS experiments. Here we introduce deep interactome profiling by mass spectrometry (DIP-MS), which combines AP with blue-native-PAGE separation, data-independent acquisition with mass spectrometry and deep-learning-based signal processing to resolve complex isoforms sharing the same bait protein in a single experiment. We applied DIP-MS to probe the organization of the human prefoldin family of complexes, resolving distinct prefoldin holo- and subcomplex variants, complex-complex interactions and complex isoforms with new subunits that were experimentally validated. Our results demonstrate that DIP-MS can reveal proteome modularity at unprecedented depth and resolution. ISSN: 1548-7105