260 research outputs found

    The PRIDE database and related tools and resources in 2019: improving support for quantification data

    The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data, and is one of the founding members of the global ProteomeXchange (PX) consortium. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2016. In the last 3 years, public data sharing through PRIDE (as part of PX) has definitely become the norm in the field. In parallel, re-use of public proteomics data has increased enormously, with multiple applications. We first describe the new architecture of PRIDE Archive, the archival component of PRIDE. PRIDE Archive and the related data submission framework have been further developed to support the increase in submitted data volumes and additional data types. A new scalable and fault-tolerant storage backend, Application Programming Interface and web interface have been implemented, as part of an ongoing process. Additionally, we emphasize the improved support for quantitative proteomics data through the mzTab format. Finally, we outline key statistics on the current data contents and volume of downloads, and how PRIDE data are starting to be disseminated to added-value resources including Ensembl, UniProt and Expression Atlas.

    MaxDIA enables library-based and library-free data-independent acquisition proteomics

    MaxDIA is a software platform for analyzing data-independent acquisition (DIA) proteomics data within the MaxQuant software environment. Using spectral libraries, MaxDIA achieves deep proteome coverage with substantially better coefficients of variation in protein quantification than other software. MaxDIA is equipped with accurate false discovery rate (FDR) estimates on both library-to-DIA match and protein levels, including when using whole-proteome predicted spectral libraries. This is the foundation of discovery DIA: hypothesis-free analysis of DIA samples without a library and with reliable FDR control. MaxDIA performs three- or four-dimensional feature detection of fragment data, and scoring of matches is augmented by machine learning on the features of an identification. MaxDIA's bootstrap DIA workflow performs multiple rounds of matching with increasing quality of recalibration and stringency of matching to the library. Combining MaxDIA with two new technologies, BoxCar acquisition and trapped ion mobility spectrometry, both lead to deep and accurate proteome quantification. The software platform MaxDIA streamlines analysis of data-independent acquisition proteomics.

    The efficacy of various machine learning models for multi-class classification of RNA-seq expression data

    Late diagnosis and high costs are key factors that negatively impact the care of cancer patients worldwide. Although the availability of biological markers for the diagnosis of cancer type is increasing, costs and reliability of tests currently present a barrier to the adoption of their routine use. There is a pressing need for accurate methods that enable early diagnosis and cover a broad range of cancers. The use of machine learning and RNA-seq expression analysis has shown promise in the classification of cancer type. However, research is inconclusive about which type of machine learning model is optimal. The suitability of five algorithms was assessed for the classification of 17 different cancer types. Each algorithm was fine-tuned and trained on the full array of 18,015 genes per sample, for 4,221 samples (75% of the dataset). They were then tested with 1,408 samples (25% of the dataset) for which cancer types were withheld to determine the accuracy of prediction. The results show that ensemble algorithms achieve 100% accuracy in the classification of 14 out of 17 types of cancer. The clustering and classification models, while faster than the ensembles, performed poorly due to the high level of noise in the dataset. When the features were reduced to a list of 20 genes, the ensemble algorithms maintained an accuracy above 95%, as opposed to the clustering and classification models. Comment: 12 pages, 4 figures, 3 tables, conference paper: Computing Conference 2019, published at https://link.springer.com/chapter/10.1007/978-3-030-22871-2_6

    Comparative proteomics: assessment of biological variability and dataset comparability

    BACKGROUND: Comparative proteomics in bacteria is often hampered by differences in dataset quality and/or inherent biological deviations. Although common practice compensates by reproducing and normalizing datasets from a single sample, the degree of certainty is limited when comparing multiple datasets. To surmount these limitations, we introduce a two-step assessment criterion using: (1) the relative number of total spectra (R(TS)) to determine if two LC-MS/MS datasets are comparable and (2) nine glycolytic enzymes as internal standards for a more accurate calculation of the relative amount of proteins. Lactococcus lactis HR279 and JHK24 strains expressing high or low levels (respectively) of green fluorescent protein (GFP) were used as the model system. GFP abundance was determined by spectral counting and direct fluorescence measurements. Statistical analysis determined that the relative GFP quantity obtained from our approach matched values obtained from fluorescence measurements. RESULTS: L. lactis HR279 and JHK24 demonstrate that two datasets with an R(TS) value less than 1.4 accurately reflect relative differences in GFP levels between high and low expression strains. Without prior consideration of R(TS) and the use of internal standards, the relative increase in GFP calculated by the spectral counting method was 3.92 ± 1.14 fold, which does not agree with the value determined by direct fluorescence measurement (2.86 ± 0.42 fold), with p = 0.024. In contrast, our approach gave 2.88 ± 0.92 fold, a statistically insignificant difference (p = 0.95). CONCLUSIONS: Our two-step assessment demonstrates a useful approach to: (1) validate the comparability of two mass spectrometric datasets and (2) accurately calculate the relative amount of proteins between proteomic datasets.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0561-9) contains supplementary material, which is available to authorized users
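
The two-step assessment in the abstract above can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so both definitions here are assumptions: R(TS) is taken as the larger total spectral count divided by the smaller, and the fold change is the target's spectral count normalised by the summed counts of the internal standards in each dataset. The function names and protein keys are hypothetical; only the 1.4 cutoff comes from the abstract.

```python
from typing import Mapping, Sequence

RTS_THRESHOLD = 1.4  # cutoff quoted in the abstract


def relative_total_spectra(total_a: int, total_b: int) -> float:
    """Assumed definition of R(TS): larger total spectral count over the smaller."""
    if total_a <= 0 or total_b <= 0:
        raise ValueError("total spectral counts must be positive")
    return max(total_a, total_b) / min(total_a, total_b)


def datasets_comparable(total_a: int, total_b: int) -> bool:
    """Step 1: treat two datasets as comparable when R(TS) is below the cutoff."""
    return relative_total_spectra(total_a, total_b) < RTS_THRESHOLD


def normalised_fold_change(
    counts_a: Mapping[str, int],
    counts_b: Mapping[str, int],
    target: str,
    standards: Sequence[str],
) -> float:
    """Step 2 (assumed form): ratio of the target's spectral counts after
    dividing each by the summed counts of the internal standards."""
    std_a = sum(counts_a[name] for name in standards)
    std_b = sum(counts_b[name] for name in standards)
    if std_a == 0 or std_b == 0 or counts_b[target] == 0:
        raise ValueError("standards and reference target need non-zero counts")
    return (counts_a[target] / std_a) / (counts_b[target] / std_b)
```

With invented counts, `datasets_comparable(1200, 1000)` passes the first step, after which `normalised_fold_change` would be called with the GFP entry as `target` and the glycolytic enzymes as `standards`.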

    A FAIR guide for data providers to maximise sharing of human genomic data

    It is generally acknowledged that, for reproducibility and progress of human genomic research, data sharing is critical. For every sharing transaction, a successful data exchange is produced between a data consumer and a data provider. Providers of human genomic data (e.g., publicly or privately funded repositories and data archives) fulfil their social contract with data donors when their shareable data conforms to FAIR (findable, accessible, interoperable, reusable) principles. Based on our experiences via Repositive (https://repositive.io), a leading discovery platform cataloguing all shared human genomic datasets, we propose guidelines for data providers wishing to maximise their shared data’s FAIRness. Citation: Corpas M, Kovalevskaya NV, McMurray A, Niel

    Ensuring meiotic DNA break formation in the mouse pseudoautosomal region

    In mice, the pseudoautosomal region of the sex chromosomes undergoes a dynamic structural rearrangement to promote a high rate of DNA double-strand breaks and to ensure X-Y recombination. Sex chromosomes in males of most eutherian mammals share only a small homologous segment, the pseudoautosomal region (PAR), in which the formation of double-strand breaks (DSBs), pairing and crossing over must occur for correct meiotic segregation(1,2). How cells ensure that recombination occurs in the PAR is unknown. Here we present a dynamic ultrastructure of the PAR and identify controlling cis- and trans-acting factors that make the PAR the hottest segment for DSB formation in the male mouse genome. Before break formation, multiple DSB-promoting factors hyperaccumulate in the PAR, its chromosome axes elongate and the sister chromatids separate. These processes are linked to heterochromatic mo-2 minisatellite arrays, and require MEI4 and ANKRD31 proteins but not the axis components REC8 or HORMAD1. We propose that the repetitive DNA sequence of the PAR confers unique chromatin and higher-order structures that are crucial for recombination. Chromosome synapsis triggers collapse of the elongated PAR structure and, notably, oocytes can be reprogrammed to exhibit spermatocyte-like levels of DSBs in the PAR simply by delaying or preventing synapsis. Thus, the sexually dimorphic behaviour of the PAR is in part a result of kinetic differences between the sexes in a race between the maturation of the PAR structure, formation of DSBs and completion of pairing and synapsis. Our findings establish a mechanistic paradigm for the recombination of sex chromosomes during meiosis. Peer reviewed

    Quantitative proteomics analysis reveals important roles of N-glycosylation on ER quality control system for development and pathogenesis in Magnaporthe oryzae

    The fungal pathogen Magnaporthe oryzae can cause rice blast and wheat blast diseases, which threaten worldwide food production. During infection, M. oryzae follows a sequence of distinct developmental stages adapted to survival and invasion of the host environment. M. oryzae attaches onto the host by the conidium, and then develops an appressorium to breach the host cuticle. After penetrating, it forms invasive hyphae to quickly spread in the host cells. Numerous genetic studies have focused on the mechanisms underlying each step in the infection process, but systemic approaches are needed for a broader, integrated understanding of regulatory events during M. oryzae pathogenesis. Many infection-related signaling events are regulated through post-translational protein modifications within the pathogen. N-linked glycosylation, in which a glycan moiety is added to the amide group of an asparagine residue, is an abundant modification known to be essential for M. oryzae infection. In this study, we employed a quantitative proteomics analysis to unravel the overall regulatory mechanisms of N-glycosylation at different developmental stages of M. oryzae. We detected changes in N-glycosylation levels at 559 glycosylated residues (N-glycosites) in 355 proteins during different stages, and determined that the ER quality control system is elaborately regulated by N-glycosylation. The insights gained will help us to better understand the regulatory mechanisms of infection in pathogenic fungi. These findings may also be important for developing novel strategies for fungal disease control. Genetic studies have shown essential functions of N-glycosylation during infection by plant pathogenic fungi; however, the systematic roles of N-glycosylation in fungi are still largely unknown. Biological analysis demonstrated that N-glycosylated proteins were widely present at different developmental stages of Magnaporthe oryzae and especially increased in the appressorium and invasive hyphae.
A large-scale quantitative proteomics analysis was then performed to explore the roles of N-glycosylation in M. oryzae. A total of 559 N-glycosites from 355 proteins were identified and quantified at different developmental stages. Functional classification of the N-glycosylated proteins revealed that N-glycosylation can coordinate different cellular processes for mycelial growth, conidium formation, and appressorium formation. N-glycosylation can also modify key components in the N-glycosylation, O-glycosylation and GPI anchor pathways, indicating intimate crosstalk between these pathways. Interestingly, we found that nearly all key components of the endoplasmic reticulum quality control (ERQC) system were highly N-glycosylated in the conidium and appressorium. Phenotypic analyses of the gene deletion mutants revealed that four ERQC components, Gls1, Gls2, GTB1 and Cnx1, are important for mycelial growth, conidiation, and invasive hyphal growth in host cells. Subsequently, we found that the Gls1 N-glycosite N497 was important for invasive hyphal growth and partially required for conidiation, but did not affect colony growth. Mutation of N497 resulted in a reduction of Gls1 at the protein level and its relocalization from the ER into the vacuole, suggesting that N497 is important for the protein stability of Gls1. Our study provides a snapshot of the N-glycosylation landscape in plant pathogenic fungi, indicating functions of this modification in cellular processes, development and pathogenesis.