
260 research outputs found

A Protein Standard That Emulates Homology for the Characterization of Protein Inference Algorithms

Author: Edfors F.
Forsstrom B.
Hoopmann M.R.
Kall L.
Palmblad M.
Payne S.H.
Perez-Riverol Y.
The M.
Publication venue: 'American Chemical Society (ACS)'
Publication date: 31/05/2018
Field of study

Proteomic

Leiden University Scholarly Publications

The PRIDE database and related tools and resources in 2019: improving support for quantification data

Author: Audain E.
Bai J.
Bernal-Llinares M.
Brazma A.
Cox J.
Csordas A.
Eisenacher M.
Griss J.
Hewapathirana S.
Inuganti A.
Jarnuczak A.
Kundu D.
Mayer G.
Perez E.
Perez-Riverol Y.
Pfeuffer J.
Sachsenberg T.
Ternent T.
Tiwary S.
Uszkoreit J.
Vizcaino J.
Walzer M.
Yilmaz S.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2019

The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data, and is one of the founding members of the global ProteomeXchange (PX) consortium. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2016. In the last 3 years, public data sharing through PRIDE (as part of PX) has definitely become the norm in the field. In parallel, re-use of public proteomics data has increased enormously, with multiple applications. We first describe the new architecture of PRIDE Archive, the archival component of PRIDE. PRIDE Archive and the related data submission framework have been further developed to support the increase in submitted data volumes and additional data types. A new scalable and fault-tolerant storage backend, Application Programming Interface and web interface have been implemented, as part of an ongoing process. Additionally, we emphasize the improved support for quantitative proteomics data through the mzTab format. Finally, we outline key statistics on the current data contents and volume of downloads, and how PRIDE data are starting to be disseminated to added-value resources including Ensembl, UniProt and Expression Atlas.

MPG.PuRe

MaxDIA enables library-based and library-free data-independent acquisition proteomics

Author: Cox J.
Distler U.
Hamzeiy H.
Humphrey S.
Itzhak D.
Kaspar-Schoenefeld S.
McCarthy F.
Nagaraj N.
Ohmayer U.
Perez-Riverol Y.
Prianichnikov N.
Rudolph J.
Salinas Soto F.
Sinitcyn P.
Steger M.
Tenzer S.
Wichmann C.
Yilmaz S.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2021

MaxDIA is a software platform for analyzing data-independent acquisition (DIA) proteomics data within the MaxQuant software environment. Using spectral libraries, MaxDIA achieves deep proteome coverage with substantially better coefficients of variation in protein quantification than other software. MaxDIA is equipped with accurate false discovery rate (FDR) estimates on both library-to-DIA match and protein levels, including when using whole-proteome predicted spectral libraries. This is the foundation of discovery DIA: hypothesis-free analysis of DIA samples without a library and with reliable FDR control. MaxDIA performs three- or four-dimensional feature detection of fragment data, and scoring of matches is augmented by machine learning on the features of an identification. MaxDIA's bootstrap DIA workflow performs multiple rounds of matching with increasing quality of recalibration and stringency of matching to the library. Combining MaxDIA with two new technologies (BoxCar acquisition and trapped ion mobility spectrometry) leads in both cases to deep and accurate proteome quantification. The software platform MaxDIA streamlines analysis of data-independent acquisition proteomics.

MPG.PuRe

The efficacy of various machine learning models for multi-class classification of RNA-seq expression data

Author: A Statnikov
AT Azar
G Bartsch
G Sanz
H Rhee
I Ezkurdia
J Friedman
J Meng
J Zhi
JN Weinstein
L Breiman
M Al-Rajab
M Villamizar
MD Podolsky
MS Lawrence
ND Khalilabad
P Geurts
R Díaz-Uriarte
S Bram Ednersson
S Tarek
T Cover
X Li
Y Perez-Riverol
Y Shang
Y Tan
Z Ye
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 19/08/2019

Late diagnosis and high costs are key factors that negatively impact the care of cancer patients worldwide. Although the availability of biological markers for the diagnosis of cancer type is increasing, costs and reliability of tests currently present a barrier to the adoption of their routine use. There is a pressing need for accurate methods that enable early diagnosis and cover a broad range of cancers. The use of machine learning and RNA-seq expression analysis has shown promise in the classification of cancer type. However, research is inconclusive about which types of machine learning model are optimal. The suitability of five algorithms was assessed for the classification of 17 different cancer types. Each algorithm was fine-tuned and trained on the full array of 18,015 genes per sample, for 4,221 samples (75% of the dataset). They were then tested with 1,408 samples (25% of the dataset) for which cancer types were withheld to determine the accuracy of prediction. The results show that ensemble algorithms achieve 100% accuracy in the classification of 14 out of 17 types of cancer. The clustering and classification models, while faster than the ensembles, performed poorly due to the high level of noise in the dataset. When the features were reduced to a list of 20 genes, the ensemble algorithms maintained an accuracy above 95%, as opposed to the clustering and classification models. Comment: 12 pages, 4 figures, 3 tables, conference paper: Computing Conference 2019, published at https://link.springer.com/chapter/10.1007/978-3-030-22871-2_6

arXiv.org e-Print Archive

Crossref

Comparative proteomics: assessment of biological variability and dataset comparability

Author: A Bolotin
AC Paoletti
AM Brunner
B Zybailov
C Tricarico
CW Turck
David A Mills
H Liu
H Rawsthorne
Hyun Joo An
JA Warrington
Jae Han Kim
JH Kim
KK Nielsen
M Bonnet-Duquennoy
Na Ri Seo
O Thellin
P Mallick
Sa Rang Kim
Seunghup Jung
T Theis
TN Villavicencio-Diaz
Tuong Vi Nguyen
WM Old
Y Perez-Riverol
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/04/2015

BACKGROUND: Comparative proteomics in bacteria is often hampered by differences in dataset quality and/or inherent biological deviations. Although common practice compensates by reproducing and normalizing datasets from a single sample, the degree of certainty is limited when comparing multiple datasets. To surmount these limitations, we introduce a two-step assessment criterion using: (1) the relative number of total spectra (R(TS)) to determine if two LC-MS/MS datasets are comparable and (2) nine glycolytic enzymes as internal standards for a more accurate calculation of the relative amount of proteins. Lactococcus lactis HR279 and JHK24 strains expressing high or low levels (respectively) of green fluorescent protein (GFP) were used as the model system. GFP abundance was determined by spectral counting and direct fluorescence measurements. Statistical analysis determined that the relative GFP quantity obtained from our approach matched values obtained from fluorescence measurements. RESULTS: L. lactis HR279 and JHK24 demonstrate that two datasets with an R(TS) value less than 1.4 accurately reflect relative differences in GFP levels between high and low expression strains. Without prior consideration of R(TS) and the use of internal standards, the relative increase in GFP calculated by the spectral counting method was 3.92 ± 1.14 fold, which is not correlated with the value determined by the direct fluorescence measurement (2.86 ± 0.42 fold; p = 0.024). In contrast, 2.88 ± 0.92 fold was obtained by our approach, showing a statistically insignificant difference (p = 0.95). CONCLUSIONS: Our two-step assessment demonstrates a useful approach to: (1) validate the comparability of two mass spectrometric datasets and (2) accurately calculate the relative amount of proteins between proteomic datasets.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0561-9) contains supplementary material, which is available to authorized users

Crossref

PubMed Central

eScholarship - University of California

A FAIR guide for data providers to maximise sharing of human genomic data

Author: A Brazma
Amanda McMurray
Fiona G. G. Nielsen
Francis Ouellette
H Li
HA Piwowar
HV Firth
I Hrynaszkiewicz
I Lappalainen
JA McMurry
JPA Ioannidis
KA Tryka
KM Wong
L Ohno-Machado
Manuel Corpas
MD Wilkinson
N Homer
N Kolesnikov
Nadezda V. Kovalevskaya
NV Kovalevskaya
P Danecek
P McQuilton
P Rocca-Serra
S Köhler
S Soini
SOM Dyke
TA van Schaik
Y Erlich
Y Perez-Riverol
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2018

It is generally acknowledged that, for reproducibility and progress of human genomic research, data sharing is critical. For every sharing transaction, a successful data exchange is produced between a data consumer and a data provider. Providers of human genomic data (e.g., publicly or privately funded repositories and data archives) fulfil their social contract with data donors when their shareable data conforms to FAIR (findable, accessible, interoperable, reusable) principles. Based on our experiences via Repositive (https://repositive.io), a leading discovery platform cataloguing all shared human genomic datasets, we propose guidelines for data providers wishing to maximise their shared data’s FAIRness. Citation: Corpas M, Kovalevskaya NV, McMurray A, Niel

Crossref

Directory of Open Access Journals

WestminsterResearch

Ensuring meiotic DNA break formation in the mouse pseudoautosomal region

Author: AH Peters
AP Arnold
B de Massy
D Kipling
D Thacker
D Zickler
F Papanikos
I Lam
J Lange
J Page
J Perry
J Schindelin
JM Turner
JW Bergs
K Brick
K Daniel
K Harbers
KP Kim
L Chong
L Kauppi
L Wojtasz
LA Bannister
LG Reinholdt
LL Tres
M Boekhout
M Stanzione
ME Karasu
N Kleckner
N Kourmouli
P Soriano
PP Khil
R Kumar
S Palmer
S Panizza
T Raudsepp
V Gaysinskaya
Y Costa
Y Perez-Riverol
Y Takahashi
YH Shin
Publication venue
Publication date: 01/06/2020

In mice, the pseudoautosomal region of the sex chromosomes undergoes a dynamic structural rearrangement to promote a high rate of DNA double-strand breaks and to ensure X-Y recombination. Sex chromosomes in males of most eutherian mammals share only a small homologous segment, the pseudoautosomal region (PAR), in which the formation of double-strand breaks (DSBs), pairing and crossing over must occur for correct meiotic segregation (1,2). How cells ensure that recombination occurs in the PAR is unknown. Here we present a dynamic ultrastructure of the PAR and identify controlling cis- and trans-acting factors that make the PAR the hottest segment for DSB formation in the male mouse genome. Before break formation, multiple DSB-promoting factors hyperaccumulate in the PAR, its chromosome axes elongate and the sister chromatids separate. These processes are linked to heterochromatic mo-2 minisatellite arrays, and require MEI4 and ANKRD31 proteins but not the axis components REC8 or HORMAD1. We propose that the repetitive DNA sequence of the PAR confers unique chromatin and higher-order structures that are crucial for recombination. Chromosome synapsis triggers collapse of the elongated PAR structure and, notably, oocytes can be reprogrammed to exhibit spermatocyte-like levels of DSBs in the PAR simply by delaying or preventing synapsis. Thus, the sexually dimorphic behaviour of the PAR is in part a result of kinetic differences between the sexes in a race between the maturation of the PAR structure, formation of DSBs and completion of pairing and synapsis. Our findings establish a mechanistic paradigm for the recombination of sex chromosomes during meiosis.

Crossref

Helsingin yliopiston digitaalinen arkisto

Quantitative proteomics analysis reveals important roles of N-glycosylation on ER quality control system for development and pathogenesis in Magnaporthe oryzae

Author: A Fernández-Álvarez
A Roth
A Stewart
AD Elbein
AJ Foster
C Hammond
C Rayon
D Poulain
D Zattas
DF Zielinska
DJ Kelleher
E Hiller
EA Schoffelmeer
EW Deutsch
G Picariello
H Kaji
H Lis
H Zhang
HM Mora-Montes
J Motteram
J Schirawski
JC de Jong
JU Baenziger
K Koles
K Poljak
L Ellgaard
L Yu
LA Kong
M Badaruddin
M Dickman
M Guo
M Samalova
MA Breidenbach
MF Chou
MV Rubio
N Ahn
N Dean
P Fang
P Määttänen
P Neubert
P Shannon
PJ Boersema
PM Rudd
R Apweiler
RA Brodsky
RG Spiro
RJ Howard
RJ Pattison
S Baas
S Tyanova
T Kamakura
T Liu
TA Rapoport
W Song
WL Franck
WS Lo
XL Chen
Y Hong
Y Oh
Y Pan
Y Perez-Riverol
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/02/2020

The fungal pathogen Magnaporthe oryzae can cause rice blast and wheat blast diseases, which threaten worldwide food production. During infection, M. oryzae follows a sequence of distinct developmental stages adapted to survival and invasion of the host environment. M. oryzae attaches onto the host by the conidium, and then develops an appressorium to breach the host cuticle. After penetrating, it forms invasive hyphae to quickly spread in the host cells. Numerous genetic studies have focused on the mechanisms underlying each step in the infection process, but systemic approaches are needed for a broader, integrated understanding of regulatory events during M. oryzae pathogenesis. Many infection-related signaling events are regulated through post-translational protein modifications within the pathogen. N-linked glycosylation, in which a glycan moiety is added to the amide group of an asparagine residue, is an abundant modification known to be essential for M. oryzae infection. In this study, we employed a quantitative proteomics analysis to unravel the overall regulatory mechanisms of N-glycosylation at different developmental stages of M. oryzae. We detected changes in N-glycosylation levels at 559 glycosylated residues (N-glycosites) in 355 proteins during different stages, and determined that the ER quality control system is elaborately regulated by N-glycosylation. The insights gained will help us to better understand the regulatory mechanisms of infection in pathogenic fungi. These findings may also be important for developing novel strategies for fungal disease control. Genetic studies have shown essential functions of N-glycosylation during infection by plant pathogenic fungi; however, the systematic roles of N-glycosylation in fungi are still largely unknown. Biological analysis demonstrated that N-glycosylated proteins were widely present at different developmental stages of Magnaporthe oryzae and especially increased in the appressorium and invasive hyphae.
A large-scale quantitative proteomics analysis was then performed to explore the roles of N-glycosylation in M. oryzae. A total of 559 N-glycosites from 355 proteins were identified and quantified at different developmental stages. Functional classification of the N-glycosylated proteins revealed that N-glycosylation can coordinate different cellular processes for mycelial growth, conidium formation, and appressorium formation. N-glycosylation can also modify key components in N-glycosylation, O-glycosylation and GPI anchor pathways, indicating intimate crosstalk between these pathways. Interestingly, we found that nearly all key components of the endoplasmic reticulum quality control (ERQC) system were highly N-glycosylated in conidium and appressorium. Phenotypic analyses of the gene deletion mutants revealed that four ERQC components, Gls1, Gls2, GTB1 and Cnx1, are important for mycelial growth, conidiation, and invasive hyphal growth in host cells. Subsequently, we identified that the Gls1 N-glycosite N497 was important for invasive hyphal growth and partially required for conidiation, but did not affect colony growth. Mutation of N497 resulted in a reduction of Gls1 at the protein level and a shift in its localization from the ER into the vacuole, suggesting that N497 is important for the protein stability of Gls1. Our study showed a snapshot of the N-glycosylation landscape in plant pathogenic fungi, indicating functions of this modification in cellular processes, development and pathogenesis.

Crossref

Directory of Open Access Journals

University of East Anglia digital repository

Prolyl Hydroxylase Substrate Adenylosuccinate Lyase Is An Oncogenic Driver In Triple Negative Breast Cancer

Author: A Dobin
A Jurecka
A Prat
B Turriziani
C Curtis
C Liedtke
Cancer Genome Atlas Network.
D Dimartino
D-Y Zhang
ER McDonald
FR Dejure
G Bianchini
G Di Conza
G Hoxhaj
H Park
J Bazin
J Bierau
J Guo
J Rodriguez
J Xiong
J-H Yoon
L Terzuoli
M Domise
M Ivan
M Sciacovelli
M Takada
M Zikánová
ME Cockman
MI Love
N Emmanuel
P Heir
P Jaakkola
Q Zhang
R Patro
S Kmoch
SB Lee
TL Diamond
V Bardot
VL Reed
WG Kaelin Jr
X Liu
X Wang
X Zheng
Y Perez-Riverol
Yen-Chun Liu
Z Cui
ZE Stine
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 15/11/2019

Maastricht University Research Portal

Crossref

Edinburgh Research Explorer