
69 research outputs found

Accurate estimation of isoelectric point of protein and peptide based on amino acid sequences

Author: Audain Enrique
Flower Darren R.
Hermjakob Henning
Perez-Riverol Yasset
Ramos Yassel
Publication venue: 'Oxford University Press (OUP)'
Publication date: 14/11/2015

Motivation: In any macromolecular polyprotic system (for example protein, DNA or RNA) the isoelectric point, commonly referred to as the pI, can be defined as the point of singularity in a titration curve, corresponding to the solution pH value at which the net overall surface charge, and thus the electrophoretic mobility, of the ampholyte sums to zero. Many modern analytical biochemistry and proteomics methods depend on the isoelectric point as a principal feature for protein and peptide characterization. Protein separation by isoelectric point is a critical part of 2-D gel electrophoresis, a key precursor of proteomics, where discrete spots can be digested in-gel and the proteins subsequently identified by analytical mass spectrometry. Peptide fractionation according to pI is also widely used in current proteomics sample preparation procedures prior to LC-MS/MS analysis. Accurate theoretical prediction of pI would therefore expedite such analysis. While such pI calculation is widely used, it remains largely untested, motivating our efforts to benchmark pI prediction methods.

Results: Using data from the database PIP-DB and one publicly available dataset as our reference gold standard, we have undertaken the benchmarking of pI calculation methods. We find that methods vary in their accuracy and are highly sensitive to the choice of basis set. The machine-learning algorithms, especially the SVM-based algorithm, showed superior performance when studying peptide mixtures. In general, learning-based pI prediction methods (such as Cofactor, SVM and Branca) require a large training dataset, and their resulting performance depends strongly on the quality of that data. In contrast to iterative methods, machine-learning algorithms have the advantage of being able to add new features to improve the accuracy of prediction.
Contact: [email protected]
Availability and Implementation: The software and data are freely available at https://github.com/ypriverol/pIR.
Supplementary information: Supplementary data are available at Bioinformatics online.
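The definition of pI in the abstract above (the pH at which the net charge sums to zero) can be illustrated with a minimal sketch of an iterative, charge-based estimate. This is not the code of the pIR package or of any benchmarked method. The pKa values are illustrative assumptions (published scales differ from one another), and the function names `net_charge` and `isoelectric_point` are hypothetical.

```python
# Illustrative pKa values (an assumption; published pKa scales differ).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    # Basic groups contribute up to +1 each when protonated.
    positive = [PKA_N_TERM] + [PKA_POSITIVE[a] for a in sequence if a in PKA_POSITIVE]
    # Acidic groups contribute up to -1 each when deprotonated.
    negative = [PKA_C_TERM] + [PKA_NEGATIVE[a] for a in sequence if a in PKA_NEGATIVE]
    charge = sum(1.0 / (1.0 + 10 ** (ph - pka)) for pka in positive)
    charge -= sum(1.0 / (1.0 + 10 ** (pka - ph)) for pka in negative)
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisect on pH in [0, 14] until the net charge crosses zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        # Net charge decreases monotonically with pH, so a positive
        # charge means the pI lies above the midpoint.
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    for peptide in ("DDDD", "ACDEFGHIK", "KKKK"):
        print(peptide, round(isoelectric_point(peptide), 2))
```

Bisection works here because the net charge is a monotonically decreasing function of pH. The result depends entirely on the chosen pKa set, which is one way to see why the abstract reports that methods vary in their accuracy.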

Crossref

Aston Publications Explorer

Generation of ENSEMBL-based proteogenomics databases boosts the identification of non-canonical peptides

Author: Audain Enrique
Branca Rui
Lehtiö Janne
Perez-Riverol Yasset
Pfeuffer Julianus
Sachsenberg Timo
Umer Husen M.
Zhu Yafeng
Publication date: 14/12/2021

We have implemented the pypgatk package and the pgdb workflow to create proteogenomics databases based on ENSEMBL resources. The tools allow the generation of protein sequences from novel protein-coding transcripts by performing a three-frame translation of pseudogenes, lncRNAs and other non-canonical transcripts, such as those produced by alternative splicing events. They also include exonic out-of-frame translation from otherwise canonical protein-coding mRNAs. Moreover, the tools enable the generation of variant protein sequences from multiple sources of genomic variants, including COSMIC, cBioportal, gnomAD and mutations detected from sequencing of patient samples. pypgatk and pgdb provide multiple functionalities for database handling, including optimized target/decoy generation by the algorithm DecoyPyrat. Finally, we have reanalyzed six public datasets in PRIDE by generating cell-type specific databases for 65 cell lines using the pypgatk and pgdb workflow, revealing a wealth of non-canonical or cryptic peptides amounting to >5% of the total number of peptides identified.
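The three-frame translation mentioned in the abstract above can be sketched as follows. This is a minimal illustration in plain Python, not the pypgatk API: the function names `translate` and `three_frame_translation` are hypothetical, and the sketch assumes the standard genetic code and a forward-strand DNA input.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def three_frame_translation(dna: str) -> list[str]:
    """Translate the forward strand starting at offsets 0, 1 and 2."""
    return [translate(dna[offset:]) for offset in range(3)]


if __name__ == "__main__":
    for frame, protein in enumerate(three_frame_translation("ATGGCCTAA")):
        print(frame, protein)
```

Stop codons are kept as `*` so that a downstream step can split each frame into candidate open reading frames. How the actual tools handle stops, strand and transcript structure is not stated in the abstract.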

Institutional Repository of the Freie Universität Berlin

PubMed Central

A proteomics sample metadata representation for multiomics integration and big data analysis

The amount of public proteomics data is rapidly increasing but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve the reproducibility and facilitate the reanalysis and integration of public proteomics datasets.

Bergen Open Research Archive (Univ. of Bergen)

Ghent University Academic Bibliography

Copenhagen University Research Information System

Publication Server of Zuse Institute Berlin (ZIB)

Providence St. Joseph Health Digital Commons

Syddansk Universitets Forskerportal

NORA - Norwegian Open Research Archives

HAL: Hyper Article en Ligne

Extensive Identification of Genes Involved in Congenital and Structural Heart Disorders and Cardiomyopathy

Author: Audain Enrique
Beaudet Arthur L
Bou About Ghina
Bower Lynette
Brandmaier Stefan
Braun Robert E
Brown Steve D M
Bunton-Stasyshyn Rosie K A
Cater Heather
Cho Yi-Li
Christiansen Audrey E
Christou Skevoulla
Clary David
Dickinson Mary E
Flenniken Ann M
Fobo Gisela
Frishman Goar
Fuchs Helmut
Gailus-Durner Valerie
Galter Isabella Rikarda
Gao Xiang
Genomics England Research Consortium
Gilly Arthur
Grallert Harald
Görlach Agnes
Haseli Mashhadi Hamed
Heaney Jason D
Herault Yann
Hitz Marc-Phillip
Hrabe de Angelis Martin
Hsu Chih-Wei
IMPC Consortium
Jacobs Hugues
Kelsey Lois
Leblanc Sophie
Leuchtenberger Stefanie
Lloyd K C Kent
Mallon Ann-Marie
Mammano Fabio
Marschall Susan
Mason Jeremy
Mayer-Kuckuk Philipp
McKerlie Colin
Meehan Terrence F
Meziane Hamid
Miller Gregor
Montrone Corinna
Munoz Fuentes Violeta
Murray Steven A
Nutter Lauryl M J
Oprea Tudor I
Parkinson Helen
Prochazka Jan
Rayner Nigel W
Reynolds Corey L
Roper Willson B
Rozman Jan
Ruepp Andreas
Sanz-Moreno Adrián
Sedlacek Radislav
Selloum Mohammed
Seong Je Kyung
Sharma Sapna
Shiroishi Toshihiko
Smedley Damian
Sorg Tania
Spielmann Nadine
Stewart Michelle
Svenson Karen L
Teboul Lydia
Teperino Raffaele
Tocchini-Valentini Glauco P
Wagner Matias
Ward Christopher S
Wells Sara E
Wendling Olivia
Westerberg Henrik
Westphal Dominik S
White Jaqueline K
Willett Amelia M
Wolf Cordula
Wolf Eckhard
Wotton Janine M
Wurst Wolfgang
Xu Ying
Zapf Lilly
Zeggini Eleftheria
Östereicher Manuela A
Publication venue: DigitalCommons@TMC
Publication date: 01/02/2022

Clinical presentation of congenital heart disease is heterogeneous, making identification of the disease-causing genes and their genetic pathways and mechanisms of action challenging. By using in vivo electrocardiography, transthoracic echocardiography and microcomputed tomography imaging to screen 3,894 single-gene-null mouse lines for structural and functional cardiac abnormalities, here we identify 705 lines with cardiac arrhythmia, myocardial hypertrophy and/or ventricular dilation. Among these 705 genes, 486 have not been previously associated with cardiac dysfunction in humans, and some of them represent variants of unknown relevance (VUR). Mice with mutations in Casz1, Dnajc18, Pde4dip, Rnf38 or Tmem161b genes show developmental cardiac structural abnormalities, with their human orthologs being categorized as VUR. Using UK Biobank data, we validate the importance of the DNAJC18 gene for cardiac homeostasis by showing that its loss of function is associated with altered left ventricular systolic function. Our results identify hundreds of previously unappreciated genes with potential function in congenital heart disease and suggest causal function of five VUR in congenital heart disease

DigitalCommons@The Texas Medical Center

Extensive identification of genes involved in congenital and structural heart disorders and cardiomyopathy.

Author: Audain Enrique
Beaudet Arthur L
Bou About Ghina
Bower Lynette
Brandmaier Stefan
Braun Robert E
Brown Steve D M
Bunton-Stasyshyn Rosie K A
Cater Heather
Cho Yi-Li
Christiansen Audrey E
Christou Skevoulla
Clary David
Dickinson Mary E
Flenniken Ann M
Fobo Gisela
Frishman Goar
Fuchs Helmut
Gailus-Durner Valerie
Galter Isabella Rikarda
Gao Xiang
Genomics England Research Consortium
Gilly Arthur
Grallert Harald
Görlach Agnes
Haseli Mashhadi Hamed
Heaney Jason D
Herault Yann
Hitz Marc-Phillip
Hrabe de Angelis Martin
Hsu Chih-Wei
IMPC consortium
Jacobs Hugues
Kelsey Lois
Leblanc Sophie
Leuchtenberger Stefanie
Lloyd K C Kent
Mallon Ann-Marie
Mammano Fabio
Marschall Susan
Mason Jeremy
Mayer-Kuckuk Philipp
McKerlie Colin
Meehan Terrence F
Meziane Hamid
Miller Gregor
Montrone Corinna
Munoz Fuentes Violeta
Murray Stephen A
Nutter Lauryl M J
Oprea Tudor I
Parkinson Helen
Prochazka Jan
Rayner Nigel W
Reynolds Corey L
Roper Willson B
Rozman Jan
Ruepp Andreas
Sanz-Moreno Adrián
Sedlacek Radislav
Selloum Mohammed
Seong Je Kyung
Sharma Sapna
Shiroishi Toshihiko
Smedley Damian
Sorg Tania
Spielmann Nadine
Stewart Michelle
Svenson Karen L
Teboul Lydia
Teperino Raffaele
Tocchini-Valentini Glauco P
Wagner Matias
Ward Christopher S
Wells Sara E
Wendling Olivia
Westerberg Henrik
Westphal Dominik S
White Jacqueline K
Willett Amelia M
Wolf Cordula
Wolf Eckhard
Wotton Janine M
Wurst Wolfgang
Xu Ying
Zapf Lilly
Zeggini Eleftheria
Östereicher Manuela A
Publication venue: The Mouseion at the JAXlibrary
Publication date: 01/02/2022

The Jackson Laboratory: The Mouseion at the JAXlibrary

Large-scale data-driven analysis to understand the genetics of Congenital Heart Disease

Author: Audain Martinez Enrique
Publication date: 01/01/2021

Congenital Heart Disease (CHD) delineates a large group of structural defects, which can occur due to perturbations at some stage in the cardiac embryogenesis process. With a global incidence ranging from 7 to 9 cases per 1000 live births, CHD accounts for a significant fraction of newborn deaths worldwide. Different studies have identified genetics as an essential factor underlying CHD, along with environmental factors. Technological advances in recent years have helped improve CHD diagnosis and understand its genetic causes. Nevertheless, despite the advances in our understanding of the disease, many molecular mechanisms underlying CHD remain uncertain. Herein I present my efforts focused on discovering new genes and biological pathways altered in patients with CHD. The work is based on large CHD patient cohorts, collected and analysed as part of an international collaboration. The integrative data-driven approach adopted in this work can roughly be grouped into two principal aims: i) the development of statistical frameworks and bioinformatics tools to analyse high-dimensional data and ii) the meta-analysis of large-scale exome sequencing data to elucidate variants and genes conferring risk of CHD. By meta-analysing copy number variations and de novo variants in CHD probands, we implicated novel genes reaching genome-wide significant association with CHD and strengthened previously described associations. We also explored the differences between non-syndromic and syndromic CHD by analysing a large-scale exome cohort of patients. In summary, our integrative approach, supported by the data analysis of ~15,000 CHD patients, allowed us to gain new insights into the genetic origin of CHD. Consequently, we present here a valuable resource to continue investigating the causes of CHD and pave the way to promote new studies in this area.

MACAU: Open Access Repository of Kiel University

Estimation of the isoelectric point of peptides using molecular descriptors and support vector machines

Author: Enrique Audain (528623)
Yasset Perez-Riverol (528620)

Fractionation of peptide mixtures using immobilized pH gradient gels is frequently used as the first separation step in proteomics experiments. This technique increases both the dynamic range and the resolution of peptide separation prior to analysis by Liquid Chromatography-Mass Spectrometry. Experimental isoelectric point (pI) values, combined with information from fragmentation spectra, can be used to improve peptide identifications. Accurate estimation of the pI value from the amino acid sequence is therefore a critical point in this type of experiment. At present, pI is estimated mainly by models based on the charge state of the molecule and/or the Cofactor algorithm. However, none of these methods can calculate the pI of basic peptides accurately. In this work, we present a new approach that can significantly improve pI estimation by using support vector machines (SVM), an experimental amino acid descriptor taken from the AAIndex database, and the isoelectric point predicted by a charge-state-based model. The results obtained on two experimental datasets showed a high correlation (0.96-0.98) between estimated and observed pI values, with a standard deviation of 0.32-0.36 pH units.

The Francis Crick Institute

bigbio/py-pgatk: v0.0.24

Author: Enrique Audain
Husen M. Umer
WangDong
Yasset Perez-Riverol
Publication venue: Zenodo
Publication date: 19/04/2024

What's Changed
- update by @ypriverol in https://github.com/bigbio/py-pgatk/pull/69
- fixed bug in gtf file name, just keep version not release string by @ypriverol in https://github.com/bigbio/py-pgatk/pull/68
- add spectrumAI by @DongdongdongW in https://github.com/bigbio/py-pgatk/pull/70
- Fix the bug of class-fdr, adjust the blast to multi-process running t… by @DongdongdongW in https://github.com/bigbio/py-pgatk/pull/71
- fixes issue #72 by @husensofteng in https://github.com/bigbio/py-pgatk/pull/73
- update validate by @DongdongdongW in https://github.com/bigbio/py-pgatk/pull/76
- spectrumAI into py-pgatk by @ypriverol in https://github.com/bigbio/py-pgatk/pull/77

New Contributors
- @DongdongdongW made their first contribution in https://github.com/bigbio/py-pgatk/pull/70

Full Changelog: https://github.com/bigbio/py-pgatk/compare/v0.0.23...v0.0.24

ZENODO

A Survey of Molecular Descriptors Used in Mass Spectrometry Based Proteomics

Author: Aniel Sanchez
Enrique Audain
Juan Vizcaíno
Yasset Perez-Riverol
Publication venue: Bentham Science Publishers Ltd.
Publication date: 31/01/2014

Crossref

Integrative analysis of genomic variants reveals new associations of candidate haploinsufficient genes with congenital heart disease

Author: Audain Enrique
Hitz Marc-Phillip
Larsen Lars Allan
Stiller Brigitte
Wilsdon Anna
Publication date: 01/01/2021

Numerous genetic studies have established a role for rare genomic variants in Congenital Heart Disease (CHD) at the copy number variation (CNV) and de novo variant (DNV) level. To identify novel haploinsufficient CHD disease genes, we performed an integrative analysis of CNVs and DNVs identified in probands with CHD, including cases with sporadic thoracic aortic aneurysm. We assembled CNV data from 7,958 cases and 14,082 controls and performed a gene-wise analysis of the burden of rare genomic deletions in cases versus controls. In addition, we performed variation rate testing for DNVs identified in 2,489 parent-offspring trios. Our analysis revealed 21 genes which were significantly affected by rare CNVs and/or DNVs in probands. Fourteen of these genes have previously been associated with CHD, while the remaining genes (FEZ1, MYO16, ARID1B, NALCN, WAC, KDM5B and WHSC1) have only been associated in small case series or show new associations with CHD. In addition, a systems level analysis revealed affected protein-protein interaction networks involved in the Notch signaling pathway, heart morphogenesis, DNA repair and cilia/centrosome function. Taken together, this approach highlights the importance of re-analyzing existing datasets to strengthen disease association and identify novel disease genes and pathways.
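The gene-wise burden comparison described in the abstract above can be illustrated with a one-sided Fisher's exact test. The abstract does not name the statistical test the authors used, so Fisher's test is a stand-in chosen for illustration. The cohort sizes come from the abstract; the per-gene deletion counts and the function name `fisher_one_sided` are hypothetical.

```python
from math import comb


def fisher_one_sided(case_hits: int, n_cases: int,
                     control_hits: int, n_controls: int) -> float:
    """P(at least `case_hits` carriers among cases) under the null
    hypothesis that carriers are spread at random over all samples."""
    total = n_cases + n_controls
    hits = case_hits + control_hits
    # Hypergeometric upper tail, summed with exact integer arithmetic.
    numerator = sum(
        comb(hits, k) * comb(total - hits, n_cases - k)
        for k in range(case_hits, min(hits, n_cases) + 1)
    )
    return numerator / comb(total, n_cases)


if __name__ == "__main__":
    # Cohort sizes from the abstract; carrier counts are made up.
    p_value = fisher_one_sided(case_hits=12, n_cases=7958,
                               control_hits=3, n_controls=14082)
    print(f"one-sided p-value: {p_value:.3g}")
```

In a genome-wide setting a p-value like this would be computed for each gene and then corrected for multiple testing. The abstract does not describe how that correction was done.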

FreiDok plus