50 research outputs found

    Encompassing new use cases - level 3.0 of the HUPO-PSI format for molecular interactions.

    BACKGROUND: Systems biologists study interaction data to understand the behaviour of whole cell systems, and their environment, at a molecular level. In order to effectively achieve this goal, it is critical that researchers have high quality interaction datasets available to them, in a standard data format, and also a suite of tools with which to analyse such data and form experimentally testable hypotheses from them. The PSI-MI XML standard interchange format was initially published in 2004, and expanded in 2007 to enable the download and interchange of molecular interaction data. PSI-XML2.5 was designed to describe experimental data and to date has fulfilled this basic requirement. However, new use cases have arisen that the format cannot properly accommodate. These include data abstracted from more than one publication such as allosteric/cooperative interactions and protein complexes, dynamic interactions and the need to link kinetic and affinity data to specific mutational changes. RESULTS: The Molecular Interaction workgroup of the HUPO-PSI has extended the existing, well-used XML interchange format for molecular interaction data to meet new use cases and enable the capture of new data types, following extensive community consultation. PSI-MI XML3.0 expands the capabilities of the format beyond simple experimental data, with a concomitant update of the tool suite which serves this format. The format has been implemented by key data producers such as the International Molecular Exchange (IMEx) Consortium of protein interaction databases and the Complex Portal. CONCLUSIONS: PSI-MI XML3.0 has been developed by the data producers, data users, tool developers and database providers who constitute the PSI-MI workgroup. This group now actively supports PSI-MI XML2.5 as the main interchange format for experimental data, PSI-MI XML3.0 which additionally handles more complex data types, and the simpler, tab-delimited MITAB2.5, 2.6 and 2.7 for rapid parsing and download.

    JAMI: a Java library for molecular interactions and data interoperability.

    BACKGROUND: A number of different molecular interactions data download formats now exist, designed to allow access to these valuable data by diverse user groups. These formats include the PSI-XML and MITAB standard interchange formats developed by the Molecular Interaction workgroup of the HUPO-PSI in addition to other, use-specific downloads produced by other resources. The onus is currently on the user to ensure that a piece of software is capable of reading and writing all necessary versions of each format. This problem may increase, as data providers strive to meet ever more sophisticated user demands and data types. RESULTS: A collaboration between EMBL-EBI and the University of Cambridge has produced JAMI, a single library to unify standard molecular interaction data formats such as PSI-MI XML and PSI-MITAB. The JAMI free, open-source library enables the development of molecular interaction computational tools and pipelines without the need to produce different versions of software to read different versions of the data formats. CONCLUSION: Software and tools developed on top of the JAMI framework are able to integrate and support both PSI-MI XML and PSI-MITAB. The use of JAMI avoids the requirement to chain conversions between formats in order to reach a desired output format and prevents code and unit test duplication as the code becomes more modular. JAMI's model interfaces are abstracted from the underlying format, hiding the complexity and requirements of each data format from developers using JAMI as a library.

    Non-coding RNA regulatory networks

    It is well established that the vast majority of human RNA transcripts do not encode proteins and that non-coding RNAs regulate cell physiology and shape cellular functions. A subset of them is involved in gene regulation at different levels, from epigenetic gene silencing to post-transcriptional regulation of mRNA stability. Notably, the aberrant expression of many non-coding RNAs has been associated with aggressive pathologies. Rapid advances in network biology indicate that the robustness of cellular processes is the result of specific properties of biological networks such as scale-free degree distribution and hierarchical modularity, suggesting that regulatory network analyses could provide new insights into gene regulation and dysfunction mechanisms. In this study we present an overview of public repositories where non-coding RNA-regulatory interactions are collected and annotated, we discuss unresolved questions for data integration and we recall existing resources to build and analyse networks.

    TFLink: an integrated gateway to access transcription factor-target gene interactions for multiple species

    Analysis of transcriptional regulatory interactions and their comparisons across multiple species are crucial for progress in various fields in biology, from functional genomics to the evolution of signal transduction pathways. However, despite the rapidly growing body of data on regulatory interactions in several eukaryotes, no databases exist to provide curated high-quality information on transcription factor-target gene interactions for multiple species. Here, we address this gap by introducing the TFLink gateway, which uniquely provides experimentally explored and highly accurate information on transcription factor-target gene interactions (∼12 million), nucleotide sequences and genomic locations of transcription factor binding sites (∼9 million) for human and six model organisms: mouse, rat, zebrafish, fruit fly, worm and yeast by integrating 10 resources. TFLink provides user-friendly access to data on transcription factor-target gene interactions, interactive network visualizations and transcription factor binding sites, with cross-links to several other databases. Besides containing accurate information on transcription factors, with a clear labelling of the type/volume of the experiments (small-scale or high-throughput), the source database and the original publications, TFLink also provides a wealth of standardized regulatory data available for download in multiple formats. The database offers easy access to high-quality data for wet-lab researchers, supplies data for gene set enrichment analyses and facilitates systems biology and comparative gene regulation studies. Database URL https://tflink.net/

    The IntAct database: Efficient access to fine-grained molecular interaction data

    The IntAct molecular interaction database (https://www.ebi.ac.uk/intact) is a curated resource of molecular interactions, derived from the scientific literature and from direct data depositions. As of August 2021, IntAct provides more than one million binary interactions, curated by twelve global partners of the International Molecular Exchange consortium, for which the IntAct database provides a shared curation and dissemination platform. The IMEx curation policy has always emphasised a fine-grained data and curation model, aiming to capture the relevant experimental detail essential for the interpretation of the provided molecular interaction data. Here, we present recent curation focus and progress, as well as a completely redeveloped website which presents IntAct data in a much more user-friendly and detailed way.

    An intrinsically disordered proteins community for ELIXIR.

    Intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) are now recognised as major determinants in cellular regulation. This white paper presents a roadmap for future e-infrastructure developments in the field of IDP research within the ELIXIR framework. The goal of these developments is to drive the creation of high-quality tools and resources to support the identification, analysis and functional characterisation of IDPs. The roadmap is the result of a workshop titled "An intrinsically disordered protein user community proposal for ELIXIR" held at the University of Padua. The workshop, and further consultation with the members of the wider IDP community, identified the key priority areas for the roadmap including the development of standards for data annotation, storage and dissemination; integration of IDP data into the ELIXIR Core Data Resources; and the creation of benchmarking criteria for IDP-related software. Here, we discuss these areas of priority, how they can be implemented in cooperation with the ELIXIR platforms, and their connections to existing ELIXIR Communities and international consortia. The article provides a preliminary blueprint for an IDP Community in ELIXIR and is an appeal to identify and involve new stakeholders

    Best practices for the manual curation of Intrinsically Disordered Proteins in DisProt

    The DisProt database is a significant resource containing manually curated data on experimentally validated intrinsically disordered proteins (IDPs) and regions (IDRs) from the literature. Developed in 2005, its primary goal was to collect structural and functional information on proteins that lack a fixed three-dimensional (3D) structure. Today, DisProt has evolved into a major repository that not only collects experimental data but also contributes significantly to our understanding of the roles of IDPs/IDRs in various biological processes, such as autophagy or the life cycle mechanisms in viruses, or their involvement in diseases (such as cancer and neurodevelopmental disorders). DisProt offers detailed information on the structural states of IDPs/IDRs, including state transitions, interactions, and their functions, all provided as curated annotations. One of the central activities of DisProt is the meticulous curation of experimental data from the literature. For this reason, to ensure that every expert and volunteer curator possesses the requisite knowledge for data evaluation, collection, and integration, training courses and curation materials are available. However, biocuration guidelines concur on the importance of developing robust guidelines that not only provide critical information about data consistency but also ensure data acquisition. This guideline aims to provide both biocurators and external users with best practices for manually curating IDPs and IDRs in DisProt. It describes every step of the literature curation process and provides use cases of IDP curation within DisProt. Database URL: https://disprot.org

    Design and development of a bioinformatics platform for the integration, management and visualization of protein interaction networks and interactomes

    The research presented in this doctoral thesis focuses on protein-protein interactions and on the global definition of the sets of interactions present in each organism, or interactomes, in the form of biomolecular networks. Starting from the various public databases of protein interactions, a system is built to integrate these interactions, and interactomes are generated at different quality levels according to the experimental support for the interactions they contain. All this information is made available to the scientific community through a purpose-built application that, among other things, enables the visualization and functional annotation of the interaction networks generated by the researchers themselves. The application is named APID Interactomes and is freely accessible at http://apid.dep.usal.es

    Complex Portal 2022: New curation frontiers

    The Complex Portal (www.ebi.ac.uk/complexportal) is a manually curated, encyclopaedic database of macromolecular complexes with known function from a range of model organisms. It summarizes complex composition, topology and function along with links to a large range of domain-specific resources (e.g. wwPDB, EMDB and Reactome). Since the last update in 2019, we have produced a first draft complexome for Escherichia coli, maintained and updated that of Saccharomyces cerevisiae, added over 40 coronavirus complexes and increased the human complexome to over 1100 complexes that include approximately 200 complexes that act as targets for viral proteins or are part of the immune system. The display of protein features in ComplexViewer has been improved and the participant table is now colour-coordinated with the nodes in ComplexViewer. Community collaboration has expanded, for example by contributing to an analysis of putative transcription cofactors and providing data accessible to semantic web tools through Wikidata which is now populated with manually curated Complex Portal content through a new bot. Our data license is now CC0 to encourage data reuse. Users are encouraged to get in touch, provide us with feedback and send curation requests through the ‘Support’ link.

    A high confidence, manually validated human blood plasma protein reference set

    BACKGROUND: The immense diagnostic potential of human plasma has prompted great interest and effort in cataloging its contents, exemplified by the Human Proteome Organization (HUPO) Plasma Proteome Project (PPP) pilot project. Due to challenges in obtaining a reliable blood plasma protein list, HUPO later re-analysed their own original dataset with a more stringent statistical treatment that resulted in a much reduced list of high confidence (at least 95%) proteins compared with their original findings. In order to facilitate the discovery of novel biomarkers in the future and to realize the full diagnostic potential of blood plasma, we feel that there is still a need for an ultra-high confidence reference list (at least 99% confidence) of blood plasma proteins. METHODS: To address the complexity and dynamic protein concentration range of the plasma proteome, we employed a linear ion-trap-Fourier transform (LTQ-FT) and a linear ion trap-Orbitrap (LTQ-Orbitrap) for mass spectrometry (MS) analysis. Both instruments allow the measurement of peptide masses in the low ppm range. Furthermore, we employed a statistical score that allows database peptide identification searching using the products of two consecutive stages of tandem mass spectrometry (MS3). The combination of MS3 with very high mass accuracy in the parent peptide allows peptide identification with orders of magnitude more confidence than that typically achieved. RESULTS: Herein we established a high confidence set of 697 blood plasma proteins and achieved a high 'average sequence coverage' of more than 14 peptides per protein and a median of 6 peptides per protein. All proteins annotated as belonging to the immunoglobulin family as well as all hypothetical proteins whose peptides completely matched immunoglobulin sequences were excluded from this protein list. We also compared the results of using two high-end MS instruments as well as the use of various peptide and protein separation approaches. Furthermore, we characterized the plasma proteins using cellular localization information, as well as comparing our list of proteins to data from other sources, including the HUPO PPP dataset. CONCLUSION: Superior instrumentation combined with rigorous validation criteria gave rise to a set of 697 plasma proteins in which we have very high confidence, demonstrated by an exceptionally low false peptide identification rate of 0.29%.