50 research outputs found
Best practices for the manual curation of Intrinsically Disordered Proteins in DisProt
The DisProt database is a significant resource containing manually curated
data on experimentally validated intrinsically disordered proteins (IDPs) and
regions (IDRs) from the literature. Developed in 2005, its primary goal was to
collect structural and functional information on proteins that lack a fixed
three-dimensional (3D) structure. Today, DisProt has evolved into a major
repository that not only collects experimental data but also contributes
significantly to our understanding of the roles of IDPs/IDRs in various biological
processes, such as autophagy or the life cycle mechanisms in viruses, or their
involvement in diseases (such as cancer and neurodevelopmental disorders).
DisProt offers detailed information on the structural states of IDPs/IDRs,
including state transitions, interactions, and their functions, all provided as
curated annotations. One of the central activities of DisProt is the meticulous
curation of experimental data from the literature. For this reason, to ensure
that every expert and volunteer curator possesses the requisite knowledge for
data evaluation, collection, and integration, training courses and curation
materials are available. However, biocuration guidelines concur on the
importance of developing robust guidelines that not only provide critical
information about data consistency but also ensure data acquisition. This
guideline aims to provide both biocurators and external users with best
practices for manually curating IDPs and IDRs in DisProt. It describes every
step of the literature curation process and provides use cases of IDP curation
within DisProt.
Database URL: https://disprot.org
TFLink: an integrated gateway to access transcription factor-target gene interactions for multiple species
Analysis of transcriptional regulatory interactions and their comparisons across multiple species are crucial for progress in various fields in biology, from functional genomics to the evolution of signal transduction pathways. However, despite the rapidly growing body of data on regulatory interactions in several eukaryotes, no databases exist to provide curated high-quality information on transcription factor-target gene interactions for multiple species. Here, we address this gap by introducing the TFLink gateway, which uniquely provides experimentally explored and highly accurate information on transcription factor-target gene interactions (~12 million), nucleotide sequences and genomic locations of transcription factor binding sites (~9 million) for human and six model organisms: mouse, rat, zebrafish, fruit fly, worm and yeast by integrating 10 resources. TFLink provides user-friendly access to data on transcription factor-target gene interactions, interactive network visualizations and transcription factor binding sites, with cross-links to several other databases. Besides containing accurate information on transcription factors, with a clear labelling of the type/volume of the experiments (small-scale or high-throughput), the source database and the original publications, TFLink also provides a wealth of standardized regulatory data available for download in multiple formats. The database offers easy access to high-quality data for wet-lab researchers, supplies data for gene set enrichment analyses and facilitates systems biology and comparative gene regulation studies. Database URL: https://tflink.net/
International Research Infrastructure Landscape 2019: A European Perspective
The book 'International Research Infrastructure Landscape 2019: A European Perspective' is the final report of the RISCAPE project, supported by the European Commission's Horizon 2020 programme. The RISCAPE project aims to provide a systematic, focused, high-quality, comprehensive, consistent and peer-reviewed international landscape analysis report on the position and complementarities of the major European RIs in the international Research Infrastructure landscape. The University of Turku contributed the domain report on international Energy Research Infrastructures, which forms chapter 6 of the final book.
The IntAct database: Efficient access to fine-grained molecular interaction data
The IntAct molecular interaction database (https://www.ebi.ac.uk/intact) is a curated resource of molecular interactions, derived from the scientific literature and from direct data depositions. As of August 2021, IntAct provides more than one million binary interactions, curated by twelve global partners of the International Molecular Exchange consortium, for which the IntAct database provides a shared curation and dissemination platform. The IMEx curation policy has always emphasised a fine-grained data and curation model, aiming to capture the relevant experimental detail essential for the interpretation of the provided molecular interaction data. Here, we present recent curation focus and progress, as well as a completely redeveloped website which presents IntAct data in a much more user-friendly and detailed way.
Complex Portal 2022: New curation frontiers
The Complex Portal (www.ebi.ac.uk/complexportal) is a manually curated, encyclopaedic database of macromolecular complexes with known function from a range of model organisms. It summarizes complex composition, topology and function along with links to a large range of domain-specific resources (e.g. wwPDB, EMDB and Reactome). Since the last update in 2019, we have produced a first draft complexome for Escherichia coli, maintained and updated that of Saccharomyces cerevisiae, added over 40 coronavirus complexes and increased the human complexome to over 1100 complexes that include approximately 200 complexes that act as targets for viral proteins or are part of the immune system. The display of protein features in ComplexViewer has been improved and the participant table is now colour-coordinated with the nodes in ComplexViewer. Community collaboration has expanded, for example by contributing to an analysis of putative transcription cofactors and providing data accessible to semantic web tools through Wikidata, which is now populated with manually curated Complex Portal content through a new bot. Our data license is now CC0 to encourage data reuse. Users are encouraged to get in touch, provide us with feedback and send curation requests through the "Support" link.
A Computational Framework for Host-Pathogen Protein-Protein Interactions
Infectious diseases cause millions of illnesses and deaths every year and raise great health concerns worldwide. Monitoring and curing infectious diseases has become a prevalent and intractable problem. Since host-pathogen interactions are considered the key infection processes at the molecular level, a large body of research has focused on them, aiming to understand infection mechanisms and to develop novel therapeutic solutions. For years, the continuous development of technologies in biology has benefitted wet lab-based experiments, such as small-scale biochemical, biophysical and genetic experiments and large-scale methods (for example, yeast two-hybrid analysis and cryogenic electron microscopy). As a result of decades of effort, there has been an explosive accumulation of biological data, including multi-omics data such as genomics and proteomics data.
Thus, an initial review of omics data is conducted in Chapter 2, which presents recent updates in "omics" research, focusing particularly on proteomics and genomics. With high-throughput technologies, the amount of "omics" data, including genomics and proteomics data, has grown even further. An upsurge of interest in data analytics in bioinformatics comes as no surprise to researchers from a variety of disciplines. Specifically, the astonishing rate at which genomics and proteomics data are generated leads researchers into the realm of "Big Data" research. Chapter 2 thus provides an update on the omics background and the state-of-the-art developments in the omics area, with a focus on genomics data, from the perspective of big data analytics.
Non-coding RNA regulatory networks
It is well established that the vast majority of human RNA transcripts do not encode proteins and that non-coding RNAs regulate cell physiology and shape cellular functions. A subset of them is involved in gene regulation at different levels, from epigenetic gene silencing to post-transcriptional regulation of mRNA stability. Notably, the aberrant expression of many non-coding RNAs has been associated with aggressive pathologies. Rapid advances in network biology indicate that the robustness of cellular processes is the result of specific properties of biological networks such as scale-free degree distribution and hierarchical modularity, suggesting that regulatory network analyses could provide new insights into gene regulation and dysfunction mechanisms. In this study, we present an overview of public repositories where non-coding RNA regulatory interactions are collected and annotated, discuss unresolved questions for data integration and recall existing resources to build and analyse networks.
Developing a framework for semi-automated rule-based modelling for neuroscience research
Dynamic modelling has significantly improved our understanding of the complex
molecular mechanisms underpinning neurobiological processes. The detailed
mechanistic insights these models offer depend on the availability of
a diverse range of experimental observations. Despite the huge increase in
biomolecular data generation from novel high-throughput technologies and
extensive research in bioinformatics and dynamical modelling, efficient creation
of accurate dynamical models remains highly challenging. To study this
problem, three perspectives are considered: comparison of modelling methods,
prioritisation of results and analysis of primary data sets. Firstly, I compare two
models of the DARPP-32 signalling network: a classically defined model with
ordinary differential equations (ODE) and its equivalent, defined using a novel
rule-based (RB) paradigm. The RB model recapitulates the results of the ODE
model, but offers a more expressive and flexible syntax that can efficiently handle
the "combinatorial complexity" commonly found in signalling networks,
and allows ready access to fine-grain details of the emerging system. RB modelling
is particularly well suited to encoding protein-centred features such as
domain information and post-translational modification sites. Secondly, I propose
a new pipeline for prioritisation of molecular species that arise during
model simulation using a recently developed algorithm based on multivariate
mutual information (CorEx) coupled with global sensitivity analysis (GSA) using
the RKappa package. To efficiently evaluate the importance of parameters,
Hilbert-Schmidt Independence Criterion (HSIC)-based indices are aggregated
into a weighted network that allows compact analysis of the model across conditions.
Finally, I describe an approach for the development of disease-specific
dynamical models using genes known to be associated with Attention Deficit
Hyperactivity Disorder (ADHD) as an exemplar. Candidate disease genes are
mapped to a selection of datasets that are potentially relevant to the modelling
process (e.g. interactions between proteins and domains, protein-domain and
kinase-substrate mappings) and these are jointly analysed using network clustering
and pathway enrichment analyses to evaluate their coverage and utility
in developing rule-based models.
International Research Infrastructure Landscape 2019: A European Perspective
The report is the final product of the RISCAPE project, funded by the European Commission H2020 programme.
An intrinsically disordered proteins community for ELIXIR.
Intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) are now recognised as major determinants in cellular regulation. This white paper presents a roadmap for future e-infrastructure developments in the field of IDP research within the ELIXIR framework. The goal of these developments is to drive the creation of high-quality tools and resources to support the identification, analysis and functional characterisation of IDPs. The roadmap is the result of a workshop titled "An intrinsically disordered protein user community proposal for ELIXIR" held at the University of Padua. The workshop, and further consultation with the members of the wider IDP community, identified the key priority areas for the roadmap including the development of standards for data annotation, storage and dissemination; integration of IDP data into the ELIXIR Core Data Resources; and the creation of benchmarking criteria for IDP-related software. Here, we discuss these areas of priority, how they can be implemented in cooperation with the ELIXIR platforms, and their connections to existing ELIXIR Communities and international consortia. The article provides a preliminary blueprint for an IDP Community in ELIXIR and is an appeal to identify and involve new stakeholders.