17 research outputs found

    The Human Plasma Membrane Peripherome: Visualization and Analysis of Interactions

    A major part of membrane function is conducted by proteins, both integral and peripheral. Peripheral membrane proteins temporarily adhere to biological membranes, either to the lipid bilayer or to integral membrane proteins with non-covalent interactions. The aim of this study was to construct and analyze the interactions of the human plasma membrane peripheral proteins (peripherome hereinafter). For this purpose, we collected a dataset of peripheral proteins of the human plasma membrane. We also collected a dataset of experimentally verified interactions for these proteins. The interaction network created from this dataset has been visualized using Cytoscape. We grouped the proteins based on their subcellular location and clustered them using the MCL algorithm in order to detect functional modules. Moreover, functional and graph theory based analyses have been performed to assess biological features of the network. Interaction data with drug molecules show that ~10% of peripheral membrane proteins are targets for approved drugs, suggesting their potential implications in disease. In conclusion, we reveal novel features and properties regarding the protein-protein interaction network created by peripheral proteins of the human plasma membrane. Comment: 39 pages, 5 figures, 3 supplement figures, under review in BMR
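
The abstract above mentions clustering the network with the MCL (Markov Cluster) algorithm. As a rough illustration of the general technique only, the following is a minimal pure-Python sketch of MCL on a small adjacency matrix. It is not the authors' pipeline: the function names, the inflation value of 2.0, the self-loop weight, and the toy graph are all assumptions chosen for the example.

```python
def normalize_columns(m):
    """Scale each column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, expansion=2, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster an undirected graph given as a square 0/1 adjacency matrix.

    Returns a sorted list of clusters, each a sorted list of node indices.
    Parameter defaults are illustrative assumptions, not published settings.
    """
    n = len(adjacency)
    # Add self-loops, then make the matrix column-stochastic.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        prev = m
        # Expansion: raise the matrix to the given power (random-walk spreading).
        for _ in range(expansion - 1):
            m = matmul(m, prev)
        # Inflation: element-wise power, then renormalize (strengthens strong flows).
        m = normalize_columns([[v ** inflation for v in row] for row in m])
        change = max(abs(m[i][j] - prev[i][j])
                     for i in range(n) for j in range(n))
        if change < tol:
            break
    # Each row with non-negligible entries lists the nodes attracted to that row.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy graph (hypothetical): two disconnected triangles, nodes 0-2 and 3-5.
TRIANGLES = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
```

Calling `mcl(TRIANGLES)` on this toy input separates the two triangles. For real interaction networks a dedicated implementation with sparse matrices and pruning would be the practical choice.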

    The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest

    Much of the complexity within cells arises from functional and regulatory interactions among proteins. The core of these interactions is increasingly known, but novel interactions continue to be discovered, and the information remains scattered across different database resources, experimental modalities and levels of mechanistic detail. The STRING database (https://string-db.org/) systematically collects and integrates protein-protein interactions, both physical interactions as well as functional associations. The data originate from a number of sources: automated text mining of the scientific literature, computational interaction predictions from co-expression, conserved genomic context, databases of interaction experiments and known complexes/pathways from curated sources. All of these interactions are critically assessed, scored, and subsequently automatically transferred to less well-studied organisms using hierarchical orthology information. The data can be accessed via the website, but also programmatically and via bulk downloads. The most recent developments in STRING (version 12.0) are: (i) it is now possible to create, browse and analyze a full interaction network for any novel genome of interest, by submitting its complement of encoded proteins, (ii) the co-expression channel now uses variational auto-encoders to predict interactions, and it covers two new sources, single-cell RNA-seq and experimental proteomics data and (iii) the confidence in each experimentally derived interaction is now estimated based on the detection method used, and communicated to the user in the web-interface. Furthermore, STRING continues to enhance its facilities for functional enrichment analysis, which are now fully available also for user-submitted genomes.
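
The abstract above notes that STRING data can be accessed programmatically. A minimal sketch of what such a request might look like is given below, assuming the URL layout `https://string-db.org/api/<format>/<method>` with `identifiers` and `species` query parameters; the layout and parameter names are recalled from the public API documentation rather than taken from the abstract, so they should be checked against the current STRING API reference. The helper names and the example gene symbols are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the STRING REST API (verify against the official docs).
STRING_API = "https://string-db.org/api"


def build_network_url(identifiers, species, output_format="tsv"):
    """Build a request URL for an interaction network among the given proteins.

    identifiers: list of protein or gene names, joined with carriage returns.
    species: NCBI taxonomy identifier (for example 9606 for human).
    """
    params = {"identifiers": "\r".join(identifiers), "species": species}
    return f"{STRING_API}/{output_format}/network?{urlencode(params)}"


def fetch_network(identifiers, species):
    """Download the network as tab-separated text (requires network access)."""
    with urlopen(build_network_url(identifiers, species)) as response:
        return response.read().decode("utf-8")
```

For instance, `build_network_url(["TP53", "MDM2"], 9606)` only constructs the URL and makes no request; `fetch_network` performs the actual download.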

    The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets.

    Cellular life depends on a complex web of functional associations between biomolecules. Among these associations, protein-protein interactions are particularly important due to their versatility, specificity and adaptability. The STRING database aims to integrate all known and predicted associations between proteins, including both physical interactions as well as functional associations. To achieve this, STRING collects and scores evidence from a number of sources: (i) automated text mining of the scientific literature, (ii) databases of interaction experiments and annotated complexes/pathways, (iii) computational interaction predictions from co-expression and from conserved genomic context and (iv) systematic transfers of interaction evidence from one organism to another. STRING aims for wide coverage; the upcoming version 11.5 of the resource will contain more than 14 000 organisms. In this update paper, we describe changes to the text-mining system, a new scoring-mode for physical interactions, as well as extensive user interface features for customizing, extending and sharing protein networks. In addition, we describe how to query STRING with genome-wide, experimental data, including the automated detection of enriched functionalities and potential biases in the user's query data. The STRING resource is available online, at https://string-db.org/

    katnastou/BioBERT-based-entity-type-classifier: Release for submission

    This is the first release of the repository.

    Blocklist expansion with deep learning-based models

    Files in this repository

    Tagger related files

    dictionary-files-tagger-STRINGv12.zip (https://zenodo.org/api/records/10008720/files/dictionary-files-tagger-STRINGv12.zip/content): this directory contains all the dictionary files used for the tagger runs for all Jensenlab resources (including STRINGv12). It also contains files relevant only to this work, namely: curated_local.tsv and curated_global.tsv, which are used for running tagger only with curated blocklists; blacklist_terms_over_10M.txt and blacklist_terms_over_10M+auto_only_list.txt, which are used for the automatic blocklist runs only; and empty_global.tsv, which is used for the run without blocklists and regex.

    tagger-no-regex.tar.gz (https://zenodo.org/api/records/10008720/files/tagger-no-regex.tar.gz/content): this directory contains the Jensenlab tagger (https://github.com/larsjuhljensen/tagger) with the tagger_core.h header updated so as not to use a regex to block things. This version of the tagger has been used only for the no-blocklist run of the paper. To set it up, run the following (for more details, check the original repo at https://github.com/larsjuhljensen/tagger):

        wget https://zenodo.org/api/records/10008720/files/tagger-no-regex.tar.gz
        tar -xzvf tagger-no-regex.tar.gz
        cd tagger-no-regex
        make tagcorpus

    Training, development and test sets

    125k-w100_grid_search_set.tar.gz (https://zenodo.org/api/records/10008720/files/125k-w100_grid_search_set.tar.gz/content): a small dataset with 125,000 training and 62,500 development examples, used to perform a grid search to detect the best set of hyperparameters.

    12.5M-w100_train_test_set.tar.gz (https://zenodo.org/api/records/10008720/files/12.5M-w100_train_test_set.tar.gz/content): a large dataset of 12.5 million training and 62,500 testing examples, used to train the model used for prediction with the set of best hyperparameters identified above.

    Transformer-based model

    bert-base-finetuned-large-set.tar.gz (https://zenodo.org/api/records/10008720/files/bert-base-finetuned-large-set.tar.gz/content): this is the TensorFlow model fine-tuned on the large dataset (12.5M) that is used for all the prediction runs. The model is fine-tuned starting from the BioBERT base v1.1 model (http://nlp.dmis.korea.edu/projects/biobert-2020-checkpoints/biobert_v1.1_pubmed.tar.gz).

    Prediction runs input generation

    The command used to run tagger before running predictions using the bert-base-finetuned-large-set model (https://zenodo.org/api/records/10008720/files/bert-base-finetuned-large-set.tar.gz/content):

        gzip -cd `ls -1 pmc/*.en.merged.filtered.tsv.gz` `ls -1r pubmed/*.tsv.gz` | cat dictionary/excluded_documents.txt - | tagger/tagcorpus --threads=16 --autodetect --types=dictionary/curated_types.tsv --entities=dictionary/all_entities.tsv --names=dictionary/all_names_textmining.tsv --groups=dictionary/all_groups.tsv --stopwords=dictionary/curated_global.tsv --local-stopwords=dictionary/curated_local.tsv --type-pairs=dictionary/all_type_pairs.tsv --out-matches=all_matches.tsv

    Input documents for large-scale execution: all PubMed abstracts (https://a3s.fi/s1000/PubMed-input.tar.gz) (as of August 2022) and all full-texts available in the PubmedCentral BioC (https://a3s.fi/s1000/PMC-OA-input.tar.gz) text mining collection (as of April 2022). The files are converted to a tab-delimited format (https://a3s.fi/s1000/database_documents.tsv.gz) in order to convert the output to a format compatible with the RE system (see below).

    Input dictionary files: all the files necessary to execute the command above are available in dictionary-files-tagger-STRINGv12.zip (https://zenodo.org/api/records/10008720/files/dictionary-files-tagger-STRINGv12.zip/content).

    Tagger output: we filter the results of the tagger run down to gene or gene products, species, diseases, and chemicals with the process described in our GitHub repository (https://github.com/katnastou/BioBERT-based-entity-type-classifier/tree/main/generate_prediction_inputs).

    Blocklist files

    curated+auto-blocklists.tar.gz (https://zenodo.org/uploads/10008720): a combined automatically generated and manually curated blocklist used for the curated+auto runs in the paper. This is the list currently used for the tagger runs for all Jensenlab resources.

    curated-only-blocklists.tar.gz (https://zenodo.org/uploads/10008720): the manually curated blocklist used for the curated_only runs in the paper.

    auto-only-blocklists.tar.gz (https://zenodo.org/uploads/10008720): the automatically generated blocklist used for the auto_only runs in the paper.

    Cytoscape stringApp 2.0: Analysis and Visualization of Heterogeneous Biological Networks.

    Biological networks are often used to represent complex biological systems, which can contain several types of entities. Analysis and visualization of such networks is supported by the Cytoscape software tool and its many apps. While earlier versions of stringApp focused on providing intraspecies protein-protein interactions from the STRING database, the new stringApp 2.0 greatly improves the support for heterogeneous networks. Here, we highlight new functionality that makes it possible to create networks that contain proteins and interactions from STRING as well as other biological entities and associations from other sources. We exemplify this by complementing a published SARS-CoV-2 interactome with interactions from STRING. We have also extended stringApp with new data and query functionality for protein-protein interactions between eukaryotic parasites and their hosts. We show how this can be used to retrieve and visualize a cross-species network for a malaria parasite, its host, and its vector. Finally, the latest stringApp version has an improved user interface, allows retrieval of both functional associations and physical interactions, and supports group-wise enrichment analysis of different parts of a network to aid biological interpretation. stringApp is freely available at https://apps.cytoscape.org/apps/stringapp

    The amyloid interactome: Exploring protein aggregation

    Protein-protein interactions are the quintessence of physiological activities, but also participate in pathological conditions. Amyloid formation, an abnormal protein-protein interaction process, is a widespread phenomenon in divergent proteins and peptides, resulting in a variety of aggregation disorders. The complexity of the mechanisms underlying amyloid formation/amyloidogenicity is a matter of great scientific interest, since their revelation will provide important insight on principles governing protein misfolding, self-assembly and aggregation. The implication of more than one protein in the progression of different aggregation disorders, together with the cited synergistic occurrence between amyloidogenic proteins, highlights the necessity for a more universal approach, during the study of these proteins. In an attempt to address this pivotal need we constructed and analyzed the human amyloid interactome, a protein-protein interaction network of amyloidogenic proteins and their experimentally verified interactors. This network assembled known interconnections between well-characterized amyloidogenic proteins and proteins related to amyloid fibril formation. The consecutive extended computational analysis revealed significant topological characteristics and unraveled the functional roles of all constituent elements. This study introduces a detailed protein map of amyloidogenicity that will aid immensely towards separate intervention strategies, specifically targeting sub-networks of significant nodes, in an attempt to design possible novel therapeutics for aggregation disorders.
