12 research outputs found

    The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest

    Much of the complexity within cells arises from functional and regulatory interactions among proteins. The core of these interactions is increasingly known, but novel interactions continue to be discovered, and the information remains scattered across different database resources, experimental modalities and levels of mechanistic detail. The STRING database (https://string-db.org/) systematically collects and integrates protein-protein interactions, both physical interactions and functional associations. The data originate from a number of sources: automated text mining of the scientific literature, computational interaction predictions from co-expression, conserved genomic context, databases of interaction experiments and known complexes/pathways from curated sources. All of these interactions are critically assessed, scored, and subsequently automatically transferred to less well-studied organisms using hierarchical orthology information. The data can be accessed via the website, but also programmatically and via bulk downloads. The most recent developments in STRING (version 12.0) are: (i) it is now possible to create, browse and analyze a full interaction network for any novel genome of interest, by submitting its complement of encoded proteins, (ii) the co-expression channel now uses variational auto-encoders to predict interactions, and it covers two new sources, single-cell RNA-seq and experimental proteomics data, and (iii) the confidence in each experimentally derived interaction is now estimated based on the detection method used, and communicated to the user in the web interface. Furthermore, STRING continues to enhance its facilities for functional enrichment analysis, which are now also fully available for user-submitted genomes.
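As a rough illustration of the programmatic access mentioned in this abstract, the sketch below only assembles a request URL for STRING's REST interface and sends nothing over the network. The `network` method and the `identifiers`, `species`, `required_score` and `caller_identity` parameters follow the publicly documented API layout as understood here; the helper name `build_network_url`, the example proteins and the caller string are hypothetical, and the current STRING API documentation should be checked before relying on any of it.

```python
from urllib.parse import urlencode

# Base of the STRING REST interface (assumed layout: /api/<format>/<method>).
STRING_API_BASE = "https://string-db.org/api"


def build_network_url(identifiers, species, required_score=400, output_format="tsv"):
    """Assemble a URL for the `network` method without sending a request.

    `build_network_url` is a hypothetical helper written for this example.
    """
    params = {
        # Multiple identifiers are joined with a carriage return,
        # which urlencode turns into %0D.
        "identifiers": "\r".join(identifiers),
        "species": species,                    # NCBI taxonomy ID, e.g. 9606 for human
        "required_score": required_score,      # minimum combined score, 0 to 1000
        "caller_identity": "example_script",   # hypothetical caller name
    }
    return f"{STRING_API_BASE}/{output_format}/network?{urlencode(params)}"


if __name__ == "__main__":
    print(build_network_url(["TP53", "MDM2"], species=9606))
```

Building the URL separately from sending it keeps the sketch testable offline; any HTTP client can then fetch the resulting address.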

    Bioinformatics approaches for the analysis and visualization of biological networks

    Networks are one of the most common ways to represent biological systems as complex sets of binary interactions or relations between different bioentities. In this dissertation, we discuss the basic concepts of graph theory and the various graph types, as well as the available data structures for storing and reading graphs. In addition, we describe several global network properties and highlight some of the widely used topological features. We briefly mention network patterns and models, and comment further on the types of biological and biomedical networks along with their corresponding computer- and human-readable file formats. We also discuss a variety of algorithms and metrics for network analysis with regard to graph drawing and clustering, as well as the current state-of-the-art tools. Furthermore, we provide demonstration examples for the STRING and Reactome databases, highlight their features and functionality, discuss their front- and back-end interfaces, and show a simple case study of how these tools can be combined for deeper biological analysis and functional enrichment. Finally, we propose NORMA and NAP, two web tools for interactive handling of multiple networks. Specifically, NORMA focuses on interactive network annotation visualization and topological analysis, and is able to handle multiple networks and annotations simultaneously. Precalculated annotations (e.g. Gene Ontology/Pathway enrichment or clustering results) can be uploaded and visualized in a network either as colored pie-chart nodes or as color-filled convex hulls in a Venn-diagram-like style. Where no annotation exists, algorithms for automated community detection are offered. Briefly, with NORMA, users can encode three types of information simultaneously: i) the network, ii) the communities or annotations and iii) node categories or expression values. NORMA also offers basic topological analysis and direct topological comparison across any of the selected networks. The NORMA service is available at: http://bib.fleming.gr:3838/NORMA. Code is available at: https://github.com/PavlopoulosLab/NORMA. NAP, on the other hand, directly compares the topological features of multiple networks simultaneously and currently offers both 2D and 3D network visualization, as well as visual comparisons of node- and edge-based topological features either as bar charts or as a scatterplot matrix. NAP is fully interactive, and users can easily export and visualize the intersection between any pair of networks using Venn diagrams or a 2D and a 3D multi-layer graph-based visualization. NAP supports weighted, unweighted, directed, undirected and bipartite graphs and is available at: http://bib.fleming.gr:3838/NAP/. Its code can be found at: https://github.com/PavlopoulosLab/NAP.

    U-CIE [/juː 'siː/]: Color encoding of high-dimensional data.

    Data visualization is essential to discover patterns and anomalies in large high-dimensional datasets. New dimensionality reduction techniques have thus been developed for visualizing omics data, in particular from single-cell studies. However, jointly showing several types of data, for example, single-cell expression and gene networks, remains a challenge. Here, we present U-CIE, a visualization method that encodes arbitrary high-dimensional data as colors using a combination of dimensionality reduction and the CIELAB color space to retain the original structure to the extent possible. U-CIE first uses UMAP to reduce high-dimensional data to three dimensions, partially preserving distances between entities. Next, it embeds the resulting three-dimensional representation within the CIELAB color space. This color model was designed to be perceptually uniform, meaning that the Euclidean distance between any two points should correspond to their relative perceptual difference. The combination of UMAP and CIELAB therefore results in a color encoding that captures much of the structure of the original high-dimensional data. We illustrate its broad applicability by visualizing single-cell data on a protein network and metagenomic data on a world map and on scatter plots.
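The second step described in this abstract, placing three-dimensional coordinates inside CIELAB so that distances carry over to colors, can be sketched roughly as follows. This is not the authors' embedding procedure: it assumes the 3D coordinates already exist (for example from UMAP), fits them into a simple, conservative box in CIELAB with one uniform scale factor, and clips any color that falls outside the sRGB gamut. The function names `embed_in_lab` and `lab_to_hex`, the box limits and the toy points are all hypothetical choices made for illustration.

```python
import math

# D65 reference white used for the standard CIELAB <-> XYZ conversion.
D65_WHITE = (0.95047, 1.0, 1.08883)


def embed_in_lab(points, l_range=(30.0, 80.0), ab_half_range=40.0):
    """Center 3D points and scale them uniformly into a box in CIELAB.

    A single scale factor is used for all axes, so relative Euclidean
    distances between points are preserved. The box limits are assumptions.
    """
    axes = list(zip(*points))
    centers = [(max(ax) + min(ax)) / 2 for ax in axes]
    halves = [(max(ax) - min(ax)) / 2 for ax in axes]
    targets = [(l_range[1] - l_range[0]) / 2, ab_half_range, ab_half_range]
    ratios = [t / h for t, h in zip(targets, halves) if h > 0]
    scale = min(ratios) if ratios else 1.0
    offsets = [(l_range[0] + l_range[1]) / 2, 0.0, 0.0]
    return [
        tuple(o + scale * (v - c) for v, c, o in zip(p, centers, offsets))
        for p in points
    ]


def _f_inv(t):
    # Inverse of the CIELAB companding function.
    delta = 6 / 29
    return t ** 3 if t > delta else 3 * delta ** 2 * (t - 4 / 29)


def _gamma(c):
    # Clip to the displayable range, then apply the sRGB transfer function.
    c = min(max(c, 0.0), 1.0)
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055


def lab_to_hex(L, a, b):
    """Convert a CIELAB color (D65) to an sRGB hex string, clipping out-of-gamut values."""
    fy = (L + 16) / 116
    fx = fy + a / 500
    fz = fy - b / 200
    x, y, z = (w * _f_inv(f) for w, f in zip(D65_WHITE, (fx, fy, fz)))
    r = 3.2406 * x - 1.5372 * y - 0.4986 * z
    g = -0.9689 * x + 1.8758 * y + 0.0415 * z
    bl = 0.0557 * x - 0.2040 * y + 1.0570 * z
    return "#" + "".join(f"{round(255 * _gamma(c)):02x}" for c in (r, g, bl))


if __name__ == "__main__":
    toy_points = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 4)]  # stand-in for UMAP output
    for lab in embed_in_lab(toy_points):
        print(lab, lab_to_hex(*lab))
```

Because the scaling is uniform, two entities that are twice as far apart in the 3D representation end up twice as far apart in CIELAB, which is the property the perceptual-uniformity argument relies on; clipping can break this for colors near the gamut boundary.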

    NORMA: The Network Makeup Artist — A Web Tool for Network Annotation Visualization

    The Network Makeup Artist (NORMA) is a web tool for interactive network annotation visualization and topological analysis, able to handle multiple networks and annotations simultaneously. Precalculated annotations (e.g., Gene Ontology, Pathway enrichment, community detection, or clustering results) can be uploaded and visualized in a network, either as colored pie-chart nodes or as color-filled areas in a 2D/3D Venn-diagram-like style. Where no annotation exists, algorithms for automated community detection are offered. Users can adjust the network views using standard layout algorithms or allow NORMA to slightly modify them for visually better group separation. Once a network view is set, users can interactively select and highlight any group of interest in order to generate publication-ready figures. Briefly, with NORMA, users can encode three types of information simultaneously: 1) the network, 2) the communities or annotations of interest, and 3) node categories or expression values. Finally, NORMA offers basic topological analysis and direct topological comparison across any of the selected networks. The NORMA service is available at http://norma.pavlopouloslab.info, whereas the code is available at https://github.com/PavlopoulosLab/NORMA.

    FAVA: High-quality functional association networks inferred from scRNA-seq and proteomics data

    Protein networks are commonly used for understanding the interplay between proteins in the cell as well as for visualizing omics data. Unfortunately, most existing high-quality networks are heavily biased by data availability, in the sense that well-studied proteins have many more interactions than understudied proteins. To create networks that can help elucidate functions for the latter, we must start from data that are not affected by this literature bias, in other words, from omics data such as single-cell RNA-seq (scRNA-seq) and proteomics. While networks can be inferred from such data through simple co-expression analysis, this approach does not work well due to high sparseness (many transcripts/proteins are not consistently observed in each cell/sample) and redundancy (many similar cells/samples are analyzed) of such data. We have therefore developed FAVA, Functional Associations using Variational Autoencoders, which deals with both issues by compressing these high-dimensional data into a dense, low-dimensional latent space. We demonstrate that calculating correlations in this latent space results in much improved networks compared to the original representation for large-scale scRNA-seq and proteomics data from the Human Protein Atlas and from PRIDE, respectively. We show that these networks, which given the nature of the input data should be free of literature bias, indeed have much better coverage of understudied proteins than existing networks.
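The final step named in this abstract, calculating correlations in the latent space, can be sketched as below. The sketch assumes that each protein already has a latent vector produced by some trained encoder; it does not implement the variational autoencoder itself. The use of Pearson correlation, the function names `pearson` and `rank_pairs`, and the toy vectors are assumptions made for illustration, not details taken from the FAVA implementation.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_pairs(latent):
    """Score every protein pair by latent-space correlation, highest first.

    `latent` maps a protein name to its latent vector (hypothetical input).
    """
    scored = [
        (pearson(latent[p], latent[q]), p, q)
        for p, q in combinations(sorted(latent), 2)
    ]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Toy latent vectors standing in for encoder output.
    toy_latent = {
        "A": [1.0, 2.0, 3.0, 4.0],
        "B": [2.0, 4.0, 6.0, 8.1],
        "C": [4.0, 3.0, 2.0, 1.0],
    }
    for score, p, q in rank_pairs(toy_latent):
        print(f"{p}-{q}: {score:.3f}")
```

In a network-building setting, the ranked list would typically be cut at a score threshold to keep only the strongest candidate associations; where to place that cutoff is a separate choice not covered here.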

    scverse/spatialdata: v0.0.14

    Added (major): get_extent() function to compute the bounding box of the data. Added (minor): testing against pre-release packages. Fixed: bug with get_values(): ignoring background channel in label

    scverse/spatialdata: v0.0.15

    Fixed: data type error with SpatialImage and MultiscaleSpatialImage.

    scverse/spatialdata: v0.0.13

    Added: polygon_query() support for images #358. Fixed: fix missing c_coords argument in blobs multiscale #342; replaced hardcoded string with instance_key #34