
    Refining Ensembles of Predicted Gene Regulatory Networks Based on Characteristic Interaction Sets

    Different ensemble voting approaches have been successfully applied for reverse-engineering of gene regulatory networks. They are based on the assumption that a good approximation of true network structure can be derived by considering the frequencies of individual interactions in a large number of predicted networks. Such approximations are typically superior in terms of prediction quality and robustness as compared to considering a single best scoring network only. Nevertheless, ensemble approaches only work well if the predicted gene regulatory networks are sufficiently similar to each other. If the topologies of predicted networks are considerably different, an ensemble of all networks obscures interesting individual characteristics. Instead, networks should be grouped according to local topological similarities and ensemble voting performed for each group separately. We argue that the presence of sets of co-occurring interactions is a suitable indicator for grouping predicted networks. A stepwise bottom-up procedure is proposed, where first mutual dependencies between pairs of interactions are derived from predicted networks. Pairs of co-occurring interactions are subsequently extended to derive characteristic interaction sets that distinguish groups of networks. Finally, ensemble voting is applied separately to the resulting topologically similar groups of networks to create distinct group-ensembles. Ensembles of topologically similar networks constitute distinct hypotheses about the reference network structure. Such group-ensembles are easier to interpret as their characteristic topology becomes clear and dependencies between interactions are known. The availability of distinct hypotheses facilitates the design of further experiments to distinguish between plausible network structures. 
The proposed procedure is a reasonable refinement step for non-deterministic reverse-engineering applications that produce a large number of candidate predictions for a gene regulatory network, e.g. due to probabilistic optimization or a cross-validation procedure.
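
The voting step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the representation of a network as a set of (effector, target, sign) tuples, the function names, and the toy data are assumptions made for illustration, and the characteristic interaction sets are supplied by hand rather than derived from the networks.

```python
from collections import Counter


def ensemble_frequencies(networks):
    """Relative frequency of each interaction across a list of networks.

    Each network is assumed to be a set of (effector, target, sign) tuples.
    """
    counts = Counter(edge for net in networks for edge in net)
    n = len(networks)
    return {edge: count / n for edge, count in counts.items()}


def group_ensembles(networks, characteristic_sets):
    """Apply ensemble voting separately to the networks containing each set."""
    groups = {}
    for name, cset in characteristic_sets.items():
        # A network belongs to a group if it contains the whole set.
        members = [net for net in networks if cset <= net]
        if members:
            groups[name] = ensemble_frequencies(members)
    return groups


# Hypothetical toy data: two topologically different groups of predictions.
networks = [
    {("g1", "g2", "+"), ("g3", "g4", "+"), ("g4", "g5", "-")},
    {("g1", "g2", "+"), ("g3", "g4", "+"), ("g4", "g5", "-")},
    {("g1", "g2", "+"), ("g3", "g6", "+"), ("g6", "g5", "-")},
    {("g1", "g2", "+"), ("g3", "g6", "+"), ("g6", "g5", "-")},
]
characteristic_sets = {
    "left": frozenset({("g3", "g4", "+"), ("g4", "g5", "-")}),
    "right": frozenset({("g3", "g6", "+"), ("g6", "g5", "-")}),
}

full = ensemble_frequencies(networks)
groups = group_ensembles(networks, characteristic_sets)
```

In the full ensemble, the interactions of both sets receive an intermediate frequency of 0.5. Within each group-ensemble, the interactions of that group's set have frequency 1 and those of the other set do not appear.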

    An end-to-end workflow for multiplexed image processing and analysis

    Simultaneous profiling of the spatial distributions of multiple biological molecules at single-cell resolution has recently been enabled by the development of highly multiplexed imaging technologies. Extracting and analyzing biologically relevant information contained in complex imaging data requires the use of a diverse set of computational tools and algorithms. Here, we report the development of a user-friendly, customizable, and interoperable workflow for processing and analyzing data generated by highly multiplexed imaging technologies. The steinbock framework supports image pre-processing, segmentation, feature extraction, and standardized data export. Each step is performed in a reproducible fashion. The imcRtools R/Bioconductor package forms the bridge between image processing and single-cell analysis by directly importing data generated by steinbock. The package further supports spatial data analysis and integrates with tools developed within the Bioconductor project. Together, the tools described in this workflow facilitate analyses of multiplexed imaging raw data at the single-cell and spatial level.

    Entropy and AUPRC evaluation results.

    (A) The entropy of group-ensembles is on average reduced to 45% of the entropy of the ensemble of all networks (full-ensemble). This is caused by the reduced fraction of low confidence interactions. (B) AUPRCs of group-ensembles are increased if their characteristic sets are present in the reference. Characteristic set precision ranges between 1 (all interactions are present in the reference) and 0 (no interaction is present in the reference). A small amount of horizontal jitter (<0.02) was added to the precision values for better visualization. The red lines indicate identity. (C) Rejecting alternative hypotheses by testing for the presence of characteristic set interactions (white boxplots) in general increases AUPRC, while testing for other low confidence interactions (gray boxplots) has a less pronounced or even negative effect. Thus, interactions that are predicted to be co-occurring with other interactions are preferred targets of further experimental verification. The full-ensemble AUPRC distributions are shown as dark-gray boxplots.
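
The two quantities in panels (A) and (B) can be sketched as follows. The caption does not define the entropy measure, so the mean binary Shannon entropy over interaction frequencies used here is an assumption. The precision function follows the caption's description (the fraction of a characteristic set's interactions that are present in the reference). All function names are hypothetical.

```python
import math


def binary_entropy(p):
    """Shannon entropy (in bits) of an interaction with frequency p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # certain presence or absence carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def ensemble_entropy(frequencies):
    """Mean binary entropy over all interactions of an ensemble.

    ASSUMPTION: averaging over the listed interactions is one plausible
    definition; the measure used in the paper may differ.
    """
    values = list(frequencies.values())
    if not values:
        return 0.0
    return sum(binary_entropy(p) for p in values) / len(values)


def characteristic_set_precision(cset, reference):
    """Fraction of a characteristic set's interactions found in the reference."""
    return len(cset & reference) / len(cset)


# Hypothetical frequencies: one high confidence and one low confidence edge.
full_entropy = ensemble_entropy({"a->b": 1.0, "c->d": 0.5})
precision = characteristic_set_precision({"c->d", "d->e"}, {"a->b", "c->d"})
```

Low confidence interactions with frequencies near 0.5 dominate this entropy, which is consistent with the caption's explanation that fewer such interactions lead to a lower value.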

    Performance of reverse-engineering for varying network sizes and experimental settings.

    For each of the twelve combinations of size and experimental setting, 300 random reference networks were created. For each reference, a wild-type time-series and a varying number of knockout perturbations were simulated. A) Number of genes in networks. B) Number of different random single knockout experiments simulated. C) Number of different random double knockout experiments simulated. D) Percentage of cases where a predicted network was identical to the reference. E) AUPRC of the ensemble of all networks (mean/standard deviation). F) Percentage of cases where characteristic interaction sets have been identified. G) Number of runs where 2/3/4/5 characteristic sets were identified. More than 5 sets were not observed.

    Reasons for ensemble averaging and its drawback.

    A) The hypothesis space H (black shape) contains all networks that can be represented by the applied mathematical framework. There might be no single optimum, as several different network structures might score equally well and thus are equally valid (blue area). Additionally, optimization procedures starting from different initial parameterizations might get stuck at local optima and create suboptimal predictions (red dots). If the applied framework is adequate, the reference structure is included in H (green square) and could be predicted by the optimization procedure. Otherwise, predicted high scoring networks should be at least similar to the reference. Here, all high scoring predicted networks are very similar to each other and to the reference. In such a case, the frequency of an interaction in all networks is a reliable indicator for the confidence of an effector-target gene relation, thus applying ensemble voting is advisable. B) Depending on the reference structure, the applied mathematical framework, and the available experimental data, several groups of topologically different high scoring networks might be predicted by a probabilistic reverse-engineering algorithm (blue areas I, II and III). Combining all of these structurally strongly different networks by ensemble voting would obscure characteristics of individual groups of networks. We suggest the presence of sets of co-occurring interactions as a reasonable criterion for identification and delimitation of these groups.

    Illustration of an ensemble.

    An ensemble of several hundred predicted networks is created by calculating the frequencies of interactions. Reverse-engineering algorithms may produce suboptimal predictions, thus a certain amount of (random) variation in network topologies has to be expected. A) Ensemble average with annotated relative frequencies for activating (blue) and inhibiting (red) interactions. High confidence interactions (bold) are present in nearly all networks. High confidence non-interactions (dotted) are missing in most. Interactions present in subsets of networks have intermediate frequencies and are considered as low confidence interactions. B) Interactions connecting genes 3 to 6 constitute two characteristic sets (low confidence interactions). Either the left set of interactions or the right one is realized in predicted networks, but no mixture of sets or subsets. The co-occurrence of these interactions is not apparent in the ensemble. C) Interactions affecting gene 3 are mutually exclusive, but do not co-occur with other interactions. They can occur in combination with any of the characteristic sets and constitute an unspecific, highly variable sub-region of predicted networks.
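
The point of panel B, that co-occurrence is invisible in the ensemble frequencies but recoverable from the individual networks, can be illustrated with a short sketch. The dependency measure is an assumption: pairs of low confidence interactions are scored with the Jaccard index of the sets of networks that contain them, which may differ from the measure used in the paper. The thresholds `low`, `high`, and `min_jaccard` and the edge labels are hypothetical.

```python
from itertools import combinations


def cooccurring_pairs(networks, low=0.1, high=0.9, min_jaccard=1.0):
    """Find pairs of low confidence interactions that occur together.

    ASSUMPTION: co-occurrence is measured by the Jaccard index of the sets
    of network indices in which each interaction is present.
    """
    n = len(networks)
    support = {}
    for idx, net in enumerate(networks):
        for edge in net:
            support.setdefault(edge, set()).add(idx)

    # Keep only interactions with an intermediate ensemble frequency.
    candidates = sorted(e for e, s in support.items() if low < len(s) / n < high)

    pairs = []
    for a, b in combinations(candidates, 2):
        shared = len(support[a] & support[b])
        either = len(support[a] | support[b])
        if shared / either >= min_jaccard:
            pairs.append((a, b))
    return pairs


# Hypothetical toy networks loosely following panel B: either the "left"
# interactions (3->4, 4->5) or the "right" ones (3->6, 6->5) are realized.
networks = [
    {"1->2", "3->4", "4->5"},
    {"1->2", "3->4", "4->5"},
    {"1->2", "3->6", "6->5"},
    {"1->2", "3->6", "6->5"},
]
pairs = cooccurring_pairs(networks)
```

All four low confidence interactions have the same ensemble frequency (0.5), yet only the pairs within each set are reported as co-occurring. Such pairs are the starting point that the bottom-up procedure extends into characteristic interaction sets.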

    Example for extracted group-ensembles.

    Using simulated data from a random reference network (A), the applied reverse-engineering algorithm created a set of networks which were combined into an ensemble (B). Two group-ensembles (C,D) were derived using the described characteristic interaction set extraction approach. Both group-ensembles explain the simulated data very well (average RMSD 0.075 and 0.081), but effector-target relations differ strongly (AUPRC to reference 0.898 and 0.311). Blue lines: activating interactions. Red lines: inhibiting interactions.
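
As a rough illustration of how an ensemble can be scored against a reference, the sketch below ranks interactions by their ensemble frequency and computes average precision, a common step-wise estimate of the area under the precision-recall curve. Whether the reported AUPRC values were computed this way is not stated in the caption, so the estimator, the function name, and the toy values are all assumptions.

```python
def average_precision(scores, reference):
    """Average precision of ranked interactions against a reference set.

    ASSUMPTION: average precision stands in for AUPRC here; reference
    interactions that never appear in `scores` count as not retrieved.
    """
    if not reference:
        return 0.0
    # Rank by descending score; break ties by name for determinism.
    ranked = sorted(scores, key=lambda edge: (-scores[edge], str(edge)))
    hits = 0
    total = 0.0
    for rank, edge in enumerate(ranked, start=1):
        if edge in reference:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / len(reference)


# Hypothetical group-ensemble frequencies and reference interactions.
scores = {"a->b": 0.9, "b->c": 0.8, "c->d": 0.1}
reference = {"a->b", "c->d"}
ap = average_precision(scores, reference)
```

Two group-ensembles that fit the data equally well can still receive very different scores under such a measure, because it depends only on which effector-target relations are ranked highly.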

    An end-to-end workflow for multiplexed image processing and analysis

    Multiplexed imaging enables the simultaneous spatial profiling of dozens of biological molecules in tissues at single-cell resolution. Extracting biologically relevant information, such as the spatial distribution of cell phenotypes from multiplexed tissue imaging data, involves a number of computational tasks, including image segmentation, feature extraction and spatially resolved single-cell analysis. Here, we present an end-to-end workflow for multiplexed tissue image processing and analysis that integrates previously developed computational tools to enable these tasks in a user-friendly and customizable fashion. For data quality assessment, we highlight the utility of napari-imc for interactively inspecting raw imaging data and the cytomapper R/Bioconductor package for image visualization in R. Raw data preprocessing, image segmentation and feature extraction are performed using the steinbock toolkit. We showcase two alternative approaches for segmenting cells on the basis of supervised pixel classification and pretrained deep learning models. The extracted single-cell data are then read, processed and analyzed in R. The protocol describes the use of community-established data containers, facilitating the application of R/Bioconductor packages for dimensionality reduction, single-cell visualization and phenotyping. We provide instructions for performing spatially resolved single-cell analysis, including community analysis, cellular neighborhood detection and cell-cell interaction testing using the imcRtools R/Bioconductor package. The workflow has been previously applied to imaging mass cytometry data, but can be easily adapted to other highly multiplexed imaging technologies. This protocol can be implemented by researchers with basic bioinformatics training, and the analysis of the provided dataset can be completed within 5-6 h. An extended version is available at https://bodenmillergroup.github.io/IMCDataAnalysis/

    A Single-Cell Atlas of the Tumor and Immune Ecosystem of Human Breast Cancer

    Breast cancer is a heterogeneous disease. Tumor cells and associated healthy cells form ecosystems that determine disease progression and response to therapy. To characterize features of breast cancer ecosystems and their associations with clinical data, we analyzed 144 human breast tumor and 50 non-tumor tissue samples using mass cytometry. The expression of 73 proteins in 26 million cells was evaluated using tumor and immune cell-centric antibody panels. Tumors displayed individuality in tumor cell composition, including phenotypic abnormalities and phenotype dominance. Relationship analyses between tumor and immune cells revealed characteristics of ecosystems related to immunosuppression and poor prognosis. High frequencies of PD-L1+ tumor-associated macrophages and exhausted T cells were found in high-grade ER+ and ER− tumors. This large-scale, single-cell atlas deepens our understanding of breast tumor ecosystems and suggests that ecosystem-based patient classification will facilitate identification of individuals for precision medicine approaches targeting the tumor and its immunoenvironment.