escheR: unified multi-dimensional visualizations with Gestalt principles
SUMMARY: The creation of effective visualizations is a fundamental component of data analysis. In biomedical research, new challenges are emerging to visualize multi-dimensional data in a 2D space, but current data visualization tools have limited capabilities. To address this problem, we leverage Gestalt principles to improve the design and interpretability of multi-dimensional data in 2D data visualizations, layering aesthetics to display multiple variables. The proposed visualization can be applied to spatially-resolved transcriptomics data, but also broadly to data visualized in 2D space, such as embedding visualizations. We provide an open source R package escheR, which is built off of the state-of-the-art ggplot2 visualization framework and can be seamlessly integrated into genomics toolboxes and workflows. AVAILABILITY AND IMPLEMENTATION: The open source R package escheR is freely available on Bioconductor (https://bioconductor.org/packages/escheR).
A data-driven single-cell and spatial transcriptomic map of the human prefrontal cortex
The molecular organization of the human neocortex historically has been studied in the context of its histological layers. However, emerging spatial transcriptomic technologies have enabled unbiased identification of transcriptionally defined spatial domains that move beyond classic cytoarchitecture. We used the Visium spatial gene expression platform to generate a data-driven molecular neuroanatomical atlas across the anterior-posterior axis of the human dorsolateral prefrontal cortex. Integration with paired single-nucleus RNA-sequencing data revealed distinct cell type compositions and cell-cell interactions across spatial domains. Using PsychENCODE and publicly available data, we mapped the enrichment of cell types and genes associated with neuropsychiatric disorders to discrete spatial domains.
LieberInstitute/deconvo_review-paper: v0_preprint
Updated release to reflect the addition of the MIT license.
Somatic evolutionary timings of driver mutations
BACKGROUND: A unified analysis of DNA sequences from hundreds of tumors concluded that the driver mutations primarily occur in the earliest stages of cancer formation, with relatively few driver mutation events detected in the late-arising subclones. However, emerging evidence from the sequencing of multiple tumors and tumor regions per individual suggests that late-arising subclones with additional driver mutations are underestimated in single-sample analyses. METHODS: To test whether driver mutations generally map to early tumor development, we examined multi-regional tumor sequencing data from 101 individuals reported in 11 published studies. Following previous studies, we annotated mutations as early-arising when all tumors/regions had those mutations (ubiquitous). We then inferred the fraction of mutations occurring early and compared it with late-arising mutations that were found in only single tumors/regions. RESULTS: While a large fraction of driver mutations in tumors occurred relatively early in cancers, later driver mutations occurred at least as frequently as the early drivers in a substantial number of patients. This result was robust to many different approaches to annotate driver mutations. The relative frequency of early and late driver mutations varied among patients of the same cancer type and in different cancer types. We found that previous reports of the preponderance of early driver mutations were primarily informed by analysis of single tumor variant allele profiles, with which it is challenging to clearly distinguish between early and late drivers. CONCLUSIONS: The origin and preponderance of new driver mutations are not limited to early stages of tumor evolution, with different tumors and regions showing distinct driver mutations and, consequently, distinct characteristics. Therefore, tumors with extensive intratumor heterogeneity appear to have many newly acquired drivers.
LieberInstitute/Habenula_Pilot: v0_preprint
Pre-print initial release to create a Zenodo DOI + badge.
A new method for inferring timetrees from temporally sampled molecular sequences.
Pathogen timetrees are phylogenies scaled to time. They reveal the temporal history of a pathogen spread through the populations as captured in the evolutionary history of strains. These timetrees are inferred by using molecular sequences of pathogenic strains sampled at different times. That is, temporally sampled sequences enable the inference of sequence divergence times. Here, we present a new approach (RelTime with Dated Tips [RTDT]) to estimating pathogen timetrees based on a relative rate framework underlying the RelTime approach that is algebraic in nature and distinct from all other current methods. RTDT does not require many of the priors demanded by Bayesian approaches, and it has light computing requirements. In analyses of an extensive collection of computer-simulated datasets, we found the accuracy of RTDT time estimates and the coverage probabilities of their confidence intervals (CIs) to be excellent. In analyses of empirical datasets, RTDT produced dates that were similar to those reported in the literature. In comparative benchmarking with Bayesian and non-Bayesian methods (LSD, TreeTime, and treedater), we found that no method performed the best in every scenario. So, we provide a brief guideline for users to select the most appropriate method in empirical data analysis. RTDT is implemented for use via a graphical user interface and in high-throughput settings in the newest release of cross-platform MEGA X software, freely available from http://www.megasoftware.net
Challenges and opportunities to computationally deconvolve heterogeneous tissue with varying cell sizes using single cell RNA-sequencing datasets
Deconvolution of cell mixtures in "bulk" transcriptomic samples from
homogenate human tissue is important for understanding the pathologies of
diseases. However, several experimental and computational challenges remain in
developing and implementing transcriptomics-based deconvolution approaches,
especially those using a single cell/nuclei RNA-seq reference atlas, which are
becoming rapidly available across many tissues. Notably, deconvolution
algorithms are frequently developed using samples from tissues with similar
cell sizes. However, brain tissue or immune cell populations have cell types
with substantially different cell sizes, total mRNA expression, and
transcriptional activity. When existing deconvolution approaches are applied to
these tissues, these systematic differences in cell sizes and transcriptomic
activity confound accurate cell proportion estimates and instead may quantify
total mRNA content. Furthermore, there is a lack of standard reference atlases
and computational approaches to facilitate integrative analyses, including not
only bulk and single cell/nuclei RNA-seq data, but also new data modalities
from spatial -omic or imaging approaches. New multi-assay datasets need to be
collected with orthogonal data types generated from the same tissue block and
the same individual, to serve as a "gold standard" for evaluating new and
existing deconvolution methods. Below, we discuss these key challenges and how
they can be addressed with the acquisition of new datasets and approaches to
analysis.
Comment: 28 pages; 4 figures
Systems biology dissection of PTSD and MDD across brain regions, cell types, and blood
The molecular pathology of stress-related disorders remains elusive. Our brain multiregion, multiomic study of posttraumatic stress disorder (PTSD) and major depressive disorder (MDD) included the central nucleus of the amygdala, hippocampal dentate gyrus, and medial prefrontal cortex (mPFC). Genes and exons within the mPFC carried most disease signals replicated across two independent cohorts. Pathways pointed to immune function, neuronal and synaptic regulation, and stress hormones. Multiomic factor and gene network analyses provided the underlying genomic structure. Single nucleus RNA sequencing in dorsolateral PFC revealed dysregulated (stress-related) signals in neuronal and non-neuronal cell types. Analyses of brain-blood intersections in >50,000 UK Biobank participants were conducted along with fine-mapping of the results of PTSD and MDD genome-wide association studies to distinguish risk from disease processes. Our data suggest shared and distinct molecular pathology in both disorders and propose potential therapeutic targets and biomarkers.