15 research outputs found

    V-SVA: an R Shiny application for detecting and annotating hidden sources of variation in single-cell RNA-seq data.

    SUMMARY: Single-cell RNA-sequencing (scRNA-seq) technology enables the study of gene expression programs in individual cells. However, these data are subject to diverse sources of variation, including 'unwanted' variation that needs to be removed in downstream analyses (e.g. batch effects) and 'wanted' or biological sources of variation (e.g. variation associated with a cell type) that need to be precisely described. Surrogate variable analysis (SVA)-based algorithms are commonly used for batch correction and, more recently, for studying 'wanted' variation in scRNA-seq data. However, determining whether these variables are biologically meaningful or stem from technical factors remains a challenge. To facilitate the interpretation of surrogate variables detected by algorithms including IA-SVA, SVA or ZINB-WaVE, we developed an R Shiny application [Visual Surrogate Variable Analysis (V-SVA)] that provides a web-browser interface for the identification and annotation of hidden sources of variation in scRNA-seq data. This interactive framework includes tools for discovering genes associated with detected sources of variation, annotating genes using publicly available databases and gene sets, and visualizing data using dimension reduction methods. AVAILABILITY AND IMPLEMENTATION: The V-SVA Shiny application is publicly hosted at https://vsva.jax.org/ and the source code is freely available at https://github.com/nlawlor/V-SVA. CONTACT: [email protected] or [email protected]. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
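The gene-discovery step links genes to a detected surrogate variable. As a rough, hypothetical sketch (not V-SVA's actual R code), one simple way to do this is to rank genes by the squared Pearson correlation between their expression and the surrogate variable across cells. The function name and the `r2_cutoff` threshold are illustrative assumptions, not V-SVA parameters.

```python
import numpy as np

def genes_associated_with_sv(expr, sv, gene_names, r2_cutoff=0.3):
    """Rank genes by squared Pearson correlation with one surrogate variable.

    expr: genes x cells matrix (e.g. log-normalized expression)
    sv: surrogate variable values, one per cell
    r2_cutoff: hypothetical threshold, not a V-SVA default
    """
    expr = np.asarray(expr, dtype=float)
    sv = np.asarray(sv, dtype=float)
    # Center every gene and the surrogate variable across cells
    xc = expr - expr.mean(axis=1, keepdims=True)
    sc = sv - sv.mean()
    denom = np.sqrt((xc ** 2).sum(axis=1) * (sc ** 2).sum())
    # Vectorized Pearson r for all genes; constant genes give 0/0 -> NaN -> 0
    with np.errstate(invalid="ignore", divide="ignore"):
        r = (xc @ sc) / denom
    r2 = np.nan_to_num(r) ** 2
    hits = [(g, float(v)) for g, v in zip(gene_names, r2) if v >= r2_cutoff]
    # Strongest associations first
    return sorted(hits, key=lambda t: t[1], reverse=True)
```

Using r² rather than signed r keeps genes that rise and genes that fall along the surrogate variable; the sign can be recovered from `r` if the direction matters for annotation.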

    BiFET: sequencing Bias-free transcription factor Footprint Enrichment Test.

    Transcription factor (TF) footprinting uncovers putative protein-DNA binding via combined analyses of chromatin accessibility patterns and their underlying TF sequence motifs. TF footprints are frequently used to identify TFs that regulate the activity of cell- or condition-specific genomic regions (target loci) relative to control regions (background loci) using standard enrichment tests. However, both the chromatin accessibility level and the GC content of a locus are strongly associated with the number and types of TF footprints that can be detected at that site. Traditional enrichment tests (e.g. hypergeometric) do not account for this bias and thereby inflate false positive associations. Therefore, we developed a novel post-processing method, Bias-free Footprint Enrichment Test (BiFET), that corrects for the biases arising from differences in chromatin accessibility levels and GC content between target and background loci in footprint enrichment analyses. We applied BiFET to TF footprint calls obtained from EndoC-βH1 ATAC-seq samples using three different algorithms (CENTIPEDE, HINT-BC and PIQ) and showed that BiFET increases power and reduces the false positive rate compared with the hypergeometric test. Furthermore, we used BiFET to study TF footprints from human PBMC and pancreatic islet ATAC-seq samples to show its utility in identifying putative TFs associated with cell-type-specific loci.
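For context, the traditional test that BiFET is designed to improve on can be sketched as a one-sided hypergeometric test. This is only the uncorrected baseline, not BiFET itself: it treats every locus as exchangeable and therefore ignores the accessibility and GC-content differences between target and background loci that BiFET corrects for. The function name and arguments are illustrative.

```python
from math import comb

def hypergeometric_enrichment_p(target_hits, n_target, background_hits, n_background):
    """One-sided P(X >= target_hits) for footprint enrichment in target loci.

    target_hits / background_hits: number of target / background loci carrying
    at least one footprint for the TF of interest.
    n_target / n_background: total number of target / background loci.
    """
    total = n_target + n_background             # all loci (the population)
    successes = target_hits + background_hits   # loci with a footprint
    upper = min(n_target, successes)
    denom = comb(total, n_target)               # ways to draw the target set
    # Sum the hypergeometric probability mass from the observed count upward
    p = sum(comb(successes, k) * comb(total - successes, n_target - k)
            for k in range(target_hits, upper + 1))
    return p / denom
```

For example, with 2 target and 2 background loci where both target loci (and no background locus) carry a footprint, the call `hypergeometric_enrichment_p(2, 2, 0, 2)` gives 1/6. A locus-level bias such as higher accessibility in target loci would shrink this p-value even for a TF with no real regulatory role, which is the false-positive inflation described above.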

    Multiomic Profiling Identifies cis-Regulatory Networks Underlying Human Pancreatic β Cell Identity and Function.

    EndoC-βH1 is emerging as a critical human β cell model to study the genetic and environmental etiologies of β cell (dys)function and diabetes. Comprehensive knowledge of its molecular landscape is lacking, yet required, for effective use of this model. Here, we report chromosomal (spectral karyotyping), genetic (genotyping), epigenomic (ChIP-seq and ATAC-seq), chromatin interaction (Hi-C and Pol2 ChIA-PET), and transcriptomic (RNA-seq and miRNA-seq) maps of EndoC-βH1. Analyses of these maps define known (e.g., PDX1 and ISL1) and putative (e.g., PCSK1 and mir-375) β cell-specific transcriptional cis-regulatory networks and identify allelic effects on cis-regulatory element use. Importantly, comparison with maps generated in primary human islets and/or β cells indicates preservation of chromatin looping but also highlights chromosomal aberrations and fetal genomic signatures in EndoC-βH1. Together, these maps, and a web application we created for their exploration, provide important tools for the design of experiments to probe and manipulate the genetic programs governing β cell identity and (dys)function in diabetes.

    A General Framework for Inferring the Developmental Causes of Modularity of Morphological Variation with Applications to the Craniomandibular Complex in Rodents.

    Modularity is a principle of construction whereby individual units are internally cohesive and relatively autonomous from other such units. Modularity thus confers a degree of evolutionary autonomy on the sets of traits that make up a module, a feature hypothesized to enhance evolvability by allowing selection to optimize individual parts without interfering with others. Detecting modularity in morphological traits requires analyzing the structure of covariation, because traits integrated by development into modules are expected to show stronger mutual covariation. However, unambiguous patterns of modularity are rare. This is because the developmental processes underlying most phenotypic traits share regulatory elements and/or have spatially overlapping effects, and such pervasive interactions can produce the appearance of statistical integration among biologically modular traits. Herein, a statistical framework is provided that addresses these limitations of methods for inferring modularity from morphological data. The theoretical basis of this new method is that modules are subsets of dimensions embedded in phenotypic space, an approach that differs from previous ones by defining modules not as anatomical parts but as aspects of the variation of these parts. This abstraction allows traits to be integrated into more than one module, and it also suggests a natural approach for testing a priori hypotheses of modularity: fitting competing hypotheses to observed covariance matrices and searching for the best-supported causal explanations. A comprehensive method is developed and tested using simulated data, then used to address a major outstanding issue in evolutionary biology: whether the developmental processes that structure variation within populations bias the direction of long-term divergence. This hypothesis is tested by fitting multiple developmental models to both intraspecific and interspecific craniomandibular data obtained from a clade of ecologically diverse rodents. Results reveal a remarkable congruence among patterns within and between species, and they also suggest that modular variation arises by different mechanisms in different parts of the skull, i.e., the cranium and the mandible. That these structures have different dynamics both within and among species suggests that whether intraspecific variation constrains the direction of divergence may depend on the mechanisms structuring modularity within populations.
    Ph.D. dissertation, Biology, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/63699/1/emarquez_1.pd
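The idea of fitting competing a priori hypotheses of modularity to an observed covariance structure can be illustrated with a deliberately simplified sketch. This is not the dissertation's framework: it uses a hard partition of traits (whereas the framework allows a trait to belong to several modules) and scores each hypothesis by a simple contrast, the mean absolute correlation within hypothesized modules minus the mean between them. Hypothesis names and trait indices are hypothetical.

```python
import numpy as np

def module_contrast(corr, modules):
    """Mean |r| within hypothesized modules minus mean |r| between modules.

    corr: traits x traits correlation matrix
    modules: list of lists of trait indices forming a hard partition
    (the partition must contain at least two modules so between-pairs exist)
    """
    corr = np.abs(np.asarray(corr, dtype=float))
    n = corr.shape[0]
    label = np.full(n, -1)
    for m, traits in enumerate(modules):
        label[traits] = m
    if (label < 0).any():
        raise ValueError("every trait must be assigned to a module")
    i, j = np.triu_indices(n, k=1)   # each trait pair once, diagonal excluded
    same = label[i] == label[j]
    vals = corr[i, j]
    return vals[same].mean() - vals[~same].mean()

def best_hypothesis(corr, hypotheses):
    """Score competing modularity hypotheses; return the best-supported name and all scores."""
    scores = {name: module_contrast(corr, mods) for name, mods in hypotheses.items()}
    return max(scores, key=scores.get), scores
```

With four traits where traits 0-1 and 2-3 are each strongly correlated (r = 0.8) and all other pairs weakly (r = 0.1), a hypothesis grouping {0, 1} and {2, 3} scores 0.7, while one grouping {0, 2} and {1, 3} scores -0.35.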

    Data from: Linkage disequilibrium and inversion-typing of the Drosophila melanogaster Genome Reference Panel

    We calculated the linkage disequilibrium (LD) between all pairs of variants in the Drosophila Genome Reference Panel (DGRP) with minor allele count ≥ 5, using r² ≥ 0.5 as the cutoff for a highly correlated SNP. We make available the list of all highly correlated SNPs for use in association studies. Seventy-six percent of variant SNPs are highly correlated with at least one other SNP, and the mean number of highly correlated SNPs per variant over the whole genome is 83.9. Disequilibrium between distant SNPs is also common when the minor allele frequency (MAF) is low: 37% of SNPs with MAF < 0.1 are highly correlated with SNPs more than 100 kb away. Although SNPs within regions with polymorphic inversions are highly correlated with somewhat larger numbers of SNPs, and these correlated SNPs are on average farther away, the probability that a SNP in such regions is highly correlated with at least one other SNP is very similar to that of SNPs outside inversions. Previous karyotyping of the DGRP lines has been inconsistent, and we used LD and genotypes to investigate these discrepancies. When previous studies agreed on inversion karyotype, our analysis was almost perfectly concordant with those assignments. In discordant cases, and for inversion heterozygotes, our results suggest errors in two previous analyses or discordance between genotype and karyotype. Heterozygosities of chromosome arms are, in many cases, surprisingly highly correlated, suggesting strong epistatic selection during the inbreeding and maintenance of the DGRP lines.
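The core pairwise calculation can be sketched as follows, assuming that genotypes from the inbred lines are coded 0/1 per site with missing calls as NaN, and that r² is the squared Pearson correlation of those genotype vectors over lines called at both sites. This brute-force all-pairs loop is only illustrative and the function names are hypothetical; a genome-wide run would need pruning, for example by the allele-frequency limits that the SAS files listed below compute.

```python
import numpy as np

def minor_allele_count(x):
    """Count of the less common allele among non-missing calls (0/1 coding)."""
    calls = x[~np.isnan(x)]
    ones = int(calls.sum())
    return min(ones, calls.size - ones)

def r2(x, y):
    """Squared Pearson correlation of two genotype vectors, pairwise-complete."""
    ok = ~np.isnan(x) & ~np.isnan(y)
    x, y = x[ok], y[ok]
    if x.size < 2 or x.std() == 0 or y.std() == 0:
        return 0.0
    return float(np.corrcoef(x, y)[0, 1] ** 2)

def highly_correlated_pairs(genotypes, min_mac=5, r2_cutoff=0.5):
    """All SNP pairs with r^2 >= r2_cutoff among SNPs with minor allele count >= min_mac.

    genotypes: SNPs x lines array; inbred lines are treated as homozygous and
    coded 0/1 for the two alleles, with np.nan for missing calls.
    """
    g = np.asarray(genotypes, dtype=float)
    kept = [i for i in range(g.shape[0]) if minor_allele_count(g[i]) >= min_mac]
    pairs = []
    for a, i in enumerate(kept):
        for j in kept[a + 1:]:
            value = r2(g[i], g[j])
            if value >= r2_cutoff:
                pairs.append((i, j, value))
    return pairs
```

The per-SNP count of highly correlated partners (83.9 on average genome-wide, per the abstract) follows directly by tallying how often each SNP index appears in the returned pairs.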

    HouleMarquezSASfiles

    This zip archive contains: Gcorrlimits.SAS, a SAS file that calculates the range of minor allele frequencies that can be correlated at r² ≥ 0.5; CalcHighCorr.SAS, a SAS file that calculates the genotypic r² between sites that could be correlated at a particular r² value; and gcorrexampledata.sas7bdat, a SAS-format data set with sample data for CalcHighCorr.SAS.
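One way to obtain such frequency limits, which may or may not be what Gcorrlimits.SAS implements, is the standard upper bound on r² given two allele frequencies: for minor allele frequencies a ≤ b, r² cannot exceed a(1 - b) / (b(1 - a)). The minimal sketch below computes that bound and, by solving it for the partner frequency, the range of MAFs that could reach a given cutoff. The function names are hypothetical.

```python
def max_r2(p1, p2):
    """Upper bound on r^2 between two biallelic sites with minor allele
    frequencies p1 and p2 (both in (0, 0.5]).

    The bound is reached when every copy of the rarer minor allele sits on a
    haplotype that also carries the other site's minor allele.
    """
    a, b = sorted((p1, p2))          # a <= b
    return a * (1 - b) / (b * (1 - a))

def partner_maf_range(p, r2_cutoff=0.5):
    """Interval of partner minor allele frequencies q for which
    max_r2(p, q) >= r2_cutoff, obtained by solving the bound for q."""
    c = r2_cutoff
    lo = c * p / (1 - p + c * p)           # partner rarer than p
    hi = min(0.5, p / (c + p - c * p))     # partner more common than p
    return lo, hi
```

For example, a SNP at MAF 0.5 can only reach r² ≥ 0.5 with partners whose MAF is at least 1/3, so any pair outside the returned interval can be skipped before computing genotypic r².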

    HouleMarquezF3_PCscores

    Supplemental File S3. Inversion-typing of DGRP lines for the three common inversions In(2L)t, In(2R)NS, and In(3R)Mo, and heterozygosity of chromosome regions. For each inversion, the column beginning PC1sc is the score on principal component 1 for variants between the inversion breakpoints that are in disequilibrium with a large number of distant sites (see text for explanation). Pred is the inversion type predicted from the PC1 scores (1 = homozygous inversion, 0.5 = Standard/inversion heterozygote, 0 = homozygous Standard). CDL is the combined inversion-type assignment from Langley et al. 2012 and Corbett-Detig & Hartl 2012 (1 = homozygous inversion, 0 = not homozygous for the inversion, blank = not scored). Huang gives the assignments from Huang et al. 2014 (0 = homozygous Standard, 0.5 = heterozygous ST/INV, 1 = homozygous inversion, blank = not scored). Heterozygosity columns give the average heterozygosity for chromosome segments defined by inversion breakpoints, plus the X and arm 3L. Heterozygosities were calculated from sites with no more than 5 missing calls in the 205 lines, omitting regions within 10 kb of inversion breakpoints.
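The heterozygosity filtering described in this file (at most 5 missing calls across the lines, and no sites within 10 kb of an inversion breakpoint) can be sketched as below. The 0/1 heterozygous-call coding, the input layout, and the function name are assumptions for illustration, not the format of the supplemental file.

```python
import numpy as np

def segment_heterozygosity(calls, positions, breakpoints, max_missing=5, buffer_bp=10_000):
    """Average heterozygosity per line over one chromosome segment.

    calls: sites x lines array; 1 = heterozygous call, 0 = homozygous,
           np.nan = missing (assumed coding)
    positions: site coordinates in bp
    breakpoints: inversion breakpoint coordinates in bp
    """
    calls = np.asarray(calls, dtype=float)
    pos = np.asarray(positions)
    # Keep sites with no more than max_missing missing calls across lines
    ok = np.isnan(calls).sum(axis=1) <= max_missing
    for bp in breakpoints:
        # Drop sites within buffer_bp of any breakpoint
        ok &= np.abs(pos - bp) > buffer_bp
    # Mean over retained sites for each line, ignoring missing calls
    return np.nanmean(calls[ok], axis=0)
```

Averaging per line (column) rather than per site gives one heterozygosity value per line and segment, matching the column layout described above.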

    LD205results.zip

    List of all SNP pairs in Freeze 2 of the Drosophila Genome Reference Panel with linkage disequilibrium r² > 0.5. The minor allele count of the focal SNP must be ≥ 5.

    Chromatin interaction networks revealed unique connectivity patterns of broad H3K4me3 domains and super enhancers in 3D chromatin.

    Broad domain promoters and super enhancers are regulatory elements that govern cell-specific functions and harbor disease-associated sequence variants. These elements are characterized by distinct epigenomic profiles, such as expanded deposition of the histone marks H3K27ac for super enhancers and H3K4me3 for broad domains; however, little is known about how they interact with each other and with the rest of the genome in three-dimensional chromatin space. Using network theory methods, we studied chromatin interactions between broad domains and super enhancers in three ENCODE cell lines (K562, MCF7, GM12878) obtained via ChIA-PET, Hi-C, and HiChIP assays. In these networks, broad domains and super enhancers interact with each other more frequently than their typical counterparts do. Network measures and graphlets revealed distinct connectivity patterns associated with these regulatory elements that are robust across cell types and alternative assays. Machine learning models showed that these connectivity patterns could effectively discriminate broad domains from typical promoters and super enhancers from typical enhancers. Finally, targets of broad domains in these networks were enriched in disease-causing SNPs of cognate cell types. Taken together, these results suggest a robust and unique organization of the chromatin around broad domains and super enhancers: loci critical for pathologies and cell-specific functions. Sci Rep 2017 Oct 31; 7(1):1446
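As a hedged illustration of the kind of network measure involved, the sketch below computes mean node degree per regulatory class and counts interactions between pairs of classes from a list of chromatin contacts. It covers only these two simple measures, not the graphlet or machine-learning analyses, and the class labels, locus names, and input format are assumptions.

```python
from collections import defaultdict
from statistics import mean

def connectivity_summary(edges, node_class):
    """Mean degree per regulatory class and interaction counts per class pair.

    edges: iterable of (locus_a, locus_b) chromatin-interaction pairs
    node_class: locus -> class label, e.g. 'broad', 'typical_promoter',
                'super_enhancer', 'typical_enhancer' (hypothetical labels)
    """
    degree = defaultdict(int)
    pair_counts = defaultdict(int)
    seen = set()
    for a, b in edges:
        key = frozenset((a, b))
        if a == b or key in seen:   # skip self-loops and duplicate contacts
            continue
        seen.add(key)
        degree[a] += 1
        degree[b] += 1
        ca, cb = node_class.get(a), node_class.get(b)
        if ca and cb:
            pair_counts[tuple(sorted((ca, cb)))] += 1
    mean_degree = {}
    for cls in set(node_class.values()):
        members = [degree[n] for n, c in node_class.items() if c == cls]
        mean_degree[cls] = mean(members) if members else 0.0
    return mean_degree, dict(pair_counts)
```

Comparing `mean_degree["broad"]` with `mean_degree["typical_promoter"]`, and the broad/super-enhancer pair count with the counts for typical elements, is the simplest version of the "interact more frequently" comparison; a real analysis would also need a null model that controls for the number of elements in each class.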

    Leukaemia cell of origin identified by chromatin landscape of bulk tumour cells.

    The precise identity of a tumour's cell of origin can influence disease prognosis and outcome. Reliably defining the tumour cell of origin from primary, bulk tumour cell samples has been a challenge. Here we use a well-defined model of MLL-rearranged acute myeloid leukaemia (AML) to demonstrate that transforming haematopoietic stem cells (HSCs) and multipotent progenitors results in more aggressive AML than transforming committed progenitor cells. Transcriptome profiling reveals a gene expression signature broadly distinguishing stem cell-derived from progenitor cell-derived AML, including genes involved in immune escape, extravasation and small GTPase signal transduction. However, whole-genome profiling of open chromatin in bulk AML tumour cell samples reveals precise and robust biomarkers reflecting each cell of origin tested. We find that bulk AML tumour cells exhibit distinct open chromatin loci that reflect the transformed cell of origin, suggesting that open chromatin patterns may be leveraged as prognostic signatures in human AML. Nat Commun 2016 Jul 11; 7:1216