162 research outputs found

    AuPairWise: A Method to Estimate RNA-Seq Replicability through Co-expression

    In addition to detecting novel transcripts and higher dynamic range, a principal claim for RNA-sequencing has been greater replicability, typically measured in sample-sample correlations of gene expression levels. Through a re-analysis of ENCODE data, we show that replicability of transcript abundances will provide misleading estimates of the replicability of conditional variation in transcript abundances (i.e., most expression experiments). Heuristics which implicitly address this problem have emerged in quality control measures to obtain 'good' differential expression results. However, these methods involve strict filters such as discarding low expressing genes or using technical replicates to remove discordant transcripts, and are costly or simply ad hoc. As an alternative, we model gene-level replicability of differential activity using co-expressing genes. We find that sets of housekeeping interactions provide a sensitive means of estimating the replicability of expression changes, where the co-expressing pair can be regarded as pseudo-replicates of one another. We model the effects of noise that perturbs a gene's expression within its usual distribution of values and show that perturbing expression by only 5% within that range is readily detectable (AUROC~0.73). We have made our method available as a set of easily implemented R scripts

    Using predictive specificity to determine when gene set analysis is biologically meaningful

    Gene set analysis, which translates gene lists into enriched functions, is among the most common bioinformatic methods. Yet few would advocate taking the results at face value. Not only is there no agreement on the algorithms themselves, there is no agreement on how to benchmark them. In this paper, we evaluate the robustness and uniqueness of enrichment results as a means of assessing methods even where correctness is unknown. We show that heavily annotated ('multifunctional') genes are likely to appear in genomics study results and drive the generation of biologically non-specific enrichment results as well as highly fragile significances. By providing a means of determining where enrichment analyses report non-specific and non-robust findings, we are able to assess where we can be confident in their use. We find significant progress in recent bias correction methods for enrichment and provide our own software implementation. Our approach can be readily adapted to any pre-existing package

    Pan-human consensus genome significantly improves the accuracy of RNA-seq analyses

    The Human Reference Genome serves as the foundation for modern genomic analyses. However, in its present form, it does not adequately represent the vast genetic diversity of the human population. In this study, we explored the consensus genome as a potential successor of the current reference genome and assessed its effect on the accuracy of RNA-seq read alignment. To find the best haploid genome representation, we constructed consensus genomes at the pan-human, superpopulation, and population levels, using variant information from The 1000 Genomes Project Consortium. Using personal haploid genomes as the ground truth, we compared mapping errors for real RNA-seq reads aligned to the consensus genomes versus the reference genome. For reads overlapping homozygous variants, we found that the mapping error decreased by a factor of approximately two to three when the reference was replaced with the pan-human consensus genome. We also found that using more population-specific consensuses resulted in little to no increase over using the pan-human consensus, suggesting a limit in the utility of incorporating a more specific genomic variation. Replacing the reference with consensus genomes impacts functional analyses, such as differential expressions of isoforms, genes, and splice junctions

    Variability of cross-tissue X-chromosome inactivation characterizes timing of human embryonic lineage specification events

    X-chromosome inactivation (XCI) is a random, permanent, and developmentally early epigenetic event that occurs during mammalian embryogenesis. We harness these features to investigate characteristics of early lineage specification events during human development. We initially assess the consistency of X-inactivation and establish a robust set of XCI-escape genes. By analyzing variance in XCI ratios across tissues and individuals, we find that XCI is shared across all tissues, suggesting that XCI is completed in the epiblast (in at least 6–16 cells) prior to specification of the germ layers. Additionally, we exploit tissue-specific variability to characterize the number of cells present during tissue-lineage commitment, ranging from approximately 20 cells in liver and whole blood tissues to 80 cells in brain tissues. By investigating the variability of XCI ratios using adult tissue, we characterize embryonic features of human XCI and lineage specification that are otherwise difficult to ascertain experimentally

    CoCoCoNet: conserved and comparative co-expression across a diverse set of species

    Co-expression analysis has provided insight into gene function in organisms from Arabidopsis to zebrafish. Comparison across species has the potential to enrich these results, for example by prioritizing among candidate human disease genes based on their network properties or by finding alternative model systems where their co-expression is conserved. Here, we present CoCoCoNet as a tool for identifying conserved gene modules and comparing co-expression networks. CoCoCoNet is a resource for both data and methods, providing gold standard networks and sophisticated tools for on-the-fly comparative analyses across 14 species. We show how CoCoCoNet can be used in two use cases. In the first, we demonstrate deep conservation of a nucleolus gene module across very divergent organisms, and in the second, we show how the heterogeneity of autism mechanisms in humans can be broken down by functional groups and translated to model organisms. CoCoCoNet is free to use and available to all at https://milton.cshl.edu/CoCoCoNet, with data and R scripts available at ftp://milton.cshl.edu/data

    Ligand Similarity Complements Sequence, Physical Interaction, and Co-Expression for Gene Function Prediction

    The expansion of protein-ligand annotation databases has enabled large-scale networking of proteins by ligand similarity. These ligand-based protein networks, which implicitly predict the ability of neighboring proteins to bind related ligands, may complement biologically-oriented gene networks, which are used to predict functional or disease relevance. To quantify the degree to which such ligand-based protein associations might complement functional genomic associations, including sequence similarity, physical protein-protein interactions, co-expression, and disease gene annotations, we calculated a network based on the Similarity Ensemble Approach (SEA: sea.docking.org), where protein neighbors reflect the similarity of their ligands. We also measured the similarity with functional genomic networks over a common set of 1,131 genes, and found that the networks had only small overlaps, which were significant only due to the large scale of the data. Consistent with the view that the networks contain different information, combining them substantially improved Molecular Function prediction within GO (from AUROC~0.63-0.75 for the individual data modalities to AUROC~0.8 in the aggregate). We investigated the boost in guilt-by-association gene function prediction when the networks are combined and describe underlying properties that can be further exploited

    Hubble space telescope STIS spectroscopy of the peculiar nova-like variables BK Lyn, V751 Cygni, and V380 Oph

    We obtained Hubble STIS spectra of three nova-like variables: V751 Cygni, V380 Oph, and—the only confirmed nova-like variable known to be below the period gap—BK Lyn. In all three systems, the spectra were taken during high optical brightness state, and a luminous accretion disk dominates their far-ultraviolet (FUV) light. We assessed a lower limit of the distances by applying the infrared photometric method of Knigge. Within the limitations imposed by the poorly known system parameters (such as the inclination, white dwarf mass, and the applicability of steady state accretion disks) we obtained satisfactory fits to BK Lyn using optically thick accretion disk models with an accretion rate of for a white dwarf mass of Mwd = 1.2M and for Mwd = 0.4M. However, for the VY Scl-type nova-like variable V751 Cygni and for the SW Sex star V380 Oph, we are unable to obtain satisfactory synthetic spectral fits to the high state FUV spectra using optically thick steady state accretion disk models. The lack of FUV spectra information down to the Lyman limit hinders the extraction of information about the accreting white dwarf during the high states of these nova-like systems

    Exploiting single-cell expression to characterize co-expression replicability

    BACKGROUND: Co-expression networks have been a useful tool for functional genomics, providing important clues about the cellular and biochemical mechanisms that are active in normal and disease processes. However, co-expression analysis is often treated as a black box with results being hard to trace to their basis in the data. Here, we use both published and novel single-cell RNA sequencing (RNA-seq) data to understand fundamental drivers of gene-gene connectivity and replicability in co-expression networks. RESULTS: We perform the first major analysis of single-cell co-expression, sampling from 31 individual studies. Using neighbor voting in cross-validation, we find that single-cell network connectivity is less likely to overlap with known functions than co-expression derived from bulk data, with functional variation within cell types strongly resembling that also occurring across cell types. To identify features and analysis practices that contribute to this connectivity, we perform our own single-cell RNA-seq experiment of 126 cortical interneurons in an experimental design targeted to co-expression. By assessing network replicability, semantic similarity and overall functional connectivity, we identify technical factors influencing co-expression and suggest how they can be controlled for. Many of the technical effects we identify are expression-level dependent, making expression level itself highly predictive of network topology. We show this occurs generally through re-analysis of the BrainSpan RNA-seq data. CONCLUSIONS: Technical properties of single-cell RNA-seq data create confounds in co-expression networks which can be identified and explicitly controlled for in any supervised analysis. This is useful both in improving co-expression performance and in characterizing single-cell data in generally applicable terms, permitting cross-laboratory comparison within a common framework

    Creep stability of the proposed AIDA mission target 65803 Didymos: I. Discrete cohesionless granular physics model

    As the target of the proposed Asteroid Impact & Deflection Assessment (AIDA) mission, the near-Earth binary asteroid 65803 Didymos represents a special class of binary asteroids, those whose primaries are at risk of rotational disruption. To gain a better understanding of these binary systems and to support the AIDA mission, this paper investigates the creep stability of the Didymos primary by representing it as a cohesionless self-gravitating granular aggregate subject to rotational acceleration. To achieve this goal, a soft-sphere discrete element model (SSDEM) capable of simulating granular systems in quasi-static states is implemented and a quasi-static spin-up procedure is carried out. We devise three critical spin limits for the simulated aggregates to indicate their critical states triggered by reshaping and surface shedding, internal structural deformation, and shear failure, respectively. The failure condition and mode, and shear strength of an aggregate can all be inferred from the three critical spin limits. The effects of arrangement and size distribution of constituent particles, bulk density, spin-up path, and interparticle friction are numerically explored. The results show that the shear strength of a spinning self-gravitating aggregate depends strongly on both its internal configuration and material parameters, while its failure mode and mechanism are mainly affected by its internal configuration. Additionally, this study provides some constraints on the possible physical properties of the Didymos primary based on observational data and proposes a plausible formation mechanism for this binary system. With a bulk density consistent with observational uncertainty and close to the maximum density allowed for the asteroid, the Didymos primary in certain configurations can remain geo-statically stable without including cohesion.
    Comment: 66 pages, 24 figures, submitted to Icarus on 25/Aug/201

    The fractured landscape of RNA-seq alignment: the default in our STARs

    Many tools are available for RNA-seq alignment and expression quantification, with comparative value being hard to establish. Benchmarking assessments often highlight methods' good performance, but are focused on either model data or fail to explain variation in performance. This leaves us to ask, what is the most meaningful way to assess different alignment choices? And importantly, where is there room for progress? In this work, we explore the answers to these two questions by performing an exhaustive assessment of the STAR aligner. We assess STAR's performance across a range of alignment parameters using common metrics, and then on biologically focused tasks. We find technical metrics such as fraction mapping or expression profile correlation to be uninformative, capturing properties unlikely to have any role in biological discovery. Surprisingly, we find that changes in alignment parameters within a wide range have little impact on both technical and biological performance. Yet, when performance finally does break, it happens in difficult regions, such as X-Y paralogs and MHC genes. We believe improved reporting by developers will help establish where results are likely to be robust or fragile, providing a better baseline to establish where methodological progress can still occur