474 research outputs found

    Sustained-input switches for transcription factors and microRNAs are central building blocks of eukaryotic gene circuits

    WaRSwap is a randomization algorithm that, for the first time, provides a practical network motif discovery method for large multi-layer networks, for example those that include transcription factors, microRNAs, and non-regulatory protein-coding genes. The algorithm is applicable to systems with tens of thousands of genes, while accounting for critical aspects of biological networks, including self-loops, large hubs, and target rearrangements. We validate WaRSwap on a newly inferred regulatory network from Arabidopsis thaliana, and compare outcomes on published Drosophila and human networks. Specifically, sustained-input switches are among the few over-represented circuits across this diverse set of eukaryotes.

    Learning the Solution Operator of Boundary Value Problems using Graph Neural Networks

    As an alternative to classical numerical solvers for partial differential equations (PDEs) subject to boundary value constraints, there has been a surge of interest in investigating neural networks that can solve such problems efficiently. In this work, we design a general solution operator for two different time-independent PDEs using graph neural networks (GNNs) and spectral graph convolutions. We train the networks on simulated data from a finite elements solver on a variety of shapes and inhomogeneities. In contrast to previous works, we focus on the ability of the trained operator to generalize to previously unseen scenarios. Specifically, we test generalization to meshes with different shapes and superposition of solutions for a different number of inhomogeneities. We find that training on a diverse dataset with lots of variation in the finite element meshes is a key ingredient for achieving good generalization results in all cases. With this, we believe that GNNs can be used to learn solution operators that generalize over a range of properties and produce solutions much faster than a generic solver. Our dataset, which we make publicly available, can be used and extended to verify the robustness of these models under varying conditions.

    Genome-wide identification and predictive modeling of tissue-specific alternative polyadenylation

    MOTIVATION: Pre-mRNA cleavage and polyadenylation are essential steps for 3'-end maturation and subsequent stability and degradation of mRNAs. This process is highly controlled by cis-regulatory elements surrounding the cleavage/polyadenylation sites (polyA sites), which are frequently constrained by sequence content and position. More than 50% of human transcripts have multiple functional polyA sites, and the specific use of alternative polyA sites (APA) results in isoforms with variable 3'-untranslated regions, thus potentially affecting gene regulation. Elucidating the regulatory mechanisms underlying differential polyA preferences in multiple cell types has been hindered both by the lack of suitable data on the precise location of cleavage sites and by the lack of appropriate tests for determining APAs with significant differences across multiple libraries. RESULTS: We applied a tailored paired-end RNA-seq protocol to specifically probe the position of polyA sites in three human adult tissue types. We specified a linear-effects regression model to identify tissue-specific biases indicating regulated APA; the significance of differences between tissue types was assessed by an appropriately designed permutation test. This combination allowed us to identify highly specific subsets of APA events in the individual tissue types. Predictive models successfully classified constitutive polyA sites from a biologically relevant background (auROC = 99.6%), as well as tissue-specific regulated sets from each other. We found that the main cis-regulatory elements described for polyadenylation are a strong, and highly informative, hallmark for constitutive sites only. Tissue-specific regulated sites were found to contain other regulatory motifs, with the canonical polyadenylation signal being nearly absent at brain-specific polyA sites. Together, our results contribute to the understanding of the diversity of post-transcriptional gene regulation.
AVAILABILITY: Raw data are deposited on SRA, accession numbers: brain SRX208132, kidney SRX208087 and liver SRX208134. Processed datasets as well as model code are published on our website: http://www.genome.duke.edu/labs/ohler/research/UTR/
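
The abstract above mentions a permutation test for differences between tissue types but gives no details of its design. As a rough illustration of the general idea only, the following minimal sketch runs a generic two-sample permutation test on a difference in means. It is not the authors' test; the function name, the toy usage values, and the choice of test statistic are all assumptions made for illustration.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Hypothetical helper: a generic textbook permutation test, not the
    tailored test described in the abstract.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a

    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the labels by shuffling the pooled values, then re-split.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Toy usage values (invented): relative usage of one polyA site in two tissues.
tissue_1 = [0.10, 0.20, 0.15, 0.12, 0.18]
tissue_2 = [0.80, 0.90, 0.85, 0.88, 0.82]
p_separated = permutation_p_value(tissue_1, tissue_2)
p_identical = permutation_p_value([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

With clearly separated groups the estimated p-value is small, while identical groups give a p-value of 1, since every relabeling is at least as extreme as the observed difference of zero.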

    Towards Learning Self-Organized Criticality of Rydberg Atoms using Graph Neural Networks

    Self-Organized Criticality (SOC) is a ubiquitous dynamical phenomenon believed to be responsible for the emergence of universal scale-invariant behavior in many seemingly unrelated systems, such as forest fires, virus spreading or atomic excitation dynamics. SOC describes the buildup of large-scale and long-range spatio-temporal correlations as a result of only local interactions and dissipation. The simulation of SOC dynamics is typically based on Monte-Carlo (MC) methods, which are, however, numerically expensive and do not scale beyond certain system sizes. We investigate the use of Graph Neural Networks (GNNs) as an effective surrogate model to learn the dynamics operator for a paradigmatic SOC system, inspired by an experimentally accessible physics example: driven Rydberg atoms. To this end, we generalize existing GNN simulation approaches to predict dynamics for the internal state of the node. We show that we can accurately reproduce the MC dynamics as well as generalize along the two important axes of particle number and particle density. This paves the way to model much larger systems beyond the limits of traditional MC methods. While the exact system is inspired by the dynamics of Rydberg atoms, the approach is quite general and can readily be applied to other systems.

    Orthologous Transcription Factors in Bacteria Have Different Functions and Regulate Different Genes

    Transcription factors (TFs) form large paralogous gene families and have complex evolutionary histories. Here, we ask whether putative orthologs of TFs, from bidirectional best BLAST hits (BBHs), are evolutionary orthologs with conserved functions. We show that BBHs of TFs from distantly related bacteria are usually not evolutionary orthologs. Furthermore, the false orthologs usually respond to different signals and regulate distinct pathways, while the few BBHs that are evolutionary orthologs do have conserved functions. To test the conservation of regulatory interactions, we analyze expression patterns. We find that regulatory relationships between TFs and their regulated genes are usually not conserved for BBHs in Escherichia coli K12 and Bacillus subtilis. Even in the much more closely related bacteria Vibrio cholerae and Shewanella oneidensis MR-1, predicting regulation from E. coli BBHs has high error rates. Using gene–regulon correlations, we identify genes whose expression pattern differs between E. coli and S. oneidensis. Using literature searches and sequence analysis, we show that these changes in expression patterns reflect changes in gene regulation, even for evolutionary orthologs. We conclude that the evolution of bacterial regulation should be analyzed with phylogenetic trees, rather than BBHs, and that bacterial regulatory networks evolve more rapidly than previously thought.
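
For readers unfamiliar with bidirectional best hits, the criterion can be sketched as follows: two genes form a BBH pair when each is the other's top-scoring hit in the opposite genome. This is a minimal sketch on invented scores, not the authors' pipeline; the function names, gene identifiers, and score values are hypothetical, and real analyses would parse BLAST output and usually apply additional coverage or E-value filters.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    `scores` is a dict {(query, subject): score}, e.g. BLAST bit scores.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Invented scores for three genes in genome A and two genes in genome B.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150,
          ("a2", "b2"): 90, ("a3", "b1"): 120}
b_vs_a = {("b1", "a1"): 198, ("b1", "a3"): 110,
          ("b2", "a1"): 155, ("b2", "a2"): 95}

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)  # [('a1', 'b1')]
```

In this toy example a2's best hit is b2, but b2's best hit is a1, so the pair is rejected. The criterion is purely score-based, which is consistent with the abstract's point that a BBH need not be an evolutionary ortholog.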

    Cluster-independent marker feature identification from single-cell omics data using SEMITONES

    Identification of cell identity markers is an essential step in single-cell omics data analysis. Current marker identification strategies typically rely on cluster assignments of cells. However, cluster assignment, particularly for developmental data, is nontrivial, potentially arbitrary, and commonly relies on prior knowledge. In response, we present SEMITONES, a principled method for cluster-free marker identification. We showcase and evaluate its application for marker gene and regulatory region identification from single-cell data of the human haematopoietic system. Additionally, we illustrate its application to spatial transcriptomics data and show how SEMITONES can be used for the annotation of cells given known marker genes. Using several simulated and curated data sets, we demonstrate that SEMITONES qualitatively and quantitatively outperforms existing methods for the retrieval of cell identity markers from single-cell omics data.

    Global identification of functional microRNA-mRNA interactions in Drosophila

    MicroRNAs (miRNAs) are key mediators of post-transcriptional gene expression silencing. So far, no comprehensive experimental annotation of functional miRNA target sites exists in Drosophila. Here, we generated a transcriptome-wide in vivo map of miRNA-mRNA interactions in Drosophila melanogaster, making use of single-nucleotide resolution in Argonaute1 (AGO1) crosslinking and immunoprecipitation (CLIP) data. Absolute quantification of cellular miRNA levels shows the miRNA pool in Drosophila cell lines to be more diverse than previously reported. Benchmarking two CLIP approaches, we identify a similar predictive potential to unambiguously assign thousands of miRNA-mRNA pairs from AGO1 interaction data at unprecedented depth, achieving higher signal-to-noise ratios than with computational methods alone. Quantitative RNA-seq and sub-codon resolution ribosomal footprinting data upon AGO1 depletion enabled the determination of miRNA-mediated effects on target expression and translation. We thus provide the first comprehensive resource of miRNA target sites and their quantitative functional impact in Drosophila.

    The mRNA-bound proteome of the early fly embryo

    Early embryogenesis is characterized by the maternal-to-zygotic transition (MZT), in which maternally deposited messenger RNAs are degraded while zygotic transcription begins. Before the MZT, post-transcriptional gene regulation by RNA-binding proteins (RBPs) is the dominant force in embryo patterning. We used two mRNA interactome capture methods to identify RBPs bound to polyadenylated transcripts within the first two hours of D. melanogaster embryogenesis. We identified a high-confidence set of 476 putative RBPs and confirmed RNA-binding activities for most of 24 tested candidates. Most proteins in the interactome are known RBPs or harbor canonical RBP features, but 99 exhibited previously uncharacterized RNA-binding activity. mRNA-bound RBPs and TFs exhibit distinct expression dynamics, in which the newly identified RBPs dominate the first two hours of embryonic development. Integrating our resource with in situ hybridization data from existing databases showed that mRNAs encoding RBPs are enriched in posterior regions of the early embryo, suggesting their general importance in posterior patterning and germ cell maturation.

    Electric fields and valence band offsets at strained [111] heterojunctions

    [111] ordered common atom strained layer superlattices (in particular the common anion GaSb/InSb system and the common cation InAs/InSb system) are investigated using the ab initio full potential linearized augmented plane wave (FLAPW) method. We have focused our attention on the potential line-up at the two sides of the homopolar isovalent heterojunctions considered, and in particular on its dependence on the strain conditions and on the strain induced electric fields. We propose a procedure to locate the interface plane where the band alignment could be evaluated; furthermore, we suggest that the polarization charges, due to piezoelectric effects, are approximately confined to a narrow region close to the interface and do not affect the potential discontinuity. We find that the interface contribution to the valence band offset is substantially unaffected by strain conditions, whereas the total band line-up is highly tunable, as a function of the strain conditions. Finally, we compare our results with those obtained for [001] heterojunctions. Comment: 18 pages, LaTeX file, to appear in Phys. Rev.