Sustained-input switches for transcription factors and microRNAs are central building blocks of eukaryotic gene circuits
WaRSwap is a randomization algorithm that for the first time provides a practical network motif discovery method for large multi-layer networks, for example those that include transcription factors, microRNAs, and non-regulatory protein-coding genes. The algorithm is applicable to systems with tens of thousands of genes, while accounting for critical aspects of biological networks, including self-loops, large hubs, and target rearrangements. We validate WaRSwap on a newly inferred regulatory network from Arabidopsis thaliana, and compare outcomes on published Drosophila and human networks. Specifically, sustained-input switches are among the few over-represented circuits across this diverse set of eukaryotes.
Learning the Solution Operator of Boundary Value Problems using Graph Neural Networks
As an alternative to classical numerical solvers for partial differential
equations (PDEs) subject to boundary value constraints, there has been a surge
of interest in investigating neural networks that can solve such problems
efficiently. In this work, we design a general solution operator for two
different time-independent PDEs using graph neural networks (GNNs) and spectral
graph convolutions. We train the networks on simulated data from a finite
elements solver on a variety of shapes and inhomogeneities. In contrast to
previous works, we focus on the ability of the trained operator to generalize
to previously unseen scenarios. Specifically, we test generalization to meshes
with different shapes and superposition of solutions for a different number of
inhomogeneities. We find that training on a diverse dataset with lots of
variation in the finite element meshes is a key ingredient for achieving good
generalization results in all cases. With this, we believe that GNNs can be
used to learn solution operators that generalize over a range of properties and
produce solutions much faster than a generic solver. Our dataset, which we make
publicly available, can be used and extended to verify the robustness of these
models under varying conditions.
Genome-wide identification and predictive modeling of tissue-specific alternative polyadenylation
MOTIVATION: Pre-mRNA cleavage and polyadenylation are essential steps for 3'-end maturation and subsequent stability and degradation of mRNAs. This process is highly controlled by cis-regulatory elements surrounding the cleavage/polyadenylation sites (polyA sites), which are frequently constrained by sequence content and position. More than 50% of human transcripts have multiple functional polyA sites, and the specific use of alternative polyA sites (APA) results in isoforms with variable 3'-untranslated regions, thus potentially affecting gene regulation. Elucidating the regulatory mechanisms underlying differential polyA preferences in multiple cell types has been hindered both by the lack of suitable data on the precise location of cleavage sites and by the lack of appropriate tests for determining APAs with significant differences across multiple libraries.
RESULTS: We applied a tailored paired-end RNA-seq protocol to specifically probe the position of polyA sites in three human adult tissue types. We specified a linear-effects regression model to identify tissue-specific biases indicating regulated APA; the significance of differences between tissue types was assessed by an appropriately designed permutation test. This combination allowed us to identify highly specific subsets of APA events in the individual tissue types. Predictive models successfully distinguished constitutive polyA sites from a biologically relevant background (auROC = 99.6%), as well as tissue-specific regulated sets from each other. We found that the main cis-regulatory elements described for polyadenylation are a strong, and highly informative, hallmark for constitutive sites only. Tissue-specific regulated sites were found to contain other regulatory motifs, with the canonical polyadenylation signal being nearly absent at brain-specific polyA sites. Together, our results contribute to the understanding of the diversity of post-transcriptional gene regulation.
AVAILABILITY: Raw data are deposited on SRA, accession numbers: brain SRX208132, kidney SRX208087 and liver SRX208134. Processed datasets as well as model code are published on our website: http://www.genome.duke.edu/labs/ohler/research/UTR/
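The abstract above mentions a permutation test for differences between tissue types but does not describe its design. As a purely illustrative sketch of the general idea, and not the authors' test, the following Python function estimates a two-sided p-value for a difference in group means by shuffling group labels. The function name, the choice of test statistic, and the example values are all assumptions made for illustration.

```python
import random
from statistics import fmean


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Generic illustration only; the statistic and resampling scheme are
    assumptions and are not taken from the abstract.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical usage: relative usage of one polyA site in two tissues.
tissue_1 = [0.10, 0.20, 0.15, 0.12, 0.18]
tissue_2 = [0.90, 0.95, 0.85, 0.92, 0.88]
p = permutation_p_value(tissue_1, tissue_2)
```

With clearly separated groups such as the hypothetical ones above, few label shuffles reproduce a difference as large as the observed one, so the estimated p-value is small.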
Towards Learning Self-Organized Criticality of Rydberg Atoms using Graph Neural Networks
Self-Organized Criticality (SOC) is a ubiquitous dynamical phenomenon
believed to be responsible for the emergence of universal scale-invariant
behavior in many, seemingly unrelated systems, such as forest fires, virus
spreading or atomic excitation dynamics. SOC describes the buildup of
large-scale and long-range spatio-temporal correlations as a result of only
local interactions and dissipation. The simulation of SOC dynamics is typically
based on Monte-Carlo (MC) methods, which are however numerically expensive and
do not scale beyond certain system sizes. We investigate the use of Graph
Neural Networks (GNNs) as an effective surrogate model to learn the dynamics
operator for a paradigmatic SOC system, inspired by an experimentally
accessible physics example: driven Rydberg atoms. To this end, we generalize
existing GNN simulation approaches to predict dynamics for the internal state
of the node. We show that we can accurately reproduce the MC dynamics as well
as generalize along the two important axes of particle number and particle
density. This paves the way to model much larger systems beyond the limits of
traditional MC methods. While the exact system is inspired by the dynamics of
Rydberg atoms, the approach is quite general and can readily be applied to
other systems.
Orthologous Transcription Factors in Bacteria Have Different Functions and Regulate Different Genes
Transcription factors (TFs) form large paralogous gene families and have complex evolutionary histories. Here, we ask whether putative orthologs of TFs, from bidirectional best BLAST hits (BBHs), are evolutionary orthologs with conserved functions. We show that BBHs of TFs from distantly related bacteria are usually not evolutionary orthologs. Furthermore, the false orthologs usually respond to different signals and regulate distinct pathways, while the few BBHs that are evolutionary orthologs do have conserved functions. To test the conservation of regulatory interactions, we analyze expression patterns. We find that regulatory relationships between TFs and their regulated genes are usually not conserved for BBHs in Escherichia coli K12 and Bacillus subtilis. Even in the much more closely related bacteria Vibrio cholerae and Shewanella oneidensis MR-1, predicting regulation from E. coli BBHs has high error rates. Using gene–regulon correlations, we identify genes whose expression pattern differs between E. coli and S. oneidensis. Using literature searches and sequence analysis, we show that these changes in expression patterns reflect changes in gene regulation, even for evolutionary orthologs. We conclude that the evolution of bacterial regulation should be analyzed with phylogenetic trees, rather than BBHs, and that bacterial regulatory networks evolve more rapidly than previously thought.
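For readers unfamiliar with bidirectional best hits, the criterion can be sketched in a few lines of Python: a pair of genes is a BBH when each is the other's top-scoring hit in the opposite genome. This is a minimal sketch that assumes similarity scores (for example BLAST bit scores) have already been computed and stored in dictionaries. The function names, the input layout, and the gene identifiers are hypothetical and do not come from the paper.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    `scores` maps (query, subject) pairs to a similarity score; this
    input layout is an assumption made for illustration.
    """
    best = {}
    for (query, subject), score in scores.items():
        # Strict comparison: on ties, the first hit seen is kept.
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(scores_ab, scores_ba):
    """Return sorted (a, b) pairs where a and b are each other's best hit."""
    best_ab = best_hits(scores_ab)  # genome A queried against genome B
    best_ba = best_hits(scores_ba)  # genome B queried against genome A
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores between two small gene sets.
scores_ab = {("a1", "b1"): 200, ("a1", "b2"): 50,
             ("a2", "b1"): 180, ("a2", "b2"): 40}
scores_ba = {("b1", "a1"): 200, ("b1", "a2"): 180,
             ("b2", "a1"): 50, ("b2", "a2"): 40}
pairs = bidirectional_best_hits(scores_ab, scores_ba)
```

In this toy input only `a1` and `b1` select each other, so `a2` is left without a BBH even though its best hit is also `b1`. As the abstract argues, passing this reciprocal test does not by itself establish evolutionary orthology.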
Cluster-independent marker feature identification from single-cell omics data using SEMITONES
Identification of cell identity markers is an essential step in single-cell omics data analysis. Current marker identification strategies typically rely on cluster assignments of cells. However, cluster assignment, particularly for developmental data, is nontrivial, potentially arbitrary, and commonly relies on prior knowledge. In response, we present SEMITONES, a principled method for cluster-free marker identification. We showcase and evaluate its application for marker gene and regulatory region identification from single-cell data of the human haematopoietic system. Additionally, we illustrate its application to spatial transcriptomics data and show how SEMITONES can be used for the annotation of cells given known marker genes. Using several simulated and curated data sets, we demonstrate that SEMITONES qualitatively and quantitatively outperforms existing methods for the retrieval of cell identity markers from single-cell omics data.
Global identification of functional microRNA-mRNA interactions in Drosophila
MicroRNAs (miRNAs) are key mediators of post-transcriptional gene expression silencing. So far, no comprehensive experimental annotation of functional miRNA target sites exists in Drosophila. Here, we generated a transcriptome-wide in vivo map of miRNA-mRNA interactions in Drosophila melanogaster, making use of single-nucleotide resolution in Argonaute1 (AGO1) crosslinking and immunoprecipitation (CLIP) data. Absolute quantification of cellular miRNA levels shows the miRNA pool in Drosophila cell lines to be more diverse than previously reported. Benchmarking two CLIP approaches, we identify a similar predictive potential to unambiguously assign thousands of miRNA-mRNA pairs from AGO1 interaction data at unprecedented depth, achieving higher signal-to-noise ratios than with computational methods alone. Quantitative RNA-seq and sub-codon resolution ribosomal footprinting data upon AGO1 depletion enabled the determination of miRNA-mediated effects on target expression and translation. We thus provide the first comprehensive resource of miRNA target sites and their quantitative functional impact in Drosophila.
The mRNA-bound proteome of the early fly embryo
Early embryogenesis is characterized by the maternal-to-zygotic transition (MZT), in which maternally deposited messenger RNAs are degraded while zygotic transcription begins. Before the MZT, post-transcriptional gene regulation by RNA-binding proteins (RBPs) is the dominant force in embryo patterning. We used two mRNA interactome capture methods to identify RBPs bound to polyadenylated transcripts within the first two hours of D. melanogaster embryogenesis. We identified a high-confidence set of 476 putative RBPs and confirmed RNA-binding activities for most of 24 tested candidates. Most proteins in the interactome are known RBPs or harbor canonical RBP features, but 99 exhibited previously uncharacterized RNA-binding activity. mRNA-bound RBPs and TFs exhibit distinct expression dynamics, in which the newly identified RBPs dominate the first two hours of embryonic development. Integrating our resource with in situ hybridization data from existing databases showed that mRNAs encoding RBPs are enriched in posterior regions of the early embryo, suggesting their general importance in posterior patterning and germ cell maturation.
Electric fields and valence band offsets at strained [111] heterojunctions
[111] ordered common atom strained layer superlattices (in particular the
common anion GaSb/InSb system and the common cation InAs/InSb system) are
investigated using the ab initio full potential linearized augmented plane wave
(FLAPW) method. We have focused our attention on the potential line-up at the
two sides of the homopolar isovalent heterojunctions considered, and in
particular on its dependence on the strain conditions and on the strain induced
electric fields. We propose a procedure to locate the interface plane where the
band alignment could be evaluated; furthermore, we suggest that the
polarization charges, due to piezoelectric effects, are approximately confined
to a narrow region close to the interface and do not affect the potential
discontinuity. We find that the interface contribution to the valence band
offset is substantially unaffected by strain conditions, whereas the total band
line-up is highly tunable, as a function of the strain conditions. Finally, we
compare our results with those obtained for [001] heterojunctions.
Comment: 18 pages, LaTeX file, to appear in Phys. Rev.