232 research outputs found
Automated Problem Decomposition for the Boolean Domain with Genetic Programming
Researchers have been interested in exploring the regularities and modularity of the problem space in genetic programming (GP) with the aim of decomposing the original problem into several smaller subproblems. The main motivation is to allow GP to deal with more complex problems. Most previous works on modularity in GP emphasise the structure of modules used to encapsulate code and/or promote code reuse, rather than the decomposition of the original problem. In this paper we propose a problem decomposition strategy that allows the use of a GP search to find solutions for subproblems and combine the individual solutions into the complete solution to the problem
Integrated Genome Analysis Suggests that Most Conserved Non-Coding Sequences are Regulatory Factor Binding Sites
More than 98% of a typical vertebrate genome does not code for proteins. Although non-coding regions are sprinkled with short (<200 bp) islands of evolutionarily conserved sequences, the function of most of these unannotated conserved islands remains unknown. One possibility is that unannotated conserved islands could encode non-coding RNAs (ncRNAs); alternatively, unannotated conserved islands could serve as promoter-distal regulatory factor binding sites (RFBSs) like enhancers. Here we assess these possibilities by comparing unannotated conserved islands in the human and mouse genomes to transcribed regions and to RFBSs, relying on a detailed case study of one human and one mouse cell type. We define transcribed regions by applying a novel transcript-calling algorithm to RNA-Seq data obtained from total cellular RNA, and we define RFBSs using ChIP-Seq and DNase-hypersensitivity assays. We find that unannotated conserved islands are four times more likely to coincide with RFBSs than with unannotated ncRNAs. Thousands of conserved RFBSs can be categorized as insulators based on the presence of CTCF or as enhancers based on the presence of p300/CBP and H3K4me1. While many unannotated conserved RFBSs are transcriptionally active to some extent, the transcripts produced tend to be unspliced, non-polyadenylated and expressed at levels 10 to 100-fold lower than annotated coding or ncRNAs. Extending these findings across multiple cell types and tissues, we propose that most conserved non-coding genomic DNA in vertebrate genomes corresponds to promoter-distal regulatory elements
Souporcell: robust clustering of single-cell RNA-seq data by genotype without reference genotypes.
Methods to deconvolve single-cell RNA-sequencing (scRNA-seq) data are necessary for samples containing a mixture of genotypes, whether they are natural or experimentally combined. Multiplexing across donors is a popular experimental design that can avoid batch effects, reduce costs and improve doublet detection. By using variants detected in scRNA-seq reads, it is possible to assign cells to their donor of origin and identify cross-genotype doublets that may have highly similar transcriptional profiles, precluding detection by transcriptional profile. More subtle cross-genotype variant contamination can be used to estimate the amount of ambient RNA. Ambient RNA is caused by cell lysis before droplet partitioning and is an important confounder of scRNA-seq analysis. Here we develop souporcell, a method to cluster cells using the genetic variants detected within the scRNA-seq reads. We show that it achieves high accuracy on genotype clustering, doublet detection and ambient RNA estimation, as demonstrated across a range of challenging scenarios. We acknowledge the Wellcome Sanger Institute’s DNA Pipelines for construction of the 10x sequencing libraries. We thank Allan Muhwezi and Andrew Russell for assistance with parasite culture and 10x Single-cell 3’ RNA-seq respectively. In addition, we would like to thank Matthew Young for useful conversations about ambient RNA, Mirjana Efremova for providing information about the maternal/fetal data, and Katie Gray for assistance in interpreting the previously unannotated cluster. The Wellcome Sanger Institute is funded by the Wellcome Trust (grant 206194/Z/17/Z), which supports MKNL and MH. This work was supported by an MRC Career Development Award (G1100339) to MKNL. We would like to acknowledge the Wellcome Trust Sanger Institute as the source of the human induced pluripotent cell lines that were generated under the Human Induced Pluripotent Stem Cell Initiative funded by a grant from the Wellcome Trust and Medical Research Council, supported by the Wellcome Trust (WT098051) and the NIHR/Wellcome Trust Clinical Research Facility, and acknowledges Life Science Technologies Corporation as the provider of Cytotune (HipSci.org). The Cardiovascular Epidemiology Unit is supported by core funding from the UK Medical Research Council (MR/L003120/1), the British Heart Foundation (RG/13/13/30194; RG/18/13/33946) and the National Institute for Health Research [Cambridge Biomedical Research Centre at the Cambridge University Hospital’s NHS Foundation Trust]. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care
Genome-Wide Analysis of MEF2 Transcriptional Program Reveals Synaptic Target Genes and Neuronal Activity-Dependent Polyadenylation Site Selection
Although many transcription factors are known to control important aspects of neural development, the genome-wide programs that are directly regulated by these factors are not known. We have characterized the genetic program that is activated by MEF2, a key regulator of activity-dependent synapse development. These MEF2 target genes have diverse functions at synapses, revealing a broad role for MEF2 in synapse development. Several of the MEF2 targets are mutated in human neurological disorders including epilepsy and autism spectrum disorders, suggesting that these disorders may be caused by disruption of an activity-dependent gene program that controls synapse development. Our analyses also reveal that neuronal activity promotes alternative polyadenylation site usage at many of the MEF2 target genes, leading to the production of truncated mRNAs that may have different functions than their full-length counterparts. Taken together, these analyses suggest that the ubiquitously expressed transcription factor MEF2 regulates an intricate transcriptional program in neurons that controls synapse development
Computational Stem Cell Biology: Open Questions and Guiding Principles
Computational biology is enabling an explosive growth in our understanding of stem cells and our ability to use them for disease modeling, regenerative medicine, and drug discovery. We discuss four topics that exemplify applications of computation to stem cell biology: cell typing, lineage tracing, trajectory inference, and regulatory networks. We use these examples to articulate principles that have guided computational biology broadly and call for renewed attention to these principles as computation becomes increasingly important in stem cell biology. We also discuss important challenges for this field with the hope that it will inspire more to join this exciting area
Temporal Tracking of Microglia Activation in Neurodegeneration at Single-Cell Resolution
Microglia, the tissue-resident macrophages in the brain, are damage sensors that react to nearly any perturbation, including neurodegenerative diseases such as Alzheimer's disease (AD). Here, using single-cell RNA sequencing, we determined the transcriptome of more than 1,600 individual microglia cells isolated from the hippocampus of a mouse model of severe neurodegeneration with AD-like phenotypes and of control mice at multiple time points during progression of neurodegeneration. In this neurodegeneration model, we discovered two molecularly distinct reactive microglia phenotypes that are typified by modules of co-regulated type I and type II interferon response genes, respectively. Furthermore, our work identified previously unobserved heterogeneity in the response of microglia to neurodegeneration, discovered disease stage-specific microglia cell states, revealed the trajectory of cellular reprogramming of microglia in response to neurodegeneration, and uncovered the underlying transcriptional programs. Mathys et al. use single-cell RNA sequencing to determine the phenotypic heterogeneity of microglia during the progression of neurodegeneration. They identify multiple disease stage-specific cell states, including two molecularly distinct reactive microglia phenotypes that are typified by modules of co-regulated type I and type II interferon response genes, respectively. National Institutes of Health (U.S.) (Grant RF1 AG054321
A Dominated Coupling From The Past algorithm for the stochastic simulation of networks of biochemical reactions
Background: In recent years, stochastic descriptions of biochemical reactions based on the Master Equation (ME) have become widespread. These are especially relevant for models involving gene regulation. Gillespie’s Stochastic Simulation Algorithm (SSA) is the most widely used method for the numerical evaluation of these models. The SSA produces exact samples from the distribution of the ME for finite times. However, if the stationary distribution is of interest, the SSA provides no information about convergence or how long the algorithm needs to be run to sample from the stationary distribution with given accuracy. Results: We present a proof and numerical characterization of a Perfect Sampling algorithm for the ME of networks of biochemical reactions prevalent in gene regulation and enzymatic catalysis. Our algorithm combines the SSA with Dominated Coupling From The Past (DCFTP) techniques to provide guaranteed sampling from the stationary distribution. The resulting DCFTP-SSA is applicable to networks of reactions with uni-molecular stoichiometries and sub-linear, (anti-) monotone propensity functions. We showcase its applicability studying steady-state properties of stochastic regulatory networks of relevance in synthetic and systems biology. Conclusion: The DCFTP-SSA provides an extension to Gillespie’s SSA with guaranteed sampling from the stationary solution of the ME for a broad class of stochastic biochemical networks.
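As a point of reference for the abstract above, the following is a minimal sketch of the plain Gillespie SSA (direct method), not of the DCFTP-SSA the paper proposes. The birth-death network, the function name `gillespie_birth_death`, and the rate parameters `k_prod` and `k_deg` are illustrative assumptions and do not come from the paper. The sketch produces an exact finite-time trajectory but, as the abstract notes, gives no indication of when the stationary distribution has been reached.

```python
import random


def gillespie_birth_death(k_prod, k_deg, n0, t_end, seed=0):
    """Simulate a birth-death process with Gillespie's direct method.

    Hypothetical example network (not taken from the paper):
      0 -> X   with propensity k_prod
      X -> 0   with propensity k_deg * n
    Returns the jump times and the copy number after each jump.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_prod = k_prod            # propensity of production
        a_deg = k_deg * n          # propensity of degradation
        a_total = a_prod + a_deg
        if a_total == 0.0:
            break                  # no reaction can fire any more
        # Waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Choose which reaction fires, proportionally to its propensity.
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


if __name__ == "__main__":
    times, counts = gillespie_birth_death(k_prod=10.0, k_deg=1.0, n0=0, t_end=50.0)
    print(f"{len(times) - 1} reactions fired; final copy number {counts[-1]}")
```

For this particular network the stationary distribution is known to be Poisson with mean `k_prod / k_deg`, which makes it a convenient check for any sampler; the run length `t_end` is still an arbitrary choice here, which is the gap a perfect-sampling scheme is meant to close.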
Genomic positional conservation identifies topological anchor point (tap)RNAs linked to developmental loci
The mammalian genome is transcribed into large numbers of long noncoding RNAs (lncRNAs), but the definition of functional lncRNA groups has proven difficult, partly due to their low sequence conservation and lack of identified shared properties. Here we consider positional conservation across mammalian genomes as an indicator of functional commonality. We identify 665 conserved lncRNA promoters in mouse and human genomes that are preserved in genomic position relative to orthologous coding genes. The identified positionally conserved lncRNA genes are primarily associated with developmental transcription factor loci with which they are co-expressed in a tissue-specific manner. Strikingly, over half of all positionally conserved RNAs in this set are linked to distinct chromatin organization structures, overlapping the binding sites for the CTCF chromatin organizer and located at chromatin loop anchor points and borders of topologically associating domains (TADs). These topological anchor point (tap)RNAs possess conserved sequence domains that are enriched in potential recognition motifs for Zinc Finger proteins. Characterization of these non-coding RNAs and their associated coding genes shows that they are functionally connected: they regulate each other's expression and influence the metastatic phenotype of cancer cells in vitro in a similar fashion. Thus, interrogation of positionally conserved lncRNAs identifies a new subset of tapRNAs with shared functional properties. These results provide a large dataset of lncRNAs that conform to the "extended gene" model, in which conserved developmental genes are genomically and functionally linked to regulatory lncRNA loci across mammalian evolution
The Helicase Aquarius/EMB-4 Is Required to Overcome Intronic Barriers to Allow Nuclear RNAi Pathways to Heritably Silence Transcription
Small RNAs play a crucial role in genome defense against transposable elements and guide Argonaute proteins to nascent RNA transcripts to induce co-transcriptional gene silencing. However, the molecular basis of this process remains unknown. Here, we identify the conserved RNA helicase Aquarius/EMB-4 as a direct and essential link between small RNA pathways and the transcriptional machinery in Caenorhabditis elegans. Aquarius physically interacts with the germline Argonaute HRDE-1. Aquarius is required to initiate small-RNA-induced heritable gene silencing. HRDE-1 and Aquarius silence overlapping sets of genes and transposable elements. Surprisingly, removal of introns from a target gene abolishes the requirement for Aquarius, but not HRDE-1, for small RNA-dependent gene silencing. We conclude that Aquarius allows small RNA pathways to compete for access to nascent transcripts undergoing co-transcriptional splicing in order to detect and silence transposable elements. Thus, Aquarius and HRDE-1 act as gatekeepers coordinating gene expression and genome defense. A.C.B. was supported by an HFSP grant to E.A.M. (RPG0014/2015). This work was supported by Cancer Research UK (C13474/A18583, C6946/A14492), the Wellcome Trust (104640/Z/14/Z, 092096/Z/10/Z), and The European Research Council (ERC, grant 260688). The work of P.M. and X.Z. is supported by NIH grant R01GM113242 and NIH grant R01GM122080. R.M. was a Commonwealth Scholar, funded by the UK Government. J.M.C., A.N., and C.J.W. were supported by the CIHR (MOP-274660) and the Canada Research Chairs Program. A.I.L. was supported by a Wellcome Trust Programme Grant (108058/Z/15/Z) and M.L. was supported by 2013/RSE/SCOTGOV/ MARIECURIE
The Malaria Cell Atlas: single parasite transcriptomes across the complete Plasmodium life cycle
Malaria parasites adopt a remarkable variety of morphological life stages as they transition through multiple mammalian host and mosquito vector environments. We profiled the single-cell transcriptomes of thousands of individual parasites, deriving the first high-resolution transcriptional atlas of the entire life cycle. We then used our atlas to precisely define developmental stages of single cells from three different human malaria parasite species, including parasites isolated directly from infected individuals. The Malaria Cell Atlas provides both a comprehensive view of gene usage in a eukaryotic parasite and an open-access reference dataset for the study of malaria parasites