112 research outputs found
Stochastic kinetics of viral capsid assembly based on detailed protein structures
We present a generic computational framework for the simulation of viral
capsid assembly which is quantitative and specific. Starting from PDB files
containing atomic coordinates, the algorithm builds a coarse-grained
description of protein oligomers based on graph rigidity. These reduced protein
descriptions are used in an extended Gillespie algorithm to investigate the
stochastic kinetics of the assembly process. The association rates are obtained
from a diffusive Smoluchowski equation for rapid coagulation, modified to
account for water shielding and protein structure. The dissociation rates are
derived by interpreting the splitting of oligomers as a process of graph
partitioning akin to the escape from a multidimensional well. This modular
framework is quantitative yet computationally tractable, with a small number of
physically motivated parameters. The methodology is illustrated using two
different viruses which are shown to follow quantitatively different assembly
pathways. We also show how in this model the quasi-stationary kinetics of
assembly can be described as a Markovian cascading process in which only a few
intermediates and a small proportion of pathways are present. The observed
pathways and intermediates can be related a posteriori to structural and
energetic properties of the capsid oligomers
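
As a rough illustration of the kind of stochastic simulation the abstract above refers to, the following is a minimal sketch of the standard Gillespie direct method applied to a toy reversible dimerization (2 A <-> A2). It is not the extended algorithm of the paper: the function name, the rate constants k_on and k_off, and the two-species model are all hypothetical choices made for this example, and the rates are not derived from any Smoluchowski or graph-partitioning calculation.

```python
import math
import random


def gillespie_dimerization(n_monomers, k_on, k_off, t_end, seed=0):
    """Standard Gillespie direct method for the toy system 2 A <-> A2.

    All parameters are illustrative assumptions, not values from the paper.
    Returns a list of (time, monomer_count, dimer_count) tuples.
    """
    rng = random.Random(seed)
    t = 0.0
    monomers, dimers = n_monomers, 0
    trajectory = [(t, monomers, dimers)]
    while True:
        # Propensities: association counts unordered monomer pairs,
        # dissociation is first order in the number of dimers.
        a_on = k_on * monomers * (monomers - 1) / 2.0
        a_off = k_off * dimers
        a_total = a_on + a_off
        if a_total == 0.0:
            break  # no reaction can fire
        # Waiting time to the next event is exponential with rate a_total.
        t += -math.log(1.0 - rng.random()) / a_total
        if t > t_end:
            break
        # Choose which reaction fires, proportionally to its propensity.
        if rng.random() * a_total < a_on:
            monomers -= 2
            dimers += 1
        else:
            monomers += 2
            dimers -= 1
        trajectory.append((t, monomers, dimers))
    return trajectory


if __name__ == "__main__":
    traj = gillespie_dimerization(n_monomers=20, k_on=0.1, k_off=1.0, t_end=5.0)
    print(len(traj), "states recorded; final state:", traj[-1])
```

Each step draws one waiting time and one reaction choice, so the total number of subunits (monomers plus twice the dimers) is conserved along the trajectory.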
Genomic positional conservation identifies topological anchor point (tap)RNAs linked to developmental loci
The mammalian genome is transcribed into large numbers of long noncoding RNAs (lncRNAs), but the definition of functional lncRNA groups has proven difficult, partly due to their low sequence conservation and lack of identified shared properties. Here we consider positional conservation across mammalian genomes as an indicator of functional commonality. We identify 665 conserved lncRNA promoters in mouse and human genomes that are preserved in genomic position relative to orthologous coding genes. The identified positionally conserved lncRNA genes are primarily associated with developmental transcription factor loci with which they are co-expressed in a tissue-specific manner. Strikingly, over half of all positionally conserved RNAs in this set are linked to distinct chromatin organization structures, overlapping the binding sites for the CTCF chromatin organizer and located at chromatin loop anchor points and borders of topologically associating domains (TADs). These topological anchor point (tap)RNAs possess conserved sequence domains that are enriched in potential recognition motifs for Zinc Finger proteins. Characterization of these non-coding RNAs and their associated coding genes shows that they are functionally connected: they regulate each other’s expression and influence the metastatic phenotype of cancer cells in vitro in a similar fashion. Thus, interrogation of positionally conserved lncRNAs identifies a new subset of tapRNAs with shared functional properties. These results provide a large dataset of lncRNAs that conform to the “extended gene” model, in which conserved developmental genes are genomically and functionally linked to regulatory lncRNA loci across mammalian evolution
Genomic positional conservation identifies topological anchor point RNAs linked to developmental loci
BACKGROUND: The mammalian genome is transcribed into large numbers of long noncoding RNAs (lncRNAs), but the definition of functional lncRNA groups has proven difficult, partly due to their low sequence conservation and lack of identified shared properties. Here we consider promoter conservation and positional conservation as indicators of functional commonality. RESULTS: We identify 665 conserved lncRNA promoters in mouse and human that are preserved in genomic position relative to orthologous coding genes. These positionally conserved lncRNA genes are primarily associated with developmental transcription factor loci with which they are coexpressed in a tissue-specific manner. Over half of positionally conserved RNAs in this set are linked to chromatin organization structures, overlapping binding sites for the CTCF chromatin organiser and located at chromatin loop anchor points and borders of topologically associating domains (TADs). We define these RNAs as topological anchor point RNAs (tapRNAs). Characterization of these noncoding RNAs and their associated coding genes shows that they are functionally connected: they regulate each other’s expression and influence the metastatic phenotype of cancer cells in vitro in a similar fashion. Furthermore, we find that tapRNAs contain conserved sequence domains that are enriched in motifs for zinc finger domain-containing RNA-binding proteins and transcription factors, whose binding sites are found mutated in cancers. CONCLUSIONS: This work leverages positional conservation to identify lncRNAs with potential importance in genome organization, development and disease. The evidence that many developmental transcription factors are physically and functionally connected to lncRNAs represents an exciting stepping-stone to further our understanding of genome regulation
Genome-Wide Identification of Calcium-Response Factor (CaRF) Binding Sites Predicts a Role in Regulation of Neuronal Signaling Pathways
Calcium-Response Factor (CaRF) was first identified as a transcription factor based on its affinity for a neuronal-selective calcium-response element (CaRE1) in the gene encoding Brain-Derived Neurotrophic Factor (BDNF). However, because CaRF shares no homology with other transcription factors, its properties and gene targets have remained unknown. Here we show that the DNA binding domain of CaRF has been highly conserved across evolution and that CaRF binds DNA directly in a sequence-specific manner in the absence of other eukaryotic cofactors. Using a binding site selection screen we identify a high-affinity consensus CaRF response element (cCaRE) that shares significant homology with the CaRE1 element of Bdnf. In a genome-wide chromatin immunoprecipitation analysis (ChIP-Seq), we identified 176 sites of CaRF-specific binding (peaks) in neuronal genomic DNA. Of these peaks, 128 are within 10 kb of an annotated gene, and 60 are within 1 kb of an annotated transcriptional start site. At least 138 of the CaRF peaks contain a common 10-bp motif with strong statistical similarity to the cCaRE, and we provide evidence predicting that CaRF can bind independently to at least 64.5% of these motifs in vitro. Analysis of this set of putative CaRF targets suggests the enrichment of genes that regulate intracellular signaling cascades. Finally we demonstrate that expression of a subset of these target genes is altered in the cortex of Carf knockout (KO) mice. Together these data strongly support the characterization of CaRF as a unique transcription factor and provide the first insight into the program of CaRF-regulated transcription in neurons
Computation of Steady-State Probability Distributions in Stochastic Models of Cellular Networks
Cellular processes are “noisy”. In each cell, concentrations of molecules are subject to random fluctuations due to the small numbers of these molecules and to environmental perturbations. While noise varies with time, it is often measured at steady state, for example by flow cytometry. When interrogating aspects of a cellular network by such steady-state measurements of network components, a key need is to develop efficient methods to simulate and compute these distributions. We describe innovations in stochastic modeling coupled with approaches to this computational challenge: first, an approach to modeling intrinsic noise via solution of the chemical master equation, and second, a convolution technique to account for contributions of extrinsic noise. We show how these techniques can be combined in a streamlined procedure for evaluation of different sources of variability in a biochemical network. Evaluation and illustrations are given in analysis of two well-characterized synthetic gene circuits, as well as a signaling network underlying the mammalian cell cycle entry
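
To make the two ingredients named in the abstract above more concrete, here is a minimal sketch, under simplifying assumptions, of (i) a steady-state solution of the chemical master equation for a one-species birth-death process and (ii) a discrete convolution with a separate noise distribution. This is not the authors' procedure: the birth-death model, the function names, the truncation at n_max, and the treatment of extrinsic noise as an independent additive term are all assumptions made for illustration.

```python
def birth_death_steady_state(k_prod, k_deg, n_max):
    """Steady state of the master equation for 0 -> X (rate k_prod),
    X -> 0 (rate k_deg per molecule), truncated to 0..n_max copies.

    Uses detailed balance: k_prod * p[n] = k_deg * (n + 1) * p[n + 1].
    """
    p = [1.0]
    for n in range(n_max):
        p.append(p[-1] * k_prod / (k_deg * (n + 1)))
    total = sum(p)
    return [x / total for x in p]  # normalise to a probability vector


def convolve(p, q):
    """Distribution of the sum of two independent non-negative integer
    variables with probability vectors p and q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out


if __name__ == "__main__":
    intrinsic = birth_death_steady_state(k_prod=10.0, k_deg=1.0, n_max=60)
    # Hypothetical extrinsic contribution: 0, 1 or 2 extra molecules.
    extrinsic = [0.25, 0.5, 0.25]
    combined = convolve(intrinsic, extrinsic)
    mean = sum(n * pr for n, pr in enumerate(combined))
    print("mean copy number with extrinsic term:", round(mean, 3))
```

For this simple model the intrinsic distribution is Poisson with mean k_prod / k_deg, which gives a convenient check on the truncation level n_max.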
The Human Cell Atlas
The recent advent of methods for high-throughput single-cell molecular profiling has catalyzed a growing sense in the scientific community that the time is ripe to complete the 150-year-old effort to identify all cell types in the human body. The Human Cell Atlas Project is an international collaborative effort that aims to define all human cell types in terms of distinctive molecular profiles (such as gene expression profiles) and to connect this information with classical cellular descriptions (such as location and morphology). An open comprehensive reference map of the molecular state of cells in healthy human tissues would propel the systematic study of physiological states, developmental trajectories, regulatory circuitry and interactions of cells, and also provide a framework for understanding cellular dysregulation in human disease. Here we describe the idea, its potential utility, early proofs-of-concept, and some design considerations for the Human Cell Atlas, including a commitment to open data, code, and community
SC3: consensus clustering of single-cell RNA-seq data
Single-cell RNA-seq enables the quantitative characterization of cell types based on global transcriptome profiles. We present single-cell consensus clustering (SC3), a user-friendly tool for unsupervised clustering, which achieves high accuracy and robustness by combining multiple clustering solutions through a consensus approach (http://bioconductor.org/packages/SC3). We demonstrate that SC3 is capable of identifying subclones from the transcriptomes of neoplastic cells collected from patients.
V.Y.K., T.A., A.Y. and M.H. are supported by Wellcome Trust Grants. K.N.N. is supported by the Wellcome Trust Strategic Award 'Single cell genomics of mouse gastrulation'. M.T.S. acknowledges support from FRS-FNRS; the Belgian Network DYSCO (Dynamical Systems, Control and Optimisation), funded by the Interuniversity Attraction Poles Programme initiated by the Belgian State Science Policy Office; and the ARC (Action de Recherche Concertée) on Mining and Optimization of Big Data Models, funded by the Wallonia-Brussels Federation. M.B. acknowledges support from EPSRC (grant EP/N014529/1). T.C. was funded through a core funded fellowship by the Sanger Institute and a Chancellor's fellowship from the University of Edinburgh. K.K. and A.R.G. are supported by Bloodwise (grant ref. 13003), the Wellcome Trust (grant ref. 104710/Z/14/Z), the Medical Research Council, the Kay Kendall Leukaemia Fund, the Cambridge NIHR Biomedical Research Center, the Cambridge Experimental Cancer Medicine Centre, the Leukemia and Lymphoma Society of America (grant ref. 07037) and a core support grant from the Wellcome Trust and MRC to the Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute. W.R. was supported by BBSRC (grant ref. BB/K010867/1), the Wellcome Trust (grant ref. 095645/Z/11/Z), EU BLUEPRINT and EpiGeneSys
Applications of Genetic Programming to Finance and Economics: Past, Present, Future
While the origins of Genetic Programming (GP) stretch back over fifty years, the field of GP was invigorated by John Koza’s popularisation of the methodology in the 1990s. A particular feature of the GP literature since then has been a strong interest in the application of GP to real-world problem domains. One application domain which has attracted significant attention is that of finance and economics, with several hundred papers from this subfield being listed in the Genetic Programming Bibliography. In this article we outline why finance and economics has been a popular application area for GP and briefly indicate the wide span of this work. However, despite this research effort there is relatively scant evidence of the usage of GP by the mainstream finance community in academia or industry. We speculate why this may be the case, describe what is needed to make this research more relevant from a finance perspective, and suggest some future directions for the application of GP in finance and economics