234 research outputs found

    Engineering Conditional Guide RNAs for Cell-Selective Regulation of CRISPR/Cas9

    CRISPR/Cas9 is a versatile platform for implementing diverse modes of genetic perturbation such as gene silencing, induction, deletion, or replacement. This technology is widely used in developmental biology to probe genetic circuitry via constitutive gene knockdown. Global gene silencing could introduce artifacts in the study of developmental regulatory pathways, which motivates the development of cell-selective gene editing. Our lab has recently created conditional guide RNAs (cgRNAs) that enable CRISPR/Cas9 systems to silence a desired gene Y conditioned on the detection of an RNA transcript X inside a cell. cgRNA systems were discovered via insertion and deletion mutations that systematically explored the structure-function relationship of the guide RNA. Nucleic acid engineering software (NUPACK) was used to generate orthogonal libraries of cgRNA molecules that executed both ON → OFF logic (conditional inactivation by an RNA trigger) and OFF → ON logic (conditional activation by an RNA trigger). A dCas9-based RFP silencing assay in bacteria was developed and used to show that these cgRNA sequences were functional and could detect short exogenous trigger sequences in an orthogonal and dose-responsive manner. Subsequent studies on cgRNA structure and function enabled us to engineer next-generation systems that have fewer constraints on the trigger sequence or structure. These next-generation cgRNAs were tested against short synthetic mRNA transcripts, truncated sub-sequences of endogenous mRNAs, and full-length endogenous mRNAs. Synthetic mRNA transcripts were used to study the effect of protein translation on trigger RNA binding. cgRNAs were capable of detecting synthetic sequences embedded in the 3′ UTR of fluorescent protein mRNAs. cgRNAs could also detect short synthetic mRNAs or truncated sub-sequences from endogenous mRNAs. 
However, the detection of native full-length endogenous mRNAs remained challenging because we cannot reliably predict the local structure of sub-sequences within a long RNA transcript. High-throughput cgRNA screening may prove necessary for finding accessible binding sites on mRNA transcripts. Nevertheless, cgRNA functionalities could be useful in developmental biology by enabling precision perturbation of regulatory events, linking guide RNA activity to an RNA marker X correlated to a specific cell type or temporal expression pattern. This work opens the possibility for future applications such as cell-selective gene therapies.

    SparseRNAFolD: Sparse RNA Pseudoknot-Free Folding Including Dangles


    Specificity of the innate immune responses to different classes of non-tuberculous mycobacteria

    Mycobacterium avium is the most common nontuberculous mycobacterium (NTM) species causing infectious disease. Here, we characterized an M. avium infection model in zebrafish larvae and compared it to M. marinum infection, a model of tuberculosis. M. avium bacteria are efficiently phagocytosed and frequently induce granuloma-like structures in zebrafish larvae. Although macrophages can respond to both mycobacterial infections, their migration speed is faster in infections caused by M. marinum. Tlr2 is conservatively involved in most aspects of the defense against both mycobacterial infections. However, Tlr2 has a function in the migration speed of macrophages and neutrophils to infection sites with M. marinum that is not observed with M. avium. Using RNAseq analysis, we found a distinct transcriptome response in cytokine-cytokine receptor interaction for M. avium and M. marinum infection. In addition, we found differences in gene expression in metabolic pathways, phagosome formation, matrix remodeling, and apoptosis in response to these mycobacterial infections. In conclusion, we characterized a new M. avium infection model in zebrafish that can be further used in studying pathological mechanisms for NTM-caused diseases.

    RFold: RNA Secondary Structure Prediction with Decoupled Optimization

    The secondary structure of ribonucleic acid (RNA) is more stable and accessible in the cell than its tertiary structure, making it essential for functional prediction. Although deep learning has shown promising results in this field, current methods suffer from poor generalization and high complexity. In this work, we present RFold, a simple yet effective end-to-end method for RNA secondary structure prediction. RFold introduces a decoupled optimization process that decomposes the vanilla constraint satisfaction problem into row-wise and column-wise optimization, simplifying the solving process while guaranteeing the validity of the output. Moreover, RFold adopts attention maps as informative representations instead of designing hand-crafted features. Extensive experiments demonstrate that RFold achieves competitive performance and about eight times faster inference than the state-of-the-art method. The code and Colab demo are available at http://github.com/A4Bio/RFold.
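
    As a rough illustration of what a row-wise and column-wise decomposition can look like, the sketch below keeps a candidate pair (i, j) only when it is the best-scoring entry in both row i and column j of a score matrix, so each base ends up in at most one pair. This is one possible reading of the abstract, not the RFold implementation; the function name `row_col_argmax`, the `threshold` parameter, and the toy scores are assumptions made for illustration.

```python
import numpy as np

def row_col_argmax(scores, threshold=0.0):
    """Keep (i, j) only if it is the top entry of both row i and column j.

    `scores` is a hypothetical (n, n) matrix of pairing scores; `threshold`
    is an illustrative cutoff below which no pair is reported.
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    idx = np.arange(n)

    # Row-wise step: mark the best column in every row.
    row_best = np.zeros((n, n), dtype=bool)
    row_best[idx, scores.argmax(axis=1)] = True

    # Column-wise step: mark the best row in every column.
    col_best = np.zeros((n, n), dtype=bool)
    col_best[scores.argmax(axis=0), idx] = True

    # A pair survives only if both steps agree and the score is high enough.
    return row_best & col_best & (scores > threshold)

# Toy symmetric score matrix for a 4-nucleotide sequence.
toy_scores = np.array([
    [0.0, 0.1, 0.2, 0.9],
    [0.1, 0.0, 0.8, 0.2],
    [0.2, 0.8, 0.0, 0.1],
    [0.9, 0.2, 0.1, 0.0],
])
pairs = row_col_argmax(toy_scores)
```

    Because every row and every column contributes at most one surviving entry, the output cannot assign two partners to the same base, which is the validity property the abstract refers to.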

    From RNA folding to inverse folding: a computational study: Folding and design of RNA molecules

    Since the discovery of the structure of DNA in 1953 and its double-chained complement of information hinting at its means of replication, biologists have recognized the strong connection between molecular structure and function. In the past two decades, there has been a surge of research on an ever-growing class of RNA molecules that are non-coding but whose various folded structures allow a diverse array of vital functions. From the well-known splicing and modification of ribosomal RNA, non-coding RNAs (ncRNAs) are now known to be intimately involved in possibly every stage of DNA transcription and protein translation, as well as RNA signalling and gene regulation processes. Despite the rapid development and declining cost of modern molecular methods, they typically can only describe the structural conformations of ncRNAs in vitro, which differ from their in vivo counterparts. Moreover, it is estimated that only a tiny fraction of known ncRNAs has been documented experimentally, often at a high cost. There is thus a growing realization that computational methods must play a central role in the analysis of ncRNAs. Not only do computational approaches hold the promise of rapidly characterizing many ncRNAs yet to be described, but there is also the hope that by understanding the rules that determine their structure, we will gain better insight into their function and design. Many studies have revealed that ncRNA functions are performed by high-level structures that often depend on their low-level structures, such as the secondary structure. This thesis studies the computational folding mechanism and inverse folding of ncRNAs at the secondary level. In this thesis, we describe the development of two bioinformatic tools that have the potential to improve our understanding of RNA secondary structure. 
These tools are as follows: (1) RAFFT, for efficient prediction of pseudoknot-free RNA folding pathways using the fast Fourier transform (FFT); (2) aRNAque, an evolutionary algorithm inspired by Lévy flights for RNA inverse folding with or without pseudoknots (a secondary structure motif that often poses difficulties for bio-computational detection). The first tool, RAFFT, implements a novel heuristic to predict RNA secondary structure formation pathways that has two components: (i) a folding algorithm and (ii) a kinetic ansatz. When considering the best prediction in the ensemble of 50 secondary structures predicted by RAFFT, its performance matches that of recent deep-learning-based structure prediction methods. RAFFT also acts as a folding kinetic ansatz, which we tested on two RNAs: the CFSE and a classic bi-stable sequence. In both test cases, fewer structures were required to reproduce the full kinetics, whereas known methods (such as Treekin) required a sample of 20,000 structures or more. The second tool, aRNAque, implements an evolutionary algorithm (EA) inspired by the Lévy flight, which allows both local and global search and supports pseudoknotted target structures. The number of point mutations at every step of aRNAque's EA is drawn from a Zipf distribution. Therefore, our proposed method increases the diversity of designed RNA sequences and reduces the average number of evaluations of the evolutionary algorithm. The overall performance showed improved empirical results compared to existing tools through intensive benchmarks on both pseudoknotted and pseudoknot-free datasets. In conclusion, we highlight some promising extensions of the versatile RAFFT method to RNA-RNA interaction studies. We also provide an outlook on both tools' implications in studying evolutionary dynamics.
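
    The Zipf-distributed mutation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the aRNAque code: the function name `levy_mutate`, the exponent value 1.5, and the uniform choice of replacement bases are all hypothetical, and the sketch ignores any base-pair-aware mutation rules the actual tool may use.

```python
import numpy as np

def levy_mutate(sequence, rng, exponent=1.5, alphabet="ACGU"):
    """Mutate a sequence at a Zipf-distributed number of positions.

    Most draws give one or two mutations (local search), while the heavy
    tail occasionally yields many mutations at once (global jumps).
    """
    # Zipf samples are integers >= 1; cap at the sequence length.
    n_mut = min(int(rng.zipf(exponent)), len(sequence))
    positions = rng.choice(len(sequence), size=n_mut, replace=False)

    bases = list(sequence)
    for pos in positions:
        # Always switch to a different base so every chosen site changes.
        options = [b for b in alphabet if b != bases[pos]]
        bases[pos] = options[rng.integers(len(options))]
    return "".join(bases)

rng = np.random.default_rng(0)
parent = "GGGAAACCCUUU"
child = levy_mutate(parent, rng)
n_changed = sum(a != b for a, b in zip(parent, child))
```

    A smaller exponent makes the tail heavier and large jumps more frequent; how the exponent is chosen in practice is not stated in the abstract.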

    Towards Parsimonious Generative Modeling of RNA Families

    Generative probabilistic models emerge as a new paradigm in data-driven, evolution-informed design of biomolecular sequences. This paper introduces a novel approach, called Edge Activation Direct Coupling Analysis (eaDCA), tailored to the characteristics of RNA sequences, with a strong emphasis on simplicity, efficiency, and interpretability. eaDCA explicitly constructs sparse coevolutionary models for RNA families, achieving performance levels comparable to more complex methods while utilizing a significantly lower number of parameters. Our approach demonstrates efficiency in generating artificial RNA sequences that closely resemble their natural counterparts in both statistical analyses and SHAPE-MaP experiments, and in predicting the effect of mutations. Notably, eaDCA provides a unique feature: estimating the number of potential functional sequences within a given RNA family. For example, in the case of cyclic di-AMP riboswitches (RF00379), our analysis suggests the existence of approximately $10^{39}$ functional nucleotide sequences. While huge compared to the fewer than 4,000 known natural sequences, this number represents only a tiny fraction of the vast pool of nearly $10^{82}$ possible nucleotide sequences of the same length (136 nucleotides). These results underscore the promise of sparse and interpretable generative models, such as eaDCA, in enhancing our understanding of the expansive RNA sequence space. Comment: 33 pages (including SI
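
    The sequence-space figures quoted above can be checked with a short calculation. The sketch assumes a four-letter alphabet (A, C, G, U) and works in log10 to keep the numbers readable; the variable names are illustrative only.

```python
import math

LENGTH = 136          # riboswitch length in nucleotides, from the abstract
ALPHABET_SIZE = 4     # assumption: A, C, G, U

# Total number of sequences of this length is 4**136; take log10 of it.
log10_total = LENGTH * math.log10(ALPHABET_SIZE)

# Estimated number of functional sequences reported in the abstract: 10**39.
log10_functional = 39

# Fraction of sequence space estimated to be functional, in log10.
log10_fraction = log10_functional - log10_total

print(f"log10(total sequences)     = {log10_total:.1f}")     # 81.9
print(f"log10(functional fraction) = {log10_fraction:.1f}")  # -42.9
```

    So the estimated functional set covers roughly one part in $10^{43}$ of all sequences of that length, which is the sense in which it is a tiny fraction.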

    Clustering and analysis of g quadruplex sequences.

    G quadruplex structures are secondary structures located throughout the genomes of various organisms, with involvement in regulatory functions including transcription, translation, genome stability, and epigenetic regulation, as well as cell division. Even though G4 structures are widely acknowledged in vivo, there are currently no search tools for G quadruplexes that build on already identified G quadruplexes and on families identified across different genomes based on sequence diversity. Constructing families of G4 sequences and identifying their polymorphisms within diseases and disorders will lead to a better understanding of their functional roles and will further research into the biophysical modeling of interactions with oligonucleotide treatments of disease. The first project aims to develop a framework for clustering G quadruplex (G4) sequences into families based on sequence, structure, and thermodynamic properties. No current search tools exist to filter G4s based on their properties, and the diversity of G4 sequences across the genome is not fully understood. To address this gap, we utilized a combination of clustering and annotation methods to identify 95 families of G4 sequences within the human genome. Profiles for each family were created using hidden Markov models, and their thermodynamic properties, functional annotations, and transcription factor binding motifs were analyzed. The second project aims to investigate the effect of single nucleotide variations (SNVs) on G4 structures in disease contexts. Although the role of G4s in cancer and metabolic disorders is well established, the effect of SNVs on G4s has not been extensively studied. Using the COSMIC and CLINVAR databases, we identified over 37,000 G4 SNVs and analyzed their effects on G4 secondary structures. We found that a significant proportion of SNVs result in G4 loss or gain, and we identified genes enriched for destabilizing SNVs in G4-forming regions. 
We also analyzed mutational patterns in the G4 structure and found a higher selective pressure on the coding region of the template strand. Our findings provide insights into the effects of SNVs on G4 structures and highlight potential targets for therapeutic intervention in diseases associated with G4 dysregulation.