312 research outputs found

    In situ functional dissection of RNA cis-regulatory elements by multiplex CRISPR-Cas9 genome engineering.

    RNA regulatory elements (RREs) are an important yet relatively under-explored facet of gene regulation. Deciphering the prevalence and functional impact of this post-transcriptional control layer requires technologies for disrupting RREs without perturbing cellular homeostasis. Here we describe genome-engineering based evaluation of RNA regulatory element activity (GenERA), a clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 platform for in situ high-content functional analysis of RREs. We use GenERA to survey the entire regulatory landscape of a 3'UTR, and apply it in a multiplex fashion to analyse combinatorial interactions between sets of miRNA response elements (MREs), providing strong evidence for cooperative activity. We also employ this technology to probe the functionality of an entire MRE network under cellular homeostasis, and show that high-resolution analysis of the GenERA dataset can be used to extract functional features of MREs. This study provides a genome editing-based multiplex strategy for direct functional interrogation of RNA cis-regulatory elements in a native cellular environment

    Model-driven analysis of gene expression control

    During this PhD, I worked on three different aspects in the broad field of experimental and theoretical analysis of gene regulation. The first part, "Quantifying the strength of miRNA-target interactions", addresses the problem of predicting mRNA targets of miRNAs. I show that biochemical measurements of miRNA-mRNA interactions can be used to optimise the parameter inference of a pre-existing model of miRNA target prediction. This model, named MIRZA, predicts miRNA-mRNA binding using 25 energy parameters that describe the miRNA-mRNA hybrid structure, with 2 base pairing parameters for the AU and GC pairs, 3 configuration parameters for the symmetric and asymmetric loops, and 21 positional parameters for the 21 nucleotides of the miRNA sequence. MIRZA was built to infer these parameters from Argonaute protein CLIP data, which capture potential targets of miRNAs. Upon the publication of precise measurements of chemical kinetic constants of miRNA-mRNA binding interactions between an mRNA target and a set of systematically mutated miRNA sequences, we reasoned that such data could be used to improve the parameter inference of the MIRZA model. After showing that the predictions of the existing model on the set of measured miRNA-mRNA pairs correlate highly with the binding energy calculated from the measurements, I used simulations as a proof of principle of the inference procedure and to design the measurements that would be needed to infer the parameters of the MIRZA model. Staying in the field of miRNA, in "Single cell mRNA profiling reveals the hierarchical response of miRNA targets to miRNA induction", I developed an approach to infer miRNA targets based on scRNA-seq data from cells that express the miRNA at different levels.
A miRNA can target several hundred different mRNAs and is present in the cell in limited quantities, implying that the interaction of a target mRNA with a specific miRNA depends on its concentration and on the interactions of the miRNA with its other targets. In other words, since miRNA binding is exclusive, mRNA targets compete for the same miRNA pool. Therefore, the concentrations of the mRNAs coupled in this way depend not only on the miRNA concentration but also on the concentration of every competing mRNA that is targeted by the same miRNA. To study this, HEK 293 cell lines were constructed to inducibly express a miRNA (hsa-miR-199a) as well as the mRNA encoding a green fluorescent protein. Expressed from the same promoter as the miRNA, this mRNA allows the miRNA concentration to be monitored. The study aimed not only to determine the parameters of individual miRNA-mRNA interactions, but also to assess the degree to which mRNAs act in a competitive manner to influence each other's expression. scRNA-seq was chosen to provide the resolution needed to reach these goals. The effect of the miRNA on a bound target is to increase its decay rate, hence the expression levels of the targets depend on the miRNA concentration and their binding energy. To gain insight into the target binding energy, we constructed a model considering the mRNA transcription rate, the miRNA-mRNA binding/unbinding rates, the mRNA decay rates in the bound and unbound states, and the free/bound concentration of miRNA. We showed that the model can be factored in terms of the miRNA concentrations in individual cells and the miRNA-mRNA target interaction parameters, and we solved the model to obtain estimates of miRNA-mRNA interaction parameters, which we showed explain the mRNA levels in cells more accurately than the sequence-based computationally predicted interaction energies.
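As an illustration of the kind of kinetic model described above, the following is a minimal sketch of the steady state of a single target, not the thesis's actual model. It assumes a constant free miRNA concentration, a miRNA that is recycled when the bound complex decays, and no competition between targets; all function and parameter names are hypothetical.

```python
def steady_state_mrna(alpha, delta_free, delta_bound, k_on, k_off, mirna_free):
    """Return (free, bound, total) steady-state levels of one target mRNA.

    alpha       : transcription rate
    delta_free  : decay rate of unbound mRNA
    delta_bound : decay rate of miRNA-bound mRNA
    k_on, k_off : miRNA-mRNA binding and unbinding rates
    mirna_free  : free miRNA concentration (assumed constant here)
    """
    # Rate at which free mRNA is lost through the bound state: binding flux
    # times the probability that the complex decays before it unbinds.
    loss_via_binding = k_on * mirna_free * delta_bound / (k_off + delta_bound)
    free = alpha / (delta_free + loss_via_binding)
    # Bound mRNA follows from balancing binding against unbinding plus decay.
    bound = k_on * mirna_free * free / (k_off + delta_bound)
    return free, bound, free + bound


if __name__ == "__main__":
    for mirna in (0.0, 0.5, 2.0):
        print(mirna, steady_state_mrna(10.0, 1.0, 5.0, 1.0, 1.0, mirna))
```

When the bound state decays faster than the unbound state, the total mRNA level in this sketch falls as the free miRNA concentration rises, which is the qualitative dependence the abstract describes.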
Finally, in "Bayesian inference of the gene expression states from single-cell RNA-seq data" I carried out fundamental technical work on the normalisation of count data obtained in scRNA-seq experiments. As introduced above, multiple strategies have been developed with the aim of reducing the high level of noise present in such data and estimating a 'true' biological state of expression for each gene in each cell. While the project aimed to reconstruct the Waddington landscape of regulator activity based on single-cell gene expression measurements, at the start of the project we realised that the literature offers no satisfactory solution to gene expression normalisation in single cells. Thus, we tackled this problem with a Bayesian model, considering each gene independently and inferring a posterior probability of gene expression in each cell. Our model assumes a log-normal distribution of gene expression across cells and additional Poisson noise caused by the stochastic process of gene expression and the sampling process introduced by mRNA capture in experimental protocols. These normalised gene expression values are the basis of a motif-activity response based approach for inferring the activity of TFs and miRNAs in individual cells, and for reconstructing the underlying landscape. The application of this normalisation algorithm to reconstruct a landscape is presented in the last part, "Realizing Waddington’s metaphor: Inferring regulatory landscapes from single-cell gene expression data". There I present the mathematical principles needed to formally define a landscape following Waddington's idea from 1957, and I propose two applications of the landscape. First, I show that it defines cell types as local minima, and second, in the case of cells undergoing differentiation, I show how the landscape can be used to find developmental paths and the transcription factors associated with the differentiation process
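The log-normal prior and Poisson noise mentioned in this abstract can be illustrated with a small numerical sketch. This is not the thesis's algorithm: it assumes the prior mean and standard deviation of a gene's log expression are already known, treats one gene in one cell, and evaluates the posterior on a grid. The function and argument names are hypothetical.

```python
import numpy as np


def posterior_log_expression(count, total_count, prior_mean, prior_sd,
                             grid_size=2001):
    """Posterior mean and sd of the log expression fraction of one gene in one cell.

    Assumes a normal prior on the log fraction (log-normal expression across
    cells) and a Poisson likelihood with rate total_count * exp(x).
    """
    # Grid over the log fraction; assumes the posterior mass lies within
    # eight prior standard deviations of the prior mean.
    x = np.linspace(prior_mean - 8.0 * prior_sd,
                    prior_mean + 8.0 * prior_sd, grid_size)
    log_prior = -0.5 * ((x - prior_mean) / prior_sd) ** 2
    # Poisson log-likelihood of the observed count, constants in x dropped.
    log_lik = count * x - total_count * np.exp(x)
    log_post = log_prior + log_lik
    # Normalise on the grid, subtracting the maximum for numerical stability.
    weights = np.exp(log_post - log_post.max())
    weights /= weights.sum()
    mean = float(np.sum(weights * x))
    sd = float(np.sqrt(np.sum(weights * (x - mean) ** 2)))
    return mean, sd


if __name__ == "__main__":
    print(posterior_log_expression(5, 1000, np.log(0.001), 1.0))
```

In this sketch the posterior mean is pulled from the prior mean towards the raw estimate log(count / total_count), and a zero count still yields a finite estimate, which a plain log transform of the counts would not.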

    The role of miRNA-mediated cis-regulation in breast cancer susceptibility

    Cis-regulation of gene expression is believed to be central in breast cancer (BC) predisposition. Here we aimed to unravel the contribution of allele-specific miRNA regulation to BC risk. We screened the effect of 223 published BC genome-wide association study (GWAS)-significant single nucleotide polymorphisms (SNPs) (and their 2668 unique proxies in high linkage disequilibrium) on differential miRNA regulation. We filtered these SNPs based on location in miRNA genes and/or messenger RNA (mRNA) of protein-coding genes. Selected SNPs were then evaluated for putative differential miRNA binding using TargetScan and miRanda, two distinct miRNA-target prediction algorithms, modified to analyse sequences carrying SNP alleles. Results were filtered for miRNAs with evidence of expression in breast tissue, and for genes displaying differential allelic expression (DAE), a hallmark of cis-regulation. To validate our findings, we prioritized the candidate SNPs for functional characterization by combining the TargetScan and miRanda predictions. Interestingly, none of the SNPs mapped to miRNA genes, suggesting that miRNA biogenesis and target-binding alteration, via seed sequence modification, are mechanisms unlikely to be involved in BC risk. Of the SNPs located in mRNA sequences, we found 93 out of 3891 that were predicted to alter miRNA-mRNA binding in 27 BC-associated risk loci. Among our predictions, we found rs4245739 in MDM4 and rs11540855 in ABHD8, already functionally validated by others to cause allele-specific miRNA binding. We carried out in vitro functional characterization of rs6884232 in ATG10, one of the best candidates identified by both the TargetScan and miRanda algorithms. The predicted specific binding of hsa-miR-21-3p to the G allele of this SNP was evaluated using a dual-luciferase system, with constructs carrying either the A or the G allele, and in combination with miRNA mimics and inhibitors in a breast adenocarcinoma cell line.
However, no allele-specific differences in luciferase activity were observed. To our knowledge, this is the first study looking into the global role of miRNA regulation in BC risk, further improved by the integration of DAE data from normal breast samples. Cis-regulation is one of the main mechanisms by which gene expression is controlled. In addition, cis-regulation is postulated to play a fundamental role in the risk of complex diseases, including breast cancer. As breast cancer is the most common type of cancer among women worldwide, the benefit of identifying risk markers and using them in the clinic for early identification of the at-risk population is indisputable. Over the last ten years, genome-wide association studies (GWAS) have identified a large number of common genetic variants that confer low levels of risk for breast cancer. These variants are single nucleotide polymorphisms (SNPs) and are mostly located in non-coding regions, which suggests that they may confer risk through the regulation of gene expression levels. Functional studies already carried out for some of these risk variants have confirmed that they are cis-regulatory and confer breast cancer risk through differential binding of transcription factors. However, there are many other biological mechanisms of cis-regulation besides transcription factor binding, such as post-transcriptional regulation by microRNAs (miRNAs) or altered RNA processing. In particular, miRNAs are small single-stranded RNA molecules capable of silencing target genes by binding through complementarity, mainly to the 3' untranslated region (UTR) of the messenger RNA. miRNAs can also silence genes by binding to their coding sequence (CDS) or 5'UTR.
Cis-regulation via miRNAs can be affected by SNPs located in miRNA-coding genes, which affect miRNA biogenesis or change the miRNA's target genes (by altering the miRNA binding sequence), or by SNPs located in the target genes, which alter the stability of miRNA binding. This master's dissertation aimed to unravel the contribution of cis-regulatory genetic variants that affect miRNA regulation to breast cancer risk. To that end, we investigated the effect of 223 SNPs, identified by GWAS for breast cancer risk, on differential regulation by miRNAs. Likewise, we evaluated the effect of SNPs in high linkage disequilibrium with them, that is, in tight genetic linkage. First, these SNPs were filtered by their location in miRNA genes and/or the 5'UTR, CDS and 3'UTRs of protein-coding genes. Next, the potential of the selected SNPs to alter miRNA binding was evaluated. Since no tools were available to do this, two existing miRNA binding prediction algorithms, TargetScan and miRanda, were modified to allow the analysis of the different SNP alleles. These algorithms were selected not only for their methodology but also because they could be applied in R, an open-access programming environment for statistical computing and graphics. This made it possible not only to modify the algorithms to analyse sequences containing SNP alleles but also to systematise the process. The results were then filtered for miRNAs expressed in breast tissue, because miRNAs have tissue-specific expression. In addition, the results were filtered for genes showing differential allelic expression (DAE), a hallmark of cis-regulation, in normal breast tissue.
To validate our findings, the candidate SNPs were prioritised for functional validation. For this, the predictions made by both TargetScan and miRanda were combined, to increase the probability of selecting a SNP with a real effect. Interestingly, none of the SNPs associated with breast cancer risk was located in miRNA genes. This suggests that both altered miRNA biogenesis and altered target genes through modification of the miRNA binding sequence are mechanisms unlikely to contribute to breast cancer risk. Of the SNPs located in protein-coding genes, 93 (of the initial 3891 SNPs) were predicted to affect miRNA binding in 27 loci associated with breast cancer risk. This suggests that about a quarter of the loci already associated with breast cancer risk may be explained by differential regulation by miRNAs. Of these SNPs, two have already been functionally validated in other studies as causing allele-specific miRNA binding: rs4245739 in the MDM4 gene and rs11540855 in the ABHD8 gene. This validates our analysis and suggests that other predictions may also be functional. Finally, prioritising the breast cancer-associated risk SNPs by combining the predictions from both TargetScan and miRanda identified six candidates most likely to affect miRNA binding: rs1573 (in the ASB13 gene), rs2385088 (in the ISYNA1 gene), rs1019806 and rs6884232 (in the ATG10 gene), rs4808616 (in the ABHD8 gene), and rs3734805 (in the CCDC170 gene). Next, in vitro functional characterisation of rs6884232, located in the ATG10 gene, was carried out.
For this SNP, specific binding of hsa-miR-21-3p to the G allele was predicted (context++ score = -0.169), which would result in decreased expression of this gene. First, a luciferase assay was performed in breast adenocarcinoma cells (MCF-7) using plasmids with luciferase reporter genes containing either the A allele or the G allele of the SNP. However, no differences in luciferase activity were observed between the two alleles. The assay was then repeated, this time in combination with hsa-miR-21-3p mimics and inhibitors and their respective negative controls. Once again, no differences in luciferase activity were obtained between the two alleles, suggesting that this SNP does not cause differential binding of hsa-miR-21-3p to its alleles. However, the high variability between biological replicates, as well as unexpected effects in control conditions, does not yet allow definitive conclusions to be drawn, and the assay needs to be repeated. To our knowledge, the present work is the first study to assess the global role of miRNA regulation in breast cancer risk and to include DAE data from normal breast tissue. In the future, we hope to complement this approach by determining and characterising the clinical importance of the effect of miRNA-mediated cis-regulatory genetic variants in breast cancer. This will improve the characterisation of the loci already associated with breast cancer risk and improve knowledge of the aetiology of breast cancer
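The idea of testing each SNP allele for a miRNA binding site can be illustrated with a deliberately simplified sketch. It checks only for a perfect 7-nucleotide seed match (miRNA positions 2 to 8) overlapping the SNP, whereas TargetScan and miRanda use richer scoring, and it is not the modified R implementation the abstract refers to. The sequences below are made up for illustration and the function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna):
    """Return the mRNA site (5'->3') complementary to miRNA nucleotides 2-8."""
    seed = mirna[1:8]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def allele_specific_seed_match(utr, snp_index, alleles, mirna):
    """Map each allele to True if a seed site overlapping the SNP is present."""
    site = seed_site(mirna)
    result = {}
    for allele in alleles:
        # Substitute the allele at the SNP position (0-based index).
        seq = utr[:snp_index] + allele + utr[snp_index + 1:]
        hits = [
            start
            for start in range(len(seq) - len(site) + 1)
            # Keep only perfect matches whose window covers the SNP.
            if seq[start:start + len(site)] == site
            and start <= snp_index < start + len(site)
        ]
        result[allele] = bool(hits)
    return result


if __name__ == "__main__":
    toy_mirna = "UACGGAUCCAUGCAGUACGUAA"   # made-up miRNA sequence
    toy_utr = "AAAAGAUCCGUAAAA"            # made-up 3'UTR fragment
    print(allele_specific_seed_match(toy_utr, 6, ["U", "C"], toy_mirna))
```

A SNP for which the alleles give different answers would be a candidate for allele-specific binding under this simplified criterion; as the luciferase result in the abstract shows, such predictions still require experimental validation.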

    Inference of biomolecular interactions from sequence data

    This thesis describes our work on the inference of biomolecular interactions from sequence data. In particular, the first part of the thesis focuses on proteins and describes computational methods that we have developed for the inference of both intra- and inter-protein interactions from genomic data. The second part of the thesis centers around protein-RNA interactions and describes a method for the inference of binding motifs of RNA-binding proteins from high-throughput sequencing data. The thesis is organized as follows. In the first part, we start by introducing a novel mathematical model for the characterization of protein sequences (chapter 1). We then show how, using genomic data, this model can be successfully applied to two different problems, namely to the inference of interacting amino acid residues in the tertiary structure of protein domains (chapter 2) and to the prediction of protein-protein interactions in large paralogous protein families (chapters 3 and 4). We conclude the first part with a discussion of potential extensions and generalizations of the methods presented (chapter 5). In the second part of this thesis, we first give a general introduction to RNA-binding proteins (chapter 6). We then describe a novel experimental method for the genome-wide identification of target RNAs of RNA-binding proteins and show how this method can be used to infer the binding motifs of RNA-binding proteins (chapter 7). Finally, we discuss a potential mechanism by which KH domain-containing RNA-binding proteins could achieve the specificity of interaction with their target RNAs and conclude the second part of the thesis by proposing a novel type of motif finding algorithm tailored for the inference of their recognition elements (chapter 8)

    Characterization of post-transcriptional regulatory network of RNA-binding proteins using computational predictions and deep sequencing data

    This report is divided into three parts: Data Analysis, Mathematical Modeling, and Conclusion and future directions. In the Data Analysis part, various methods and tools for characterizing the post-transcriptional regulatory networks of RNA-binding proteins are discussed and applied. Chapter 2 introduces PAR-CLIP, a method for transcriptome-wide identification of the binding sites of RNA-binding proteins at nucleotide resolution. PAR-CLIP was successfully applied to RNA-binding proteins and their binding specificity was characterized. Partly due to their vast volume, the data generated so far in CLIP experiments have not been put in a form that enables fast and interactive exploration of binding sites. To address this need, Chapter 3 presents CLIPZ, a database and analysis environment for various kinds of deep sequencing (and in particular CLIP) data, which aims to provide an open-access repository of information on post-transcriptional regulatory elements. Chapter 4 revisits various CLIP methods. A set of ideas concerning both experimental protocols and data analysis is presented to improve the quality and reproducibility of such experiments. In general, cytoplasmic RNAs are isolated in CLIP experiments. As in many high-throughput experiments, a certain fraction of the RNAs isolated by CLIP do not represent regulatory binding sites. To improve the quality of the obtained RNAs, a set of novel methods for data analysis is also suggested. These methods are added as new tools to the CLIPZ analysis platform. Argonaute CLIP data could in principle be beneficial in improving microRNA target site predictions. However, several questions remain that cannot be addressed using CLIP methods. For example:
• Argonaute CLIP data by default does not reveal which microRNAs are more likely to interact with the mRNA binding site at the time of cross-linking.
• As mentioned earlier, biochemical and structural studies of the Thermus thermophilus Argonaute protein suggest that the interaction between the microRNA and the Argonaute protein forms a physical structure in which only some positions in the microRNA become accessible to the target binding site. Having inferred the interacting microRNA, it is also interesting to predict the most plausible secondary structure of the hybridized microRNA-mRNA complex.
The Mathematical Modeling part of the report contains Chapter 5. This chapter presents a novel mathematical model called MIRZA to address the above-mentioned questions. An in-depth introduction to MIRZA is presented and its performance in identifying functionally relevant targets of microRNAs is discussed. Finally, the Conclusion and future directions part of the report contains Chapter 6, which discusses the main findings of the projects and gives an outlook on where future work could be taken up

    Artificial intelligence in cancer target identification and drug discovery

    Artificial intelligence is an advanced method to identify novel anticancer targets and discover novel drugs from biological networks, because these networks can effectively preserve and quantify the interactions between components of the cell systems underlying human diseases such as cancer. Here, we review and discuss how to employ artificial intelligence approaches to identify novel anticancer targets and discover drugs. First, we describe the scope of artificial intelligence-based biological analysis for novel anticancer target investigations. Second, we review and discuss the basic principles and theory of commonly used network-based and machine learning-based artificial intelligence algorithms. Finally, we showcase the applications of artificial intelligence approaches in cancer target identification and drug discovery. Taken together, artificial intelligence models have provided us with a quantitative framework to study the relationship between network characteristics and cancer, thereby leading to the identification of potential anticancer targets and the discovery of novel drug candidates

    COMPUTATIONAL MODELING OF RNA-SMALL MOLECULE AND RNA-PROTEIN INTERACTIONS

    The past decade has witnessed an era of RNA biology; despite considerable discoveries, challenges remain when one aims to screen for RNA-interacting small molecules or RNA-interacting proteins. These challenges imply an immediate need for cost-efficient yet predictive computational tools capable of generating insightful hypotheses to discover novel RNA-interacting small molecules or RNA-interacting proteins. Thus, we implemented novel computational models in this dissertation to predict RNA-ligand interactions (Chapter 1) and RNA-protein interactions (Chapter 2). Targeting RNA has not garnered interest comparable to targeting proteins, and is restricted by a lack of computational tools for structure-based drug design. To test the potential of translating molecular docking tools designed for proteins to RNA-ligand docking and virtual screening, we benchmarked 5 docking programs and 11 scoring functions to assess their performance in pose reproduction, pose ranking, score-RMSD correlation and virtual screening. From this benchmark, we proposed a three-step docking pipeline optimized for virtual screening against RNAs with different flexibility properties. Using this pipeline, we successfully identified a selective compound binding to the GA:UU motif. Both NMR and the subsequent MD simulation supported its selective binding to the GA:UU motif flanked by two tandem flexible base pairs next to GA. Consistent with the 3D model, SAR analysis revealed that any R-group substitution would abolish the binding. Current computational methods for RNA-protein interaction prediction (sequence-based or structure-based) lack either interpretability or robustness. Aware of these pitfalls, we implemented RNA-Protein interaction prediction through Interface Threading (RPIT), which identifies and references a known RNA-protein interface as the template to infer the region where the interaction occurs and to predict the interaction propensity based on the interface profiles.
To estimate the propensity more accurately, we implemented five statistical scoring functions based on our unique non-redundant protein-RNA interaction database. Our benchmark using leave-protein-out cross-validation and two external validation sets gave RPIT an overall accuracy of 70%-80%. Compared with other methods, RPIT offers an inexpensive but robust method for in silico prediction of RNA-protein interaction networks, and for prioritizing putative RNA-protein pairs using virtual screening

    The role of miRNA regulation in cancer progression and drug resistance
