12 research outputs found

    Engineering a Mastoparan Peptide Concatemer Prodrug From CircRNA for Cancer Therapy

    CircRNAs are covalently closed loops of RNA formed as products of RNA backsplicing in mammalian cells. Engineered circRNAs containing a desired coding sequence have been produced using self-splicing introns. Translatable circRNAs require an internal ribosomal entry site or m6A methylation site for translation initiation. CircRNAs with a nucleotide length that is a multiple of three, a start codon, and no stop codon in the same frame have an infinite open reading frame. This project aimed to produce a mastoparan peptide concatemer prodrug from circRNA for cancer therapy. Anabaena group I self-splicing introns were used to circularise a mastoparan prodrug containing a metalloproteinase cleavage site for activation (construct named Anabaena Mastoparan). RNA circularisation was achieved in vitro but not in mammalian cells, indicating that group I Anabaena introns do not have the catalytic ability to splice in mammalian cells. Mastoparan peptides were detected in vitro and in vivo after adding a Flag tag to the Anabaena Mastoparan construct. However, only peptides produced from unspliced RNA translation were detected. Mastoparan peptides extracted from Anabaena Mastoparan transfected cells caused cytotoxicity when added to the culture medium of MDA-MB-231 and MCF-7 cells. Anabaena Mastoparan transfection did not directly lead to cytotoxicity, demonstrating the effectiveness of mastoparan as a prodrug that is activated only by metalloproteinase cleavage in the extracellular environment. This project also aimed to identify endogenous circRNAs that have the coding potential to produce a peptide with a different biological function to their parent gene. Using a bioinformatics approach, circRNAs containing an ORF through the circular junction were identified. Their ORF-through-junction peptides were investigated for differences in predicted function to their parent gene using InterProScan and Protein Homology/analogY Recognition (Phyre2). Using this approach, four candidate circRNAs were identified that encode a predicted peptide with a different biological function to their parent gene. The four candidate circRNAs contain either a predicted m6A site or an internal ribosomal entry site for translation initiation, and have a codon adaptation index (CAI) score between 0.781 and 0.821, comparable to the 75th percentile of ORFs through the circular junction (0.79) and to the mean CAI score of coding sequence mRNA. This project demonstrates that the circular junction of circRNAs can provide the coding potential to produce unique peptides with a different function to their parent gene.
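
The infinite-ORF criterion stated in the abstract above (length a multiple of three, a start codon, and no stop codon in the same frame) can be illustrated with a minimal Python sketch. The function name `has_infinite_orf`, the use of ATG as the only start codon, and the example sequences are assumptions made for illustration; this is not the thesis author's pipeline.

```python
# Standard stop codons, written in DNA letters (assumption: standard genetic code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_infinite_orf(circ_seq: str) -> bool:
    """Return True if a circular sequence meets the abstract's three criteria:
    length divisible by three, an ATG start codon, and no in-frame stop codon."""
    seq = circ_seq.upper().replace("U", "T")  # accept RNA or DNA letters
    n = len(seq)
    if n == 0 or n % 3 != 0:
        # Following the stated criteria, only lengths that keep the frame
        # fixed on every pass around the circle are accepted here.
        return False
    for start in range(n):
        # Rotate the circle so that reading begins at this position; this
        # lets a start codon span the circular junction.
        rotated = seq[start:] + seq[:start]
        if not rotated.startswith("ATG"):
            continue
        # One full pass covers every codon in this frame, because later
        # passes repeat the same codons.
        codons = (rotated[i:i + 3] for i in range(0, n, 3))
        if not any(codon in STOP_CODONS for codon in codons):
            return True
    return False


if __name__ == "__main__":
    print(has_infinite_orf("ATGGCCAAA"))  # start codon, no in-frame stop
    print(has_infinite_orf("ATGGCCTAA"))  # in-frame stop codon present
```

Rotating the sequence before reading is a simple way to treat the junction like any other position, which is the situation the abstract describes for ORFs that run through the circular junction.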

    Systems Analytics and Integration of Big Omics Data

    A “genotype” is essentially an organism's full hereditary information, which is obtained from its parents. A “phenotype” is an organism's actual observed physical and behavioral properties. These may include traits such as morphology, size, height, eye color, metabolism, etc. One of the pressing challenges in computational and systems biology is genotype-to-phenotype prediction. This is challenging given the amount of data generated by modern Omics technologies. This “Big Data” is so large and complex that traditional data processing applications are not up to the task. Challenges arise in collection, analysis, mining, sharing, transfer, visualization, archiving, and integration of these data. In this Special Issue, there is a focus on the systems-level analysis of Omics data, recent developments in gene ontology annotation, and advances in biological pathways and network biology. The integration of Omics data with clinical and biomedical data using machine learning is explored. This Special Issue covers new methodologies in the context of gene–environment interactions, tissue-specific gene expression, and how external factors or host genetics impact the microbiome.

    Spectral Learning of Binomial HMMs for DNA Methylation Data

    We consider learning parameters of Binomial Hidden Markov Models, which may be used to model DNA methylation data. The standard algorithm for the problem is EM, which is computationally expensive for sequences of the scale of the mammalian genome. Recently developed spectral algorithms can learn parameters of latent variable models via tensor decomposition, and are highly efficient for large data. However, these methods have only been applied to categorical HMMs, and the main challenge is how to extend them to Binomial HMMs while still retaining computational efficiency. We address this challenge by introducing a new feature-map based approach that exploits specific properties of Binomial HMMs. We provide theoretical performance guarantees for our algorithm and evaluate it on real DNA methylation data.
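
To make the model in the abstract above concrete, the following Python sketch evaluates the log-likelihood of a Binomial HMM with the scaled forward algorithm, the quantity that EM would iteratively increase. It does not implement the spectral method. The parameterization is an assumption for illustration: each observation is a pair (methylated reads, total reads), and each hidden state has one methylation probability. All names and numbers are hypothetical.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of k successes in n Bernoulli(p) trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def forward_log_likelihood(obs, init, trans, meth_prob):
    """Log-likelihood of a Binomial HMM via the scaled forward algorithm.

    obs       -- list of (methylated_reads, total_reads) pairs (assumed layout)
    init      -- initial state probabilities
    trans     -- trans[r][s] = P(next state s | current state r)
    meth_prob -- per-state methylation probability (binomial success rate)
    """
    n_states = len(init)
    k0, n0 = obs[0]
    alpha = [init[s] * binom_pmf(k0, n0, meth_prob[s]) for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            k, n = obs[t]
            # Propagate through the transition matrix, then weight by emission.
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states))
                * binom_pmf(k, n, meth_prob[s])
                for s in range(n_states)
            ]
        # Rescale at every step to avoid underflow on long sequences.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


if __name__ == "__main__":
    # Hypothetical two-state example: low- and high-methylation states.
    observations = [(0, 5), (1, 6), (7, 8), (9, 10)]
    print(forward_log_likelihood(
        observations,
        init=[0.5, 0.5],
        trans=[[0.9, 0.1], [0.1, 0.9]],
        meth_prob=[0.1, 0.9],
    ))
```

Each forward pass costs time proportional to the sequence length times the square of the number of states, and EM repeats such passes many times, which is the cost the abstract refers to at genome scale.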

    Statistical learning based inference and analysis of epigenetic regulatory network topologies in T-helper cells

    The reliable statistical inference of epigenetic regulatory networks that govern mammalian cell fates is very challenging. In this thesis we study this question for the differentiation decisions of T-helper (Th) cells, which have recently been shown to adopt a continuum of differentiated states in response to cytokine signals. To infer the underlying regulatory networks we introduce a novel framework for the inference of epigenetic regulatory network topologies based on statistical learning. First, we infer, via a Hidden Markov Model, chromatin states based on histone modification patterns in naïve Th cells and differentiated Th1, Th2 and mixed Th1/2 states; these states are controlled by external cytokine stimuli and the gene dose of the Th1 master transcription factor Tbet (Tbx21). We then introduce a linear multivariate correlation measure for mapping enhancers to their target genes, which is parametrized on a training set of known enhancers. This analysis is refined further by the application of partial correlations to distinguish direct from indirect effects. Applying this approach to our data, we recover known enhancers and obtain a genome-wide enhancer-gene mapping. We also extend this to the correlation of repressive regulatory elements with gene expression. Next, we focus on the enhancers that regulate differentially expressed Th1 and Th2 specific transcripts. Building machine learning based predictors, we identify Th1 and Th2 specific enhancer and repressive state classes characterized by their response patterns to cytokine stimuli and Tbet dose. In turn, we use chromatin immunoprecipitation data of transcription factors to define the transcriptional regulatory logic governing the activities of the enhancer classes. Finally, we combine enhancer-target gene maps and enhancer regulatory logic as well as inhibitory elements to infer a bipartite epigenetic network. The network architecture builds on enhancer and repressive state classes as well as on genes and transcription factors, leading to a weighted multidigraph. The network topology reveals distinct community structures related to Th1, Th2 and hybrid functionality. We furthermore analyse multiplex networks resulting in condition-specific topologies. From these analyses we obtain unique contributions of distinct network nodes. Utilizing random walks on multidigraphs we extract metastable processes underlying the observed system. In conclusion, we present a robust quantitative framework for mapping chromatin states to gene activity, and, by factoring in transcription factor regulation of enhancers, inferring epigenetic regulatory networks. This methodology is applicable to a wide range of systems.
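
The use of partial correlations to separate direct from indirect effects, mentioned in the abstract above, can be illustrated with a small Python sketch. It uses the textbook first-order partial correlation with a single controlling variable, which is an assumption made for illustration and not necessarily the multivariate measure used in the thesis. The variable names and toy values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


if __name__ == "__main__":
    # Hypothetical toy signals: both the enhancer signal and the gene
    # expression follow a shared driver, plus independent components.
    shared_driver = [1, 1, -1, -1]
    enhancer_signal = [2, 0, 0, -2]   # shared_driver + [1, -1, 1, -1]
    gene_expression = [2, 0, -2, 0]   # shared_driver + [1, -1, -1, 1]
    print(pearson(enhancer_signal, gene_expression))
    print(partial_corr(enhancer_signal, gene_expression, shared_driver))
```

In this toy case the raw correlation between the two signals is positive, while the partial correlation given the shared driver vanishes, which is the pattern expected of an indirect association.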

    Information retrieval and text mining technologies for chemistry

    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.

    Application of multivariate statistics and machine learning to phenotypic imaging and chemical high-content data

    Image-based high-content screens (HCS) hold tremendous promise for cell-based phenotypic screens. Challenges related to HCS include not only storage and management of data, but also critical analysis of the complex image-based data. I implemented a data storage and screen management framework and developed approaches for data analysis of a number of high-content microscopy screen formats. I visualized and analysed pilot screens to develop a robust multi-parametric assay for the identification of genes involved in DNA damage repair in HeLa cells. Further, I developed and implemented new approaches for image processing and screen data normalization. My analyses revealed that the ubiquitin ligase RNF8 plays a central role in the DNA-damage response and that a related ubiquitin ligase, RNF168, causes the cellular and developmental phenotypes characteristic of RIDDLE syndrome. My approaches also uncovered a role for the MMS22L-TONSL complex in DSB repair and its role in the recombination-dependent repair of stalled or collapsed replication forks. The discovery of novel bioactive molecules is a challenge because the fraction of active candidate molecules is usually small and confounded by noise in experimental readouts. Cheminformatics can improve robustness of chemical high-throughput screens and functional genomics data sets by taking structure-activity relationships into account. I applied statistics, machine learning and cheminformatics to different data sets to discern novel bioactive compounds. I showed that phenothiazines and apomorphines are regulators of cell differentiation in murine embryonic stem cells. Further, I pioneered computational methods for the identification of structural features that influence the degradation and retention of compounds in the nematode C. elegans. I used cheminformatics to assemble a comprehensive screening library of previously approved drugs for redeployment in new bioassays. A combination of chemical genetic interactions, cheminformatics and machine learning allowed me to predict novel synergistic antifungal small molecule combinations from sensitized screens with the drug library. In another study on the biological effects of commonly prescribed psychoactive compounds, I discovered a strong link between lipophilicity and bioactivity of compounds in yeast and unexpected off-target effects that could account for unwanted side effects in humans. I also investigated structure-activity relationships and assessed the chemical diversity of a compound collection that was used to probe chemical-genetic interactions in yeast. Finally, I have made these methods and tools available to the scientific community, including an open source software package called MolClass that allows researchers to make predictions about bioactivity of small molecules based on their chemical structure.

    Annual Report
