12 research outputs found

Spectral Learning of Binomial HMMs for DNA Methylation Data

Author: Chaudhuri Kamalika
Mukamel Eran A.
Zhang Chicheng
Publication date: 07/02/2018

We consider learning parameters of Binomial Hidden Markov Models, which may be used to model DNA methylation data. The standard algorithm for the problem is EM, which is computationally expensive for sequences of the scale of the mammalian genome. Recently developed spectral algorithms can learn parameters of latent variable models via tensor decomposition, and are highly efficient for large data. However, these methods have only been applied to categorical HMMs, and the main challenge is how to extend them to Binomial HMMs while still retaining computational efficiency. We address this challenge by introducing a new feature-map-based approach that exploits specific properties of Binomial HMMs. We provide theoretical performance guarantees for our algorithm and evaluate it on real DNA methylation data.

arXiv.org e-Print Archive

eScholarship - University of California

Information retrieval and text mining technologies for chemistry

Author: Alfonso Valencia
Anália Lourenço
Julen Oyarzabal
Martin Krallinger
Obdulia Rabal
Publication venue: 'American Chemical Society (ACS)'
Publication date: 01/01/2017

Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.

Universidade do Minho: RepositoriUM

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Systems Analytics and Integration of Big Omics Data

Author: Hardiman Gary
Publication venue: 'MDPI AG'
Publication date: 01/01/2020

A “genotype” is essentially an organism's full hereditary information which is obtained from its parents. A “phenotype” is an organism's actual observed physical and behavioral properties. These may include traits such as morphology, size, height, eye color, metabolism, etc. One of the pressing challenges in computational and systems biology is genotype-to-phenotype prediction. This is challenging given the amount of data generated by modern Omics technologies. This “Big Data” is so large and complex that traditional data processing applications are not up to the task. Challenges arise in collection, analysis, mining, sharing, transfer, visualization, archiving, and integration of these data. In this Special Issue, there is a focus on the systems-level analysis of Omics data, recent developments in gene ontology annotation, and advances in biological pathways and network biology. The integration of Omics data with clinical and biomedical data using machine learning is explored. This Special Issue covers new methodologies in the context of gene–environment interactions, tissue-specific gene expression, and how external factors or host genetics impact the microbiome.

Directory of Open Access Books (DOAB)

Statistical learning based inference and analysis of epigenetic regulatory network topologies in T-helper cells

Author: Kommer Christoph
Publication date: 01/01/2018

The reliable statistical inference of epigenetic regulatory networks that govern mammalian cell fates is very challenging. In this thesis we study this question for the differentiation decisions of T-helper (Th) cells, which have recently been shown to adopt a continuum of differentiated states in response to cytokine signals. To infer the underlying regulatory networks we introduce a novel framework for the inference of epigenetic regulatory network topologies based on statistical learning. First, we infer, via a Hidden Markov Model, chromatin states based on histone modification patterns in naïve Th cells and differentiated Th1, Th2 and mixed Th1/2 states; these states are controlled by external cytokine stimuli and the gene dose of the Th1 master transcription factor Tbet (Tbx21). We then introduce a linear multivariate correlation measure for mapping enhancers to their target genes, which is parametrized on a training set of known enhancers. This analysis is refined further by the application of partial correlations to distinguish direct from indirect effects. Applying this approach to our data, we recover known enhancers and obtain a genomewide enhancer-gene mapping. We also extend this to the correlation of repressive regulatory elements with gene expression. Next, we focus on the enhancers that regulate differentially expressed Th1 and Th2 specific transcripts. Building machine learning based predictors, we identify Th1 and Th2 specific enhancer and repressive state classes characterized by their response patterns to cytokine stimuli and Tbet dose. In turn, we use chromatin immunoprecipitation data of transcription factors to define the transcriptional regulatory logic governing the activities of the enhancer classes. Finally, we combine enhancer-target gene maps and enhancer regulatory logic as well as inhibitory elements to infer a bipartite epigenetic network. 
The network architecture builds on enhancer and repressive state classes as well as on genes and transcription factors, leading to a weighted multidigraph. The network topology reveals distinct community structures related to Th1, Th2 and hybrid functionality. We furthermore analyse multiplex networks resulting in condition-specific topologies. From these analyses we obtain unique contributions of distinct network nodes. Utilizing random walks on multidigraphs, we extract metastable processes underlying the observed system. In conclusion, we present a robust quantitative framework for mapping chromatin states to gene activity and, by factoring in transcription factor regulation of enhancers, inferring epigenetic regulatory networks. This methodology is applicable to a wide range of systems.

Heidelberger Dokumentenserver

Computational Methods for Comparative Genomic and Epigenomic Annotations across Multiple Species

Author: Arneson Adriana Cristina
Publication venue: eScholarship, University of California
Publication date: 01/01/2020

In recent years, genome-wide association studies (GWAS) and large-scale whole genome sequencing case-control studies have led to the identification of a wealth of phenotype-associated and rare genetic variants. Interpreting the biological significance of these variants has been a significant challenge, especially since a large majority of their genomic locations fall within non-protein coding genomic regions. Here we present a computational method, ConsHMM, for annotating the genome at single-nucleotide resolution into a set of conservation states learned from the combinatorial and spatial patterns of species aligning and matching a reference genome in a multiple-sequence alignment. Conservation states have specific enrichments for orthogonal biological annotations and can be used for interpreting genetic variants. We provide here a comprehensive resource of conservation state annotations, the ConsHMM atlas, comprised of models and annotations for eight different organisms based on several multiple-sequence alignments. At the epigenomic level, modifications such as DNA methylation have emerged as useful biomarkers for several phenotypes, but a large majority of these phenotypes have been studied predominantly in human samples. Leveraging sequence conservation among genomes, we have designed a methylation array that can query DNA methylation of many different mammals, and therefore facilitate cross-species epigenetic studies. The array has been produced and used to profile 8730 samples from 145 different mammals. In summary, this work takes a comparative-genomics-based approach to expanding the available genomic and epigenomic annotations of multiple species.

eScholarship - University of California

Engineering a Mastoparan Peptide Concatemer Prodrug From CircRNA for Cancer Therapy

Author: Grewcock Declan
Publication date: 22/07/2023

CircRNAs are covalently closed loops of RNA formed as products of RNA backsplicing in mammalian cells. Engineered circRNAs containing a desired coding sequence have been produced using self-splicing introns. Translatable circRNAs require an internal ribosomal entry site or m6A methylation site for translation initiation. CircRNAs with a nucleotide length that is a multiple of three, a start codon, and no stop codon in the same frame have an infinite open reading frame. This project aimed to produce a mastoparan peptide concatemer prodrug from circRNA for cancer therapy. Anabaena group I self-splicing introns were used to circularise a mastoparan prodrug containing a metalloproteinase cleavage site for activation (construct named Anabaena Mastoparan). RNA circularisation was achieved in vitro but not in mammalian cells, indicating that group I Anabaena introns do not have the catalytic ability to splice in mammalian cells. Mastoparan peptides were detected in vitro and in vivo after adding a Flag tag to the Anabaena Mastoparan construct. However, only peptides produced from unspliced RNA translation were detected. Mastoparan peptides extracted from Anabaena Mastoparan transfected cells caused cytotoxicity when added to the culture medium of MDA-MB-231 and MCF-7 cells. Anabaena Mastoparan transfection did not directly lead to cytotoxicity, demonstrating the effectiveness of mastoparan as a prodrug, only being activated by metalloproteinase cleavage in the extracellular environment. This project also aimed to identify endogenous circRNAs that have the coding potential to produce a peptide with a different biological function to their parent gene. Using a bioinformatics approach, circRNAs containing an ORF through the circular junction were identified. Their ORF-through-junction peptides were investigated for differences in predicted function to their parent gene using InterProScan and Protein Homology/analogY Recognition (Phyre2).
Using this approach, four candidate circRNAs were identified that encode a predicted peptide with a different biological function to their parent gene. The four candidate circRNAs contain either a predicted m6A or an internal ribosomal entry site for translation initiation, and have a codon adaptation index (CAI) score between 0.781 and 0.821, comparable to the 75th percentile of ORFs through the circular junction (0.79) and to the mean CAI score of coding sequence mRNA. This project demonstrates that the circular junction of circRNAs can provide the coding potential to produce unique peptides with a different function to their parent gene.

Nottingham eTheses

Engineering a Mastoparan Peptide Concatemer Prodrug From CircRNA for Cancer Therapy

Author: Grewcock Declan

Nottingham ePrints

Application of multivariate statistics and machine learning to phenotypic imaging and chemical high-content data

Author: Wildenhain Jan
Publication venue: The University of Edinburgh
Publication date: 29/11/2016

Image-based high-content screens (HCS) hold tremendous promise for cell-based phenotypic screens. Challenges related to HCS include not only storage and management of data, but also critical analysis of the complex image-based data. I implemented a data storage and screen management framework and developed approaches for data analysis of a number of high-content microscopy screen formats. I visualized and analysed pilot screens to develop a robust multi-parametric assay for the identification of genes involved in DNA damage repair in HeLa cells. Further, I developed and implemented new approaches for image processing and screen data normalization. My analyses revealed that the ubiquitin ligase RNF8 plays a central role in DNA-damage response and that a related ubiquitin ligase RNF168 causes the cellular and developmental phenotypes characteristic of RIDDLE syndrome. My approaches also uncovered a role for the MMS22L-TONSL complex in DSB repair and its role in the recombination-dependent repair of stalled or collapsed replication forks. The discovery of novel bioactive molecules is a challenge because the fraction of active candidate molecules is usually small and confounded by noise in experimental readouts. Cheminformatics can improve robustness of chemical high-throughput screens and functional genomics data sets by taking structure-activity relationships into account. I applied statistics, machine learning and cheminformatics to different data sets to discern novel bioactive compounds. I showed that phenothiazines and apomorphines are regulators of cell differentiation in murine embryonic stem cells. Further, I pioneered computational methods for the identification of structural features that influence the degradation and retention of compounds in the nematode C. elegans. I used cheminformatics to assemble a comprehensive screening library of previously approved drugs for redeployment in new bioassays.
A combination of chemical genetic interactions, cheminformatics and machine learning allowed me to predict novel synergistic antifungal small molecule combinations from sensitized screens with the drug library. In another study on the biological effects of commonly prescribed psychoactive compounds, I discovered a strong link between lipophilicity and bioactivity of compounds in yeast and unexpected off-target effects that could account for unwanted side effects in humans. I also investigated structure-activity relationships and assessed the chemical diversity of a compound collection that was used to probe chemical-genetic interactions in yeast. Finally, I have made these methods and tools available to the scientific community, including an open source software package called MolClass that allows researchers to make predictions about bioactivity of small molecules based on their chemical structure.

Edinburgh Research Archive

Annual Report

Author: Stillman Bruce
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 01/01/2014

Cold Spring Harbor Laboratory Institutional Repository