Search CORE

19 research outputs found

Artificial intelligence used in genome analysis studies

Author: D'Agaro Edo
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2018

Next Generation Sequencing (NGS), or deep sequencing, technology enables parallel reading of multiple individual DNA fragments, thereby enabling the identification of millions of base pairs in several hours. Recent research has clearly shown that machine learning technologies can efficiently analyse large sets of genomic data and help to identify novel gene functions and regulatory regions. A deep artificial neural network consists of a group of artificial neurons that mimic the properties of living neurons. These mathematical models, termed Artificial Neural Networks (ANNs), can be used to solve artificial intelligence engineering problems in several different technological fields (e.g., biology, genomics, proteomics, and metabolomics). In practical terms, neural networks are non-linear statistical structures that are organized as modelling tools and are used to simulate complex genomic relationships between inputs and outputs. To date, Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have been demonstrated to be the best tools for improving performance in problem-solving tasks within the genomic field.
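
As a rough illustration of how a convolutional layer reads a DNA sequence, the sketch below one-hot encodes a short sequence and slides a single filter across it. This is only a minimal example under stated assumptions: the function names, the example sequence, and the filter weights are all made up for illustration (a real CNN would learn its filters from data), and nothing here is taken from the paper itself.

```python
# One-hot encode a DNA sequence and slide one convolutional filter over it.
# The filter weights below are hand-picked for illustration, not learned.

ALPHABET = "ACGT"

def one_hot(seq):
    """Return one 4-element row per base; unknown bases become all zeros."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET]
            for base in seq.upper()]

def conv1d_relu(encoded, kernel):
    """Valid 1D cross-correlation with a (width x 4) kernel, followed by ReLU."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            for channel in range(4):
                total += encoded[start + offset][channel] * kernel[offset][channel]
        scores.append(max(0.0, total))
    return scores

# Hypothetical filter that responds most strongly to the 3-mer "TAT".
TAT_FILTER = [
    [-1.0, -1.0, -1.0, 1.0],  # position 1 prefers T
    [1.0, -1.0, -1.0, -1.0],  # position 2 prefers A
    [-1.0, -1.0, -1.0, 1.0],  # position 3 prefers T
]

activations = conv1d_relu(one_hot("GGTATACC"), TAT_FILTER)
best_position = max(range(len(activations)), key=activations.__getitem__)
```

The position with the largest activation marks where the sequence best matches the filter, which is why convolutional filters are often described as learned motif detectors.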

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Udine

Directory of Open Access Journals

Atlas of Transcription Factor Binding Sites from ENCODE DNase Hypersensitivity Data across 27 Tissue Types.

Author: Ament Seth A
Casella Alex M
Chard Kyle
Donovan-Maiye Rory
Ertekin-Taner Nilufer
Foster Ian
Funk Cory C
Glusman Gustavo
Golde Todd E
Heavner Ben
Hood Leroy
Jung Segun
Kesselman Carl
Madduri Ravi
Price Nathan D
Richards Matthew A
Rodriguez Alex
Shannon Paul
Toga Arthur
Van Horn John D
Xiao Yukai
Publication venue: Providence St. Joseph Health Digital Commons
Publication date: 18/08/2020

Characterizing the tissue-specific binding sites of transcription factors (TFs) is essential to reconstruct gene regulatory networks and predict functions for non-coding genetic variation. DNase-seq footprinting enables the prediction of genome-wide binding sites for hundreds of TFs simultaneously. Despite the public availability of high-quality DNase-seq data from hundreds of samples, a comprehensive, up-to-date resource for the locations of genomic footprints is lacking. Here, we develop a scalable footprinting workflow using two state-of-the-art algorithms: Wellington and HINT. We apply our workflow to detect footprints in 192 ENCODE DNase-seq experiments and predict the genomic occupancy of 1,515 human TFs in 27 human tissues. We validate that these footprints overlap true-positive TF binding sites from ChIP-seq. We demonstrate that the locations, depth, and tissue specificity of footprints predict effects of genetic variants on gene expression and capture a substantial proportion of genetic risk for complex traits.
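
The validation step mentioned above (checking that footprints overlap ChIP-seq binding sites) comes down to interval overlap. The sketch below shows one simple way such a check could be computed. It assumes half-open `(chrom, start, end)` intervals; the coordinates and function names are invented for illustration, and the abstract does not describe how the authors actually performed this comparison. In practice a dedicated tool such as `bedtools intersect` would usually be used on genome-scale data.

```python
# Fraction of predicted footprints that overlap at least one ChIP-seq peak.
# Intervals are (chrom, start, end) with half-open coordinates [start, end).
# All coordinates below are invented example data.

def overlaps(a, b):
    """True if two intervals are on the same chromosome and share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def fraction_overlapping(footprints, peaks):
    """Share of footprints that hit any peak (0.0 if there are no footprints)."""
    if not footprints:
        return 0.0
    hits = sum(1 for fp in footprints if any(overlaps(fp, pk) for pk in peaks))
    return hits / len(footprints)

footprints = [("chr1", 100, 120), ("chr1", 500, 515), ("chr2", 100, 120)]
peaks = [("chr1", 90, 300), ("chr2", 400, 600)]

validated_fraction = fraction_overlapping(footprints, peaks)
```

This quadratic scan is fine for a toy example; sorted intervals or an interval tree would be the usual choice at genome scale.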

Providence St. Joseph Health Digital Commons

MOST+: A de novo motif finding approach combining genomic sequence and heterogeneous genome-wide signatures

Author: Chaochun Wei
Guangyong Zheng
Yizhe Zhang
Yupeng He
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Crossref

Springer - Publisher Connector

Integrating Diverse Datasets Improves Developmental Enhancer Prediction

Author: Dennis Kostka
Genevieve D. Erwin
John A. Capra
Karl K. Murphy
Katherine S. Pollard
Nadav Ahituv
Nir Oksenberg
Rebecca M. Truty
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 27/09/2013

Gene-regulatory enhancers have been identified using various approaches, including evolutionary conservation, regulatory protein binding, chromatin modifications, and DNA sequence motifs. To integrate these different approaches, we developed EnhancerFinder, a two-step method for distinguishing developmental enhancers from the genomic background and then predicting their tissue specificity. EnhancerFinder uses a multiple kernel learning approach to integrate DNA sequence motifs, evolutionary patterns, and diverse functional genomics datasets from a variety of cell types. In contrast with prediction approaches that define enhancers based on histone marks or p300 sites from a single cell line, we trained EnhancerFinder on hundreds of experimentally verified human developmental enhancers from the VISTA Enhancer Browser. We comprehensively evaluated EnhancerFinder using cross validation and found that our integrative method improves the identification of enhancers over approaches that consider a single type of data, such as sequence motifs, evolutionary conservation, or the binding of enhancer-associated proteins. We find that VISTA enhancers active in embryonic heart are easier to identify than enhancers active in several other embryonic tissues, likely due to their uniquely high GC content. We applied EnhancerFinder to the entire human genome and predicted 84,301 developmental enhancers and their tissue specificity. These predictions provide specific functional annotations for large amounts of human non-coding DNA, and are significantly enriched near genes with annotated roles in their predicted tissues and lead SNPs from genome-wide association studies. We demonstrate the utility of EnhancerFinder predictions through in vivo validation of novel embryonic gene regulatory enhancers from three developmental transcription factor loci. 
Our genome-wide developmental enhancer predictions are freely available as a UCSC Genome Browser track, which we hope will enable researchers to further investigate questions in developmental biology. © 2014 Erwin et al.
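
The core idea behind a multiple kernel approach, building one kernel per data type and combining them, can be sketched in a few lines. The example below is a simplified illustration and not EnhancerFinder's implementation: true multiple kernel learning learns the kernel weights together with the classifier, whereas here the weights are fixed by hand, and the feature values and function names are hypothetical.

```python
# Combine per-data-type kernels into one kernel by a weighted sum.
# Assumption: weights are fixed here; multiple kernel learning would learn them.

def linear_kernel(rows):
    """Gram matrix of dot products between every pair of feature rows."""
    return [[sum(a * b for a, b in zip(x, y)) for y in rows] for x in rows]

def combine_kernels(kernels, weights):
    """Weighted sum of same-sized kernel matrices."""
    size = len(kernels[0])
    return [
        [sum(w * k[i][j] for k, w in zip(kernels, weights)) for j in range(size)]
        for i in range(size)
    ]

# Three candidate regions described by two invented feature sets.
motif_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
conservation_features = [[0.5], [0.2], [0.9]]

kernels = [linear_kernel(motif_features), linear_kernel(conservation_features)]
weights = [0.5, 0.5]
combined = combine_kernels(kernels, weights)
```

A combined matrix like this could then be handed to any classifier that accepts a precomputed kernel, for example a support vector machine.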

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

D-Scholarship@Pitt

FigShare

Predicting tissue specific transcription factor binding sites

Author: Shan Zhong
Xin He
Ziv Bar-Joseph
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Crossref

Springer - Publisher Connector

Linking Proteomic and Transcriptional Data through the Interactome and Epigenome Reveals a Map of Oncogene-induced Signaling

Author: Adam Labadorf
Candace R. Chouinard
David C. Clarke
Douglas A. Lauffenburger
Ernest Fraenkel
Sara J. C. Gosline
Shao-shan Carol Huang
William Gordon
William Stafford Noble
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/03/2012

Cellular signal transduction generally involves cascades of post-translational protein modifications that rapidly catalyze changes in protein-DNA interactions and gene expression. High-throughput measurements are improving our ability to study each of these stages individually, but do not capture the connections between them. Here we present an approach for building a network of physical links among these data that can be used to prioritize targets for pharmacological intervention. Our method recovers the critical missing links between proteomic and transcriptional data by relating changes in chromatin accessibility to changes in expression and then uses these links to connect proteomic and transcriptome data. We applied our approach to integrate epigenomic, phosphoproteomic and transcriptome changes induced by the variant III mutation of the epidermal growth factor receptor (EGFRvIII) in a cell line model of glioblastoma multiforme (GBM). To test the relevance of the network, we used small molecules to target highly connected nodes implicated by the network model that were not detected by the experimental data in isolation and we found that a large fraction of these agents alter cell viability. Among these are two compounds, ICG-001, targeting CREB binding protein (CREBBP), and PKF118–310, targeting β-catenin (CTNNB1), which have not been tested previously for effectiveness against GBM. At the level of transcriptional regulation, we used chromatin immunoprecipitation sequencing (ChIP-Seq) to experimentally determine the genome-wide binding locations of p300, a transcriptional co-regulator highly connected in the network. Analysis of p300 target genes suggested its role in tumorigenesis. 
We propose that this general method, in which experimental measurements are used as constraints for building regulatory networks from the interactome while taking into account noise and missing data, should be applicable to a wide range of high-throughput datasets.

Funding: National Science Foundation (U.S.) (DB1-0821391); National Institutes of Health (U.S.) (Grant U54-CA112967); National Institutes of Health (U.S.) (Grant R01-GM089903); National Institutes of Health (U.S.) (P30-ES002109)

Public Library of Science (PLOS)

DSpace@MIT

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Analysis and modeling of the ecdysone response in Drosophila melanogaster

Author: Cortini Roberto
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 12/11/2018

Epigenetic priors for identifying active transcription factor binding sites

Author: Fabian A. Buske
Gabriel Cuellar-Partida
Robert C. McLeay
Timothy L. Bailey
Tom Whitington
William Stafford Noble
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2012

Motivation: Accurate knowledge of the genome-wide binding of transcription factors in a particular cell type or under a particular condition is necessary for understanding transcriptional regulation. Using epigenetic data, such as histone modification and DNase I accessibility data, has been shown to improve motif-based in silico methods for predicting such binding, but this approach has not yet been fully explored. Results: We describe a probabilistic method for combining one or more tracks of epigenetic data with a standard DNA sequence motif model to improve our ability to identify active transcription factor binding sites (TFBSs). We convert each data type into a position-specific probabilistic prior and combine these priors with a traditional probabilistic motif model to compute a log-posterior odds score. Our experiments, using histone modifications H3K4me1, H3K4me3, H3K9ac and H3K27ac, as well as DNase I sensitivity, show conclusively that the log-posterior odds score consistently outperforms a simple binary filter based on the same data. We also show that our approach performs competitively with a more complex method, CENTIPEDE, and suggest that the relative simplicity of the log-posterior odds scoring method makes it an appealing and very general method for identifying functional TFBSs on the basis of DNA and epigenetic evidence.
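
By Bayes' rule, posterior odds equal the likelihood ratio times the prior odds, so a log-posterior odds score can be written as a motif log-likelihood ratio plus the log-odds of a position-specific prior. The sketch below illustrates that arithmetic only. The motif, background frequencies, sequence, and prior values are invented, the function names are hypothetical, and the way real priors are derived from histone or DNase I data is not shown; this should not be read as the authors' implementation.

```python
import math

# Log-posterior odds = motif log-likelihood ratio + log prior odds.
# The motif, background, and priors below are invented example values.

BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# Hypothetical 3-position motif favouring "AGC".
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]

def log_likelihood_ratio(site):
    """log P(site | motif) - log P(site | background)."""
    return sum(math.log(col[base] / BACKGROUND[base])
               for col, base in zip(MOTIF, site))

def log_posterior_odds(site, prior):
    """Add the log-odds of the position-specific prior (0 < prior < 1)."""
    return log_likelihood_ratio(site) + math.log(prior / (1.0 - prior))

def scan(seq, priors):
    """Score every window; priors[i] applies to the window starting at i."""
    width = len(MOTIF)
    return [log_posterior_odds(seq[i:i + width], priors[i])
            for i in range(len(seq) - width + 1)]

# Two identical "AGC" matches; only the second lies in a high-prior region.
sequence = "AGCTAGC"
priors = [0.01, 0.01, 0.01, 0.3, 0.5]
scores = scan(sequence, priors)
```

In this toy case the two windows have the same sequence and therefore the same motif score, but the window with the higher prior receives the higher log-posterior odds. A binary filter would instead keep or discard each window outright, with no graded contribution from the epigenetic evidence.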

Crossref

University of Queensland eSpace