14,349 research outputs found

DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences.

Author: Quang Daniel
Xie Xiaohui
Publication venue: eScholarship, University of California
Publication date: 15/04/2016

Modeling the properties and functions of DNA sequences is an important but challenging task in the broad field of genomics. This task is particularly difficult for non-coding DNA, the vast majority of which is still poorly understood in terms of function. A powerful predictive model for the function of non-coding DNA can be of enormous benefit for both basic science and translational research because over 98% of the human genome is non-coding and 93% of disease-associated variants lie in these regions. To address this need, we propose DanQ, a novel hybrid convolutional and bi-directional long short-term memory recurrent neural network framework for predicting non-coding function de novo from sequence. In the DanQ model, the convolution layer captures regulatory motifs, while the recurrent layer captures long-term dependencies between the motifs in order to learn a regulatory 'grammar' to improve predictions. DanQ improves considerably upon other models across several metrics. For some regulatory markers, DanQ can achieve over a 50% relative improvement in the area under the precision-recall curve metric compared to related models. We have made the source code available at the GitHub repository http://github.com/uci-cbcl/DanQ.

PubMed Central

eScholarship - University of California
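The DanQ abstract describes a convolution layer that scans the sequence for regulatory motifs, followed by a bidirectional recurrent layer that models dependencies between those motifs. The pure-Python sketch below illustrates that data flow at toy scale. It is not the authors' implementation (that lives in the repository linked above): the layer sizes, the random weights and every helper name (`one_hot`, `random_params`, `danq_like_forward`) are hypothetical, and a plain tanh recurrence stands in for the LSTM cells to keep the code short.

```python
import math
import random

BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as one length-4 vector (A, C, G, T) per base; other symbols such as N become zeros."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def conv1d_relu(x, filters, biases):
    """'Valid' 1D convolution: slide each width x 4 filter along the sequence, then apply ReLU."""
    width = len(filters[0])
    out = []
    for i in range(len(x) - width + 1):
        row = []
        for f, b in zip(filters, biases):
            s = b + sum(f[j][c] * x[i + j][c] for j in range(width) for c in range(4))
            row.append(max(0.0, s))
        out.append(row)
    return out


def max_pool(x, size):
    """Non-overlapping max pooling along positions (stride equal to the window size)."""
    return [[max(col) for col in zip(*x[i:i + size])]
            for i in range(0, len(x) - size + 1, size)]


def rnn_pass(x, w_in, w_rec):
    """Simple tanh recurrence (a stand-in for an LSTM); returns the hidden state at every step."""
    hidden = len(w_rec)
    h = [0.0] * hidden
    states = []
    for v in x:
        h = [math.tanh(sum(wi * vi for wi, vi in zip(w_in[k], v)) +
                       sum(wr * hm for wr, hm in zip(w_rec[k], h)))
             for k in range(hidden)]
        states.append(h)
    return states


def danq_like_forward(seq, params):
    """Conv -> ReLU -> max-pool -> bidirectional recurrence -> dense -> per-target sigmoid."""
    x = max_pool(conv1d_relu(one_hot(seq), params["filters"], params["conv_bias"]), params["pool"])
    fwd = rnn_pass(x, params["w_in_f"], params["w_rec_f"])
    bwd = rnn_pass(x[::-1], params["w_in_b"], params["w_rec_b"])[::-1]  # re-align to forward order
    feats = [v for f, b in zip(fwd, bwd) for v in f + b]  # concatenate both directions, flatten
    logits = [sum(w * v for w, v in zip(row, feats)) for row in params["w_out"]]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]  # independent probability per target


def random_params(seq_len, n_filters=4, width=6, pool=3, hidden=5, n_targets=3, seed=0):
    """Hypothetical toy-sized random weights; a real model would learn these from labeled data."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    pooled = (seq_len - width + 1) // pool
    return {
        "filters": [mat(width, 4) for _ in range(n_filters)],
        "conv_bias": [0.0] * n_filters,
        "pool": pool,
        "w_in_f": mat(hidden, n_filters), "w_rec_f": mat(hidden, hidden),
        "w_in_b": mat(hidden, n_filters), "w_rec_b": mat(hidden, hidden),
        "w_out": mat(n_targets, pooled * 2 * hidden),
    }


seq = "ACGTTGCAAGTCCGATAGGCTTACGATCGA"
probs = danq_like_forward(seq, random_params(len(seq)))
print(len(probs))  # 3 targets, each a value between 0 and 1
```

A convolution filter acts like a learned position weight matrix: a filter equal to the one-hot encoding of a motif scores highest where that motif occurs, which is why the abstract says this layer "captures regulatory motifs". Sigmoid outputs (rather than a softmax) let each regulatory marker be predicted independently.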

Meta-analysis of massively parallel reporter assays enables prediction of regulatory function across cell types.

Author: Ahituv Nadav
Kreimer Anat
Yan Zhongxia
Yosef Nir
Publication venue: eScholarship, University of California
Publication date: 01/09/2019

Deciphering the potential of noncoding loci to influence gene regulation has been the subject of intense research, with important implications for understanding the genetic underpinnings of human diseases. Massively parallel reporter assays (MPRAs) can measure the regulatory activity of thousands of DNA sequences and their variants in a single experiment. With an increasing number of publicly available MPRA data sets, one can now develop data-driven models which, given a DNA sequence, predict its regulatory activity. Here, we performed a comprehensive meta-analysis of several MPRA data sets in a variety of cellular contexts. We first applied an ensemble of methods to predict MPRA output in each context and observed that the most predictive features are consistent across data sets. We then demonstrate that predictive models trained in one cellular context can be used to predict MPRA output in another, with loss of accuracy attributed to cell-type-specific features. Finally, we show that our approach achieves top performance in the Fifth Critical Assessment of Genome Interpretation "Regulation Saturation" Challenge for predicting effects of single-nucleotide variants. Overall, our analysis provides insights into how MPRA data can be leveraged to highlight functional regulatory regions throughout the genome and can guide effective design of future experiments by better prioritizing regions of interest.

eScholarship - University of California
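The MPRA meta-analysis trains models that map a DNA sequence to its measured regulatory activity, which first requires turning each sequence into numeric features. The abstract does not list the paper's feature set, so the sketch below assumes one commonly used choice: counts of canonical k-mers, where each k-mer is merged with its reverse complement. `kmer_features` and `canonical` are hypothetical helper names, and any regression model could consume the resulting vectors.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Merge a k-mer with its reverse complement by keeping the lexicographically smaller one."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_features(seq, k=3):
    """Return (vocabulary, counts): canonical k-mer counts in a fixed, sorted order.

    Windows containing anything other than A/C/G/T (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = Counter(
        canonical(seq[i:i + k])
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    vocab = sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})
    return vocab, [counts[key] for key in vocab]


vocab, vec = kmer_features("ACGTNACGT", k=2)
print(dict(zip(vocab, vec))["AC"])  # 4: AC appears twice and its reverse complement GT twice
```

Using a fixed, sorted vocabulary keeps feature columns aligned across data sets, which is what makes it possible to train a model in one cellular context and apply it in another.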

Workshop during the Pacific Symposium of Biocomputing, Jan 3-7, 2019: Reading between the genes: interpreting non-coding DNA in high-throughput

Author: Berghout Joanne
Bulyk Martha L
Kann Maricel G
Lussier Yves A
Moore Jason H
Vitali Francesca
Publication venue: World Scientific Pub Co Pte Ltd
Publication date: 01/01/2019

Identifying functional elements and predicting mechanistic insight from non-coding DNA and non-coding variation remains a challenge. Advances in genome-scale, high-throughput technology, however, have brought these answers closer within reach than ever, though there is still a need for new computational approaches to analysis and integration. This workshop aims to explore these resources and new computational methods applied to regulatory elements, chromatin interactions, non-protein-coding genes, and other non-coding DNA.

Crossref

The University of Arizona

Landscape of stimulation-responsive chromatin across diverse human immune cells.

Author: Anderson Mark S
Blaeschke Franziska
Burt Trevor D
Calderon Diego
Criswell Lindsey A
Gao Ziyue
Greenleaf William J
Kathiria Arwa
Knowles David A
Lescano Ninnia
Marson Alexander
Mezger Anja
Müller Fabian
Nguyen Michelle LT
Nguyen Vinh
Parent Audrey V
Pritchard Jonathan K
Ribado Jessica V
Trombetta John
Wu Beijing
Publication venue: eScholarship, University of California
Publication date: 01/10/2019

A hallmark of the immune system is the interplay among specialized cell types transitioning between resting and stimulated states. The gene regulatory landscape of this dynamic system has not been fully characterized in human cells. Here we collected assay for transposase-accessible chromatin using sequencing (ATAC-seq) and RNA sequencing data under resting and stimulated conditions for up to 32 immune cell populations. Stimulation caused widespread chromatin remodeling, including response elements shared between stimulated B and T cells. Furthermore, several autoimmune traits showed significant heritability in stimulation-responsive elements from distinct cell types, highlighting the importance of these cell states in autoimmunity. Allele-specific read mapping identified variants that alter chromatin accessibility in particular conditions, allowing us to observe evidence of function for a candidate causal variant that is undetected by existing large-scale studies in resting cells. Our results provide a resource of chromatin dynamics and highlight the need to characterize the effects of genetic variation in stimulated cells.

eScholarship - University of California
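Allele-specific read mapping, as used in the immune-cell ATAC-seq study above, asks whether the reads covering a heterozygous site favor one allele more than the expected 50:50 split. A minimal sketch, assuming a two-sided exact binomial test (the abstract does not say which test the authors used); `allelic_imbalance_pvalue` and `binom_logpmf` are hypothetical names.

```python
import math


def binom_logpmf(k, n, p):
    """log P(X = k) for X ~ Binomial(n, p), computed with lgamma to avoid overflow at high depth."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def allelic_imbalance_pvalue(ref_reads, alt_reads, p=0.5):
    """Two-sided exact binomial test of ref_reads out of (ref_reads + alt_reads) against ratio p.

    The p-value sums the probabilities of every outcome no more likely than the observed one.
    Requires 0 < p < 1.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no informative reads
    pmf = [math.exp(binom_logpmf(k, n, p)) for k in range(n + 1)]
    cutoff = pmf[ref_reads] * (1 + 1e-7)  # tolerance so floating-point ties count as extreme
    return min(1.0, sum(q for q in pmf if q <= cutoff))


print(allelic_imbalance_pvalue(2, 8))  # about 0.109 (112/1024)
```

Real allele-specific pipelines typically also correct for reference mapping bias and for overdispersion in read counts (for example with beta-binomial models), neither of which this sketch handles.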

The impact of short tandem repeat variation on gene expression.

Author: Fotsing Stephanie Feupe
Goren Alon
Gymrek Melissa
Margoliash Jonathan
Saini Shubham
Shleizer-Burko Sharona
Wang Catherine
Yanicky Richard
Publication venue: eScholarship, University of California
Publication date: 01/11/2019

Short tandem repeats (STRs) have been implicated in a variety of complex traits in humans. However, genome-wide studies of the effects of STRs on gene expression thus far have had limited power to detect associations and provide insights into putative mechanisms. Here, we leverage whole-genome sequencing and expression data for 17 tissues from the Genotype-Tissue Expression Project to identify more than 28,000 STRs for which repeat number is associated with expression of nearby genes (eSTRs). We use fine-mapping to quantify the probability that each eSTR is causal and characterize the top 1,400 fine-mapped eSTRs. We identify hundreds of eSTRs linked with published genome-wide association study signals and implicate specific eSTRs in complex traits, including height, schizophrenia, inflammatory bowel disease and intelligence. Overall, our results support the hypothesis that eSTRs contribute to a range of human phenotypes, and our data should serve as a valuable resource for future studies of complex traits.

eScholarship - University of California
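An eSTR is an STR whose repeat number is associated with the expression of a nearby gene. The sketch below shows the core of such a test as a simple per-STR linear regression, assuming (hypothetically) that each individual's genotype is summarized as the total repeat count across both alleles. The study's actual model, covariates and fine-mapping step are not reproduced, and `str_dosage` and `linear_association` are illustrative names.

```python
import math


def str_dosage(allele1_repeats, allele2_repeats):
    """Hypothetical genotype summary: total repeat count across both alleles."""
    return allele1_repeats + allele2_repeats


def linear_association(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (slope b, intercept a, Pearson r)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or expression does not vary")
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)


genotypes = [(10, 10), (10, 12), (12, 12), (12, 14), (14, 14)]  # toy repeat counts
expression = [1.0, 1.5, 2.0, 2.5, 3.0]                            # toy expression values
dosages = [str_dosage(a, b) for a, b in genotypes]
print(linear_association(dosages, expression))  # (0.25, -4.0, 1.0)
```

In a genome-wide scan this test would be repeated for every STR and nearby gene, typically with covariate adjustment and multiple-testing correction before any fine-mapping.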

Genetic determinants of co-accessible chromatin regions in activated T cells across humans.

Author: Atsede Siba
Aviv Regev
Aviva P. Aiden
Christine S. Cheng
Christophe Benoist
Chun J. Ye
Dmytro Lituiev
Erez Lieberman Aiden
Howard Y. Chang
Ido Machol
Ivo Wortman
Kendrick L. Hougen
M. Grace Gordon
Marcin Tabaka
Meena Subramaniam
Michael A. Beer
Muhammad Shamim
Neva C. Durand
Philip L. De Jager
Rachel E. Gate
Su-Chen Huang
Ting Feng
Publication venue: eScholarship, University of California
Publication date: 01/08/2018

Over 90% of genetic variants associated with complex human traits map to non-coding regions, but little is understood about how they modulate gene regulation in health and disease. One possible mechanism is that genetic variants affect the activity of one or more cis-regulatory elements leading to gene expression variation in specific cell types. To identify such cases, we analyzed ATAC-seq and RNA-seq profiles from stimulated primary CD4+ T cells in up to 105 healthy donors. We found that regions of accessible chromatin (ATAC-peaks) are co-accessible at kilobase and megabase resolution, consistent with the three-dimensional chromatin organization measured by in situ Hi-C in T cells. Fifteen percent of genetic variants located within ATAC-peaks affected the accessibility of the corresponding peak (local-ATAC-QTLs). Local-ATAC-QTLs have the largest effects on co-accessible peaks, are associated with gene expression and are enriched for autoimmune disease variants. Our results provide insights into how natural genetic variants modulate cis-regulatory elements, in isolation or in concert, to influence gene expression.

Crossref

eScholarship - University of California
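In the T-cell study, two ATAC-peaks are co-accessible when their accessibility rises and falls together across donors. A minimal sketch, assuming co-accessibility is scored as the Pearson correlation of per-donor accessibility between peaks on the same chromosome within a distance cutoff; the peak tuple layout, the thresholds and the function names are hypothetical, and the study's normalization and significance testing are not shown.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length lists; returns 0.0 if either is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # undefined correlation treated as "not co-accessible"
    return cov / math.sqrt(va * vb)


def coaccessible_pairs(peaks, max_distance, min_r):
    """peaks: list of (name, chrom, center_bp, per_donor_accessibility).

    Returns (name1, name2, r) for same-chromosome pairs within max_distance bp
    whose accessibility correlates across donors with r >= min_r.
    """
    hits = []
    for (n1, c1, p1, a1), (n2, c2, p2, a2) in combinations(peaks, 2):
        if c1 != c2 or abs(p1 - p2) > max_distance:
            continue
        r = pearson(a1, a2)
        if r >= min_r:
            hits.append((n1, n2, r))
    return hits


peaks = [
    ("peak1", "chr1", 10_000, [1.0, 2.0, 3.0, 4.0]),
    ("peak2", "chr1", 14_000, [2.0, 4.0, 6.0, 8.0]),
    ("peak3", "chr1", 3_000_000, [1.0, 2.0, 3.0, 4.0]),
    ("peak4", "chr2", 10_000, [4.0, 3.0, 2.0, 1.0]),
]
print(coaccessible_pairs(peaks, max_distance=1_000_000, min_r=0.8))  # [('peak1', 'peak2', 1.0)]
```

Raising `max_distance` from the kilobase to the megabase range is how a sketch like this would probe the two resolutions mentioned in the abstract.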

Learning the Regulatory Code of Gene Expression

Author: Buric Filip
Garcia Victor
Kokina Mariia
Zelezniak Aleksej
Zrimec Jan
Publication venue: Frontiers Media SA
Publication date: 01/01/2021

Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states as well as mRNA and protein levels. Deep neural networks automatically learn informative sequence representations, and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology.

PubMed Central

Chalmers Research

ZHAW digitalcollection

Online Research Database In Technology
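One common way to interpret a trained sequence model, in the spirit of decoding the cis-regulatory grammar as the review above describes, is in silico saturation mutagenesis: score every possible single-nucleotide substitution by the change it causes in the model's prediction. The sketch assumes a model is any function from a DNA string to a number; `toy_model` and its motif are arbitrary placeholders for a trained network, not taken from the review.

```python
BASES = "ACGT"


def toy_model(seq):
    """Placeholder for a trained predictor: counts occurrences of an arbitrary example motif."""
    motif = "GATA"
    return float(sum(seq[i:i + len(motif)] == motif for i in range(len(seq) - len(motif) + 1)))


def saturation_mutagenesis(seq, model):
    """Map (position, alt_base) -> model(mutant) - model(reference) for every possible SNV."""
    ref_score = model(seq)
    effects = {}
    for i, ref in enumerate(seq):
        for alt in BASES:
            if alt != ref:
                mutant = seq[:i] + alt + seq[i + 1:]
                effects[(i, alt)] = model(mutant) - ref_score
    return effects


effects = saturation_mutagenesis("CGATAC", toy_model)
print(effects[(1, "A")])  # -1.0: mutating the motif's G destroys the only GATA match
```

Positions where most substitutions sharply lower the score mark the sequence elements the model relies on; with a real network, the same loop simply calls the trained predictor instead of `toy_model`.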