Search CORE

457 research outputs found

Tartarus: A Benchmarking Platform for Realistic And Practical Inverse Molecular Design

Author: Aspuru-Guzik Alan
Jorner Kjell
Kundaje Anshul
Nigam AkshatKumar
Pollice Robert
Thiede Luca A.
Tom Gary
Publication date: 23/09/2023

The efficient exploration of chemical space to design molecules with intended properties enables the accelerated discovery of drugs, materials, and catalysts, and is one of the most important outstanding challenges in chemistry. Encouraged by the recent surge in computer power and artificial intelligence development, many algorithms have been developed to tackle this problem. However, despite the emergence of many new approaches in recent years, comparatively little progress has been made in developing realistic benchmarks that reflect the complexity of molecular design for real-world applications. In this work, we develop a set of practical benchmark tasks relying on physical simulation of molecular systems mimicking real-life molecular design problems for materials, drugs, and chemical reactions. Additionally, we demonstrate the utility and ease of use of our new benchmark set by showing how to compare the performance of several well-established families of algorithms. Surprisingly, we find that model performance can strongly depend on the benchmark domain. We believe that our benchmark suite will help move the field towards more realistic molecular design benchmarks, and move the development of inverse molecular design algorithms closer to designing molecules that solve existing problems in both academia and industry. Comment: 29+21 pages, 6+19 figures, 6+2 tables

arXiv.org e-Print Archive

From a Conceptual Model to a Knowledge Graph for Genomic Datasets

Author: A Bernasconi
A Kundaje
A Messina
AL Palacio
AM Martínez Ferrandis
J Hammer
JF Reyes Román
M Masseroli
MA Jensen
V Bonnici
ZD Stephens
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Data access at genomic repositories is problematic, as data is described by heterogeneous and hardly comparable metadata. We previously introduced a unified conceptual schema, collected metadata in a single repository and provided classical search methods upon them. We here propose a new paradigm to support semantic search of integrated genomic metadata, based on the Genomic Knowledge Graph, a semantic graph of genomic terms and concepts, which combines the original information provided by each source with curated terminological content from specialized ontologies. Commercial knowledge-assisted search is designed for transparently supporting keyword-based search without explaining inferences; in biology, inference understanding is instead critical. For this reason, we propose a graph-based visual search for data exploration; some expert users can navigate the semantic graph along the conceptual schema, enriched with simple forms of homonyms and term hierarchies, thus understanding the semantic reasoning behind query results.

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

Detection of regulator genes and eQTLs in gene networks

Author: A Butte
A Chatr-Aryamontri
A Clauset
A Joshi
A Kundaje
AA Shabalin
AJ Enright
AJ Walhout
AS Dimas
B Schwanhausser
B Zhang
C Cenik
CO Daub
D Koller
DA Cusanovich
DM Greenawalt
E Bonnet
E Ravasz
E Segal
EC Neto
EE Schadt
EJ Foss
F Grubert
F Yue
FA Cubillos
FW Albert
G Hemani
G Nicholson
GD Smith
GH Golub
H Foroughi Asl
H Talukdar
HN Kadarmideen
J Millstein
J Qi
J Zhu
JE Aten
JF Ayroles
JJ Faith
JL Björkegren
JS Liu
K Basso
K Qu
KG Ardlie
L Wu
LA Hindorff
LH Hartwell
LS Chen
M Ashburner
M Civelek
M Georges
M Gerstein
M Medvedovic
M Schmidt
M Scutari
MA Schaub
MB Eisen
MD Ritchie
ME Goddard
MEJ Newman
MV Rockman
N Friedman
N Laird
O Stegle
P Langfelder
P Lu
R Sharan
RB Brem
RW Williams
S Lee
S Roy
S Tavazoie
SI Lee
SM Waszak
SS Rao
T Lappalainen
T Michoel
TA Manolio
TF Mackay
The ENCODE
TS Furey
VG Cheung
W Cookson
W Zhang
Y Chen
Y Li
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/11/2016

Genetic differences between individuals associated with quantitative phenotypic traits, including disease states, are usually found in non-coding genomic regions. These genetic variants are often also associated with differences in expression levels of nearby genes (they are "expression quantitative trait loci" or eQTLs for short) and presumably play a gene regulatory role, affecting the status of molecular networks of interacting genes, proteins and metabolites. Computational systems biology approaches to reconstruct causal gene networks from large-scale omics data have therefore become essential to understand the structure of networks controlled by eQTLs together with other regulatory genes, and to generate detailed hypotheses about the molecular mechanisms that lead from genotype to phenotype. Here we review the main analytical methods and software to identify eQTLs and their associated genes, to reconstruct co-expression networks and modules, to reconstruct causal Bayesian gene and module networks, and to validate predicted networks in silico. Comment: minor revision with typos corrected; review article; 24 pages, 2 figures

arXiv.org e-Print Archive

Crossref

Differential viral accessibility (DIVA) identifies alterations in chromatin architecture through large-scale mapping of lentiviral integration sites.

Author: A Kundaje
AP Boyle
AR Quinlan
DE Schones
G Gargiulo
IA Tchasovnikarova
Iva A. Tchasovnikarova
JD Buenrostro
JE Carette
LT Jae
M Kvaratskhelia
M Tsompana
Paul J. Lehner
PB Chen
PG Giresi
Richard T. Timms
RT Timms
VA Blomen
Z Debyser
Publication venue: Nat Protoc
Publication date: 01/01/2019

Alterations in chromatin structure play a major role in the epigenetic regulation of gene expression. Here, we describe a step-by-step protocol for differential viral accessibility (DIVA), a method for identifying changes in chromatin accessibility genome-wide. Commonly used methods for mapping accessible genomic loci have strong preferences toward detecting 'open' chromatin found at regulatory regions but are not well suited to studying chromatin accessibility in gene bodies and intergenic regions. DIVA overcomes this limitation, enabling a broader range of sites to be interrogated. Conceptually, DIVA is similar to ATAC-seq in that it relies on the integration of exogenous DNA into the genome to map accessible chromatin, except that chromatin architecture is probed through mapping integration sites of exogenous lentiviruses. An isogenic pair of cell lines is transduced with a lentiviral vector, followed by PCR amplification and Illumina sequencing of virus-genome junctions; the resulting sequences define a set of unique lentiviral integration sites, which are compared to determine whether genomic loci exhibit significantly altered accessibility between experimental and control cells. Experienced researchers will take 6 d to generate lentiviral stocks and transduce the target cells, a further 5 d to prepare the Illumina sequencing libraries and a few hours to perform the bioinformatic analysis.

Crossref

Apollo (Cambridge)

Quantitative real-time PCR assisted cell counting (qPACC) for epigenetic-based immune cell quantification in blood and tissue

Author: A Kundaje
DJ Huss
G Wieczorek
I Turbachova
J Sehouli
K Schildknecht
S Steinfelder
U Baron
Y Kitagawa
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Genome-wide enhancer maps link risk variants to disease genes

Author: Bergman DT
Collins RL
Cui A
Daly MJ
Dey K
Doughty BR
Eisenhaure TM
Engreitz JM
Epstein CB
Finucane HK
Fulco CP
Guckelberger P
Hacohen N
Huang HL
Jones TR
Kane M
Kang HY
Kundaje A
Lander ES
Lekschas F
Mualim K
Munson G
Nasser J
Natri HM
Nguyen TH
Patwardhan TA
Pfister H
Price AL
Ray JP
Ulirsch JC
Weeks EM
Xavier RJ
Publication date: 13/05/2021

Genome-wide association studies (GWAS) have identified thousands of noncoding loci that are associated with human diseases and complex traits, each of which could reveal insights into the mechanisms of disease(1). Many of the underlying causal variants may affect enhancers(2,3), but we lack accurate maps of enhancers and their target genes to interpret such variants. We recently developed the activity-by-contact (ABC) model to predict which enhancers regulate which genes and validated the model using CRISPR perturbations in several cell types(4). Here we apply this ABC model to create enhancer-gene maps in 131 human cell types and tissues, and use these maps to interpret the functions of GWAS variants. Across 72 diseases and complex traits, ABC links 5,036 GWAS signals to 2,249 unique genes, including a class of 577 genes that appear to influence multiple phenotypes through variants in enhancers that act in different cell types. In inflammatory bowel disease (IBD), causal variants are enriched in predicted enhancers by more than 20-fold in particular cell types such as dendritic cells, and ABC achieves higher precision than other regulatory methods at connecting noncoding variants to target genes. These variant-to-function maps reveal an enhancer that contains an IBD risk variant and that regulates the expression of PPIF to alter the membrane potential of mitochondria in macrophages. Our study reveals principles of genome regulation, identifies genes that affect IBD and provides a resource and generalizable strategy to connect risk variants of common diseases to their molecular and cellular functions. Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Lineage-specific dynamic and pre-established enhancer–promoter contacts cooperate in terminal differentiation

Chromosome conformation is an important feature of metazoan gene regulation; however, enhancer–promoter contact remodeling during cellular differentiation remains poorly understood. To address this, genome-wide promoter capture Hi-C (CHi-C) was performed during epidermal differentiation. Two classes of enhancer–promoter contacts associated with differentiation-induced genes were identified. The first class ('gained') increased in contact strength during differentiation in concert with enhancer acquisition of the H3K27ac activation mark. The second class ('stable') were pre-established in undifferentiated cells, with enhancers constitutively marked by H3K27ac. The stable class was associated with the canonical conformation regulator cohesin, whereas the gained class was not, implying distinct mechanisms of contact formation and regulation. Analysis of stable enhancers identified a new, essential role for a constitutively expressed, lineage-restricted ETS-family transcription factor, EHF, in epidermal differentiation. Furthermore, neither class of contacts was observed in pluripotent cells, suggesting that lineage-specific chromatin structure is established in tissue progenitor cells and is further remodeled in terminal differentiation.

Crossref

UCL Discovery

Defining functional DNA elements in the human genome

Author: Bernstein B. E.
Birney E.
Crawford G. E.
Dekker J.
Dunham I.
Elnitski L. L.
Farnham P. J.
Feingold E. A.
Gerstein M.
Giddings M. C.
Gilbert D. M.
Gingeras T. R.
Green E. D.
Guigo R.
Hardison R. C.
Hubbard T.
Kellis M.
Kent J.
Kundaje A.
Lieb J. D.
Marinov G. K.
Myers R. M.
Pazin M. J.
Ren B.
Snyder M. P.
Stamatoyannopoulos J. A.
Ward L. D.
Weng Z. P.
White K. P.
Wold B.
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 01/04/2014

With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease.

Cold Spring Harbor Laboratory Institutional Repository