Tartarus: A Benchmarking Platform for Realistic And Practical Inverse Molecular Design
The efficient exploration of chemical space to design molecules with intended
properties enables the accelerated discovery of drugs, materials, and
catalysts, and is one of the most important outstanding challenges in
chemistry. Encouraged by the recent surge in computer power and artificial
intelligence development, many algorithms have been developed to tackle this
problem. However, despite the emergence of many new approaches in recent years,
comparatively little progress has been made in developing realistic benchmarks
that reflect the complexity of molecular design for real-world applications. In
this work, we develop a set of practical benchmark tasks relying on physical
simulation of molecular systems mimicking real-life molecular design problems
for materials, drugs, and chemical reactions. Additionally, we demonstrate the
utility and ease of use of our new benchmark set by showing how to
compare the performance of several well-established families of algorithms.
Surprisingly, we find that model performance can strongly depend on the
benchmark domain. We believe that our benchmark suite will help move the field
towards more realistic molecular design benchmarks, and move the development of
inverse molecular design algorithms closer to designing molecules that solve
existing problems in both academia and industry.
Comment: 29+21 pages, 6+19 figures, 6+2 tables
From a Conceptual Model to a Knowledge Graph for Genomic Datasets
Data access at genomic repositories is problematic, as data
is described by heterogeneous and hardly comparable metadata. We previously
introduced a unified conceptual schema, collected metadata in a
single repository and provided classical search methods upon them. We
here propose a new paradigm to support semantic search of integrated
genomic metadata, based on the Genomic Knowledge Graph, a semantic
graph of genomic terms and concepts, which combines the original
information provided by each source with curated terminological content
from specialized ontologies.
Commercial knowledge-assisted search is designed for transparently
supporting keyword-based search without explaining inferences; in biology,
inference understanding is instead critical. For this reason, we propose
a graph-based visual search for data exploration; some expert users
can navigate the semantic graph along the conceptual schema, enriched
with simple forms of homonyms and term hierarchies, thus understanding
the semantic reasoning behind query results.
Detection of regulator genes and eQTLs in gene networks
Genetic differences between individuals associated with quantitative phenotypic
traits, including disease states, are usually found in non-coding genomic
regions. These genetic variants are often also associated with differences in
expression levels of nearby genes (they are "expression quantitative trait
loci" or eQTLs for short) and presumably play a gene regulatory role, affecting
the status of molecular networks of interacting genes, proteins and
metabolites. Computational systems biology approaches to reconstruct causal
gene networks from large-scale omics data have therefore become essential to
understand the structure of networks controlled by eQTLs together with other
regulatory genes, and to generate detailed hypotheses about the molecular
mechanisms that lead from genotype to phenotype. Here we review the main
analytical methods and software to identify eQTLs and their associated genes,
to reconstruct co-expression networks and modules, to reconstruct causal
Bayesian gene and module networks, and to validate predicted networks in
silico.
Comment: minor revision with typos corrected; review article; 24 pages, 2 figures
Differential viral accessibility (DIVA) identifies alterations in chromatin architecture through large-scale mapping of lentiviral integration sites.
Alterations in chromatin structure play a major role in the epigenetic regulation of gene expression. Here, we describe a step-by-step protocol for differential viral accessibility (DIVA), a method for identifying changes in chromatin accessibility genome-wide. Commonly used methods for mapping accessible genomic loci have strong preferences toward detecting 'open' chromatin found at regulatory regions but are not well suited to studying chromatin accessibility in gene bodies and intergenic regions. DIVA overcomes this limitation, enabling a broader range of sites to be interrogated. Conceptually, DIVA is similar to ATAC-seq in that it relies on the integration of exogenous DNA into the genome to map accessible chromatin, except that chromatin architecture is probed through mapping integration sites of exogenous lentiviruses. An isogenic pair of cell lines is transduced with a lentiviral vector, followed by PCR amplification and Illumina sequencing of virus-genome junctions; the resulting sequences define a set of unique lentiviral integration sites, which are compared to determine whether genomic loci exhibit significantly altered accessibility between experimental and control cells. Experienced researchers will take 6 d to generate lentiviral stocks and transduce the target cells, a further 5 d to prepare the Illumina sequencing libraries and a few hours to perform the bioinformatic analysis.
Genome-wide enhancer maps link risk variants to disease genes
Genome-wide association studies (GWAS) have identified thousands of noncoding loci that are associated with human diseases and complex traits, each of which could reveal insights into the mechanisms of disease (1). Many of the underlying causal variants may affect enhancers (2,3), but we lack accurate maps of enhancers and their target genes to interpret such variants. We recently developed the activity-by-contact (ABC) model to predict which enhancers regulate which genes and validated the model using CRISPR perturbations in several cell types (4). Here we apply this ABC model to create enhancer-gene maps in 131 human cell types and tissues, and use these maps to interpret the functions of GWAS variants. Across 72 diseases and complex traits, ABC links 5,036 GWAS signals to 2,249 unique genes, including a class of 577 genes that appear to influence multiple phenotypes through variants in enhancers that act in different cell types. In inflammatory bowel disease (IBD), causal variants are enriched in predicted enhancers by more than 20-fold in particular cell types such as dendritic cells, and ABC achieves higher precision than other regulatory methods at connecting noncoding variants to target genes. These variant-to-function maps reveal an enhancer that contains an IBD risk variant and that regulates the expression of PPIF to alter the membrane potential of mitochondria in macrophages. Our study reveals principles of genome regulation, identifies genes that affect IBD and provides a resource and generalizable strategy to connect risk variants of common diseases to their molecular and cellular functions.
Peer reviewed
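The abstract names the activity-by-contact idea without giving a formula. As a minimal sketch, assuming the commonly described form in which each candidate enhancer's activity is multiplied by its contact frequency with the gene's promoter and then normalized over all candidate elements near that gene, the score for one gene could be computed as follows. The function name `abc_scores` and the input dictionaries are hypothetical, and the numbers in the usage note are made up for illustration only.

```python
def abc_scores(activity: dict, contact: dict) -> dict:
    """Assumed ABC-style score for one gene.

    activity: enhancer id -> activity estimate (for example, from
              accessibility and H3K27ac signal)
    contact:  enhancer id -> contact frequency with the gene's promoter
    Returns each enhancer's share of the summed activity x contact.
    """
    products = {e: activity[e] * contact[e] for e in activity}
    total = sum(products.values())
    if total == 0:
        # No active, contacting element: every score is zero
        return {e: 0.0 for e in products}
    return {e: p / total for e, p in products.items()}
```

For example, `abc_scores({"e1": 3.0, "e2": 1.0}, {"e1": 1.0, "e2": 1.0})` gives `e1` a score of 0.75 and `e2` a score of 0.25, so the scores for a gene always sum to 1 when at least one element contributes.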
Lineage-specific dynamic and pre-established enhancer–promoter contacts cooperate in terminal differentiation
Chromosome conformation is an important feature of metazoan gene regulation; however, enhancer–promoter contact remodeling during cellular differentiation remains poorly understood. To address this, genome-wide promoter capture Hi-C (CHi-C) was performed during epidermal differentiation. Two classes of enhancer–promoter contacts associated with differentiation-induced genes were identified. The first class ('gained') increased in contact strength during differentiation in concert with enhancer acquisition of the H3K27ac activation mark. The second class ('stable') were pre-established in undifferentiated cells, with enhancers constitutively marked by H3K27ac. The stable class was associated with the canonical conformation regulator cohesin, whereas the gained class was not, implying distinct mechanisms of contact formation and regulation. Analysis of stable enhancers identified a new, essential role for a constitutively expressed, lineage-restricted ETS-family transcription factor, EHF, in epidermal differentiation. Furthermore, neither class of contacts was observed in pluripotent cells, suggesting that lineage-specific chromatin structure is established in tissue progenitor cells and is further remodeled in terminal differentiation.
Defining functional DNA elements in the human genome
With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease.