Extracting Molecular Properties from Natural Language with Multimodal Contrastive Learning
Deep learning in computational biochemistry has traditionally focused on
neural representations of molecular graphs; however, recent advances in language
models highlight how much scientific knowledge is encoded in text. To bridge
these two modalities, we investigate how molecular property information can be
transferred from natural language to graph representations. We study property
prediction performance gains after using contrastive learning to align neural
graph representations with representations of textual descriptions of their
characteristics. We implement neural relevance scoring strategies to improve
text retrieval, introduce a novel chemically-valid molecular graph augmentation
strategy inspired by organic reactions, and demonstrate improved performance on
downstream MoleculeNet property classification tasks. We achieve a +4.26% AUROC
gain versus models pre-trained on the graph modality alone, and a +1.54% gain
compared to the recently proposed molecular graph/text contrastively trained MoMu
model (Su et al. 2022).
Comment: 2023 ICML Workshop on Computational Biology
The genomic diversification of grapevine clones.
BACKGROUND: Vegetatively propagated clones accumulate somatic mutations. This study aimed to better characterize clone diversity by defining the nature of somatic mutations throughout the genome. Fifteen Zinfandel winegrape clone genomes were sequenced and compared to one another using a highly contiguous genome reference produced from one of the clones, Zinfandel 03. RESULTS: Though most heterozygous variants were shared, somatic mutations accumulated in individual clones and in subsets of clones. Overall, heterozygous mutations were most frequent in intergenic space and more frequent in introns than exons. A significantly larger percentage of CpG, CHG, and CHH sites in repetitive intergenic space experienced transition mutations than in genic and non-repetitive intergenic spaces, likely because of higher levels of methylation in the region and because methylated cytosines often spontaneously deaminate. Of the minority of mutations that occurred in exons, a larger proportion were putatively deleterious when they occurred in relatively few clones. CONCLUSIONS: These data support three major conclusions. First, repetitive intergenic space is a major driver of clone genome diversification. Second, clones accumulate putatively deleterious mutations. Third, the data suggest selection against deleterious variants in coding regions or some mechanism by which mutations are less frequent in coding than noncoding regions of the genome.
Differential Dynamics of Transposable Elements during Long-Term Diploidization of Nicotiana Section Repandae (Solanaceae) Allopolyploid Genomes
PubMed ID: 23185607
This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Genome size diversity in angiosperms and its influence on gene space
Genome size varies c. 2400-fold in angiosperms (flowering plants), although the range of genome size is skewed towards small genomes, with a mean genome size of 1C = 5.7 Gb. One of the most crucial factors governing genome size in angiosperms is the relative amount and activity of repetitive elements. Recently, there have been new insights into how these repeats, previously discarded as ‘junk’ DNA, can have a significant impact on gene space (i.e. the part of the genome comprising all the genes and gene-related DNA). Here we review these new findings and explore how genome size itself influences the way repeats affect genome dynamics and gene space, including gene expression.