
    Extracting Molecular Properties from Natural Language with Multimodal Contrastive Learning

    Deep learning in computational biochemistry has traditionally focused on molecular graph neural representations; however, recent advances in language models highlight how much scientific knowledge is encoded in text. To bridge these two modalities, we investigate how molecular property information can be transferred from natural language to graph representations. We study property prediction performance gains after using contrastive learning to align neural graph representations with representations of textual descriptions of their characteristics. We implement neural relevance scoring strategies to improve text retrieval, introduce a novel chemically valid molecular graph augmentation strategy inspired by organic reactions, and demonstrate improved performance on downstream MoleculeNet property classification tasks. We achieve a +4.26% AUROC gain versus models pre-trained on the graph modality alone, and a +1.54% gain compared to the recently proposed contrastively trained molecular graph/text MoMu model (Su et al. 2022).
    Comment: 2023 ICML Workshop on Computational Biology
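    The contrastive alignment the abstract describes can be sketched with an InfoNCE-style objective: each molecule's graph embedding is pulled toward the embedding of its matching text description and pushed away from the other texts in the batch. This is a minimal numpy illustration of that general technique, not the authors' implementation; the function name, batch layout, and temperature value are assumptions for the example.

    ```python
    import numpy as np

    def info_nce(graph_emb, text_emb, temperature=0.1):
        """InfoNCE loss between paired graph/text embeddings.

        graph_emb, text_emb: arrays of shape (batch, dim), where row i of
        each array comes from the same molecule (positive pair).
        Illustrative sketch only, not the paper's implementation.
        """
        # Normalize so dot products are cosine similarities.
        g = graph_emb / np.linalg.norm(graph_emb, axis=1, keepdims=True)
        t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
        logits = g @ t.T / temperature  # (batch, batch) similarity matrix
        # Matching pairs sit on the diagonal; treat each row as a
        # classification over the batch and take cross-entropy.
        logits = logits - logits.max(axis=1, keepdims=True)  # stability
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))
    </code_intentionally_not_closed>
    ```

    With well-aligned pairs the diagonal dominates and the loss is near zero; shuffling the text side breaks the pairing and the loss rises, which is what gradient descent on this objective exploits to align the two modalities.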

    Genome size diversity in angiosperms and its influence on gene space

    Genome size varies c. 2400-fold in angiosperms (flowering plants), although the range of genome size is skewed towards small genomes, with a mean genome size of 1C = 5.7 Gb. One of the most crucial factors governing genome size in angiosperms is the relative amount and activity of repetitive elements. Recently, there have been new insights into how these repeats, previously dismissed as ‘junk’ DNA, can have a significant impact on gene space (i.e. the part of the genome comprising all the genes and gene-related DNA). Here we review these new findings and explore the ways in which genome size itself influences how repeats affect genome dynamics and gene space, including gene expression.