Search CORE

206 research outputs found

Trial2Vec: Zero-Shot Clinical Trial Document Similarity Search using Self-Supervision

Author: Sun Jimeng
Wang Zifeng
Publication date: 09/10/2022

Clinical trials are essential for drug development but are extremely expensive and time-consuming to conduct. It is beneficial to study similar historical trials when designing a clinical trial. However, lengthy trial documents and a lack of labeled data make trial similarity search difficult. We propose a zero-shot clinical trial retrieval method, Trial2Vec, which learns through self-supervision without requiring annotations of similar clinical trials. Specifically, the meta-structure of trial documents (e.g., title, eligibility criteria, target disease) and clinical knowledge (e.g., the UMLS knowledge base, https://www.nlm.nih.gov/research/umls/index.html) are leveraged to automatically generate contrastive samples. Moreover, Trial2Vec encodes trial documents with their meta-structure in mind, producing compact embeddings that aggregate multi-aspect information from the whole document. We show by visualization that our method yields medically interpretable embeddings, and it achieves a 15% average improvement over the best baselines on precision/recall for trial retrieval, evaluated on our 1,600 labeled trial pairs. In addition, we show that the pre-trained embeddings benefit the downstream trial outcome prediction task over 240k trials. Software is available at https://github.com/RyanWangZf/Trial2Vec.
Comment: Findings of EMNLP 202

arXiv.org e-Print Archive

CORE: Automatic Molecule Optimization Using Copy & Refine Strategy

Author: Fu Tianfan
Sun Jimeng
Xiao Cao
Publication date: 23/11/2019

Molecule optimization is about generating a molecule Y with more desirable properties based on an input molecule X. The state-of-the-art approaches partition the molecules into a large set of substructures S and grow the new molecule structure by iteratively predicting which substructure from S to add. However, since the set of available substructures S is large, such an iterative prediction task is often inaccurate, especially for substructures that are infrequent in the training data. To address this challenge, we propose a new generation strategy called "Copy & Refine" (CORE), where at each step the generator first decides whether to copy an existing substructure from the input X or to generate a new substructure; the most promising substructure is then added to the new molecule. Combined with scaffolding tree generation and adversarial training, CORE can significantly improve several recent molecule optimization methods on various measures, including drug likeness (QED), dopamine receptor (DRD2) and penalized LogP. We tested CORE and baselines using the ZINC database, and CORE obtained up to 11% and 21% relative improvement over the baselines in success rate on the complete test set and on the subset with infrequent substructures, respectively.
Comment: Accepted by AAAI 202
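The copy-or-generate decision described in this abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the scalar `copy_gate`, the raw score lists, and the way the two distributions are mixed (in the style of a pointer-generator network) are all assumptions made for illustration. In the paper these quantities would come from learned networks.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def copy_or_generate_step(input_substructures, vocabulary,
                          copy_gate, copy_scores, generate_scores):
    """Hypothetical single decoding step.

    input_substructures: substructures present in the input molecule X
    vocabulary: the full substructure set S
    copy_gate: assumed probability in [0, 1] of copying from X
    copy_scores: one raw score per entry of input_substructures
    generate_scores: one raw score per entry of vocabulary
    Returns a dict mapping each candidate substructure to its probability.
    """
    p_copy = softmax(copy_scores)
    p_gen = softmax(generate_scores)
    mixed = {}
    # Probability mass assigned through the "copy from X" branch.
    for sub, p in zip(input_substructures, p_copy):
        mixed[sub] = mixed.get(sub, 0.0) + copy_gate * p
    # Probability mass assigned through the "generate from S" branch.
    for sub, p in zip(vocabulary, p_gen):
        mixed[sub] = mixed.get(sub, 0.0) + (1.0 - copy_gate) * p
    return mixed


def pick_most_promising(mixed):
    """Select the substructure with the highest combined probability."""
    return max(mixed, key=mixed.get)
```

Under these assumptions, a substructure that already occurs in X can receive high probability through the copy branch even when the generation branch scores it poorly, which is one way to picture the benefit the abstract reports for infrequent substructures.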

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications