Trial2Vec: Zero-Shot Clinical Trial Document Similarity Search using Self-Supervision
Clinical trials are essential for drug development but are extremely
expensive and time-consuming to conduct. It is beneficial to study similar
historical trials when designing a clinical trial. However, lengthy trial
documents and lack of labeled data make trial similarity search difficult. We
propose a zero-shot clinical trial retrieval method, Trial2Vec, which learns
through self-supervision without annotating similar clinical trials.
Specifically, the meta-structure of trial documents (e.g., title, eligibility
criteria, target disease) along with clinical knowledge (e.g., UMLS knowledge
base https://www.nlm.nih.gov/research/umls/index.html) are leveraged to
automatically generate contrastive samples. In addition, Trial2Vec encodes trial
documents according to their meta-structure, producing compact embeddings
that aggregate multi-aspect information from the whole document. Visualization
shows that our method yields medically interpretable embeddings, and it achieves a
15% average improvement over the best baselines in precision/recall for trial
retrieval, evaluated on our 1600 labeled trial pairs. We also
show that the pre-trained embeddings benefit the downstream trial outcome
prediction task over 240k trials. Software is available at
https://github.com/RyanWangZf/Trial2Vec.
Comment: Findings of EMNLP 202
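The retrieval and evaluation setup described in this abstract can be illustrated with a minimal sketch. It assumes each trial has already been mapped to a fixed-length embedding; it does not use the Trial2Vec software or reproduce its encoder. The names `cosine`, `top_k`, `precision_recall_at_k`, and the toy trial identifiers and vectors are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k(query_id, embeddings, k):
    """Rank all other trials by cosine similarity to the query trial."""
    query_vec = embeddings[query_id]
    scored = [
        (cosine(query_vec, vec), trial_id)
        for trial_id, vec in embeddings.items()
        if trial_id != query_id
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [trial_id for _, trial_id in scored[:k]]


def precision_recall_at_k(retrieved, relevant):
    """Precision and recall of a retrieved list against a labeled relevant set."""
    hits = sum(1 for trial_id in retrieved if trial_id in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy embeddings with made-up values (hypothetical, not from the paper).
toy_embeddings = {
    "trial_a": [1.0, 0.0, 0.0],
    "trial_b": [0.9, 0.1, 0.0],
    "trial_c": [0.0, 1.0, 0.0],
    "trial_d": [0.0, 0.0, 1.0],
}

ranked = top_k("trial_a", toy_embeddings, k=2)
precision, recall = precision_recall_at_k(ranked, relevant={"trial_b"})
```

Because the embeddings are compact vectors, similarity search reduces to a nearest-neighbor ranking, and labeled trial pairs are needed only for evaluation, not for training.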
CORE: Automatic Molecule Optimization Using Copy & Refine Strategy
Molecule optimization is about generating a molecule with more desirable
properties based on an input molecule. State-of-the-art approaches
partition the molecules into a large set of substructures and grow the new
molecule structure by iteratively predicting which substructure from this set to
add. However, since the set of available substructures is large, such an
iterative prediction task is often inaccurate especially for substructures that
are infrequent in the training data. To address this challenge, we propose a
new generating strategy called "Copy & Refine" (CORE), where at each step the
generator first decides whether to copy an existing substructure from the input
molecule or to generate a new substructure; the most promising substructure is then
added to the new molecule. Combined with scaffolding tree generation
and adversarial training, CORE significantly improves several recent
molecule optimization methods on various measures, including drug likeness
(QED), dopamine receptor (DRD2) and penalized LogP. We tested CORE and
baselines using the ZINC database, and CORE obtained up to 11% and 21%
relative improvement over the baselines in success rate on the complete test
set and the subset with infrequent substructures, respectively.
Comment: Accepted by AAAI 202
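The copy-or-generate decision described in this abstract can be sketched as a mixture of two distributions. This is an illustrative assumption modeled on pointer-generator-style mixing, not the authors' implementation: the gate `p_copy`, the function names, and the toy substructure names and scores are all hypothetical.

```python
import math


def softmax(scores):
    """Turn a non-empty dict of raw scores into a probability distribution."""
    max_score = max(scores.values())
    exps = {name: math.exp(s - max_score) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def copy_or_generate(copy_scores, gen_scores, p_copy):
    """Mix a copy distribution (substructures of the input molecule) with a
    generate distribution (full substructure vocabulary) and pick the best.

    p_copy is an assumed gate in [0, 1]: the probability of copying.
    gen_scores must be non-empty; copy_scores may be empty.
    """
    gen_dist = softmax(gen_scores)
    copy_dist = softmax(copy_scores) if copy_scores else {}
    # With nothing to copy, fall back to pure generation.
    gate = p_copy if copy_dist else 0.0

    mixed = {name: (1.0 - gate) * p for name, p in gen_dist.items()}
    for name, p in copy_dist.items():
        mixed[name] = mixed.get(name, 0.0) + gate * p

    best = max(mixed, key=mixed.get)
    return best, mixed


# Toy scores (hypothetical): "rare_ring" is unlikely under the vocabulary-wide
# generator but appears in the input molecule, so copying can recover it.
gen_scores = {"benzene": 2.0, "pyridine": 1.0, "rare_ring": -1.0}
copy_scores = {"rare_ring": 1.0, "benzene": 0.0}

best_with_copy, mixed = copy_or_generate(copy_scores, gen_scores, p_copy=0.8)
best_without_copy, _ = copy_or_generate(copy_scores, gen_scores, p_copy=0.0)
```

In this toy setting, a high copy gate lets a substructure that is rare in the vocabulary-wide distribution win because it is present in the input, which is the failure case the abstract attributes to purely iterative prediction.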