2 research outputs found
Knowledge-Base Enriched Word Embeddings for Biomedical Domain
Word embeddings have been shown to be adept at capturing the semantic and syntactic
regularities of natural language text, as a result of which these
representations have found utility in a wide variety of downstream
content analysis tasks. Commonly, these word embedding techniques derive the
distributed representation of words based on the local context information.
However, such approaches ignore the rich amount of explicit information present
in knowledge-bases. This is problematic, as it might lead to poor
representation for words with insufficient local context such as domain
specific words. Furthermore, the problem becomes pronounced in domains such as
biomedicine, where such domain-specific words are relatively
frequent. To this end, in this project, we propose a new word-embedding-based
model for the biomedical domain that jointly leverages the information from
available corpora and domain knowledge in order to generate knowledge-base
powered embeddings. Unlike existing approaches, the proposed methodology is
simple but adept at accurately capturing the knowledge available in domain
resources. Experimental results on biomedical concept
similarity and relatedness tasks validate the effectiveness of the proposed
approach.
Comment: Work in progress
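The abstract does not say how the concept similarity and relatedness tasks are scored. A common protocol for such tasks, assumed here rather than taken from the paper, is to compare the cosine similarity of each concept pair's embeddings against human ratings using Spearman rank correlation. The sketch below illustrates that protocol in plain Python; the function names, the toy two-dimensional vectors, and the ratings are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (std_x * std_y)


def evaluate(embeddings: Dict[str, Sequence[float]],
             rated_pairs: Sequence[Tuple[str, str, float]]) -> float:
    """Correlate model similarities with human ratings over the rated pairs."""
    model_scores = [cosine(embeddings[a], embeddings[b]) for a, b, _ in rated_pairs]
    human_scores = [rating for _, _, rating in rated_pairs]
    return spearman(model_scores, human_scores)


# Hypothetical toy data, for illustration only (not from the paper).
toy_embeddings = {
    "heart": [1.0, 0.0],
    "cardiac": [0.9, 0.1],
    "kidney": [0.0, 1.0],
    "renal": [0.1, 0.9],
}
toy_ratings = [
    ("heart", "cardiac", 3.9),
    ("heart", "kidney", 1.0),
    ("cardiac", "renal", 1.5),
]
rho = evaluate(toy_embeddings, toy_ratings)
```

A higher correlation means the embedding space orders concept pairs more like the human annotators do, which is the sense in which such a task can validate an embedding method.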
AGATHA: Automatic Graph-mining And Transformer based Hypothesis generation Approach
Medical research is risky and expensive. Drug discovery, as an example,
requires that researchers efficiently winnow thousands of potential targets to
a small candidate set for more thorough evaluation. However, research groups
spend significant time and money to perform the experiments necessary to
determine this candidate set long before seeing intermediate results.
Hypothesis generation systems address this challenge by mining the wealth of
publicly available scientific information to predict plausible research
directions. We present AGATHA, a deep-learning hypothesis generation system
that can introduce data-driven insights earlier in the discovery process.
Through a learned ranking criterion, this system quickly prioritizes plausible
term-pairs among entity sets, allowing us to recommend new research directions.
We validate our system at scale with a temporal holdout wherein we predict
connections first introduced after 2015 using data published beforehand. We
additionally explore biomedical sub-domains, and demonstrate AGATHA's
predictive capacity across the twenty most popular relationship types. This
system achieves best-in-class performance on an established benchmark, and
demonstrates high recommendation scores across subdomains. Reproducibility: All
code, experimental data, and pre-trained models are available online:
sybrandt.com/2020/agath
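The temporal holdout described in the abstract can be illustrated with a minimal sketch: connections are split by the year in which each term pair first appeared, so that a model sees only earlier data and is scored on later connections. This is not the AGATHA implementation. The function name, the data layout, and the choice to treat the cutoff year itself as known data are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def temporal_holdout(first_seen: Dict[Pair, int],
                     cutoff_year: int) -> Tuple[Set[Pair], Set[Pair]]:
    """Split term pairs by the year each connection first appeared.

    Pairs first seen in or before cutoff_year are treated as known
    (an assumption about the boundary); later pairs are held out.
    """
    known = {pair for pair, year in first_seen.items() if year <= cutoff_year}
    held_out = {pair for pair, year in first_seen.items() if year > cutoff_year}
    return known, held_out


# Hypothetical toy data: term pair -> year of first co-occurrence.
toy_first_seen = {
    ("term_a", "term_b"): 2012,
    ("term_a", "term_c"): 2016,
    ("term_b", "term_c"): 2015,
    ("term_c", "term_d"): 2018,
}
known_pairs, held_out_pairs = temporal_holdout(toy_first_seen, cutoff_year=2015)
```

Splitting on time rather than at random keeps information published after the cutoff out of the training data, which matches the goal of predicting connections before they appear in the literature.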