8,381 research outputs found
Semi-Supervised Recurrent Neural Network for Adverse Drug Reaction Mention Extraction
Social media is a useful platform to share health-related information due to
its vast reach. This makes it a good candidate for public-health monitoring
tasks, specifically for pharmacovigilance. We study the problem of extracting
Adverse-Drug-Reaction (ADR) mentions from social media, particularly from
Twitter. Medical information extraction from social media is challenging,
mainly due to the short and highly informal nature of the text, as compared to
more technical and formal medical reports.
Current methods in ADR mention extraction rely on supervised learning
methods, which suffer from the labeled-data scarcity problem. The
state-of-the-art method uses deep neural networks, specifically a class of
Recurrent Neural Networks (RNNs) known as Long Short-Term Memory networks
(LSTMs) \cite{hochreiter1997long}. Deep neural networks, due to their large
number of free parameters, rely heavily on large annotated corpora for learning
the end task. But in the real world, it is hard to get large labeled datasets,
mainly due to the heavy cost associated with manual annotation. To this end, we
propose a novel semi-supervised learning based RNN model, which can also
leverage the unlabeled data present in abundance on social media. Through experiments we
demonstrate the effectiveness of our method, achieving state-of-the-art
performance in ADR mention extraction.
Comment: Accepted at DTMBIO workshop, CIKM 2017. To appear in BMC
Bioinformatics. Pls cite that versio
Semi-supervised prediction of protein interaction sentences exploiting semantically encoded metrics
Protein-protein interaction (PPI) identification is an integral component of many biomedical research and database curation tools. Automation of this task through classification is one of the key goals of text mining (TM). However, labelled PPI corpora required to train classifiers are generally small. In order to overcome this sparsity in the training data, we propose a novel method of integrating corpora that do not contain relevance judgements. Our approach uses a semantic language model to gather word similarity from a large unlabelled corpus. This additional information is integrated into the sentence classification process using kernel transformations and has a re-weighting effect on the training features that leads to an 8% improvement in F-score over the baseline results. Furthermore, we discover that some words which are generally considered indicative of interactions are actually neutralised by this process
PaperRobot: Incremental Draft Generation of Scientific Ideas
We present PaperRobot, which performs as an automatic research assistant by
(1) conducting deep understanding of a large collection of human-written papers
in a target domain and constructing comprehensive background knowledge graphs
(KGs); (2) creating new ideas by predicting links from the background KGs, by
combining graph attention and contextual text attention; (3) incrementally
writing some key elements of a new paper based on memory-attention networks:
from the input title along with predicted related entities to generate a paper
abstract, from the abstract to generate conclusion and future work, and finally
from future work to generate a title for a follow-on paper. Turing Tests, where
a biomedical domain expert is asked to compare a system output and a
human-authored string, show that PaperRobot-generated abstracts, conclusion and
future work sections, and new titles are chosen over human-written ones up to
30%, 24% and 12% of the time, respectively.
Comment: 12 pages. Accepted by ACL 2019. Code and resources are available at
https://github.com/EagleW/PaperRobo
InteractiveIE: Towards Assessing the Strength of Human-AI Collaboration in Improving the Performance of Information Extraction
Learning template-based information extraction from documents is a crucial
yet difficult task. Prior template-based IE approaches assume foreknowledge of
the domain templates; however, real-world IE does not have pre-defined schemas
and is a figure-out-as-you-go phenomenon. To quickly bootstrap templates in a
real-world setting, we need to induce template slots from documents with zero
or minimal supervision. Since the purpose of question answering intersects with
the goal of information extraction, we use automatic question generation to
induce template slots from the documents and investigate how a tiny amount of
proxy human supervision on the fly (termed InteractiveIE) can further boost
the performance. Extensive experiments on biomedical and legal documents, where
obtaining training data is expensive, reveal encouraging trends of performance
improvement using InteractiveIE over an AI-only baseline.
Comment: Version