Graph Based Semi-supervised Learning with Convolution Neural Networks to Classify Crisis Related Tweets
During time-critical situations such as natural disasters, rapid
classification of data posted on social networks by affected people is useful
for humanitarian organizations to gain situational awareness and to plan
response efforts. However, the scarcity of labeled data in the early hours of a
crisis hinders machine learning tasks and thus delays crisis response. In this
work, we propose to use an inductive semi-supervised technique to utilize
unlabeled data, which is often abundant at the onset of a crisis event, along
with fewer labeled data. Specifically, we adopt a graph-based deep learning
framework to learn an inductive semi-supervised model. We use two real-world
crisis datasets from Twitter to evaluate the proposed approach. Our results
show significant improvements using unlabeled data as compared to only using
labeled data. Comment: 5 pages. arXiv admin note: substantial text overlap with
arXiv:1805.0515
Label-efficient Time Series Representation Learning: A Review
The scarcity of labeled data is one of the main challenges of applying deep
learning models on time series data in the real world. Therefore, several
approaches, e.g., transfer learning, self-supervised learning, and
semi-supervised learning, have been recently developed to promote the learning
capability of deep learning models from the limited time series labels. In this
survey, for the first time, we provide a novel taxonomy to categorize existing
approaches that address the scarcity of labeled data problem in time series
data based on their dependency on external data sources. Moreover, we present a
review of the recent advances in each approach, summarize the limitations of
current work, and provide future directions that could yield better
progress in the field. Comment: Under Review
Mixture of Expert/Imitator Networks: Scalable Semi-supervised Learning Framework
The current success of deep neural networks (DNNs) in an increasingly broad
range of tasks involving artificial intelligence strongly depends on the
quality and quantity of labeled training data. In general, the scarcity of
labeled data, which is often observed in many natural language processing
tasks, is one of the most important issues to be addressed. Semi-supervised
learning (SSL) is a promising approach to overcoming this issue by
incorporating a large amount of unlabeled data. In this paper, we propose a
novel scalable method of SSL for text classification tasks. The unique property
of our method, Mixture of Expert/Imitator Networks, is that imitator networks
learn to "imitate" the estimated label distribution of the expert network over
the unlabeled data, which potentially contributes a set of features for the
classification. Our experiments demonstrate that the proposed method
consistently improves the performance of several types of baseline DNNs. We
also demonstrate that our method has the "more data, better performance" property
with promising scalability to the amount of unlabeled data. Comment: Accepted by AAAI 201
Semi-Supervised Recurrent Neural Network for Adverse Drug Reaction Mention Extraction
Social media is a useful platform to share health-related information due to
its vast reach. This makes it a good candidate for public-health monitoring
tasks, specifically for pharmacovigilance. We study the problem of extraction
of Adverse-Drug-Reaction (ADR) mentions from social media, particularly from
Twitter. Medical information extraction from social media is challenging,
mainly due to the short and highly informal nature of the text, as compared to more
technical and formal medical reports.
Current methods for ADR mention extraction rely on supervised learning,
which suffers from the scarcity of labeled data. The state-of-the-art
method uses deep neural networks, specifically Long Short-Term Memory
networks (LSTMs) \cite{hochreiter1997long}, a class of Recurrent Neural
Networks (RNNs). Deep neural networks, due to their large number of
free parameters, rely heavily on large annotated corpora for learning the end
task. But in the real world, it is hard to obtain large labeled datasets, mainly due to
the heavy cost associated with manual annotation. To this end, we propose a
novel semi-supervised learning based RNN model, which can also leverage the
unlabeled data present in abundance on social media. Through experiments we
demonstrate the effectiveness of our method, achieving state-of-the-art
performance in ADR mention extraction. Comment: Accepted at DTMBIO workshop, CIKM 2017. To appear in BMC
Bioinformatics. Pls cite that version