3,704 research outputs found
Distribution-Based Categorization of Classifier Transfer Learning
Transfer Learning (TL) aims to transfer knowledge acquired in one problem,
the source problem, onto another problem, the target problem, dispensing with
the bottom-up construction of the target model. Due to its relevance, TL has
gained significant interest in the Machine Learning community since it paves
the way to devise intelligent learning models that can easily be tailored to
many different applications. As it is natural in a fast evolving area, a wide
variety of TL methods, settings and nomenclature have been proposed so far.
However, a wide range of works have been reporting different names for the same
concepts. This concept and terminology mixture contribute however to obscure
the TL field, hindering its proper consideration. In this paper we present a
review of the literature on the majority of classification TL methods, and also
a distribution-based categorization of TL with a common nomenclature suitable
to classification problems. Under this perspective three main TL categories are
presented, discussed and illustrated with examples
Graph Based Semi-supervised Learning with Convolution Neural Networks to Classify Crisis Related Tweets
During time-critical situations such as natural disasters, rapid
classification of data posted on social networks by affected people is useful
for humanitarian organizations to gain situational awareness and to plan
response efforts. However, the scarcity of labeled data in the early hours of a
crisis hinders machine learning tasks thus delays crisis response. In this
work, we propose to use an inductive semi-supervised technique to utilize
unlabeled data, which is often abundant at the onset of a crisis event, along
with fewer labeled data. Specif- ically, we adopt a graph-based deep learning
framework to learn an inductive semi-supervised model. We use two real-world
crisis datasets from Twitter to evaluate the proposed approach. Our results
show significant improvements using unlabeled data as compared to only using
labeled data.Comment: 5 pages. arXiv admin note: substantial text overlap with
arXiv:1805.0515
Automatic document classification of biological literature
Background: Document classification is a wide-spread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, a text-mining system for biological literature, which marks up full text according to a shallow ontology that includes terms of biological interest. This project investigates document classification in the context of biological literature, making use of the Textpresso markup of a corpus of Caenorhabditis elegans literature.
Results: We present a two-step text categorization algorithm to classify a corpus of C. elegans papers. Our classification method first uses a support vector machine-trained classifier, followed by a novel, phrase-based clustering algorithm. This clustering step autonomously creates cluster labels that are descriptive and understandable by humans. This clustering engine performed better on a standard test-set (Reuters 21578) compared to previously published results (F-value of 0.55 vs. 0.49), while producing cluster descriptions that appear more useful. A web interface allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept.
Conclusions: We have demonstrated a simple method to classify biological documents that embodies an improvement over current methods. While the classification results are currently optimized for Caenorhabditis elegans papers by human-created rules, the classification engine can be adapted to different types of documents. We have demonstrated this by presenting a web interface that allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept
- …