31,275 research outputs found

    Graph Based Semi-supervised Learning with Convolution Neural Networks to Classify Crisis Related Tweets

    During time-critical situations such as natural disasters, rapid classification of data posted on social networks by affected people helps humanitarian organizations gain situational awareness and plan response efforts. However, the scarcity of labeled data in the early hours of a crisis hinders machine learning tasks and thus delays crisis response. In this work, we propose an inductive semi-supervised technique that uses unlabeled data, which is often abundant at the onset of a crisis event, together with a smaller amount of labeled data. Specifically, we adopt a graph-based deep learning framework to learn an inductive semi-supervised model. We use two real-world crisis datasets from Twitter to evaluate the proposed approach. Our results show significant improvements using unlabeled data as compared to only using labeled data. Comment: 5 pages. arXiv admin note: substantial text overlap with arXiv:1805.0515

    Semi-supervised clinical text classification with Laplacian SVMs: An application to cancer case management

    Abstract. Objective: To compare linear and Laplacian SVMs on a clinical text classification task; to evaluate the effect of unlabeled training data on Laplacian SVM performance. Background: The development of machine-learning based clinical text classifiers requires the creation of labeled training data, obtained via manual review by clinicians. Due to the effort and expense involved in labeling data, training data sets in the clinical domain are of limited size. In contrast, electronic medical record (EMR) systems contain hundreds of thousands of unlabeled notes that are not used by supervised machine learning approaches. Semi-supervised learning algorithms use both labeled and unlabeled data to train classifiers, and can outperform their supervised counterparts. Methods: We trained support vector machines (SVMs) and Laplacian SVMs on a training reference standard of 820 abdominal CT, MRI, and ultrasound reports labeled for the presence of potentially malignant liver lesions that require follow up (positive class prevalence 77%). The Laplacian SVM used 19,845 randomly sampled unlabeled notes in addition to the training reference standard. We evaluated SVMs and Laplacian SVMs on a test set of 520 labeled reports. Results: The Laplacian SVM trained on labeled and unlabeled radiology reports significantly outperformed supervised SVMs (Macro-F1 0.773 vs. 0.741, Sensitivity 0.943 vs. 0.911, Positive Predictive Value 0.877 vs. 0.883). Performance improved with the number of labeled and unlabeled notes used to train the Laplacian SVM (Pearson's ρ = 0.529 for correlation between number of unlabeled notes and macro-F1 score). These results suggest that practical semi-supervised methods such as the Laplacian SVM can leverage the large, unlabeled corpora that reside within EMRs to improve clinical text classification.
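The abstract above reports Macro-F1, Sensitivity, and Positive Predictive Value. As a reading aid, the sketch below shows how these three metrics are commonly derived from a two-class confusion matrix. The function name `binary_metrics` and the counts in the usage example are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Sensitivity, PPV and macro-F1 from two-class confusion counts."""
    # Positive class: recall (sensitivity) and precision (PPV).
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    f1_pos = 2 * ppv * sensitivity / (ppv + sensitivity)

    # Negative class: the same formulas with the roles of the classes swapped.
    specificity = tn / (tn + fp)   # recall of the negative class
    npv = tn / (tn + fn)           # precision of the negative class
    f1_neg = 2 * npv * specificity / (npv + specificity)

    # Macro-F1 is the unweighted mean of the per-class F1 scores.
    macro_f1 = (f1_pos + f1_neg) / 2
    return {"sensitivity": sensitivity, "ppv": ppv, "macro_f1": macro_f1}


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = binary_metrics(tp=90, fp=10, fn=10, tn=40)
print(metrics)
```

Because macro-F1 weights both classes equally, it can move differently from sensitivity or PPV when the classes are imbalanced, which is one reason an abstract may report all three.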

    CLASSIFICATION BASED ON SEMI-SUPERVISED LEARNING: A REVIEW

    Semi-supervised learning is the class of machine learning that combines supervised and unsupervised learning in the learning process. Conceptually, it sits between learning from labeled data and learning from unlabeled data. In certain cases, it allows large amounts of unlabeled data to be used alongside the usually limited collections of labeled data. In standard classification methods in machine learning, only a labeled collection is used to train the classifier. In addition, labeled instances are difficult to acquire since they necessitate the assistance of annotators, who serve in an occupation that is identified by their label. A complete audit without a supervisor is fairly easy to do, but nevertheless represents a significant risk to the enterprise, as there have been few chances to safely experiment with it so far. By using a large number of unlabeled inputs along with the labeled inputs to create a good training sample, semi-supervised learning addresses this issue. Since semi-supervised learning requires less human effort and allows greater precision, both in theory and in practice, it is of critical interest.
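To make the idea of combining labeled and unlabeled inputs concrete, here is a minimal sketch of self-training, one simple semi-supervised scheme. It is not taken from the review itself. The nearest-centroid classifier, the one-dimensional data, the `margin` threshold, and all function names are assumptions made for illustration.

```python
def centroid(values):
    """Mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def class_centroids(labeled):
    """Centroids of class 0 and class 1 from (x, y) pairs."""
    c0 = centroid([x for x, y in labeled if y == 0])
    c1 = centroid([x for x, y in labeled if y == 1])
    return c0, c1


def self_train(labeled, unlabeled, margin=1.0, max_rounds=10):
    """Grow the labeled set with confidently pseudo-labeled points."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        c0, c1 = class_centroids(labeled)
        confident, rest = [], []
        for x in pool:
            d0, d1 = abs(x - c0), abs(x - c1)
            # Accept a pseudo-label only when one centroid is clearly closer.
            if abs(d0 - d1) >= margin:
                confident.append((x, 0 if d0 < d1 else 1))
            else:
                rest.append(x)
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        pool = rest
    return class_centroids(labeled)


def predict(x, c0, c1):
    """Assign x to the class with the nearer centroid."""
    return 0 if abs(x - c0) <= abs(x - c1) else 1


# Toy data: two labeled points and five unlabeled points (hypothetical).
c0, c1 = self_train([(0.0, 0), (10.0, 1)], [1.0, 2.0, 8.0, 9.0, 5.2])
print(c0, c1, predict(3.0, c0, c1), predict(7.0, c0, c1))
```

The ambiguous point near the middle is never pseudo-labeled, which illustrates the usual trade-off: a stricter margin admits fewer unlabeled points but reduces the risk of reinforcing wrong labels.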