    Graph Based Semi-supervised Learning with Convolution Neural Networks to Classify Crisis Related Tweets

    During time-critical situations such as natural disasters, rapid classification of data posted on social networks by affected people helps humanitarian organizations gain situational awareness and plan response efforts. However, the scarcity of labeled data in the early hours of a crisis hinders machine learning tasks and thus delays crisis response. In this work, we propose an inductive semi-supervised technique that uses unlabeled data, which is often abundant at the onset of a crisis event, alongside a smaller amount of labeled data. Specifically, we adopt a graph-based deep learning framework to learn an inductive semi-supervised model. We evaluate the proposed approach on two real-world crisis datasets from Twitter. Our results show significant improvements when using unlabeled data compared to using only labeled data. Comment: 5 pages. arXiv admin note: substantial text overlap with arXiv:1805.0515

    Graph Neural Networks for Natural Language Processing

    By constructing graph-structured data from the input data, Graph Neural Networks (GNNs) enhance the performance of numerous Natural Language Processing (NLP) tasks. In this thesis, we focus mainly on two aspects of NLP: text classification and knowledge graph completion. TextGCN shows excellent performance in text classification by leveraging the graph structure of the entire corpus without using any external resources, especially under a limited labelled data setting. Two questions are explored: (1) Under the transductive semi-supervised setting, how can the documents be better utilized and the complex relationships between nodes be learned? (2) How can TextGCN be transformed into an inductive model while also reducing its time and space complexity? Firstly, a comprehensive analysis was conducted on TextGCN and its variants. Secondly, we propose ME-GCN, a novel method for text classification that, for the first time, utilizes multi-dimensional edge features in a GNN. It uses corpus-trained word-based and document-based edge features for semi-supervised classification and has been shown to be effective in experiments on benchmark datasets under the limited labelled data setting. Thirdly, we introduce InducT-GCN, an inductive framework for GCN-based text classification that does not require additional resources. The framework offers a novel approach to making transductive GCN-based text classification models inductive, improving performance and reducing time and space complexity. Most existing work on Temporal Knowledge Graph Completion (TKGC) overlooks the significance of explicit temporal information and fails to skip irrelevant snapshots based on the entity-related relation in the query. To address this, we introduce Re-Temp (Relation-Aware Temporal Representation Learning), a model that leverages explicit temporal embedding and a skip information flow after each timestamp to eliminate information that is unnecessary for prediction.

    Applicability of semi-supervised learning assumptions for gene ontology terms prediction

    Gene Ontology (GO) is one of the most important resources in bioinformatics, aiming to provide a unified framework for the biological annotation of genes and proteins across all species. Predicting GO terms is an essential task for bioinformatics, but in several cases the number of available labelled proteins is insufficient for training reliable machine learning classifiers. Semi-supervised learning methods arise as a powerful solution that exploits the information contained in unlabelled data in order to improve the estimations of traditional supervised approaches. However, semi-supervised learning methods have to make strong assumptions about the nature of the training data, and thus the performance of the predictor is highly dependent on these assumptions. This paper presents an analysis of the applicability of semi-supervised learning assumptions to the specific task of GO term prediction, focused on providing criteria for choosing the most suitable tools for specific GO terms. The results show that semi-supervised approaches significantly outperform the traditional supervised methods and that the highest performance is reached when applying the cluster assumption. In addition, it is experimentally demonstrated that the cluster and manifold assumptions are complementary to each other, and an analysis is provided of which GO terms are more likely to be correctly predicted with each assumption. Postprint (published version
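
    As a rough illustration of how graph-based semi-supervised methods can exploit unlabelled data, the sketch below runs classic iterative label propagation on a toy graph. It is not the method of any of the works listed above: the function name `propagate_labels`, the toy graph, and the two-class setup are all hypothetical and chosen only to make the idea concrete. Labelled nodes keep their labels, and each unlabelled node repeatedly takes the average of its neighbours' class scores.

```python
def propagate_labels(adjacency, labels, n_classes, n_iters=50):
    """Iterative label propagation on an undirected graph (illustrative sketch).

    adjacency: dict mapping each node to a list of its neighbours
    labels:    dict mapping the labelled nodes to a class index
    Returns a dict mapping every node to a predicted class index.
    """
    # One-hot scores for labelled nodes, uniform scores for unlabelled ones.
    scores = {}
    for node in adjacency:
        if node in labels:
            scores[node] = [1.0 if c == labels[node] else 0.0
                            for c in range(n_classes)]
        else:
            scores[node] = [1.0 / n_classes] * n_classes

    for _ in range(n_iters):
        updated = {}
        for node, neighbours in adjacency.items():
            if node in labels or not neighbours:
                # Labelled nodes stay clamped; isolated nodes keep their scores.
                updated[node] = scores[node]
                continue
            # An unlabelled node takes the mean of its neighbours' class scores.
            updated[node] = [
                sum(scores[nb][c] for nb in neighbours) / len(neighbours)
                for c in range(n_classes)
            ]
        scores = updated

    # Predict the class with the highest score for each node.
    return {node: max(range(n_classes), key=lambda c: scores[node][c])
            for node in scores}


# Hypothetical toy graph: two disconnected chains, one labelled node in each.
toy_graph = {
    "a": ["b"], "b": ["a", "c"], "c": ["b"],
    "x": ["y"], "y": ["x", "z"], "z": ["y"],
}
toy_labels = {"a": 0, "z": 1}
predictions = propagate_labels(toy_graph, toy_labels, n_classes=2)
```

    In this toy setup each unlabelled node ends up with the label of the only labelled node in its connected component, which is the intuition behind assuming that points linked in a graph tend to share a label. Real systems would typically use weighted edges derived from feature similarity and a convergence check instead of a fixed iteration count.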