7 research outputs found

    Improving imbalanced question classification using structured smote based approach

    Question classification (QC) is one of the most popular text classification applications and plays an important role in question-answering systems. However, as in many real-world classification problems, QC may suffer from class imbalance, a key problem in machine learning and data mining. In this paper, we propose a framework that addresses class imbalance using a hierarchical SMOTE algorithm to balance the different question types. The proposed framework is grammar-based: it uses the grammatical pattern of each question together with machine learning algorithms to classify the questions. Experimental results indicate that the proposed framework achieves a good level of accuracy in identifying different question types and handling class imbalance.

    Towards context-aware syntax parsing and tagging

    Information retrieval (IR) has become one of the most popular Natural Language Processing (NLP) applications. Part-of-speech (PoS) parsing and tagging plays an important role in IR systems. A broad range of PoS parsing and tagging tools have been proposed to help solve information retrieval problems, but most of these tools rely on generic NLP tags that do not capture domain-related information. In this research, we present a domain-specific parsing and tagging approach that uses not only generic PoS tags but also domain-specific PoS tags, grammatical rules, and domain knowledge. Experimental results show that our approach achieves a good level of accuracy when applied to different domains.