4 research outputs found

    Enhanced lexicon based models for extracting question-answer pairs from web forum

    A web forum is an online community that brings together people in different geographical locations. Members of the forum exchange ideas and expertise, and as a result a huge amount of content on different topics is generated daily. This human-generated content can be mined as question-answer (Q&A) pairs. One of the major challenges in mining Q&A pairs from web forums is establishing a good relationship between a question and its candidate answers. The problem is compounded by the noisy nature of human-generated forum content. Existing methods for mining knowledge from web forums ignore the effect of noise on the mining tools, which makes the lexical content less effective. This study proposes lexicon-based models that can automatically mine question-answer pairs from web forums with higher accuracy. The first phase of the research produced a question mining model. It was implemented using features generated from unigrams, bigrams, forum metadata and simple rules. These features were screened using both chi-square and wrapper techniques, and the wrapper-generated features were used by Multinomial Naïve Bayes to build the final model. The second phase produced a normalized lexical model for answer mining. It was implemented using 13 lexical features that cut across four quality dimensions. The performance of the features was enhanced by noise normalization, a process that fixed orthographic, phonetic and acronym noise. The third phase produced a hybridized model of lexical and non-lexical features. The average performances of the question mining model, the normalized lexical model and the hybridized model for answer mining were 90.3%, 97.5% and 99.5% respectively on the three data sets used. They outperformed all previous works in the domain.
The first major contribution of the study is an improved question mining model characterized by higher accuracy, better specificity, lower complexity and the ability to achieve good accuracy across different forum genres. The second contribution is a normalized lexicon-based model that can establish a good relationship between a question and its corresponding answer. The third contribution is a hybridized model for mining web forum answers that integrates lexical features, which guarantee relevance, with non-lexical features, which guarantee quality. The fourth contribution is a novel integration of the question and answer mining models to automatically generate question-answer pairs from web forums.
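
As a rough illustration of the question mining phase described in the abstract, the sketch below trains a Multinomial Naïve Bayes classifier on unigram and bigram counts to separate question posts from other posts. It is not the authors' model: the class name `NgramQuestionClassifier`, the tokenizer, the toy training posts and the Laplace smoothing value are all assumptions made for this example, and the chi-square and wrapper feature selection, forum metadata and rule-based features mentioned in the abstract are omitted.

```python
import math
import re
from collections import Counter, defaultdict


def ngram_features(text):
    """Return lowercased unigrams and bigrams; '?' is kept as its own token."""
    tokens = re.findall(r"[a-z0-9']+|\?", text.lower())
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


class NgramQuestionClassifier:
    """Hypothetical multinomial Naive Bayes over n-gram counts (illustration only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(ngram_features(text))
        self.vocab = {f for counts in self.feature_counts.values() for f in counts}
        return self

    def log_posterior(self, text, label):
        counts = self.feature_counts[label]
        total = sum(counts.values()) + self.alpha * len(self.vocab)
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        for feat in ngram_features(text):
            if feat in self.vocab:  # features never seen in training are skipped
                score += math.log((counts[feat] + self.alpha) / total)
        return score

    def predict(self, text):
        return max(self.class_counts, key=lambda label: self.log_posterior(text, label))


# Toy forum posts invented for this sketch; they are not from the study's data sets.
TRAIN = [
    ("How do I reset my router password?", "question"),
    ("Can anyone tell me why the install fails?", "question"),
    ("What is the best way to back up my files?", "question"),
    ("You need to hold the reset button for ten seconds.", "other"),
    ("Thanks, that fixed the problem for me.", "other"),
    ("I had the same issue last week.", "other"),
]

texts, labels = zip(*TRAIN)
model = NgramQuestionClassifier().fit(texts, labels)

print(model.predict("How can I fix the install?"))
print(model.predict("Thanks, holding the button fixed the problem."))
```

In a fuller pipeline, a feature selection step (for example a chi-square filter followed by a wrapper search, as the abstract describes) would sit between feature extraction and training, so that only the retained n-grams contribute to the class scores.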

    A Review of the Analytics Techniques for an Efficient Management of Online Forums: An Architecture Proposal

    E-learning is a response to the new educational needs of society and an important development in information and communication technologies because it represents the future of the teaching and learning processes. However, this trend presents many challenges, such as the processing of online forums, which generate a huge number of messages with an unordered structure and a great variety of topics. These forums provide an excellent platform for learning and for connecting students of a subject, but the difficulty of following and searching the vast volume of information that they generate may be counterproductive. The main goal of this paper is to review the approaches and techniques related to online courses in order to present a set of learning analytics techniques and a general architecture that solve the main challenges found in the state of the art by managing them in a more efficient way: 1) efficient tracking and monitoring of the forums generated; 2) design of effective search mechanisms for questions and answers in the forums; and 3) extraction of relevant key performance indicators with the objective of carrying out an efficient management of online forums. In our proposal, natural language processing, clustering, information retrieval, question answering, and data mining techniques will be used. This work was supported in part by the Spanish Ministry of Economy and Competitiveness through the Project SEQUOIA-UA under Grant TIN2015-63502-C3-3-R, the Project RESCATA under Grant TIN2015-65100-R, and the Project PROMETEO/2018/089, and in part by the Spanish Research Agency (AEI) and the European Regional Development Fund (FEDER) through the Project CloudDriver4Industry under Grant TIN2017-89266-R.