
    Enhanced TextRank using weighted word embedding for text summarization

    The length of a news article may influence people's interest in reading it. Text summarization can help by creating a shorter, representative version of an article to reduce reading time. This paper proposes using weighted word embeddings based on Word2Vec, FastText, and bidirectional encoder representations from transformers (BERT) models to enhance the TextRank summarization algorithm. The weighted word embeddings aim to create better sentence representations and thereby produce more accurate summaries. The results show that using (unweighted) word embeddings significantly improves the performance of the TextRank algorithm, with the best performance achieved by the summarization system using BERT word embeddings. When each word embedding is weighted using term frequency-inverse document frequency (TF-IDF), the performance of all systems using unweighted word embeddings improves further and significantly, with the biggest gains achieved by the systems using Word2Vec (a 6.80% to 12.92% increase) and FastText (a 7.04% to 12.78% increase). Overall, our systems using weighted word embeddings outperform the TextRank method by up to 17.33% in ROUGE-1 and 30.01% in ROUGE-2, demonstrating the effectiveness of weighted word embedding in the TextRank algorithm for text summarization.
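The core idea described above — weight each word vector by its TF-IDF score when averaging into a sentence vector, then rank sentences by PageRank over cosine similarities — can be sketched in pure Python. This is a minimal illustration, not the paper's implementation: the toy 2-d vectors stand in for real Word2Vec/FastText/BERT embeddings, and all names and values are assumptions.

```python
import math
from collections import Counter

# Toy 2-d "word embeddings" standing in for Word2Vec/FastText/BERT vectors
# (hypothetical values, for illustration only).
EMB = {
    "cats": [0.9, 0.1], "dogs": [0.8, 0.2], "pets": [0.85, 0.15],
    "stocks": [0.1, 0.9], "markets": [0.2, 0.8], "sleep": [0.5, 0.5],
}

# Each "sentence" is a tokenized list of words.
sentences = [
    ["cats", "dogs", "pets"],
    ["pets", "sleep"],
    ["stocks", "markets"],
]

def tfidf_weights(sents):
    """Per-sentence TF-IDF weight for every word (smoothed idf)."""
    n = len(sents)
    df = Counter(w for s in sents for w in set(s))
    out = []
    for s in sents:
        tf = Counter(s)
        out.append({w: (tf[w] / len(s)) * math.log((1 + n) / (1 + df[w])) + 1e-9
                    for w in tf})
    return out

def sentence_vector(sent, weights):
    # TF-IDF-weighted average of word embeddings.
    total = sum(weights[w] for w in sent)
    vec = [0.0, 0.0]
    for w in sent:
        for i, x in enumerate(EMB[w]):
            vec[i] += weights[w] * x
    return [x / total for x in vec]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def textrank(sim, d=0.85, iters=50):
    """Weighted PageRank over the sentence-similarity graph."""
    n = len(sim)
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            s = 0.0
            for j in range(n):
                if i == j:
                    continue
                denom = sum(sim[j][k] for k in range(n) if k != j)
                if denom:
                    s += sim[j][i] / denom * scores[j]
            new.append((1 - d) / n + d * s)
        scores = new
    return scores

weights = tfidf_weights(sentences)
vecs = [sentence_vector(s, w) for s, w in zip(sentences, weights)]
sim = [[cosine(a, b) for b in vecs] for a in vecs]
scores = textrank(sim)
best = max(range(len(scores)), key=scores.__getitem__)  # top summary sentence
```

In a real system the extractive summary would take the top-k sentences by score; here `best` simply picks the single highest-ranked one.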

    From Human Grading to Machine Grading: Automatic Diagnosis of e-Book Text Marking Skills in Precision Education

    Precision education is a new challenge in leveraging artificial intelligence, machine learning, and learning analytics to enhance teaching quality and learning performance. To facilitate precision education, text marking skills can be used to determine students' learning processes. Text marking is an essential learning skill in reading. In this study, we proposed a model that leverages a state-of-the-art text summarization technique, Bidirectional Encoder Representations from Transformers (BERT), to calculate marking scores for 130 graduate students enrolled in an accounting course. We then applied learning analytics to analyze the correlation between their marking scores and learning performance. We measured students' self-regulated learning (SRL) and clustered them into four groups based on their marking scores and marking frequencies to examine whether differences in reading skills and text marking influence students' learning performance and awareness of self-regulation. Consistent with past research, our results did not indicate a strong relationship between marking scores and learning performance. However, high-skill readers who use more marking strategies perform better in learning performance, task strategies, and time management than high-skill readers who use fewer marking strategies. Furthermore, high-skill readers who actively employ marking strategies also achieve superior SRL scores in environment structure and task strategies than low-skill readers who are inactive in marking. The findings of this research provide evidence supporting the importance of monitoring and training students' text marking skills and facilitating precision education.

    Judging Credible and Unethical Statistical Data Explanations via Phrase Similarity Graph

    We propose a graph-based method to judge credible and unethical statistical data explanations by exploiting the human instincts described by Rosling et al. Our previous work proposed three categories of statistical data explanations and three corresponding judgment methods based on phrase embedding and carefully designed comparison conditions. However, we observe that the previous method β exhibits low accuracy on explanations of the (β) category due to counter-intuitive semantic similarities between several phrases. To address this limitation and improve performance, our new method β² constructs a Phrase Similarity Graph to generate additional comparison conditions and devises a credibility score that aggregates the conditions with their importance quantified by graph entropy. The experimental results show that our β² achieves over 81% accuracy, while the previous method β achieves about 57%. Scrutiny reveals that β² mitigates the problem of counter-intuitive semantic similarities to a satisfactory degree.
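As a loose illustration of the aggregation idea (not the paper's actual formulation), the sketch below builds a tiny phrase similarity graph with hypothetical edge weights, derives each phrase's importance from the entropy contribution of its degree share, and aggregates boolean comparison conditions into a credibility score. All phrases, weights, and functions are illustrative assumptions.

```python
import math

# Toy phrase similarity graph: nodes are phrases, edge weights are
# hypothetical semantic similarities between phrase pairs.
EDGES = {
    ("increase", "rise"): 0.9,
    ("increase", "growth"): 0.8,
    ("rise", "growth"): 0.7,
    ("decline", "rise"): 0.2,
    ("decline", "increase"): 0.1,
}

def weighted_degree(node):
    """Sum of similarity weights on edges touching this phrase."""
    return sum(w for (a, b), w in EDGES.items() if node in (a, b))

def graph_entropy_importance(nodes):
    # Importance of each node from its entropy term: with degree share
    # p_i = deg_i / total, importance_i = -p_i * log(p_i).
    total = sum(weighted_degree(n) for n in nodes)
    return {n: -(weighted_degree(n) / total)
               * math.log(weighted_degree(n) / total)
            for n in nodes}

def credibility_score(conditions, importance):
    # Aggregate boolean comparison conditions, each keyed by the phrase
    # whose importance weights it: weighted fraction of satisfied conditions.
    total = sum(importance[p] for p, _ in conditions)
    return sum(importance[p] for p, ok in conditions if ok) / total

nodes = ["increase", "rise", "growth", "decline"]
imp = graph_entropy_importance(nodes)
score = credibility_score(
    [("increase", True), ("rise", True), ("decline", False)], imp)
```

A judgment method would then compare `score` against a threshold to label an explanation credible or unethical.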

    An enhanced binary bat and Markov clustering algorithms to improve event detection for heterogeneous news text documents

    Event Detection (ED) works on identifying events from various types of data. Building an ED model for news text documents greatly helps decision-makers in various disciplines improve their strategies. However, identifying and summarizing events from such data is a non-trivial task due to the large volume of published heterogeneous news text documents. Such documents create a high-dimensional feature space that influences the overall performance of the baseline methods in the ED model. To address this problem, this research presents an enhanced ED model that includes improved methods for the crucial phases of the ED model: Feature Selection (FS), ED, and summarization. This work focuses on the FS problem by automatically detecting events through a novel wrapper FS method based on an Adapted Binary Bat Algorithm (ABBA) and an Adapted Markov Clustering Algorithm (AMCL), termed ABBA-AMCL. These adaptive techniques were developed to overcome the premature convergence of BBA and the fast convergence rate of MCL. Furthermore, this study proposes four summarization methods to generate informative summaries. The enhanced ED model was tested on 10 benchmark datasets and 2 Facebook news datasets. The effectiveness of ABBA-AMCL was compared to 8 FS methods based on meta-heuristic algorithms and 6 graph-based ED methods. The empirical and statistical results show that ABBA-AMCL surpassed the other methods on most datasets. The key representative features demonstrate that the ABBA-AMCL method successfully detects real-world events from the Facebook news datasets, with 0.96 precision and 1 recall for dataset 11, and 1 precision and 0.76 recall for dataset 12. To conclude, the novel ABBA-AMCL presented in this research successfully bridges the research gap and resolves the curse of high-dimensionality in the feature space of heterogeneous news text documents. Hence, the enhanced ED model can organize news documents into distinct events and provide policymakers with valuable information for decision making.
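As a rough illustration of wrapper feature selection with a binary bat algorithm — heavily simplified relative to ABBA (loudness and pulse rate are omitted, and the fitness function is a toy stand-in rather than a clustering-quality score) — one might sketch:

```python
import math
import random

random.seed(0)
N_FEATURES = 8
# Hypothetical ground-truth mask of "relevant" features, used only so the
# toy fitness function has something to reward.
TRUE_MASK = [1, 0, 1, 0, 0, 1, 0, 0]

def fitness(mask):
    """Toy stand-in for a wrapper score: fraction of features whose
    selected/unselected status matches the relevant-feature mask."""
    hits = sum(1 for m, t in zip(mask, TRUE_MASK) if m == t)
    return hits / N_FEATURES

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def binary_bat_fs(n_bats=10, iters=40, f_min=0.0, f_max=2.0):
    # Each bat holds a binary feature mask (position) and a real velocity.
    pos = [[random.randint(0, 1) for _ in range(N_FEATURES)]
           for _ in range(n_bats)]
    vel = [[0.0] * N_FEATURES for _ in range(n_bats)]
    best = max(pos, key=fitness)[:]  # copy so mutations don't alias it
    for _ in range(iters):
        for i in range(n_bats):
            f = f_min + (f_max - f_min) * random.random()
            for d in range(N_FEATURES):
                # Velocity grows where the bat disagrees with the best bat.
                vel[i][d] += (pos[i][d] - best[d]) * f
                # Sigmoid transfer function: flip the bit probabilistically.
                if random.random() < sigmoid(vel[i][d]):
                    pos[i][d] = 1 - pos[i][d]
            if fitness(pos[i]) > fitness(best):
                best = pos[i][:]
    return best

best_mask = binary_bat_fs()
```

In the paper's pipeline, the selected feature subset would then feed the adapted Markov clustering stage; here the loop only returns the best mask found.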

    Application of TextRank Algorithm for Credibility Assessment
