Search CORE

8 research outputs found

Development of an Automated Scoring Model Using SentenceTransformers for Discussion Forums in Online Learning Environments

Author: Dhini Bachriah Fatwa
Girsang Abba Suganda
Publication venue: University of Zagreb, Faculty of Electrical Engineering and Computing
Publication date: 01/01/2022

Because public datasets are limited, research on automatic essay scoring in Indonesian has been constrained and has yielded suboptimal accuracy. In general, the main goal of an essay scoring system is to reduce the time needed for scoring, which is usually done manually by human judgment. This study uses a discussion forum in online learning to generate an assessment between the responses and the lecturer's rubric in automated essay scoring. A SentenceTransformers pre-trained model that can construct the highest vector embedding was proposed to identify the semantic meaning between the responses and the lecturer's rubric. The effectiveness of monolingual and multilingual models was compared. This research aims to determine the model's effectiveness and the appropriate model for the Automated Essay Scoring (AES) used in paired sentence Natural Language Processing tasks. The distiluse-base-multilingual-cased-v1 model, which employed the Pearson correlation method, obtained the highest performance. Specifically, it obtained a correlation value of 0.63 and a mean absolute error (MAE) score of 0.70. This indicates that the overall prediction result is improved compared with the earlier regression task research
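The two metrics reported in this abstract, Pearson correlation and mean absolute error (MAE), can be illustrated with a minimal sketch. The function names and the sample scores below are hypothetical and are not taken from the paper; the sketch only shows how the two metrics are conventionally computed between predicted and human-assigned scores.

```python
import math

def pearson_correlation(predicted, human):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_h = sum(human) / n
    covariance = sum((p - mean_p) * (h - mean_h) for p, h in zip(predicted, human))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_h = sum((h - mean_h) ** 2 for h in human)
    return covariance / math.sqrt(var_p * var_h)

def mean_absolute_error(predicted, human):
    """Average absolute difference between predicted and human scores."""
    return sum(abs(p - h) for p, h in zip(predicted, human)) / len(predicted)

# Hypothetical scores, for illustration only (not data from the study).
model_scores = [2.0, 3.5, 4.0, 1.5]
lecturer_scores = [2.5, 3.0, 4.5, 1.0]

print(pearson_correlation(model_scores, lecturer_scores))
print(mean_absolute_error(model_scores, lecturer_scores))
```

A higher correlation means the model ranks responses similarly to the lecturer, while a lower MAE means the predicted scores are numerically closer to the human ones.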

HRČAK - Portal of Croatian Scientific and Professional Journals

Hierarchical Classification System for Breast Cancer Specimen Report (HCSBC) -- an end-to-end model for characterizing severity and diagnosis

Author: Banerjee Imon
Gichoya Judy
Kamath Harish
Lehman Constance
McAdams Christopher R.
Mosunjac Marina
Newell Mary S.
Oprea-Ilies Gabriela
Santos Thiago
Smith Geoffrey
Trivedi Hari
Publication date: 02/11/2023

Automated classification of cancer pathology reports can extract information from unstructured reports and categorize each report into structured diagnosis and severity categories. Thus, such a system can reduce the burden of populating tumor registries, help with registration for clinical trials, and support the development of large datasets for deep learning models using true pathologic ground truth. However, the content of breast pathology reports can be difficult to categorize due to the high linguistic variability in content and the wide variety of potential diagnoses (>50). Existing NLP models are primarily focused on developing classifiers for primary breast cancer types (e.g. IDC, DCIS, ILC) and tumor characteristics, and ignore the rare diagnoses of cancer subtypes. We then developed a hierarchical hybrid transformer-based pipeline (59 labels) - Hierarchical Classification System for Breast Cancer Specimen Report (HCSBC), which utilizes the potential of the transformer context-preserving NLP technique, and compared our model to several state-of-the-art ML and DL models. We trained the model on the EUH data and evaluated our model's performance on two external datasets - MGH and Mayo Clinic. We publicly release the code and a live application under Huggingface spaces repositor

arXiv.org e-Print Archive

Automatic essay scoring for discussion forum in online learning based on semantic and keyword similarities

Author: Abba Suganda Girsang
Bachriah Fatwa Dhini
Heny Kurniawati
Unggul Utan Sufandi
Publication venue: Emerald Publishing
Publication date: 01/12/2023

Purpose – The authors constructed an automatic essay scoring (AES) model in a discussion forum where the result was compared with scores given by human evaluators. This research proposes essay scoring conducted through two parameters, semantic and keyword similarities, using a SentenceTransformers pre-trained model that can construct the highest vector embedding. The combination is used to optimize the model and increase its accuracy. Design/methodology/approach – The development of the model in the study is divided into seven stages: (1) data collection, (2) data pre-processing, (3) selection of a pre-trained SentenceTransformers model, (4) semantic similarity (sentence pair), (5) keyword similarity, (6) final score calculation and (7) model evaluation. Findings – The multilingual paraphrase-multilingual-MiniLM-L12-v2 and distilbert-base-multilingual-cased-v1 models got the highest scores in comparisons of 11 pre-trained multilingual SentenceTransformers models with Indonesian data (Dhini and Girsang, 2023). Both multilingual models were adopted in this study. The combination of the two parameters is obtained by comparing the keywords extracted from the responses with the rubric keywords. Based on the experimental results, the proposed combination can increase the evaluation results by 0.2. Originality/value – This study uses discussion forum data from the general biology course in online learning at the open university for the 2020.2 and 2021.2 semesters. Forum discussion ratings are still manual. In this study, the authors created a model that automatically calculates the value of discussion forums, which are essays based on the lecturer's answers and rubrics
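Stages (4) to (6) of the pipeline described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the combination formula, so the equal weighting, the keyword-overlap measure, the 0 to 100 scale, and all function names are hypothetical. The embedding vectors are plain lists standing in for the output of a SentenceTransformers model.

```python
import math

def cosine_similarity(u, v):
    """Stage 4 (assumed): cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def keyword_similarity(response_keywords, rubric_keywords):
    """Stage 5 (assumed): share of rubric keywords found in the response."""
    response_set = {k.lower() for k in response_keywords}
    rubric_set = {k.lower() for k in rubric_keywords}
    if not rubric_set:
        return 0.0
    return len(response_set & rubric_set) / len(rubric_set)

def final_score(semantic, keyword, semantic_weight=0.5, max_score=100):
    """Stage 6 (assumed): weighted average of both similarities, scaled."""
    combined = semantic_weight * semantic + (1 - semantic_weight) * keyword
    return max(0.0, combined) * max_score

# Hypothetical inputs, for illustration only.
semantic = cosine_similarity([1.0, 0.0], [1.0, 0.0])
keyword = keyword_similarity(["Cell", "DNA"], ["cell", "dna", "rna", "protein"])
print(final_score(semantic, keyword))
```

The weight parameter makes it easy to test how much each similarity contributes; the value actually used in the study is not stated in the abstract.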

Directory of Open Access Journals

Short-text semantic similarity (STSS): Techniques, challenges and future perspectives

Author: Amur Zaira Hassan
Bhanbhro Hina
Dahri Kamran
Hooi Yew Kwang
Soomro Gul Muhammad
Publication venue: MDPI AG
Publication date: 18/04/2023

In natural language processing, short-text semantic similarity (STSS) is a very prominent field. It has a significant impact on a broad range of applications, such as question-answering systems, information retrieval, entity recognition, text analytics, sentiment classification, and so on. Despite their widespread use, many traditional machine learning techniques are incapable of identifying the semantics of short text. Traditional methods are based on ontologies, knowledge graphs, and corpus-based methods. The performance of these methods is influenced by the manually defined rules. Applying such measures is still difficult, since it poses various semantic challenges. In the existing literature, the most recent advances in short-text semantic similarity (STSS) research are not included. This study presents a systematic literature review (SLR) with the aim to (i) explain short sentence barriers in semantic similarity, (ii) identify the most appropriate standard deep learning techniques for the semantics of a short text, (iii) classify the language models that produce high-level contextual semantic information, (iv) determine appropriate datasets that are only intended for short text, and (v) highlight research challenges and proposed future improvements. To the best of our knowledge, we have provided an in-depth, comprehensive, and systematic review of short text semantic similarity trends, which will assist researchers in reusing and enhancing the semantic information. Yayasan UTP Pre-commercialization grant (YUTP-PRG) [015PBC-005]; Computer and Information Science Department of Universiti Teknologi PETRONAS Yayasan UTP, YUTP: 015PBC-00

Institutional repository of Tomas Bata University Library

Advancement Auto-Assessment of Students Knowledge States from Natural Language Input

Author: Ait Khayi Nisrine
Publication venue: University of Memphis Digital Commons
Publication date: 01/01/2021

Knowledge Assessment is a key element in adaptive instructional systems and in particular in Intelligent Tutoring Systems because fully adaptive tutoring presupposes accurate assessment. However, this is a challenging research problem as numerous factors affect students’ knowledge state estimation such as the difficulty level of the problem, time spent in solving the problem, etc. In this research work, we tackle this research problem from three perspectives: assessing the prior knowledge of students, assessing students’ short and long natural language responses, and knowledge tracing. Prior knowledge assessment is an important component of knowledge assessment as it facilitates the adaptation of the instruction from the very beginning, i.e., when the student starts interacting with the (computer) tutor. Grouping students into groups with similar mental models and patterns of prior level of knowledge allows the system to select the right level of scaffolding for each group of students. While this does not adapt instruction to each individual learner, adapting to groups of students based on a limited number of prior knowledge levels has the advantage of decreasing the authoring costs of the tutoring system. To achieve this goal of identifying or clustering students based on their prior knowledge, we have employed effective clustering algorithms. Automatically assessing open-ended student responses is another challenging aspect of knowledge assessment in ITSs. In dialogue-based ITSs, the main interaction between the learner and the system is natural language dialogue in which students freely respond to various system prompts or initiate dialogue moves in mixed-initiative dialogue systems. Assessing freely generated student responses in such contexts is challenging as students can express the same idea in different ways owing to different individual style preferences and varied individual cognitive abilities.
To address this challenging task, we have proposed several novel deep learning models as they are capable of capturing rich high-level semantic features of text. Knowledge tracing (KT) is an important type of knowledge assessment which consists of tracking students’ mastery of knowledge over time and predicting their future performances. Despite the state-of-the-art results of deep learning in this task, it has many limitations. For instance, most of the proposed methods ignore pertinent information (e.g., prior knowledge) that can enhance the knowledge tracing capability and performance. Working toward this objective, we have proposed a generic deep learning framework that accounts for the engagement level of students, the difficulty of questions and the semantics of the questions and uses a novel time series model called Temporal Convolutional Network for future performance prediction. The advanced auto-assessment methods presented in this dissertation should enable better ways to estimate learner’s knowledge states and in turn the adaptive scaffolding those systems can provide, which in turn should lead to more effective tutoring and better learning gains for students. Furthermore, the proposed method should enable more scalable development and deployment of ITSs across topics and domains for the benefit of all learners of all ages and backgrounds

University of Memphis Digital Commons

Learning Analytics Through Machine Learning and Natural Language Processing

Author: Yang Bokai
Publication venue: Scholar Commons
Publication date: 01/04/2023

The increase in computing power and the ability to log students’ data with the help of computer-assisted learning systems have led to an increased interest in developing and applying computer science techniques for analyzing learning data. To understand and investigate how learning-generated data can be used to improve student success, data mining techniques have been applied to several educational tasks. This dissertation investigates three important tasks in various domains of educational data mining: learners’ behavior analysis, essay structure analysis and feedback provision, and learners’ dropout prediction. The first project applied latent semantic analysis and machine learning approaches to investigate how MOOC learners’ longitudinal trajectory of meaningful forum participation facilitated learner performance. The findings have implications for refining the courses’ facilitation methods and forum design, helping improve learners’ performance, and assessing learners’ academic performance in MOOCs. The second project aims to analyze the organizational structures used in previous ACT test essays and provide an argumentative structure feedback tool driven by deep learning language models to better support the current automatic essay scoring systems and classroom settings. The third project applied MOOC learners’ forum participation states to predict dropouts with the help of hidden Markov models and other machine learning techniques. The results of this project show that forum behavior can be applied to predict dropout and evaluate the learners’ status. Overall, the results of this dissertation expand current research and shed light on how computer science techniques could further improve students’ learning experience

Scholar Commons - Institutional Repository of the University of South Carolina

5th International Open and Distance Learning Conference Proceedings Book = 5. Uluslararası Açık ve Uzaktan Öğrenme Konferansı Bildiri Kitabı

Publication venue: Anadolu Üniversitesi Açıköğretim Fakültesi
Publication date: 28/09/2022

In celebration of our 40th anniversary in open and distance learning, we are happy and proud to organize the 5th International Open & Distance Learning Conference- IODL 2022, which was held at Anadolu University, Eskişehir, Türkiye on 28-30 September 2022. After the conferences in 2002, 2006, 2010, and 2019, IODL 2022 is the 5th IODL event hosted by Anadolu University Open Education System (OES)

E-LIS