Deep learning based Arabic short answer grading in serious games
Automatic short answer grading (ASAG) has become an established natural language processing problem. Modern ASAG systems start with natural language preprocessing and end with grading. Researchers have experimented with machine learning in the preprocessing stage and with deep learning techniques for automatic grading of English. However, little research is available on automatic grading for Arabic, and the datasets that ASAG depends on are scarce in that language. In this research, we collected a set of questions, answers, and associated grades in Arabic and have made this dataset publicly available. We extended to Arabic the solutions used for English ASAG and tested how automatic grading works on answers in Arabic provided by 6th-grade schoolchildren in the context of serious games. We found that these schoolchildren provide answers that are 5.6 words long on average. On such answers, deep learning-based grading achieved high accuracy even with limited training data. We tested three different recurrent neural networks for grading, and with a transformer we achieved an accuracy of 95.67%. ASAG for schoolchildren will help detect learning problems early, when teachers can still address them easily; this is the main purpose of this research.
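The paper's own code is not reproduced above; as a rough, hypothetical sketch of the transformer-based grading it describes, the following frames Arabic short-answer grading as sequence classification over a pretrained Arabic BERT. The model name, the three-level grade scheme, and the (reference answer, student answer) input format are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch: short-answer grading as sequence classification.
# Model choice and the 3-level grade scheme are assumptions, not the paper's.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "aubmindlab/bert-base-arabertv2"  # an Arabic BERT, chosen for illustration
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=3)

def grade(reference_answer: str, student_answer: str) -> int:
    """Return a grade class (0 = wrong, 1 = partial, 2 = correct)."""
    inputs = tokenizer(reference_answer, student_answer,
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1))

# The classification head would first be fine-tuned on graded
# question-answer pairs such as those in the dataset described above.
print(grade("عاصمة فرنسا هي باريس", "باريس"))
```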
Short-text semantic similarity (STSS): Techniques, challenges and future perspectives
In natural language processing, short-text semantic similarity (STSS) is a very prominent field. It has a significant impact on a broad range of applications, such as question-answering systems, information retrieval, entity recognition, text analytics, and sentiment classification. Despite their widespread use, many traditional machine learning techniques are incapable of identifying the semantics of short text. Traditional methods are based on ontologies, knowledge graphs, and corpora, and their performance is shaped by manually defined rules; applying such measures remains difficult because short text poses various semantic challenges. Existing reviews do not cover the most recent advances in STSS research. This study presents a systematic literature review (SLR) that aims to (i) explain the barriers short sentences pose to semantic similarity, (ii) identify the most appropriate standard deep learning techniques for the semantics of short text, (iii) classify the language models that produce high-level contextual semantic information, (iv) identify appropriate datasets intended specifically for short text, and (v) highlight research challenges and proposed future improvements. To the best of our knowledge, we provide an in-depth, comprehensive, and systematic review of short-text semantic similarity trends, which will assist researchers in reusing and enhancing this semantic information.
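As a minimal sketch of the embedding-based techniques such a review covers, the snippet below scores short-text semantic similarity as the cosine similarity of sentence embeddings; the specific model is an illustrative choice, not one the review endorses.

```python
# Minimal sketch: short-text semantic similarity via sentence embeddings.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

a = "How do I reset my password?"
b = "I forgot my login credentials."
emb_a, emb_b = model.encode([a, b], convert_to_tensor=True)

# Cosine similarity in [-1, 1]; higher means more semantically similar.
print(f"similarity = {util.cos_sim(emb_a, emb_b).item():.3f}")
```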
Practical and Ethical Challenges of Large Language Models in Education: A Systematic Scoping Review
Educational technology innovations leveraging large language models (LLMs)
have shown the potential to automate the laborious process of generating and
analysing textual content. While various innovations have been developed to
automate a range of educational tasks (e.g., question generation, feedback
provision, and essay grading), there are concerns regarding the practicality
and ethicality of these innovations. Such concerns may hinder future research
and the adoption of LLM-based innovations in authentic educational contexts.
To address this, we conducted a systematic scoping review of 118 peer-reviewed
papers published since 2017 to pinpoint the current state of research on using
LLMs to automate and support educational tasks. The findings revealed 53 use
cases for LLMs in automating education tasks, categorised into nine main
categories: profiling/labelling, detection, grading, teaching support,
prediction, knowledge representation, feedback, content generation, and
recommendation. Additionally, we identified several practical and ethical
challenges, including low technological readiness, lack of replicability and
transparency, and insufficient privacy and beneficence considerations. The
findings were summarised into three recommendations for future studies:
updating existing innovations with state-of-the-art models (e.g.,
GPT-3/4), embracing the initiative of open-sourcing models/systems, and
adopting a human-centred approach throughout the developmental process. As the
intersection of AI and education is continuously evolving, the findings of this
study can serve as an essential reference point for researchers, allowing them
to leverage the strengths, learn from the limitations, and uncover potential
research opportunities enabled by ChatGPT and other generative AI models.
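For concreteness, the hedged sketch below illustrates one use-case category identified by such reviews (feedback provision) by prompting an LLM through the OpenAI API; the prompt wording and model choice are assumptions for illustration, not a system evaluated in the review.

```python
# Illustrative sketch: LLM-based feedback provision for a student essay.
# Prompt and model are assumptions; reads OPENAI_API_KEY from the environment.
from openai import OpenAI

client = OpenAI()

def feedback(essay: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You are a writing tutor. Give brief, constructive feedback."},
            {"role": "user", "content": essay},
        ],
    )
    return response.choices[0].message.content

print(feedback("The mitochondria is the powerhouse of the cell because..."))
```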
"What is on your mind?" Automated Scoring of Mindreading in Childhood and Early Adolescence
In this paper we present the first work on the automated scoring of
mindreading ability in middle childhood and early adolescence. We create
MIND-CA, a new corpus of 11,311 question-answer pairs in English from 1,066
children aged 7 to 14. We perform machine learning experiments and carry out
extensive quantitative and qualitative evaluation. We obtain promising results,
demonstrating the applicability of state-of-the-art NLP solutions to a new
domain and task.

Comment: Accepted at COLING 2020
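The abstract does not name the models used, so as a purely illustrative baseline for scoring free-text answers of this kind, the sketch below trains a TF-IDF classifier on (answer, score) pairs, with toy data standing in for MIND-CA.

```python
# Illustrative baseline only; the paper's actual models and features are
# not specified here, so this TF-IDF pipeline is an assumption.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for (answer, score) pairs like those in MIND-CA.
answers = ["He thinks the box is empty", "She is happy", "Dunno"]
scores = [2, 1, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(answers, scores)
print(clf.predict(["He believes the box still has sweets in it"]))
```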
Artificial Intelligence-Enabled Intelligent Assistant for Personalized and Adaptive Learning in Higher Education
This paper presents a novel framework, Artificial Intelligence-Enabled
Intelligent Assistant (AIIA), for personalized and adaptive learning in higher
education. The AIIA system leverages advanced AI and Natural Language
Processing (NLP) techniques to create an interactive and engaging learning
platform. This platform is engineered to reduce cognitive load on learners by
providing easy access to information, facilitating knowledge assessment, and
delivering personalized learning support tailored to individual needs and
learning styles. The AIIA's capabilities include understanding and responding
to student inquiries, generating quizzes and flashcards, and offering
personalized learning pathways. The research findings have the potential to
significantly impact the design, implementation, and evaluation of AI-enabled
Virtual Teaching Assistants (VTAs) in higher education, informing the
development of innovative educational tools that can enhance student learning
outcomes, engagement, and satisfaction. The paper presents the methodology,
system architecture, intelligent services, and integration with Learning
Management Systems (LMSs) while discussing the challenges, limitations, and
future directions for the development of AI-enabled intelligent assistants in
education.

Comment: 29 pages, 10 figures, 9659 words
Inspecting Spoken Language Understanding from Kids for Basic Math Learning at Home
Enriching the quality of early childhood education with interactive math
learning at home systems, empowered by recent advances in conversational AI
technologies, is slowly becoming a reality. With this motivation, we implement
a multimodal dialogue system to support play-based learning experiences at
home, guiding kids to master basic math concepts. This work explores Spoken
Language Understanding (SLU) pipeline within a task-oriented dialogue system
developed for Kid Space, with cascading Automatic Speech Recognition (ASR) and
Natural Language Understanding (NLU) components evaluated on our home
deployment data with kids going through gamified math learning activities. We
validate the advantages of a multi-task architecture for NLU and experiment
with a diverse set of pretrained language representations for Intent
Recognition and Entity Extraction tasks in the math learning domain. To
recognize kids' speech in realistic home environments, we investigate several
ASR systems, including the commercial Google Cloud and the latest open-source
Whisper solutions with varying model sizes. We evaluate the SLU pipeline by
testing our best-performing NLU models on noisy ASR output to inspect the
challenges of understanding children for math learning in authentic homes.

Comment: Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA) at ACL 2023
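As a hedged sketch of the cascading pipeline described above, the snippet below transcribes audio with the open-source Whisper model and hands the transcript to a placeholder intent classifier; the intent labels, audio file name, and keyword stub are illustrative assumptions, not the paper's NLU models.

```python
# Sketch of a cascading SLU pipeline: Whisper ASR -> toy intent classifier.
import whisper

asr_model = whisper.load_model("base")

def classify_intent(utterance: str) -> str:
    """Keyword stub standing in for a trained multi-task NLU model."""
    text = utterance.lower()
    if any(w in text for w in ("plus", "add", "sum")):
        return "math_addition"
    if "help" in text:
        return "request_help"
    return "out_of_scope"

# Cascade: audio -> transcript -> intent ("kid_utterance.wav" is hypothetical).
transcript = asr_model.transcribe("kid_utterance.wav")["text"]
print(transcript, "->", classify_intent(transcript))
```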
Large Language Model Alignment: A Survey
Recent years have witnessed remarkable progress made in large language models
(LLMs). Such advancements, while garnering significant attention, have
concurrently elicited various concerns. The potential of these models is
undeniably vast; however, they may yield texts that are imprecise, misleading,
or even detrimental. Consequently, it becomes paramount to employ alignment
techniques to ensure these models exhibit behaviors consistent with human
values.
This survey endeavors to furnish an extensive exploration of alignment
methodologies designed for LLMs, in conjunction with the extant capability
research in this domain. Adopting the lens of AI alignment, we categorize the
prevailing methods and emergent proposals for the alignment of LLMs into outer
and inner alignment. We also probe into salient issues including the models'
interpretability and potential vulnerabilities to adversarial attacks. To
assess LLM alignment, we present a wide variety of benchmarks and evaluation
methodologies. After discussing the state of alignment research for LLMs, we
finally cast a vision toward the future, contemplating the promising avenues of
research that lie ahead.
Our aspiration for this survey extends beyond merely spurring research
interests in this realm. We also envision bridging the gap between the AI
alignment research community and the researchers engrossed in exploring the
capabilities of LLMs, in pursuit of LLMs that are both capable and safe.

Comment: 76 pages
Exploring Automated Code Evaluation Systems and Resources for Code Analysis: A Comprehensive Survey
Automated code evaluation systems (AESs) are mainly designed to reliably
assess user-submitted code. Due to their extensive range of applications and
the accumulation of valuable resources, AESs are becoming increasingly popular.
However, research on the application of AESs and the exploration of their
real-world resources for diverse coding tasks is still lacking. In this study, we conducted a
comprehensive survey on AESs and their resources. This survey explores the
application areas of AESs, available resources, and resource utilization for
coding tasks. AESs are categorized into programming contests, programming
learning and education, recruitment, online compilers, and additional modules,
depending on their application. We explore the available datasets and other
resources of these systems for research, analysis, and coding tasks. Moreover,
we provide an overview of machine learning-driven coding tasks, such as bug
detection, code review, comprehension, refactoring, search, representation, and
repair. These tasks are performed using real-life datasets. In addition, we
briefly discuss the Aizu Online Judge platform as a real example of an AES from
the perspectives of system design (hardware and software), operation
(competition and education), and research. We single out AOJ because of the
platform's scalability (programming education, competitions, and practice),
its open internal features (hardware and software), the attention it has
received from the research community, its open-source data (e.g., solution
codes and submission documents), and its transparency. We also analyze the
overall performance of this system and the challenges perceived over the years.
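The core judging loop of such a system can be sketched in a few lines: run a submission against input/output test cases and map the outcome to a verdict. This is a simplified stand-in; production systems such as AOJ add sandboxing, resource limits, and richer verdicts.

```python
# Minimal sketch of an automated code judge; file name and cases are examples.
import subprocess

def judge(source_path: str, test_cases: list[tuple[str, str]]) -> str:
    for stdin_data, expected in test_cases:
        try:
            result = subprocess.run(
                ["python", source_path], input=stdin_data,
                capture_output=True, text=True, timeout=2,
            )
        except subprocess.TimeoutExpired:
            return "Time Limit Exceeded"
        if result.returncode != 0:
            return "Runtime Error"
        if result.stdout.strip() != expected.strip():
            return "Wrong Answer"
    return "Accepted"

print(judge("solution.py", [("1 2\n", "3\n")]))
```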
Deep learning applied to the assessment of online student programming exercises
Massive open online courses (MOOCs) teaching coding are increasing in number
and popularity. They commonly include homework assignments in which the
students must write code that is evaluated by functional tests. Functional
testing may to some extent be automated; however, provision of more
qualitative evaluation and feedback may be prohibitively labor-intensive.
Providing qualitative evaluation at scale, automatically, is the subject of
much research effort.
In this thesis, deep learning is applied to the task of performing
automatic assessment of source code, with a focus on provision of
qualitative feedback. Four tasks are considered in detail: language modeling,
detecting idiomatic code, semantic code search, and predicting variable names.
First, deep learning models are applied to the task of language modeling source code. A comparison is made between the performance of
different deep learning language models, and it is shown how language
models can be used for source code auto-completion. It is also demonstrated how language models trained on source code can be used for
transfer learning, providing improved performance on other tasks.
Next, an analysis is made on how the language models from the
previous task can be used to detect idiomatic code. It is shown that
these language models are able to locate where a student has deviated
from correct code idioms. These locations can be highlighted to the
student in order to provide qualitative feedback.
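As a hedged illustration of this idea, per-token surprisal under a language model can flag the least predictable, and thus potentially unidiomatic, tokens; in the sketch below GPT-2 stands in for a language model trained on source code, which is an assumption.

```python
# Sketch: flag unidiomatic code via per-token surprisal under a LM.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def surprising_tokens(code: str, top_k: int = 3) -> list[str]:
    """Return the top_k tokens the model found least predictable."""
    ids = tokenizer(code, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Surprisal of token t given its prefix: -log p(token_t | tokens_<t).
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    surprisal = -log_probs.gather(1, ids[0, 1:].unsqueeze(1)).squeeze(1)
    positions = surprisal.topk(min(top_k, surprisal.numel())).indices + 1
    return [tokenizer.decode([int(ids[0, i])]) for i in positions]

print(surprising_tokens("for i in range(len(xs)): print(xs[i])"))
```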
Then, results are shown on semantic code search, again comparing
the performance across a variety of deep learning models. It is demonstrated how semantic code search can be used to reduce the time taken
for qualitative evaluation, by automatically pairing a student submission with an instructor’s hand-written feedback.
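A hypothetical sketch of this pairing: embed previously graded submissions, then route each new submission to the feedback attached to its nearest neighbour. TF-IDF character n-grams stand in here for the learned code embeddings used in the thesis.

```python
# Sketch: reuse instructor feedback via nearest-neighbour code search.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

graded = {  # toy (submission -> instructor feedback) pairs
    "def add(a,b): return a+b": "Good, but add type hints.",
    "def add(a,b):\n    s=0\n    s=a+b\n    return s": "Avoid the redundant temporary.",
}

vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
index = NearestNeighbors(n_neighbors=1).fit(vectorizer.fit_transform(list(graded)))

def suggest_feedback(submission: str) -> str:
    _, idx = index.kneighbors(vectorizer.transform([submission]))
    return list(graded.values())[idx[0][0]]

print(suggest_feedback("def add(x,y): return x+y"))
```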
Finally, it is examined how deep learning can be used to predict
variable names within source code. These models can be used in a
qualitative evaluation setting where the deep learning models can be
used to suggest more appropriate variable names. It is also shown that
these models can even be used to predict the presence of functional
errors.
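As an illustrative sketch of identifier prediction, a masked language model of code can rank candidate tokens for a masked variable position; using CodeBERT's MLM checkpoint here is an assumption, not the thesis's own model.

```python
# Sketch: predict a masked identifier with a code MLM (illustrative model).
from transformers import pipeline

fill = pipeline("fill-mask", model="microsoft/codebert-base-mlm")

# Rank candidates for the masked loop variable.
for candidate in fill("for <mask> in range(len(items)): process(items[0])")[:3]:
    print(candidate["token_str"], round(candidate["score"], 3))
```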
Novel experimental results show that: fine-tuning a pre-trained language model
is an effective way to improve performance across a variety of tasks on source
code, improving performance by 5% on average; pre-trained language models can
be used as zero-shot learners across a variety of tasks, with the zero-shot
performance of some architectures outperforming the fine-tuned performance of
others; and language models can be used to detect both semantic and syntactic
errors. Other novel findings include that removing the non-variable tokens
within source code has a negligible impact on model performance, and that
these remaining tokens can be shuffled with only a minimal decrease in
performance.
Augmented Behavioral Annotation Tools, with Application to Multimodal Datasets and Models: A Systematic Review
Annotation tools are an essential component in the creation of datasets for machine learning purposes. Annotation tools have evolved greatly since the turn of the century, and now commonly include collaborative features to divide labor efficiently, as well as automation employed to amplify human efforts. Recent developments in machine learning models, such as Transformers, allow for training upon very large and sophisticated multimodal datasets and enable generalization across domains of knowledge. These models also herald an increasing emphasis on prompt engineering to provide qualitative fine-tuning upon the model itself, adding a novel emerging layer of direct machine learning annotation. These capabilities enable machine intelligence to recognize, predict, and emulate human behavior with much greater accuracy and nuance, noted shortfalls of which have contributed to algorithmic injustice in previous techniques. However, the scale and complexity of training data required for multimodal models present engineering challenges. Best practices for conducting annotation for large multimodal models in the safest, most ethical, and yet most efficient manner have not been established. This paper presents a systematic literature review of crowd and machine learning augmented behavioral annotation methods to distill practices that may have value in multimodal implementations, cross-correlated across disciplines. Research questions were defined to provide an overview of the evolution of augmented behavioral annotation tools in the past, in relation to the present state of the art. (Contains five figures and four tables.)