302 research outputs found

    Advancement Auto-Assessment of Students Knowledge States from Natural Language Input

    Knowledge Assessment is a key element in adaptive instructional systems and in particular in Intelligent Tutoring Systems (ITSs) because fully adaptive tutoring presupposes accurate assessment. However, this is a challenging research problem as numerous factors affect students’ knowledge state estimation such as the difficulty level of the problem, time spent in solving the problem, etc. In this research work, we tackle this research problem from three perspectives: assessing the prior knowledge of students, assessing students’ short and long natural language responses, and knowledge tracing. Prior knowledge assessment is an important component of knowledge assessment as it facilitates the adaptation of the instruction from the very beginning, i.e., when the student starts interacting with the (computer) tutor. Grouping students into groups with similar mental models and patterns of prior level of knowledge allows the system to select the right level of scaffolding for each group of students. While this does not adapt instruction to each individual learner, adapting to groups of students based on a limited number of prior knowledge levels has the advantage of decreasing the authoring costs of the tutoring system. To achieve this goal of identifying or clustering students based on their prior knowledge, we have employed effective clustering algorithms. Automatically assessing open-ended student responses is another challenging aspect of knowledge assessment in ITSs. In dialogue-based ITSs, the main interaction between the learner and the system is natural language dialogue in which students freely respond to various system prompts or initiate dialogue moves in mixed-initiative dialogue systems. Assessing freely generated student responses in such contexts is challenging as students can express the same idea in different ways owing to different individual style preferences and varied individual cognitive abilities.
To address this challenging task, we have proposed several novel deep learning models as they are capable of capturing rich high-level semantic features of text. Knowledge tracing (KT) is an important type of knowledge assessment which consists of tracking students’ mastery of knowledge over time and predicting their future performances. Despite the state-of-the-art results of deep learning in this task, it has many limitations. For instance, most of the proposed methods ignore pertinent information (e.g., prior knowledge) that can enhance the knowledge tracing capability and performance. Working toward this objective, we have proposed a generic deep learning framework that accounts for the engagement level of students, the difficulty of questions and the semantics of the questions and uses a novel time series model called Temporal Convolutional Network for future performance prediction. The advanced auto-assessment methods presented in this dissertation should enable better ways to estimate learners’ knowledge states and in turn the adaptive scaffolding those systems can provide, which in turn should lead to more effective tutoring and better learning gains for students. Furthermore, the proposed method should enable more scalable development and deployment of ITSs across topics and domains for the benefit of all learners of all ages and backgrounds.

    VLEngagement: A Dataset of Scientific Video Lectures for Evaluating Population-based Engagement

    With the emergence of e-learning and personalised education, the production and distribution of digital educational resources have boomed. Video lectures have now become one of the primary modalities to impart knowledge to the masses in the current digital age. The rapid creation of video lecture content challenges the currently established human-centred moderation and quality assurance pipeline, demanding more efficient, scalable and automatic solutions for managing learning resources. Although a few datasets related to engagement with educational videos exist, there is still an important need for data and research aimed at understanding learner engagement with scientific video lectures. This paper introduces VLEngagement, a novel dataset that consists of content-based and video-specific features extracted from publicly available scientific video lectures and several metrics related to user engagement. We introduce several novel tasks related to predicting and understanding context-agnostic engagement in video lectures, providing preliminary baselines. This is the largest and most diverse publicly available dataset to our knowledge that deals with such tasks. The extraction of Wikipedia topic-based features also allows associating more sophisticated Wikipedia-based features to the dataset to improve the performance in these tasks. The dataset, helper tools and example code snippets are available publicly at https://github.com/sahanbull/context-agnostic-engagemen

    Can Population-based Engagement Improve Personalisation? A Novel Dataset and Experiments

    This work explores how population-based engagement prediction can address cold-start at scale in large learning resource collections. This paper introduces i) VLE, a novel dataset that consists of content- and video-based features extracted from publicly available scientific video lectures coupled with implicit and explicit signals related to learner engagement, ii) two standard tasks related to predicting and ranking context-agnostic engagement in video lectures with preliminary baselines and iii) a set of experiments that validate the usefulness of the proposed dataset. Our experimental results indicate that the newly proposed VLE dataset leads to building context-agnostic engagement prediction models that are significantly more performant than ones based on previous datasets, mainly owing to the increase in training examples. The VLE dataset’s suitability for building models towards Computer Science/Artificial Intelligence education focused on e-learning/MOOC use-cases is also evidenced. Further experiments in combining the built model with a personalising algorithm show promising improvements in addressing the cold-start problem encountered in educational recommenders. This is the largest and most diverse publicly available dataset to our knowledge that deals with learner engagement prediction tasks. The dataset, helper tools, descriptive statistics and example code snippets are available publicly.

    Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

    Indonesian and Malay are underrepresented in the development of natural language processing (NLP) technologies and available resources are difficult to find. A clear picture of existing work can invigorate and inform how researchers conceptualise worthwhile projects. Using an education sector project to motivate the study, we conducted a wide-ranging overview of Indonesian and Malay human language technologies and corpus work. We charted 657 included studies according to Hirschberg and Manning's 2015 description of NLP, concluding that the field was dominated by exploratory corpus work, machine reading of text gathered from the Internet, and sentiment analysis. In this paper, we identify the most published authors and research hubs, and make a number of recommendations to encourage future collaboration and efficiency within NLP in Indonesian and Malay.

    Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

    This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures adopted in which such tasks are organised; (b) highlight a number of relatively recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of Natural Language Processing, with an emphasis on different evaluation methods and the relationships between them. Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118 pages, 8 figures, 1 table

    A Corpus-Based Analysis of Cohesion in L2 Writing by Undergraduates in Ecuador

    To investigate the nature of cohesion in L2 writing, the present study set out to address three research questions: (1) What types of cohesion relations occur in L2 writing at the sentence, paragraph, and whole-text levels? (2) What is the relationship between lexico-grammatical cohesion features and teachers’ judgements of writing quality? (3) Do expectations of cohesion suggested by the CEFR match what is found in student writing? To answer those questions, a corpus of 240 essays and 240 emails from college-level students learning English as a foreign language in Ecuador enabled the analysis of cohesion. Each text included the scores, or teachers’ judgements of writing quality aligned to the upper-intermediate level (or B2) as proposed by the Common European Framework of Reference for learning, teaching, and assessing English as a foreign language. Lexical and grammatical items used by L2 students to build relationships of meaning in sentences, paragraphs, and the entire text were considered to analyse cohesion in L2 writing. Utilising Natural Language Processing tools (e.g., TAACO, TextInspector, NVivo), the analysis focused on determining which cohesion features (e.g., word repetition/overlap, semantic similarity, connective words) predicted the teachers’ judgements of writing quality in the collected essays and emails. The findings indicate that L2 writing is characterised by word overlap and synonyms occurring at the paragraph level and, to a lesser degree, cohesion between sentences and the entire text (e.g., connective words). Whilst these cohesion features positively and negatively predicted the teachers’ scores, a cautious interpretation of these findings is required, as many other factors beyond cohesion features must have also influenced the allocation of scores in L2 writing.

    Challenges and Remedies to Privacy and Security in AIGC: Exploring the Potential of Privacy Computing, Blockchain, and Beyond

    Artificial Intelligence Generated Content (AIGC) is one of the latest achievements in AI development. The content generated by related applications, such as text, images and audio, has sparked a heated discussion. Various derived AIGC applications are also gradually entering all walks of life, bringing unimaginable impact to people's daily lives. However, the rapid development of such generative tools has also raised concerns about privacy and security issues, and even copyright issues in AIGC. We note that advanced technologies such as blockchain and privacy computing can be combined with AIGC tools, but no work has yet been done to investigate their relevance and prospect in a systematic and detailed way. Therefore it is necessary to investigate how they can be used to protect the privacy and security of data in AIGC by fully exploring the aforementioned technologies. In this paper, we first systematically review the concept, classification and underlying technologies of AIGC. Then, we discuss the privacy and security challenges faced by AIGC from multiple perspectives and purposefully list the countermeasures that currently exist. We hope our survey will help researchers and industry to build a more secure and robust AIGC system. Comment: 43 pages, 10 figures


    Fluency in a second language (L2) is one of the most important skills for the modern world. However, adults learning a new language face many obstacles, including motivation, time, and other challenges in learning. Technology learning tools may help solve these problems. In this dissertation, I tested the effectiveness of cognitive word games as a vocabulary learning method, with the main goal of investigating how different word games including a crossword paradigm task, a free association task and a word-stem completion task were effective at improving vocabulary memory access. The games selectively increased semantic (meaning) or orthographic (spelling) associations in an English lexicon, which may lead to improved access and usage of L2 vocabulary. Three experiments were conducted. Experiment 1 examined lexical memory and recognition/retrieval processes in native English speakers. The results showed a significant effect of the game conditions on response times of a lexical association task, such that the most effective training game was the free association task. Experiment 2 was designed to probe the same game effectiveness with non-native English speakers. This time, the findings indicated significant effects of the training games on correct responses of the lexical association task and response times of a new anagram solving task. Experiment 3 was designed to investigate the game effectiveness on comprehensive English reading test scores. The results suggested that after a week of training, the games failed to improve learners' performance on the English reading scores. However, training methods differed in how much the learners improved during the practice, with crossword practice leading to large improvements and word stem completion getting worse, indicating differences in engagement and in-task language learning. In addition, feedback from participants revealed that some of them enjoyed the games, especially the crossword paradigm task.
In summary, these studies provided a broad understanding of using word games to enhance English vocabulary skills. The games can be used for further lexical investigations or adapted for classroom purposes.

    Novel Datasets, User Interfaces and Learner Models to Improve Learner Engagement Prediction on Educational Videos

    With the emergence of Open Education Resources (OERs), educational content creation has rapidly scaled up, making a large collection of new materials available. Among these, we find educational videos, the most popular modality for transferring knowledge in the technology-enhanced learning paradigm. Rapid creation of learning resources opens up opportunities in facilitating sustainable education, as the potential to personalise and recommend specific materials that align with individual users’ interests, goals, knowledge level, language and stylistic preferences increases. However, the quality and topical coverage of these materials could vary significantly, posing major challenges in managing this large collection, including the risk of negative user experience and engagement with these materials. The scarcity of support resources such as public datasets is another challenge that slows down the development of tools in this research area. This thesis develops a set of novel tools that improve the recommendation of educational videos. Two novel datasets and an e-learning platform with a novel user interface are developed to support the offline and online testing of recommendation models for educational videos. Furthermore, a set of learner models that accounts for the learner interests, knowledge, novelty and popularity of content is developed through this thesis. The different models are integrated together to propose a novel learner model that accounts for the different factors simultaneously. The user studies conducted on the novel user interface show that the new interface encourages users to explore the topical content more rigorously before making relevance judgements about educational videos. Offline experiments on the newly constructed datasets show that the newly proposed learner models outperform their relevant baselines significantly.