13 research outputs found

    Context Matters: A Strategy to Pre-train Language Model for Science Education

    Full text link
    This study aims to improve the automatic scoring of student responses in science education. BERT-based language models have shown significant superiority over traditional NLP models in various language-related tasks. However, students' science writing, including argumentation and explanation, is domain-specific. In addition, the language students use differs from the language of the journals and Wikipedia articles on which BERT and its existing variants were trained. All of this suggests that a domain-specific model pre-trained on science education data may improve model performance. However, the ideal type of data for contextualizing a pre-trained language model and improving performance in automatically scoring students' written responses remains unclear. We therefore employ different data in this study to contextualize both BERT and SciBERT models and compare their performance on the automatic scoring of assessment tasks for scientific argumentation. We use three datasets to pre-train the models: 1) journal articles in science education, 2) a large dataset of students' written responses (sample size over 50,000), and 3) a small dataset of students' written responses to scientific argumentation tasks. Our experimental results show that in-domain training corpora constructed from science questions and responses improve language model performance on a wide variety of downstream tasks. Our study confirms the effectiveness of continual pre-training on domain-specific data in the education domain and demonstrates a generalizable strategy for automating science education tasks with high accuracy. We plan to release our data and SciEdBERT models for public use and community engagement.
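    Continual pre-training of this kind relies on BERT's masked-language-model objective. As a minimal sketch of the data-preparation step that makes an in-domain corpus usable for that objective (not the authors' released code; the function name is hypothetical, and the 80/10/10 replacement split follows the original BERT recipe):

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """BERT-style masking for masked-language-model pre-training.

    Each token is selected with probability mask_prob; a selected token is
    replaced by [MASK] 80% of the time, by a random vocabulary token 10%
    of the time, and left unchanged 10% of the time. Labels record the
    original token at selected positions (None elsewhere, i.e. not scored).
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must recover this token
            r = rng.random()
            if r < 0.8:
                masked.append(mask_token)
            elif r < 0.9:
                masked.append(rng.choice(vocab))
            else:
                masked.append(tok)
        else:
            labels.append(None)  # position not used in the loss
            masked.append(tok)
    return masked, labels
```

Feeding student-written argumentation text through a step like this, instead of Wikipedia text, is what makes the resulting model "contextualized" for science education.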

    NLP-based personal learning assistant for school education

    Get PDF
    Computer-based knowledge and computation systems are becoming major sources of leverage for multiple industry segments. Hence, educational systems and learning processes across the world are on the cusp of a major digital transformation. This paper explores the concept of an artificial intelligence and natural language processing (NLP) based intelligent tutoring system (ITS) in the context of computer education in primary and secondary schools. One component of an ITS is a learning assistant, which enables students to seek assistance as and when they need it, wherever they are. As part of this research, a pilot prototype chatbot was developed to serve as a learning assistant for the subject Scratch (a graphical utility used to teach school children the concepts of programming). Using an open-source natural language understanding (NLU) library and a Slack-based UI, student queries were input to the chatbot, which returned the sought explanation as the answer. Through a two-stage testing process, the chatbot's NLP extraction and information retrieval performance were evaluated. The results showed that the ontology modelling for such a learning assistant was relatively accurate, and that the assistant has the potential to be pursued as a cloud-based solution in the future.
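    The retrieval step of such a learning assistant can be illustrated with a toy bag-of-words matcher. This is a hypothetical sketch, not the paper's NLU pipeline; the Scratch questions and canned answers below are invented for illustration:

```python
from collections import Counter
import math

FAQ = {  # hypothetical Scratch questions mapped to canned explanations
    "what is a sprite": "A sprite is a character or object that runs scripts on the stage.",
    "how do loops work": "The repeat and forever blocks run the blocks inside them multiple times.",
    "what does broadcast do": "Broadcast sends a message that other scripts can listen for.",
}

def _vec(text):
    """Bag-of-words term counts for a lowercased, whitespace-split text."""
    return Counter(text.lower().split())

def _cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def answer(query):
    """Return the canned answer whose stored question best matches the query."""
    best = max(FAQ, key=lambda q: _cosine(_vec(q), _vec(query)))
    return FAQ[best]
```

A production assistant would replace the cosine matcher with trained intent classification, but the query-to-stored-question matching structure is the same.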

    Matching Exemplar as Next Sentence Prediction (MeNSP): Zero-shot Prompt Learning for Automatic Scoring in Science Education

    Full text link
    Developing models to automatically score students' written responses to science problems is critical for science education. However, collecting and labeling sufficient student responses to train such models is time-consuming and costly. Recent studies suggest that pre-trained language models (PLMs) can be adapted to downstream tasks without fine-tuning by using prompts. However, no research has employed such a prompt-based approach in science education. As student responses are expressed in natural language, framing the scoring procedure as a next sentence prediction task using prompts can skip the costly fine-tuning stage. In this study, we developed a zero-shot approach to automatically score student responses via Matching Exemplars as Next Sentence Prediction (MeNSP). This approach employs no training samples. We first applied MeNSP to scoring three assessment tasks of scientific argumentation and found machine-human scoring agreements with Cohen's Kappa ranging from 0.30 to 0.57 and F1 scores ranging from 0.54 to 0.81. To improve performance, we extended our research to the few-shot setting, either randomly selecting labeled student responses or manually constructing responses to fine-tune the models. We found that one task's performance improved with more samples (Cohen's Kappa from 0.30 to 0.38, and F1 score from 0.54 to 0.59); for the other two tasks, scoring performance did not improve. We also found that randomly selected few-shot examples performed better than the human expert-crafted approach. This study suggests that MeNSP can yield referable automatic scoring for student responses while significantly reducing the cost of model training. The method can benefit low-stakes classroom assessment practices in science education. Future research should further explore the applicability of MeNSP to different types of assessment tasks in science education and improve model performance.
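    The matching logic behind an exemplar-based NSP scorer can be sketched as follows. This is an assumption-laden illustration, not the authors' implementation: `nsp_prob` stands in for a next-sentence-prediction head (e.g. the probability emitted by a model such as `BertForNextSentencePrediction`), and the mean-probability aggregation rule is an assumption about one reasonable choice:

```python
def mensp_score(response, exemplars_by_score, nsp_prob):
    """Zero-shot scoring sketch in the spirit of MeNSP.

    Pair the student response with graded exemplar responses and assign
    the score level whose exemplars look most like a plausible "next
    sentence" for the response.

    response:            the student's written answer (str)
    exemplars_by_score:  {score: [exemplar text, ...]}
    nsp_prob(a, b):      probability that b follows a, supplied by the
                         caller (e.g. from a BERT NSP head)
    """
    def avg(score):
        exs = exemplars_by_score[score]
        return sum(nsp_prob(response, ex) for ex in exs) / len(exs)
    # Pick the score level with the highest mean NSP probability.
    return max(exemplars_by_score, key=avg)
```

Because `nsp_prob` is injected, the sketch runs with any pairwise similarity function, which is also how it can be unit-tested without downloading a model.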

    Natural Language Processing Applications in Business

    Get PDF
    The increasing dependency of humans on computer-assisted systems has led researchers to focus on more effective communication technologies that can mimic human interactions as well as understand natural languages and human emotions. The problem of information overload in every sector, including business, healthcare, and education, has led to an increase in unstructured data, which is often regarded as unusable. Natural language processing (NLP) is one of the technologies that can be integrated with advanced techniques, such as machine learning, artificial intelligence, and deep learning, to improve the understanding and processing of natural language. This can enable more effective human-computer interaction, as well as the analysis and formatting of large volumes of otherwise unusable, unstructured data and text in various industries, delivering meaningful outcomes that enhance decision-making and thus improve operational efficiency. Focusing on these aspects, this chapter explains the concept of NLP and its history and development, while also reviewing its application in various industrial sectors.

    Using Natural Language Processing to Increase Modularity and Interpretability of Automated Essay Evaluation and Student Feedback

    Get PDF
    For English teachers and students who are dissatisfied with the one-size-fits-all approach of current Automated Essay Scoring (AES) systems, this research uses Natural Language Processing (NLP) techniques with a focus on configurability and interpretability. Unlike traditional AES models, which are designed to provide an overall score based on pre-trained criteria, this tool allows teachers to tailor feedback to specific focus areas. The tool implements a user interface that serves as a customizable rubric. Students' essays are entered into the tool, by the student or the teacher, via the application's user interface. Based on the rubric settings, the tool evaluates the essay and provides instant feedback. In addition to rubric-based feedback, the tool implements a Multi-Armed Bandit recommender engine that suggests educational resources to the student aligned with the rubric, thus reducing the amount of time teachers spend grading essay drafts and re-teaching. The tool developed and deployed as part of this research reduces the burden on teachers and provides instant, customizable feedback to students. Our minimum estimate of the time savings for students and teachers is 117 hours per semester. The effectiveness of the feedback criteria in predicting whether an essay was proficient or needs improvement was measured using recall: 0.96 for the model built for persuasive essays and 0.86 for the source-dependent essay model.
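    A Multi-Armed Bandit recommender of this kind is often built on the classic epsilon-greedy strategy. The sketch below is a minimal illustration under that assumption (the paper does not specify its bandit algorithm), treating each educational resource as an arm and student engagement as the reward:

```python
import random

class EpsilonGreedyBandit:
    """Epsilon-greedy multi-armed bandit.

    Mostly recommends the resource (arm) with the best observed mean
    reward, but explores a random one with probability eps so that
    under-tried resources still get a chance.
    """

    def __init__(self, arms, eps=0.1, seed=0):
        self.arms = list(arms)
        self.eps = eps
        self.rng = random.Random(seed)
        self.counts = {a: 0 for a in self.arms}
        self.values = {a: 0.0 for a in self.arms}  # running mean reward

    def select(self):
        """Choose the next resource to recommend."""
        if self.rng.random() < self.eps:
            return self.rng.choice(self.arms)       # explore
        return max(self.arms, key=lambda a: self.values[a])  # exploit

    def update(self, arm, reward):
        """Record the reward (e.g. whether the student used the resource)."""
        self.counts[arm] += 1
        n = self.counts[arm]
        self.values[arm] += (reward - self.values[arm]) / n  # incremental mean
```

With `eps=0` the bandit is purely greedy; raising `eps` trades some short-term relevance for discovering which resources actually help a given rubric area.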

    Cybersecurity Awareness and Training (CAT) Framework for Remote Working Employees

    Get PDF
    Currently, cybersecurity plays an essential role in computing and information technology due to its direct effect on organizations' critical assets and information. Cybersecurity applies integrity, availability, and confidentiality to protect organizational assets and information from various malicious attacks and vulnerabilities. The COVID-19 pandemic has generated different cybersecurity issues and challenges for businesses as employees have become accustomed to working from home. Firms are speeding up their digital transformation, making cybersecurity a main current concern. To protect software and hardware systems, organizations tend to spend excessive amounts of money procuring intrusion detection systems, antivirus software, antispyware software, and encryption mechanisms. However, these solutions are not enough, and organizations continue to suffer security risks due to the escalating list of security vulnerabilities during the COVID-19 pandemic. There is a pressing need for a cybersecurity awareness and training framework for remote working employees. The main objective of this research is to propose a CAT framework for cybersecurity awareness and training that will help organizations evaluate and measure their employees' capability in the cybersecurity domain. The proposed CAT framework will assist different organizations in effectively and efficiently managing security-related issues and challenges to protect their assets and critical information. The developed CAT framework consists of three key levels and twenty-five core practices. Case studies were conducted to evaluate the usefulness of the CAT framework in cybersecurity-based organizational settings in a real-world environment. The results showed that the proposed CAT framework can identify employees' capability levels and help train them to effectively overcome the cybersecurity issues and challenges faced by their organizations.

    Predicting the Need for Urgent Instructor Intervention in MOOC Environments

    Get PDF
    In recent years, massive open online courses (MOOCs) have become universal knowledge resources and arguably one of the most exciting innovations in e-learning environments. MOOC platforms comprise numerous courses covering a wide range of subjects and domains. Thousands of learners around the world enrol on these online platforms to satisfy their learning needs (mostly) free of charge. However, the retention rates of MOOC courses (i.e., the proportion of learners who successfully complete a course of study) are low, around 10% on average; dropout rates tend to be very high, around 90%. The principal channel via which MOOC learners can communicate their difficulties with the learning content and ask instructors for assistance is posting in a dedicated MOOC forum. Importantly, in the case of learners who are suffering from burnout or stress, some of these posts require urgent intervention. Urgent instructor intervention in response to learner requests for assistance via MOOC forum posts has therefore become an important research topic. Timely intervention by MOOC instructors may mitigate dropout issues and make the difference between a learner dropping out or staying on a course. However, due to the typically extremely high learner-to-instructor ratio in MOOCs and the often huge numbers of posts on forums, while truly urgent posts are rare, managing them can be very challenging, if not sometimes impossible. Instructors can find it difficult to monitor all existing posts and identify which require immediate intervention to help learners, encourage retention, and reduce the current high dropout rates. The main objective of this research project, therefore, was to mine and analyse learners' MOOC posts as a fundamental step towards understanding their need for instructor intervention. To achieve this, the researcher proposed and built comprehensive classification models to predict the need for instructor intervention.
The ultimate goal is to help instructors by guiding them to the posts, topics, and learners that require immediate intervention. Given this aim, the researcher conducted different experiments to fill the gap in the literature, based on datasets from two platforms (the FutureLearn platform and the Stanford MOOCPosts dataset). For the former, three MOOC corpora were prepared: two of them gold-standard MOOC corpora for identifying urgent posts, annotated by selected experts in the field; the third a corpus detailing learner dropout. Based on these datasets, different architectures and classification models using traditional machine learning and deep learning approaches were proposed. In this thesis, the task of determining the need for instructor intervention was tackled from three perspectives: (i) identifying relevant posts, (ii) identifying relevant topics, and (iii) identifying relevant learners. Posts written by learners were classified into two categories: (i) urgent intervention and (ii) non-urgent intervention. Learners, in turn, were classified into: (i) requiring instructor intervention (at risk of dropout) and (ii) not requiring instructor intervention (completer). In identifying posts, two experiments contributed to this field. The first is a novel classifier based on a deep learning model that integrates novel MOOC post dimensions, such as numerical data, in addition to textual data; this represents a novel contribution to the literature, as all available models at the time of writing were text-only. The results demonstrate that the combined, multidimensional features model proposed in this project is more effective than the text-only model.
The second contribution relates to creating various simple and hybrid deep learning models by applying plug & play techniques with different types of inputs (word-based or word-character-based) and different ways of representing target input words as vectors. According to the experimental findings, employing Bidirectional Encoder Representations from Transformers (BERT) rather than word2vec for word embedding is more effective at the intervention task across all models. Interestingly, adding word-character inputs does not improve performance with BERT as it does with word2vec. Additionally, on the task of identifying topics, this is the first time in the literature that specific language terms for identifying the need for urgent intervention in MOOCs were obtained. This was achieved by analysing learner MOOC posts using latent Dirichlet allocation (LDA), and it offers a visualisation tool that may assist instructors or learners and improve instructor intervention. In addition, this thesis contributes to the literature by creating mechanisms for identifying MOOC learners who may need instructor intervention in a new context, i.e., by using their historical online forum posts as a multi-input approach for deep learning architectures and Transformer models. The findings demonstrate that the Transformer model is more effective at identifying MOOC learners who require instructor intervention. Next, the thesis expanded its methodology to identify posts that relate to learner behaviour, which is also a novel contribution, by proposing a novel priority model that identifies the urgency of intervention based on learner histories. This model can classify learners into three groups: low risk, mid risk, and high risk. The results show that the completion rates of high-risk learners are very low, which confirms the importance of this model.
Next, as MOOC data tend to be highly unbalanced with respect to urgent posts, the thesis contributes by examining various data balancing methods for spotting situations in which MOOC posts urgently require instructor assistance. This included developing learner and instructor models to help instructors respond to urgent MOOC posts. The results show that models with undersampling can predict the most urgent cases; 3x augmentation + undersampling usually attains the best performance. Finally, for the first time, this thesis contributes to the literature by applying text classification explainability (eXplainable Artificial Intelligence, XAI) to an instructor intervention model, demonstrating how a reliable predictor combined with XAI and colour-coded visualisation could be used to assist instructors in deciding when posts require urgent intervention, as well as supporting annotators in creating high-quality, gold-standard datasets of cases where urgent intervention is required.
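    The undersampling step in such balancing experiments can be sketched in a few lines. This is a hypothetical illustration, not the thesis code; the augmentation of the minority class (e.g. 3x) would happen before this step:

```python
import random

def balance(posts, labels, seed=0):
    """Random undersampling sketch for an unbalanced post corpus.

    Groups posts by label, then downsamples every class to the size of
    the smallest class so urgent and non-urgent posts are equally
    represented in the training set.
    """
    rng = random.Random(seed)
    by_label = {}
    for post, y in zip(posts, labels):
        by_label.setdefault(y, []).append(post)
    n = min(len(items) for items in by_label.values())  # minority-class size
    out = []
    for y, items in by_label.items():
        for post in rng.sample(items, n):  # keep n random posts per class
            out.append((post, y))
    rng.shuffle(out)
    return out
```

Undersampling discards majority-class data, which is why combining it with minority-class augmentation, as the thesis reports, tends to work better than undersampling alone.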

    Augmenting the CoAST system with automated text simplification

    Get PDF
    Proper comprehension of academic texts is important for students in higher education. The CoAST platform is a virtual learning environment that endeavours to improve reading comprehension by augmenting theoretically and lexically complex texts with helpful annotations provided by a teacher. This thesis extends the CoAST system and introduces machine learning models that assist the teacher with identifying complex terminology and writing annotations by providing relevant definitions for a given word or phrase. A deep learning model is implemented to retrieve definitions for words or phrases of an arbitrary length. This model surpasses previous work on the task of definition modelling when evaluated on various automated benchmarks. We investigate the task of complex word identification, producing two convolution-based models that predict the complexity of words and two-word phrases in a context-dependent manner. These models were submitted as part of the Lexical Complexity Prediction 2021 shared task and showed results in a range comparable to that of other submissions. Both models are integrated into the CoAST system and evaluated through an online study. When selecting complex words from a document, the teacher's selections shared a sizeable overlap with the system's predictions. The results suggest that the technologies introduced in this work would benefit students and teachers using the CoAST system.
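    Baselines for complex word identification typically combine word frequency and length. The toy sketch below illustrates only those inputs, under invented frequencies; it is not the context-dependent convolution-based models the thesis actually submitted:

```python
# Hypothetical corpus frequencies; a real system would use counts from a
# large reference corpus (rarer words get higher complexity).
FREQ = {"the": 1000, "economy": 40, "heteroscedasticity": 1}

def complexity(word, freq=FREQ):
    """Toy lexical complexity score in [0, 1].

    Averages two signals: rarity (1 minus relative corpus frequency,
    with unseen words treated as maximally rare) and length (capped at
    15 characters). Context-dependent models would condition on the
    surrounding sentence as well.
    """
    max_freq = max(freq.values())
    rarity = 1.0 - freq.get(word.lower(), 0) / max_freq
    length = min(len(word) / 15.0, 1.0)
    return 0.5 * rarity + 0.5 * length
```

In a CoAST-like workflow, words scoring above a threshold would be surfaced to the teacher as annotation candidates.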

    Understanding, assessing, and facilitating entrepreneurship competence in the digital age

    Full text link
    Entrepreneurship and entrepreneurship education have become mainstream inside and outside business schools thanks to the efforts of scholars and educators in this field over the past two decades. Nurturing entrepreneurship competence is an urgent task for the economy and society, especially during economic shocks and uncertainty. Digital entrepreneurship competence brings new possibilities for learners living in this digital world. This study facilitates digital entrepreneurship and digital entrepreneurship competence as 21st-century skills at the higher education level, with experiments conducted in Chinese universities and colleges. In addition, this research will help stakeholders in Germany and other countries whose learners lack such knowledge and skills. I propose a methodology consisting of three main ingredients. Initially, a systematic review was undertaken by the researcher, in collaboration with two educators who specialized in entrepreneurship theory and practice, to extract insights on the use of educational technologies in the context of entrepreneurship education. In response to current trends in educational technology, a comprehensive examination was conducted to scrutinize the applications and potential of AI within entrepreneurship learning and teaching. Secondly, the present study endeavored to assess the effectiveness of virtual team learning in online entrepreneurship education during the COVID-19 pandemic, taking into consideration the dimensions of teamwork, taskwork, and information and communication technology. In the final investigation, a digital entrepreneurship training program was administered through an online platform, with the aim of obtaining both quantitative and qualitative feedback on the program's effectiveness and assessing participants' digital entrepreneurship competence.
The following presents a summary of each study. Regarding the systematic review of the use of educational technologies in entrepreneurship education, Study 1 uncovered social media, serious games, and digital platforms as three prominent technological approaches. In light of the extensive application of artificial intelligence in various educational domains, Study 2 delved into its utilization within the context of entrepreneurship education. The findings indicated the prevalence of machine learning, big data analysis, and adaptive learning systems in this field; meanwhile, the investigation identified potential prospects for integrating natural language processing and chatbots into entrepreneurship teaching and learning. I evaluated online entrepreneurship education courses, supported by virtual teams, built from existing freely available learning content and multimedia materials. The content and materials were evaluated on whether they fit the needs of educators and learners with various demographic backgrounds. Specifically, we evaluated the influence of gender and other demographic factors on virtual team learning and its impact on entrepreneurship competence. Furthermore, experiential learning in online settings was explored in the field of entrepreneurship, focusing on the evaluation of an online practical entrepreneurship training program using the digital entrepreneurship competence framework. The research showed that digital opportunity identification competence clearly improved, from complete novice to nascent entrepreneur who understands the theory and practice of digital entrepreneurship. However, the effectiveness of online practical learning is limited by participants' isolation. Where possible, tutorials and project guides should be conducted online, whereas experiential learning should be partly moved into face-to-face contexts.
To analyze entrepreneurship competence in the digital age, this thesis constructs and discusses three theoretical frameworks, namely entrepreneurship education, educational technology, and digital entrepreneurship competence. Existing studies seldom analyze entrepreneurship competence in online entrepreneurship education programs; this research therefore attempts to understand, assess, and facilitate entrepreneurship competence and digital entrepreneurship competence in the digital age. The thesis consists of two qualitative studies (Studies 1 and 3) and two quantitative studies (Studies 2 and 4). This research aims to offer valuable insights for developing countries engaging in entrepreneurship education with limited resources, enabling the younger generation to navigate the path of venture creation. It holds practical and theoretical implications, establishing a solid foundation for online entrepreneurship education and fostering digital entrepreneurship competence. It is hoped that this thesis will inspire scholars and policy-makers to actively contribute to this field and work collaboratively.

    The development of summarization competence and the processing of school summary texts

    Get PDF
    270 p. In this thesis we address the development of summarization competence through the processing of school summary texts. We pursued two main goals: i) analysing the state of summarization competence; to that end, we established the theoretical foundations of summarization and described summary texts; and ii) proposing, using educational and language technologies, a way to practise and evaluate summarization at school. To achieve these goals, we used Natural Language Processing techniques (drawing especially on discourse), approached from a didactic perspective. With Compress-eus, a tool we created to collect a Basque summary corpus, we compiled the LabEus corpus, comprising 1,758 summaries by primary-school and university students. The students produced both extractive and abstractive summaries. From the LabEus corpus we created the EskoLab corpus of 80 summaries and, to understand what happens while summaries are being produced, we defined research questions and carried out annotation work. We then designed and created resources and procedures for evaluating summaries: i) an algorithm for building meta-summaries; ii) criteria and a rubric for producing and evaluating summaries; and iii) two versions of automatic feedback on summary hierarchy, based on the HIMAM and GOM methods. Finally, using the resources created, we ran three summarization workshops in Basque and English: two aimed at developing summarization competence by internalizing the criteria needed to produce a summary; the third, grounded in discourse, introduced different summarization techniques and encouraged reflection on them.