2,210 research outputs found
Deep learning based Arabic short answer grading in serious games
Automatic short answer grading (ASAG) has become an established natural language processing problem. Modern ASAG systems start with natural language preprocessing and end with grading. Researchers started experimenting with machine learning in the preprocessing stage and deep learning techniques in automatic grading for English. However, little research is available on automatic grading for Arabic. Datasets are important to ASAG, and limited datasets are available in Arabic. In this research, we have collected a set of questions, answers, and associated grades in Arabic. We have made this dataset publicly available. We have extended to Arabic the solutions used for English ASAG. We have tested how automatic grading works on answers in Arabic provided by schoolchildren in 6th grade in the context of serious games. We found that these schoolchildren provide answers that are 5.6 words long on average. On such answers, deep learning-based grading has achieved high accuracy even with limited training data. We have tested three different recurrent neural networks for grading. With a transformer, we have achieved an accuracy of 95.67%. ASAG for schoolchildren will help detect children with learning problems early. When detected early, teachers can solve learning problems easily. This is the main purpose of this research.
Development of an Automated Scoring Model Using SentenceTransformers for Discussion Forums in Online Learning Environments
Due to the limitations of public datasets, research on automatic essay scoring in Indonesian has been restrained, resulting in suboptimal accuracy. In general, the main goal of an essay scoring system is to reduce the time spent on scoring, which is usually done manually with human judgment. This study uses a discussion forum in online learning to generate an assessment between the responses and the lecturer's rubric in the automated essay scoring. A SentenceTransformers pre-trained model that can construct the highest vector embedding was proposed to identify the semantic meaning between the responses and the lecturer's rubric. The effectiveness of monolingual and multilingual models was compared. This research aims to determine the model's effectiveness and the appropriate model for the Automated Essay Scoring (AES) used in paired sentence Natural Language Processing tasks. The distiluse-base-multilingual-cased-v1 model, which employed the Pearson correlation method, obtained the highest performance. Specifically, it obtained a correlation value of 0.63 and a mean absolute error (MAE) score of 0.70. This indicates that the overall prediction result is enhanced when compared to the earlier regression task research.
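The two metrics reported in this abstract, Pearson correlation and mean absolute error, can be sketched in a few lines of plain Python. This is only an illustration, not the authors' pipeline: the `cosine_similarity` helper assumes that a response and a rubric are compared by the cosine of their embedding vectors, which the abstract does not state, and the names and sample scores are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors (assumed comparison)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("zero-length vector has no direction")
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation between predicted and human scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


def mean_absolute_error(xs, ys):
    """Average absolute gap between predicted and human scores."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two equally long, non-empty sequences")
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical scores, for illustration only.
human = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 2.5, 4.5]
r = pearson(predicted, human)
mae = mean_absolute_error(predicted, human)
```

A high correlation with a large MAE would indicate that the model ranks responses well but is miscalibrated on the score scale, which is why the two numbers are usually reported together.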
Crowdsourcing in Computer Vision
Computer vision systems require large amounts of manually annotated data to
properly learn challenging visual concepts. Crowdsourcing platforms offer an
inexpensive method to capture human knowledge and understanding, for a vast
number of visual perception tasks. In this survey, we describe the types of
annotations computer vision researchers have collected using crowdsourcing, and
how they have ensured that this data is of high quality while annotation effort
is minimized. We begin by discussing data collection on both classic (e.g.,
object recognition) and recent (e.g., visual story-telling) vision tasks. We
then summarize key design decisions for creating effective data collection
interfaces and workflows, and present strategies for intelligently selecting
the most important data instances to annotate. Finally, we conclude with some
thoughts on the future of crowdsourcing in computer vision. Comment: A 69-page meta review of the field, Foundations and Trends in
Computer Graphics and Vision, 201
Exploration of annotation strategies for entailment-based Automatic Short Answer Grading
[EN] Recent work has shown that Automatic Short Answer Grading can effectively be
reformulated as a Textual Entailment problem. In this work we show that this
reformulation is also effective in zero-shot and few-shot settings, where we report
competent results close to state-of-the-art performance with the few-shot setting. More
importantly, we show that the annotation strategy can have significant impact on
performance. When annotating few examples, empirical results show that increasing the
variability on the question side, at the cost of decreasing the number of annotated answers
per question, is preferable to having the same number of annotated examples with fewer
questions and more answers. With this annotation strategy, using only 10% of the full
training set, our model is on par with state-of-the-art systems on the SciEntsBank dataset.
Finally, experiments over SciEntsBank and Beetle domains show that the use of
out-of-domain annotated question-answer examples can be harmful, concluding that
task-aware fine-tuned models obtain significantly lower results compared to task-agnostic
general purpose inference models, at least with the domains employed for this work.
[EU] Research in recent years on automatic short answer grading has shown that the task
can be effectively reformulated, in particular as a textual inference task. In this work
we show that the reformulation is also effective in few-shot and zero-shot scenarios.
More importantly, we show that the strategy used to annotate examples for the task has a
significant impact on model performance. When annotating only a few examples, empirical
results show that it is better to increase variability on the question side, at the cost
of fewer annotated answers per question, than to have the same number of annotated
examples with fewer questions and more answers. Following this annotation strategy and
using 10% of the full training set, performance matches that of state-of-the-art systems
on the SciEntsBank dataset. Finally, experiments on the Beetle and SciEntsBank domains
show that out-of-domain question-answer example pairs can harm performance, leading to
the conclusion that systems that know the task from another domain tend to give lower
results than those that do not know the task, at least in the domains studied
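The annotation trade-off described in this abstract (many questions with few answers each, versus few questions with many answers each, under the same budget) can be made concrete with a small selection routine. This is a sketch under assumptions: the function name, the `(question_id, answer)` pool format and the round-robin order are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def select_for_annotation(pool, budget, spread_over_questions=True):
    """Pick `budget` (question_id, answer) pairs from an unlabeled pool.

    spread_over_questions=True  -> cover as many questions as possible
                                   (one answer per question per pass).
    spread_over_questions=False -> exhaust one question before the next.
    """
    by_question = defaultdict(list)
    for question_id, answer in pool:
        by_question[question_id].append(answer)

    selected = []
    if spread_over_questions:
        depth = 0  # index of the answer taken from each question in this pass
        while len(selected) < budget:
            added = False
            for question_id, answers in by_question.items():
                if depth < len(answers):
                    selected.append((question_id, answers[depth]))
                    added = True
                    if len(selected) == budget:
                        break
            if not added:  # pool exhausted before the budget was reached
                break
            depth += 1
    else:
        for question_id, answers in by_question.items():
            for answer in answers:
                if len(selected) == budget:
                    return selected
                selected.append((question_id, answer))
    return selected


# Hypothetical pool: 3 questions with 3 student answers each.
pool = [(q, f"answer {q}-{i}") for q in ("q1", "q2", "q3") for i in range(3)]
spread = select_for_annotation(pool, budget=3, spread_over_questions=True)
concentrated = select_for_annotation(pool, budget=3, spread_over_questions=False)
```

With a budget of three, `spread` contains one answer for each of the three questions, while `concentrated` contains three answers to the first question only; the abstract reports that the first kind of sample is the preferable one to annotate.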
A Machine Learning Prediction of Automatic Text Based Assessment for Open and Distance Learning: A Review
In this systematic literature review, automatic text-based and essay type assessment
grading systems using Machine Learning and Natural Language Processing (NLP)
techniques were investigated. The major focus is on text-based and essay type
assessment in ODL courses. Text-based and essay type questions are an important tool
for performing quality examination and assessment: they help students gain mastery
over the task, widen their horizon of knowledge and increase the learner’s
development and learning more than, for instance, subjective question type, single choice
question (SCQ), multiple choice question (MCQ) and true/false question type.
Automatic text-based and essay type assessment grading systems can be used as an
important tool in ODL institutions, where assessments and examinations can be quickly
and easily evaluated for the purpose of efficient feedback. We carried out this study
using quality, exclusion and inclusion criteria, selecting only studies that focus on
NLP and Machine Learning techniques for the automatic text-based and essay type
assessment grading task. Searches in ACM Digital Library, Semantic Scholar, Scopus,
IEEE Xplore, Google Scholar, Microsoft Academic, Learn Tech Library and Springer were
performed in order to retrieve important and relevant literature in this research
domain. Conference papers, journals and articles published between 2011 and 2019 were
considered in this study. This study found 34 published articles describing automatic
text-based and essay type assessment and examination grading task out of a total of
1260 articles that met our search criteria
Modelling text meta-properties in automated text scoring for non-native English writing
Automated text scoring (ATS) is the task of automatically scoring a text based on some given grading criteria. This thesis focuses on ATS in the context of free-text writing exams aimed at learners of English as a foreign language (EFL). The benefit of an ATS system is primarily to provide instant and consistent feedback to language learners, and service reliability also forms a crucial part of an ATS system. Based on previous work, we investigated only partially explored meta-properties in text and integrated them into a machine learning based ATS model across multiple datasets:
In most previous work, the proposed models implicitly assume that texts produced by learners in an exam are written independently. However, this is not true for the exams where learners are required to compose multiple texts. We hence explicitly instructed our model which texts are written by the same learner, which boosts model performance in most cases.
We used three intra-exam properties within the same exam including prompt, genre and task as a starting point, and we showed that explicitly modelling these properties via frustratingly easy domain adaptation (FEDA) can positively affect model performance in some cases. Furthermore, modelling multiple intra-exam properties together is better than modelling any single property individually or no property in four out of five test sets.
We studied how to utilise and combine learners' responses from multiple writing exams. We also proposed a new variant of the transfer-learning ATS model which mitigates the drawbacks of previous work. This variant first builds a ranking model across multiple datasets via FEDA, and the ranking score of each text predicted by the ranking model is used as an extra feature in the baseline model. This variant improves on a baseline model on the development sets in terms of root-mean-square error. Furthermore, the transfer-learning model utilising multiple datasets tuned on each development set is always better than the baseline model on the corresponding test set.
We found that different datasets favour different meta properties. We therefore combined all the models looking at different meta properties together using ensemble learning. Compared to the baseline model, the combined model has a statistically significant improvement on all the test sets in terms of root-mean-square error based on a permutation test. The Institute for Automated Language Teaching and Assessmen
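Frustratingly easy domain adaptation (FEDA), which this abstract uses for prompt, genre and task, works by copying every feature into a shared version and a domain-specific version, so that a linear model can learn both a general weight and a per-domain correction. A minimal sketch with dictionary features follows; the feature names, the `::` key format and the domain labels are hypothetical, and the thesis's actual feature set and implementation are not given in the abstract.

```python
def feda_augment(features, domain):
    """Return the FEDA-augmented copy of a sparse feature dictionary.

    Each feature is emitted twice: once under a shared 'general' prefix and
    once under the prefix of the text's own domain (e.g. its prompt or genre).
    """
    augmented = {}
    for name, value in features.items():
        augmented[f"general::{name}"] = value
        augmented[f"{domain}::{name}"] = value
    return augmented


def shared_feature_names(a, b):
    """Feature names that two augmented examples have in common."""
    return sorted(set(a) & set(b))


# Hypothetical texts from two different prompts.
text_a = feda_augment({"length": 120.0, "error_rate": 0.05}, domain="prompt_A")
text_b = feda_augment({"length": 95.0, "error_rate": 0.10}, domain="prompt_B")
overlap = shared_feature_names(text_a, text_b)
```

Texts from different prompts overlap only in the `general::` features, while texts from the same prompt also share their prompt-specific copies. Modelling several properties together, as the abstract describes, could be approximated by adding one more copy per additional property, though that extension is an assumption here.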
GuavaNet: A deep neural network architecture for automatic sensory evaluation to predict degree of acceptability for Guava by a consumer
This thesis is divided into two parts: Part I: Analysis of Fruits, Vegetables, Cheese and Fish based on Image Processing using Computer Vision and Deep Learning: A Review. It consists of a comprehensive review of image processing, computer vision and deep learning techniques applied to carry out analysis of fruits, vegetables, cheese and fish. This part also serves as a literature review for Part II. Part II: GuavaNet: A deep neural network architecture for automatic sensory evaluation to predict degree of acceptability for Guava by a consumer. This part introduces an end-to-end deep neural network architecture that can predict the degree of acceptability by the consumer for a guava based on sensory evaluation.
Feature Space Augmentation: Improving Prediction Accuracy of Classical Problems in Cognitive Science and Computer Vision
The prediction accuracy in many classical problems across multiple domains has seen a rise since computational tools such as multi-layer neural nets and complex machine learning algorithms have become widely accessible to the research community. In this research, we take a step back and examine the feature space in two problems from very different domains. We show that novel augmentation to the feature space yields higher performance. Emotion Recognition in Adults from a Control Group: The objective is to quantify the emotional state of an individual at any time using data collected by wearable sensors. We define emotional state as a mixture of amusement, anger, disgust, fear, sadness, anxiety and neutral and their respective levels at any time. The generated model predicts an individual’s dominant state and generates an emotional spectrum, a 1x7 vector indicating the levels of each emotional state and anxiety. We present an iterative learning framework that alters the feature space uniquely to an individual’s emotion perception, and predicts the emotional state using the individual-specific feature space. Hybrid Feature Space for Image Classification: The objective is to improve the accuracy of existing image recognition by leveraging text features from the images. As humans, we perceive objects using colors, dimensions, geometry and any textual information we can gather. Current image recognition algorithms rely exclusively on the first three and do not use the textual information. This study develops and tests an approach that trains a classifier on a hybrid text-based feature space that has comparable accuracy to state-of-the-art CNNs while being significantly less expensive computationally. Moreover, when combined with CNNs the approach yields a statistically significant boost in accuracy. Both models are validated using cross validation and holdout validation, and are evaluated against the state of the art.