Effect of Tuned Parameters on a LSA MCQ Answering Model
This paper presents the current state of work in progress whose objective
is to better understand the factors that significantly influence the
performance of Latent Semantic Analysis (LSA). A difficult task, answering
(French) biology Multiple Choice Questions, is used to test the semantic
properties of the truncated singular space and to study the relative
influence of the main parameters. Dedicated software has been designed to
fine-tune the LSA semantic space for the Multiple Choice Question task. With
optimal parameters, the performance of our simple model is, quite
surprisingly, equal or superior to that of 7th- and 8th-grade students. This
indicates that the semantic spaces were quite good despite their low
dimensions and the small sizes of the training data sets. In addition, we
present an original entropy-based global weighting of the answer terms of each
Multiple Choice Question, which was necessary to achieve the model's
success. Comment: 9 pages
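The abstract does not spell out its entropy-based global weighting; a common log-entropy global weight from the LSA literature can be sketched as follows (the function name and the toy term counts are illustrative, not the paper's exact formulation):

```python
import math

def log_entropy_weights(term_doc_counts):
    """Compute a global entropy weight per term for an LSA term-document
    matrix. term_doc_counts maps each term to its raw count in each of
    the n training documents. Weights range from 0 (term spread evenly
    across all documents, uninformative) to 1 (term concentrated in a
    single document, highly informative)."""
    weights = {}
    for term, counts in term_doc_counts.items():
        n_docs = len(counts)
        if n_docs < 2:
            weights[term] = 1.0  # entropy is undefined for a single document
            continue
        gf = sum(counts)  # global frequency of the term
        entropy = sum((tf / gf) * math.log(tf / gf) for tf in counts if tf > 0)
        weights[term] = 1.0 + entropy / math.log(n_docs)
    return weights
```

A content word occurring in a single document keeps its full weight, while a function word spread evenly across all documents is weighted toward zero before the truncated SVD is computed.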
Generating Distractors for Reading Comprehension Questions from Real Examinations
We investigate the task of distractor generation for multiple choice reading
comprehension questions from examinations. In contrast to previous work, we
do not aim to prepare word- or short-phrase distractors; instead, we endeavor
to generate longer, semantically rich distractors that are closer to the
distractors in real reading comprehension examinations. Taking as input a
reading comprehension article and a pair consisting of a question and its
correct option, our goal is to generate several distractors that are related
to the answer, consistent with the semantic context of the question, and have
some trace in the article. We propose a hierarchical encoder-decoder
framework with static and dynamic attention mechanisms to tackle this task.
Specifically, the dynamic attention combines sentence-level and word-level
attention, varying at each recurrent time step, to generate a more readable
sequence. The static attention modulates the dynamic attention so that it
does not focus on question-irrelevant sentences or sentences that contribute
to the correct option. Our proposed framework outperforms several strong
baselines on the first prepared distractor-generation dataset of real reading
comprehension questions. In human evaluation, compared with the distractors
generated by baselines, our generated distractors are more effective at
confusing the annotators. Comment: AAAI201
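The combination of sentence-level and word-level attention described above can be sketched as follows. This is a minimal illustration of the hierarchical idea only: in the actual model the scores are learned and recomputed at every decoder step, whereas here they are plain numbers.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def combined_attention(word_scores_per_sentence, sentence_scores):
    """Hierarchical attention: each word's final weight is its softmax
    weight within its own sentence, scaled by the softmax weight of that
    sentence. The result is a single distribution over all words."""
    sentence_attn = softmax(sentence_scores)
    combined = []
    for word_scores, s_weight in zip(word_scores_per_sentence, sentence_attn):
        for w_weight in softmax(word_scores):
            combined.append(s_weight * w_weight)
    return combined
```

Because each within-sentence distribution sums to one, scaling by the sentence weights keeps the combined weights a valid distribution, so a static, sentence-level signal can softly mask whole sentences without renormalization.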
Co-Attention Hierarchical Network: Generating Coherent Long Distractors for Reading Comprehension
In reading comprehension, generating sentence-level distractors is a
significant task that requires a deep understanding of the article and
question. Traditional entity-centered methods can only generate word-level
or phrase-level distractors. Although recently proposed neural methods such
as the sequence-to-sequence (Seq2Seq) model show great potential for
generating creative text, previous neural methods for distractor generation
ignore two important aspects. First, they do not model the interactions
between the article and the question, so the generated distractors tend to be
too general or irrelevant to the question context. Second, they do not
emphasize the relationship between the distractor and the article, so the
generated distractors are not semantically relevant to the article and thus
fail to form a set of meaningful options. To solve the first problem, we
propose a co-attention enhanced hierarchical architecture that better
captures the interactions between the article and the question, thereby
guiding the decoder to generate more coherent distractors. To alleviate the
second problem, we add a semantic similarity loss that pushes the generated
distractors to be more relevant to the article. Experimental results show
that our model outperforms several strong baselines on automatic metrics,
achieving state-of-the-art performance. Further human evaluation indicates
that our generated distractors are more coherent and more educational than
those generated by baselines. Comment: 8 pages, 3 figures. Accepted by AAAI202
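A semantic similarity loss of the kind described above is commonly built from cosine similarity between sentence representations. The sketch below shows that shape on plain vectors; the paper's exact loss and the encoders producing the vectors are not specified in the abstract.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_loss(distractor_vec, article_vec):
    """Auxiliary loss term: minimized (0) when the generated distractor's
    representation points in the same direction as the article's, and
    maximal (2) when they point in opposite directions."""
    return 1.0 - cosine_similarity(distractor_vec, article_vec)
```

Added to the usual generation loss with a small weight, such a term nudges the decoder toward distractors that stay semantically close to the article.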
Encyclopedic Memory: Long-Term Memory Capacity for Knowledge Vocabulary in Middle School
This article is a synthesis of unpublished and published experiments showing that elementary memory scores (immediate recall of words and pictures; delayed recall; recognition), which are very sensitive to aging and in pharmacological protocols, have little or no correlation with school achievement. The alternative assumption developed is that school achievement depends strongly on the long-term memory of scholastic knowledge (history, literature, sciences, maths, etc.), called encyclopedic memory. A longitudinal study from grade 6 to grade 9 of a cohort of eight classes in a French collège (middle school) was undertaken in order to observe the involvement of encyclopedic vocabulary (e.g., Julius Caesar, Manhattan, Shanghai, Uranus, vector) in school performance. An inventory of the school textbooks gives approximately 6,000 encyclopedic words in grade 6, rising to 24,000 in grade 9. The encyclopedic storage capacity was estimated at the end of each year by a multiple-choice questionnaire with random samples of words (800 items; 8 subjects). The results give an estimate of 2,500 words acquired by the end of grade 6, rising to 17,000 by the end of grade 9. Correlations between the encyclopedic memory score and average school grades range from .61 to .72.
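The extrapolation from a random MCQ sample to an estimate of the full known vocabulary can be sketched as follows. The guessing correction and the default of four options per item are assumptions for illustration; the study's actual estimation procedure is not detailed in the abstract.

```python
def estimate_known_vocabulary(n_correct, n_items, inventory_size, n_choices=4):
    """Extrapolate from a random MCQ sample of n_items words to the full
    textbook inventory of inventory_size words, correcting observed
    accuracy for chance guessing among n_choices options."""
    p_observed = n_correct / n_items
    chance = 1.0 / n_choices
    # proportion truly known, after removing lucky guesses
    p_known = max(0.0, (p_observed - chance) / (1.0 - chance))
    return round(p_known * inventory_size)
```

Scoring at chance level maps to an estimate of zero known words, and a perfect score maps to the full inventory.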
Learning to Reuse Distractors to support Multiple Choice Question Generation in Education
Multiple choice questions (MCQs) are widely used in digital learning systems,
as they allow for automating the assessment process. However, due to the
increased digital literacy of students and the advent of social media
platforms, MCQ tests are widely shared online, and teachers are continuously
challenged to create new questions, which is an expensive and time-consuming
task. A particularly sensitive aspect of MCQ creation is to devise relevant
distractors, i.e., wrong answers that are not easily identifiable as being
wrong. This paper studies how a large existing set of manually created answers
and distractors for questions over a variety of domains, subjects, and
languages can be leveraged to help teachers in creating new MCQs, by the smart
reuse of existing distractors. We built several data-driven models based on
context-aware question and distractor representations, and compared them with
static feature-based models. The proposed models are evaluated with automated
metrics and in a realistic user test with teachers. Both automatic and human
evaluations indicate that context-aware models consistently outperform a static
feature-based approach. For our best-performing context-aware model, on average
3 distractors out of the 10 shown to teachers were rated as high-quality
distractors. We create a performance benchmark and make it public to enable
comparison between different approaches and to introduce a more standardized
evaluation of the task. The benchmark contains a test set of 298 educational
questions covering multiple subjects and languages, and a 77k-item
multilingual pool of distractor vocabulary for future research. Comment: 24
pages and 4 figures. Accepted for publication in IEEE Transactions on
Learning Technologies
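The core reuse idea above is to rank an existing distractor pool against a new question-answer pair. The paper's models use learned, context-aware neural representations; the sketch below substitutes a simple bag-of-words cosine score purely to illustrate the ranking step (all names and the toy pool are illustrative).

```python
import math
from collections import Counter

def bag_of_words(text):
    return Counter(text.lower().split())

def cosine(c1, c2):
    dot = sum(c1[t] * c2[t] for t in set(c1) & set(c2))
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def rank_distractors(question, answer, pool, top_k=3):
    """Score every distractor in the reuse pool against the new
    question-answer pair and return the top_k candidates for the
    teacher to review."""
    query = bag_of_words(question + " " + answer)
    ranked = sorted(pool, key=lambda d: cosine(query, bag_of_words(d)),
                    reverse=True)
    return ranked[:top_k]
```

Swapping the bag-of-words encoder for a contextual sentence encoder turns this into the context-aware variant the paper evaluates.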
Automatic Distractor Generation for Multiple Choice Questions in Standard Tests
To assess the knowledge proficiency of a learner, the multiple choice
question is an efficient and widespread format in standard tests. However,
composing a multiple choice question, especially constructing the
distractors, is quite challenging. The distractors are required to be both
incorrect and plausible enough to confuse learners who have not mastered the
knowledge. Currently, distractors are written by domain experts, which is
both expensive and time-consuming. This motivates automatic distractor
generation, which can benefit various standard tests in a wide range of
domains. In this paper, we propose a question- and answer-guided distractor
generation (EDGE) framework to automate distractor generation. EDGE consists
of three major modules: (1) the Reforming Question Module and (2) the
Reforming Passage Module apply gate layers to guarantee the inherent
incorrectness of the generated distractors, and (3) the Distractor Generator
Module applies an attention mechanism to control the level of plausibility.
Experimental results on a large-scale public dataset demonstrate that our
model significantly outperforms existing models and achieves a new state of
the art. Comment: accepted by COLING202
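The gate layers mentioned above follow a standard pattern: a sigmoid-activated score multiplies each feature of a hidden representation, letting the model suppress information (here, anything that would make the distractor correct) while passing the rest through. A minimal sketch, with the gate logits taken as given rather than learned:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gate_layer(hidden, gate_logits):
    """Elementwise gate over a hidden representation: features whose
    logits are very negative are suppressed toward zero, while features
    whose logits are very positive pass through almost unchanged."""
    return [sigmoid(g) * h for g, h in zip(gate_logits, hidden)]
```

In EDGE the logits would be produced by a learned layer conditioned on the question and answer; the gating arithmetic itself is as shown.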
Comparative Study of Different Techniques for Automatic Evaluation of English Text Essays
Automated essay evaluation continues to attract a great deal of interest because of its educational and commercial importance, as well as the related research challenges in the natural language processing field. Compared with a human evaluator, who requires more time and whose judgment can depend on his or her mood, automated essay evaluation lowers the cost in human resources and delivers results directly, with timely feedback. This paper focuses on the automated evaluation of English text essays, comparing various algorithms and techniques applied to datasets of different sizes and essays of different lengths; the performance of the algorithms was assessed using different metrics. The results reveal that the performance of each technique is affected by the size of the dataset and the length of the essays. Finally, a future research direction is to build a standard dataset containing different types of question-answer pairs so that the performance of different techniques can be compared fairly.
Biomedical knowledge graph-enhanced prompt generation for large language models
Large Language Models (LLMs) have been driving progress in AI at an
unprecedented rate, yet still face challenges in knowledge-intensive domains
like biomedicine. Solutions such as pre-training and domain-specific
fine-tuning add substantial computational overhead, and the latter also
requires domain expertise. External knowledge infusion is task-specific and requires
model training. Here, we introduce a task-agnostic Knowledge Graph-based
Retrieval Augmented Generation (KG-RAG) framework by leveraging the massive
biomedical KG SPOKE with LLMs such as Llama-2-13b, GPT-3.5-Turbo and GPT-4, to
generate meaningful biomedical text rooted in established knowledge. KG-RAG
consistently enhanced the performance of LLMs across various prompt types,
including one-hop and two-hop prompts, drug repurposing queries, biomedical
true/false questions, and multiple-choice questions (MCQ). Notably, KG-RAG
provides a remarkable 71% boost in the performance of the Llama-2 model on the
challenging MCQ dataset, demonstrating the framework's capacity to empower
open-source models with fewer parameters for domain-specific questions.
Furthermore, KG-RAG enhanced the performance of proprietary GPT models, such as
GPT-3.5 which exhibited improvement over GPT-4 in context utilization on MCQ
data. Our approach was also able to address drug repurposing questions,
returning meaningful repurposing suggestions. In summary, the proposed
framework combines the explicit knowledge of the KG with the implicit
knowledge of the LLM in an optimized fashion, thus enhancing the adaptability
of general-purpose LLMs to tackle domain-specific questions in a unified
framework. Comment: 28 pages, 5 figures, 2 tables, 1 supplementary file
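The retrieval-augmented step described above ends with assembling retrieved KG triples into the prompt sent to the LLM. A minimal sketch of that assembly, with an invented question and triples for illustration (SPOKE retrieval and triple ranking are out of scope here):

```python
def build_kg_prompt(question, triples, max_triples=3):
    """Assemble a retrieval-augmented prompt from (subject, predicate,
    object) triples retrieved from a biomedical knowledge graph."""
    context = "\n".join(f"- {s} {p} {o}." for s, p, o in triples[:max_triples])
    return ("Answer the question using the biomedical context below.\n"
            f"Context:\n{context}\n"
            f"Question: {question}\n"
            "Answer:")
```

Capping the number of triples keeps the prompt within the model's context window while still grounding the answer in established knowledge.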
Experiments in neural question answering
In this thesis, we apply deep learning methods to three tasks: finding duplicate questions, learning to rank the answers of a Multiple Choice Question (MCQ), and classifying answers to a question in the context of a paragraph. We focus on problems related to sentence-sentence similarity. We use a siamese architecture for better representation of the question and answers. The basic aim of all the methods proposed in this thesis is to build word embeddings of the question and answers and feed them to a deep neural architecture. We use several such architectures, including Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNN). We also implement an attention mechanism to put more focus on the sentence-word relationship. Our goal is to extract a refined representation of the question and answers through different combinations of these deep learning techniques; we generate a representation of a sentence according to the context of another sentence for solving our tasks. We provide simple but efficient deep learning models for our tasks. As neural models are data-driven, we train our models extensively by making pairs such as question-question and question-answer over large-scale real-life datasets. We use three different datasets for our three tasks: the Quora dataset of question pairs for finding duplicate questions, the OpenTriviaQA question-answering dataset for ranking multiple answers, and the SQuAD dataset for answer classification in the reading comprehension task. We evaluate our models with metrics such as accuracy, precision, recall, and F1 score. Our methods and experiments demonstrate significant improvements over state-of-the-art methods.
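The siamese idea above is that one shared encoder processes both sentences, and a similarity function compares the two encodings. The sketch below substitutes a bag-of-words encoder for the thesis's shared-weight LSTM/CNN encoders purely to show the shape of the comparison:

```python
import math
from collections import Counter

def encode(sentence):
    """Shared encoder applied to both inputs; a bag of words stands in
    for the learned, weight-sharing neural encoders."""
    return Counter(sentence.lower().split())

def siamese_similarity(sentence_a, sentence_b):
    """Encode both sentences with the same encoder, then compare the
    encodings with cosine similarity, as in a siamese architecture."""
    va, vb = encode(sentence_a), encode(sentence_b)
    dot = sum(va[t] * vb[t] for t in set(va) & set(vb))
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Thresholding this score gives a duplicate-question classifier; sorting candidate answers by it gives an answer ranker, matching two of the three tasks.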