A diverse user model in the context of an intelligent tutoring system.
No abstract available. The original print copy of this thesis may be available here: http://wizard.unbc.ca/record=b123727
Detecting grammatical errors with treebank-induced, probabilistic parsers
Today's grammar checkers often use hand-crafted rule systems that define acceptable language. The development of such rule systems is labour-intensive and has to be repeated for each language. At the same time, grammars automatically induced from syntactically annotated corpora (treebanks) are successfully employed in other applications, for example text understanding and machine translation. At first glance, treebank-induced grammars seem to be unsuitable for grammar checking as they massively over-generate and fail to reject ungrammatical input due to their high robustness. We present three new methods for judging the grammaticality of a sentence with probabilistic, treebank-induced grammars, demonstrating that such grammars can be successfully applied to automatically judge the grammaticality of an input string. Our best-performing method exploits the differences between parse results for grammars trained on grammatical and ungrammatical treebanks. The second approach builds an estimator of the probability of the most likely parse using grammatical training data that has previously been parsed and annotated with parse probabilities. If the estimated probability of an input sentence (whose grammaticality is to be judged by the system) is higher by a certain amount than the actual parse probability, the sentence is flagged as ungrammatical. The third approach extracts discriminative parse tree fragments in the form of CFG rules from parsed grammatical and ungrammatical corpora and trains a binary classifier to distinguish grammatical from ungrammatical sentences. The three approaches are evaluated on a large test set of grammatical and ungrammatical sentences. The ungrammatical test set is generated automatically by inserting common grammatical errors into the British National Corpus. The results are compared to two traditional approaches, one that uses a hand-crafted, discriminative grammar, the XLE ParGram English LFG, and one based on part-of-speech n-grams. 
In addition, the baseline methods and the new methods are combined in a machine learning-based framework, yielding further improvements.
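The second approach can be illustrated with a toy sketch. Everything below is invented for illustration (the linear estimator, the feature, and the log-probabilities are not the paper's actual model): we estimate the expected log parse probability of a sentence from a simple feature (sentence length) and flag the sentence as ungrammatical when its actual parse log-probability falls below the estimate by more than a margin.

```python
def fit_estimator(samples):
    """Least-squares fit of log parse probability against sentence length.

    `samples` is a list of (length, log_prob) pairs from parsed
    grammatical training data (hypothetical numbers here).
    """
    n = len(samples)
    xs = [length for length, _ in samples]
    ys = [lp for _, lp in samples]
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda length: intercept + slope * length

def is_ungrammatical(length, actual_logprob, estimator, margin=2.0):
    """Flag the sentence if the actual parse log-probability is lower
    than the estimated one by more than `margin`."""
    return estimator(length) - actual_logprob > margin

# Hypothetical training data: (sentence length, log parse probability).
train = [(5, -20.0), (10, -40.0), (15, -60.0), (20, -80.0)]
est = fit_estimator(train)
print(is_ungrammatical(12, -49.0, est))  # → False (close to expected -48)
print(is_ungrammatical(12, -60.0, est))  # → True (far below expected)
```

A real implementation would estimate the expected probability from richer features and a treebank-induced parser, but the flagging rule has this shape.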
Corrective Feedback in the EFL Classroom: Grammar Checker vs. Teacher’s Feedback.
The aim of this doctoral thesis is to compare the feedback provided by the teacher with that provided by a piece of software called Grammar Checker on grammatical errors in the written production of students of English as a foreign language. Traditionally, feedback has been considered one of the three theoretical conditions for language learning (along with input and output), and for this reason extensive research has been carried out on who should provide it, when, and at what level of explicitness. However, there are far fewer studies that analyse the use of e-feedback programs as a complement or alternative to the feedback offered by the teacher. Participants in our study were divided into two experimental groups and one control group, and three grammatical aspects that commonly cause errors for learners of English at B2 level were examined: prepositions, articles, and the simple past versus present/past perfect dichotomy. All participants wrote four essays. The first experimental group received feedback from the teacher and the second received it through the Grammar Checker program. The control group received no feedback on the grammatical aspects under analysis, only on other linguistic forms not studied. The results show, first, that the software failed to mark grammatical errors in some cases, which meant that students were unable to improve the linguistic accuracy of their written output after receiving feedback from the program. In contrast, students who received feedback from the teacher did improve, although the difference was not significant. Second, the two experimental groups outperformed the control group in the use of the grammatical forms under analysis. Third, regardless of the feedback offered, both groups showed long-term improvement in the use of these grammatical aspects. Finally, no differences in attitude towards the feedback received, or in its impact on the results, were found between the experimental groups.
Our results open up new lines of research into corrective feedback in the English as a foreign language classroom. On the one hand, more studies are needed that contribute to improving electronic feedback programs, making them more accurate and effective at detecting errors. On the other hand, software such as Grammar Checker can complement the daily practice of the foreign language teacher, helping in the first instance to correct common and recurring mistakes, all the more so since our research has shown that attitudes towards this type of electronic feedback are positive and that it is not perceived as an intrusion into the classroom, thus supporting the acquisition of English.
Programa de Doctorat en Llengües Aplicades, Literatura i Traducció
LLM-FuncMapper: Function Identification for Interpreting Complex Clauses in Building Codes via LLM
As a vital stage of automated rule checking (ARC), rule interpretation of regulatory texts requires considerable effort. However, interpreting regulatory clauses with implicit properties or complex computational logic remains challenging due to the lack of domain knowledge and the limited expressibility of conventional logic representations. Thus, LLM-FuncMapper, an approach to identifying the predefined functions needed to interpret various regulatory clauses based on a large language model (LLM), is proposed. First, through systematic analysis of building codes, a series of atomic functions is defined to capture the shared computational logic of implicit properties and complex constraints, creating a database of common building blocks for interpreting regulatory clauses. Then, a prompt template with chain-of-thought reasoning is developed and further enhanced with a classification-based tuning strategy to enable common LLMs to identify these functions effectively. Finally, the proposed approach is validated with statistical analysis, experiments, and a proof of concept. Statistical analysis reveals a long-tail distribution and high expressibility of the developed function database, with which almost 100% of computer-processable clauses can be interpreted and represented as computer-executable code. Experiments show that LLM-FuncMapper achieves promising results in identifying relevant predefined functions for rule interpretation. A further proof of concept in automated rule interpretation also demonstrates the potential of LLM-FuncMapper for interpreting complex regulatory clauses. To the best of our knowledge, this study is the first attempt to introduce LLMs for understanding and interpreting complex regulatory clauses, which may shed light on the further adoption of LLMs in the construction domain.
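A prompt of the kind described can be sketched as follows. The function database entries, the clause, and the wording are all invented stand-ins (the paper's actual atomic functions and template are not reproduced here); the sketch only shows the general shape of a chain-of-thought prompt that asks an LLM to pick functions from a predefined set.

```python
# Hypothetical atomic-function database (invented names and descriptions).
FUNCTION_DB = {
    "get_property": "Retrieve a property value of a building element.",
    "compare": "Compare a value against a required threshold.",
    "has_element": "Check whether an element of a given type exists.",
}

def build_prompt(clause):
    """Assemble a chain-of-thought prompt for function identification."""
    lines = ["You are given a database of atomic functions:"]
    for name, desc in FUNCTION_DB.items():
        lines.append(f"- {name}: {desc}")
    lines += [
        "",
        f"Regulatory clause: {clause}",
        "Let's think step by step: first identify the implicit properties",
        "and constraints in the clause, then list the predefined functions",
        "needed to interpret it.",
    ]
    return "\n".join(lines)

print(build_prompt("The fire door width shall be no less than 0.9 m."))
```

The resulting string would be sent to the LLM; the classification-based tuning mentioned in the abstract would sit on top of this basic template.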
Neural Machine Translation for Code Generation
Neural machine translation (NMT) methods developed for natural language processing have been shown to be highly successful in automating translation from one natural language to another. Recently, these NMT methods have been adapted to the generation of program code. In NMT for code generation, the task is to generate output source code that satisfies constraints expressed in the input. In the literature, a variety of input scenarios have been explored, including generating code from a natural language description, from lower-level representations such as binary or assembly (neural decompilation), from partial representations of source code (code completion and repair), and from source code in another language (code translation). In this paper we survey the NMT for code generation literature, cataloguing the methods that have been explored according to input and output representations, model architectures, optimization techniques, data sets, and evaluation methods. We discuss the limitations of existing methods and future research directions.
Comment: 33 pages, 1 figure
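The four input scenarios listed in the abstract can be made concrete with tiny invented examples (these pairs are illustrative only, not drawn from the survey's data sets); each task shares the same shape, an input sequence mapped to an output code sequence.

```python
# Invented examples of the input/output framings of NMT for code generation.
tasks = [
    {"task": "nl-to-code",       # natural language description -> code
     "input": "return the square of x",
     "output": "def sq(x): return x * x"},
    {"task": "decompilation",    # assembly -> higher-level source
     "input": "mov eax, edi; imul eax, edi; ret",
     "output": "int sq(int x) { return x * x; }"},
    {"task": "completion",       # partial source -> completed source
     "input": "def sq(x): return",
     "output": " x * x"},
    {"task": "translation",      # source in one language -> another
     "input": "def sq(x): return x * x",
     "output": "int sq(int x) { return x * x; }"},
]

for t in tasks:
    print(f"{t['task']}: {t['input']!r} -> {t['output']!r}")
```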
Character-based Neural Semantic Parsing
Humans and computers do not speak the same language. Many day-to-day tasks would be vastly more efficient if we could communicate with computers using natural language instead of relying on an interface. It is necessary, then, that the computer does not see a sentence as a mere collection of individual words, but can instead understand its deeper, compositional meaning. One way to tackle this problem is to automatically assign each sentence a formal, structured meaning representation that is easy for computers to interpret. There have been quite a few previous attempts at this, but those approaches usually relied heavily on predefined rules, word lists, or representations of the syntax of the text, which made them complicated to use in general. In this thesis we employ an algorithm that can learn to automatically assign meaning representations to texts without using any such external resources. Specifically, we use a type of artificial neural network called a sequence-to-sequence model, in a process often referred to as deep learning. The devil is in the details, but we find that this type of algorithm can produce high-quality meaning representations, outperforming the more traditional methods. Moreover, a main finding of the thesis is that, counterintuitively, it is often better to represent the text as a sequence of individual characters rather than words. This is likely because it helps the model deal with spelling errors, unknown words, and inflections.
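The character-versus-word distinction in the final finding can be shown with a minimal sketch (the sentence and the misspelling are invented examples): under a word-level representation a misspelled word becomes a single unknown token, while under a character-level representation every symbol still belongs to a small, closed vocabulary.

```python
def word_tokenize(sentence):
    """Word-level input: one token per whitespace-separated word."""
    return sentence.split()

def char_tokenize(sentence):
    """Character-level input: one token per character, including spaces."""
    return list(sentence)

sent = "the dog barkd"  # invented example with a spelling error in "barkd"
print(word_tokenize(sent))  # 'barkd' would be an out-of-vocabulary token
print(char_tokenize(sent))  # every character is in the known alphabet
```

A sequence-to-sequence model fed the character sequence can still exploit the overlap between "barkd" and "barked", which is one plausible reason for the robustness the thesis reports.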